Python Pandas 1
Introduction to Pandas
Pandas is a Python library for data manipulation and analysis. It provides the data structures and functions needed to load, clean, reshape, and summarize structured (labeled, tabular) data.
Key Features
- Data structures for handling structured data
- Data alignment and integrated handling of missing data
- Flexible reshaping and pivoting of datasets
- Intelligent label-based slicing and indexing
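The alignment and missing-data features listed above can be seen in a small sketch. The Series names and values here are made up for illustration; the point is only that arithmetic matches on labels, and labels missing from one side come out as NaN.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative values).
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# Addition aligns on labels, not positions; a label present in only one
# operand produces NaN (missing) in the result.
total = a + b
print(total)

# Missing values can then be filled or dropped explicitly.
filled = total.fillna(0)
dropped = total.dropna()
print(filled)
print(dropped)
```

Only the labels 'y' and 'z' appear in both inputs, so only those two entries hold numeric sums before filling.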
Pandas Series
A Series is a one-dimensional labeled array capable of holding any data type.
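As a rough illustration of "any data type", the sketch below builds Series from integers, floats, and mixed Python objects and inspects the dtype pandas infers for each. The variable names are placeholders chosen for this example.

```python
import pandas as pd

# The same constructor accepts different element types; pandas infers the dtype.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2.5])
mixed = pd.Series([1, 'two', 3.0])  # mixed Python objects

print(ints.dtype)    # an integer dtype
print(floats.dtype)  # float64
print(mixed.dtype)   # object

# Labels give dictionary-like access to individual values.
prices = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(prices['b'])
```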
import pandas as pd
# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)
# Series with custom index
s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s2)
# From dictionary
data = {'x': 100, 'y': 200, 'z': 300}
s3 = pd.Series(data)
print(s3)
DataFrames
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
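To make "columns of potentially different types" concrete, here is a minimal sketch with hypothetical sample data. Each column keeps its own dtype, and selecting one column yields a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical sample data: one text column, one integer, one float.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Height': [1.65, 1.80],
})

# Each column carries its own dtype.
print(people.dtypes)

# A single column is itself a Series sharing the DataFrame's row index.
ages = people['Age']
print(type(ages).__name__)
print(ages.index.equals(people.index))
```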
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo']
}
df = pd.DataFrame(data)
print(df)
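# DataFrame info
df.info() # Prints the summary itself and returns None, so no print() is needed
print(df.describe()) # Summary statistics for numeric columns
Basic Operations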
This section covers essential operations: inspecting a DataFrame, adding a column, and dropping a column.
# Viewing data
print(df.head()) # First 5 rows by default (all 3 rows here)
print(df.tail()) # Last 5 rows by default (all 3 rows here)
print(df.shape) # Dimensions
print(df.columns) # Column names
# Adding new column
df['Salary'] = [50000, 60000, 70000]
# Dropping column
df_new = df.drop('City', axis=1)
print(df_new)
Indexing & Selection
DataFrames support selection by column name, by row label (loc), by row position (iloc), and by boolean condition.
# Selecting columns
print(df['Name']) # Single column
print(df[['Name', 'Age']]) # Multiple columns
# Selecting rows
print(df.loc[0]) # By index label (here the default integer label 0)
print(df.iloc[0]) # By integer position (the first row)
# Conditional selection
young = df[df['Age'] < 30]
print(young)
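With the default integer index, loc[0] and iloc[0] return the same row, which can hide the difference between them. The sketch below reuses the sample names and ages but assumes hypothetical string labels ('r1', 'r2', 'r3') for the index, and also shows one way to combine conditions.

```python
import pandas as pd

# Same sample values as above, but indexed by hypothetical string labels
# so that label and position are no longer the same thing.
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['r1', 'r2', 'r3'],
)

by_label = people.loc['r2']    # row whose index label is 'r2'
by_position = people.iloc[1]   # second row, regardless of labels
print(by_label.equals(by_position))

# Combine conditions with & (and) or | (or); each condition needs parentheses.
in_range = people[(people['Age'] >= 30) & (people['Age'] < 35)]
print(in_range)

# loc accepts a boolean mask and a list of columns together.
names_over_25 = people.loc[people['Age'] > 25, ['Name']]
print(names_over_25)
```

The parentheses around each condition matter because & and | bind more tightly than comparison operators in Python.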