Python Pandas 1

Introduction to Pandas

Pandas is a Python library for data manipulation and analysis. It provides the data structures and functions needed to work with structured data, such as tables and labeled arrays.

Key Features

  • Labeled data structures (Series and DataFrame) for handling structured data
  • Data alignment and integrated handling of missing data
  • Flexible reshaping and pivoting of datasets
  • Intelligent label-based slicing and indexing
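
As a rough illustration of the alignment and missing-data features listed above, the sketch below adds two Series whose labels only partly overlap. The values and the names a, b, total, and filled are made up for this example.

```python
import pandas as pd

# Two Series with partly overlapping labels (illustrative values).
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# Arithmetic aligns on labels, not positions. Label 'x' has no
# partner in b, so its result is NaN (missing).
total = a + b
print(total)

# Missing values can be detected and replaced explicitly.
print(total.isna())
filled = total.fillna(0)
print(filled)
```

Whether to fill, drop, or keep missing values depends on the analysis; fillna(0) is just one option.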

Pandas Series

A Series is a one-dimensional labeled array capable of holding any data type.

import pandas as pd

# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)

# Series with custom index
s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s2)

# From dictionary
data = {'x': 100, 'y': 200, 'z': 300}
s3 = pd.Series(data)
print(s3)
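
Because a Series is labeled, its values can be read by label or by position, and arithmetic keeps the labels. A minimal sketch, reusing the s2 and s3 values from above; the names by_label, by_position, and doubled are chosen only for illustration.

```python
import pandas as pd

s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s2['b']        # Look up by index label
by_position = s2.iloc[1]  # Look up by integer position
doubled = s2 * 2          # Element-wise arithmetic, labels preserved

print(by_label, by_position)
print(doubled)

# When a Series is built from a dictionary, the keys become the index.
s3 = pd.Series({'x': 100, 'y': 200, 'z': 300})
print(s3.index.tolist())
```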

DataFrames

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Tokyo']
}

df = pd.DataFrame(data)
print(df)

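# DataFrame info
df.info()             # Prints its own summary and returns None, so no print() is needed
print(df.describe())  # Summary statistics for the numeric columns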
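
To see what "columns of potentially different types" means in practice, the sketch below inspects the per-column dtypes of a small frame like the one above. The variable names numeric_only and summary are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Each column carries its own dtype (text for Name, integer for Age).
print(df.dtypes)

# select_dtypes keeps only the columns matching the requested kind.
numeric_only = df.select_dtypes(include='number')
print(numeric_only)

# With mixed columns, describe() summarizes only the numeric ones by default.
summary = df.describe()
print(summary)
```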

Basic Operations

Common operations for inspecting a DataFrame and for adding or removing columns.

# Viewing data
print(df.head())    # First 5 rows by default (all 3 rows here)
print(df.tail())    # Last 5 rows by default (all 3 rows here)
print(df.shape)     # Dimensions as (rows, columns)
print(df.columns)   # Column labels

# Adding new column
df['Salary'] = [50000, 60000, 70000]

# Dropping a column (returns a new DataFrame; df itself is unchanged)
df_new = df.drop('City', axis=1)
print(df_new)
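
One detail worth checking is that drop returns a new DataFrame rather than modifying the original. The sketch below uses drop(columns=...), which is equivalent to passing axis=1; the smaller frame here is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'London', 'Tokyo'],
})

# columns='City' is equivalent to drop('City', axis=1).
df_new = df.drop(columns='City')

print(df.columns.tolist())      # Original still has City
print(df_new.columns.tolist())  # Copy does not
```

To remove the column from df itself, one option is to assign the result back: df = df.drop(columns='City').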

Indexing & Selection

DataFrames support several ways to select and filter data: by column name, by row label or position, and by condition.

# Selecting columns
print(df['Name'])           # Single column
print(df[['Name', 'Age']])  # Multiple columns

# Selecting rows
print(df.loc[0])            # By index label (0 is a label in the default index)
print(df.iloc[0])           # By integer position

# Conditional selection
young = df[df['Age'] < 30]
print(young)
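
With the default index, loc[0] and iloc[0] return the same row, which can hide the difference between them. The sketch below assumes a hypothetical non-default index (10, 20, 30) to make the distinction visible, and also shows combining conditions; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # Hypothetical labels, chosen for this example
)

row_by_label = df.loc[10]     # Row whose index label is 10
row_by_position = df.iloc[0]  # First row, regardless of its label

# Combine conditions with & (and) or | (or); each comparison
# needs its own parentheses.
subset = df[(df['Age'] >= 30) & (df['Name'] != 'Bob')]

# loc can filter rows and pick a column in one step.
names = df.loc[df['Age'] < 35, 'Name']

print(row_by_label['Name'], row_by_position['Name'])
print(subset)
print(names)
```

Here df.loc[0] would raise a KeyError because no row carries the label 0, while df.iloc[0] still works.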