# From Pandas to Polars: A Migration Guide
Polars is fast. Really fast. Here's how to migrate without losing your mind.
## Why Polars?
Benchmark on 10M rows:
| Operation | Pandas | Polars | Speedup |
|---|---|---|---|
| groupby-agg | 2.3s | 0.18s | 12.8x |
| join | 1.8s | 0.21s | 8.6x |
| filter | 0.9s | 0.05s | 18x |
Plus: better memory usage, lazy evaluation, and proper null handling.
## Key Differences
### 1. Expressions, Not Methods
```python
# Pandas
df["new_col"] = df["old_col"].str.lower()

# Polars
df = df.with_columns(
    pl.col("old_col").str.to_lowercase().alias("new_col")
)
```
### 2. Lazy Evaluation
```python
# Build a query plan, execute once
result = (
    pl.scan_parquet("data/*.parquet")
    .filter(pl.col("value") > 100)
    .group_by("category")
    .agg(pl.col("value").mean())
    .collect()  # Execute here
)
```
### 3. No Index
Polars doesn't have an index. Use regular columns:
```python
# Pandas
df.loc["2024-01-01"]

# Polars
df.filter(pl.col("date") == "2024-01-01")
```
## Common Patterns
### Reading Data
```python
# Pandas
df = pd.read_csv("data.csv")

# Polars (eager)
df = pl.read_csv("data.csv")

# Polars (lazy - recommended for large files)
# scan_csv returns a LazyFrame; call .collect() to materialize
lf = pl.scan_csv("data.csv")
```
### Groupby Operations
```python
# Pandas
df.groupby("category").agg({"value": ["mean", "std"]})

# Polars
df.group_by("category").agg(
    pl.col("value").mean().alias("value_mean"),
    pl.col("value").std().alias("value_std"),
)
```
### Window Functions
```python
# Pandas
df["rolling_mean"] = df.groupby("id")["value"].transform(
    lambda x: x.rolling(7).mean()
)

# Polars
df = df.with_columns(
    pl.col("value")
    .rolling_mean(window_size=7)
    .over("id")
    .alias("rolling_mean")
)
```
## When to Stick with Pandas
- Small data (<100K rows): Overhead not worth it
- Heavy scipy/sklearn integration: Better Pandas support
- Quick exploration: Pandas is more forgiving
## Migration Strategy
- Start with data loading and ETL pipelines
- Keep Pandas for notebooks/exploration
- Convert to Polars for production code
- Use `df.to_pandas()` when needed
The performance gains are real.