If you've spent any time with Python for data science or machine learning, you've probably met Pandas. And let's be real: it's kind of a love-hate relationship. Pandas is insanely powerful once you know how to use it, but getting to that level? It can feel like solving a Rubik's cube in the dark.
This post is for you if:
- You've done some basic EDA
- You want to go beyond that and really unlock what Pandas can do
Let's walk through some of the most useful advanced functions in Pandas, so you can stop Googling the same five StackOverflow answers and start slicing and dicing your data like a pro.
Quick EDA Refresher
Before we dive in, here's a quick recap of basic EDA tools in Pandas:
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Column types + non-null counts
df.describe() # Summary stats for numeric columns
df.columns # Column names
df.shape # (rows, columns)
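If you want to try the recap on something concrete, here is a minimal sketch. The DataFrame, its column names, and its values are all made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Score": [90, 75, 82],
})

first_rows = df.head(2)          # pass n to get fewer than the default 5 rows
n_rows, n_cols = df.shape        # shape is an attribute, so no parentheses
column_names = list(df.columns)  # columns is an attribute too
stats = df.describe()            # summary stats for the numeric "Score" column
df.info()                        # prints dtypes and non-null counts
```

Note that df.shape and df.columns are attributes, while the others are method calls.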
.apply(): When You Need Custom Logic
.apply() lets you run a custom function across a Series or DataFrame. Think of it as Python's for loop in disguise.
df["Column_Name"].apply(lambda x: x * 2)
Use Case: Normalise a column, extract a substring, or flag values.
Tip: It's not always the fastest. Try vectorised operations (like df["col"] * 2) when possible.
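To make the tip concrete, here is a small sketch comparing the two styles. The "price" column and the "high"/"low" threshold are assumptions for the example, not anything from a real dataset.

```python
import pandas as pd

# Hypothetical column, invented for illustration
df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# Element-wise with .apply(): the lambda is called once per value
doubled_apply = df["price"].apply(lambda x: x * 2)

# Vectorised: a single operation over the whole column
doubled_vec = df["price"] * 2

# Custom branching logic is a typical fit for .apply()
df["flag"] = df["price"].apply(lambda x: "high" if x >= 20 else "low")
```

Both approaches give the same doubled values here. The vectorised form skips the per-value Python function call, which is usually where the speed difference comes from on large columns.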
.map(): Made for Series, Great for Mapping
.map() is similar to .apply() but works only on Series. It's great for mapping or replacing values.
df["Gender"].map({'M': "Male", 'F': "Female"})
Use Case: Clean categorical values or apply simple transformations fast.
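One thing worth knowing when you map with a dict: values that are not keys in the dict come back as NaN. Here is a rough sketch, using a made-up "X" code to stand in for an unexpected value.

```python
import pandas as pd

# "X" is a hypothetical code with no entry in the mapping
df = pd.DataFrame({"Gender": ["M", "F", "X"]})

mapped = df["Gender"].map({"M": "Male", "F": "Female"})
# "X" has no key in the dict, so its row becomes NaN

# One option: fall back to the original value where the mapping had no match
mapped_keep = mapped.fillna(df["Gender"])
```

Whether to keep, drop, or flag unmapped values depends on your data, so treat the fillna step as one possible choice.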
.replace(): Like Find & Replace in Excel
Swap out values across your DataFrame or Series in one clean line.
df["Status"].replace(['P', 'F'], ["Pass", "Fail"])
Use Case: Standardise values, fix typos, or make your data more readable.
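Here is a small sketch of the same replacement written two ways, assuming a "Status" column that also contains a made-up value ("Absent") that should be left alone.

```python
import pandas as pd

# Hypothetical data; "Absent" is included to show that unmatched values are kept
df = pd.DataFrame({"Status": ["P", "F", "P", "Absent"]})

# Two parallel lists: the first value maps to "Pass", the second to "Fail"
with_lists = df["Status"].replace(["P", "F"], ["Pass", "Fail"])

# Equivalent dict form, which keeps each old/new pair side by side
with_dict = df["Status"].replace({"P": "Pass", "F": "Fail"})
```

Unlike .map() with a dict, .replace() leaves values it does not recognise untouched. It also returns a new object, so assign the result back if you want to keep it.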
.melt(): Turn Wide Data Into Tidy Long Format
.melt() is the secret weapon for transforming wide datasets into tidy, long-form ones.
df.melt(
    id_vars=["Name"],
    value_vars=["Math", "Science", "English"],
    var_name="Subject",
    value_name="Score"
)
Use Case: Got columns that could be stacked into one? Use .melt().
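To see what the reshaping actually does, here is a minimal sketch on a two-student table. The names and scores are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject
wide = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "Math": [90, 70],
    "Science": [85, 65],
    "English": [88, 72],
})

# Stack the three subject columns into (Subject, Score) pairs
long = wide.melt(
    id_vars=["Name"],
    value_vars=["Math", "Science", "English"],
    var_name="Subject",
    value_name="Score",
)
```

Two rows and three subject columns become six rows with the columns Name, Subject, and Score: one row per student per subject.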
.pivot() & .pivot_table(): Reshape Like a Pro
Both reshape data, but with a key difference:
- pivot() raises a ValueError if any index/column pair appears more than once
- pivot_table() handles duplicates with aggregation
df.pivot(index="id", columns="month", values="sales")
df.pivot_table(index="region", columns="month", values="sales", aggfunc="sum")
Use Case: Create spreadsheet-style summaries from long data.
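The difference is easiest to see on data that actually contains a duplicate. In this sketch the regions, months, and sales figures are made up, and the ("East", "Jan") pair appears twice on purpose.

```python
import pandas as pd

# Hypothetical sales data with a deliberate duplicate ("East", "Jan") pair
sales = pd.DataFrame({
    "region": ["East", "East", "East", "West"],
    "month": ["Jan", "Jan", "Feb", "Jan"],
    "sales": [100, 50, 70, 80],
})

# pivot() cannot decide which of the two East/Jan values to keep
try:
    sales.pivot(index="region", columns="month", values="sales")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() combines the duplicates using the aggregation you pick
summary = sales.pivot_table(
    index="region", columns="month", values="sales", aggfunc="sum"
)
```

In the summary, East/Jan is the sum of both rows (150), and West/Feb is NaN because there is no data for that combination.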
.groupby() + .agg(): Group & Summarise
This combo is how you do serious data summarising:
df.groupby("department")["salary"].agg(["mean", "max", "count"])
Use Case: Average sales per store? Count of users per country? This is it.
Tip: You can group by multiple columns too.
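Here is a short sketch of both ideas: several aggregations on one column, then grouping by two columns. The "level" column and the output names avg_salary and headcount are assumptions added for the example.

```python
import pandas as pd

# Hypothetical data; "level" is an extra column invented for the example
df = pd.DataFrame({
    "department": ["Eng", "Eng", "Eng", "Ops"],
    "level": ["Junior", "Senior", "Senior", "Junior"],
    "salary": [60, 100, 120, 50],
})

# Several aggregations on one column, one row per department
by_dept = df.groupby("department")["salary"].agg(["mean", "max", "count"])

# Group by two columns and name the outputs (named aggregation)
by_dept_level = (
    df.groupby(["department", "level"])
      .agg(avg_salary=("salary", "mean"), headcount=("salary", "count"))
      .reset_index()  # turn the group keys back into ordinary columns
)
```

Named aggregation is optional, but it gives you readable column names straight away.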
.transform(): Add Calculations Without Changing Shape
Unlike .agg(), .transform() returns a Series of the same shape as the input, aligned to the original rows. Super useful when you want to keep your DataFrame structure.
df["Team Avg"] = df.groupby("Team")["Salary"].transform("mean")
Use Case: Add group-level z-scores, min-max scaling, or running averages.
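As a sketch of the z-score use case, assuming a DataFrame with "Team" and "Salary" columns (the values and the "Salary Z" column name are made up):

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Salary": [50, 70, 80, 100],
})

# Every row receives its own team's mean, so the row count stays the same
df["Team Avg"] = df.groupby("Team")["Salary"].transform("mean")

# Group-level z-score: distance from the team mean in team standard deviations
# (pandas' "std" is the sample standard deviation, ddof=1)
grouped = df.groupby("Team")["Salary"]
df["Salary Z"] = (df["Salary"] - grouped.transform("mean")) / grouped.transform("std")
```

Because the result lines up with the original rows, you can assign it straight to a new column without a merge.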
.filter(): Keep Only What You Need
On a grouped DataFrame, .filter() lets you keep or drop whole groups based on custom logic: the function you pass returns True or False for each group.
df.groupby("Team").filter(lambda x: x["Salary"].mean() > 80)
Use Case: Drop noisy groups and focus only on high-performing teams.
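It's easy to confuse group-level filtering with ordinary row filtering, so here is a small sketch showing both side by side. The team names and salaries are made up, and the threshold of 80 is taken from the snippet above.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Salary": [70, 100, 60, 90],
})

# Group-level: keep every row of any team whose average salary is above 80
high_teams = df.groupby("Team").filter(lambda g: g["Salary"].mean() > 80)

# Row-level, for contrast: keep individual rows above 80, regardless of team
high_rows = df[df["Salary"] > 80]
```

Team A averages 85, so both of its rows survive, including the 70. Team B averages 75, so both of its rows are dropped, including the 90. The row-level mask keeps only the 100 and the 90.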