zekeLabs
Pandas for Data Wrangling & Statistical Modeling
Learning made Simpler !
www.zekeLabs.com
Agenda
● Introduction to Pandas
● Data Wrangling with Pandas
● Plotting & Visualization
● Statistical Data Modeling
Introduction to Pandas
● Introduction to Pandas
● Series and DataFrame objects
● Importing data
● Indexing, data selection and subsetting
● Hierarchical indexing
● Reading and writing files
● Date/time types
● String Operations
● Missing data
● Data summarization
Introduction to Pandas
● Open-source, high-performance, easy-to-use data structures.
● Library for data munging, preparation, analysis & modeling.
● Alternative to Excel spreadsheets
● Handle Time Series data
● Reads from different data formats
● Mutable in contents & size
● IO Tools to load from flat files, HDF5 etc.
Series
● A single vector of data, like a NumPy array, but with an index
● Think of a Series as one column of a table
Series Access
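A minimal sketch of creating a Series and accessing its values. The fruit labels and prices are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labelled array (illustrative data).
prices = pd.Series([10.5, 20.0, 15.25], index=["apple", "banana", "cherry"])

by_label = prices["banana"]            # access by index label
by_position = prices.iloc[0]           # access by integer position
subset = prices[["apple", "cherry"]]   # select several labels at once
expensive = prices[prices > 12]        # boolean mask keeps matching entries

print(by_label, by_position)
print(subset)
print(expensive)
```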
DataFrames
● Data Structure to store, view, manipulate multivariate data
● Tabular data structure
● A Series represents univariate data
● Combine several Series to create a DataFrame
DataFrames - Creation from Series
DataFrames - Creation
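One possible way to build a DataFrame, first from existing Series and then directly from a dict of lists. All names and values below are hypothetical.

```python
import pandas as pd

names = pd.Series(["Asha", "Ben", "Chen"])
ages = pd.Series([31, 25, 40])

# Each Series becomes one column; the dict keys become the column names.
people = pd.DataFrame({"name": names, "age": ages})

# A DataFrame can also be built directly from a dict of lists.
scores = pd.DataFrame({"subject": ["math", "art"], "score": [88, 92]})

print(people)
print(scores)
```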
Importing data
● Data must be loaded before anything can be done with it.
Reading Data
Reading Large Data
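A small sketch of reading a CSV file, and of reading it in chunks when it is too large to load at once. To stay self-contained it reads from an in-memory string (an assumption); in practice a file path would be passed to `read_csv`.

```python
import io
import pandas as pd

# Stand-in for a file on disk (illustrative data).
csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# Read everything at once.
df = pd.read_csv(io.StringIO(csv_text))

# Read in chunks of 2 rows; each chunk is a regular DataFrame.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["value"].sum()

print(len(df), total)
```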
Exploring Data
Exploring Data -2
Exploring Data -3
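A few commonly used calls for a first look at a DataFrame, shown on a small made-up table.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Goa"],
    "sales": [100, 250, 175, 90],
})

first_rows = df.head(2)              # first two rows
shape = df.shape                     # (rows, columns)
dtypes = df.dtypes                   # data type of each column
summary = df["sales"].describe()     # count, mean, std, min, quartiles, max
counts = df["city"].value_counts()   # frequency of each distinct value

print(first_rows)
print(shape)
print(summary)
print(counts)
```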
Access Columns
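A short sketch of column access on an illustrative table: a single label returns a Series, a list of labels returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben"],
    "age": [31, 25],
    "city": ["Pune", "Goa"],
})

ages = df["age"]                  # one column -> Series
name_city = df[["name", "city"]]  # several columns -> DataFrame

print(ages)
print(name_city)
```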
Access Rows By Index & Index-Values
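Rows can be selected by index label with `.loc` or by integer position with `.iloc`. The sketch below uses made-up labels; note that label slices include both endpoints, while position slices exclude the end.

```python
import pandas as pd

df = pd.DataFrame({"age": [31, 25, 40]}, index=["asha", "ben", "chen"])

row_by_label = df.loc["ben"]         # by index value
row_by_pos = df.iloc[0]              # by integer position
slice_labels = df.loc["asha":"ben"]  # label slice, end inclusive
slice_pos = df.iloc[0:2]             # position slice, end exclusive

print(row_by_label)
print(slice_labels)
```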
Filtering Rows
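Row filtering is usually done with boolean masks. A minimal sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Goa"],
    "sales": [100, 250, 175, 90],
})

big = df[df["sales"] > 100]                                  # single condition
pune_big = df[(df["city"] == "Pune") & (df["sales"] > 100)]  # combine with & or |
selected = df[df["city"].isin(["Goa", "Delhi"])]             # membership test
queried = df.query("sales > 100")                            # same as `big`, as an expression

print(big)
print(pune_big)
```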
Missing Values
Missing Values during load
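When loading, `read_csv` already treats empty fields and markers such as `NA` as missing. Extra markers can be declared with `na_values`. The marker "missing" and the data are assumptions for this sketch.

```python
import io
import pandas as pd

csv_text = "id,score\n1,90\n2,NA\n3,missing\n4,\n"

# "NA" and the empty field are recognised by default; "missing" is added here.
df = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

missing_per_column = df.isna().sum()
print(df)
print(missing_per_column)
```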
Handling Missing Values
● Real-world datasets will have missing values
● If only a few values of a column are present, we might drop the entire column
● If only a few rows have missing values, we might drop those rows
● We can’t afford to drop a lot of rows; that shrinks the dataset.
Handling Missing Values
● fillna - fill missing values
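A small sketch of common fill strategies: a constant, the column mean, and carrying the previous value forward. The data is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0],
    "city": ["Pune", None, "Goa"],
})

filled_const = df["city"].fillna("unknown")         # constant value
filled_mean = df["temp"].fillna(df["temp"].mean())  # column mean
filled_forward = df["temp"].ffill()                 # previous valid value

print(filled_const)
print(filled_mean)
print(filled_forward)
```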
Handling Missing Values
● dropna - dropping based on missing values
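A sketch of dropping rows or columns depending on where values are missing. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

rows_complete = df.dropna()                # rows with no missing value
rows_a = df.dropna(subset=["a"])           # only require column "a"
cols_complete = df.dropna(axis=1)          # columns with no missing value
cols_thresh = df.dropna(axis=1, thresh=2)  # columns with at least 2 present values

print(rows_complete)
print(cols_thresh)
```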
Handling Missing Values
● replace - Replace values given in ‘to_replace’ with ‘value’.
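A minimal sketch of `replace`, using placeholder markers ("N/A" and -1) that are assumptions for this example.

```python
import pandas as pd

df = pd.DataFrame({"grade": ["A", "B", "N/A"], "score": [90, -1, 75]})

# Replace one value with another.
grades = df["grade"].replace(to_replace="N/A", value="unknown")
scores = df["score"].replace(to_replace=-1, value=0)

# A dict maps several old values to new ones in one call.
labels = df["grade"].replace({"A": "excellent", "B": "good"})

print(grades)
print(scores)
print(labels)
```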
Duplicate Finding
Duplicate Dropping
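Finding and dropping duplicates might look like the following sketch (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Asha", "Ben"],
    "city": ["Pune", "Goa", "Pune", "Delhi"],
})

dup_mask = df.duplicated()       # True for rows that repeat an earlier row
deduped = df.drop_duplicates()   # drop fully identical rows

# Consider only "name" and keep the last occurrence of each.
by_name = df.drop_duplicates(subset=["name"], keep="last")

print(dup_mask)
print(deduped)
print(by_name)
```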
JSON Normalizing
● Handling semi-structured data
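A sketch of flattening nested records with `pd.json_normalize`. The record layout is made up. Nested keys become dotted column names.

```python
import pandas as pd

records = [
    {"id": 1, "user": {"name": "Asha", "city": "Pune"}},
    {"id": 2, "user": {"name": "Ben", "city": "Goa"}},
]

# Nested dicts are flattened into columns such as "user.name".
flat = pd.json_normalize(records)
print(flat)
```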
Working with Text Data
● A Series holding string data exposes string methods through the ‘.str’ accessor.
● ‘.str’ provides many vectorized string utility functions.
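A few `.str` methods applied to a made-up Series of names. Calls can be chained because each returns a new Series.

```python
import pandas as pd

names = pd.Series(["  Asha Rao", "ben KUMAR ", "Chen Li"])

clean = names.str.strip().str.title()  # trim whitespace, normalise case
has_kumar = clean.str.contains("Kumar")  # boolean Series
first = clean.str.split(" ").str[0]    # first token of each string
lengths = clean.str.len()              # string lengths

print(clean)
print(has_kumar)
print(first)
```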
Working with Text Data - 2
Working with Text Data - 3
Handling DateTime data
● Time series data appears very often in datasets
● Loading columns as datetimes while reading a file
● Parsing datetimes
● Displaying datetimes with time zones
● Rounding datetimes
● Filtering data between two datetimes
● Creating date ranges
Loading Dates
Parsing DateTime
Parsing DateTime - 2
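A sketch of loading a date column while reading a file and of parsing strings afterwards. The column name `when` and the formats are assumptions.

```python
import io
import pandas as pd

csv_text = "when,value\n2021-01-05,1\n2021-02-10,2\n"

# Parse the column as datetimes during load.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# Parse strings with an explicit day/month/year format.
parsed = pd.to_datetime(pd.Series(["05/01/2021", "10/02/2021"]), format="%d/%m/%Y")

# Unparseable entries become NaT instead of raising an error.
coerced = pd.to_datetime(pd.Series(["2021-01-05", "not a date"]), errors="coerce")

print(df.dtypes)
print(parsed)
print(coerced)
```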
Datetime with timezone
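A minimal time zone sketch: naive timestamps are first localized to a zone, then converted to another. The zones chosen here are illustrative.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2021-03-01 09:00", "2021-03-01 18:30"]))

utc = ts.dt.tz_localize("UTC")               # attach a zone to naive timestamps
india = utc.dt.tz_convert("Asia/Kolkata")    # same instants, shown in UTC+05:30

print(utc)
print(india)
```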
Datetime Rounding
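Datetimes can be rounded to a frequency with `.dt.round`, `.dt.floor` and `.dt.ceil`. The 15-minute frequency is an arbitrary choice for this sketch.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2021-03-01 09:14:50", "2021-03-01 09:47:10"]))

rounded = ts.dt.round("15min")   # nearest 15-minute mark
floored = ts.dt.floor("15min")   # round down
ceiled = ts.dt.ceil("15min")     # round up

print(rounded)
print(floored)
print(ceiled)
```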
Datetime Offset
Datetime Filter
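A sketch that creates a date range, shifts it with an offset, and filters rows between two dates. The dates and values are made up.

```python
import pandas as pd

# Ten consecutive days used as the index.
days = pd.date_range("2021-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": range(10)}, index=days)

# Offset: move every date one calendar month forward.
shifted = days + pd.DateOffset(months=1)

# Filter with a boolean mask...
mask = (df.index >= "2021-01-03") & (df.index <= "2021-01-05")
between = df[mask]

# ...or with a label slice on the datetime index (both ends inclusive).
sliced = df.loc["2021-01-03":"2021-01-05"]

print(shifted[0])
print(between)
```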
Data Wrangling with Pandas
● Reshaping DataFrame objects
● Pivoting
● Alignment
● Data aggregation and GroupBy operations
● Merging and joining DataFrame objects
GroupBy : split-apply-combine
Splitting data into groups based on certain criteria
Applying a function to each group independently
● Aggregate
● Transform
● Filtering
Combining the results into a data structure
GroupBy : splitting an object into groups
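The split-apply-combine steps can be sketched as follows, with one example each for aggregate, transform and filter. The team data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "points": [10, 20, 5, 15, 10],
})

grouped = df.groupby("team")   # split

# Aggregate: one value per group.
totals = grouped["points"].sum()

# Transform: result has the same length as the input.
share = df["points"] / grouped["points"].transform("sum")

# Filter: keep only groups that satisfy a condition.
big_teams = grouped.filter(lambda g: len(g) >= 3)

print(totals)
print(share)
print(big_teams)
```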
Concatenate
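A minimal sketch of `pd.concat`, stacking two made-up tables vertically and side by side.

```python
import pandas as pd

q1 = pd.DataFrame({"month": ["jan", "feb"], "sales": [10, 20]})
q2 = pd.DataFrame({"month": ["mar", "apr"], "sales": [30, 40]})

# Stack rows; ignore_index renumbers the result 0..n-1.
stacked = pd.concat([q1, q2], ignore_index=True)

# Place the frames side by side, aligned on the index.
side_by_side = pd.concat([q1, q2], axis=1)

print(stacked)
print(side_by_side)
```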
Database-style DataFrame joining/merging
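A sketch of SQL-like joins with `pd.merge`. The tables and the key `cust_id` are hypothetical.

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3, 4], "amount": [50, 20, 70, 10]})

# Inner join: only keys present in both tables.
inner = pd.merge(customers, orders, on="cust_id", how="inner")

# Left join: every customer, with NaN where no order exists.
left = pd.merge(customers, orders, on="cust_id", how="left")

# Outer join: all keys; indicator adds a "_merge" column showing the source.
outer = pd.merge(customers, orders, on="cust_id", how="outer", indicator=True)

print(inner)
print(left)
print(outer)
```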
Join By Indexes
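Joining on the index rather than on a column might look like this sketch (illustrative labels):

```python
import pandas as pd

left = pd.DataFrame({"age": [31, 25]}, index=["asha", "ben"])
right = pd.DataFrame({"city": ["Pune", "Goa"]}, index=["ben", "chen"])

joined = left.join(right)                    # left join on the index by default
joined_outer = left.join(right, how="outer") # keep labels from both sides

# The same idea through merge, here as an inner join.
merged = pd.merge(left, right, left_index=True, right_index=True)

print(joined)
print(joined_outer)
print(merged)
```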
Pivot
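A small sketch of `pivot`, which reshapes long data to wide data. Each (index, columns) pair must be unique. The data is made up.

```python
import pandas as pd

long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["Pune", "Goa", "Pune", "Goa"],
    "temp": [30, 28, 31, 27],
})

# One row per date, one column per city.
wide = long_df.pivot(index="date", columns="city", values="temp")
print(wide)
```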
Stacking
Unstacking
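A sketch of `stack` and `unstack` on illustrative data: `stack` moves column labels into an inner index level, and `unstack` moves an index level back into columns.

```python
import pandas as pd

wide = pd.DataFrame({"math": [90, 80], "art": [70, 95]}, index=["asha", "ben"])

# Columns become the inner level of a two-level index.
stacked = wide.stack()

# Move the inner level back to columns.
restored = stacked.unstack()

# Unstack the outer level instead: subjects as rows, names as columns.
by_name = stacked.unstack(level=0)

print(stacked)
print(restored)
print(by_name)
```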
Melt
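`melt` goes from wide to long format. A minimal sketch with made-up column names:

```python
import pandas as pd

wide = pd.DataFrame({"name": ["asha", "ben"], "math": [90, 80], "art": [70, 95]})

# Keep "name" as identifier; every other column becomes a (subject, score) row.
long_df = wide.melt(id_vars="name", var_name="subject", value_name="score")
print(long_df)
```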
Pivot Table
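Unlike `pivot`, `pivot_table` aggregates when several rows share the same index/column pair. A sketch on illustrative sales data:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["n", "n", "s", "s", "s"],
    "product": ["x", "y", "x", "x", "y"],
    "amount": [10, 20, 5, 15, 30],
})

# Sum the amounts per region and product; empty cells become 0.
table = sales.pivot_table(
    index="region", columns="product", values="amount",
    aggfunc="sum", fill_value=0,
)
print(table)
```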
Computation
Computation Tool
● Statistical Functions
● Window Functions
● Aggregations
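One example per category above, on made-up data: a statistical function (correlation), window functions (rolling and expanding) and an aggregation.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

# Statistical function: Pearson correlation between two columns.
corr = df["x"].corr(df["y"])

# Window functions: 3-row moving average and a running total.
rolling_mean = df["y"].rolling(window=3).mean()
running_total = df["y"].expanding().sum()

# Aggregations: several summary statistics at once.
summary = df.agg(["min", "max", "mean"])

print(corr)
print(rolling_mean)
print(running_total)
print(summary)
```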
Plotting & Visualization
● Introduction to Matplotlib
● Time series plots
● Grouped plots
● Scatterplots
● Histograms
● Box-plot
● Pie Charts
Statistical Data Modeling
● Fitting data to probability distributions
● Linear models
● Spline models
● Time series analysis
● Bayesian models
Thank You !!!
Visit : www.zekeLabs.com for more details
Let us know how we can help your organization upskill its
employees to stay up to date in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com