Pandas

•Open-source Python library
•Data manipulation and analysis tool built around powerful data structures
•Development started by Wes McKinney in 2008
•Pandas reads data, for example from a CSV file, into a DataFrame, which is essentially a table
•It can then answer statistical questions about the data, such as:
•What's the average, median, max, or min of each column?
•Does column A correlate with column B?
•What does the distribution of data in column C look like?
•Clean the data by doing things like removing missing values and
filtering rows or columns by some criteria
•Visualize the data with help from Matplotlib. Plot bars, lines,
histograms, bubbles, and more.
•Store the cleaned, transformed data back into a CSV, other file or
database
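The workflow above (load, summarize, clean, store) can be sketched as follows. The CSV content and the column names `name`, `age`, and `rating` are made up for illustration, and the data is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so no file is needed.
csv_text = """name,age,rating
Steve,32,3.45
Lia,28,2.40
Sam,,4.10
"""

# Load the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Summary statistics for one column (missing values are skipped).
mean_rating = df["rating"].mean()
max_rating = df["rating"].max()

# Correlation between two columns.
age_rating_corr = df["age"].corr(df["rating"])

# Clean: drop rows with missing values, then filter rows by a criterion.
clean = df.dropna()
older = clean[clean["age"] > 30]

# Store the cleaned data as CSV text (pass a file path to write a file instead).
clean_csv = clean.to_csv(index=False)
print(clean_csv)
```

Plotting is left out here; with Matplotlib installed, a call such as `df["rating"].plot(kind="hist")` would be the usual starting point.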

Pandas
• From the command prompt
•pip install pandas
• From a Jupyter notebook
• !pip install pandas
•Then, to use pandas in the notebook, import the module (no instance needs to be created)
• import pandas as pd
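A quick way to confirm that the installation worked is to import the library and print its version. This is only a sanity check; the version printed depends on the environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string varies by environment.
installed_version = pd.__version__
print(installed_version)
```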

Pandas
•Pandas has historically offered the following three data structures:
•Series
•DataFrame
•Panel (deprecated in pandas 0.20 and removed in 0.25)
Data Structure   Description
Series           1D labeled, homogeneous array; size-immutable.
DataFrame        2D labeled, size-mutable tabular structure with potentially
                 heterogeneously typed columns.
Panel            3D labeled, size-mutable array (no longer available in current pandas).
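As a small illustration of the two structures that remain in current pandas, the sketch below builds one of each and checks its number of dimensions. The values and column names are arbitrary.

```python
import pandas as pd

# A Series is one-dimensional and holds values of a single type.
s = pd.Series([10, 20, 30])

# A DataFrame is two-dimensional; each column may have its own type.
df = pd.DataFrame({"count": [1, 2], "label": ["x", "y"]})

print(s.ndim)   # number of dimensions of the Series
print(df.ndim)  # number of dimensions of the DataFrame
```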

Pandas
Series :
•Key Points
•Homogeneous data
•Size Immutable
•Values of Data Mutable
DataFrame is a two-dimensional array with heterogeneous data.
For example,
Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   2.40
The data is represented in rows and columns. Each column represents an attribute
and each row represents a person.
Data Type of Columns
The data types of the four columns are as follows:
Column   Type
Name     String
Age      Integer
Gender   String
Rating   Float
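The example table can be reproduced as a DataFrame to inspect the column types. This is a minimal sketch; note that pandas reports string columns as `object` (or a dedicated string type in newer versions) rather than "String".

```python
import pandas as pd

# The two rows from the example table above.
people = pd.DataFrame(
    {
        "Name": ["Steve", "Lia"],
        "Age": [32, 28],
        "Gender": ["Male", "Female"],
        "Rating": [3.45, 2.40],
    }
)

# One dtype per column: integer for Age, float for Rating, text for the rest.
print(people.dtypes)
```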

Pandas
Data Frame :
•Key Points
•Heterogeneous data
•Size Mutable
•Data Mutable
•Panel
Panel is a three-dimensional data structure with heterogeneous data. It is hard to
represent a panel graphically, but it can be pictured as a container of DataFrames.
Panel was removed in pandas 0.25, so it is not available in current versions.
Key Points
Heterogeneous data
Size Mutable
Data Mutable

Pandas
•Series is a one-dimensional labeled array capable of holding data of any type (integer,
string, float, Python objects, etc.).
•The axis labels are collectively called the index.
A pandas Series can be created using the following constructor (main parameters shown):
pandas.Series(data, index, dtype, copy)
The parameters of the constructor are as follows:
No.  Parameter  Description
1    data       Takes various forms such as an ndarray, list, dict, or constant.
2    index      Index values must be hashable and the same length as data;
                duplicates are allowed. Defaults to 0, 1, ..., n-1
                (like np.arange(n)) if no index is passed.
3    dtype      Data type. If None, the data type is inferred.
4    copy       Whether to copy the input data. Default False.
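The constructor parameters can be tried out as in the sketch below. The labels `"a"`, `"b"`, `"c"` and the values are arbitrary.

```python
import pandas as pd

# Explicit index labels and an explicit dtype.
labeled = pd.Series(data=[1, 2, 3], index=["a", "b", "c"], dtype="float64")

# No index passed: pandas assigns the default labels 0, 1, ..., n-1.
default = pd.Series([1, 2, 3])

print(labeled)
print(default.index)
```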

Pandas
A Series can be created from various inputs, such as:
Array
Dict
Scalar value or constant
Create a Series from ndarray
If data is an ndarray, then any index passed must be of the same length. If no index is
passed, then the default index is range(n), where n is the array length, i.e.,
[0, 1, 2, ..., len(array) - 1].
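A minimal sketch of the three kinds of input listed above. The values and index labels are chosen only for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array(["a", "b", "c", "d"])

# From an ndarray with the default index 0..3.
from_array = pd.Series(arr)

# From an ndarray with a custom index of the same length.
from_array_labeled = pd.Series(arr, index=[100, 101, 102, 103])

# From a dict: the keys become the index labels.
from_dict = pd.Series({"x": 0.0, "y": 1.0, "z": 2.0})

# From a scalar: the value is repeated for every index label.
from_scalar = pd.Series(5, index=[0, 1, 2, 3])

print(from_array_labeled)
```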

Pandas
•A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular
fashion in rows and columns.
•Features of DataFrame
•Columns can be of different types
•Size-mutable
•Labeled axes (rows and columns)
•Arithmetic operations can be performed on rows and columns
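The features above can be seen in a short sketch. The column names `price`, `qty`, and `total` and the row labels are hypothetical.

```python
import pandas as pd

# Labeled rows (r1, r2) and labeled columns of different types.
orders = pd.DataFrame(
    {"price": [10.0, 20.0], "qty": [3, 4]},
    index=["r1", "r2"],
)

# Size-mutable: a new column can be added after creation.
# Column arithmetic is applied element by element.
orders["total"] = orders["price"] * orders["qty"]

# Arithmetic along rows (axis=1) and along columns (the default).
row_sums = orders[["price", "qty"]].sum(axis=1)
col_means = orders.mean()

print(orders)
```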

Pandas
•pandas.DataFrame
•A pandas DataFrame can be created using the following constructor −
•pandas.DataFrame( data, index, columns, dtype, copy)
•Create DataFrame
•A pandas DataFrame can be created using various inputs like −
•Lists
•dict
•Series
•NumPy ndarrays
•Another DataFrame
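Each of the inputs listed above can be passed to the constructor, roughly as sketched here. All names and values are illustrative.

```python
import numpy as np
import pandas as pd

# From a list of rows, with column names supplied separately.
from_lists = pd.DataFrame([["Steve", 32], ["Lia", 28]], columns=["Name", "Age"])

# From a dict mapping column names to lists of values.
from_dict = pd.DataFrame({"Name": ["Steve", "Lia"], "Age": [32, 28]})

# From a dict of Series: rows are aligned on the index, gaps become NaN.
from_series = pd.DataFrame(
    {
        "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
        "two": pd.Series([10, 20], index=["a", "b"]),
    }
)

# From a 2D NumPy ndarray.
from_ndarray = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["x", "y", "z"])

# From another DataFrame (copy=True makes the new frame independent).
from_frame = pd.DataFrame(from_dict, copy=True)

print(from_series)
```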
