Data Frames
A pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with
rows and columns.
import pandas as pd
data = {
    "Marks": [80, 75, 90],
    "Sub": ['Python', 'Java', 'Database']
}
# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
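A quick way to confirm the table-like layout described above is to inspect a few attributes of the DataFrame. The sketch below reuses the Marks/Sub data from the slide; shape, columns and index are standard pandas attributes, and nothing else is assumed.

```python
import pandas as pd

data = {
    "Marks": [80, 75, 90],
    "Sub": ["Python", "Java", "Database"],
}
df = pd.DataFrame(data)

# (number of rows, number of columns)
print(df.shape)
# column labels, taken from the dictionary keys
print(list(df.columns))
# row labels: pandas assigns a default integer index
print(list(df.index))
```

The dictionary keys become the columns, and each list supplies one value per row.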
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and
columns.
Pandas uses the loc attribute to return one or more specified rows.
import pandas as pd
data = {
    "Marks": [80, 75, 90],
    "Sub": ['Python', 'Java', 'Database']
}
# load data into a DataFrame object:
df = pd.DataFrame(data)
# return row 0:
print(df.loc[0])
# return rows 0 and 1 by passing a list of labels:
print(df.loc[[0, 1]])
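The two forms of loc shown above return different kinds of objects. As a minimal sketch using the same Marks/Sub data, a single label gives a pandas Series, while a list of labels gives a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Marks": [80, 75, 90],
    "Sub": ["Python", "Java", "Database"],
})

one_row = df.loc[0]        # single label: the row comes back as a Series
two_rows = df.loc[[0, 1]]  # list of labels: the rows come back as a DataFrame

print(type(one_row).__name__)
print(type(two_rows).__name__)
# individual values of the Series are addressed by column name
print(one_row["Sub"])
```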
Named Index
import pandas as pd
data = {
    "Marks": [80, 75, 90],
    "Sub": ['Python', 'Java', 'Database']
}
# load data into a DataFrame object;
# the index argument gives each row a name:
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)
Locate Named Indexes
Use the named index in the loc attribute to return the specified row(s).
Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])
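With a named index, loc accepts the same forms as with the default integer index. The following sketch, built on the day1/day2/day3 example, selects several named rows at once and a single cell by row and column label:

```python
import pandas as pd

data = {
    "Marks": [80, 75, 90],
    "Sub": ["Python", "Java", "Database"],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# several named rows at once
subset = df.loc[["day1", "day3"]]
print(subset)

# a single cell: row label first, then column label
mark = df.loc["day2", "Marks"]
print(mark)
```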
Load Files Into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
import pandas as pd
# check the maximum number of rows pandas prints before truncating the output:
print(pd.options.display.max_rows)
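The slide's data.csv is not included here, so the sketch below writes a small, hypothetical stand-in file to a temporary directory before loading it with read_csv. The column names and values are invented for illustration only.

```python
import os
import tempfile

import pandas as pd

# hypothetical contents standing in for the slide's 'data.csv'
csv_text = "Duration,Pulse\n60,110\n45,117\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w") as f:
        f.write(csv_text)
    # the first line of the file is used as the column headers
    df = pd.read_csv(path)

print(df)
# when a DataFrame has more rows than this setting, print(df) shows a truncated view
print(pd.options.display.max_rows)
```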
Read JSON
Big data sets are often stored or extracted as JSON.
JSON is plain text, but has the format of an object, and is well known in the world of
programming, including pandas.
In our examples we will be using a JSON file called 'data.json'.
Use to_string() to print the entire DataFrame.
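Since the data.json file itself is not part of the slides, the following sketch creates a small, hypothetical JSON file in a temporary directory, loads it with read_json, and prints it with to_string(). The values are made up; only the column-oriented layout follows the slides.

```python
import json
import os
import tempfile

import pandas as pd

# hypothetical records in a column-oriented layout: {column: {row label: value}}
records = {
    "Duration": {"0": 60, "1": 45},
    "Pulse": {"0": 110, "1": 117},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w") as f:
        json.dump(records, f)
    df = pd.read_json(path)

# to_string() renders every row and column, without truncation
print(df.to_string())
```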
import pandas as pd
# a Python dictionary with the same structure as a JSON object:
# the outer keys become columns, the inner keys become row labels
data = {
    "Duration": {"0": 60, "1": 60, "2": 60, "3": 45, "4": 45, "5": 60},
    "Pulse": {"0": 110, "1": 117, "2": 103, "3": 109, "4": 117, "5": 102},
    "Maxpulse": {"0": 130, "1": 145, "2": 135, "3": 175, "4": 148, "5": 127},
    "Calories": {"0": 409, "1": 479, "2": 340, "3": 282, "4": 406, "5": 300}
}
df = pd.DataFrame(data)
print(df)
Viewing the Data
• One of the most used methods for getting a quick overview of a
DataFrame is the head() method.
• The head() method returns the headers and a specified number of rows,
starting from the top.
import pandas as pd
df = pd.read_csv('data.csv')
# print the first 10 rows of the DataFrame:
print(df.head(10))
# print the first 5 rows of the DataFrame (the default when no number is given):
print(df.head())
• There is also a tail() method for viewing the last rows of the
DataFrame.
• The tail() method returns the headers and a specified number of
rows, starting from the bottom.
Example
Print the last 5 rows of the DataFrame:
print(df.tail())
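The behaviour of head() and tail() can be checked without any file. The sketch below uses a small generated DataFrame (the column name n is an arbitrary choice for illustration) instead of data.csv:

```python
import pandas as pd

# 20 rows numbered 0..19, standing in for a larger data set
df = pd.DataFrame({"n": range(20)})

first_five = df.head()     # no argument: the first 5 rows
first_ten = df.head(10)    # the first 10 rows
last_three = df.tail(3)    # the last 3 rows

print(first_five)
print(last_three)
```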
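import pandas as pd
# making a DataFrame from a CSV file, using the "Name" column as the index
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving a row by integer position with iloc;
# positions start at 0, so iloc[3] is the fourth row
row = data.iloc[3]
print(row)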
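The difference between iloc (position-based) and loc (label-based) is easier to see on a small table. The sketch below does not read nba.csv; it builds a tiny, hypothetical DataFrame indexed by a Name column, with invented player names and numbers.

```python
import pandas as pd

# hypothetical stand-in for the nba.csv data, indexed by "Name"
data = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Number": [4, 11, 23, 7],
}).set_index("Name")

# iloc selects by integer position (0-based), ignoring the index labels
by_position = data.iloc[3]
# loc selects by index label
by_label = data.loc["Player D"]

print(by_position)
print(by_label)
```

Both selections return the same row here, because "Player D" is the label of the row at position 3.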
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (np.nan marks a missing value);
# named data rather than dict to avoid shadowing the built-in dict type
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
# creating a dataframe from dictionary
df = pd.DataFrame(data)
print(df)
• Dropping missing values using dropna():
• To drop null values from a DataFrame, we use the dropna() function. This function drops rows or columns that contain null
values, in different ways depending on its arguments.
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
# creating a dataframe from dictionary
df = pd.DataFrame(data)
print(df)
• Now we drop rows with at least one NaN value (null value).
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
# creating a dataframe from dictionary
df = pd.DataFrame(data)
# using dropna() function; it returns a new DataFrame and leaves df unchanged
print(df.dropna())
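The "different ways" in which dropna() can drop rows or columns are controlled by its axis, how and thresh parameters. As a sketch on the same score data, the variants below show the effect of each; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, np.nan, 45, 56],
    "Third Score": [52, 40, 80, 98],
    "Fourth Score": [np.nan, np.nan, np.nan, 65],
}
df = pd.DataFrame(data)

# default: drop every row that has at least one NaN
rows_complete = df.dropna()
# axis=1: drop every column that has at least one NaN
cols_complete = df.dropna(axis=1)
# how="all": drop a row only if all of its values are NaN
rows_not_empty = df.dropna(how="all")
# thresh=3: keep only rows with at least 3 non-NaN values
rows_min3 = df.dropna(thresh=3)

print(rows_complete)
print(cols_complete)
print(rows_not_empty)
print(rows_min3)
```

With this data, only the last row and only the "Third Score" column are free of missing values, so the first two calls keep just those.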
• Iterating over rows and columns
• Iteration is a general term for taking each item of something, one
after another. A pandas DataFrame consists of rows and columns, and
iterating over a DataFrame directly yields its column labels, much like
iterating over a dictionary yields its keys.
• Iterating over rows:
• To iterate over rows, we can use two functions, iterrows() and
itertuples(). A third function, items(), iterates over columns instead; it was
called iteritems() in older pandas versions, and that name was removed in pandas 2.0.
# importing pandas as pd
import pandas as pd
# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data)
print(df)
Now we apply the iterrows() function to get each row in turn. It yields an (index, row) pair, where the row is a Series.
# importing pandas as pd
import pandas as pd
# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data)
# iterating over rows using iterrows() function
for index, row in df.iterrows():
    print(index, row)
    print()
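The slides demonstrate only iterrows(). As a complementary sketch on the same name/degree/score data, the example below shows itertuples() for rows and items() for columns; both are standard pandas methods, and the list names are illustrative.

```python
import pandas as pd

data = {
    "name": ["aparna", "pankaj", "sudhir", "Geeku"],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
}
df = pd.DataFrame(data)

# itertuples() yields one named tuple per row; columns become attributes
for row in df.itertuples():
    print(row.Index, row.name, row.degree, row.score)

# index=False leaves the index out of each tuple
names = [row.name for row in df.itertuples(index=False)]

# items() yields one (column label, column Series) pair per column
column_labels = []
for label, column in df.items():
    column_labels.append(label)
    print(label, list(column))
```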
Data Frame Data structure in Python pandas.pptx
