I have been looking for a way to convert an Excel file with multiple header rows into columns using the pandas library.
I have successfully imported the data into a dataframe by reading the file with ExcelFile and parsing it. I have also been able to identify the headers using header=[0, 4]. Where I run into trouble is reindexing and/or using the melt function to convert the headers into columns.
When I use the melt function, I am able to convert the columns into rows. However, I want each header to end up in its own column rather than being stacked with the rest of the data.
Currently, this is how the data is structured:
Excel file displaying data with multiple headers
After the conversion, the data should look like this:
Data that is unpivot with headers converted into columns
I have been reading about indexing, but I am not sure I understand how it would apply here.
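To make the goal concrete, here is a minimal sketch of the reshape described above. It uses made-up data instead of help.xlsx: the fruit names, the level names "Fruit" and "Type", and all of the numbers are assumptions for illustration only. It also assumes the two header rows have already been read as a two-level column index with PW as the row index. With that layout, melt seems to turn each header level into its own column:

```python
import pandas as pd

# Hypothetical stand-in for the sheet: two header levels above the values.
# The level names and the data are invented for this example.
columns = pd.MultiIndex.from_tuples(
    [("Apple", "Conventional"), ("Apple", "Organic"),
     ("Pear", "Conventional"), ("Pear", "Organic")],
    names=["Fruit", "Type"],
)
wide = pd.DataFrame(
    [[10, 11, 12, 13],
     [20, 21, 22, 23]],
    index=pd.Index([1, 2], name="PW"),
    columns=columns,
)

# ignore_index=False keeps PW while melting (needs pandas 1.1 or newer);
# each header level becomes a separate column named after that level.
tidy = wide.melt(ignore_index=False, value_name="Value").reset_index()

print(tidy)
```

The resulting frame has the columns PW, Fruit, Type and Value, with one row per PW and header combination. Whether this carries over to the real file depends on how the header rows are actually laid out in the sheet.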
I'm very new to Python, and any support or direction is greatly appreciated. I have been reading the following cheat sheets but haven't found the right way to do the conversion:
https://www.datacamp.com/community/data-science-cheatsheets
Here is my sample code:
import pandas as pd
xl = pd.ExcelFile('help.xlsx')
df1 = xl.parse('Sheet1')
df2 = pd.melt(df1,
              id_vars=['PW'],
              value_vars=['Fruit', 'Conventional'])
I am also adding the results after running the code. This is df1, the data with multiple headers:
The following shows the problem with the result (the headers are not converted into columns but are stacked with the rest of the data):
This is how the final product should look: