I have a DataFrame that looks like this:
id| products| quantity| price| time
101| prod_a, prod_b| 2, 1| 10,5| 2020-01-01, 2020-01-03
102| prod_a| 2| 5| 2020-01-03
103| prod_c| 21| 32| 2020-03-02
Each column stores the values of one metric, comma-separated when a row has more than one product. I am trying to split them so that I get the following structure:
id, products, quantity, price, time
101, prod_a, 2, 10, 2020-01-01
101, prod_b, 1, 5, 2020-01-03
102, prod_a, 2, 5, 2020-01-03
103, prod_c, 21, 32, 2020-03-02
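On pandas 1.3 or newer, DataFrame.explode accepts a list of columns and explodes them in lockstep, which produces this structure directly. A minimal sketch, assuming every metric column has the same number of comma-separated items in a given row (explode raises a ValueError otherwise). The astype(str) call is a precaution in case single values are stored as numbers, and the final int/datetime conversion is an assumed target dtype, not something the question asks for:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103],
    'products': ['prod_a, prod_b', 'prod_a', 'prod_c'],
    'quantity': ['2, 1', '2', '21'],
    'price': ['10,5', '5', '32'],
    'time': ['2020-01-01, 2020-01-03', '2020-01-03', '2020-03-02'],
})

cols = ['products', 'quantity', 'price', 'time']

# Turn every metric cell into a list of stripped strings; astype(str)
# guards against cells that hold a plain number instead of a string.
split = df.assign(**{
    c: df[c].astype(str).str.split(',').apply(lambda xs: [x.strip() for x in xs])
    for c in cols
})

# Explode all list columns together (multi-column explode needs
# pandas >= 1.3) and renumber the resulting rows 0..n-1.
out = split.explode(cols, ignore_index=True)

# Assumed target dtypes: integers for quantity/price, datetimes for time.
out['quantity'] = pd.to_numeric(out['quantity'])
out['price'] = pd.to_numeric(out['price'])
out['time'] = pd.to_datetime(out['time'])
print(out)
```

Because the lists are exploded together, the n-th product stays paired with the n-th quantity, price and time, which is the position-wise pairing the desired output shows.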
I tried converting each column to a pandas Series as below and then concatenating them into a master DataFrame:
import pandas as pd

# Split each comma-separated column into one value per row; dropping the
# stacked level leaves the original row label repeated on every piece.
s1 = df.products.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s2 = df.quantity.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s3 = df.price.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s4 = df.time.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
df = pd.concat([s1, s2, s3, s4], axis=1, keys=['products', 'quantity', 'price', 'time']).reset_index()
However, this throws the following error:
ValueError: cannot reindex from a duplicate axis
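One likely cause, offered as a hypothesis rather than a confirmed diagnosis: pd.concat(..., axis=1) aligns the series on their index, and after stack() plus reset_index(level=1, drop=True) every series carries duplicate labels (row 0 appears twice). That only works while all four indexes are identical. If a column holds a non-string cell (for example a quantity stored as the integer 2), .str.split returns NaN for it, stack() drops that row, the indexes stop matching, and pandas has to reindex an axis with duplicate labels, which raises this ValueError. The sketch below uses hypothetical integer cells to show the NaN, reproduces the concat failure with two small series, and then applies a minimal fix that keeps the series-and-concat approach: cast each column to str and use Series.explode (pandas >= 0.25) instead of expand=True plus stack().

```python
import pandas as pd

# Hypothetical input: the single-value quantity/price cells are ints,
# not strings (as can happen when the data comes from Excel).
df = pd.DataFrame({
    'id': [101, 102, 103],
    'products': ['prod_a, prod_b', 'prod_a', 'prod_c'],
    'quantity': ['2, 1', 2, 21],
    'price': ['10,5', 5, 32],
    'time': ['2020-01-01, 2020-01-03', '2020-01-03', '2020-03-02'],
})

# .str methods return NaN for non-string cells, so rows 1 and 2 lose
# their quantity; stack() would then drop them and shrink this index.
split_qty = df['quantity'].str.split(',')
print(split_qty)

# Two series whose duplicate index labels differ cannot be aligned by concat.
a = pd.Series(['prod_a', 'prod_b', 'prod_a'], index=[0, 0, 1])
b = pd.Series(['2', '1'], index=[0, 0])
try:
    pd.concat([a, b], axis=1)
    concat_failed = False
except ValueError as err:
    concat_failed = True
    print('ValueError:', err)

# Minimal fix: astype(str) so no cell turns into NaN, and explode() so
# every column carries exactly the same duplicated index (0, 0, 1, 2).
cols = ['products', 'quantity', 'price', 'time']
parts = [df[c].astype(str).str.split(',').explode().str.strip() for c in cols]
out = pd.concat(parts, axis=1, keys=cols)

# Repeat each id once per exploded piece, then renumber the rows.
out.insert(0, 'id', df['id'].reindex(out.index).to_numpy())
out = out.reset_index(drop=True)
print(out)
```

The exact wording of the error differs between pandas versions, but in the versions considered here it is a ValueError. Another way to end up with mismatched indexes is a row whose columns contain different numbers of comma-separated items, so that is worth checking in the real data too.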
With pandas 1.0.3 it worked (except for the missing closing ' after time on the last line).
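Because the same code apparently runs on one setup and fails on another, a version-independent alternative is to do the split in plain Python and zip the pieces together. This is a sketch, not the asker's method; the hypothetical integer cells are there to show that str() handles them, and raising an error on mismatched item counts is an assumed policy (zip would otherwise silently truncate to the shortest list):

```python
import pandas as pd

# Hypothetical input: cells may be strings or plain ints.
df = pd.DataFrame({
    'id': [101, 102, 103],
    'products': ['prod_a, prod_b', 'prod_a', 'prod_c'],
    'quantity': ['2, 1', 2, 21],
    'price': ['10,5', 5, 32],
    'time': ['2020-01-01, 2020-01-03', '2020-01-03', '2020-03-02'],
})

cols = ['products', 'quantity', 'price', 'time']

rows = []
for rec in df.to_dict(orient='records'):
    # Split every metric cell into stripped items; str() copes with numbers.
    parts = [[x.strip() for x in str(rec[c]).split(',')] for c in cols]
    # zip() would silently drop extra items, so check the counts first.
    if len({len(p) for p in parts}) != 1:
        raise ValueError(f"id {rec['id']}: columns have different item counts")
    # Pair the n-th product with the n-th quantity, price and time.
    for items in zip(*parts):
        rows.append([rec['id'], *items])

out = pd.DataFrame(rows, columns=['id', *cols])
print(out)
```

This avoids index alignment entirely, at the cost of a Python-level loop, which is usually fine for moderately sized frames.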