1

I have a dataframe df as

df
       col1 act_id col2                                                                                                 
   --------------------
0  40;30;30   act1 A;B;C
1  25;50;25   act2 D;E;F
2     70;30   act3 G;H

I want to break each record in such a way that values in column col1 and col2 explode into multiple rows but such that the first value in col1 after splitting upon ';' corresponds to the first value in col2 after splitting upon ';'. So my desired_df should look like this:

desired_df
       col1 act_id col2                                                                                                 
       ---------------
    0  40   act1   A
    1  30   act1   B
    2  30   act1   C
    3  25   act2   D
    4  50   act2   E
    5  25   act2   F                                                                                                  
    6  70   act3   G                                                                              
    7  30   act3   H                                                                               

NOTE: this is not the same as Split (explode) pandas dataframe string entry to separate rows as here the exploding/splitting of one record is not just across one column but the need is to split or explode one row into multiple rows, in two columns simultaneously.

Any help is appreciated. Thanks

1
  • you can use unnesting(df,['col1','col2']) from here: stackoverflow.com/questions/53218931/… after splitting the cols by ; like df.col1=df.col1.str.split(';') and same for col2 Commented Jun 4, 2019 at 4:41

3 Answers 3

1

one way to do this

df2.set_index('act_id').apply(lambda x: pd.Series(x.col1.split(';'),x.col2.split(';')), axis=1).stack().dropna().reset_index()

df2.columns = ['col1','act_id','col2']

  col1 act_id col2
0  A    act1   40 
1  B    act1   30 
2  C    act1   30 
3  D    act2   25 
4  E    act2   50 
5  F    act2   25 
6  G    act3   70 
7  H    act3   30 
Sign up to request clarification or add additional context in comments.

4 Comments

this solution worked for this particular df, although when i tried running this on a bigger dataframe with about 1M rows, i got the error as ValueError: cannot reindex from a duplicate axis I don't understand because even the bigger original dataframe had just these columns.
try the updated..problem might be related to duplicated index values, if the update doesn't works then try breaking the code and running it step by step, will help u debug the problem (do post on which step u faced the error)
I tried to break it down step by step and I found that the error occurs at the set_index('act_id') method. I removed all other methods and still got the error as ValueError: cannot reindex from a duplicate axis Is there any other way to get to the desired result...?
i still end up having the error ValueError: cannot reindex from a duplicate axis
0

the idea is col1 and col2 should be exploded and then merge on index and join back to the original dataframe.

df1 = df.col1.str.split(";").apply(pd.Series).stack().droplevel(1).reset_index()
df2 = df.col2.str.split(";").apply(pd.Series).stack().droplevel(1).reset_index()
df12 = pd.merge(df1, df2[0], left_index=True, right_index=True)
df12.columns = ["index", "col1", "col2"]

pd.merge(df12, df["act_id"], left_on="index", right_index=True)

2 Comments

AttributeError: 'Series' object has no attribute 'droplevel1'
droplevel(1).
0

A generic function could be:

list_cols = {'col1','col2'}
other_cols = list(set(df.columns) - set(list_cols))
exploded = [df[col].explode() for col in list_cols]
desired_df = pd.DataFrame(dict(zip(list_cols, exploded)))
desired_df = df[other_cols].merge(desired_df, how="right", left_index=True, right_index=True)

Please strsplit the col 1 and 2 before calling above function

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.