
Starting with:

import pandas as pd

lis1 = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
lis2 = [['john'], ['stacy'], ['ron']]

df = pd.DataFrame({'fruits': lis1, 'users': lis2})
df

                         fruits    users
0                      [apples]   [john]
1  [bananas, oranges, cinnamon]  [stacy]
2                [pears, juice]    [ron]

I'd like to end with:

lis3 = ['apples', 'bananas', 'oranges', 'cinnamon', 'pears', 'juice']
lis4 = ['john', 'stacy', 'stacy', 'stacy', 'ron', 'ron']

pd.DataFrame({'fruits': lis3, 'users': lis4})

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron

First, each item needs to sit in its own row of a new DataFrame. Second, the user name needs to repeat once for each of that user's fruits. Looking at the example, John has one fruit while Stacy has three, so Stacy has to be repeated three times in the users column.
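For comparison, pandas 0.25 and later provide `DataFrame.explode`, which none of the answers below use. A minimal sketch, assuming the DataFrame is built as above: exploding `users` first turns each one-element list into a plain string, and exploding `fruits` afterwards repeats that string on every fruit row.

```python
import pandas as pd

lis1 = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
lis2 = [['john'], ['stacy'], ['ron']]
df = pd.DataFrame({'fruits': lis1, 'users': lis2})

# Explode 'users' first: each one-element list becomes a scalar.
# Then explode 'fruits': each fruit gets its own row, and the scalar
# user value is copied onto every one of those rows.
result = (
    df.explode('users')
      .explode('fruits')
      .reset_index(drop=True)  # discard the repeated original row labels
)
print(result)
```

If a row ever held more than one user, this would produce every fruit/user pair in that row, the same as the `itertools.product` answer below.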


3 Answers


itertools

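from itertools import chain, product, starmap

# df is the DataFrame from the question.
# For each row, product() pairs every fruit with every user;
# chain(*...) flattens the per-row results into one list of (fruit, user) tuples.
pd.DataFrame(
    [*chain(*starmap(product, zip(df.fruits, df.users)))],
    columns=df.columns
)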

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
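To see what each piece of that one-liner contributes, here is a sketch that unpacks the intermediate values on plain lists (the names `rows`, `per_row` and `flat` are just illustrative):

```python
from itertools import chain, product, starmap

fruits = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
users = [['john'], ['stacy'], ['ron']]

# zip pairs each row's fruit list with that row's user list
rows = list(zip(fruits, users))

# starmap calls product(fruit_list, user_list) once per row,
# giving one iterator of (fruit, user) tuples per row
per_row = [list(p) for p in starmap(product, rows)]
print(per_row[1])

# chain(*...) concatenates the per-row iterators into one flat sequence
flat = [*chain(*starmap(product, rows))]
print(flat)
```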

This variant is more general: it zips every column instead of naming fruits and users, so it also works with more than two list columns.

pd.DataFrame(
    [*chain(*starmap(product, zip(*map(df.get, df))))],
    columns=df.columns
)
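A quick check of the general form, using a hypothetical third list column `days` that is not part of the question. Note that when a row has more than one multi-element list, `product` yields their Cartesian product rather than pairing the lists element by element.

```python
from itertools import chain, product, starmap

import pandas as pd

# 'days' is a hypothetical extra column, added only to exercise the general form
df3 = pd.DataFrame({
    'fruits': [['apples'], ['pears', 'juice']],
    'users': [['john'], ['ron']],
    'days': [['mon'], ['tue', 'wed']],
})

# Iterating a DataFrame yields its column labels, so map(df3.get, df3) yields
# the columns; zip(*...) regroups them row by row, and product(*row)
# crosses every list in that row.
out = pd.DataFrame(
    [*chain(*starmap(product, zip(*map(df3.get, df3))))],
    columns=df3.columns,
)
print(out)
```

The second row expands to four output rows (2 fruits x 1 user x 2 days), so `out` has five rows in total.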

generator

def f(z):
  for A, B in z:
    for a in A:
      for b in B:
        yield (a, b)

pd.DataFrame([*f(zip(df.fruits, df.users))], columns=df.columns)

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
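The inner double loop in `f` is exactly what `itertools.product` does for two iterables, which is why this answer and the `itertools` one agree. A small sketch comparing them (`f_product` is a hypothetical name for the rewritten version):

```python
from itertools import product

fruits = [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']]
users = [['john'], ['stacy'], ['ron']]


def f(z):
    # The answer's generator: explicit nested loops over each row's two lists
    for A, B in z:
        for a in A:
            for b in B:
                yield (a, b)


def f_product(z):
    # Hypothetical rewrite: same pairs, inner double loop delegated to product
    for A, B in z:
        yield from product(A, B)


same = list(f(zip(fruits, users))) == list(f_product(zip(fruits, users)))
print(same)  # True
```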

Assuming that lis1 and lis2 have the same number of elements, you can do this with a list comprehension after zipping the lists.

pd.DataFrame(
  [{'fruits': F, 'users': U} for (f, u) in zip(lis1, lis2) for F in f for U in u]
)

The code above produces the following output:

     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron

Comment:

This works only because I have access to lis1/lis2 in the example. For my dataset, I'm given a DataFrame with the columns "fruit" and "user", whose rows are populated with lists like the ones above. Would lis1 essentially be df['fruit']? That makes it a Series; does a Series work like a list here?
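Iterating over a Series yields its values in order, so `df['fruit']` and `df['user']` can be passed to `zip` wherever `lis1` and `lis2` were used. A sketch, assuming the column names "fruit" and "user" from the comment:

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']],
    'user': [['john'], ['stacy'], ['ron']],
})

# Each Series yields its cell values (the lists) when iterated,
# so zip pairs the lists row by row just as it did with lis1 and lis2
result = pd.DataFrame(
    [{'fruit': F, 'user': U}
     for f, u in zip(df['fruit'], df['user'])
     for F in f
     for U in u]
)
print(result)
```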

Here is a solution with lots of stacking and unstacking:

Starting with:

>>> df
                         fruits    users
0                      [apples]   [john]
1  [bananas, oranges, cinnamon]  [stacy]
2                [pears, juice]    [ron]

Use:

final = (df.stack().apply(pd.Series)
         .stack(0).unstack(1)
         .ffill()
         .reset_index(drop=True))

>>> final
     fruits  users
0    apples   john
1   bananas  stacy
2   oranges  stacy
3  cinnamon  stacy
4     pears    ron
5     juice    ron
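The chain can be unpacked step by step to see what each call does. The explicit `.dropna()` is not in the answer; it is added here as a safeguard, on the assumption that newer pandas releases move `stack` to an implementation that keeps missing values. Without it, the NaN padding would survive and be forward-filled into extra rows. Where `stack` already drops NaN, it changes nothing.

```python
import pandas as pd

df = pd.DataFrame({
    'fruits': [['apples'], ['bananas', 'oranges', 'cinnamon'], ['pears', 'juice']],
    'users': [['john'], ['stacy'], ['ron']],
})

# 1. stack: one entry per (row, column); each value is still a list
s = df.stack()
# 2. apply(pd.Series): spread each list across columns 0..2, padding with NaN
wide = s.apply(pd.Series)
# 3. stack(0): one value per (row, column, position);
#    dropna() removes the padding explicitly
stacked = wide.stack(0).dropna()
# 4. unstack(1): move 'fruits'/'users' back to columns, indexed by (row, position);
#    ffill copies each user down onto its remaining fruits
final = stacked.unstack(1).ffill().reset_index(drop=True)
print(final)
```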
