
I need to create two loops (they must be separate):

LOOP 1) for each fruit:

  1. keep rows if that fruit is True
  2. remove rows with duplicate dates (either row can be deleted)
  3. save the result of the above as a dataframe for each fruit

LOOP 2) for each dataframe created, plot fruit_score against date.

Sample data:

    concat  apple_score  banana_score  date        apple  banana
1   apple   0.400        0.400         2010-02-12  True   False
2   banana  0.530        0.300         2010-01-12  False  True
3   kiwi    0.532        0.200         2010-03-03  False  False
4   bana    0.634        0.100         2010-03-03  False  True

I tried:

fruits = ['apple',  'banana',   'orange']
for fruit in fruits:
    selected_rows = df[df[ fruit ] == True ]
    df_f'{fruit}' = selected_rows.drop_duplicates(subset='date')

for fruit in fruits:
    df_f'{fruit}'.plot(x="date", y=(f'{fruit}_score'), kind="line")
  • Are you trying to programmatically define the name of a variable? Are you expecting to get a variable called df_apple, for example? Commented Jul 24, 2020 at 9:07
  • You could use a dict instead of getting a variable name based on the for loop: stackoverflow.com/a/11553769/1735729 Commented Jul 24, 2020 at 9:09
  • Use a dict then: fruits_df = {}, and in your for loop use fruits_df[fruit] = ... Commented Jul 24, 2020 at 9:11
  • @Manakin I don't think that will work, because he has "bana" in concat while the banana column is set to True. Also, he wants to drop duplicate dates within the same fruit; the other approach would drop duplicates across all fruits that share a date. He's not looping over the dataframe, but over the fruits. Commented Jul 24, 2020 at 9:13
  • @Youyoun you can subset on more than one column; just add the fruits to .drop_duplicates. Nothing complex here, and no need to iterate over the list either. Commented Jul 24, 2020 at 9:17

1 Answer


You should do something along the lines suggested by @youyoun:

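# One dataframe per fruit, stored in a dict instead of dynamically named variables
dfs = {}
fruits = ['apple', 'banana']
for fruit in fruits:
    # Keep rows where this fruit's flag is True, then drop repeated dates
    selected_rows = df[df[fruit] == True].drop_duplicates(subset='date')
    dfs[f'df_{fruit}'] = selected_rows

# Print each key followed by its dataframe
for a, v in dfs.items():
    print(a)
    print(v)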

Output:

df_apple
  concat  apple_score  banana_score        date  apple  banana
1  apple          0.4           0.4  2010-02-12   True   False
df_banana
   concat  apple_score  banana_score        date  apple  banana
2  banana        0.530           0.3  2010-01-12  False    True
4    bana        0.634           0.1  2010-03-03  False    True
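
The snippet above covers the first loop only. A minimal sketch of both loops together follows, assuming the sample data from the question and that matplotlib is installed. The dataframe construction, the `axes` dict, and keying `dfs` by the bare fruit name (rather than `df_apple`) are choices made for this illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption about dtypes:
# the fruit columns are real booleans and date is parsed as datetime).
df = pd.DataFrame(
    {
        "concat": ["apple", "banana", "kiwi", "bana"],
        "apple_score": [0.400, 0.530, 0.532, 0.634],
        "banana_score": [0.400, 0.300, 0.200, 0.100],
        "date": pd.to_datetime(
            ["2010-02-12", "2010-01-12", "2010-03-03", "2010-03-03"]
        ),
        "apple": [True, False, False, False],
        "banana": [False, True, False, True],
    },
    index=[1, 2, 3, 4],
)

fruits = ["apple", "banana"]

# Loop 1: one de-duplicated dataframe per fruit, keyed by the fruit name.
dfs = {}
for fruit in fruits:
    dfs[fruit] = df[df[fruit]].drop_duplicates(subset="date")

# Loop 2: plot that fruit's score against date for each stored dataframe.
axes = {}
for fruit, fruit_df in dfs.items():
    axes[fruit] = fruit_df.sort_values("date").plot(
        x="date", y=f"{fruit}_score", kind="line", title=fruit
    )
```

Keying the dict by the fruit name lets the second loop build the score column name directly from the key. In an interactive session, drop the `matplotlib.use("Agg")` line and call `matplotlib.pyplot.show()` after the second loop to display the figures.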

1 Comment

Even simpler, you could do dfs = {fruit: data for fruit, data in df.groupby('fruit')} or something along those lines.
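
Note that the sample data has no single `fruit` column to group on, so the groupby form would need the data reshaped first. As a hedged alternative in the same spirit, a dict comprehension over the fruit list might look like the sketch below; the dataframe is rebuilt here from the question's table only to keep the example self-contained.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (illustrative only).
df = pd.DataFrame(
    {
        "concat": ["apple", "banana", "kiwi", "bana"],
        "apple_score": [0.400, 0.530, 0.532, 0.634],
        "banana_score": [0.400, 0.300, 0.200, 0.100],
        "date": ["2010-02-12", "2010-01-12", "2010-03-03", "2010-03-03"],
        "apple": [True, False, False, False],
        "banana": [False, True, False, True],
    },
    index=[1, 2, 3, 4],
)

fruits = ["apple", "banana"]

# Same filtering and de-duplication as the explicit loop, written as one expression.
dfs = {fruit: df[df[fruit]].drop_duplicates(subset="date") for fruit in fruits}

for name, frame in dfs.items():
    print(name)
    print(frame)
```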
