Efficient/Pythonic way to Filter pandas DataFrame based on priority

Question

I have below dataframe.

+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
|  Inv-001  |     A    |  2  |
+-----------+----------+-----+
|  Inv-001  |     B    |  3  |
+-----------+----------+-----+
|  Inv-001  |     C    |  1  |
+-----------+----------+-----+
|  Inv-002  |     B    |  3  |
+-----------+----------+-----+
|  Inv-002  |     D    |  4  |
+-----------+----------+-----+
|  Inv-003  |     C    |  3  |
+-----------+----------+-----+
|  Inv-003  |     D    |  9  |
+-----------+----------+-----+
|  Inv-004  |     D    |  5  |
+-----------+----------+-----+
|  Inv-004  |     E    |  8  |
+-----------+----------+-----+
|  Inv-005  |     X    |  2  |
+-----------+----------+-----+

my task is to create an additional column Type based on the priority of the item occurrence.

eg: ItemCode A has 1st Priority. then B has 2nd priority and C has 3rd priority. rest of the items has least priority and classified has Other.

So, if any Invoice contains item A, the type should be Type - A irrespective other items presence. from the balance Invoices if item B contains, then the type should be Type - B. same for C. if none of A, B or C is not present in any invoice, then the type should be Type - Other.

Below is my desired output.

+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty |     Type     |
+-----------+----------+-----+--------------+
|  Inv-001  |     A    |  2  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     B    |  3  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     C    |  1  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-002  |     B    |  3  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-002  |     D    |  4  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-003  |     C    |  3  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-003  |     D    |  9  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-004  |     D    |  5  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-004  |     E    |  8  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-005  |     X    |  2  | Type - Other |
+-----------+----------+-----+--------------+

Below is my code and it works. But, it is more cumbersome and not pythonic at all.

# load Dataframe
df = pd.read_excel() 

# filter data containing `A`
mask_A = (df['ItemCode'] == 'A').groupby(df['InvoiceNo']).transform('any')
df_A = df[mask_A]
df_A['Type'] = 'Type - A'

# form the rest of the data, filter data containing `B`
df = df[~mask_A]
mask_B = (df['ItemCode'] == 'B').groupby(df['InvoiceNo']).transform('any')
df_B = df[mask_B]
df_B['Type'] = 'Type - B'

# form the rest of the data, filter data containing `c`
df = df[~mask_B]
mask_C = (df['ItemCode'] == 'C').groupby(df['InvoiceNo']).transform('any')
df_C = df[mask_C]
df_C['Type'] = 'Type - C'

# form the rest of the data, filter data doesnt contain `A, B or C`
df_Other = df[~mask_C]
df_Other['Type'] = 'Type - Other'

# Conctenate all the dataframes
df = pd.concat([df_A, df_B, df_C, df_Other], axis=0,sort=False)

Now, what is the most efficient and pythonic way to do this?

In the future please do not use markdown tables, because they are not easily reproducible. Please see How to provide a reproducible copy of your DataFrame, then edit your question, and paste the clipboard into a code block. — Trenton McKinney
– Trenton McKinney, Commented Aug 7, 2020 at 23:13

BENY · Accepted Answer · 2020-08-08 00:20:37Z

2

I feel like we can do Categorical then transform

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)

df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('min').fillna('other')

Update

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)
df=df.sort_values('Type')
df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('first').fillna('other')
df=df.sort_index()

df
Out[32]: 
     InvoiceNo ItemCode  Qty        Type
0    Inv-001          A    2      Type_A
1    Inv-001          B    3      Type_A
2    Inv-001          C    1      Type_A
3    Inv-002          B    3      Type_B
4    Inv-002          D    4      Type_B
5    Inv-003          C    3      Type_C
6    Inv-003          D    9      Type_C
7    Inv-004          D    5  Type_other
8    Inv-004          E    8  Type_other
9    Inv-005          X    2  Type_other

edited Aug 8, 2020 at 0:20

answered Aug 7, 2020 at 23:15

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Efficient/Pythonic way to Filter pandas DataFrame based on priority

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related