I have below dataframe.
+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
| Inv-001 | A | 2 |
+-----------+----------+-----+
| Inv-001 | B | 3 |
+-----------+----------+-----+
| Inv-001 | C | 1 |
+-----------+----------+-----+
| Inv-002 | B | 3 |
+-----------+----------+-----+
| Inv-002 | D | 4 |
+-----------+----------+-----+
| Inv-003 | C | 3 |
+-----------+----------+-----+
| Inv-003 | D | 9 |
+-----------+----------+-----+
| Inv-004 | D | 5 |
+-----------+----------+-----+
| Inv-004 | E | 8 |
+-----------+----------+-----+
| Inv-005 | X | 2 |
+-----------+----------+-----+
my task is to create an additional column Type based on the priority of the item occurrence.
eg: ItemCode A has 1st Priority. then B has 2nd priority and C has 3rd priority. rest of the items has least priority and classified has Other.
So, if any Invoice contains item A, the type should be Type - A irrespective other items presence. from the balance Invoices if item B contains, then the type should be Type - B. same for C. if none of A, B or C is not present in any invoice, then the type should be Type - Other.
Below is my desired output.
+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty | Type |
+-----------+----------+-----+--------------+
| Inv-001 | A | 2 | Type - A |
+-----------+----------+-----+--------------+
| Inv-001 | B | 3 | Type - A |
+-----------+----------+-----+--------------+
| Inv-001 | C | 1 | Type - A |
+-----------+----------+-----+--------------+
| Inv-002 | B | 3 | Type - B |
+-----------+----------+-----+--------------+
| Inv-002 | D | 4 | Type - B |
+-----------+----------+-----+--------------+
| Inv-003 | C | 3 | Type - C |
+-----------+----------+-----+--------------+
| Inv-003 | D | 9 | Type - C |
+-----------+----------+-----+--------------+
| Inv-004 | D | 5 | Type - Other |
+-----------+----------+-----+--------------+
| Inv-004 | E | 8 | Type - Other |
+-----------+----------+-----+--------------+
| Inv-005 | X | 2 | Type - Other |
+-----------+----------+-----+--------------+
Below is my code and it works. But, it is more cumbersome and not pythonic at all.
# load Dataframe
df = pd.read_excel()
# filter data containing `A`
mask_A = (df['ItemCode'] == 'A').groupby(df['InvoiceNo']).transform('any')
df_A = df[mask_A]
df_A['Type'] = 'Type - A'
# form the rest of the data, filter data containing `B`
df = df[~mask_A]
mask_B = (df['ItemCode'] == 'B').groupby(df['InvoiceNo']).transform('any')
df_B = df[mask_B]
df_B['Type'] = 'Type - B'
# form the rest of the data, filter data containing `c`
df = df[~mask_B]
mask_C = (df['ItemCode'] == 'C').groupby(df['InvoiceNo']).transform('any')
df_C = df[mask_C]
df_C['Type'] = 'Type - C'
# form the rest of the data, filter data doesnt contain `A, B or C`
df_Other = df[~mask_C]
df_Other['Type'] = 'Type - Other'
# Conctenate all the dataframes
df = pd.concat([df_A, df_B, df_C, df_Other], axis=0,sort=False)
Now, what is the most efficient and pythonic way to do this?