1

I have below dataframe.

+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
|  Inv-001  |     A    |  2  |
+-----------+----------+-----+
|  Inv-001  |     B    |  3  |
+-----------+----------+-----+
|  Inv-001  |     C    |  1  |
+-----------+----------+-----+
|  Inv-002  |     B    |  3  |
+-----------+----------+-----+
|  Inv-002  |     D    |  4  |
+-----------+----------+-----+
|  Inv-003  |     C    |  3  |
+-----------+----------+-----+
|  Inv-003  |     D    |  9  |
+-----------+----------+-----+
|  Inv-004  |     D    |  5  |
+-----------+----------+-----+
|  Inv-004  |     E    |  8  |
+-----------+----------+-----+
|  Inv-005  |     X    |  2  |
+-----------+----------+-----+

my task is to create an additional column Type based on the priority of the item occurrence.

eg: ItemCode A has 1st Priority. then B has 2nd priority and C has 3rd priority. rest of the items has least priority and classified has Other.

So, if any Invoice contains item A, the type should be Type - A irrespective other items presence. from the balance Invoices if item B contains, then the type should be Type - B. same for C. if none of A, B or C is not present in any invoice, then the type should be Type - Other.

Below is my desired output.

+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty |     Type     |
+-----------+----------+-----+--------------+
|  Inv-001  |     A    |  2  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     B    |  3  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     C    |  1  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-002  |     B    |  3  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-002  |     D    |  4  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-003  |     C    |  3  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-003  |     D    |  9  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-004  |     D    |  5  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-004  |     E    |  8  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-005  |     X    |  2  | Type - Other |
+-----------+----------+-----+--------------+

Below is my code and it works. But, it is more cumbersome and not pythonic at all.

# load Dataframe
df = pd.read_excel() 

# filter data containing `A`
mask_A = (df['ItemCode'] == 'A').groupby(df['InvoiceNo']).transform('any')
df_A = df[mask_A]
df_A['Type'] = 'Type - A'

# form the rest of the data, filter data containing `B`
df = df[~mask_A]
mask_B = (df['ItemCode'] == 'B').groupby(df['InvoiceNo']).transform('any')
df_B = df[mask_B]
df_B['Type'] = 'Type - B'

# form the rest of the data, filter data containing `c`
df = df[~mask_B]
mask_C = (df['ItemCode'] == 'C').groupby(df['InvoiceNo']).transform('any')
df_C = df[mask_C]
df_C['Type'] = 'Type - C'

# form the rest of the data, filter data doesnt contain `A, B or C`
df_Other = df[~mask_C]
df_Other['Type'] = 'Type - Other'

# Conctenate all the dataframes
df = pd.concat([df_A, df_B, df_C, df_Other], axis=0,sort=False)

Now, what is the most efficient and pythonic way to do this?

2
  • 1
    In the future please do not use markdown tables, because they are not easily reproducible. Please see How to provide a reproducible copy of your DataFrame, then edit your question, and paste the clipboard into a code block. Commented Aug 7, 2020 at 23:13
  • Nevermind, I didn't use the invoice col Commented Aug 7, 2020 at 23:28

1 Answer 1

2

I feel like we can do Categorical then transform

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)

df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('min').fillna('other')

Update

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)
df=df.sort_values('Type')
df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('first').fillna('other')
df=df.sort_index()

df
Out[32]: 
     InvoiceNo ItemCode  Qty        Type
0    Inv-001          A    2      Type_A
1    Inv-001          B    3      Type_A
2    Inv-001          C    1      Type_A
3    Inv-002          B    3      Type_B
4    Inv-002          D    4      Type_B
5    Inv-003          C    3      Type_C
6    Inv-003          D    9      Type_C
7    Inv-004          D    5  Type_other
8    Inv-004          E    8  Type_other
9    Inv-005          X    2  Type_other
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.