Suppose I have a DataFrame such as:
   col1 col2
0     1    A
1     2    B
2     6    A
3     5    C
4     9    C
5     3    A
6     5    B
And multiple lists such as:
list_1 = [1, 2, 4]
list_2 = [3, 8]
list_3 = [5, 6, 7, 9]
I can update the value of col2 depending on whether the value of col1 is included in a list, for example:
for i in list_1:
    df.loc[df.col1 == i, 'col2'] = 'A'
for i in list_2:
    df.loc[df.col1 == i, 'col2'] = 'B'
for i in list_3:
    df.loc[df.col1 == i, 'col2'] = 'C'
However, this is very slow. With a DataFrame of 30,000 rows and each list containing approximately 5,000-10,000 items, it takes a long time to run, especially compared to other pandas operations. Is there a better (faster) way of doing this?
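A minimal sketch of a vectorized alternative, assuming `Series.isin` fits this use case: it builds one boolean mask per list rather than one per list element. The example data is rebuilt from the above so the snippet runs on its own, and the `(values, label)` pairing is only an illustrative way of organizing the lists.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "col1": [1, 2, 6, 5, 9, 3, 5],
    "col2": ["A", "B", "A", "C", "C", "A", "B"],
})

list_1 = [1, 2, 4]
list_2 = [3, 8]
list_3 = [5, 6, 7, 9]

# One membership test and one assignment per list,
# instead of one comparison and assignment per list element.
for values, label in [(list_1, "A"), (list_2, "B"), (list_3, "C")]:
    df.loc[df["col1"].isin(values), "col2"] = label

print(df["col2"].tolist())  # ['A', 'A', 'C', 'C', 'C', 'B', 'C']
```

Rows whose `col1` value appears in none of the lists keep their existing `col2` value, which matches the behaviour of the original loops.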