62

I have a dataframe like this:

RecID| A  |B
----------------
1    |a   | abc 
2    |b   | cba 
3    |c   | bca
4    |d   | bac 
5    |e   | abc

I want to create another column, C, out of A and B such that for the same row, if the string in column A is contained in the string of column B, then C = True and if not then C = False.

The example output I am looking for is this:

RecID| A  |B    |C 
--------------------
1    |a   | abc |True
2    |b   | cba |True
3    |c   | bca |True
4    |d   | bac |False
5    |e   | abc |False

Is there a way to do this in pandas quickly and without using a loop?


3 Answers

83

You need apply with the in operator, evaluated row by row (axis=1):

df['C'] = df.apply(lambda row: row.A in row.B, axis=1)
print(df)

   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False

Another solution, a list comprehension, is faster, but it only works if neither column contains NaN:

df['C'] = [a in b for a, b in zip(df['A'], df['B'])]
print(df)

   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False
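
As a rough way to check that the two approaches agree and to compare their speed on your own machine, something like the sketch below could be used. The helper names with_apply and with_listcomp and the repetition factor of 2000 are illustrative assumptions, not part of the answer above, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "RecID": [1, 2, 3, 4, 5],
    "A": ["a", "b", "c", "d", "e"],
    "B": ["abc", "cba", "bca", "bac", "abc"],
})


def with_apply(frame):
    # Row-wise substring test via DataFrame.apply.
    return frame.apply(lambda row: row.A in row.B, axis=1)


def with_listcomp(frame):
    # Same test via a list comprehension over the zipped columns.
    return pd.Series(
        [a in b for a, b in zip(frame["A"], frame["B"])],
        index=frame.index,
    )


result_apply = with_apply(df)
result_listcomp = with_listcomp(df)

# Repeat the frame to get a measurable workload (10,000 rows here).
big = pd.concat([df] * 2000, ignore_index=True)
t_apply = timeit.timeit(lambda: with_apply(big), number=1)
t_listcomp = timeit.timeit(lambda: with_listcomp(big), number=1)
print(f"apply: {t_apply:.4f}s, list comprehension: {t_listcomp:.4f}s")
```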

5 Comments

Wow, I wish I knew you could use apply in that way to use multiple columns for a calculation. I'm assuming this is way quicker than using data.iterrows()?
I think so, yes; it should be faster.
Timing test - I ran this on a dataframe with about 90k rows: lambda version took 9.4245 seconds, list comprehension took 0.0308 seconds. With the caveat of no NaN in the series per jezrael's note.
Sorry, I get an error: TypeError: argument of type 'float' is not iterable, as there are NaNs in df['B'], any ideas to deal with this?
Here's a version using apply that will handle NaNs. It returns False if either x.A or x.B is null, and checks x.A in x.B otherwise: df['C'] = df.apply(lambda x: False if pd.isnull(x.A) or pd.isnull(x.B) else x.A in x.B, axis=1)
8

If you are comparing strings and still get the TypeError, you can cast both values to str first:

df['C'] = df.apply(lambda x: str(x.A) in str(x.B), axis=1)

1 Comment

Note that str() turns a NaN in x.B into the string 'nan', so you will get false positives whenever x.A happens to be a substring of 'nan'.
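
To make that false positive concrete, here is a small sketch. The three-row frame and the names naive, mask and guarded are made up for illustration; the guard shown is just one possible way around the problem, not the answer author's method.

```python
import numpy as np
import pandas as pd

# Hypothetical data: B is missing in the last two rows.
df = pd.DataFrame({
    "A": ["a", "n", "b"],
    "B": ["abc", np.nan, np.nan],
})

# The str() cast avoids the TypeError, but str(np.nan) is 'nan',
# so 'n' in 'nan' evaluates to True for a row where B is missing.
naive = df.apply(lambda x: str(x.A) in str(x.B), axis=1)

# One possible guard: only test rows where both values are present.
mask = df["A"].notna() & df["B"].notna()
guarded = pd.Series(
    [bool(m) and str(a) in str(b) for a, b, m in zip(df["A"], df["B"], mask)],
    index=df.index,
)

print(naive.tolist())
print(guarded.tolist())
```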
7

I could not get either answer @jezrael provided to handle None values in the first column. A slight alteration to the list comprehension is able to handle them:

df['C'] = [a in b if a is not None else False for a, b in zip(df['A'], df['B'])]
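
If missing values can show up in either column, and as NaN rather than None, a variation along the same lines is to test that both values are actually strings before using in. This is a sketch under that assumption; the sample frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a None in A and a NaN in B.
df = pd.DataFrame({
    "A": ["a", None, "c", "d"],
    "B": ["abc", "cba", np.nan, "bac"],
})

# The isinstance checks short-circuit, so `a in b` only runs on two strings;
# any row with a missing (non-string) value yields False instead of raising.
df["C"] = [
    isinstance(a, str) and isinstance(b, str) and a in b
    for a, b in zip(df["A"], df["B"])
]

print(df)
```

The trade-off is that any non-string value, such as a number, is also treated as "not contained" rather than being cast to str.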
