Check if string in one column is contained in string of another column in the same row

Question

I have a dataframe like this:

RecID| A  |B
----------------
1    |a   | abc 
2    |b   | cba 
3    |c   | bca
4    |d   | bac 
5    |e   | abc

I want to create another column, C, out of A and B such that for the same row, if the string in column A is contained in the string of column B, then C = True and if not then C = False.

The example output I am looking for is this:

RecID| A  |B    |C 
--------------------
1    |a   | abc |True
2    |b   | cba |True
3    |c   | bca |True
4    |d   | bac |False
5    |e   | abc |False

Is there a way to do this in pandas quickly and without using a loop?

Possible duplicate: check element-wise for existence of string (it's newer, but the answers are more comprehensive)
– wjandrea, Commented Apr 24, 2024 at 19:58

wjandrea · Accepted Answer · 2024-04-24 19:36:01Z

83

You need apply with the in operator, evaluated row by row (axis=1):

df['C'] = df.apply(lambda row: row.A in row.B, axis=1)
print(df)

   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False

A list comprehension is faster, but it only works if neither column contains NaN:

df['C'] = [a in b for a, b in zip(df['A'], df['B'])]
print(df)

   RecID  A    B      C
0      1  a  abc   True
1      2  b  cba   True
2      3  c  bca   True
3      4  d  bac  False
4      5  e  abc  False
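
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and checks that both approaches give the same column. The names via_apply and via_listcomp are only illustrative, and the data is assumed to contain no missing values.

```python
import pandas as pd

# Sample data from the question (no missing values)
df = pd.DataFrame({
    "RecID": [1, 2, 3, 4, 5],
    "A": ["a", "b", "c", "d", "e"],
    "B": ["abc", "cba", "bca", "bac", "abc"],
})

# Row-wise apply: the lambda receives each row as a Series
via_apply = df.apply(lambda row: row.A in row.B, axis=1)

# List comprehension over the two columns paired up with zip
via_listcomp = pd.Series(
    [a in b for a, b in zip(df["A"], df["B"])],
    index=df.index,
)

df["C"] = via_listcomp
print(via_apply.equals(via_listcomp))
print(df)
```

The list comprehension avoids building a Series for every row, which is the likely reason it is so much faster on large frames.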

edited Apr 24, 2024 at 19:36

wjandrea

answered May 8, 2017 at 19:22

jezrael

5 Comments

bbennett36 Over a year ago

Wow, I wish I knew you could use apply in that way to use multiple columns for a calculation. I'm assuming this is way quicker than using data.iterrows()?

jezrael Over a year ago

I think so, yes; it should be faster.

elPastor Over a year ago

Timing test - I ran this on a dataframe with about 90k rows: lambda version took 9.4245 seconds, list comprehension took 0.0308 seconds. With the caveat of no NaN in the series per jezrael's note.

ah bon Over a year ago

Sorry, I get an error: TypeError: argument of type 'float' is not iterable, as there are NaNs in df['B'], any ideas to deal with this?

Bonnie Over a year ago

Here's a version using apply that will handle NaNs in column A. It checks whether x.A is null, returns False if so, and otherwise checks for x.A in x.B: df['C'] = df.apply(lambda x: False if pd.isnull(x.A) else x.A in x.B, axis=1)

KubaAdam · Accepted Answer · 2021-02-22 12:52:36Z

8

If you are comparing strings and still get a TypeError, you can cast both values to str first:

df['C'] = df.apply(lambda x: str(x.A) in str(x.B), axis=1)

edited Feb 22, 2021 at 12:52

answered Feb 18, 2021 at 14:21

KubaAdam

1 Comment

Bonnie Over a year ago

Note that str() turns NaN into the text 'nan', so if x.B contains the string 'nan' you will get some false positives here.
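
To make that caveat concrete, here is a small sketch with made-up data (the values 'banana' and 'na' are chosen only to trigger the problem). It compares the str() cast with a variant that treats any missing value as "no match". The names naive and safe are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data: one clean row, one NaN in A, one NaN in B
df = pd.DataFrame({
    "A": ["a", np.nan, "na"],
    "B": ["banana", "banana", np.nan],
})

# str(NaN) is the text 'nan', so missing values take part in the match:
# 'nan' is found inside 'banana', and 'na' is found inside 'nan'
naive = df.apply(lambda x: str(x.A) in str(x.B), axis=1)

# Variant that returns False as soon as either value is missing
safe = df.apply(
    lambda x: False if pd.isnull(x.A) or pd.isnull(x.B) else str(x.A) in str(x.B),
    axis=1,
)

print(naive.tolist())  # [True, True, True]
print(safe.tolist())   # [True, False, False]
```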

David Clarke · Accepted Answer · 2023-07-27 22:44:32Z

7

I could not get either of @jezrael's solutions to handle None values in the first column. A slightly altered list comprehension handles them:

df['C'] = [a in b if a is not None else False for a, b in zip(df['A'], df['B'])]
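
One caveat: `a is not None` does not catch NaN (which is a float, not None), and it does not guard column B at all. A possible extension, sketched here under the assumption that any non-string value should simply count as "not contained", is to test the types directly. The helper name is_substring is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with a None in A and a NaN in B
df = pd.DataFrame({
    "A": ["a", None, "c", "d"],
    "B": ["abc", "cba", np.nan, "bac"],
})

def is_substring(a, b):
    """Return True only when both values are strings and a occurs in b."""
    return isinstance(a, str) and isinstance(b, str) and a in b

# Same zip-based list comprehension, but safe for missing values in either column
df["C"] = [is_substring(a, b) for a, b in zip(df["A"], df["B"])]
print(df["C"].tolist())  # [True, False, False, False]
```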

edited Jul 27, 2023 at 22:44

David Clarke

answered Sep 18, 2019 at 22:24

Doubledown
