Find index of substring within string from a dataframe

Question

I have a dataframe with two columns (and a lot of rows): one column contains the full sequence, the other contains a sub-sequence.

I want to find the index where the sub-sequence starts within the full sequence and add it as another column.

I have tried this:

df["start"] = df.sequence.index(df.sub_sequence)

But this raises: TypeError: 'RangeIndex' object is not callable

What am I doing wrong?

Here is the dataframe I have and the dataframe I wish to end up with:

Sample dataframe:

import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no

Expected result:

data2 = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2, 1, 3]}
df2 = pd.DataFrame(data2, columns=["sequence", "sub_sequence", "start"])

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3

Shubham Sharma · Accepted Answer · 2020-07-13 13:46:46Z


Use zip and str.index in a list comprehension:

df['start'] = [seq.index(sub) for seq, sub in zip(df['sequence'], df['sub_sequence'])]

Alternatively, use DataFrame.apply along axis=1 with str.index:

df['start'] = df[['sequence', 'sub_sequence']].apply(lambda s: str.index(*s), axis=1)

Result:

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
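One caveat: str.index raises ValueError when the substring is not present, so either approach above will fail on a row with no match. As a minimal sketch, assuming you would rather record -1 for such rows, str.find can be swapped in. The "xyz" value below is a made-up non-matching example and is not part of the original data:

```python
import pandas as pd

# Same layout as the question's frame; "xyz" is a hypothetical
# sub-sequence that does not occur in its sequence.
df = pd.DataFrame({
    "sequence": ["abcde", "fghij", "klmno"],
    "sub_sequence": ["cde", "gh", "xyz"],
})

# str.find returns -1 when the substring is absent,
# whereas str.index would raise ValueError on the last row.
df["start"] = [
    seq.find(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])
]

print(df)
```

The original attempt fails for a different reason: df.sequence.index is the Series' index attribute (a RangeIndex here), not the string method, so calling it raises the TypeError shown in the question.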

