Vector function on Pandas Dataframe [duplicate]

Question

I want to calculate the frequency of a word in a sentence. My DataFrame has a "Title" column that contains a sentence (string) in each row. This is my current approach:

# num times queryWord is in sentence / num words in sentence
list = df['Title'].str.count(queryWord) / len(df['Title'].str.split())

However, len(df['Title'].str.split()) returns the number of rows in the "Title" column rather than the length of the list that split() produces in each row. How do I fix this?

tobsecret · Accepted Answer · 2018-06-26 19:29:20Z

This should do the trick:

list = df['Title'].str.count(queryWord) / df['Title'].str.split().str.len()

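df['Title'].str.split() returns a pd.Series of list objects, so .str.len() gives the length of each list, whereas the built-in len() gives the number of rows in the Series. That is also why this question was marked as a duplicate.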
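To make the difference between len() and .str.len() concrete, here is a minimal sketch. The sample titles and the names query_word, words, and freq are made up for illustration; only the "Title" column comes from the question. The result is stored in freq rather than list so that the built-in list is not shadowed.

```python
import pandas as pd

# Hypothetical sample data; only the "Title" column name comes from the question.
df = pd.DataFrame({"Title": ["the cat sat on the mat", "a dog", "the end"]})
query_word = "the"  # stands in for queryWord

words = df["Title"].str.split()    # Series of lists, one list per row

n_rows = len(words)                # number of rows in the Series
words_per_row = words.str.len()    # length of each list, row by row

# Occurrences of the pattern divided by the word count, element-wise
freq = df["Title"].str.count(query_word) / words_per_row
print(freq)
```

One caveat worth checking against your data: str.count treats its argument as a regular expression and counts every match, including matches inside longer words, so "the" would also be counted in "then".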


2 Comments

Luciano Over a year ago

Thanks, that did it. What is the meaning of .str?

tobsecret Over a year ago

Glad it worked; please accept the answer. In pandas, a Series exposes string methods through the .str accessor, as in .str.method_name. They are accessed that way because some of them share a name with another Series method that does something different. For example, pd.Series.str.replace does not work the same way as pd.Series.replace, and pd.Series.str.get does not work the same way as pd.Series.get.
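A small sketch of the two name clashes mentioned in that comment, using a made-up Series (the values and variable names are illustrative only):

```python
import pandas as pd

s = pd.Series(["red apple", "apple", "green"])

# Series.replace swaps whole values that are equal to "apple"
whole = s.replace("apple", "pear")

# Series.str.replace substitutes inside each string
inside = s.str.replace("apple", "pear", regex=False)

# Series.get looks up a value by index label
by_label = s.get(0)

# Series.str.get indexes into each element (here, the first character)
first_chars = s.str.get(0)

print(whole.tolist())
print(inside.tolist())
print(by_label)
print(first_chars.tolist())
```

The plain methods operate on the Series as a container of values, while the .str versions operate inside each string element.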
