53

I have the following code:

import pandas as pd
private = pd.read_excel("file.xlsx","Pri")
public = pd.read_excel("file.xlsx","Pub")
private["ISH"] = private.HolidayName.str.lower().contains("holiday|recess")
public["ISH"] = public.HolidayName.str.lower().contains("holiday|recess")

I get the following error:

AttributeError: 'Series' object has no attribute 'contains'

Is there any way to convert the 'HolidayName' column to lower case and then check the regular expression ("Holiday|Recess") using .contains in one step?

2
  • 1
    If you convert the terms to lowercase, they'll never contain uppercase letters like H or R. Commented Apr 7, 2014 at 10:08
  • Thank you for pointing that out. It was an oversight when I was typing out my example. Have fixed it. Commented Apr 7, 2014 at 10:09

2 Answers

99
private["ISH"] = private.HolidayName.str.contains("(?i)holiday|recess")

The (?i) in the regex pattern tells the re module to ignore case.
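To see the inline flag on its own, outside pandas, here is a minimal sketch that uses only the standard re module. The sample strings are made up for illustration and are not taken from the original spreadsheet.

```python
import re

# Hypothetical sample values, not from the asker's file.
names = ["Summer HOLIDAY", "Winter Recess", "Regular day"]

# (?i) at the very start of the pattern turns on case-insensitive
# matching for the whole pattern, equivalent to passing re.IGNORECASE.
pattern = re.compile(r"(?i)holiday|recess")

# search() finds the pattern anywhere in the string, which mirrors
# what str.contains does for each element of a Series.
flags = [bool(pattern.search(name)) for name in names]
print(flags)  # [True, True, False]
```

Note that the flag should sit at the beginning of the pattern; recent Python versions reject a global inline flag placed elsewhere.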


You were getting the error because a Series object has no contains method; the method lives on the Series.str accessor instead. Since str.lower() returns a plain Series, you have to go through .str again before calling contains. So you could avoid the error with:

private["ISH"] = private.HolidayName.str.lower().str.contains("holiday|recess")
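As a rough check that both spellings agree, the sketch below runs them on a small in-memory DataFrame. The DataFrame is a hypothetical stand-in for the "Pri" sheet of file.xlsx, so the values are assumptions, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the "Pri" sheet; the real file is not available.
private = pd.DataFrame(
    {"HolidayName": ["Summer HOLIDAY", "winter recess", "Workday"]}
)

# Approach 1: case-insensitive flag inside the regex.
via_flag = private.HolidayName.str.contains("(?i)holiday|recess")

# Approach 2: lower-case first, then go back through .str,
# because str.lower() returns a Series, not a string accessor.
via_lower = private.HolidayName.str.lower().str.contains("holiday|recess")

# Both approaches should flag exactly the same rows.
same = via_flag.equals(via_lower)

private["ISH"] = via_flag
print(private)
print("identical results:", same)
```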

4 Comments

nice! is this in the docs somewhere? is this a pandas-specific thing, or something else? guessing something else.
@grisaitis: The vectorized string methods are documented here. The use of (?i) to do case-insensitive pattern matching is part of the Python re module's regular expression syntax. (Search for the string (?iLmsux)).
Sad that the best answer to this problem involves deep knowledge of RE. N00bs (such as myself) are looking to chain the str operations.
Just to add: if you want an exact match, then private[private['HolidayName'].str.lower() == "holiday"] should work.
24

I'm a bit late to the party, but you could use the keyword argument case of str.contains (bool, default True; if True, the match is case sensitive) and set it to False:

private["ISH"] = private.HolidayName.str.contains("holiday|recess", case=False)
public["ISH"] = public.HolidayName.str.contains("holiday|recess", case=False)
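One thing worth knowing when reading from a spreadsheet is that empty cells come through as missing values. The sketch below, on a made-up DataFrame standing in for the "Pub" sheet, combines case=False with the na parameter of str.contains so that missing names count as non-matches. Whether that is the right treatment for the real data is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the "Pub" sheet, including one empty cell.
public = pd.DataFrame(
    {"HolidayName": ["Public Holiday", None, "Spring RECESS", "Workday"]}
)

# case=False ignores letter case; na=False makes missing names
# evaluate to False instead of propagating a missing value.
public["ISH"] = public.HolidayName.str.contains(
    "holiday|recess", case=False, na=False
)
print(public)
```

Without na=False the missing row would stay missing in the result, which can cause trouble if the column is later used as a boolean mask.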

1 Comment

FYI: I checked all 3 methods (chaining str, using regex, using case=False). Regex turned out to be the fastest; the case=False method took about 1.25 times as long, and chaining str about 1.06 times as long.
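A comparison like this can be reproduced with the standard timeit module. The sketch below is only one way to set it up: the synthetic Series, its size, and the repeat counts are all assumptions, and the ratios will vary with the pandas version, the data, and the machine.

```python
import timeit

import pandas as pd

# Hypothetical data: 30,000 synthetic names, not the commenter's dataset.
names = pd.Series(["Summer HOLIDAY", "winter recess", "Workday"] * 10_000)

# The three approaches discussed in this thread.
candidates = {
    "regex (?i)": lambda: names.str.contains("(?i)holiday|recess"),
    "case=False": lambda: names.str.contains("holiday|recess", case=False),
    "chained str": lambda: names.str.lower().str.contains("holiday|recess"),
}

# Take the best of 3 repeats of 5 calls each to reduce timing noise.
timings = {
    label: min(timeit.repeat(func, number=5, repeat=3))
    for label, func in candidates.items()
}

# Print the approaches from fastest to slowest.
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{label}: {seconds:.4f} s for 5 calls")
```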
