I am trying to remove rows from a large data frame based on whether each row has certain values in either of two different columns.

I will have a Series called "finalists". It will be a Series of names imported from a different part of the code, and its contents will change each time the code is run.

ex)

finalists = ["Company A", "Company F", "Product S"... etc]

The dataframe will be about 1,000 rows long and 200 columns wide

Simplifying it, the dataframe would look something like this:

category  score  description  company_name  product_name  comments
"----"    2.8    "----"       Company A     Product A     "----"
"----"    1.2    "----"       Company B     Product B     "----"
"----"    2.4    "----"       Company C     Product C     "----"

I need to keep the rows where either the company_name column or the product_name column holds one of the values in the finalists Series (in other words, remove the rows where neither does).

I tried doing something like this:

results = finalists.isin(app_data["company_name"]) or finalists.isin(app_data["product_name"])

but got an error saying that the truth value of a Series is ambiguous.

1 Answer

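You want something like this: call isin on each column, passing finalists as the argument, and combine the two results with the element-wise | operator rather than or: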

mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)

filtered_app_data = app_data[mask]
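
To try the answer's approach end to end, here is a minimal sketch. The sample rows and the finalists values are made up for illustration, loosely modeled on the simplified frame in the question; only the mask and filtering lines come from the answer.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's simplified frame.
app_data = pd.DataFrame({
    "score": [2.8, 1.2, 2.4],
    "company_name": ["Company A", "Company B", "Company C"],
    "product_name": ["Product A", "Product B", "Product C"],
})

# Hypothetical finalists; in the real code this is imported from elsewhere.
finalists = pd.Series(["Company A", "Product C"])

# Each isin call returns a boolean Series with one entry per row of app_data.
# The | operator combines them element-wise. Python's `or` would instead try
# to reduce each Series to a single bool, which raises
# "ValueError: The truth value of a Series is ambiguous".
mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)

# Boolean indexing keeps only the rows where the mask is True.
filtered_app_data = app_data[mask]
```

Note that the original attempt also had the direction reversed: finalists.isin(app_data[...]) tests each finalist against the column, which yields one value per finalist rather than one per row.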

1 Comment

Perfect, thank you! Do you know a way to include partial matches? For example, if my finalists Series had a value of "Company A" but one of the columns contains "Company A, Inc.", I'd want to select that row. Currently it looks like these rows are being left out if it's a non-exact match.
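
One possible way to handle the partial-match case raised in this comment is substring matching with str.contains, sketched below. This is not from the answer; it is an assumption about what "partial match" should mean (the finalist name appears anywhere inside the cell), and the sample data is hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample data: the first row has a suffix after the finalist name.
app_data = pd.DataFrame({
    "company_name": ["Company A, Inc.", "Company B", "Company C"],
    "product_name": ["Product A", "Product B", "Product S (beta)"],
})
finalists = pd.Series(["Company A", "Product S"])

# Build a single regex alternation from the finalist names. re.escape keeps
# characters such as "." or "(" literal instead of treating them as regex syntax.
pattern = "|".join(re.escape(name) for name in finalists)

# str.contains is True when the pattern matches anywhere in the cell.
# na=False treats missing values as non-matches instead of propagating NaN.
mask = (
    app_data["company_name"].str.contains(pattern, na=False)
    | app_data["product_name"].str.contains(pattern, na=False)
)
partial_matches = app_data[mask]
```

Two caveats with this sketch: substring matching is loose, so "Company A" would also match a cell such as "Company AB", and an empty finalists Series produces an empty pattern that matches every row, so that case may need a guard.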
