Searching for a specific phrase in CSV file using regex in Python

Question

I have a CSV file of tweets that I need to search for a list of specific phrases and words. For example, I'm searching for "global warming". I want to find not only "global warming", but also "Global warming", "Global Warming", "#globalwarming", "#Globalwarming", "#GlobalWarming", etc.; in other words, all the possible forms.

How could I implement regex into my code to do that? Or maybe there's another solution?

import csv

with open('filedirectory.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)

    with open('filedirectory1.csv', 'w', newline='') as output_file2:
        writer2 = csv.writer(output_file2)

        with open('filedirectory2.csv') as csv_file:
            csv_read = csv.reader(csv_file)

            for row in csv_read:

                search_terms = ["global warming", "GLOBAL WARMING"]  # etc.

                # row[2] is the column holding the tweet text
                if any([term in row[2] for term in search_terms]):
                    writer.writerow(row)

                else:
                    writer2.writerow(row)

You can sidestep the upper/lowercase issue by lowercasing the text first, for instance text = row[2].lower(). The regex would then be something along these lines: #?global\s*warming
– Plopp, Commented Dec 4, 2019 at 10:08
Building a regex that matches all the forms you gave is possible. Have a look at this website; it is very helpful. I would suggest a case-insensitive regex with an optional # before global and an optional space between global and warming.
– Thombou, Commented Dec 4, 2019 at 10:08
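
The pattern suggested in the comments above can be checked against the forms listed in the question. This is only a minimal sketch: the variable names and the extra non-matching control string are illustrative assumptions, not part of the original post.

```python
import re

# Case-insensitive pattern: optional '#', "global", optional whitespace, "warming".
pattern = re.compile(r'#?global\s*warming', re.IGNORECASE)

# Forms taken from the question, plus one control string that should not match.
samples = [
    "global warming",
    "Global warming",
    "Global Warming",
    "#globalwarming",
    "#Globalwarming",
    "#GlobalWarming",
    "warming up the globe",
]

# Keep only the strings in which the pattern is found somewhere.
matches = [s for s in samples if pattern.search(s)]
print(matches)
```

Using re.IGNORECASE avoids having to lowercase the text first, and \s* covers both the spaced and the hashtag spellings.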

can · Accepted Answer · 2019-12-05 13:15:22Z


You can use your own code with a very simple modification:

...

for row in csv_read:
    # row is a list of fields, so lowercase the tweet text column, not the row
    text_lower = row[2].lower()
    search_terms = ["global warming", "globalwarming"]

    if any(term in text_lower for term in search_terms):
        writer.writerow(row)
    else:
        writer2.writerow(row)

If you must use a regex, or you are afraid of missing rows such as "...global(more than one space)warming...", "..global____warming..", or "..global serious warming..", you can use a pattern like this:

...

import re

global_regex = re.compile(r'global.*?warming', re.IGNORECASE)
for row in csv_read:
    # search the tweet text (third column); re needs a string, not the row list
    if global_regex.search(row[2]):
        writer.writerow(row)
    else:
        writer2.writerow(row)

I compiled the regex outside the loop for better performance.

Here you can see the regex in action.
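
Since the question mentions a list of phrases, the same idea could be extended to several phrases at once. The following is a rough sketch, not the answerer's code: the helpers build_pattern and split_rows are hypothetical names, the sample rows are invented, and it assumes the tweet text is in the third column as in the question. It uses the stricter \s* separator from the comments rather than .*?, so it would not match something like "global serious warming".

```python
import csv
import io
import re


def build_pattern(phrases):
    """Combine phrases into one case-insensitive regex (hypothetical helper).

    Each phrase may be preceded by '#', and its words may be separated
    by any amount of whitespace or by none (hashtag form).
    """
    alternatives = []
    for phrase in phrases:
        words = [re.escape(word) for word in phrase.split()]
        alternatives.append(r'\s*'.join(words))
    return re.compile(r'#?(?:' + '|'.join(alternatives) + r')', re.IGNORECASE)


def split_rows(rows, pattern, column=2):
    """Return (matching, non_matching) rows based on the text in `column`."""
    matching, non_matching = [], []
    for row in rows:
        # Rows that are too short to have the column count as non-matching.
        text = row[column] if len(row) > column else ''
        (matching if pattern.search(text) else non_matching).append(row)
    return matching, non_matching


# In-memory stand-in for the CSV file; the rows are invented sample data.
sample_csv = io.StringIO(
    '1,alice,Worried about #GlobalWarming\n'
    '2,bob,Nice weather today\n'
    '3,carol,climate   change is real\n'
)
pattern = build_pattern(['global warming', 'climate change'])
matched, unmatched = split_rows(csv.reader(sample_csv), pattern)
print([row[0] for row in matched])
print([row[0] for row in unmatched])
```

With real files, the two returned lists could be passed to writer.writerows and writer2.writerows in place of the in-memory sample.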

answered Dec 5, 2019 at 13:15


2 Comments

Tomito Over a year ago

Thank you! Just had a chance to try it out. Complains that "expected string or bytes-like object"...

can Over a year ago

Which code snippet gives this error? Also can you paste the exact error please?
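
A likely cause of this error, assuming it came from passing the whole row to the regex: csv.reader yields each row as a list of strings, while the re functions expect a single string. The sketch below reproduces that situation with an invented sample row and shows that passing one field works.

```python
import re

global_regex = re.compile(r'global.*?warming', re.IGNORECASE)

# A row as csv.reader would yield it: a list of strings (invented sample data).
row = ['1', 'alice', 'Talking about Global Warming again']

try:
    # Passing the whole list raises a TypeError, because re expects a string.
    global_regex.search(row)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing a single field (a string) works.
found = global_regex.search(row[2]) is not None
print(error_message)
print(found)
```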
