Regular expression in Python for matching strings in a CSV file

Question

I am working with regular expressions in Python. I want to match the lines from a CSV file (which is inserted into a database) that start and end with an underscore.

I used a regular expression in my Python script to do this, but the result prints as None. Here is my code; please tell me what mistake I am making:

reg = re.compile(r'^_.*_$',re.I)
imatch = reg.match(unicode(row[4], "utf8"))

Here r'^_.*_$' with the re.I flag is my regular expression for matching lines that start and end with _. unicode(row[4], "utf8") is the field read from the CSV file that is inserted into the database.

Any help would be appreciated.

It's not possible to answer this question without knowing the contents of row[4] and what you're trying to match. Do you know that there are cases that begin and end with a _ that are not being matched?
– David Robinson, Commented Feb 17, 2013 at 16:17
unicode(row[4], "utf8") = ( aaaaa bbbb ccccc 5635! fgsfrq. ) Assume this is my string. I want to match a few strings like this that start and end with _, and it should match that regular expression.
– Gayathri, Commented Feb 17, 2013 at 16:19
Why would you expect that to match this regular expression? It doesn't start and end with an _.
– David Robinson, Commented Feb 17, 2013 at 16:20
Can you give me the proper re syntax for matching, if mine is wrong?
– Gayathri, Commented Feb 17, 2013 at 16:22
What are you trying to match? You said you wanted to match only lines starting and ending with _. Is that not what you want to do?
– David Robinson, Commented Feb 17, 2013 at 16:23

Anil · Accepted Answer · 2013-02-18 04:10:48Z

1

import re

# Strip each line so trailing whitespace does not affect the match.
with open('file.csv') as f:
    lines = [line.strip() for line in f]

for x in lines:
    match = re.search(r'^_.*_$', x)
    if match:
        print(x)

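We have to strip each line before matching. Otherwise trailing whitespace, such as a space or the '\r' left over from Windows line endings, means the line no longer ends with '_', and the regex won't match. (A single trailing '\n' is tolerated by $, but stripping covers every case.)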

file.csv

_abdlfla_
sldjlfds_
_adlfdls
_132jdlfjflds_

output

_abdlfla_
_132jdlfjflds_
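To check the effect of stripping yourself, here is a minimal sketch. The helper name matches and the sample strings are made up for illustration; only the pattern comes from the code above.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r'^_.*_$')

def matches(text):
    """Return True if the pattern matches the given text."""
    return pattern.search(text) is not None

# Hypothetical sample strings with different surrounding whitespace.
samples = ['_abc_', '_abc_\n', '_abc_\r\n', '_abc_ ', ' _abc_']

for s in samples:
    # Show the raw string, the result as-is, and the result after strip().
    print(repr(s), matches(s), matches(s.strip()))
```

Every stripped sample matches. Without strip(), only '_abc_' and '_abc_\n' match, because $ also matches just before a newline at the very end of the string.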

edited Feb 18, 2013 at 4:10

answered Feb 17, 2013 at 16:29

1 Comment

Honest Abe Over a year ago

If you included a sentence about why using strip solves the problem I would be inclined to upvote.

shantanoo · Accepted Answer · 2013-02-17 16:35:57Z

0

You can use the string methods startswith and endswith instead of re. Is there a specific reason for using re?

# Open the file in a with block so it is closed automatically.
with open('test.csv') as f:
    for line in f:
        line = line.strip()
        if line.startswith('_') and line.endswith('_'):
            print(line)
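Since the question checks a single field (row[4]) rather than the whole line, the same test can be applied per column with the csv module. This is only a sketch: the sample data, the function names wrapped_in_underscores and matching_fields, and the column layout are assumptions for illustration, not taken from the asker's file.

```python
import csv
import io

# Hypothetical sample data; the field of interest sits in column 4,
# mirroring row[4] from the question.
SAMPLE = (
    "1,a,b,c,_abdlfla_\n"
    "2,a,b,c,sldjlfds_\n"
    "3,a,b,c, _132jdlfjflds_ \n"
    "4,a,b,c,_\n"
)

def wrapped_in_underscores(text):
    """True if the stripped text starts and ends with '_' (at least two characters)."""
    text = text.strip()
    return len(text) >= 2 and text.startswith('_') and text.endswith('_')

def matching_fields(stream, column=4):
    """Return the stripped fields in the given column that are wrapped in underscores."""
    return [
        row[column].strip()
        for row in csv.reader(stream)
        if len(row) > column and wrapped_in_underscores(row[column])
    ]

print(matching_fields(io.StringIO(SAMPLE)))
```

The length check is there because a field consisting of a single '_' passes both startswith and endswith, while the regex ^_.*_$ needs two underscores. Drop it if a lone underscore should count.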

answered Feb 17, 2013 at 16:35

