1

I have data of the form

CS989_-RED814298959
CS663_RED812657324
RED819238322_CS537
......

This data is in a CSV file. I want to retrieve all the substrings that start with RED. Please suggest a way to do this using a regular expression in Python.

I tried the following code:

import re
string="RED819238322_CS537"
substring=re.match("[a-zA-Z]*//([0-9]*)",string)

It returns None.

4
  • 2
    Put what you have tried in your question please. Commented Oct 7, 2013 at 17:10
  • What's a "substring"? Could you post the expected results for your example input? Commented Oct 7, 2013 at 17:27
  • 1
    There is no // in your data, why do you expect this to match? Commented Oct 7, 2013 at 17:52
  • @brunodesthuilliers The output should be a list: ["RED814298959", "RED812657324", "RED819238322"] Commented Oct 8, 2013 at 15:50

3 Answers

2

If you don't need regex, don't use regex.

with open('myfile') as f:
    print([l for l in f if l.startswith('RED')])

Adapt as necessary, e.g. with csv.reader:

import csv

with open('myfile') as f:
    # keep rows whose first column starts with 'RED'; skip empty rows
    print([row for row in csv.reader(f) if row and row[0].startswith('RED')])

1 Comment

I cannot use "startswith" because sometimes the substring appears in the middle of the string.
0

Help on function match in module re:

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.

You want re.search or re.findall instead, since both scan the whole string rather than only its start. Also, your regexp is incorrect: if what you want is just "RED" followed by one or more digits, it's spelled r"RED[0-9]+"

>>> strings
['CS989_-RED814298959', 'CS663_RED812657324', 'RED819238322_CS537']
>>> re.match(r"(RED[0-9]+)", strings[0])
>>> re.findall(r"(RED[0-9]+)", strings[0])
['RED814298959']
>>> re.findall(r"(RED[0-9]+)", strings[1])
['RED812657324']
>>> re.findall(r"(RED[0-9]+)", strings[2])
['RED819238322']
>>> re.search(r"(RED[0-9]+)", strings[0])
<_sre.SRE_Match object at 0x1772e40>
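
To get the single list asked for in the comments, the same pattern can be applied to every cell of the CSV file. The following is only a sketch: the helper name extract_red_ids is made up for illustration, and the sample rows are held in an in-memory io.StringIO instead of the real file, whose name and layout are not given in the question.

```python
import csv
import io
import re

# Same pattern as in the session above: "RED" followed by one or more digits.
RED_PATTERN = re.compile(r"RED[0-9]+")


def extract_red_ids(csv_file):
    """Collect every RED<digits> substring from every cell of an open CSV file.

    Hypothetical helper, not part of any library.
    """
    found = []
    for row in csv.reader(csv_file):
        for cell in row:
            # findall scans the whole cell, so the position of RED does not matter
            found.extend(RED_PATTERN.findall(cell))
    return found


# Sample rows from the question, standing in for the real file.
sample = io.StringIO(
    "CS989_-RED814298959\n"
    "CS663_RED812657324\n"
    "RED819238322_CS537\n"
)
red_ids = extract_red_ids(sample)
print(red_ids)  # ['RED814298959', 'RED812657324', 'RED819238322']
```

With a real file, passing the object returned by open('myfile', newline='') in place of sample should work the same way, assuming the file is plain comma-separated text.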


0

What are those slashes doing in there? Try this:

substring=re.match("[a-zA-Z]*([0-9]*)", string)
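
A quick check against the three sample strings from the question suggests a caveat: without the slashes the pattern does match, but re.match is still anchored at the start of the string, so it only picks up the RED value when the string begins with it. The sketch below compares it with re.search and the pattern r"RED[0-9]+"; the variable names are only illustrative.

```python
import re

samples = ["CS989_-RED814298959", "CS663_RED812657324", "RED819238322_CS537"]

# Pattern from this answer: letters followed by digits, anchored at the start by re.match.
anchored = [re.match(r"[a-zA-Z]*([0-9]*)", s).group(0) for s in samples]
print(anchored)  # ['CS989', 'CS663', 'RED819238322']

# re.search scans the whole string, so the position of RED does not matter.
anywhere = [re.search(r"RED[0-9]+", s).group(0) for s in samples]
print(anywhere)  # ['RED814298959', 'RED812657324', 'RED819238322']
```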
