I am looping over a list of strings, looking for keyword matches, and collecting the match indices in a third list. I can collect the indices as a flat list of lists, but I want to further group the sub-lists by the item they matched.

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']

indices=[]
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        for m in re.finditer(pat, i):
            a =list((m.start(),m.end()))
            indices.append(a)
print(indices)

This returns:

[[0, 2], [0, 2], [1, 3]] 

I am trying to get:

[[0, 2], [[0, 2], [1, 3]]]

so that it is clear that:

[[0, 2], [1, 3]]

are the match indices for 'cde' in the example above.

  • list((m.start(),m.end())) is normally spelled [m.start(), m.end()]. Commented Mar 27, 2013 at 12:20

2 Answers

2

Make indices a dict:

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']

indices = {}
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        indices.setdefault(i, [])
        for m in pat.finditer(i):
            # store each match as a [start, end] pair under the string it matched
            indices[i].append([m.start(), m.end()])
print(indices)

Giving (the key order may differ depending on the Python version):

{'cde': [[0, 2], [1, 3]], 'ab': [[0, 2]]}

Is this what you're looking for?
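
As a variation on the dict approach above, here is a minimal sketch that uses collections.defaultdict in place of setdefault. It is only an alternative spelling, not part of the original answer, and it differs in one respect: a string that matches no keyword gets no key at all, rather than an empty list.

```python
import re
from collections import defaultdict

my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']

# defaultdict(list) creates the empty list on first access,
# so no explicit setdefault call is needed.
indices = defaultdict(list)
pats = [re.compile(k) for k in keywords]

for pat in pats:
    for s in my_list:
        for m in pat.finditer(s):
            # group each [start, end] pair under the string it matched
            indices[s].append([m.start(), m.end()])

print(dict(indices))
```

On Python 3.7 or later, where dicts keep insertion order, this should print {'ab': [[0, 2]], 'cde': [[0, 2], [1, 3]]}.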

I played with this code for a while, and since you import itertools anyway, you might as well use it to get rid of those nested for loops ;) like this:

import re
from itertools import product

my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']

indices = {}
pats = [re.compile(i) for i in keywords]

for i, pat in product(my_list, pats):
    indices.setdefault(i, [])
    for m in re.finditer(pat, i):
        indices[i].append((m.start(), m.end()))

print(indices)

Unfortunately, I can't get Bakuriu's idea of using a list comprehension to work properly, so for now this seems like the best solution to me.
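
For what it's worth, one way a comprehension could be made to work here is to nest a list comprehension inside a dict comprehension. This is a sketch under the same sample data, and not necessarily what Bakuriu had in mind:

```python
import re

my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']
pats = [re.compile(k) for k in keywords]

# Outer comprehension: one key per string.
# Inner comprehension: every (start, end) span from every pattern on that string.
indices = {
    s: [(m.start(), m.end()) for pat in pats for m in pat.finditer(s)]
    for s in my_list
}

print(indices)
```

With the sample data this should give {'ab': [(0, 2)], 'cde': [(0, 2), (1, 3)]}. The loop version is arguably easier to read once the body grows beyond a single expression.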

4 Comments

A great idea to resolve this with dicts: much better than my ugly nests. Thank you!
Since you are new to SO: welcome. A tip: when an answer solves your problem, you can tick it to mark the question as answered and to award its author some reputation points ;)
Piotr thanks again. I need a score of 15 to vote up, but will as soon as I get there.
I know, but you don't need more rep to accept an answer (that's a big tick under votes for answer). Cheers! ;)
0

Create a sublist for each pattern/string pair, accumulate that pair's matches in it, and finally append it to the result:

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']

indices=[]
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        sublist = []  # matches of this pattern in this string
        for m in pat.finditer(i):
            sublist.append([m.start(), m.end()])
        # appended even when empty, so every pattern/string pair gets an entry
        indices.append(sublist)
print(indices)

Or you could build each sublist with a list comprehension:

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']

indices=[]
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        sublist = [(m.start(), m.end()) for m in re.finditer(pat, i)]
        indices.append(sublist)
print(indices)
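
Note that both loops above emit one sublist per pattern/string pair, including empty ones. If the goal is one group per matched string, as in the question, a minimal sketch could put the string in the outer loop and skip strings without matches. The name grouped is just illustrative, and every group is kept as a list of pairs, so the single match on 'ab' comes out as [[0, 2]] rather than the bare [0, 2] shown in the question:

```python
import re

my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']
pats = [re.compile(k) for k in keywords]

grouped = []
for s in my_list:
    # collect the spans of every keyword that matches this string
    matches = [[m.start(), m.end()] for pat in pats for m in pat.finditer(s)]
    if matches:  # skip strings that no keyword matched
        grouped.append(matches)

print(grouped)  # [[[0, 2]], [[0, 2], [1, 3]]]
```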

1 Comment

So much more readable, especially the list comp solution. Thank you.
