Checking for pattern in string Python

Question

Suppose we have strings of the type:

test = '--a-kbb-:xx---xtx:-----x--:---g-x--:-----x--:------X-:XXn-tt-X:l--f--O-'

that is, they are always composed of 8 sections separated by :, so one could split the string into a list with each element corresponding to a section:

testsep = test.split(':')

giving

['--a-kbb-', 'xx---xtx', '-----x--', '---g-x--', '-----x--', '------X-', 'XXn-tt-X', 'l--f--O-']

Now I want to check whether the string test has an x at the same position in 3 consecutive sections. For example, with the test given above, we find at least one such case: counting from 1, sections 2, 3 and 4 each contain an x at position 6. Therefore, our test string here matches the wanted pattern.

Is there a simple (maybe functional) way of checking for such patterns, given that the strings always follow the format above?

The naive approach would be to split, then loop through all sections and check whether consecutive sections have an x at each possible position (1, 2, ... up to 8), but that wouldn't be very Pythonic.

Have you thought about using regex? Is the pattern already known or do you need to identify the pattern as well?
– Jake, Commented Apr 10, 2019 at 15:56
You could turn your string into a matrix with 1 in place of 'x' and 0 elsewhere; then sum by column and check if any sum is equal to or greater than 3
– Don, Commented Apr 10, 2019 at 16:07
@Don ah, so neat, thanks! Feel free to also add that as an answer for future readers.
– user929304, Commented Apr 10, 2019 at 16:13
Thanks. Have a look at my answer for an alternative solution
– Don, Commented Apr 10, 2019 at 16:16

Ajax1234 · Accepted Answer · 2019-04-10 15:52:43Z

2

A possibility is to use itertools.groupby with a class to group runs of strings that all have an x at the same position:

from itertools import groupby
class X:
  def __init__(self, _x):
    self.x = _x
  def __eq__(self, _val):
    return any(a == 'x' and b =='x' for a, b in zip(self.x, _val.x))

d = ['--a-kbb-', 'xx---xtx', '-----x--', '---g-x--', '-----x--', '------X-', 'XXn-tt-X', 'l--f--O-']
result = [[a, [i.x for i in b]] for a, b in groupby(list(map(X, d)))]
final_result = [b for _, b in result if any(all(h == 'x' for h in c) for c in zip(*b))]

Output:

[['xx---xtx', '-----x--', '---g-x--', '-----x--']]

However, the naive approach is much simpler, and the resulting solution is still quite Pythonic:

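def group(d):
    # Grow a run of consecutive sections that all share an 'x' at some index.
    start = [d[0]]
    for i in d[1:]:
        if any(all('x' == c for c in b) for b in zip(*(start + [i]))):
            start.append(i)
        else:
            if len(start) > 1:
                yield start
            start = [i]
    # Also emit a run that extends to the very last section.
    if len(start) > 1:
        yield start

print(list(group(d)))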

Output:

[['xx---xtx', '-----x--', '---g-x--', '-----x--']]
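Both snippets above return the matching runs rather than a yes/no answer. If only the boolean from the question is needed, a sliding window over the sections is one option. The following is a minimal sketch, not part of the original answer; the function name has_aligned_run and its parameters n and char are made up for illustration, and it assumes all sections have the same length.

```python
def has_aligned_run(sections, n=3, char='x'):
    """Return True if some n consecutive sections share `char` at one index."""
    for start in range(len(sections) - n + 1):
        window = sections[start:start + n]
        # zip(*window) yields one tuple per character position of the window.
        if any(all(c == char for c in column) for column in zip(*window)):
            return True
    return False


d = ['--a-kbb-', 'xx---xtx', '-----x--', '---g-x--',
     '-----x--', '------X-', 'XXn-tt-X', 'l--f--O-']
print(has_aligned_run(d))       # True
print(has_aligned_run(d, n=5))  # False
```

With n=3 this matches the question's requirement directly; the run found above is four sections long, so n=4 also succeeds while n=5 does not.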

edited Apr 10, 2019 at 15:52

answered Apr 10, 2019 at 15:46

Ajax1234

1 Comment

user929304 Over a year ago

Hi Ajax, hope you are well. I was wondering, if time allows, if you could have a look at this recent post (somewhat related): stackoverflow.com/q/64106974/7685268 any feedback would be helpful. Thanks in any case

Mikhail Vladimirov · Accepted Answer · 2019-04-10 16:07:50Z

Is this pythonish enough?

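from functools import reduce  # reduce is not a builtin in Python 3

# Named test rather than str, so the builtin str is not shadowed.
test = '--a-kbb-:xx---xtx:-----x--:---g-x--:-----x--:------X-:XXn-tt-X:l--f--O-'
sections = test.split(':')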
reduce (lambda a, b: a | ('xxx' in b), [reduce(lambda c, d: c + d, map(lambda c: c[i], sections), '') for i in range(reduce (lambda e, f: max (e, len (f)), sections, 0))], False)

Explanation

reduce (lambda e, f: max (e, len (f)), sections, 0)

calculates the maximum section length;

for i in range(reduce (lambda e, f: max (e, len (f)), sections, 0))

iterates i from zero to maximum section length minus 1;

map(lambda c: c[i], sections)

produces the i-th characters of all sections;

reduce(lambda c, d: c + d, map(lambda c: c[i], sections), '')

builds the string consisting of the i-th characters of all sections;

[reduce(lambda c, d: c + d, map(lambda c: c[i], sections), '') for i in range(reduce (lambda e, f: max (e, len (f)), sections, 0))]

builds a list of strings, where the i-th string consists of the i-th characters of all sections;

and the final expression returns True if any string in the list built in the previous step contains three consecutive 'x's. Note that c[i] assumes all sections have the same length; a shorter section would raise an IndexError.
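The same column-wise idea can probably be written more compactly with zip, which transposes the sections so that each tuple holds the characters at one position. This is a sketch of an equivalent check, not the answer's own code; the name columns is introduced here for illustration, and zip silently truncates to the shortest section, so equal-length sections are assumed.

```python
test = '--a-kbb-:xx---xtx:-----x--:---g-x--:-----x--:------X-:XXn-tt-X:l--f--O-'
sections = test.split(':')

# zip(*sections) transposes the sections: one tuple per character position.
columns = [''.join(chars) for chars in zip(*sections)]

print(columns[5])                            # bxxxx-t-
print(any('xxx' in col for col in columns))  # True
```

Each entry of columns reads one position top to bottom across the eight sections, so three consecutive sections sharing an x show up as the substring 'xxx'.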

Don · Accepted Answer · 2019-04-10 16:22:04Z

2

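Each section is 8 characters long and is followed by a separator, so characters at the same position in consecutive sections are exactly 9 characters apart in the original string. Pick every 9th character, starting from each offset, and check if there are 3 consecutive 'x's: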

test = '--a-kbb-:xx---xtx:-----x--:---g-x--:-----x--:------X-:XXn-tt-X:l--f--O-'
for i in range(9):
    if 'xxx' in test[i::9]:
        print("Pattern matched at position %d" % i)
        break
else:
    print("Pattern not matched")

gives (the position is zero-based, so 5 corresponds to position 6 in the question's 1-based count)

Pattern matched at position 5

Short version:

>>> any(('xxx' in test[i::9] for i in range(9)))
True
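The fixed stride also makes a regular expression possible, as suggested in the comments on the question. The following is a rough sketch rather than part of this answer; the names width and pattern are illustrative, and it assumes every section has the same length as the first one.

```python
import re

test = '--a-kbb-:xx---xtx:-----x--:---g-x--:-----x--:------X-:XXn-tt-X:l--f--O-'
width = len(test.split(':')[0])  # 8 characters per section

# An 'x', then `width` arbitrary characters (the rest of this section, the
# ':' separator and the start of the next section), then 'x', and once more.
pattern = re.compile('x' + ('.{%d}x' % width) * 2)

match = pattern.search(test)
print(bool(match))                  # True
print(match.start() % (width + 1))  # 5
```

Taking the match start modulo the stride recovers the zero-based position inside the section, which agrees with the position reported by the loop above.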

edited Apr 10, 2019 at 16:22

answered Apr 10, 2019 at 16:14

Don
