python - looping through values in a string

Question

I'm trying to extract several values from a fairly complex string that looks like this:

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

These are the names I need to scan for:

list = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

My intention is to get the 3 numbers after each name. For HighPriority, for example, I would get [0, 74, 74], and I can then do something with each item.

I've used the code below, but it only works when the numbers are followed by a comma. It fails when they are followed by ']' or sit at the end of the string.

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""


for l in list:
    print l
    print find_between( s, l + ':', ',' ).split(':')

I think the best method to solve this problem is to learn to use module "re" of the standard lib.
– mkiever, Commented Apr 5, 2016 at 20:46
yeah my re-fu is horrible. I've tried using re but when I see a block of code like \d\w\++\?\(\) I freeze because it's just not easy for me to read :(
– whoisearth, Commented Apr 5, 2016 at 20:48
Something like r = re.search('Compiler:([0-9]+):([0-9]+):([0-9]+)', s) should get you started. Use r.groups() to get the three substrings containing the numbers.
– mkiever, Commented Apr 5, 2016 at 20:54
@whoisearth if you're desperately trying to avoid the use of regex, which I really don't recommend, you can use takewhile with a ''.join (see my edited answer).
– Bahrom, Commented Apr 5, 2016 at 21:18

Bahrom · Accepted Answer · 2016-04-05 22:11:17Z

Edit: if you really want to avoid regexes, your approach works with a minor tweak (I renamed list to l to avoid shadowing the built-in type):

from itertools import takewhile
from string import digits

def find_between(s, first):
    try:
        start = s.index(first) + len(first)
        # Keep taking the next character while it's either a ':' or a digit
        # You can also just cast this into a list and forget about joining and later splitting.
        # Also, consider storing ':'+digits in a variable to avoid recreating it all the time
        return ''.join(takewhile(lambda char: char in ':'+digits, s[start:]))
    except ValueError:
        return ""


for name in l:
    print(name)
    print(find_between(s, name + ':').split(':'))

This prints:

Compiler
['0', '0', '0']
HighPriority
['0', '74', '74']
Default
['6', '1872', '1874']
LowPriority
['0', '2', '2']
Special
['0', '2', '2']
Event
['0', '0', '0']
CommHigh
['0', '1134', '1152']
CommDefault
['0', '4', '4']
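
The comments in the code above suggest building ':' + digits once instead of on every call. Here is a minimal sketch of that variant, assuming Python 3. The helper name numbers_after and the constant ALLOWED are hypothetical and not part of the original answer. It also converts the pieces to int, which the original leaves as strings:

```python
from itertools import takewhile
from string import digits

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

# Built once: the characters that may appear in a run like "0:74:74"
ALLOWED = set(':' + digits)

def numbers_after(s, name):
    """Return the ints that follow 'name:' in s, or [] if name is absent."""
    try:
        # Position just past the colon that follows the name
        start = s.index(name + ':') + len(name) + 1
    except ValueError:
        return []
    # Stop at the first character that is neither ':' nor a digit
    run = ''.join(takewhile(lambda ch: ch in ALLOWED, s[start:]))
    return [int(part) for part in run.split(':') if part]

print(numbers_after(s, 'HighPriority'))  # [0, 74, 74]
print(numbers_after(s, 'CommDefault'))   # [0, 4, 4]
```

Because the scan stops at the first character outside ALLOWED, it does not matter whether the numbers are followed by ',', ']' or the end of the string.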

However, this really is a task for regex, and you should try to get to know the basics.

import re

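def find_between(s, word):
    # Match the word followed by ":<digits>" repeated three times,
    # e.g. "HighPriority:0:74:74". The raw string keeps \d from being
    # treated as a string escape.
    x = re.search(r"(%s(:\d+){3})" % word, s)
    # Return "" when the word is not found, like the version above
    return x.group(1) if x else ""

for word in l:
    print(find_between(s, word).split(':', 1)[-1].split(':'))

This prints: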

['0', '0', '0']
['0', '74', '74']
['6', '1872', '1874']
['0', '2', '2']
['0', '2', '2']
['0', '0', '0']
['0', '1134', '1152']
['0', '4', '4']
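
If you want the numbers as ints straight away, one option is to capture each number in its own group. This is only a sketch, assuming Python 3. The function name numbers_for is hypothetical, and the \b word boundary is an addition of mine that keeps a name such as Default from matching inside CommDefault:

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

def numbers_for(s, word):
    """Return the three ints after 'word:' in s, or None when nothing matches."""
    # One capture group per number; re.escape guards against regex
    # metacharacters in the word
    pattern = r"\b%s:(\d+):(\d+):(\d+)" % re.escape(word)
    match = re.search(pattern, s)
    if match is None:
        return None
    return [int(group) for group in match.groups()]

for word in ['Compiler', 'Default', 'CommDefault']:
    print(word, numbers_for(s, word))
```

Returning None for a missing name makes the "not found" case explicit, so the caller does not have to split an empty string.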

Milor123 · Accepted Answer · 2016-04-05 21:07:18Z

Check this:

import re
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
data = []
for x in search:
    data.append(re.findall(x+':([0-9]+:[0-9]+:[0-9]+)', s))

data = [[match.split(':') for match in found] for found in data]  # split "0:74:74" into parts
data = [found[0] for found in data]  # keep the first match for each name
data = [[int(n) for n in parts] for parts in data]  # convert to int
print(data)

This prints:

[[0, 0, 0], [0, 74, 74], [6, 1872, 1874], [0, 2, 2], [0, 2, 2], [0, 0, 0], [0, 1134, 1152], [0, 4, 4]]

Paulo Almeida · Accepted Answer · 2016-04-05 21:08:46Z

This will get you all the groups, provided the string is always well formed:

re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

It also gets the time, which you can easily remove from the list.

Or you can use a dictionary comprehension to organize the items:

matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
my_dict = {k: v.split(':') for k, v in matches[1:]}

I used matches[1:] here to get rid of the spurious match. You can do that if you know it will always be there.
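
If you would rather not rely on the spurious match always being first, a possible alternative is to keep only the names you asked for. This is a sketch, assuming Python 3. The names wanted and values are hypothetical, and the int conversion is an extra step the code above does not do:

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
wanted = {'Compiler', 'HighPriority', 'Default', 'LowPriority',
          'Special', 'Event', 'CommHigh', 'CommDefault'}

# Each match is a tuple such as ('HighPriority', '0', '74', '74')
matches = re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

# Filtering by name drops the timestamp match ('23', '50', '06', '242')
values = {m[0]: [int(n) for n in m[1:]] for m in matches if m[0] in wanted}

print(values['CommHigh'])  # [0, 1134, 1152]
```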

edited Apr 5, 2016 at 21:08

answered Apr 5, 2016 at 20:53

Paulo Almeida
