
I'm trying to extract several values from a fairly complex string that looks like this:

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

These are the labels I need to scan for:

list = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

My intention is to get the three numbers after each label. For HighPriority, for example, I would get [0, 74, 74], and I can then do something with each item.

I've used the code below, but it doesn't handle entries that aren't followed by a comma, such as LowPriority:0:2:2] or the final CommDefault:0:4:4].

def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


for l in list:
    print(l)
    print(find_between(s, l + ':', ',').split(':'))
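To see exactly where the comma-based version goes wrong, here is a small reproduction using the question's own find_between logic and sample string. It's only an illustration: the slice runs from the label to the next comma, so any entry followed by ] instead reads past its own numbers, and the last entry has no comma after it at all.

```python
# Sample line from the question, split over several lines for readability
s = ('04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) '
     'Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,'
     'Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:'
     'Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]')


def find_between(s, first, last):
    # Same logic as the question: slice from just after `first` up to the next `last`
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


# Works: a comma directly follows the three numbers
print(find_between(s, 'HighPriority:', ','))
# Over-reads: the next comma only appears inside the Comm[...] section
print(find_between(s, 'LowPriority:', ','))
# No comma after the last entry, so s.index raises and "" is returned
print(find_between(s, 'CommDefault:', ','))
```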
  • I think the best way to solve this problem is to learn to use the re module from the standard library. Commented Apr 5, 2016 at 20:46
  • yeah my re-fu is horrible. I've tried using re but when I see a block of code like \d\w\++\?\(\) I freeze because it's just not easy for me to read :( Commented Apr 5, 2016 at 20:48
  • Something like r = re.search('Compiler:([0-9]+):([0-9]+):([0-9]+)', s) should get you started. Use r.groups() to get the three substrings containing the numbers. Commented Apr 5, 2016 at 20:54
  • @mkiever I think you mean re.search instead of re.find. Commented Apr 5, 2016 at 20:57
  • @whoisearth if you're desperately trying to avoid the use of regex, which I really don't recommend, you can use takewhile with a ''.join (see my edited answer). Commented Apr 5, 2016 at 21:18
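mkiever's suggestion generalizes to every label with a simple loop. This is a minimal sketch assuming each label only matters when it appears as label:n:n:n. The \b word-boundary anchor is an addition not in the comment: without it, 'Default' could match the tail of 'CommDefault' if the Core entry were ever missing (re.search returns the leftmost match, so in this sample it happens to find the right one either way).

```python
import re

# Sample line from the question
s = ('04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) '
     'Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,'
     'Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:'
     'Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]')

labels = ['Compiler', 'HighPriority', 'Default', 'LowPriority',
          'Special', 'Event', 'CommHigh', 'CommDefault']

results = {}
for label in labels:
    # One capture group per number, as in the comment's 'Compiler:([0-9]+):...' pattern.
    # \b stops 'Default' from matching inside 'CommDefault';
    # re.escape protects against labels containing regex metacharacters.
    r = re.search(r'\b' + re.escape(label) + r':([0-9]+):([0-9]+):([0-9]+)', s)
    if r is None:
        continue  # label not present in this line, so skip it
    # r.groups() is a tuple of three digit strings; convert them to int
    results[label] = [int(n) for n in r.groups()]

print(results['HighPriority'])  # [0, 74, 74]
```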

3 Answers


Edit: if you really want to avoid regexes, your approach works with a minor tweak (I renamed list to l to avoid shadowing the built-in type):

from itertools import takewhile
from string import digits

# Characters allowed in the ':n:n:n' run that follows a label
ALLOWED = ':' + digits

def find_between(s, first):
    try:
        start = s.index(first) + len(first)
        # Keep taking the next character while it's either a ':' or a digit.
        # You could also keep the characters as a list and skip the join/split round trip.
        return ''.join(takewhile(lambda char: char in ALLOWED, s[start:]))
    except ValueError:
        # `first` does not occur in s
        return ""


l = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

for name in l:
    print(name)
    print(find_between(s, name + ':').split(':'))

This prints:

Compiler
['0', '0', '0']
HighPriority
['0', '74', '74']
Default
['6', '1872', '1874']
LowPriority
['0', '2', '2']
Special
['0', '2', '2']
Event
['0', '0', '0']
CommHigh
['0', '1134', '1152']
CommDefault
['0', '4', '4']
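The takewhile step is what fixes the comma problem: it stops at the first character that isn't ':' or a digit, whatever that character happens to be. A tiny demo on a hand-copied slice of the sample string (the text right after "LowPriority:"):

```python
from itertools import takewhile
from string import digits

# Text that follows "LowPriority:" in the question's sample string
tail = '0:2:2]:Special[Special:0:2:2]'

# takewhile yields characters until the predicate first fails (at ']'),
# then stops for good, even though more digits appear later in `tail`
run = ''.join(takewhile(lambda ch: ch in ':' + digits, tail))

print(run)             # 0:2:2
print(run.split(':'))  # ['0', '2', '2']
```

One caveat of this design: if a label were ever followed by more than three numbers, the run would include all of them.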

However, this really is a task for regex, and you should try to get to know the basics.

import re

def find_between(s, word):
    # Search for the word followed by (':' and one or more digits) repeated three times
    x = re.search(r"(%s(:\d+){3})" % re.escape(word), s)
    return x.group(1)

for word in l:
    # Drop the leading "word:" and split the remaining "n:n:n"
    print(find_between(s, word).split(':', 1)[-1].split(':'))

This prints:

['0', '0', '0']
['0', '74', '74']
['6', '1872', '1874']
['0', '2', '2']
['0', '2', '2']
['0', '0', '0']
['0', '1134', '1152']
['0', '4', '4']
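Why the regex answer needs the two-step split: a repeated group such as (:\d+){3} only remembers its last repetition, so the individual numbers aren't available as separate groups. A short demo on an excerpt of the sample string:

```python
import re

# Excerpt of the sample string from the question
excerpt = 'Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]'

x = re.search(r"(%s(:\d+){3})" % re.escape('HighPriority'), excerpt)

# Group 1 is the whole "label:n:n:n" text; group 2 is the repeated (:\d+)
# group, which only keeps its LAST repetition
print(x.groups())  # ('HighPriority:0:74:74', ':74')

# Hence the split: drop the "label:" prefix, then split the rest on ':'
print(x.group(1).split(':', 1)[-1].split(':'))  # ['0', '74', '74']
```

Writing one capture group per number, e.g. HighPriority:(\d+):(\d+):(\d+), avoids the split entirely.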

Check this:

import re
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
data = []
for x in search:
    # every "n:n:n" that follows "x:" in s
    data.append(re.findall(x + ':([0-9]+:[0-9]+:[0-9]+)', s))

data = [[m.split(':') for m in x] for x in data]  # split each "n:n:n" on ':'
data = [x[0] for x in data]  # keep the first match ('Default' also matches inside 'CommDefault')
data = [[int(n) for n in x] for x in data]  # convert to int
print(data)

>>>[[0, 0, 0], [0, 74, 74], [6, 1872, 1874], [0, 2, 2], [0, 2, 2], [0, 0, 0], [0, 1134, 1152], [0, 4, 4]]
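The loop above runs one findall per label and then takes x[0] because 'Default' also matches inside 'CommDefault'. A possible variant, not part of the original answer, is to build a single alternation of all the labels and scan the string once with finditer; the \b anchor (an assumption about how labels are delimited) keeps 'Default' from matching inside 'CommDefault'.

```python
import re

# Sample line from the question
s = ('04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) '
     'Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,'
     'Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:'
     'Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]')

search = ['Compiler', 'HighPriority', 'Default', 'LowPriority',
          'Special', 'Event', 'CommHigh', 'CommDefault']

# One pattern for all labels: \b(Compiler|HighPriority|...):(\d+):(\d+):(\d+)
pattern = re.compile(r'\b(%s):(\d+):(\d+):(\d+)' % '|'.join(map(re.escape, search)))

data = {}
for m in pattern.finditer(s):
    # group 1 is the label, groups 2-4 are the three numbers
    data[m.group(1)] = [int(n) for n in m.groups()[1:]]

# Same order as `search`
print([data[label] for label in search])
```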

This will get you all the groups, provided the string is always well formed:

re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

It also matches part of the timestamp (23:50:06:242) as the first group, which you can easily remove from the list.

Or you can use a dictionary comprehension to organize the items:

matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
my_dict = {k: v.split(':') for k, v in matches[1:]}

I used matches[1:] here to drop that spurious timestamp match. That's only safe if you know it will always be there.
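Rather than relying on the stray match always being first, one option is to keep only the labels you actually want, and convert to int in the same comprehension. A minimal sketch, assuming the set of wanted labels is known up front (the name `wanted` is illustrative):

```python
import re

# Sample line from the question
s = ('04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) '
     'Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,'
     'Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:'
     'Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]')

# Hypothetical filter set: the labels from the question
wanted = {'Compiler', 'HighPriority', 'Default', 'LowPriority',
          'Special', 'Event', 'CommHigh', 'CommDefault'}

matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
print(matches[0])  # ('23', '50:06:242'), the stray timestamp match

# Filter by label instead of slicing, and convert the numbers to int
my_dict = {k: [int(n) for n in v.split(':')] for k, v in matches if k in wanted}
print(my_dict['Default'])  # [6, 1872, 1874]
```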
