Converting a list of strings into a numpy array in a faster way

Question

br is the name of a list of strings that goes like this:

['14 0.000000 -- (long term 0.000000)\n',
 '19 0.000000 -- (long term 0.000000)\n',
 '22 0.000000 -- (long term 0.000000)\n',
...

I am interested in the first two columns, which I would like to convert to a numpy array. So far, I've come up with the following solution:

import numpy as N

x = N.array ([0., 0.])
for i in br:
    # list() keeps this working on Python 3, where map returns an iterator
    x = N.vstack ( (x, N.array (list (map (float, i.split ()[:2])))) )

This results in a 2-D array:

array([[  0.,   0.],
       [ 14.,   0.],
       [ 19.,   0.],
       [ 22.,   0.],
...

However, since br is rather big (~10^5 entries), this procedure takes some time. I was wondering, is there a way to accomplish the same result, but in less time?

sunetos · Accepted Answer · 2011-08-31 16:47:31Z

4

This is dramatically faster for me:

import numpy as N

br = ['14 0.000000 -- (long term 0.000000)\n']*50000
aa = N.zeros((len(br), 2))

for i,line in enumerate(br):
    al, strs = aa[i], line.split(None, 2)[:2]
    al[0], al[1] = float(strs[0]), float(strs[1])

Changes:

Preallocate the numpy array (this is big). You already know you want a 2-dimensional array with particular dimensions.
Only split() for the first 2 columns, since you don't want the rest.
Don't use map(): it's slower than list comprehensions. I didn't even use list comprehensions, since you know you only have 2 columns.
Assign directly into the preallocated array instead of generating new temp arrays as you iterate.
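
To check the effect of preallocation on your own machine, here is a minimal sketch that times both approaches. The sample data, the function names with_vstack and with_prealloc, and the list size are assumptions made for illustration; actual timings depend on the machine and the input size.

```python
import timeit
import numpy as np

# Sample data in the question's format (size chosen arbitrarily for the sketch).
br = ['14 0.000000 -- (long term 0.000000)\n'] * 1000

def with_vstack(lines):
    # The question's approach: grow the array by one row per line.
    x = np.array([0., 0.])
    for line in lines:
        x = np.vstack((x, np.array([float(s) for s in line.split()[:2]])))
    return x

def with_prealloc(lines):
    # The answer's approach: allocate once, then fill rows in place.
    out = np.zeros((len(lines), 2))
    for i, line in enumerate(lines):
        first, second = line.split(None, 2)[:2]
        out[i, 0], out[i, 1] = float(first), float(second)
    return out

# The vstack version carries an extra leading [0, 0] row; the rest must match.
assert np.array_equal(with_vstack(br)[1:], with_prealloc(br))

for fn in (with_vstack, with_prealloc):
    seconds = timeit.timeit(lambda: fn(br), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

The vstack version copies the whole array on every iteration, so its cost grows roughly quadratically with the number of lines, while the preallocated version touches each row once.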

answered Aug 31, 2011 at 16:47


2 Comments

steabert Over a year ago

aa = numpy.array([x.split(' ',2)[0:2] for x in br], dtype='float')

Jir Over a year ago

Good to know about enumerate: I wasn't aware of it! Thanks also to @steabert for his contribution. The speeds of both solutions seem quite similar to me.

Simon Bergot · Accepted Answer · 2011-08-31 16:32:29Z

2

You can try to preprocess the list of strings (with awk, for example) if they come from a file, and then use numpy.loadtxt or numpy.genfromtxt. If you can't do anything about the way you get this list, you have several possibilities:

give up. If you only run this function once a day, you don't care about speed, and your current solution is good enough
write an IO plugin with Cython. There is a big potential gain because you can do all the loops in C and assign the values directly into a big (10^5, 2) numpy ndarray
try another language to solve your problem. With languages such as C or Haskell, you can use ctypes to call functions compiled into a DLL from Python
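
NumPy's text loader can also take a list of strings directly, so it may be possible to skip the file step altogether. The sketch below assumes numpy.loadtxt with its usecols parameter; the sample values are made up, and whether this beats a hand-written loop on 10^5 lines is something to measure rather than assume.

```python
import numpy as np

# Sample lines in the question's format (the values are made up).
br = [
    '14 0.000000 -- (long term 0.000000)\n',
    '19 0.500000 -- (long term 0.000000)\n',
    '22 1.250000 -- (long term 0.000000)\n',
]

# usecols limits parsing to the first two whitespace-separated columns,
# so the non-numeric text later in each line is never converted.
x = np.loadtxt(br, usecols=(0, 1))
print(x.shape)
```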

edit

Maybe this approach is slightly faster:

def conv(mystr):
    # Convert the first two whitespace-separated fields to floats.
    return list(map(float, mystr.split()[:2]))

br_float = list(map(conv, br))
x = N.array(br_float)

edited Aug 31, 2011 at 16:32

answered Aug 31, 2011 at 16:26


1 Comment

Jir Over a year ago

Liked the 'out-of-the-box' thinking!

unutbu · Accepted Answer · 2011-08-31 16:31:37Z

1

Changing

map (float, i.split()[:2])

to

map (float, i.split(' ',2)[:2])

might result in a slight speedup. Since you only care about the first two space-separated items in each line, there is no need to split the entire line. The 2 in i.split(' ',2) tells split to make at most 2 splits. For example,

In [11]: x='14 0.000000 -- (long term 0.000000)\n' 

In [12]: x.split()
Out[12]: ['14', '0.000000', '--', '(long', 'term', '0.000000)']

In [13]: x.split(' ',2)
Out[13]: ['14', '0.000000', '-- (long term 0.000000)\n']
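
One caveat worth checking against your data: splitting on a literal ' ' treats every single space as a separator, so a line with two consecutive spaces yields an empty string, and float('') raises ValueError. Passing None as the separator splits on runs of whitespace instead. A small sketch, using a hypothetical line with a doubled space:

```python
# Hypothetical line with two spaces between the first two fields.
line = '14  0.000000 -- (long term 0.000000)\n'

by_space = line.split(' ', 2)        # separator is exactly one space
by_whitespace = line.split(None, 2)  # separator is any run of whitespace

print(by_space)
print(by_whitespace)
```

If the input is guaranteed to use single spaces, both forms return the same first two fields.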

answered Aug 31, 2011 at 16:31


1 Comment

Jir Over a year ago

Thanks for the explanation of the second argument of split!
