
I have a list of arrays with variable length. I have something like this:

import numpy as np

a = [np.array([0, 3, 4]), np.array([1, 8]), np.array([2, 5, 7]), np.array([6])]

From every array that contains more than one value, I would like to extract all values but the first. It is quite straightforward to do this in a for-loop, but I would highly appreciate knowing how to do it without a for loop, to save time. My for-loop looks like this:

duplicate_pos = []
for i in range(len(a)):
    if len(a[i]) > 1:
        duplicate_pos.append(a[i][1:])

Thanks a lot.

PS: Even though this is the first question I have ever asked here, Stack Overflow has been my daily science companion since I started my PhD several years ago. Thanks to this amazing community.

  • Despite your love for stackoverflow, this question better suits codereview... Commented Apr 10, 2016 at 13:42
  • I agree with @Francesco Commented Apr 10, 2016 at 13:44
  • Why do you think that doing it without a for loop will save you time? Did you try to profile it? Commented Apr 10, 2016 at 14:13
  • How to do numpy tasks without a loop is a very common type of SO question. It belongs here. Commented Apr 10, 2016 at 15:15

5 Answers


You can use a combination of filter (to get rid of shorties) and map (to slice):

b = map(lambda li: li[1:], filter(lambda li: len(li) > 1, a))

# [array([3, 4]), array([8]), array([5, 7])]

In Python3, b is a map object which can be listified like any other iterable via list(b). In Python2, map returns a list.
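For example, in Python 3 (a quick sketch, reusing the question's list a):

b = map(lambda li: li[1:], filter(lambda li: len(li) > 1, a))
list(b)
# [array([3, 4]), array([8]), array([5, 7])]

Note that b is consumed after one pass; a second list(b) returns [].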


2 Comments

Combining filter or map with lambda is really discouraged and does not give any performance improvement; most probably just the opposite. Did you profile it against the OP's method?
No, I did not, and I do not think it does. It is, however, the only suggested way to avoid using the for keyword, which is kind of what the OP asked for ;) Under the hood this cannot be done without a loop anyway (I would strongly assume).

You can do this in one line as follows:

duplicate_pos = [i[1:] for i in a if len(i) > 1]

Comments


You can use a list comprehension:

duplicate_pos = [subarray[1:] for subarray in a if len(subarray)>1]

Or, if you are going to use the values only once, you could use a generator expression:

duplicate_pos = (subarray[1:] for subarray in a if len(subarray)>1)
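For example, consuming the generator once (a small sketch using the question's list a; a second pass would yield nothing, since generators are single-use):

gen = (subarray[1:] for subarray in a if len(subarray) > 1)
for tail in gen:
    print(tail)
# [3 4]
# [8]
# [5 7]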

Comments


In case you want to use pure numpy to solve this problem:

Numpy supports multidimensional arrays and has very fast reduce-like functions, but it requires multidimensional arrays to have a constant length in each dimension. So you could (though not necessarily should) use a masked array to solve this problem:

>>> a=[[0., 3, 4], [1, 8, np.nan], [2, 5, 7], [6, np.nan, np.nan]] # nan to fill the rows
>>> b = np.ma.masked_invalid(a)
>>> b
masked_array(
 data =
   [[0.0 3.0 4.0]
    [1.0 8.0 --]
    [2.0 5.0 7.0]
    [6.0 -- --]],
 mask =
   [[False False False]
    [False False  True]
    [False False False]
    [False  True  True]],
 fill_value = 1e+20)

To discard all rows containing fewer than 2 elements, use count (which counts unmasked values in this case) followed by boolean indexing:

>>> b[np.ma.count(b, axis=1) > 1][:,1:]
masked_array(
 data =
   [[3.0 4.0]
    [8.0 --]
    [5.0 7.0]],
 mask =
   [[False False]
    [False  True]
    [False False]],
 fill_value = 1e+20)

I've included the intermediate steps here:

>>> np.ma.count(b, axis=1)
array([3, 2, 3, 1], dtype=int64)
>>> np.ma.count(b, axis=1) > 1
array([ True,  True,  True, False], dtype=bool)
>>> b[np.ma.count(b, axis=1) > 1]
masked_array(
 data =
   [[0.0 3.0 4.0]
    [1.0 8.0 --]
    [2.0 5.0 7.0]],
 mask =
   [[False False False]
    [False False  True]
    [False False False]],
 fill_value = 1e+20)
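If you need the result back as a list of 1-D arrays, like in the question, you can compress each masked row (a sketch; note this reintroduces a Python-level loop, and the values are floats because of the nan padding):

>>> result = b[np.ma.count(b, axis=1) > 1][:,1:]
>>> [row.compressed() for row in result]
[array([ 3.,  4.]), array([ 8.]), array([ 5.,  7.])]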

2 Comments

Using nan turns the integer arrays into floats.
@hpaulj - This is just to illustrate how it could be done. It was just convenient to use np.ma.masked_invalid to create the masked array. In practice this would be rather done by some kind of preprocessing. The important point is just the b[np.ma.count(b, axis=1) > 1][:,1:]-line which replaces the for-loop.

Since the list contains numpy arrays, I suspect you are hoping to replace the loop with a numpy operation, not just another form of Python iteration. That can speed things up by moving the iteration to compiled code. For small arrays it isn't faster, because of numpy overhead.

In this case you are starting with a list, not a 2d array, and the list contains arrays of varying size. That's a good indicator that there isn't a pure numpy solution.

A cleaner version of your loop (no need to use an index):

def foo(a):
    b = []
    for i in a:
        if i.shape[0] > 1:   # use len(i) if i might be a list
            b.append(i[1:])
    return b

But this is expressed more nicely as a list comprehension:

[i[1:] for i in a if i.shape[0]>1]

In timeit tests, this is 50% faster than the for loop. But the test case is so small I wouldn't put too much stock in the time differences.
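If you want to reproduce the comparison, a minimal timeit sketch might look like this (the function name loop is just for illustration; absolute numbers will vary with list size and machine):

import timeit
import numpy as np

a = [np.array([0, 3, 4]), np.array([1, 8]), np.array([2, 5, 7]), np.array([6])]

def loop(a):
    b = []
    for i in a:
        if i.shape[0] > 1:
            b.append(i[1:])
    return b

print(timeit.timeit(lambda: loop(a), number=100000))                               # explicit loop
print(timeit.timeit(lambda: [i[1:] for i in a if i.shape[0] > 1], number=100000))  # comprehension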

I expect the other iterators - generators, maps, itertools - will time about the same. Others are welcome to elaborate on times.

i[1:] runs ok on a 1 (or 0) element array, so you might not need the if test. Or you could filter out empty arrays in another iteration. For small lists, the iteration choice is usually a matter of style, what expresses the task most clearly to the reader, rather than a matter of time.
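For example, dropping the if test and filtering the empty slices in a second pass (a sketch; x.size is the number of elements in each slice):

[x for x in (i[1:] for i in a) if x.size]
# [array([3, 4]), array([8]), array([5, 7])]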


If the subarrays were all the same length, or possibly padded with something like -1, you could combine them into a 2d array, and select from that

A = np.vstack(a)
A[:,1:]

But vstack iterates on the list, turning each sub array into a 2d array before applying concatenate. That alone makes it slower than the list solutions.
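For completeness, a hypothetical padding step using itertools.zip_longest (Python 3; this assumes -1 never occurs in your data):

from itertools import zip_longest

A = np.array(list(zip_longest(*a, fillvalue=-1))).T
# array([[ 0,  3,  4],
#        [ 1,  8, -1],
#        [ 2,  5,  7],
#        [ 6, -1, -1]])
A[:,1:]   # everything after the first column; -1 marks padding

Like vstack, this still iterates over the list in Python, so it won't beat the comprehensions either.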

1 Comment

Thanks. I accepted this answer because of the additional info provided.
