Numpy max slow when applied to list of arrays

Question

I carry out some computations to obtain a list of numpy arrays. Subsequently, I would like to find the largest values along the first axis. My current implementation (see below) is very slow and I would like to find alternatives.

Original

pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
dominant = np.max(matrix, axis = 0)

Revision 1: This implementation is faster (~10x; presumably because numpy does not need to figure out the shape of the array)

pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
matrix = np.vstack(matrix)
dominant = np.max(matrix, axis = 0)

I ran a couple of tests and the slowdown seems to be due to an internal conversion of the list of arrays to a numpy array

 Timer unit: 1e-06 s
 Total time: 1.21389 s
 Line # Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
 4                                           def direct_max(list_of_arrays):
 5      1000      1213886   1213.9    100.0      np.max(list_of_arrays, axis = 0)

 Total time: 1.20766 s
 Line # Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
 8                                           def numpy_max(list_of_arrays):
 9      1000      1151281   1151.3     95.3      list_of_arrays = np.array(list_of_arrays)
10      1000        56384     56.4      4.7      np.max(list_of_arrays, axis = 0)

Total time: 0.15437 s
Line # Hits         Time  Per Hit   % Time  Line Contents
==============================================================
12                                           @profile
13                                           def stack_max(list_of_arrays):
14      1000       102205    102.2     66.2      list_of_arrays = np.vstack(list_of_arrays)
15      1000        52165     52.2     33.8      np.max(list_of_arrays, axis = 0)

Is there any way to speed up the max function or is it possible to populate a numpy array efficiently with the results of my calculation such that max is fast?

The fastest way would be to start in the first place with a 2d numpy array instead of a list of arrays. If the lists have different lengths, just pad with -inf or nan. — Bitwise
– Bitwise, Commented Apr 10, 2013 at 18:16
@mgilson: The items themselves are key-value-pairs of the form (key: some hashable type, value: numpy array) — Till Hoffmann
– Till Hoffmann, Commented Apr 10, 2013 at 18:19
@Bitwise: Yes, that would be ideal. However, I need to process the items sequentially. What is the best way of doing that with a numpy array? — Till Hoffmann
– Till Hoffmann, Commented Apr 10, 2013 at 18:31

HYRY · Accepted Answer · 2013-04-12 01:47:47Z

6

You can use reduce(np.maximum, matrix), here is a test:

import numpy as np
np.random.seed(0)

N, M = 1000, 1000
matrix = [np.random.rand(N) for _ in xrange(M)]

%timeit np.max(matrix, axis = 0)
%timeit np.max(np.vstack(matrix), axis = 0)
%timeit reduce(np.maximum, matrix)

The result is:

10 loops, best of 3: 116 ms per loop
10 loops, best of 3: 10.6 ms per loop
100 loops, best of 3: 3.66 ms per loop

Edit

`argmax()' is more difficult, but you can use a for loop:

def argmax_list(matrix):
    m = matrix[0].copy()
    idx = np.zeros(len(m), dtype=np.int)
    for i, a in enumerate(matrix[1:], 1):
        mask = m < a
        m[mask] = a[mask]
        idx[mask] = i
    return idx

It's still faster than argmax():

%timeit np.argmax(matrix, axis=0)
%timeit np.argmax(np.vstack(matrix), axis=0)
%timeit argmax_list(matrix)

result:

10 loops, best of 3: 131 ms per loop
10 loops, best of 3: 21 ms per loop
100 loops, best of 3: 13.1 ms per loop

edited Apr 12, 2013 at 1:47

answered Apr 11, 2013 at 1:43

HYRY

97.8k28 gold badges197 silver badges192 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Till Hoffmann Over a year ago

That's great. One more question: Do you have a suggestion how to emulate the behaviour of np.argmax using the same methodology?

Collectives™ on Stack Overflow

Numpy max slow when applied to list of arrays

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related