4

I carry out some computations to obtain a list of numpy arrays. Subsequently, I would like to find the largest values along the first axis. My current implementation (see below) is very slow and I would like to find alternatives.

Original

pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
dominant = np.max(matrix, axis = 0)

Revision 1: This implementation is faster (~10x; presumably because numpy does not need to figure out the shape of the array)

pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
matrix = np.vstack(matrix)
dominant = np.max(matrix, axis = 0)

I ran a couple of tests and the slowdown seems to be due to an internal conversion of the list of arrays to a numpy array

 Timer unit: 1e-06 s
 Total time: 1.21389 s
 Line # Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
 4                                           def direct_max(list_of_arrays):
 5      1000      1213886   1213.9    100.0      np.max(list_of_arrays, axis = 0)

 Total time: 1.20766 s
 Line # Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
 8                                           def numpy_max(list_of_arrays):
 9      1000      1151281   1151.3     95.3      list_of_arrays = np.array(list_of_arrays)
10      1000        56384     56.4      4.7      np.max(list_of_arrays, axis = 0)

Total time: 0.15437 s
Line # Hits         Time  Per Hit   % Time  Line Contents
==============================================================
12                                           @profile
13                                           def stack_max(list_of_arrays):
14      1000       102205    102.2     66.2      list_of_arrays = np.vstack(list_of_arrays)
15      1000        52165     52.2     33.8      np.max(list_of_arrays, axis = 0)

Is there any way to speed up the max function or is it possible to populate a numpy array efficiently with the results of my calculation such that max is fast?

13
  • What datatype are items? Commented Apr 10, 2013 at 18:00
  • 12
    The fastest way would be to start in the first place with a 2d numpy array instead of a list of arrays. If the lists have different lengths, just pad with -inf or nan. Commented Apr 10, 2013 at 18:16
  • @mgilson: The items themselves are key-value-pairs of the form (key: some hashable type, value: numpy array) Commented Apr 10, 2013 at 18:19
  • @Bitwise: Yes, that would be ideal. However, I need to process the items sequentially. What is the best way of doing that with a numpy array? Commented Apr 10, 2013 at 18:31
  • Can you hint us what processing you are doing? Commented Apr 10, 2013 at 18:32

1 Answer 1

6

You can use reduce(np.maximum, matrix), here is a test:

import numpy as np
np.random.seed(0)

N, M = 1000, 1000
matrix = [np.random.rand(N) for _ in xrange(M)]

%timeit np.max(matrix, axis = 0)
%timeit np.max(np.vstack(matrix), axis = 0)
%timeit reduce(np.maximum, matrix)

The result is:

10 loops, best of 3: 116 ms per loop
10 loops, best of 3: 10.6 ms per loop
100 loops, best of 3: 3.66 ms per loop

Edit

`argmax()' is more difficult, but you can use a for loop:

def argmax_list(matrix):
    m = matrix[0].copy()
    idx = np.zeros(len(m), dtype=np.int)
    for i, a in enumerate(matrix[1:], 1):
        mask = m < a
        m[mask] = a[mask]
        idx[mask] = i
    return idx

It's still faster than argmax():

%timeit np.argmax(matrix, axis=0)
%timeit np.argmax(np.vstack(matrix), axis=0)
%timeit argmax_list(matrix)

result:

10 loops, best of 3: 131 ms per loop
10 loops, best of 3: 21 ms per loop
100 loops, best of 3: 13.1 ms per loop
Sign up to request clarification or add additional context in comments.

1 Comment

That's great. One more question: Do you have a suggestion how to emulate the behaviour of np.argmax using the same methodology?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.