I carry out some computations to obtain a list of numpy arrays. Subsequently, I would like to find the largest values along the first axis. My current implementation (see below) is very slow and I would like to find alternatives.
Original
import numpy as np

pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
dominant = np.max(matrix, axis=0)
Revision 1: This implementation is considerably faster (about 8x in the timings below), presumably because numpy does not need to infer the shape of the result from a list of arrays.
pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
matrix = np.vstack(matrix)
dominant = np.max(matrix, axis=0)
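For anyone who wants to reproduce this, here is a minimal self-contained sketch of both variants. The `compute` function, the filter condition, and the array sizes are placeholders made up for illustration; they are not my actual workload.

```python
import numpy as np

rng = np.random.default_rng(0)

def compute(item):
    # Hypothetical stand-in for the real computation: returns a 1-D array.
    return rng.random(1000) + item

pending = list(range(200))  # hypothetical items

# Keep only items passing a made-up condition (here: even numbers).
rows = [compute(item) for item in pending if item % 2 == 0]

# Original: np.max has to convert the list of arrays internally.
dominant_direct = np.max(rows, axis=0)

# Revision 1: stack explicitly into a 2-D array first, then reduce.
dominant_stacked = np.max(np.vstack(rows), axis=0)

# Both variants should produce the same element-wise maxima.
assert np.array_equal(dominant_direct, dominant_stacked)
```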
I ran a couple of tests, and the slowdown seems to be due to an internal conversion of the list of arrays to a numpy array:
Timer unit: 1e-06 s

Total time: 1.21389 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def direct_max(list_of_arrays):
     5      1000      1213886   1213.9    100.0      np.max(list_of_arrays, axis = 0)

Total time: 1.20766 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def numpy_max(list_of_arrays):
     9      1000      1151281   1151.3     95.3      list_of_arrays = np.array(list_of_arrays)
    10      1000        56384     56.4      4.7      np.max(list_of_arrays, axis = 0)

Total time: 0.15437 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                           @profile
    13                                           def stack_max(list_of_arrays):
    14      1000       102205    102.2     66.2      list_of_arrays = np.vstack(list_of_arrays)
    15      1000        52165     52.2     33.8      np.max(list_of_arrays, axis = 0)
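The same comparison can be run without line_profiler. The sketch below uses `timeit` from the standard library; the test data (100 arrays of length 1000) is an assumption for illustration, and the functions return their result here so the outputs can be compared. Absolute timings will differ by machine and numpy version.

```python
import timeit
import numpy as np

def direct_max(list_of_arrays):
    # np.max converts the list to an array internally.
    return np.max(list_of_arrays, axis=0)

def numpy_max(list_of_arrays):
    # Explicit conversion with np.array, then reduce.
    return np.max(np.array(list_of_arrays), axis=0)

def stack_max(list_of_arrays):
    # Explicit stacking with np.vstack, then reduce.
    return np.max(np.vstack(list_of_arrays), axis=0)

rng = np.random.default_rng(0)
# Hypothetical test data: 100 arrays of length 1000.
data = [rng.random(1000) for _ in range(100)]

timings = {}
for fn in (direct_max, numpy_max, stack_max):
    timings[fn.__name__] = timeit.timeit(lambda: fn(data), number=100)
    print(f"{fn.__name__}: {timings[fn.__name__]:.4f} s for 100 calls")
```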
Is there any way to speed up the max function or is it possible to populate a numpy array efficiently with the results of my calculation such that max is fast?
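One alternative I am considering is to skip building the matrix altogether and keep a running element-wise maximum as each result is produced. The sketch below assumes all results have the same shape and dtype; `running_max` is a helper name I made up, and I have not benchmarked it against the `np.vstack` version.

```python
import numpy as np

def running_max(arrays):
    """Element-wise maximum over an iterable of equally shaped arrays."""
    it = iter(arrays)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("need at least one array")
    # Copy so the caller's first array is not modified in place.
    result = np.array(first, copy=True)
    for arr in it:
        # Update the accumulator in place; no 2-D matrix is ever built.
        np.maximum(result, arr, out=result)
    return result

rng = np.random.default_rng(0)
arrays = [rng.random(1000) for _ in range(100)]  # hypothetical results
dominant = running_max(arrays)

# Should agree with the stack-then-reduce approach.
assert np.array_equal(dominant, np.max(np.vstack(arrays), axis=0))
```

Because `running_max` accepts any iterable, a generator expression such as `(compute(item) for item in pending if ...)` could be passed directly, so the intermediate list would not be needed either.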