
So let's say I have a 2D array. How can I apply a function to every single item in the array and replace that item with the function's return value? Also, the function returns a tuple, so the array will become 3D.

Here is the code I have in mind:

import numpy as np

def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])

# Apply filter_func to every element of myarray here (this is the step I'm asking about)

print(myarray)

# Should be array([[[5, 1, 4],
#                   [2, 1, 1]],
#                  [[1, 0, 1],
#                   [4, 4, 4]]])

Any ideas how I could do it? One way is np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3)), but that's quite slow, especially since I need to do it on an array of shape (1024, 1024).

I've also seen people use np.vectorize, but it ends up returning a tuple of three arrays, (array([[5, 2], [1, 4]]), array([[1, 1], [0, 4]]), array([[4, 1], [1, 4]])), which as an array has shape (3, 2, 2).

3 Answers


No need to change anything in your function.

Just apply the vectorized version of your function to your array and stack the result:

np.stack(np.vectorize(filter_func)(myarray), axis=2)

The result is:

array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])
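A small variation, sketched from NumPy's documented np.vectorize behaviour: passing otypes declares the dtypes of the three outputs up front (otherwise vectorize calls the function once on the first element to find them), and stacking along axis=-1 instead of axis=2 lets the same line handle 1-D, 2-D or higher-dimensional input. The names vfilter and apply_filter are only for illustration.

```python
import numpy as np


def filter_func(item):
    # Piecewise-constant mapping from the question.
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)


# otypes declares the dtype of each of the three outputs, so vectorize
# does not need a trial call on the first element to infer them.
vfilter = np.vectorize(filter_func, otypes=[int, int, int])


def apply_filter(arr):
    # vfilter returns a tuple of three arrays shaped like arr; stacking them
    # along the last axis gives an array of shape arr.shape + (3,).
    return np.stack(vfilter(arr), axis=-1)


myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
result = apply_filter(myarray)
print(result.shape)  # (2, 2, 3)
```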

3 Comments

The axis parameter specifies along which axis the arrays are stacked (see the NumPy documentation). Try my code with axis=1 and axis=0 to see the difference.
Thanks! That's just what I needed. It's about 8 times faster than my first try!
In my timings this vectorize is slower than your list(map...). I've always found vectorize to be slower than plain iteration.
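To see what the axis argument does, here is a minimal sketch that stacks the three (2, 2) arrays that np.vectorize(filter_func)(myarray) returns for the example (values copied from that output) along each possible axis; np.stack inserts the new length-3 axis at the given position.

```python
import numpy as np

# The tuple returned by np.vectorize(filter_func)(myarray) for the example.
parts = (
    np.array([[5, 2], [1, 4]]),
    np.array([[1, 1], [0, 4]]),
    np.array([[4, 1], [1, 4]]),
)

# np.stack adds a new axis of length 3 at position `axis`.
shapes = {axis: np.stack(parts, axis=axis).shape for axis in (0, 1, 2)}
for axis, shape in shapes.items():
    print(axis, shape)
# 0 (3, 2, 2)
# 1 (2, 3, 2)
# 2 (2, 2, 3)
```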

Your list-map:

In [4]: np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3))
Out[4]: 
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])
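The hard-coded (4,) and (2, 2, 3) only fit the 2x2 example. A minimal sketch of the same list-map that works for any input shape, assuming filter_func always returns exactly three values; apply_listmap is a hypothetical name:

```python
import numpy as np


def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)


def apply_listmap(arr):
    # ravel() flattens any shape to 1-D, so no element count is hard-coded.
    flat = np.array(list(map(filter_func, arr.ravel())))
    # Restore the input shape and add a trailing axis for the 3 outputs
    # (assumes filter_func always returns a 3-tuple).
    return flat.reshape(arr.shape + (3,))


myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
result = apply_listmap(myarray)
```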

A variation using nested list comprehension:

In [5]: np.array([[filter_func(j) for j in row] for row in myarray])
Out[5]: 
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])

With vectorize, the result is a tuple containing one array for each element of the tuple the function returns.

In [6]: np.vectorize(filter_func)(myarray)
Out[6]: 
(array([[5, 2],
        [1, 4]]),
 array([[1, 1],
        [0, 4]]),
 array([[4, 1],
        [1, 4]]))
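Another option, not used in the answers here, is the signature argument of np.vectorize, which lets each scalar input produce a length-n vector that becomes a trailing axis. This sketch rests on one assumption worth stating: vectorize treats a returned tuple as several separate outputs, so a small wrapper converts it with np.asarray first. vectorize is essentially a Python-level loop, so this is about convenience, not speed.

```python
import numpy as np


def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)


# "()->(n)": each scalar input maps to one output vector of length n.
# np.asarray turns the 3-tuple into a single (3,) array, because a plain
# tuple would be read as three separate outputs.
vfilter_sig = np.vectorize(lambda x: np.asarray(filter_func(x)),
                           signature="()->(n)")

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
result = vfilter_sig(myarray)
print(result.shape)  # (2, 2, 3)
```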

As @Vladi shows, these can be combined with stack (or with np.array followed by a transpose):

In [7]: np.stack(np.vectorize(filter_func)(myarray),2)
Out[7]: 
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])
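The np.array-plus-transpose route, as a minimal sketch using the three arrays from Out[6]: np.array puts the tuple on a new leading axis, and transpose(1, 2, 0) (or np.moveaxis(..., 0, -1)) moves that axis to the end.

```python
import numpy as np

# The tuple from Out[6].
parts = (
    np.array([[5, 2], [1, 4]]),
    np.array([[1, 1], [0, 4]]),
    np.array([[4, 1], [1, 4]]),
)

# np.array stacks along a new leading axis: shape (3, 2, 2).
leading = np.array(parts)

# Move the length-3 axis from the front to the back: shape (2, 2, 3).
via_transpose = leading.transpose(1, 2, 0)
via_moveaxis = np.moveaxis(leading, 0, -1)
```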

Your list-map is fastest. I've never found vectorize to be faster:

In [8]: timeit np.array(list(map(filter_func, myarray.reshape((4,))))).reshape((2, 2, 3))
17.2 µs ± 47.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: timeit np.array([[filter_func(j) for j in row] for row in myarray])
20.5 µs ± 78.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]: timeit np.stack(np.vectorize(filter_func)(myarray),2)
75.2 µs ± 297 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Taking the np.vectorize(filter_func) out of the timing loop helps just a bit.
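A benchmarking sketch for checking this on larger inputs such as the question's (1024, 1024) array, with the vectorized wrapper built once outside the timed call. The helper names and the uniform random test data in [-1, 4) are assumptions made for illustration; timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)


# Built once, so its construction cost is not part of each timing.
vfilter = np.vectorize(filter_func)


def listmap(arr):
    return np.array(list(map(filter_func, arr.ravel()))).reshape(arr.shape + (3,))


def vec_stack(arr):
    return np.stack(vfilter(arr), axis=-1)


def time_all(n=1024, number=1, seed=0):
    # Hypothetical helper: random values in [-1, 4) hit every branch of
    # filter_func. Returns average seconds per call for each method.
    rng = np.random.default_rng(seed)
    arr = rng.uniform(-1, 4, size=(n, n))
    results = {}
    for name, fn in [("list-map", listmap), ("vectorize+stack", vec_stack)]:
        results[name] = timeit.timeit(lambda fn=fn: fn(arr), number=number) / number
    return results
```

Calling time_all(1024) runs each method once on a 1024 x 1024 array and returns a dict of seconds per call.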

frompyfunc is similar to vectorize, but returns arrays of object dtype. It is usually faster:

In [29]: timeit np.stack(np.frompyfunc(filter_func, 1,3)(myarray),2).astype(int)
28.7 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
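A self-contained version of that frompyfunc line, as a sketch: np.frompyfunc(filter_func, 1, 3) declares one input and three outputs, so calling it returns a tuple of three object-dtype arrays, which are stacked and then converted to int.

```python
import numpy as np


def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)


# nin=1, nout=3: each returned 3-tuple is split across three output arrays.
ufilter = np.frompyfunc(filter_func, 1, 3)

myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
parts = ufilter(myarray)  # tuple of three (2, 2) object-dtype arrays
result = np.stack(parts, axis=-1).astype(int)
```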

Generally if you have a function that only takes scalar inputs, it's hard to do better than simple iteration. vectorize/frompyfunc don't improve on that. Optimal use of numpy requires rewriting the function to work directly with arrays, as @Hammad demonstrates.

With this small example, though, even this proper numpy solution isn't faster. I expect it to scale better:

In [32]: timeit func(myarray)
25 µs ± 60.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
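As an illustration of rewriting the function to work on whole arrays, here is a minimal sketch using a different technique from @Hammad's answer: np.digitize finds which interval [0, 1), [1, 2), [2, 3) each value falls into, and the resulting integer array indexes a lookup table. EDGES, TABLE and filter_array are illustrative names, and the table is copied by hand from filter_func, so it must be kept in sync with it.

```python
import numpy as np

# Interval edges used by filter_func.
EDGES = np.array([0.0, 1.0, 2.0, 3.0])

# Row k is the output for np.digitize index k:
#   0 -> x < 0, 1 -> [0, 1), 2 -> [1, 2), 3 -> [2, 3), 4 -> x >= 3
TABLE = np.array([
    [4, 4, 4],
    [1, 0, 1],
    [2, 1, 1],
    [5, 1, 4],
    [4, 4, 4],
])


def filter_array(arr):
    # digitize returns i with EDGES[i-1] <= x < EDGES[i] for each element;
    # indexing TABLE with that integer array appends a trailing axis of 3.
    return TABLE[np.digitize(arr, EDGES)]


myarray = np.array([[2.5, 1.3], [0.4, -1.0]])
result = filter_array(myarray)
```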

1 Comment

The list-map took 6.34 seconds on a 1024 by 1024 array, but the vectorize version only took 1.18 seconds. Maybe the list-map is better for smaller arrays.

You could use this function, which is a vectorized implementation:

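import numpy as np

def func(arr):
    # Lookup table: row k holds filter_func's output for values in [k, k + 1);
    # the last row is the "else" case.
    elements = np.array([
        [1, 0, 1],
        [2, 1, 1],
        [5, 1, 4],
        [4, 4, 4],
    ])

    # Use floor rather than astype(int) alone, which truncates toward zero
    # and would send values such as -0.5 to row 0 instead of the "else" row.
    idx = np.floor(arr).astype(int)
    # Anything outside 0, 1, 2 falls through to the last row ([4, 4, 4]).
    mask = (idx != 0) & (idx != 1) & (idx != 2)
    idx[mask] = -1

    # Indexing with an integer array of shape S returns shape S + (3,).
    return elements[idx]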

You can't write the result back into myarray in place, because the shapes don't match, but you can rebind the name myarray to the new array:

>>> myarray = func(myarray)
>>> myarray
array([[[5, 1, 4],
        [2, 1, 1]],

       [[1, 0, 1],
        [4, 4, 4]]])

1 Comment

Umm, how would I do it if I already had a function? I don't really understand what your code does.
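One way to connect the lookup-table idea back to an existing function, sketched under the assumption that the function is constant on each interval [k, k + 1) for k = 0, 1, 2 and returns a single fixed value everywhere else: call the function once per interval to build the table (this reproduces the rows of elements above), then index it with np.floor. make_table_func is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def filter_func(item):
    if 0 <= item < 1:
        return (1, 0, 1)
    elif 1 <= item < 2:
        return (2, 1, 1)
    elif 2 <= item < 3:
        return (5, 1, 4)
    else:
        return (4, 4, 4)


def make_table_func(f, n_intervals=3):
    # Hypothetical helper. Assumes f is constant on [k, k + 1) for
    # k = 0 .. n_intervals - 1 and returns one fixed value elsewhere.
    # Row k of the table is f(k); the last row is f(-1), the "else" value.
    table = np.array([f(k) for k in range(n_intervals)] + [f(-1)])

    def apply(arr):
        # floor maps each value to the k of its interval [k, k + 1).
        idx = np.floor(arr).astype(int)
        # Values outside 0 .. n_intervals - 1 use the last ("else") row.
        idx[(idx < 0) | (idx >= n_intervals)] = -1
        return table[idx]

    return apply


apply = make_table_func(filter_func)
result = apply(np.array([[2.5, 1.3], [0.4, -1.0]]))
```

The function stays the single source of truth, but the table is only correct if the constant-per-unit-interval assumption actually holds for it.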
