
I'm generating a few filters for implementing Fast Fourier Transform blur and sharpen operations on images. The filters are generated correctly; however, the computation takes far too long.

The way I'm currently generating the filters is by iterating over the dimensions of the desired filter, item by item. I understand that I need to use NumPy's vectorized operations to solve this, but I can't figure out how exactly. Here is my code for generating the Gaussian filter:

import numpy as np

def gaussian_filter(mode, size, cutoff):
    filterImage = np.zeros(size, np.float64)
    cutoffTerm = 2 * (cutoff ** 2)
    v = np.asarray([size[0] // 2, size[1] // 2])  # filter center

    # Visit every pixel individually and compute its Gaussian response
    for px in range(0, size[0]):
        for py in range(0, size[1]):
            u = np.asarray([px, py])
            Duv = np.linalg.norm(u - v)           # distance from the center
            distance = -1 * (Duv ** 2)
            result = pow(np.e, distance / cutoffTerm)
            if mode == 'low':
                filterImage[px, py] = result      # low-pass: keep center frequencies
            elif mode == 'high':
                filterImage[px, py] = 1 - result  # high-pass: complement of low-pass

    return filterImage
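
For context, a filter like this is typically applied in the frequency domain, along these lines (a hypothetical sketch; the apply_filter helper and the fftshift-centered convention are assumptions, chosen to match the filter being centered at size // 2):

import numpy as np

# Assumed usage: multiply the centered spectrum by the filter pointwise
def apply_filter(image, filt):
    F = np.fft.fftshift(np.fft.fft2(image))  # move the DC component to the center
    G = F * filt                             # pointwise filtering in frequency space
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))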

Generating a 1920 x 1080 filter takes 70.36 seconds, which is completely unacceptable. Any ideas would be much appreciated.

1 Answer


Here's a vectorized version leveraging NumPy broadcasting -

import numpy as np

def gaussian_filter_vectorized(mode, size, cutoff):
    cutoffTerm = 2 * (cutoff ** 2)
    v = np.asarray([size[0] // 2, size[1] // 2])  # filter center

    # Open grids of shapes (size[0], 1) and (1, size[1]); arithmetic on
    # them broadcasts to the full 2D grid with no explicit Python loops
    I, J = np.ogrid[:size[0], :size[1]]
    p, q = I - v[0], J - v[1]
    Dsq = p**2 + q**2                  # squared distance from the center, all pixels at once
    d = -1 * Dsq
    R = np.power(np.e, d / cutoffTerm)
    if mode == 'low':
        return R
    elif mode == 'high':
        return 1 - R
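
As a quick sanity check (a hypothetical snippet, assuming both functions above are in scope), the two implementations can be compared with np.allclose -

# Verify the broadcast version matches the loopy reference on both modes
size, cutoff = (64, 48), 24                            # arbitrary small test values
for mode in ('low', 'high'):
    loopy = gaussian_filter(mode, size, cutoff)
    fast = gaussian_filter_vectorized(mode, size, cutoff)
    assert np.allclose(loopy, fast), mode
print('outputs match for both modes')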

Timings on big sizes -

In [80]: N = 100
    ...: %timeit gaussian_filter(mode='low', size=(N,N), cutoff=N)
    ...: %timeit gaussian_filter_vectorized(mode='low', size=(N,N), cutoff=N)
10 loops, best of 3: 65.2 ms per loop
1000 loops, best of 3: 225 µs per loop

In [81]: N = 1000
    ...: %timeit gaussian_filter(mode='low', size=(N,N), cutoff=N)
    ...: %timeit gaussian_filter_vectorized(mode='low', size=(N,N), cutoff=N)
1 loop, best of 3: 6.5 s per loop
10 loops, best of 3: 29.8 ms per loop

200x+ speedups!
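
For intuition on where the speedup comes from: np.ogrid returns open grids whose arithmetic broadcasts to the full 2D result inside compiled loops, instead of building a tiny 2-element array per pixel in Python. A quick illustration -

In [1]: I, J = np.ogrid[:3, :4]

In [2]: I.shape, J.shape
Out[2]: ((3, 1), (1, 4))

In [3]: (I**2 + J**2).shape   # broadcasts to the full grid
Out[3]: (3, 4)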

Leverage numexpr on large-data computations for a further performance boost

When working with large data, we can also use the numexpr module, which supports multi-core processing when the intended operations can be expressed as arithmetic ones. For our case, we can replace the steps Dsq = p**2 + q**2 and R = np.power(np.e, d/cutoffTerm) with their numexpr equivalents, using the numexpr.evaluate function.

So, we would end up with something like this -

import numpy as np
import numexpr as ne

def gaussian_filter_vectorized_numexpr(mode, size, cutoff):
    cutoffTerm = 2 * (cutoff ** 2)

    # Same broadcastable open grids as before
    I, J = np.ogrid[:size[0], :size[1]]
    v0, v1 = size[0] // 2, size[1] // 2
    p, q = I - v0, J - v1
    E = np.e
    # numexpr compiles the whole expression and evaluates it multi-threaded
    if mode == 'low':
        return ne.evaluate('E**(-1*(p**2+q**2)/cutoffTerm)')
    elif mode == 'high':
        return ne.evaluate('1-E**(-1*(p**2+q**2)/cutoffTerm)')

Timings on a 1920x1080 sized image -

In [2]: M,N=1920,1080
   ...: %timeit gaussian_filter(mode='low', size=(M,N), cutoff=N)
   ...: %timeit gaussian_filter_vectorized(mode='low', size=(M,N), cutoff=N)
   ...: %timeit gaussian_filter_vectorized_numexpr(mode='low',size=(M,N),cutoff=N)
1 loop, best of 3: 13.9 s per loop
10 loops, best of 3: 63.3 ms per loop
100 loops, best of 3: 9.48 ms per loop

Close to 1500x speedup here!

This was with 8 threads, so with more threads available for compute it should improve further. The number of threads numexpr uses can be controlled explicitly, as sketched below.
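
For instance (a minimal sketch using numexpr's documented set_num_threads API) -

import numexpr as ne

# Use every detected core for numexpr evaluation
ne.set_num_threads(ne.detect_number_of_cores())

# ...or pin an explicit count to compare scaling, e.g. 4 threads
ne.set_num_threads(4)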
