
I'm generating a few filters for implementing Fast Fourier Transform blur and sharpen operations on images. The filters are generated correctly; however, the computation takes far too long.

The way I'm currently generating the filters is by iterating over the dimensions of the desired filter, item by item. I understand that I need to use NumPy's vectorized operations to solve this, but I can't figure out how exactly. Here is my code for generating the Gaussian filter:

import numpy as np

def gaussian_filter(mode, size, cutoff):
    filterImage = np.zeros(size, np.float64)
    cutoffTerm = 2 * (cutoff ** 2)
    v = np.asarray([size[0] // 2, size[1] // 2])  # filter center

    # Visit every pixel individually and compute its Gaussian response
    for px in range(0, size[0]):
        for py in range(0, size[1]):
            u = np.asarray([px, py])
            Duv = np.linalg.norm(u - v)           # distance from the center
            distance = -1 * (Duv ** 2)
            result = pow(np.e, distance / cutoffTerm)
            if mode == 'low':
                filterImage[px, py] = result      # low-pass: keep center frequencies
            elif mode == 'high':
                filterImage[px, py] = 1 - result  # high-pass: complement of low-pass

    return filterImage
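
For context, a filter like this is typically applied in the frequency domain, along these lines (a hypothetical sketch; the apply_filter helper and the fftshift-centered convention are assumptions, chosen to match the filter being centered at size // 2):

import numpy as np

# Assumed usage: multiply the centered spectrum by the filter pointwise
def apply_filter(image, filt):
    F = np.fft.fftshift(np.fft.fft2(image))  # move the DC component to the center
    G = F * filt                             # pointwise filtering in frequency space
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))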

Generating a 1920 x 1080 filter takes 70.36 seconds, which is completely unacceptable. Any ideas would be much appreciated.

1 Answer


Here's a vectorized version leveraging NumPy broadcasting -

import numpy as np

def gaussian_filter_vectorized(mode, size, cutoff):
    cutoffTerm = 2 * (cutoff ** 2)
    v = np.asarray([size[0] // 2, size[1] // 2])  # filter center

    # Open grids of shapes (size[0], 1) and (1, size[1]); arithmetic on
    # them broadcasts to the full 2D grid with no explicit Python loops
    I, J = np.ogrid[:size[0], :size[1]]
    p, q = I - v[0], J - v[1]
    Dsq = p**2 + q**2                  # squared distance from the center, all pixels at once
    d = -1 * Dsq
    R = np.power(np.e, d / cutoffTerm)
    if mode == 'low':
        return R
    elif mode == 'high':
        return 1 - R
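
As a quick sanity check (a hypothetical snippet, assuming both functions above are in scope), the two implementations can be compared with np.allclose -

# Verify the broadcast version matches the loopy reference on both modes
size, cutoff = (64, 48), 24                            # arbitrary small test values
for mode in ('low', 'high'):
    loopy = gaussian_filter(mode, size, cutoff)
    fast = gaussian_filter_vectorized(mode, size, cutoff)
    assert np.allclose(loopy, fast), mode
print('outputs match for both modes')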

Timings on big sizes -

In [80]: N = 100
    ...: %timeit gaussian_filter(mode='low', size=(N,N), cutoff=N)
    ...: %timeit gaussian_filter_vectorized(mode='low', size=(N,N), cutoff=N)
10 loops, best of 3: 65.2 ms per loop
1000 loops, best of 3: 225 µs per loop

In [81]: N = 1000
    ...: %timeit gaussian_filter(mode='low', size=(N,N), cutoff=N)
    ...: %timeit gaussian_filter_vectorized(mode='low', size=(N,N), cutoff=N)
1 loop, best of 3: 6.5 s per loop
10 loops, best of 3: 29.8 ms per loop

200x+ speedups!
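
For intuition on where the speedup comes from: np.ogrid returns open grids whose arithmetic broadcasts to the full 2D result inside compiled loops, instead of building a tiny 2-element array per pixel in Python. A quick illustration -

In [1]: I, J = np.ogrid[:3, :4]

In [2]: I.shape, J.shape
Out[2]: ((3, 1), (1, 4))

In [3]: (I**2 + J**2).shape   # broadcasts to the full grid
Out[3]: (3, 4)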

Leverage numexpr on large-data computations for a further performance boost

When working with large data, we can also use the numexpr module, which supports multi-core processing when the intended operations can be expressed as arithmetic ones. For our case, we can replace the steps Dsq = p**2 + q**2 and R = np.power(np.e, d/cutoffTerm) with their numexpr equivalents, using the numexpr.evaluate function.

So, we would end up with something like this -

import numpy as np
import numexpr as ne

def gaussian_filter_vectorized_numexpr(mode, size, cutoff):
    cutoffTerm = 2 * (cutoff ** 2)

    # Same broadcastable open grids as before
    I, J = np.ogrid[:size[0], :size[1]]
    v0, v1 = size[0] // 2, size[1] // 2
    p, q = I - v0, J - v1
    E = np.e
    # numexpr compiles the whole expression and evaluates it multi-threaded
    if mode == 'low':
        return ne.evaluate('E**(-1*(p**2+q**2)/cutoffTerm)')
    elif mode == 'high':
        return ne.evaluate('1-E**(-1*(p**2+q**2)/cutoffTerm)')

Timings on a 1920x1080 sized image -

In [2]: M,N=1920,1080
   ...: %timeit gaussian_filter(mode='low', size=(M,N), cutoff=N)
   ...: %timeit gaussian_filter_vectorized(mode='low', size=(M,N), cutoff=N)
   ...: %timeit gaussian_filter_vectorized_numexpr(mode='low',size=(M,N),cutoff=N)
1 loop, best of 3: 13.9 s per loop
10 loops, best of 3: 63.3 ms per loop
100 loops, best of 3: 9.48 ms per loop

Close to 1500x speedup here!

This was with 8 threads, so with more threads available for compute it should improve further. The number of threads numexpr uses can be controlled explicitly, as sketched below.
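
For instance (a minimal sketch using numexpr's documented set_num_threads API) -

import numexpr as ne

# Use every detected core for numexpr evaluation
ne.set_num_threads(ne.detect_number_of_cores())

# ...or pin an explicit count to compare scaling, e.g. 4 threads
ne.set_num_threads(4)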
