0

I have some memory intensive image filters that I want to call block by block on large images/arrays (because they compute the filter for an entire array, they run out of memory when trying to compute the whole array).

def block_process(Ic, blocksize):
    B = numpy.empty(Ic.shape)

    colstart = 0
    while colstart < Ic.shape[1]:
        BlockWidth = blocksize
        if (colstart + blocksize) > Ic.shape[1]:
            BlockWidth = Ic.shape[1] - colstart
        rowstart = 0
        while rowstart < Ic.shape[0]:
            BlockHeight = blocksize
            if (rowstart + blocksize) > Ic.shape[0]:
                BlockHeight = Ic.shape[0] - rowstart

            B[colstart:colstart+BlockWidth, rowstart:rowstart+BlockHeight] = filter1(params) # One of many available filters

            rowstart += BlockHeight
        colstart += BlockWidth
    return B # The complete filtered array

My filters are computed in other functions i.e. def filter1(A, filtsize), def filter2(A, filtsize, otherparam), which have an A parameter (the input array, given by the block function), and other parameters such as filter size. Some filters have more parameters than others. They return the filtered array.

Two questions

  1. How do I go about calling one of my filter functions through the block_process function? I don't want to copy the block processing code into each function. In other words, is there a way of specifying which filter to call (and with what parameters) as a parameter of the block_process() call?
  2. Is there a better way of coding this?

1 Answer 1

2

You can do it like this:

def block_process(a, blocksize, filt, args):
    b = numpy.empty(a.shape)
    for row in xrange(0, a.shape[0], blocksize):
        for col in xrange(0, a.shape[1], blocksize):
            b[row:row + blocksize, col:col + blocksize] = (
                filt(a[row:row + blocksize, col:col + blocksize], *args))
    return b

There is no need to correct for incomplete blocks at the right and lower edge of the image -- this will happen automatically. You can simply pass in the filter function and the tuple of arguments. To call the filter filter1(a, filtsize), use

block_process(a, blocksize, filter1, (filtsize,))

The code above assumes that the first parameter to a filter is the array to be filtered and that a filter returns a filtered array of the same shape.

It is also possible that your filters can be rewritten in a way that they don't use as much memory, so the blockprocessing would become unnecessary.

Sign up to request clarification or add additional context in comments.

7 Comments

Thanks, this is great. My filters are running on large (5000x5000) arrays and require keeping 3-4 additional arrays of the same size in memory, which is why they are running out of memory but that is for another question...
@Benjamin: often the number of additional arrays needed can be reduced -- this may indeed be for another question :)
@Benjamin - On a side note, another advantage to doing things in chunks is that it makes easy to spread the calculation across multiple processors using multiprocessing. So, even if the memory problems are fixed later, it can be useful to chunk things like this.
Simple threading often works as well, since most of the compuationally intensive NumPy stuff runs outside the GIL.
@Joe Kington: I posted my filter stackoverflow.com/questions/4959171/… if you are interested.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.