I am reading from a file containing some segments (irregular parcels of the image) and trying to average the entire segment to have one pixel value. This is the code I use:
    band = band[:, :, 0]  # take the first band of the image
    for i in range(numSegments):  # for every segment
        tx = band[segments == i]  # select all the pixels in the segment
        avg = np.average(tx)  # average their values
        band[segments == i] = avg  # write the average back into the image
I am omitting some transformation steps and the timing code from this snippet.
This takes quite some time to run even for one band -- almost 1000 seconds. Is there a way to vectorize this operation to make it faster?
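For reference, the per-segment loop can be vectorized with `np.bincount`. This is a sketch on small hypothetical stand-in arrays, assuming the segment labels are integers 0..numSegments-1 with every label present:

```python
import numpy as np

# Hypothetical stand-in data for the real 3000x3000 band and segment map.
lines, samples, numSegments = 4, 5, 3
rng = np.random.default_rng(0)
band = rng.random((lines, samples)).astype(np.float32)
segments = (np.arange(lines * samples) % numSegments).reshape(lines, samples)

# Per-segment sums and pixel counts, one pass over the image each.
labels = segments.ravel()
sums = np.bincount(labels, weights=band.ravel(), minlength=numSegments)
counts = np.bincount(labels, minlength=numSegments)
means = (sums / counts).astype(np.float32)

# Broadcast each segment's mean back onto its pixels.
band_avg = means[labels].reshape(lines, samples)
```

This replaces the `numSegments` full-image boolean scans of the original loop (two per iteration, one for the read and one for the write-back) with a fixed number of passes over the image.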
Data:
Segments2009: an image giving the segment label of every pixel.
This is what the segments look like:
Bands: 3000x3000 pixels, float32 values.
Full context:
workFolder = '/home/shaunak/Work/ChangeDet_2016/SLIC/003_lee_m0_alpha'
bandlist=os.path.join(workFolder,'bandlist.txt')
configfile = os.path.join(workFolder,'config.txt')
segmentfile = os.path.join(workFolder,'Segments2009')
#%% Load the bands -- can refer to subfolders in the bandlist
files = utilities.readBandList(bandlist)
destinations = []
    for f in files:
        destinations.append(f.split('.')[0] + "_SP." + f.split('.')[1])
(lines,samples,bands) = utilities.readConfigImSizeBand(configfile)
#%% Superpixel file
segments = np.fromfile(segmentfile,dtype='float32')
segments = np.reshape(segments,(lines,samples))
numSegments = int(np.max(segments))
#%% simple avg
    for idx, f in enumerate(files):
        band = np.fromfile(f, dtype='float32').reshape((lines, samples))
        start = time.time()
        for i in range(numSegments):
            tx = band[segments == i]
            avg = np.average(tx)
            band[segments == i] = avg
        band.tofile(destinations[idx])
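The inner loop above can also be replaced by `scipy.ndimage.mean`, which computes all segment means in a single pass. A sketch on stand-in arrays (the real code would load `band` and `segments` from file as above):

```python
import numpy as np
from scipy import ndimage

# Stand-in arrays for one band and the segment map.
lines, samples, numSegments = 4, 5, 3
rng = np.random.default_rng(0)
band = rng.random((lines, samples)).astype(np.float32)
segments = (np.arange(lines * samples) % numSegments).reshape(lines, samples)

# ndimage.mean wants integer labels; the segment file is read as float32,
# so cast once instead of comparing floats numSegments times.
labels = segments.astype(np.int64)
means = ndimage.mean(band, labels=labels, index=np.arange(numSegments))

# Paint each segment's mean back into the image before writing it out.
band_avg = np.asarray(means, dtype=np.float32)[labels]
```

One detail worth double-checking: with `numSegments = int(np.max(segments))`, `range(numSegments)` never visits the highest label, so whether the original loop covers every segment depends on the labelling convention (0-based vs 1-based).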
I write the averaged values back into the original band. This is not necessary, and it is not the most expensive part, but it helps me visualize the results, so I kept it in. I also tried the following approach:
    avgOut = np.zeros((numSegments, bands), dtype='float32')
    #avgOutJoined = np.zeros((lines, samples, bands), dtype='float32')
    for i in range(numSegments):
        tx = band[segments == i]
        avgOut[i, :] = np.average(tx, axis=0)
        # avgOutJoined[segments == i, :] = np.average(tx, axis=0)
    avgOut.tofile(outputSeperated)
    #avgOutJoined.tofile(outputJoined)
Since skipping the write-back did not save much time, I kept it in.
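The per-segment gather into `avgOut` vectorizes the same way: one `np.bincount` call per band instead of one boolean mask per segment. A sketch on stand-in data, again assuming integer labels 0..numSegments-1 that are all present:

```python
import numpy as np

# Stand-in for the real (lines, samples, bands) stack and segment map.
lines, samples, bands, numSegments = 4, 5, 3, 4
rng = np.random.default_rng(0)
cube = rng.random((lines, samples, bands)).astype(np.float32)
segments = (np.arange(lines * samples) % numSegments).reshape(lines, samples)

labels = segments.ravel()
counts = np.bincount(labels, minlength=numSegments)

# One pass per band; avgOut[i, b] is the mean of band b over segment i.
avgOut = np.empty((numSegments, bands), dtype=np.float32)
for b in range(bands):
    sums = np.bincount(labels, weights=cube[:, :, b].ravel(),
                       minlength=numSegments)
    avgOut[:, b] = sums / counts
```

The remaining Python loop runs once per band (a handful of iterations), not once per segment (thousands), so the cost is dominated by the C-level `bincount` passes.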
