I am reading from a file containing some segments (irregular parcels of the image) and trying to average the entire segment to have one pixel value. This is the code I use:
    band = band[:, :, 0]  # take the first band of the image
    for i in range(numSegments):  # for every segment
        tx = band[segments == i]  # select all the pixels in the segment
        avg = np.average(tx)  # average their values
        band[segments == i] = avg  # write the average back into the image
I am omitting some transformation steps and the timing code from this snippet.
This takes quite some time to run even for one band -- almost 1000 seconds. Is there a way to vectorize this operation to make it faster?
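For reference, the per-segment loop can be vectorized with `np.bincount`. This is a sketch on small hypothetical stand-in arrays, assuming the segment labels are integers 0..numSegments-1 with every label present:

```python
import numpy as np

# Hypothetical stand-in data for the real 3000x3000 band and segment map.
lines, samples, numSegments = 4, 5, 3
rng = np.random.default_rng(0)
band = rng.random((lines, samples)).astype(np.float32)
segments = (np.arange(lines * samples) % numSegments).reshape(lines, samples)

# Per-segment sums and pixel counts, one pass over the image each.
labels = segments.ravel()
sums = np.bincount(labels, weights=band.ravel(), minlength=numSegments)
counts = np.bincount(labels, minlength=numSegments)
means = (sums / counts).astype(np.float32)

# Broadcast each segment's mean back onto its pixels.
band_avg = means[labels].reshape(lines, samples)
```

This replaces the `numSegments` full-image boolean scans of the original loop (two per iteration, one for the read and one for the write-back) with a fixed number of passes over the image.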
Data:
Segments2009: an image giving the segment label of every pixel.
This is what the segments look like:
Bands: 3000x3000 pixels, float32 values.
Full context:
workFolder = '/home/shaunak/Work/ChangeDet_2016/SLIC/003_lee_m0_alpha'
bandlist=os.path.join(workFolder,'bandlist.txt')
configfile = os.path.join(workFolder,'config.txt')
segmentfile = os.path.join(workFolder,'Segments2009')
#%% Load the bands -- can refer to subfolders in the bandlist
files = utilities.readBandList(bandlist)
destinations = []
    for f in files:
        destinations.append(f.split('.')[0] + "_SP." + f.split('.')[1])
(lines,samples,bands) = utilities.readConfigImSizeBand(configfile)
#%% Superpixel file
segments = np.fromfile(segmentfile,dtype='float32')
segments = np.reshape(segments,(lines,samples))
numSegments = int(np.max(segments))
#%% simple avg
    for idx, f in enumerate(files):
        band = np.fromfile(f, dtype='float32').reshape((lines, samples))
        start = time.time()
        for i in range(numSegments):
            tx = band[segments == i]
            avg = np.average(tx)
            band[segments == i] = avg
        band.tofile(destinations[idx])
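The inner loop above can also be replaced by `scipy.ndimage.mean`, which computes all segment means in a single pass. A sketch on stand-in arrays (the real code would load `band` and `segments` from file as above):

```python
import numpy as np
from scipy import ndimage

# Stand-in arrays for one band and the segment map.
lines, samples, numSegments = 4, 5, 3
rng = np.random.default_rng(0)
band = rng.random((lines, samples)).astype(np.float32)
segments = (np.arange(lines * samples) % numSegments).reshape(lines, samples)

# ndimage.mean wants integer labels; the segment file is read as float32,
# so cast once instead of comparing floats numSegments times.
labels = segments.astype(np.int64)
means = ndimage.mean(band, labels=labels, index=np.arange(numSegments))

# Paint each segment's mean back into the image before writing it out.
band_avg = np.asarray(means, dtype=np.float32)[labels]
```

One detail worth double-checking: with `numSegments = int(np.max(segments))`, `range(numSegments)` never visits the highest label, so whether the original loop covers every segment depends on the labelling convention (0-based vs 1-based).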
I write the averaged values back into the original band. This is not necessary, and it is not the most expensive part, but it helps me visualize the results, so I kept it in. I also tried the following approach:
    avgOut = np.zeros((numSegments, bands), dtype='float32')
    #avgOutJoined = np.zeros((lines, samples, bands), dtype='float32')
    for i in range(numSegments):
        tx = band[segments == i]
        avgOut[i, :] = np.average(tx, axis=0)
        # avgOutJoined[segments == i, :] = np.average(tx, axis=0)
    avgOut.tofile(outputSeperated)
    #avgOutJoined.tofile(outputJoined)
Since skipping the write-back did not save much time, I kept it in.
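The per-segment gather into `avgOut` vectorizes the same way: one `np.bincount` call per band instead of one boolean mask per segment. A sketch on stand-in data, again assuming integer labels 0..numSegments-1 that are all present:

```python
import numpy as np

# Stand-in for the real (lines, samples, bands) stack and segment map.
lines, samples, bands, numSegments = 4, 5, 3, 4
rng = np.random.default_rng(0)
cube = rng.random((lines, samples, bands)).astype(np.float32)
segments = (np.arange(lines * samples) % numSegments).reshape(lines, samples)

labels = segments.ravel()
counts = np.bincount(labels, minlength=numSegments)

# One pass per band; avgOut[i, b] is the mean of band b over segment i.
avgOut = np.empty((numSegments, bands), dtype=np.float32)
for b in range(bands):
    sums = np.bincount(labels, weights=cube[:, :, b].ravel(),
                       minlength=numSegments)
    avgOut[:, b] = sums / counts
```

The remaining Python loop runs once per band (a handful of iterations), not once per segment (thousands), so the cost is dominated by the C-level `bincount` passes.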
