vectorize a loop using numpy

Question

I am working on an image file and need help converting the following loop into vectorized form:

for i in range(height):
  for j in range(width):
    print(j,i)
    start1[i, j, 0] = -0.5 + j / (width - 1)
    start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    start1[i, j, 2] = 0

yatu · Accepted Answer · 2020-10-30 11:00:21Z

You can leverage NumPy's broadcasting to avoid looping here:

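import numpy as np

# example image
from sklearn.datasets import load_sample_image
china = load_sample_image('china.jpg')

china.shape
# (427, 640, 3)

height, width, _ = china.shape

# china is uint8, so a copy of it would truncate the float coordinates;
# use a float array of the same shape instead
vec = np.zeros((height, width, 3), dtype=np.float64)
i = np.arange(height)
j = np.arange(width)
vec[..., 0] = -0.5 + j / (width - 1)                               # j broadcasts over rows
vec[..., 1] = (-0.5 + i[:, None] / (height - 1)) * height / width  # i[:, None] broadcasts over columns
vec[..., 2] = 0

Checking that we get the same result:

op = np.zeros((height, width, 3), dtype=np.float64)
# r, c instead of i, j so the arrays i and j stay intact for the timing below
for r in range(height):
  for c in range(width):
    op[r, c, 0] = -0.5 + c / (width - 1)
    op[r, c, 1] = (-0.5 + r / (height - 1)) * height / width
    op[r, c, 2] = 0

np.array_equal(op, vec)
# True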
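A stripped-down sketch of the same broadcasting idea, on a made-up 3 x 5 grid so no sample image (and no sklearn) is needed. It prints the shape each intermediate has before NumPy stretches it to (height, width); the names x and y are illustrative only.

```python
import numpy as np

# Small illustrative size (an assumption, not the image's real size)
height, width = 3, 5

j = np.arange(width)            # shape (5,): one value per column
i = np.arange(height)[:, None]  # shape (3, 1): one value per row

x = -0.5 + j / (width - 1)                      # shape (5,)
y = (-0.5 + i / (height - 1)) * height / width  # shape (3, 1)

start1 = np.zeros((height, width, 3))
start1[..., 0] = x  # (5,) is repeated down every row
start1[..., 1] = y  # (3, 1) is repeated across every column

print(x.shape, y.shape, start1.shape)  # (5,) (3, 1) (3, 5, 3)
```

The trailing [:, None] is what turns a row of values into a column, so that it combines with the (width,) array to fill the full 2-D grid.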

Timing comparison (roughly a 558x speedup):

%%timeit
for i in range(height):
  for j in range(width):
    op[i, j, 0] = -0.5 + j / (width - 1)
    op[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    op[i, j, 2] = 0
# 365 ms ± 27.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
vec[..., 0] = -0.5 + j / (width - 1)
vec[..., 1] = (-0.5 + i[:,None] / (height - 1)) * height / width
vec[..., 2] = 0
# 654 µs ± 33.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
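%%timeit is an IPython/Jupyter cell magic. Outside a notebook, a rough equivalent is the standard-library timeit module. The sketch below assumes a smaller, made-up array size so it finishes quickly; its timings will differ from the numbers above, and loop_version / broadcast_version are hypothetical names.

```python
import timeit

import numpy as np

# Illustrative size (an assumption), smaller than the 427 x 640 sample image
height, width = 100, 150


def loop_version():
    out = np.zeros((height, width, 3))
    for i in range(height):
        for j in range(width):
            out[i, j, 0] = -0.5 + j / (width - 1)
            out[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    return out


def broadcast_version():
    out = np.zeros((height, width, 3))
    i = np.arange(height)
    j = np.arange(width)
    out[..., 0] = -0.5 + j / (width - 1)
    out[..., 1] = (-0.5 + i[:, None] / (height - 1)) * height / width
    return out


# Only compare timings of versions that produce the same array
assert np.allclose(loop_version(), broadcast_version())

# Best of 5 repeats, each calling the function 3 times; convert to seconds per call
t_loop = min(timeit.repeat(loop_version, number=3, repeat=5)) / 3
t_vec = min(timeit.repeat(broadcast_version, number=3, repeat=5)) / 3
print(f"loop: {t_loop * 1e3:.2f} ms per call, "
      f"broadcast: {t_vec * 1e3:.3f} ms per call, "
      f"speedup about {t_loop / t_vec:.0f}x")
```

Taking the minimum of the repeats, rather than the mean, reduces noise from other processes on the machine.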

Arty · Accepted Answer · 2020-10-30 11:38:53Z

Below I provide two solutions: one using only NumPy, and another using NumPy plus Numba. Both packages can be installed with python -m pip install numpy numba.


import numpy as np

# -------- Version 1, vectorized ----------

height, width = 2, 4
start1 = np.zeros((height, width, 3), dtype = np.float64)
start1[:, :, 0] = -0.5 + np.arange(width)[None, :] / (width - 1)
start1[:, :, 1] = (-0.5 + np.arange(height)[:, None] / (height - 1)) * height / width
print(start1)

# -------- Version 2, Numba-compiled loop ----------

import numba

@numba.njit(cache = True)
def compute_start1(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float64)
    for i in range(height):
        for j in range(width):
            start1[i, j, 0] = -0.5 + j / (width - 1)
            start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
            start1[i, j, 2] = 0
    return start1

print(compute_start1(2, 4))

Output:

[[[-0.5        -0.25        0.        ]
  [-0.16666667 -0.25        0.        ]
  [ 0.16666667 -0.25        0.        ]
  [ 0.5        -0.25        0.        ]]

 [[-0.5         0.25        0.        ]
  [-0.16666667  0.25        0.        ]
  [ 0.16666667  0.25        0.        ]
  [ 0.5         0.25        0.        ]]]


[[[-0.5        -0.25        0.        ]
  [-0.16666667 -0.25        0.        ]
  [ 0.16666667 -0.25        0.        ]
  [ 0.5        -0.25        0.        ]]

 [[-0.5         0.25        0.        ]
  [-0.16666667  0.25        0.        ]
  [ 0.16666667  0.25        0.        ]
  [ 0.5         0.25        0.        ]]]
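Because j / (width - 1) runs from 0 to 1 in equal steps, the first channel is just width evenly spaced values from -0.5 to 0.5, and the second is height evenly spaced values from -0.5 to 0.5 scaled by height / width. A sketch of the same grid built with np.linspace and np.meshgrid (an alternative technique, not part of the answer's code; compute_start1_linspace is a hypothetical name). linspace may differ from the original formula in the last bits, so the comparison uses np.allclose rather than exact equality.

```python
import numpy as np


def compute_start1_linspace(height, width):
    # Hypothetical helper: same grid as Version 1, built from evenly spaced axes
    xs = np.linspace(-0.5, 0.5, width)                     # column coordinate
    ys = np.linspace(-0.5, 0.5, height) * height / width   # row coordinate
    xx, yy = np.meshgrid(xs, ys)                           # both shape (height, width)
    return np.stack([xx, yy, np.zeros_like(xx)], axis=-1)  # shape (height, width, 3)


height, width = 2, 4
reference = np.zeros((height, width, 3))
reference[:, :, 0] = -0.5 + np.arange(width)[None, :] / (width - 1)
reference[:, :, 1] = (-0.5 + np.arange(height)[:, None] / (height - 1)) * height / width

print(np.allclose(compute_start1_linspace(height, width), reference))  # True
```

With the default indexing='xy', np.meshgrid returns arrays of shape (len(ys), len(xs)), which matches the (height, width) layout of start1.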

I included the Numba solution because Numba is a nice package: it lets you produce very fast code from regular Python code, with no need to think about how to express your algorithm as NumPy functions. Many fairly simple loop-heavy functions can be sped up by Numba, often by something like 50-200x.

Numba is a just-in-time (JIT) compiler that translates Python functions, via LLVM, into fast machine code. For numeric loops the speedup over plain Python can be around 100x, though it depends heavily on the workload, and the result is often as fast as NumPy and sometimes faster. Numba is also closely tied to NumPy: it supports a large subset of NumPy functions inside compiled code, and it can parallelize loops if you pass the parallel = True argument to the njit decorator; see the jit documentation for reference.

In many cases, to speed up your code this way you just need to add the @numba.njit line before your function. Of course, Numba cannot compile arbitrary Python code to fast machine code, but most fairly simple algorithms involving many loops, conditions, and numerical and/or NumPy operations can be compiled and accelerated by Numba.
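The parallel = True option mentioned above pairs with numba.prange, which marks a loop whose iterations Numba may split across threads. A minimal sketch applied to the Version 2 function; compute_start1_parallel is a hypothetical name, and the try/except fallback is an addition (not part of Numba) that only exists so the snippet still runs where Numba is not installed. For an array this small, threading overhead would likely outweigh any gain.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without Numba (plain Python, no speedup)
    prange = range

    def njit(*args, **kwargs):
        # No-op stand-in supporting both @njit and @njit(...)
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit(parallel=True)
def compute_start1_parallel(height, width):
    start1 = np.zeros((height, width, 3), dtype=np.float64)
    # prange lets Numba distribute the rows across threads; each row is
    # written by exactly one iteration, so there is no data race
    for i in prange(height):
        for j in range(width):
            start1[i, j, 0] = -0.5 + j / (width - 1)
            start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    return start1


print(compute_start1_parallel(2, 4)[..., :2])
```

Only the outer loop uses prange; keeping the inner loop as a plain range gives each thread a contiguous block of rows to fill.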

Below is timing code for all three solutions (the plain loop, the NumPy-vectorized version and the Numba version). It needs a one-time install of the modules: python -m pip install numpy numba timerit.


import numpy as np

# -------- Version 0, non-vectorized ----------

def f0(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float32)
    for i in range(height):
        for j in range(width):
            start1[i, j, 0] = -0.5 + j / (width - 1)
            start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    return start1

# -------- Version 1, vectorized ----------

def f1(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float32)
    start1[:, :, 0] = -0.5 + np.arange(width)[None, :] / (width - 1)
    start1[:, :, 1] = (-0.5 + np.arange(height)[:, None] / (height - 1)) * height / width
    return start1

# -------- Version 2, Numba-compiled loop ----------

import numba

@numba.njit(cache = True, fastmath = True)
def f2(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float32)
    for i in range(height):
        for j in range(width):
            start1[i, j, 0] = -0.5 + j / (width - 1)
            start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    return start1

# -------- Time measuring ----------

from timerit import Timerit
Timerit._default_asciimode = True

h, w = 256, 512
f2(h, w)  # warm-up call: compile (or load) f2 once so JIT time is not measured
ra, rt = None, None
for f in [f0, f1, f2]:
    print(f'{f.__name__}: ', end = '', flush = True)
    tim = Timerit(num = 15, verbose = 1)
    for t in tim:
        a = f(h, w)
    if ra is None:
        ra, rt = a, tim.mean()
    else:
        t = tim.mean()
        assert np.allclose(a, ra)
        print(f'speedup {round(rt / t, 3)}x')

Output:

f0: Timed best=159.324 ms, mean=159.855 +- 0.4 ms
f1: Timed best=1.212 ms, mean=1.257 +- 0.0 ms
speedup 127.178x
f2: Timed best=1.294 ms, mean=1.310 +- 0.0 ms
speedup 122.065x

Thanks for the solution. Can you provide some reference material about vectorization?
@KetanChaudhari I just added time measurements for my solutions; both give almost the same speedup of roughly 125x! I didn't use any reference material for my code, just my own knowledge. I don't really know what to suggest; I gained all of it just by Googling.
@KetanChaudhari Regarding Numba, I can tell you this: very often you just add the @numba.njit line before your function and you're done, and you may get something like a 100x speedup. See the last timing code in my answer and look at f0 and f2: they have exactly the same function body, except that f2 has the @numba.njit line added before it, and f2 runs about 122x faster than f0 just because of that one line. You can read about the jit/njit decorator in this documentation.
