The problem is illustrated by the following script, which works correctly if MKL is used for linear algebra operations:
from numba import njit, prange
from numpy import random, dot, empty
from threadpoolctl import ThreadpoolController

controller = ThreadpoolController()
numba_parallel = True  # or False

@njit(parallel=False)
def internal_dot(a, b):
    return dot(a, b)

@njit(parallel=numba_parallel)
def total_sum(b, c):
    npoints = c.shape[0]
    output = empty((npoints, c.shape[1], b.shape[1]))
    for i in prange(npoints):
        output[i] = internal_dot(c[i], b)
    return output

@controller.wrap(limits=1, user_api='blas')
def safe_total_sum(b, c):
    return total_sum(b, c)

nvecs = 256
dim1 = 256
dim2 = 256
vector = random.random((dim1, dim2))
matrix = random.random((nvecs, dim2, dim1))
_ = total_sum(vector, matrix)
_ = safe_total_sum(vector, matrix)
However, running it with OpenBLAS produces the warning OpenBLAS warning: precompiled NUM_THREADS exceeded, adding auxiliary array for thread metadata, which indicates an oversubscription problem that, in my (much lengthier) use case, crashes the code. I am aware that setting export OPENBLAS_NUM_THREADS=1 solves the issue for this script, but it is not applicable in my use case, since my code calls other NumPy functions elsewhere and needs them parallelized. Using ThreadpoolController does not seem to help. I am also aware of the option of packing everything into a .pkl file and unpacking it in a subprocess with OPENBLAS_NUM_THREADS=1 set, but I'd really prefer to avoid this dirty trick.
Is there a proper Python solution for this problem?
Comments:

OMP_MAX_ACTIVE_LEVELS or OMP_NESTED, and possibly others). That being said, Numba can use a different parallel backend, not just OpenMP. It might use Intel TBB on a machine with an Intel environment set up. IDK if TBB has a similar feature. It would be great to use OpenMP everywhere to avoid mixing parallel libraries/frameworks/APIs (IDK what MKL uses, possibly Intel TBB, or whether you can force Numba to use it either).

You can call omp_set_num_threads and reset it later. One possible downside is that the threads might be re-created for each parallel section, which can be quite expensive for small workloads, but this cannot be avoided with some OpenMP implementations if you change the number of threads dynamically. Note that the above function is a C one from the OpenMP runtime. It can be called from Python using ctypes, cffi, or Cython.

Set NUMBA_THREADING_LAYER='omp' to force OMP. numba.pydata.org/numba-doc/dev/user/…

OMP_MAX_ACTIVE_LEVELS and OMP_NESTED did not work, but I realized that setting NUMBA_NUM_THREADS to the number of CPUs does help, though I am not sure why (I didn't even notice previously that NUMBA_NUM_THREADS was undefined in the environment, because there was no evidence of CPU oversubscription). Once I can test the "problematic" machine again, I will additionally check setting NUMBA_THREADING_LAYER, and then post a summary of possible solutions, either as an answer to the question or as an edit.
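The comment about calling omp_set_num_threads from Python via ctypes can be sketched as a context manager. This is only a sketch under assumptions: it assumes GCC's libgomp runtime (the shared-library name differs per toolchain, e.g. libomp for LLVM or libiomp5 for Intel), and it only affects code that runs on that particular OpenMP runtime; a pthreads build of OpenBLAS would not be reached by it.

```python
import ctypes
import ctypes.util
from contextlib import contextmanager

@contextmanager
def omp_thread_limit(n):
    """Temporarily cap the OpenMP thread count, restoring the old limit after."""
    # Assumption: GCC's libgomp; adjust the name for other toolchains.
    libname = ctypes.util.find_library("gomp") or "libgomp.so.1"
    try:
        omp = ctypes.CDLL(libname)
    except OSError:
        # No OpenMP runtime found: run the body unchanged.
        yield
        return
    omp.omp_set_num_threads.argtypes = [ctypes.c_int]
    omp.omp_get_max_threads.restype = ctypes.c_int
    saved = omp.omp_get_max_threads()  # current nthreads limit
    omp.omp_set_num_threads(n)
    try:
        yield
    finally:
        omp.omp_set_num_threads(saved)  # restore the previous limit
```

Usage would be something like with omp_thread_limit(1): total_sum(vector, matrix), wrapping only the hot section so other NumPy calls elsewhere keep their full thread pool. As the comment notes, some OpenMP implementations re-create worker threads when the count changes, so this can be costly around small workloads.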
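Since NUMBA_NUM_THREADS and NUMBA_THREADING_LAYER came up in the comments, a minimal sketch of setting them from within Python: Numba reads these environment variables at import time, so they must be assigned before the first import numba anywhere in the process (the choice of os.cpu_count() as the cap simply mirrors the value that helped in my tests).

```python
import os

# Numba reads these once at import time, so they must be set before the
# first `import numba` anywhere in the process.
os.environ["NUMBA_NUM_THREADS"] = str(os.cpu_count() or 1)  # cap Numba's pool
os.environ["NUMBA_THREADING_LAYER"] = "omp"  # force the OpenMP backend

# import numba  # only after the variables above are in place
```

Setting the variables in the shell before launching Python achieves the same thing and avoids any import-order pitfalls.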