
I have a multi-nested for loop that I'd like to parallelize as much as possible in Python.

Suppose I have some arbitrary function func(a, b) that accepts two arguments, and I'd like to compute it for all combinations of elements from M and N.

What I've done so far is 'flatten' the indices into a dictionary:

idx_map = {} 
count = 0
for i in range(n):
    for j in range(m):
        idx_map[count] = (i,j)
        count += 1 

Now that my nested loop is flattened, I can use it like so:

arr = []
for idx in range(n*m):
    i,j = idx_map[idx]
    arr.append( func(M[i], N[j]) )  
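
The same flattening can also be written with itertools.product, as far as I can tell; here is a minimal, untested sketch, assuming M and N are ordinary indexable sequences:

from itertools import product

# Same (i, j) ordering as the nested loops: i varies slowest, j fastest.
pairs = list(product(range(n), range(m)))
arr = [func(M[i], N[j]) for i, j in pairs]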

Can I use this with Python's built-in multiprocessing to parallelize? Race conditions should not be an issue, because I do not need to aggregate across func calls; I just want to end up with a final array that holds func(a, b) for every combination across M and N. (So async behaviour and its complexity should not be relevant here.)

What's the best way to accomplish this effect?
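
For concreteness, something like the sketch below is roughly what I have in mind, though I have not tested it; it assumes func and the elements of M and N can be pickled, that the script is run behind an if __name__ == "__main__" guard, and the run_all helper is just a name made up for illustration:

import multiprocessing as mp
from itertools import product

def run_all(func, M, N):
    # Pool.starmap unpacks each (a, b) pair into func(a, b) and preserves
    # the order of product(M, N), so no aggregation or locking is needed.
    with mp.Pool() as pool:
        return pool.starmap(func, product(M, N))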

I found the snippet below in a related question, but I don't understand what the author was trying to illustrate.

if 1:   # multi-threaded
    pool = mp.Pool(28) # try 2X num procs and inc/dec until cpu maxed
    st = time.time()
    for x in pool.imap_unordered(worker, range(data_Y)):
        pass
    print 'Multiprocess total time is %4.3f seconds' % (time.time()-st)
    print
  • There is no possible race condition with multiprocessing as long as you do not explicitly use shared data, which is not the case here (unless it is hidden in func). This is because multiprocessing uses processes (heavy) rather than threads (light). Note that Python code is generally interpreted with CPython, so it is typically 10-100x slower than native code unless func spends most of its time in optimized native modules. If func is mostly pure-Python code, then it is generally more efficient to vectorize it first (i.e. use native code). That avoids wasting 28 additional cores. Commented Jan 6, 2023 at 0:08
  • The # multi-threaded comment is thus not correct. The "best way" is not really well defined (does it mean fastest, most Pythonic, shortest, etc.?). It also likely depends on the content of func: if func is IO-bound (or mostly uses Numpy on large arrays), then using multiple threads could be faster and more convenient. Besides, what specifically did you not understand in the code (and in the many multiprocessing examples, this post, and the standard documentation)? Commented Jan 6, 2023 at 0:17
  • By vectorize, would you recommend using map(func, idx_map)? Commented Jan 6, 2023 at 0:32
  • No, map just calls the Python function N times, once per item; func itself is almost certainly still not vectorized (hard to know without the code). I recommend rewriting func so that it amounts to a single native call, or only a few of them (depending on your performance needs), and using efficient modules in the first place (see the sketch after these comments). Commented Jan 6, 2023 at 0:52
  • So what do you mean by vectorize? Commented Jan 6, 2023 at 3:15
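
As an illustration of what the comments mean by "vectorize": if func can be rewritten in terms of NumPy array operations (which may or may not be possible for the real func), the whole grid of combinations can be computed with a few native calls instead of n*m Python-level calls. A sketch, using a made-up func(a, b) = a*b + 1:

import numpy as np

M = np.arange(1000.0)
N = np.arange(500.0)

# Broadcasting evaluates a*b + 1 for every (a, b) combination at once,
# producing an array of shape (len(M), len(N)) entirely in native code.
result = M[:, None] * N[None, :] + 1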

1 Answer


You can accomplish this, yes; however, the amount of work you do per function call needs to be substantial enough to overcome the overhead of the worker processes.

Vectorizing with something like NumPy is typically easier, as Jérôme stated previously.

I have altered your code so that you can observe the speed-up you get from multiprocessing.

Feel free to change the largNum variable to see how the scaling for multiprocessing improves as the amount of work per function call increases, and how at low values multiprocessing is actually slower.

from concurrent.futures import ProcessPoolExecutor
import time

# Sums every integer i in [0, (a+b)**2), so the work per call grows with a+b
def costlyFunc(theArgs):
    a, b = theArgs
    topOfRange = (a + b)**2

    total = 0
    for i in range(topOfRange):
        total += i

    return total
            
# Changed to a list of (i, j) pairs
idx_map = []
largNum = 200

# Your index flattening
for i in range(largNum):
    for j in range(largNum):
        idx_map.append((i, j))

I use the built-in map function in the single-core version to call costlyFunc on every element of the list. Python's concurrent.futures module has a similar map function; however, it distributes the calls over multiple processes.

if __name__ == "__main__":
    # No multiprocessing
    oneCoreTimer = time.time()
    result = list(map(costlyFunc, idx_map))
    oneCoreTime = time.time() - oneCoreTimer
    print(oneCoreTime, "seconds to complete the function without multiprocessing")

    # Multiprocessing
    mpTimer = time.time()
    with ProcessPoolExecutor() as ex:
        mpResult = list(ex.map(costlyFunc, idx_map))
    mpTime = time.time() - mpTimer
    print(mpTime, "seconds to complete the function with multiprocessing")

    print(f"Multiprocessing is {oneCoreTime/mpTime} times faster than using one core")