
I need to parallelize a Python for loop in which, on every iteration, a function taking two arguments is called, returns two results, and these results are appended to two different lists. The loop iterates over two lists of arguments.

So say I have the following code:

def my_f(a, b):
    res1 = a + b
    res2 = a * b
    return res1, res2
    
# lists of arguments
args1 = [1, 2, 3, 4]  
args2 = [5, 6, 7, 8]
    
res_list1, res_list2 = [], []
for i in range(len(args1)):  # loop to parallelize
    res1, res2 = my_f(args1[i], args2[i])
    res_list1.append(res1)
    res_list2.append(res2)

The result should be

res_list1 = [6, 8, 10, 12]
res_list2 = [5, 12, 21, 32]

How would I go about making it run in parallel?

I am aware that in C/C++ one can just use #pragma omp for to obtain a parallel for. Is there anything similar in Python?

I am using Python 3.8.5 on Linux, but I need it to work on any OS.

2 Answers


You can use Python's multiprocessing.Pool to achieve this; see the docs (https://docs.python.org/3/library/multiprocessing.html#using-a-pool-of-workers). However, instead of using map, you will want to use starmap because you are passing more than one argument. Here is how I would do it:

from multiprocessing import Pool

def my_f(a, b):
    res1 = a + b
    res2 = a * b
    return res1, res2
   

if __name__ == '__main__':
    args1 = [1, 2, 3, 4]
    args2 = [5, 6, 7, 8]

    with Pool(processes=4) as pool:
        res = pool.starmap(my_f, zip(args1, args2))

    res_list1 = [r[0] for r in res]
    res_list2 = [r[1] for r in res]

Firstly, notice that the main code is in the if __name__ == '__main__': block. This is important because multiprocessing creates new processes, not threads; on OSes that use the spawn start method (e.g. Windows, and macOS by default), each worker process re-imports your module, and the guard ensures the main code is only run by the main process.

Secondly, I converted your two lists into a single iterable using the zip function. This is important because starmap expects each element of its iterable to be a tuple of arguments, which it unpacks into the function call.
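For example, zipping the two argument lists produces exactly the argument tuples that starmap unpacks:

```python
args1 = [1, 2, 3, 4]
args2 = [5, 6, 7, 8]

# Each tuple becomes one call: my_f(1, 5), my_f(2, 6), ...
pairs = list(zip(args1, args2))
print(pairs)  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```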

Finally, the last few lines convert the res list into two lists like your example had. That is because the res output is actually a list of result tuples, returned in the same order as the inputs.
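If you prefer, the two list comprehensions can be replaced by the common zip(*...) "unzip" idiom, which splits a list of pairs into two sequences in one step:

```python
# res is what pool.starmap returns for the example inputs
res = [(6, 5), (8, 12), (10, 21), (12, 32)]

# zip(*res) transposes the list of pairs; wrap each half in list()
res_list1, res_list2 = (list(t) for t in zip(*res))
print(res_list1)  # [6, 8, 10, 12]
print(res_list2)  # [5, 12, 21, 32]
```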



An alternative approach using concurrent.futures, so you can easily switch between ProcessPoolExecutor and ThreadPoolExecutor in case your workload changes in the future:

from concurrent.futures import ProcessPoolExecutor

def worker(args):
    a, b = args
    res1 = a + b
    res2 = a * b

    return res1, res2


def main():
    args1 = [1, 2, 3, 4]
    args2 = [5, 6, 7, 8]

    with ProcessPoolExecutor() as executor:
        result = executor.map(worker, zip(args1, args2))

    a, b = map(list, zip(*result))

    print(a, b)


if __name__ == "__main__":
    main()
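To illustrate how little changes when you switch, here is the same program with ThreadPoolExecutor swapped in (threads suit I/O-bound work; for CPU-bound work like this, processes avoid the GIL):

```python
from concurrent.futures import ThreadPoolExecutor


def worker(args):
    a, b = args
    return a + b, a * b


def main():
    args1 = [1, 2, 3, 4]
    args2 = [5, 6, 7, 8]

    # Only the executor class differs from the process-based version;
    # threads share memory, so no module re-import happens in workers.
    with ThreadPoolExecutor() as executor:
        result = executor.map(worker, zip(args1, args2))

    a, b = map(list, zip(*result))

    print(a, b)  # [6, 8, 10, 12] [5, 12, 21, 32]


if __name__ == "__main__":
    main()
```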
