
Due to a performance issue, I would like to run my function in parallel in Python:

import multiprocessing as mp
import networkx as nx

source_nodes = [10413173, 10414530, 10414530, 10437199]
sink_nodes = [10420346, 10438770, 10438711, 10414530, 10436258]


def createpath(source, sink):
    path = []  # initialize locally to avoid an UnboundLocalError
    for i in source:
        for j in sink:
            path = path + list(nx.all_simple_paths(Directed_G, i, j))
    return path

From my understanding, I must give one iterable to the apply function, but my idea was to do something like:

results = [pool.apply(createpath, args=(source_nodes, sink_nodes))]

and then not give any iterable to the apply function. I managed to get it to work, but I don't think it runs in parallel.

Do you think I should include the apply call inside the first loop?

  • If you want to run your loop in parallel, you could try a tool similar to OpenMP in C++, e.g. Pymp. This lets you give each core its own portion of the iterations, so all cores run the loop in parallel. You should use a reduction to combine the results. Commented Nov 19, 2019 at 13:15
  • Oh, thanks, but I would like to avoid installing any external tools, since I'm running this code on a virtual machine managed by another department. Commented Nov 19, 2019 at 13:23
  • 1
    So then, maybe you could try to achieve something similar, by giving range of loop in argument. Then each core would process only its own part of the whole loop. Commented Nov 19, 2019 at 13:34
  • Thanks for your comment. Can you develop your thought a little more? I reckon it's a bit hard for me to understand. Commented Nov 19, 2019 at 13:39

1 Answer

from multiprocessing import Pool


source_nodes = [1, 2, 3, 4, 5, 6]
sink_nodes = [1, 1, 1, 1, 1, 1, 1, 1, 1]


def sum_values(parameter_tuple):
    # Each task receives the full lists plus the [start, stop) slice it owns.
    source, sink, start, stop = parameter_tuple
    out = 0
    for i in range(start, stop):
        val_i = source[i]
        for j in sink:
            out += val_i * j
    return out


if __name__ == "__main__":
    params = (source_nodes, sink_nodes, 0, 6)
    print(sum_values(params))  # serial reference result
    with Pool(2) as p:
        # Split the outer loop in half; each worker processes its own range.
        print(p.map(sum_values, [
            (source_nodes, sink_nodes, 0, 3),
            (source_nodes, sink_nodes, 3, 6),
        ]))

You can try running this one. It runs in parallel using the map pattern on a pool of 2 worker processes. In this case, your final result is the sum of the results returned by each process in the pool.


1 Comment

This code does not assemble the requested list of lists of all source-sink paths (nor a pair of partial lists, one per Pool worker process, to be joined later), as the O/P requested. Scaling also matters: try validating your code's scaling on source-sink graphs somewhere above 1E9 by 1E9 to see the costs of doing the job in separate processes, then post the corrected code with its benchmarked scaling properties.
