
Due to a performance issue, I would like to run my function in parallel in Python:

import multiprocessing as mp
import networkx as nx

source_nodes = [10413173, 10414530, 10414530, 10437199]
sink_nodes = [10420346, 10438770, 10438711, 10414530, 10436258]


def createpath(source, sink):
    path = []  # initialize locally to avoid an UnboundLocalError
    for i in source:
        for j in sink:
            path = path + list(nx.all_simple_paths(Directed_G, i, j))
    return path

From my understanding, I must give one iterable to the apply function, but my idea was to do something like:

results = [pool.apply(createpath, args=(source_nodes, sink_nodes))]

and then not give any iterable to the apply function. I managed to get it to work, but I don't think it runs in parallel.

Do you think I should include the apply call inside the first loop?

  • If you want to run your loop in parallel, you could try a tool similar to OpenMP in C++, e.g. Pymp. This lets you give each core its own portion of the iterations, so all cores run the loop in parallel. You should use a reduction to combine the results. Commented Nov 19, 2019 at 13:15
  • Oh, thanks, but I would like to avoid installing any external tools, since I'm running this code on a virtual machine managed by another department. Commented Nov 19, 2019 at 13:23
  • 1
    So then, maybe you could try to achieve something similar, by giving range of loop in argument. Then each core would process only its own part of the whole loop. Commented Nov 19, 2019 at 13:34
  • Thanks for your comment. Can you develop your thought a little more? I reckon it's a bit hard for me to understand. Commented Nov 19, 2019 at 13:39

1 Answer

from multiprocessing import Pool


source_nodes = [1, 2, 3, 4, 5, 6]
sink_nodes = [1, 1, 1, 1, 1, 1, 1, 1, 1]


def sum_values(parameter_tuple):
    # Each task receives the full lists plus the [start, stop) slice it owns.
    source, sink, start, stop = parameter_tuple
    out = 0
    for i in range(start, stop):
        val_i = source[i]
        for j in sink:
            out += val_i * j
    return out


if __name__ == "__main__":
    params = (source_nodes, sink_nodes, 0, 6)
    print(sum_values(params))  # serial reference result
    with Pool(2) as p:
        # Split the outer loop in half; each worker processes its own range.
        print(p.map(sum_values, [
            (source_nodes, sink_nodes, 0, 3),
            (source_nodes, sink_nodes, 3, 6),
        ]))

You can try running this one. It runs in parallel using the map pattern on a pool of 2 worker processes. In this case, your final result is the sum of the results returned by each process in the pool.


1 Comment

This code does not assemble the requested list of lists of all source-sink paths (nor a pair of partial lists, one per Pool worker process, to be joined later), as the O/P requested. Scaling also matters: try validating your code's scaling on source-sink graphs somewhere above 1E9 by 1E9 to see the costs of doing the job in separate processes, then post the corrected code with its benchmarked scaling properties.
