
I use the Spyder IDE. Usually, when I am running non-parallelized scripts, I tend to debug using print statements. Depending on which statements are printed (or not), I can see where errors are occurring.

For example:

print "Started while loop..."
doWhileLoop = False
while doWhileLoop == True:
    print "Doing something important!"
    time.sleep(5)
print "Finished while loop..."

Above, I am missing a line that sets doWhileLoop to False at some point, so I will be stuck in the while loop forever, but my print statements let me see exactly where in my code the hang occurs.

However, when running parallelized scripts, I get no output to the console until after the process has finished. Normally, what I do in this case is debug with a single process (i.e., temporarily deparallelizing the program by running only one task), but currently I am dealing with an error that seems to occur only when I am running more than one task.
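For instance, my usual single-process fallback looks roughly like this (the task function and the debug flag here are placeholders, not my actual code):

import multiprocessing

def task(x):
    return x * x

if __name__ == '__main__':
    jobs = range(10)
    serial_debug = True  # flip to False for the real, parallel run
    if serial_debug:
        # plain map runs in this process, so prints and tracebacks
        # show up in the console immediately
        results = list(map(task, jobs))
    else:
        # the parallelized path, where console output gets buffered
        with multiprocessing.Pool() as pool:
            results = pool.map(task, jobs)
    print(results)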

So I am having trouble figuring out this error with my usual methods -- how should I change my usual debugging practice in order to efficiently debug scripts that employ multiprocessing?

roippi: Debugging parallel things is hard. You kind of just have to do it on a case-by-case basis. As for your print statements not printing until the process is done, that's probably Spyder doing some stdout buffering that doesn't otherwise happen. You can do sys.stdout.flush() after every print statement, or use a different environment. (Commented May 26, 2014 at 22:08)
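For illustration, a minimal sketch of that flushing workaround (on Python 3, print also accepts a flush keyword that does the same thing per call):

import sys

print("Doing something important!")
sys.stdout.flush()   # force buffered output to the console immediately

# equivalent per-call form on Python 3:
print("Doing something important!", flush=True)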

1 Answer


Like @roippi said, debugging parallel things is hard. Another useful tool is the logging module instead of print: logging gives you severity levels, timestamps, and, most importantly, the name of the process that emitted each message.

Example code:

import logging
import multiprocessing
import queue

def myproc(arg):
    return arg * 2

def worker(inqueue, outqueue):
    # get_logger() returns multiprocessing's own logger; every record
    # it emits is tagged with the name of the current process
    mylog = multiprocessing.get_logger()
    mylog.info('start')
    for job in iter(inqueue.get, 'STOP'):
        mylog.info('got %s', job)
        try:
            outqueue.put(myproc(job), timeout=1)
        except queue.Full:
            mylog.error('queue full!')

    mylog.info('done')

def executive(inqueue):
    # downstream consumer; defined for illustration, never started below
    total = 0
    mylog = multiprocessing.get_logger()
    for num in iter(inqueue.get, 'STOP'):
        total += num
        mylog.info('got %s\ttotal %s', num, total)

# guard required so child processes can safely re-import this module
if __name__ == '__main__':
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO,
    )
    logger.info('setup')

    inqueue, outqueue = multiprocessing.Queue(), multiprocessing.Queue()
    if False:                   # flip to True to debug 'queue full!' issues
        outqueue = multiprocessing.Queue(maxsize=1)
    # prefill with 3 jobs
    for num in range(3):
        inqueue.put(num)
    # signal end of jobs
    inqueue.put('STOP')

    worker_p = multiprocessing.Process(
        target=worker, args=(inqueue, outqueue),
        name='worker',
    )
    worker_p.start()

    worker_p.join()

    logger.info('done')

Example output:

[INFO/MainProcess] setup
[INFO/worker] child process calling self.run()
[INFO/worker] start
[INFO/worker] got 0
[INFO/worker] got 1
[INFO/worker] got 2
[INFO/worker] done
[INFO/worker] process shutting down
[INFO/worker] process exiting with exitcode 0
[INFO/MainProcess] done
[INFO/MainProcess] process shutting down
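
Note that the default log_to_stderr format shows only the level and the process name; to also get the timestamps mentioned above, one option (a sketch, with one reasonable choice of format string) is to attach your own formatter to the handler it installs:

import logging, multiprocessing

logger = multiprocessing.log_to_stderr(level=logging.INFO)
# swap the default '[%(levelname)s/%(processName)s] ...' format for one
# that also carries a timestamp
for handler in logger.handlers:
    handler.setFormatter(logging.Formatter(
        '%(asctime)s [%(levelname)s/%(processName)s] %(message)s'))
logger.info('now with timestamps')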