
I use the Spyder IDE. Usually, when I am running non-parallelized scripts, I tend to debug using print statements. Depending on which statements are printed (or not), I can see where errors are occurring.

For example:

print "Started while loop..."
doWhileLoop = False
while doWhileLoop == True:
    print "Doing something important!"
    time.sleep(5)
print "Finished while loop..."

Above, I am missing a line that sets doWhileLoop to False at some point, so I will be stuck in the while loop forever, but my print statements let me see exactly where in my code the hang occurs.

However, when running parallelized scripts, I get no output to the console until after the process has finished. Normally, what I do in this case is debug with a single process (i.e., temporarily deparallelizing the program by running only one task), but currently I am dealing with an error that seems to occur only when I am running more than one task.
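For instance, my usual single-process fallback looks roughly like this (the task function and the debug flag here are placeholders, not my actual code):

import multiprocessing

def task(x):
    return x * x

if __name__ == '__main__':
    jobs = range(10)
    serial_debug = True  # flip to False for the real, parallel run
    if serial_debug:
        # plain map runs in this process, so prints and tracebacks
        # show up in the console immediately
        results = list(map(task, jobs))
    else:
        # the parallelized path, where console output gets buffered
        with multiprocessing.Pool() as pool:
            results = pool.map(task, jobs)
    print(results)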

So I am having trouble figuring out this error with my usual methods -- how should I change my usual debugging practice in order to efficiently debug scripts that employ multiprocessing?

roippi: Debugging parallel things is hard. You kind of just have to do it on a case-by-case basis. As for your print statements not printing until the process is done, that's probably Spyder doing some stdout buffering that doesn't otherwise happen. You can do sys.stdout.flush() after every print statement, or use a different environment. (Commented May 26, 2014 at 22:08)
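For illustration, a minimal sketch of that flushing workaround (on Python 3, print also accepts a flush keyword that does the same thing per call):

import sys

print("Doing something important!")
sys.stdout.flush()   # force buffered output to the console immediately

# equivalent per-call form on Python 3:
print("Doing something important!", flush=True)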

1 Answer


Like @roippi said, debugging parallel things is hard. Another useful tool is the logging module instead of print: logging gives you severity levels, timestamps, and, most importantly, the name of the process that emitted each message.

Example code:

import logging
import multiprocessing
import queue

def myproc(arg):
    return arg * 2

def worker(inqueue, outqueue):
    # get_logger() returns multiprocessing's own logger; every record
    # it emits is tagged with the name of the current process
    mylog = multiprocessing.get_logger()
    mylog.info('start')
    for job in iter(inqueue.get, 'STOP'):
        mylog.info('got %s', job)
        try:
            outqueue.put(myproc(job), timeout=1)
        except queue.Full:
            mylog.error('queue full!')

    mylog.info('done')

def executive(inqueue):
    # downstream consumer; defined for illustration, never started below
    total = 0
    mylog = multiprocessing.get_logger()
    for num in iter(inqueue.get, 'STOP'):
        total += num
        mylog.info('got %s\ttotal %s', num, total)

# guard required so child processes can safely re-import this module
if __name__ == '__main__':
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO,
    )
    logger.info('setup')

    inqueue, outqueue = multiprocessing.Queue(), multiprocessing.Queue()
    if False:                   # flip to True to debug 'queue full!' issues
        outqueue = multiprocessing.Queue(maxsize=1)
    # prefill with 3 jobs
    for num in range(3):
        inqueue.put(num)
    # signal end of jobs
    inqueue.put('STOP')

    worker_p = multiprocessing.Process(
        target=worker, args=(inqueue, outqueue),
        name='worker',
    )
    worker_p.start()

    worker_p.join()

    logger.info('done')

Example output:

[INFO/MainProcess] setup
[INFO/worker] child process calling self.run()
[INFO/worker] start
[INFO/worker] got 0
[INFO/worker] got 1
[INFO/worker] got 2
[INFO/worker] done
[INFO/worker] process shutting down
[INFO/worker] process exiting with exitcode 0
[INFO/MainProcess] done
[INFO/MainProcess] process shutting down
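
Note that the default log_to_stderr format shows only the level and the process name; to also get the timestamps mentioned above, one option (a sketch, with one reasonable choice of format string) is to attach your own formatter to the handler it installs:

import logging, multiprocessing

logger = multiprocessing.log_to_stderr(level=logging.INFO)
# swap the default '[%(levelname)s/%(processName)s] ...' format for one
# that also carries a timestamp
for handler in logger.handlers:
    handler.setFormatter(logging.Formatter(
        '%(asctime)s [%(levelname)s/%(processName)s] %(message)s'))
logger.info('now with timestamps')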