
I have a for loop which checks some conditions and writes a file accordingly. The problem I have is that the conditions are true for many files (sometimes around 1000 files need to be written), so writing them takes a long time (around 10 minutes). I know I can somehow use Python's multiprocessing and utilise some of the cores.

This is the code that works, but only uses one core.

for i, n in enumerate(halo_param.strip()):
    mask = var1['halo_id'] == n
    newtbdata = tbdata1[mask]
    hdu = pyfits.BinTableHDU(newtbdata)
    hdu.writeto('/home/Documments/file_{0}.fits'.format(i))

I came across the fact that this can be done using Pool from multiprocessing:

if __name__ == '__main__': pool = Pool(processes=4)

I would like to know how to do it and utilise at least 4 of my cores.

1 Answer


Restructure the for loop body as a function, and use Pool.map with the function.

from multiprocessing import Pool

def work(arg):
    # Unpack the (index, halo_id) pair produced by enumerate.
    i, n = arg
    mask = var1['halo_id'] == n
    newtbdata = tbdata1[mask]
    hdu = pyfits.BinTableHDU(newtbdata)
    hdu.writeto('/home/Documments/file_{0}.fits'.format(i))

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(work, enumerate(halo_param.strip()))
    pool.close()
    pool.join()
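One caveat worth noting (not part of the original answer): under the spawn start method, worker processes do not share the parent's memory, so globals like var1 and tbdata1 must be made available in each worker. A common pattern is Pool's initializer argument; here is a minimal sketch with made-up dictionary data standing in for the FITS tables:

```python
from multiprocessing import Pool

# Module-level holder populated in each worker by init_worker.
shared = {}

def init_worker(data):
    # Runs once per worker process; stashes the data where work() can see it.
    shared['data'] = data

def work(arg):
    i, n = arg
    # Each worker reads the data set up by init_worker.
    return i, shared['data'][n]

if __name__ == '__main__':
    data = {'a': 1, 'b': 2, 'c': 3}  # hypothetical stand-in for the tables
    with Pool(processes=2, initializer=init_worker, initargs=(data,)) as pool:
        results = pool.map(work, enumerate(['a', 'b', 'c']))
    print(results)  # [(0, 1), (1, 2), (2, 3)]
```

On Linux with the default fork start method the globals are inherited anyway, but the initializer pattern works on every platform.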

2 Comments

so by saying i, n = arg, when we call pool.map the function takes its arg from enumerate(halo_param.strip()). But how does it loop? I don't see any loops here, but it seems to work!
@ThePredator, multiprocessing.Pool.map works like the builtin map function: it applies the given function to each item of the sequence (or any iterable, the second argument). Follow the link if you want to know more about the map function.
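To make that equivalence concrete, here is a minimal sketch (the square function and inputs are made up for illustration):

```python
from multiprocessing import Pool

def square(x):
    # Trivial stand-in for the real work function.
    return x * x

if __name__ == '__main__':
    items = [1, 2, 3, 4]
    # The builtin map applies the function to each item in turn...
    sequential = list(map(square, items))
    # ...while Pool.map does the same thing, but distributes the calls
    # across worker processes and returns the results in input order.
    with Pool(processes=4) as pool:
        parallel = pool.map(square, items)
    print(sequential == parallel)  # True: both are [1, 4, 9, 16]
```

The "loop" lives inside map/Pool.map itself; your function is called once per item.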
