
I was experimenting with NumPy and multiprocessing in Python. I've read numerous tutorials and Stack Overflow answers, and I wrote the following code:

from multiprocessing import Process, Array
import numpy as np

def main():
    im_arr = np.array([[1,2,3],[4,6,7]])
    print('Array in main before process:',im_arr)

    shape = im_arr.shape
    size = im_arr.size
    im_arr.shape = size
    arr = Array('B', im_arr)
    p = Process(target=fun, args=(arr,shape))
    p.start()
    p.join()

    arr = np.frombuffer(arr.get_obj(), dtype=np.uint8)
    arr.shape = shape
    print('Array in main after process:',arr)

def fun(a, shape):
    a = np.frombuffer(a.get_obj(), dtype=np.uint8)
    a.shape = shape

    a[0][0] = 10
    a = np.array([[0,0,0],[0,0,0]])
    a[0][0] = 5

    print('Array inside function:',a)
    a.shape = shape[0]*shape[1]

if __name__ == '__main__':
    main()

What I hoped to do was share a NumPy array and edit it in another process, with the change also visible in the main program. But the output I get is as follows:

('Array in main before process:', array([[1, 2, 3],
       [4, 6, 7]]))
('Array inside function:', array([[5, 0, 0],
       [0, 0, 0]]))
('Array in main after process:', array([[10,  2,  3],
       [ 4,  6,  7]], dtype=uint8))

It seems like 'a' in the function behaves like a new, independent object after the NumPy array is assigned to it.

Please correct what I'm doing wrong.

2 Answers


It seems like 'a' in the function behaves like a new, independent object after the NumPy array is assigned to it.

Well, this is partly true. With np.array([[0,0,0],[0,0,0]]) you create a new, independent object, and the assignment a = ... then binds the name a to it. From that point on, the name a no longer refers to the shared array.

If you want to save a new array in the shared memory you can use

a[...] = np.array([[0,0,0],[0,0,0]])

(This is valid syntax; ... is the Ellipsis literal. Indexing with it selects the whole array, so the assignment writes element-wise into the existing buffer instead of rebinding the name.)
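Applied to the question's setup, a minimal sketch of the corrected worker function (same shared Array and uint8 dtype as in the question) might look like:

```python
import numpy as np
from multiprocessing import Process, Array

def fun(a, shape):
    # View the shared buffer as a NumPy array; no copy is made.
    arr = np.frombuffer(a.get_obj(), dtype=np.uint8).reshape(shape)
    # Ellipsis assignment writes element-wise into the shared memory
    # instead of rebinding the local name to a new object.
    arr[...] = np.array([[0, 0, 0], [0, 0, 0]])
    arr[0, 0] = 5

def main():
    im_arr = np.array([[1, 2, 3], [4, 6, 7]], dtype=np.uint8)
    shape = im_arr.shape
    shared = Array('B', im_arr.reshape(im_arr.size))

    p = Process(target=fun, args=(shared, shape))
    p.start()
    p.join()

    result = np.frombuffer(shared.get_obj(), dtype=np.uint8).reshape(shape)
    print('Array in main after process:', result)

if __name__ == '__main__':
    main()
```

With this change the parent sees [[5, 0, 0], [0, 0, 0]] after the process finishes, because every write went through the shared buffer.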

2

I suggest using memory mapping for this. First, create your array in one of the processes:

im_arr = np.array([[1,2,3],[4,6,7]])

Then, save it to disk:

np.save('im_arr.npy', im_arr)

Then, load it in each process with mmap_mode='r+' so you can modify it:

im_arr = np.load('im_arr.npy', 'r+')

Now the contents will be visible to both processes at all times.
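Putting the steps above together, a minimal end-to-end sketch (the filename im_arr.npy and the worker function are illustrative) might look like:

```python
import numpy as np
from multiprocessing import Process

def worker(path):
    # Open the same .npy file as a writable memory map in the child.
    arr = np.load(path, mmap_mode='r+')
    arr[0, 0] = 10
    arr.flush()  # push the change through to the file on disk

def main():
    path = 'im_arr.npy'
    im_arr = np.array([[1, 2, 3], [4, 6, 7]])
    np.save(path, im_arr)  # create the backing file once

    p = Process(target=worker, args=(path,))
    p.start()
    p.join()

    # Re-open the file in the parent; the child's write is visible.
    result = np.load(path, mmap_mode='r')
    print('Array in main after process:', result)

if __name__ == '__main__':
    main()
```

Note that this shares data through the filesystem, so both processes only need the path, not an inherited handle; the OS page cache keeps the repeated loads cheap.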

2 Comments

Is np.save() faster than cPickling?
@AnuragJk: In general, yes. Try it and see.
