
I have a program that generates, say, a million numpy arrays of size 784, and I want to save them to a file as they are generated (so only one array is kept in memory at any time). I tried the code below, which seems to hold up when n_arrays is on the order of 10^5 (memory usage rises by about 400 MB but then drops back, and keeps doing so until it finishes). At 10^6, however, memory usage grows until it hits the limit and throws a MemoryError. Is there any way to accomplish this?

import numpy as np


def generator(n):
    # Yield n arrays of 784 elements each, one at a time.
    num = 0
    while num < n:
        yield np.array(range(784))
        num += 1


class StreamArray(list):
    # A list subclass that streams its elements from the generator
    # instead of materializing them, so that np.save never holds the
    # full sequence in memory.
    def __init__(self, n=0):
        super().__init__()
        self.n = n
        self.len = 1

    def __iter__(self):
        return generator(self.n)

    def __len__(self):
        # Report a fixed, fake length; the real element count is self.n.
        return self.len


n_arrays = 10**6
np.save('out', StreamArray(n_arrays))

1 Answer

import numpy as np


def generator(n):
    # Yield n arrays of 784 elements each, one at a time.
    num = 0
    while num < n:
        yield np.array(range(784))
        num += 1


n_arrays = 10**6
with open('out.npy', 'wb') as f:
    for item in generator(n_arrays):
        # Each call appends a self-contained .npy record to the open
        # file, so only one array is ever held in memory.
        np.save(f, item)
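Note that repeated np.save calls on the same file handle produce a file containing a sequence of independent .npy records, each with its own header, rather than one big array. The file therefore has to be read back the same way, record by record, instead of with a single np.load.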

Comments

That saves the bytes of every array in one file, but loading it only shows the first one (i.e. len(np.load('out.npy')) should print 10^6 but prints 784 instead).
Open the file once and call np.load on it repeatedly: with open('out.npy', 'rb') as f: the first np.load(f) returns the first array, the second np.load(f) the second, and so on. Don't close the file between calls (see the sketch below).
And then? Where do I use np.load()?
with open('out.npy', 'rb') as f: print(len(np.load(f))) still gives 784.
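
To spell out the loading side: each record in the file is a complete 784-element array, so len(np.load(f)) on the first record is 784 by construction, no matter how many records follow it. A minimal sketch of reading every record back, assuming the count that was written (n_arrays) is known; only one array is in memory at a time:

import numpy as np

n_arrays = 10**6
with open('out.npy', 'rb') as f:
    for _ in range(n_arrays):
        # np.load on an open file handle reads the next .npy record
        # and leaves the file position at the start of the one after it.
        arr = np.load(f)
        # process arr here; each arr has shape (784,)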