I have a program that generates, say, a million NumPy arrays of size 784, and I want to save them to a file as they are generated, so that only one array is kept in memory at any time. I tried the code below, which seems to hold up when n_arrays is on the order of 10^5: memory usage rises by about 400 MB, then drops back, and keeps cycling like that until the run finishes.
With 10^6, however, memory usage keeps growing until it hits the limit and the program raises a MemoryError; my suspicion is that np.save materializes the whole stream as a single in-memory array before writing.
Is there any way to accomplish this?
import numpy as np


def generator(n):
    """Yield n arrays of size 784, one at a time."""
    num = 0
    while num < n:
        yield np.array(range(784))
        num += 1


class StreamArray(list):
    """List subclass that stores nothing itself: __iter__ streams arrays
    from the generator, and __len__ reports a fake length so np.save
    treats the object as non-empty."""

    def __init__(self, n=0):
        super().__init__()
        self.n = n
        self.len = 1  # fake length; the real data is never held here

    def __iter__(self):
        return generator(self.n)

    def __len__(self):
        return self.len


n_arrays = 10**6
np.save('out', StreamArray(n_arrays))
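For comparison, here is the kind of approach I am considering instead: writing the rows through a memory-mapped .npy file created with numpy.lib.format.open_memmap, so the full array never has to exist in RAM at once. This is just a sketch; the int64 dtype and the out.npy filename are my assumptions, and the np.arange call is a stand-in for my real generator.

import numpy as np
from numpy.lib.format import open_memmap

n_arrays = 10**6

# Create an empty (n_arrays, 784) .npy file on disk. Writes go through
# the OS page cache, so only the pages currently being touched need to
# stay in memory. The dtype and filename here are assumptions.
out = open_memmap('out.npy', mode='w+', dtype=np.int64,
                  shape=(n_arrays, 784))

for i in range(n_arrays):
    out[i] = np.arange(784)  # stand-in for one generated array

out.flush()  # make sure all pages are written to disk

The resulting file can be loaded normally with np.load('out.npy'), or re-opened lazily with np.load('out.npy', mmap_mode='r') if it is too large to fit in memory. Is this the right direction, or is there a cleaner way to stream the arrays into a single file?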