I have around 40k rows and want to test many selection combinations on them. By selection I mean boolean masks. There are around 250 million masks/filters.
The current simplified code:
import numpy as np

n_filters = 250000000

np_arr = np.random.randint(1, 40000, 40000)
results = np.empty(n_filters)
# Random 0/1 masks, one filter per row (the upper bound of randint is exclusive)
filters = np.random.randint(2, size=(n_filters, 40000))
for i in range(n_filters):
    row_selection = np_arr[filters[i].astype(np.bool_)]  # Select rows based on the next filter
    # Perform simple calculations such as sum, prod, count on the selected rows
    results[i] = row_selection.sum()  # Save the result for this filter
I tried Numba and multiprocessing, but since most of the time is spent applying the filter rather than in the computation, neither helps much.
What would be the most efficient way to solve this? Is there any way to parallelize it? As far as I can see, I need to loop through the filters and calculate the sum, prod, count, etc. for each one individually, because I can't apply the filters in parallel (even though the calculations after applying them are very simple).
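One direction that might avoid the per-filter loop, sketched here under assumptions rather than as a tested solution: for sum and count, applying a 0/1 mask and summing is the same as a dot product of the mask with the values, so a batch of masks can be handled with one matrix-vector product. The function name `batched_mask_stats`, the `chunk_size` parameter, and the small array sizes are hypothetical and only for illustration. The masks are cast to float64 so NumPy can hand the product to BLAS, which should stay exact as long as every sum is below 2**53. The product (prod) is not covered by this sketch.

```python
import numpy as np

def batched_mask_stats(values, filters, chunk_size=1024):
    """Return (sums, counts) of the selected values for each boolean mask.

    `filters` holds one mask per row. Hypothetical helper, for illustration only.
    """
    n_filters = filters.shape[0]
    sums = np.empty(n_filters, dtype=np.int64)
    counts = np.empty(n_filters, dtype=np.int64)
    vals = values.astype(np.float64)
    for start in range(0, n_filters, chunk_size):
        stop = start + chunk_size
        mask_chunk = filters[start:stop]
        # mask @ values equals the sum of the selected values for each mask in the chunk
        chunk_sums = mask_chunk.astype(np.float64) @ vals
        sums[start:stop] = np.rint(chunk_sums).astype(np.int64)
        # number of selected rows per mask
        counts[start:stop] = np.count_nonzero(mask_chunk, axis=1)
    return sums, counts

# Small illustrative sizes (assumed), not the 250 million x 40000 case
rng = np.random.default_rng(0)
values = rng.integers(1, 40000, size=1000)
filters = rng.integers(0, 2, size=(5000, 1000)).astype(np.bool_)
sums, counts = batched_mask_stats(values, filters)
```

Processing the masks in chunks keeps only `chunk_size` rows of the float copy in memory at a time, which matters because the full set of masks would presumably have to be generated or loaded piece by piece anyway.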
I'd appreciate any suggestions for improving performance.