The basic problem is getting j unique elements for each of the 1000 rows. We can't use np.random.choice(..., replace=True) directly there, as then we wouldn't have j unique elements per row. To solve our case, one vectorized approach would be to use a random matrix of shape (1000, len(input_array)), perform argsort along the second axis and keep the first j columns to get j unique indices per row, then index into the input array with those and finally sum along the second axis.
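To see why argsorting random keys yields unique indices, here's a minimal sketch with toy sizes (not from the post):

```python
import numpy as np

j = 4

# With replace=True, a single draw of j indices may contain duplicates:
picked = np.random.choice(10, size=j, replace=True)  # repeats possible

# The argsort trick: ranking i.i.d. random keys gives a random
# permutation of the column indices per row, so the first j are distinct.
keys = np.random.rand(5, 10)        # one row of random keys per sample
idx = keys.argsort(1)[:, :j]        # j unique indices per row
assert all(len(set(row)) == j for row in idx)
```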
To implement it, we would have two approaches -
def app1(serie1, j, N=1000):
    # argsort of i.i.d. random keys -> a random permutation per row;
    # the first j columns are therefore j unique indices
    idx = np.random.rand(N, serie1.size).argsort(1)[:,:j]
    return serie1[idx].sum(1)
Using np.argpartition for selecting j random elements (cheaper than a full argsort) and then np.take for efficient indexing -
def app2(serie1, j, N=1000):
    # argpartition only partially orders each row, which is enough to
    # pull out j unique random indices and avoids a full sort
    idx = np.random.rand(N,serie1.size).argpartition(j,axis=1)[:,:j]
    return np.take(serie1, idx).sum(1)
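A quick usage sketch (app1 is repeated here so the snippet runs standalone); each output element sums j distinct values of the input, so it must lie between the sums of the j smallest and j largest elements:

```python
import numpy as np

def app1(serie1, j, N=1000):
    # j unique random indices per row via argsort of random keys
    idx = np.random.rand(N, serie1.size).argsort(1)[:, :j]
    return serie1[idx].sum(1)

serie1 = np.random.randint(0, 9, 20)
j = 5
out = app1(serie1, j)
assert out.shape == (1000,)
# sums of j distinct elements are bounded by the j smallest/largest
s = np.sort(serie1)
assert out.min() >= s[:j].sum()
assert out.max() <= s[-j:].sum()
```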
Sample run to demo creating the indices idx -
In [35]: serie1 = np.random.randint(0,9,(20))
In [36]: idx = np.random.rand(1000,serie1.size).argsort(1)[:,:5]
In [37]: idx
Out[37]:
array([[16, 13, 19, 0, 15],
[ 7, 4, 13, 15, 14],
[ 8, 3, 15, 1, 9],
...,
[11, 15, 17, 4, 19],
[19, 0, 3, 7, 9],
[10, 1, 19, 12, 6]])
Verifying uniform random sampling -
In [81]: serie1 = np.arange(20)
In [82]: j = 5
In [83]: idx = np.random.rand(1000000,serie1.size).argsort(1)[:,:j]
In [84]: np.bincount(idx.ravel())
Out[84]:
array([250317, 250298, 250645, 249544, 250396, 249972, 249492, 250512,
249968, 250133, 249622, 250170, 250291, 250060, 250102, 249446,
249398, 249003, 250249, 250382])
With fairly equal counts across the 20 elements of the input array, the sampling looks pretty uniformly distributed.
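Beyond the global counts, we can also check that every row really holds j distinct indices (a small sketch):

```python
import numpy as np

serie1 = np.arange(20)
j = 5
idx = np.random.rand(1000, serie1.size).argsort(1)[:, :j]

# sort each row; if all adjacent pairs differ, every row has j unique indices
srt = np.sort(idx, axis=1)
assert np.all(srt[:, 1:] != srt[:, :-1])
```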
Runtime test -
In [140]: serie1 = np.random.randint(0,9,(20))
In [141]: j = 5
# @elcombato's soln (Python's random.sample in a list comprehension)
In [142]: %timeit [sum(sample(serie1, j)) for _ in range(1000)]
100 loops, best of 3: 10.7 ms per loop
# Posted solutions in this post
In [143]: %timeit app1(serie1, j, N=1000)
...: %timeit app2(serie1, j, N=1000)
...:
1000 loops, best of 3: 943 µs per loop
1000 loops, best of 3: 870 µs per loop
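On newer NumPy (>= 1.20), Generator.permuted offers a similar vectorized route by shuffling index rows independently; a sketch under that assumption, not benchmarked here:

```python
import numpy as np

rng = np.random.default_rng()
serie1 = np.random.randint(0, 9, 20)
j, N = 5, 1000

# shuffle the column indices independently within each row, keep the first j
idx = rng.permuted(np.tile(np.arange(serie1.size), (N, 1)), axis=1)[:, :j]
out = np.take(serie1, idx).sum(1)
assert out.shape == (N,)
```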