
I would like to generate, from a vector that for simplicity we can call "serie1", another vector of dimension 1000x1, where each element of the new vector is the sum of j random elements of "serie1".

I was thinking about creating a random matrix of dimension 1000xj from the vector and then summing horizontally.

How would you suggest doing this in Python?

In order to get a random vector I could do

Vector=np.random.choice(serie1, 1000, replace=True)

but I am not sure how to proceed from there, or whether there is an efficient solution.

3 Answers


The basic problem is getting j unique elements for each of the 1000 rows. We can't use np.random.choice(..., replace=True) directly there, as then we wouldn't get j unique elements per row. One vectorized approach is to use a random matrix of shape (1000, len(input_array)), argsort along the second axis, take the first j indices per row (which are unique by construction), index into the input array with them and finally sum along the second axis.

To implement it, here are two approaches -

def app1(serie1, j, N=1000):
    # Random scores for every (row, element) pair; argsort turns each row into
    # a random permutation of indices, so the first j columns are j unique indices.
    idx = np.random.rand(N, serie1.size).argsort(1)[:,:j]
    return serie1[idx].sum(1)

Using the more efficient np.argpartition to pick j random elements and np.take for faster indexing -

def app2(serie1, j, N=1000):
    # argpartition only partially sorts each row, which is cheaper than a full
    # argsort; the first j columns are still j unique random indices per row.
    idx = np.random.rand(N, serie1.size).argpartition(j, axis=1)[:,:j]
    return np.take(serie1, idx).sum(1)
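
For completeness, here is a minimal usage sketch of the two helpers; the input vector and j below are made up purely for illustration -

import numpy as np

serie1 = np.random.rand(800)      # illustrative input vector
j = 5

out1 = app1(serie1, j, N=1000)    # 1000 sums of j unique random elements each
out2 = app2(serie1, j, N=1000)
print(out1.shape, out2.shape)     # (1000,) (1000,)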

Sample run to demo creating the indices idx -

In [35]: serie1 = np.random.randint(0,9,(20))

In [36]: idx = np.random.rand(1000,serie1.size).argsort(1)[:,:5]

In [37]: idx
Out[37]: 
array([[16, 13, 19,  0, 15],
       [ 7,  4, 13, 15, 14],
       [ 8,  3, 15,  1,  9],
       ..., 
       [11, 15, 17,  4, 19],
       [19,  0,  3,  7,  9],
       [10,  1, 19, 12,  6]])

Verifying uniform random sampling -

In [81]: serie1 = np.arange(20)

In [82]: j = 5

In [83]: idx = np.random.rand(1000000,serie1.size).argsort(1)[:,:j]

In [84]: np.bincount(idx.ravel())
Out[84]: 
array([250317, 250298, 250645, 249544, 250396, 249972, 249492, 250512,
       249968, 250133, 249622, 250170, 250291, 250060, 250102, 249446,
       249398, 249003, 250249, 250382])

With fairly equal counts across all 20 elements of the input array, the sampling looks pretty uniformly distributed.
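
To put a number on "fairly equal": each index should appear about 1000000 * 5 / 20 = 250000 times, and the counts above all lie within roughly 0.4% of that. A quick sketch of this check, reusing idx, j and serie1 from the run above -

counts = np.bincount(idx.ravel())
expected = idx.shape[0] * j / serie1.size          # 1000000 * 5 / 20 = 250000.0
print(np.abs(counts - expected).max() / expected)  # ~0.004 for the counts shown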

Runtime test -

In [140]: serie1 = np.random.randint(0,9,(20))

In [141]: j = 5

# @elcombato's soln
In [142]: %timeit [sum(sample(serie1, j)) for _ in range(1000)]
100 loops, best of 3: 10.7 ms per loop

# Posted solutions in this post
In [143]: %timeit app1(serie1, j, N=1000)
     ...: %timeit app2(serie1, j, N=1000)
     ...: 
1000 loops, best of 3: 943 µs per loop
1000 loops, best of 3: 870 µs per loop

3 Comments

Ok, nice answer, but I get ValueError: kth(=1300) out of bounds (800). For every one of the 1000 simulations I have to sum 1300 elements in my specific case (so j=1300), taken randomly from the array serie1, which has 800 elements.
@Klapaucius So, you are supposed to take 1300 elements out of an array that has 800 elements? So, you are okay with some of those elements being duplicated?
@Klapaucius Then, I guess Paul Panzer's solution would be the one to solve your case.

You are close:

vector = np.random.choice(serie1, (1000, j), replace=True).sum(axis=-1, keepdims=True)

Note that this is with replacement.

For not-too-large j, an acceptance-rejection scheme could be applied to eliminate repeats.

def accept_reject(serie1, j):
    efficiency_ratio = 2  # just a guess
    M = len(serie1)
    # Probability that j indices drawn with replacement are all distinct.
    accept_rate = np.prod(np.linspace(1 - (j-1)/M, 1, j))
    # Oversample so that enough all-distinct rows survive with high probability.
    n_draw = int(1000 / accept_rate + 4 * np.sqrt(1000 * (1 - accept_rate)))
    if n_draw * j * efficiency_ratio > 1000 * M:
        return use_other_solution(serie1, j)
    raw = np.random.randint(0, M, (n_draw, j))
    raw.sort(axis=-1)
    # Keep only rows whose sorted indices strictly increase, i.e. are all distinct.
    raw = raw[np.all(np.diff(raw, axis=-1) > 0, axis=-1), :]
    if len(raw) > 1000:
        raw = raw[:1000, :]
    elif len(raw) < 1000:
        return use_other_solution(serie1, j)
    return serie1[raw].sum(axis=-1, keepdims=True)
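
Here use_other_solution is left undefined; it is just a placeholder for whatever exact method you fall back to when rejection sampling would be too wasteful. One possibility (my assumption, not part of the original answer) is the argsort approach from the other answer, wrapped to match the output shape -

def use_other_solution(serie1, j, N=1000):
    # Fallback: a full argsort per row yields j unique indices without rejection.
    idx = np.random.rand(N, len(serie1)).argsort(1)[:, :j]
    return serie1[idx].sum(axis=-1, keepdims=True)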

3 Comments

This would create duplicate indices, right? I thought OP wanted j random elements, and by that I assumed j unique elements of serie1.
@Divakar Well, they have a replace=True in their code snippet. Btw., ingenious as it is, are you sure your argpartition solution is correct in the sense of sampling correctly?
Well, OP mentioned: "In order to get a random vector ... but I would not know how to proceed". Don't think that leads us to what was mentioned as the goal of the question. Added a section in my post on the sampling.

Base Python

from random import sample
vector = [sum(sample(serie1, j)) for _ in range(1000)]

With Numpy to enable replacement

import numpy as np
vector = [sum(np.random.choice(serie1, j, replace=True)) for _ in range(1000)]

3 Comments

I guess that in this case replace=True is not allowed.
That's right. So with replacement: vector = [sum(np.random.choice(serie1, j, replace=True)) for _ in range(1000)]
So, if both your solution and Paul's are correct, these two: Vector = [sum(np.random.choice(serie1, N, replace=True)) for _ in range(1000000)] and Vector = np.random.choice(serie1, (1000000, N), replace=True).sum(axis=-1, keepdims=True) should yield the same result
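
Strictly speaking the two snippets draw different random numbers, so they can only agree in distribution, not element by element. A quick sketch of such a check (here j plays the role of N in the comment above, with a smaller sample size for speed) -

a = np.array([sum(np.random.choice(serie1, j, replace=True)) for _ in range(100000)])
b = np.random.choice(serie1, (100000, j), replace=True).sum(axis=-1)
print(a.mean(), b.mean())  # means should be close
print(a.std(), b.std())    # spreads should be close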
