Python, Numpy: all UNIQUE combinations of a numpy.array() vector

Question

I want to get all unique combinations of a numpy.array vector (or a pandas.Series). I used itertools.combinations, but it is very slow: for an array of shape (1000,) it takes many hours. Here is my code using itertools (what I actually need is the absolute difference of each pair):

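import itertools
import numpy as np
import pandas as pd

def a(array):
    temp = pd.Series(dtype=float)
    for i in itertools.combinations(array, 2):
        # Series.append was removed in pandas 2.0; pd.concat is the equivalent call
        temp = pd.concat([temp, pd.Series(np.abs(i[0] - i[1]))])
    temp.index = range(len(temp))
    return temp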

As you can see, there is no repetition. sklearn.utils.extmath.cartesian is really fast, but it returns repeated pairs, which I do not want. I need help rewriting the above function without itertools so that it runs much faster on large vectors.

Is this what you're after: stackoverflow.com/questions/1208118/… ? If so, I will close this as a duplicate.
– EdChum, Commented Nov 9, 2015 at 15:35
No, because it doesn't provide unique combinations! itertools itself is very slow!
– kasra545, Commented Nov 9, 2015 at 15:38
Ensure the array is unique to start with, and mask out values which are equal (to themselves; maybe add them back in as a special case)?
– Andy Hayden, Commented Nov 9, 2015 at 15:43
If you mean the vector array, it is unique because its elements are random float64 numbers. Actually, I do not understand your suggestion!
– kasra545, Commented Nov 9, 2015 at 15:51
Another possible solution to your problem is stackoverflow.com/questions/11144513/…
– Sergey Bushmanov, Commented Nov 9, 2015 at 16:02

Rory Yorke · Accepted Answer · 2015-11-09 16:08:34Z

3

You could take the upper triangular part of a matrix formed on the Cartesian product with the binary operation (here subtraction, as in your example):

import numpy as np
n = 3
a = np.random.randn(n)
print(a)
print(a - a[:, np.newaxis])
print((a - a[:, np.newaxis])[np.triu_indices(n, 1)])

gives

[ 0.04248369 -0.80162228 -0.44504522]
[[ 0.         -0.84410597 -0.48752891]
 [ 0.84410597  0.          0.35657707]
 [ 0.48752891 -0.35657707  0.        ]]
[-0.84410597 -0.48752891  0.35657707]

With n=1000 (and output piped to /dev/null), this runs in 0.131s on my relatively modest laptop.
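The same idea can be applied without building the full n-by-n matrix, by indexing the vector directly with the upper-triangle index pairs. The sketch below is one possible variant, not part of the original answer: the helper name pairwise_abs_diff is made up for illustration, and it takes the absolute value to match the np.abs call in the question. It also checks that the result and its order agree with itertools.combinations.

```python
import itertools
import numpy as np

def pairwise_abs_diff(values):
    """Absolute differences |v[i] - v[j]| for all index pairs i < j.

    Hypothetical helper name, used here for illustration only.
    """
    values = np.asarray(values)
    # Index pairs strictly above the diagonal, in row-major order:
    # (0, 1), (0, 2), ..., (1, 2), ... which matches itertools.combinations.
    i, j = np.triu_indices(len(values), k=1)
    return np.abs(values[i] - values[j])

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)

fast = pairwise_abs_diff(a)
slow = np.array([abs(x - y) for x, y in itertools.combinations(a, 2)])

# n * (n - 1) / 2 pairs, same values in the same order as itertools
assert fast.shape == (1000 * 999 // 2,)
assert np.allclose(fast, slow)
```

If a pandas.Series is needed, wrapping the result once with pd.Series(fast) avoids growing a Series inside a loop.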

answered Nov 9, 2015 at 16:08

Rory Yorke


Lee · Answer · answered Nov 9, 2015 at 15:55 (edited by Jaroslav Bezděk, 2022-03-24 16:33:01Z)

2

For a random array of ints:

import numpy as np
import pandas as pd
import itertools as it

b = np.random.randint(0, 8, ((6,)))
# array([7, 0, 6, 7, 1, 5])
pd.Series(list(it.combinations(np.unique(b), 2)))

it returns:

0    (0, 1)
1    (0, 5)
2    (0, 6)
3    (0, 7)
4    (1, 5)
5    (1, 6)
6    (1, 7)
7    (5, 6)
8    (5, 7)
9    (6, 7)
dtype: object
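If the original element order should be kept and repeated values should not be collapsed (as requested in the comments on this answer), the pairs can be built from index pairs instead of np.unique. This is a minimal sketch under that assumption; ordered_pairs is a hypothetical helper name and is not part of the answer above.

```python
import itertools
import numpy as np

def ordered_pairs(values):
    """All pairs (v[i], v[j]) with i < j, in the order of the input array.

    Hypothetical helper name, used here for illustration only.
    """
    values = np.asarray(values)
    # Upper-triangle index pairs, excluding the diagonal (k=1)
    i, j = np.triu_indices(len(values), k=1)
    return np.column_stack((values[i], values[j]))

b = np.array([7, 0, 6, 7, 1, 5])
pairs = ordered_pairs(b)

# 6 elements give 6 * 5 / 2 = 15 pairs, in the same order as itertools
assert pairs.shape == (15, 2)
assert [tuple(p) for p in pairs.tolist()] == list(itertools.combinations(b.tolist(), 2))
```

Because the pairing is done on positions rather than values, the duplicate 7 in b produces the pair (7, 7) once, exactly as itertools.combinations(b, 2) would.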

edited Mar 24, 2022 at 16:33 by Jaroslav Bezděk

answered Nov 9, 2015 at 15:55 by Lee

4 Comments

kasra545 Over a year ago

Good and fast. But the order of the combinations has changed. For your random array I want something in this order: (7,0) (7,6) (7,1) (7,7) (7,5) (0,6) (0,7) and so on. So it's better to drop np.unique(). Thanks for your kind help :)

NealWalters Over a year ago

Assuming pd = pandas, what is "it"? Missing imports?

dk-na Over a year ago

import itertools as it is missing.

Lee Over a year ago

@dk-na good point, updated
