0

I want to get all unique combinations of a numpy.array vector (or a pandas.Series). I used itertools.combinations but it's very slow. For an array of size (1000,) it takes many hours. Here is my code using itertools (actually I use combination differences):

def a(array):
    temp = pd.Series([])
    for i in itertools.combinations(array, 2):
        temp = temp.append(pd.Series(np.abs(i[0]-i[1])))
    temp.index=range(len(temp))
    return temp

As you see there is no repetition!! The sklearn.utils.extmath.cartesian is really fast and good but it provides repetitions which I do not want! I need help rewriting above function without using itertools and much more speed for large vectors.

5
  • Is this what you're after: stackoverflow.com/questions/1208118/… if so I will close as a duplicate Commented Nov 9, 2015 at 15:35
  • No because It doesn't provide unique combinations! itertools itself is very slow! Commented Nov 9, 2015 at 15:38
  • ensure the array is unque to start and mask out values which are equal (to themselves / maybe add them back in as a special case)? Commented Nov 9, 2015 at 15:43
  • If you mean vector array, it is unique because it's elements are random float64 numbers. Actually I do not understand your suggestion! Commented Nov 9, 2015 at 15:51
  • Another possible solution(s) to your problem is stackoverflow.com/questions/11144513/… Commented Nov 9, 2015 at 16:02

2 Answers 2

3

You could take the upper triangular part of a matrix formed on the Cartesian product with the binary operation (here subtraction, as in your example):

import numpy as np
n = 3
a = np.random.randn(n)
print(a)
print(a - a[:, np.newaxis])
print((a - a[:, np.newaxis])[np.triu_indices(n, 1)])

gives

[ 0.04248369 -0.80162228 -0.44504522]
[[ 0.         -0.84410597 -0.48752891]
 [ 0.84410597  0.          0.35657707]
 [ 0.48752891 -0.35657707  0.        ]]
[-0.84410597 -0.48752891  0.35657707]

with n=1000 (and output piped to /dev/null) this runs in 0.131s on my relatively modest laptop.

Sign up to request clarification or add additional context in comments.

Comments

2

For a random array of ints:

import numpy as np
import pandas as pd
import itertools as it

b = np.random.randint(0, 8, ((6,)))
# array([7, 0, 6, 7, 1, 5])
pd.Series(list(it.combinations(np.unique(b), 2)))

it returns:

0    (0, 1)
1    (0, 5)
2    (0, 6)
3    (0, 7)
4    (1, 5)
5    (1, 6)
6    (1, 7)
7    (5, 6)
8    (5, 7)
9    (6, 7)
dtype: object

4 Comments

Good and fast. But the order of combinations has changed. According to your random array I want something with this order: (7,0) (7,6) (7,1) (7,7) (7,5) (0,6) (0,7) and so on. so it's better to delete np.unique(). Thanks for your kind help :)
Assuming pd = pandas, what is it? Missing imports?
import itertools as it is missing.
@dk-na good point, updated

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.