Sort one array by another with duplicate values?

Question

You can sort two arrays together, using one of them as the leading (key) array:

arr1inds = lead_arr1.argsort()
sorted_arr1 = lead_arr1[arr1inds]
sorted_arr2 = arr2[arr1inds]
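As a minimal runnable sketch of the snippet above: the sample values below are made up for illustration, and `kind="stable"` is an optional addition (not in the original snippet) that keeps rows with equal lead values in their original relative order.

```python
import numpy as np

# Hypothetical sample data; the lead array contains duplicates.
lead_arr1 = np.array([5, 1, 5, 3, 5])
arr2 = np.array([4.0, 2.0, 7.0, 9.0, 8.0])

# Indices that would sort the lead array; a stable sort preserves
# the original order among equal lead values.
arr1inds = lead_arr1.argsort(kind="stable")

# Apply the same permutation to both arrays.
sorted_arr1 = lead_arr1[arr1inds]
sorted_arr2 = arr2[arr1inds]

print(sorted_arr1)
print(sorted_arr2)
```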

The question is how to do this when both arrays contain duplicate values and, in addition, you want to "collapse" the repeated lead-array values into a single entry and average the arr2 values that match it.

For example:

 sorted_arr1 = [ ...5,5,5 ...]
 arr2        = [ ...4,7,8 ...]

becomes, since (4+7+8)/3 = 6.333:

 sorted_arr1 = [ ...5 ...]
 arr2        = [ ...6.333 ...]

Maybe this can be done with a loop such as "for i in np.unique(arr1)", but I was wondering whether it is possible with pure numpy?
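For reference, the loop-based baseline hinted at in the question might look like the sketch below. The sample data is hypothetical, and `np.unique` is used because NumPy arrays have no `.unique()` method; it already returns the distinct values in sorted order.

```python
import numpy as np

# Hypothetical sample data with duplicates in the lead array.
arr1 = np.array([5, 1, 5, 3, 5])
arr2 = np.array([4.0, 2.0, 7.0, 9.0, 8.0])

# Sorted distinct lead values.
uniq = np.unique(arr1)

# One boolean mask per distinct value; average the matching arr2 entries.
means = np.array([arr2[arr1 == v].mean() for v in uniq])

print(uniq)
print(means)
```

This is simple to read, but it scans arr1 once per distinct value, which is the cost a vectorized approach avoids.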

Would you consider a pandas solution?

– Quang Hoang, Commented Dec 5, 2019 at 22:17
Maybe... it could probably be translated to numpy.

– sten, Commented Dec 5, 2019 at 22:19

pyropy · Accepted Answer · 2019-12-05 22:23:57Z

1

If you want to sort one array by another, using it as the lead, you can zip the two together and pass the result to the sorted function, with a lambda as the key parameter that picks the sort key out of each tuple.

For example:

sorted(zip(arr1, arr2), key=lambda zipped: zipped[0])

Here the first value of each tuple is used as the sort key.

You can then filter and unpack the sorted tuples to get the two sorted arrays back.
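A sketch of that unpacking step, extended to the averaging asked about in the question. The sample data is hypothetical, and `itertools.groupby` is one possible way to collapse the duplicates; the answer itself does not name it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data with duplicates in the lead list.
arr1 = [5, 1, 5, 3, 5]
arr2 = [4.0, 2.0, 7.0, 9.0, 8.0]

# Sort the pairs by the lead value (sorted() is stable).
pairs = sorted(zip(arr1, arr2), key=itemgetter(0))

# Unzip back into two sorted lists.
sorted_arr1, sorted_arr2 = (list(t) for t in zip(*pairs))

# Collapse runs of equal lead values and average the matching values.
collapsed_arr1, collapsed_arr2 = [], []
for key, group in groupby(pairs, key=itemgetter(0)):
    values = [v for _, v in group]
    collapsed_arr1.append(key)
    collapsed_arr2.append(sum(values) / len(values))

print(collapsed_arr1)
print(collapsed_arr2)
```

Note that `groupby` only merges adjacent equal keys, so it relies on the pairs having been sorted first.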


Quang Hoang · Accepted Answer · 2019-12-05 22:24:46Z

1

Pandas is very convenient for grouping:

import numpy as np
import pandas as pd

a1 = np.array([1,1,1,5,5,5,3,3])
a2 = np.array([10,11,1,4,7,8,9,10])

s = pd.Series(a2).groupby(a1).transform('mean')

a1[np.argsort(s)]

Output:

array([5, 5, 5, 1, 1, 1, 3, 3])

Or do you want:

s = pd.Series(a2).groupby(a1).mean()

gives

1    7.333333
3    9.500000
5    6.333333
dtype: float64

and s.sort_values() gives

5    6.333333
1    7.333333
3    9.500000
dtype: float64
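If plain NumPy arrays are wanted at the end, the grouped Series can be split back into two arrays. This is a small sketch using the same example data as above; the names `collapsed_a1` and `collapsed_a2` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

a1 = np.array([1, 1, 1, 5, 5, 5, 3, 3])
a2 = np.array([10, 11, 1, 4, 7, 8, 9, 10])

# Group a2 by the values of a1; the result is indexed by the
# distinct a1 values in sorted order.
s = pd.Series(a2).groupby(a1).mean()

# The index holds the collapsed lead values, the values hold the means.
collapsed_a1 = s.index.to_numpy()
collapsed_a2 = s.to_numpy()

print(collapsed_a1)
print(collapsed_a2)
```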


Paul Panzer · Accepted Answer · 2019-12-06 02:46:41Z

0

np.unique and np.bincount can be used here:

# set up example
import numpy as np

a1 = np.random.randint(0,10,20)
a2 = np.random.random(20)

# solve
sa1,idx,cnt = np.unique(a1,return_counts=True,return_inverse=True)
sa2 = np.bincount(idx,a2)/cnt

# compare with brute force
np.all(sa1 == sorted(set(a1)))
# True
np.all(sa2 == [np.mean(a2[a1 == x]) for x in sa1])
# True
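To see what each intermediate array holds, here is the same technique on a small fixed input (the data from the pandas answer above, reused for illustration). The name `sums` is introduced here and is not in the original code.

```python
import numpy as np

a1 = np.array([1, 1, 1, 5, 5, 5, 3, 3])
a2 = np.array([10, 11, 1, 4, 7, 8, 9, 10])

# sa1: sorted distinct values of a1
# idx: for each element of a1, its position in sa1
# cnt: how many times each distinct value occurs
sa1, idx, cnt = np.unique(a1, return_inverse=True, return_counts=True)

# bincount with weights adds up the a2 values that share an idx.
sums = np.bincount(idx, weights=a2)

# Per-group mean = per-group sum / per-group count.
sa2 = sums / cnt

print(sa1)   # distinct lead values
print(idx)   # group label per element
print(cnt)   # group sizes
print(sa2)   # group means
```

Regardless of the keyword order in the call, `np.unique` returns the inverse indices before the counts, which is why the unpacking order is `sa1, idx, cnt`.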
