Python Numpy - Aggregate numpy array for multiple groups

Question

I have an array like this:

([(1, 1, 10),
  (1, 1, 20),
  (1, 2, 10),
  (2, 1, 30),
  (2, 1, 40),
  (2, 2, 20)],
  dtype=[('id', '<i8'), ('group', '<i8'), ('age', '<i8')])

I would like to aggregate this array, grouped by 'id' and 'group', taking the mean of 'age'.

I would like to get this result:

([(1, 1, 15),
  (1, 2, 10),
  (2, 1, 35),
  (2, 2, 20)],
  dtype=[('id', '<i8'), ('group', '<i8'), ('age', '<i8')])

I've seen easy ways with pandas but I'm really looking for a way to do it with numpy. I tried:

unique, uniqueInd, uniqueCount = np.unique(old_array['id'], return_inverse=True, return_counts=True)
means = np.bincount(uniqueInd, old_array['age'])/uniqueCount
new_array = np.dstack([unique, means])

but I can't get it to expand and group by multiple columns.

Thank you so much :)!

V. Ayrat · Accepted Answer · 2020-05-25 15:38:45Z


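Dividing np.bincount with weights (the per-group sums of 'age') by np.bincount without weights (the per-group counts) gives the per-group means. To group by more than one column, pass the multi-field selection a[['id', 'group']] to np.unique with return_inverse=True.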

import numpy as np

a = np.array([(1, 1, 10),
  (1, 1, 20),
  (1, 2, 10),
  (2, 1, 30),
  (2, 1, 40),
  (2, 2, 20)],
  dtype=[('id', '<i8'), ('group', '<i8'), ('age', '<i8')])


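# unique (id, group) pairs, plus the index of the pair each row belongs to
ans, indices = np.unique(a[['id', 'group']], return_inverse=True)
# per-group sum of 'age' divided by per-group row count
means = np.bincount(indices, a['age']) / np.bincount(indices)
answer = np.empty(means.size, dtype=a.dtype)
answer['id'] = ans['id']
answer['group'] = ans['group']
# 'age' is an integer field, so any fractional part of a mean is truncated here
answer['age'] = means
print(answer)
# [(1, 1, 15) (1, 2, 10) (2, 1, 35) (2, 2, 20)]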
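
The answer above writes the float means back into the integer 'age' field, which drops any fractional part. If fractional means matter, one possible variant is sketched below. It is only an illustration under stated assumptions: group_mean is a hypothetical helper name (not part of NumPy), the key columns are assumed to share a compatible numeric dtype so they can be stacked into one 2-D array, and the value column is stored as float64 in the output.

```python
import numpy as np

def group_mean(arr, keys, value):
    """Mean of `value` per unique combination of `keys` (hypothetical helper)."""
    # Assumption: all key columns have a compatible numeric dtype,
    # so they can be stacked into a plain 2-D array and grouped by row.
    key_cols = np.column_stack([arr[k] for k in keys])
    uniq, inverse = np.unique(key_cols, axis=0, return_inverse=True)
    # Flatten in case the installed NumPy returns a 2-D inverse when axis is given.
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=arr[value])   # per-group sums
    counts = np.bincount(inverse)                     # per-group row counts
    # Keep the key dtypes, but store the mean as float64 to avoid truncation.
    out_dtype = [(k, arr.dtype[k]) for k in keys] + [(value, 'f8')]
    out = np.empty(len(uniq), dtype=out_dtype)
    for i, k in enumerate(keys):
        out[k] = uniq[:, i]
    out[value] = sums / counts
    return out

# Sample data chosen so that one group has a non-integer mean.
a = np.array([(1, 1, 10), (1, 1, 25), (1, 2, 10), (2, 1, 30)],
             dtype=[('id', '<i8'), ('group', '<i8'), ('age', '<i8')])
result = group_mean(a, ['id', 'group'], 'age')
print(result)
```

Here the group (1, 1) has ages 10 and 25, so its mean is 17.5, which an integer 'age' field would store as 17.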
