I have an array like this:
old_array = np.array([(1, 1, 10),
                      (1, 1, 20),
                      (1, 2, 10),
                      (2, 1, 30),
                      (2, 1, 40),
                      (2, 2, 20)],
                     dtype=[('id', '<i8'), ('group', '<i8'), ('age', '<i8')])
I would like to aggregate this array, grouped by 'id' and 'group', taking the mean of 'age' within each group.
I would like to get this result:
array([(1, 1, 15),
       (1, 2, 10),
       (2, 1, 35),
       (2, 2, 20)],
      dtype=[('id', '<i8'), ('group', '<i8'), ('age', '<i8')])
I've seen easy ways with pandas but I'm really looking for a way to do it with numpy. I tried:
unique, uniqueInd, uniqueCount = np.unique(old_array['id'], return_inverse=True, return_counts=True)
means = np.bincount(uniqueInd, old_array['age'])/uniqueCount
new_array = np.dstack([unique, means])
but I can't get it to expand and group by multiple columns.
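For reference, here is a rough sketch of the direction I think this could go. It is not a confirmed solution. It assumes the two key fields can be stacked into a plain 2-D integer array and passed to np.unique with axis=0, so that each distinct ('id', 'group') row gets one inverse index. The names key_cols, unique_keys and inverse are my own. Because the target dtype keeps 'age' as an integer, a non-integral mean would be truncated by the cast.

```python
import numpy as np

old_array = np.array(
    [(1, 1, 10), (1, 1, 20), (1, 2, 10), (2, 1, 30), (2, 1, 40), (2, 2, 20)],
    dtype=[('id', '<i8'), ('group', '<i8'), ('age', '<i8')])

# Stack the grouping fields into an (n, 2) integer array (assumes both keys share a dtype).
key_cols = np.column_stack([old_array['id'], old_array['group']])

# Unique rows, plus for every input row the index of the unique row it belongs to.
unique_keys, inverse, counts = np.unique(
    key_cols, axis=0, return_inverse=True, return_counts=True)
inverse = inverse.ravel()  # guard against NumPy versions that return a 2-D inverse

# Per-group sum of 'age' divided by the per-group count gives the mean.
means = np.bincount(inverse, weights=old_array['age']) / counts

# Rebuild a structured array with the original dtype.
new_array = np.empty(len(unique_keys), dtype=old_array.dtype)
new_array['id'] = unique_keys[:, 0]
new_array['group'] = unique_keys[:, 1]
new_array['age'] = means.astype(np.int64)  # truncates non-integral means

print(new_array)
```

If fractional means matter, the 'age' field in the output dtype would presumably need to be a float (for example '<f8') instead of '<i8'.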
Thank you so much :)!