Python Pandas sort by column, but keep index same

Question

I created a data frame with the columns Country, channel, and metric_count.

It looks like

    Country     metric_count    channel
0   Country1    123472          c1
1   Country1    159392          c2
2   Country2    14599           c3
3   Country2    17382           c4

I indexed according to Country and channel using the command

df2 = df.set_index(["Country", "channel"])

This creates the following dataframe.

            metric_count
Country     channel     
Country1    category1   12347
            category2   159392
            category3   14599
            category4   17382

Country2    category1   1234

Here's what I want to do. I'd like to keep this structure the same and sort according to the metric counts. In other words, I'd like to display for each country, the top 3 channels based on the metric count.

For instance, I'd like a dataframe to display for each country, the top 3 categories ordered by descending metric_counts.

Country2    top category1   12355555
            top category2   159393
            top category3   16759

I've tried sorting first, then indexing, but the resulting data frame no longer partitions based on country. Any tips would be greatly appreciated. Thanks!

Andy Lee · Accepted Answer · 2015-07-06 15:16:36Z


After some taxing experimentation, I was able to get exactly what I wanted. I outline my steps below.

Groupby Country
```
group = df.groupby("Country")
```
At a high level, this tells pandas that we want to treat each country separately. Our goal is to find the top 3 metric counts in each country and report the corresponding channels. To do this, we define a function that sorts a group by metric_count in descending order and returns only its first 3 rows, then pass that function to the groupby object's apply method. In effect this says to pandas: "apply this sort function to each of our groups and return the top 3 results for each group".

Sort and return top 3

# sort_values replaces DataFrame.sort, which was removed in pandas 0.20
sort_function = lambda x: x.sort_values("metric_count", ascending=False)[:3]
desired_df = group.apply(sort_function)
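
The two steps above can also be combined into one chain. The following is a minimal sketch, not necessarily what the answer's author ran: the sample data is hypothetical (shaped like the frame in the question), and it swaps `apply` for `groupby(...).head(3)` on a pre-sorted frame, then restores the Country/channel index the question asked for.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "Country": ["Country1", "Country1", "Country1", "Country1", "Country2", "Country2"],
    "channel": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "metric_count": [123472, 159392, 14599, 17382, 500, 900],
})

top3 = (
    # Order rows by country, then by metric_count from largest to smallest.
    df.sort_values(["Country", "metric_count"], ascending=[True, False])
      # head(3) keeps the first three rows of each group in their sorted order.
      .groupby("Country")
      .head(3)
      # Rebuild the (Country, channel) index used in the question.
      .set_index(["Country", "channel"])
)
print(top3)
```

Countries with fewer than three channels simply keep all of their rows.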


unutbu · Accepted Answer · 2015-07-06 15:47:21Z


Use groupby/apply to sort each group individually, and pick off just the top three rows:

def top_three(grp):
    # sort_values returns a sorted copy, so return it instead of sorting in place
    return grp.sort_values('metric_count', ascending=False)[:3]
df = df.set_index(['channel'])
result = df.groupby('Country', group_keys=False).apply(top_three)

For example,

import numpy as np
import pandas as pd
np.random.seed(2015)
N = 100
df = pd.DataFrame({
    'Country': np.random.choice(['Country{}'.format(i) for i in range(3)], size=N),
    'channel': np.random.choice(['channel{}'.format(i) for i in range(4)], size=N),
    'metric_count': np.random.randint(100, size=N)
})

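def top_three(grp):
    # sort_values returns a sorted copy, so return it instead of sorting in place
    return grp.sort_values('metric_count', ascending=False)[:3]

# Keep channel as the index so it survives the groupby/apply
df = df.set_index(['channel'])
result = df.groupby('Country', group_keys=False).apply(top_three)
# Add Country to the index and put it in front of channel
result = result.set_index(['Country'], append=True)
result = result.reorder_levels(['Country', 'channel'], axis=0)
print(result)

The output posted with this answer is shown below. Its rows are not in descending order of metric_count within each Country, which suggests it was produced by the earlier `grp.sort(ascending=False)` call, whose sorted result was discarded. With `sort_values` on metric_count, each Country's three rows appear largest first.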

                   metric_count
Country  channel               
Country0 channel3            93
         channel3             0
         channel1             5
Country1 channel0            46
         channel2            86
         channel2            41
Country2 channel0             4
         channel0            51
         channel3            36
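
As an alternative sketch of the same per-group top-three idea, `nlargest` on a grouped column avoids a custom function. The sample data here is hypothetical, and this is a swapped-in technique rather than the approach used in the answer. Setting channel as the index first means the result carries a (Country, channel) MultiIndex.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "Country": ["Country1", "Country1", "Country1", "Country1", "Country2", "Country2"],
    "channel": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "metric_count": [123472, 159392, 14599, 17382, 500, 900],
})

top3 = (
    df.set_index("channel")                  # channel becomes the inner index level
      .groupby("Country")["metric_count"]    # one group of metric counts per country
      .nlargest(3)                           # three largest values per group, descending
      .to_frame()                            # back to a one-column DataFrame
)
print(top3)
```

This only works directly when a single numeric column determines the ranking; other columns would need to be joined back on the resulting index.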

edited Jul 6, 2015 at 15:47

answered Jul 5, 2015 at 17:30


1 Comment

Andy Lee Over a year ago

Thank you for the help. I didn't get the exact right answer with your approach, but it provided the necessary insight for me to make some tweaks and ultimately get the right answer.
