Indexing or sorting dataframe using list

Question

I want to sort a DataFrame based on a list. The DataFrame contains unique IDs, and I have a list of IDs.

Note: the list does not contain all of the IDs. I tried df.loc, but it has limitations.

Example code is as follows:

import pandas as pd
ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ['TV', 'AC', 'Monitor', 'Headphone', 'Laptop'],
    "rating": [1, 2, 2, 3, 2]
}

df = pd.DataFrame(ratings_dict)

trend_sort=["103","101"]

trend_sort is the list of IDs.

df.set_index('ID',inplace=True)
df=df.loc[trend_sort]
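A minimal sketch of the df.loc limitation, assuming the problem is that .loc with a list returns only the listed labels. It rebuilds the question's frame, shows that df.loc[trend_sort] keeps just two rows, and shows that a label missing from the index raises KeyError in pandas 1.0 and later. The ID "999" is a hypothetical value used only for illustration.

```python
import pandas as pd

ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
    "rating": [1, 2, 2, 3, 2],
}
df = pd.DataFrame(ratings_dict).set_index("ID")
trend_sort = ["103", "101"]

# .loc with a list selects only the listed labels, in list order,
# so the rows for 102, 104 and 105 are dropped.
subset = df.loc[trend_sort]
print(subset)

# A label that is not in the index ("999" is hypothetical) raises KeyError.
try:
    df.loc[["103", "999"]]
except KeyError as exc:
    print("KeyError:", exc)
```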

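After using df.loc, the output contains only the rows for 103 and 101; the other rows are dropped.

Expected output: the rows for 103 and 101 first, in the order of trend_sort, followed by the remaining rows.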

akuiper · Accepted Answer · 2021-09-07 18:50:41Z

3

You can find the rank for each ID first, and then sort by the rank:

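# to speed up the rank lookup, store each ID's position in a dictionary
rank = {v: i for i, v in enumerate(trend_sort)}
rank
# {'103': 0, '101': 1}

# map each ID to its rank; IDs not in trend_sort default to len(df), so they
# are sorted to the end. This uses the original df, where ID is still a column.
# iloc because argsort returns positions; kind="stable" keeps the unlisted rows
# in their original order.
df.iloc[df.ID.map(lambda x: rank.get(x, len(df))).argsort(kind="stable")]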

    ID      title  rating
2  103    Monitor       2
0  101         TV       1
1  102         AC       2
3  104  Headphone       3
4  105     Laptop       2


Scott Boston · Accepted Answer · 2021-09-07 18:51:48Z

2

You can do it like this (with ID set as the index, as in the question):

df.reindex(pd.Index(trend_sort).append(df.index[~df.index.isin(trend_sort)]))

Output:

         title  rating
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2


Andrej Kesely · Accepted Answer · 2021-09-07 18:53:52Z

2

Another solution, using the key= parameter in .sort_values:

df = df.sort_values(
    by="ID", key=lambda x: x.map({v: i for i, v in enumerate(trend_sort)})
)
print(df)

Prints:

    ID      title  rating
2  103    Monitor       2
0  101         TV       1
1  102         AC       2
3  104  Headphone       3
4  105     Laptop       2
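A short sketch of why the key= approach works without a default value: IDs that are not in trend_sort map to NaN, and sort_values places NaN last by default (na_position="last"), keeping those rows in their original order. Passing na_position="first" moves the unlisted IDs to the top instead. It assumes pandas 1.1 or later, where the key= parameter exists, and the name rank is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["101", "102", "103", "104", "105"],
    "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
    "rating": [1, 2, 2, 3, 2],
})
trend_sort = ["103", "101"]
rank = {v: i for i, v in enumerate(trend_sort)}

# Unlisted IDs map to NaN and are placed after the listed ones by default.
last = df.sort_values(by="ID", key=lambda s: s.map(rank))
# na_position="first" puts the unlisted IDs before the listed ones instead.
first = df.sort_values(by="ID", key=lambda s: s.map(rank), na_position="first")

print(last["ID"].tolist())
print(first["ID"].tolist())
```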


Jorge L. · Accepted Answer · 2021-09-07 19:03:52Z

1

The only thing I can think of is creating a new list like this:

sorted_list = trend_sort + [i for i in df.index.tolist() if i not in trend_sort]

and then:

df = df.loc[sorted_list]

Output:

         title  rating
ID
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2
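One possible refinement, assuming trend_sort can be long: the check i not in trend_sort scans the whole list for every index label. Building a set once makes each membership test constant time on average. The name listed is introduced here for illustration, and every ID in trend_sort must exist in the index, otherwise .loc raises KeyError.

```python
import pandas as pd

df = pd.DataFrame(
    {"title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
     "rating": [1, 2, 2, 3, 2]},
    index=pd.Index(["101", "102", "103", "104", "105"], name="ID"),
)
trend_sort = ["103", "101"]

# Build the lookup set once instead of scanning the list for every label.
listed = set(trend_sort)
sorted_list = trend_sort + [i for i in df.index if i not in listed]
result = df.loc[sorted_list]
print(result.index.tolist())
```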


1 Comment

Mahipal Singh Over a year ago

Thank you for the solution. All the solutions worked perfectly, but this one is the most time-efficient.

Panwen Wang · Accepted Answer · 2021-09-07 18:55:20Z

1

Just append the remaining index labels; there is no need to sort values, and no map or lambda:

trend_sort=["103","101"]
new_idx = pd.Index(trend_sort).append(df.index.difference(trend_sort))
df.loc[new_idx]

         title  rating
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2
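One detail worth knowing about this approach: Index.difference sorts its result by default (sort=None), so the remaining rows come out in sorted order rather than their original order. In the question's data the two orders coincide. The sketch below uses a hypothetical frame whose index is deliberately out of order, and sort=False to keep the original order of the remaining rows.

```python
import pandas as pd

# Hypothetical frame whose index is NOT in sorted order, to show the effect.
df = pd.DataFrame(
    {"title": ["Laptop", "AC", "Monitor", "TV"], "rating": [2, 2, 2, 1]},
    index=pd.Index(["105", "102", "103", "101"], name="ID"),
)
trend_sort = ["103", "101"]

# Default sort=None: the remaining labels are returned sorted.
sorted_rest = df.index.difference(trend_sort)
# sort=False: the remaining labels keep their original order.
original_rest = df.index.difference(trend_sort, sort=False)

new_idx = pd.Index(trend_sort).append(original_rest)
result = df.loc[new_idx]
print(sorted_rest.tolist())
print(result.index.tolist())
```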


1 Comment

Scott Boston Over a year ago

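I think I beat you to this solution. :-) The reindex and loc approaches are pretty close to the same here. reindex is a little more flexible because it handles labels that are not in the index (it fills them with NaN instead of raising a KeyError).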
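A minimal sketch of the difference described in this comment, using a small hypothetical frame and a hypothetical ID "999" that is not in the index. reindex keeps every requested label and fills the missing one with NaN (which also turns the integer rating column into floats), while .loc raises KeyError for the same list in pandas 1.0 and later.

```python
import pandas as pd

df = pd.DataFrame(
    {"title": ["TV", "AC", "Monitor"], "rating": [1, 2, 2]},
    index=pd.Index(["101", "102", "103"], name="ID"),
)
wanted = ["103", "999", "101"]  # "999" is a hypothetical ID not in the index

# reindex keeps every requested label; the missing one becomes an all-NaN row.
reindexed = df.reindex(wanted)
print(reindexed)

# .loc with the same list raises KeyError because "999" is not in the index.
try:
    df.loc[wanted]
    loc_failed = False
except KeyError:
    loc_failed = True
print("loc raised KeyError:", loc_failed)
```

If the NaN rows are unwanted, reindexed.dropna(how="all") removes them afterwards.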

gal peled · Accepted Answer · 2021-09-07 18:55:03Z

0

I am not sure exactly what you want, but if you want the rows whose IDs are in trend_sort to appear at the top of the table, in that order, you can do the following:

import pandas as pd
ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ['TV', 'AC', 'Monitor', 'Headphone', 'Laptop'],
    "rating": [1, 2, 2, 3, 2]
}

df = pd.DataFrame(ratings_dict)

trend_sort=["103","101"]

def get_sort_value(item,trend_l):
    if item in trend_l:
        return trend_l.index(item)
    return len(trend_l)+1

df['sort_column'] = df['ID'].apply(lambda x: get_sort_value(x,trend_sort))
df = df.sort_values(by=['sort_column'], kind='stable')  # stable: tied rows keep their original order
df = df.drop(columns=['sort_column'])
print(df)
