1

I want to sort a DataFrame based on a list. The DataFrame contains unique IDs, and I have a list of IDs that defines the order I want.

Note: the list does not contain all of the IDs. I used df.loc, but it has limitations.

Example code is as follows:

import pandas as pd
ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ['TV', 'AC', 'Monitor', 'Headphone', 'Laptop'],
    "rating": [1, 2, 2, 3, 2]
}

df = pd.DataFrame(ratings_dict)

trend_sort=["103","101"]

trend_sort is the list of IDs.

df.set_index('ID',inplace=True)
df=df.loc[trend_sort]

After using df.loc, I got this output:

[image: only the rows for IDs 103 and 101]

Expected output:

[image: the rows for IDs 103 and 101 first, followed by the remaining rows]

6 Answers

3

You can find the rank for each ID first, and then sort by the rank:

# to speed up the rank lookup, store each ID's position in a dictionary
rank = {v: i for i, v in enumerate(trend_sort)}
rank
# {'103': 0, '101': 1}

# map each ID to its rank; IDs not in the list default to len(df),
# so they are sorted to the end
# argsort returns integer positions, hence iloc; kind="stable" keeps ties in order
df.iloc[df.ID.map(lambda x: rank.get(x, len(df))).argsort(kind="stable")]

    ID      title  rating
2  103    Monitor       2
0  101         TV       1
1  102         AC       2
3  104  Headphone       3
4  105     Laptop       2
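
One caveat about the rank approach may be worth illustrating: argsort returns integer positions, not index labels, so positional selection is the safer choice when the frame does not have the default 0..n-1 index. The following is a minimal sketch under that assumption; the non-default index values are hypothetical and only there to make the difference visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ID": ["101", "102", "103", "104", "105"],
        "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
        "rating": [1, 2, 2, 3, 2],
    },
    index=[10, 20, 30, 40, 50],  # hypothetical non-default index
)
trend_sort = ["103", "101"]

# Listed IDs get their position in the list; all others share len(df).
rank = {v: i for i, v in enumerate(trend_sort)}
keys = df["ID"].map(lambda x: rank.get(x, len(df)))

# A stable argsort keeps rows with equal keys in their original order.
order = np.argsort(keys.to_numpy(), kind="stable")

# The values in `order` are positions, so select with iloc rather than loc.
result = df.iloc[order]
print(result)
```

With the default index, positions and labels coincide, which is why label-based selection also works on the example frame.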

2

You can do it like:

df.reindex(pd.Index(trend_sort).append(df.index[~df.index.isin(trend_sort)]))

Output:

         title  rating
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2

2

Another solution, using the key= parameter in .sort_values (available since pandas 1.1):

df = df.sort_values(
    by="ID", key=lambda x: x.map({v: i for i, v in enumerate(trend_sort)})
)
print(df)

Prints:

    ID      title  rating
2  103    Monitor       2
0  101         TV       1
1  102         AC       2
3  104  Headphone       3
4  105     Laptop       2
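
To see why the unlisted IDs end up at the bottom here, it may help to look at the mapped key on its own: IDs missing from the dictionary map to NaN, and sort_values places NaN last by default. This is a small sketch on the question's data; the names position and sort_key are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": ["101", "102", "103", "104", "105"],
        "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
        "rating": [1, 2, 2, 3, 2],
    }
)
trend_sort = ["103", "101"]

# Position of each listed ID in the desired order.
position = {v: i for i, v in enumerate(trend_sort)}

# IDs that are not keys of the dictionary become NaN after map.
sort_key = df["ID"].map(position)
print(sort_key)

# na_position="last" is the default; it is spelled out here for clarity.
result = df.sort_values(by="ID", key=lambda s: s.map(position), na_position="last")
print(result)
```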

1

The only thing I can think of is to create a new list like this:

sorted_list = trend_sort + [i for i in df.index.tolist() if i not in trend_sort]

and then:

df = df.loc[sorted_list]

output:

         title  rating
ID
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2

1 Comment

Thank you for the solution. All solutions worked perfectly, but this one is the most time-efficient.
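
On the efficiency point: with a long ID list, the `not in trend_sort` test scans the list once per row. A possible variation, sketched below, does the membership test against a set instead and also skips listed IDs that are absent from the index, so that df.loc does not raise. The names listed, head and tail are illustrative, and no timings were measured here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": ["101", "102", "103", "104", "105"],
        "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
        "rating": [1, 2, 2, 3, 2],
    }
).set_index("ID")
trend_sort = ["103", "101"]

# A set gives constant-time membership checks on average.
listed = set(trend_sort)

# Listed IDs first, keeping only those that exist in the index.
head = [i for i in trend_sort if i in df.index]

# Then every remaining ID, in its original order.
tail = [i for i in df.index if i not in listed]

result = df.loc[head + tail]
print(result)
```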
1

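Just append the remaining index labels; there is no need to sort values, use map, or write a lambda. (Index.difference sorts its result by default; pass sort=False to keep the remaining rows in their original order.)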

trend_sort=["103","101"]
new_idx = pd.Index(trend_sort).append(df.index.difference(trend_sort))
df.loc[new_idx]
         title  rating
      <object> <int64>
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2

1 Comment

I think I beat you to this solution. :-) reindex and loc are pretty close to the same here. reindex is a little more flexible because it can handle values that are not in the index.
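
The difference mentioned in this comment can be shown with a list that contains an ID missing from the frame. In the sketch below, "999" is a hypothetical ID added only for the demonstration: loc raises a KeyError for it, while reindex returns a row of NaN that can be dropped afterwards.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": ["101", "102", "103", "104", "105"],
        "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
        "rating": [1, 2, 2, 3, 2],
    }
).set_index("ID")
trend_sort = ["103", "999", "101"]  # "999" does not exist in df (hypothetical)

# Listed IDs first, then the labels that are not in the list.
rest = df.index[~df.index.isin(trend_sort)]
new_idx = pd.Index(trend_sort).append(rest)

# loc refuses list-like keys that contain missing labels.
try:
    df.loc[new_idx]
    loc_failed = False
except KeyError:
    loc_failed = True

# reindex keeps going and fills the missing label with NaN.
reindexed = df.reindex(new_idx)

# Drop the all-NaN placeholder rows if they are not wanted.
cleaned = reindexed.dropna(how="all")
print(cleaned)
```

Note that the NaN row makes the integer rating column come back as float after reindex, even once the placeholder row is dropped.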
0

I am not sure what you really want, but if you want the rows whose IDs are in trend_sort to appear at the top of the table, you can do the following:

import pandas as pd
ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ['TV', 'AC', 'Monitor', 'Headphone', 'Laptop'],
    "rating": [1, 2, 2, 3, 2]
}

df = pd.DataFrame(ratings_dict)

trend_sort=["103","101"]

def get_sort_value(item,trend_l):
    if item in trend_l:
        return trend_l.index(item)
    return len(trend_l)+1

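# temporary helper column holding each row's sort position
df['sort_column'] = df['ID'].apply(lambda x: get_sort_value(x,trend_sort))
# a stable sort keeps rows with equal sort values in their original order
df = df.sort_values(by=['sort_column'], kind='stable')
df = df.drop(columns=['sort_column'])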
print(df)
