3

I'm trying to sort a list by frequency and then by name (pandas 1.3.2, Python 3.10).

First, I count the occurrences of each item in the list; then, where the counts are equal, the names must be ordered alphabetically.

I found that everything works when len(list) < 19. Magic...

Code:

import pandas

df_data = pandas.DataFrame({
    'data': [
        '14209adobepremiere', 'adobe-flash-player', 'adobe-flash-player-cis',
        'adobe-photoshop-cc-cis', 'discord', 'discord', 'driverpack',
        'freeoffice', 'freeoffice2018', 'generals',
        'tiktok-for-pc-cis', 'tlauncher', 'utorrent', 'viber',
        'winrar', 'zoom', 'zoom', 'zoom-client-for-conferences',
        'zoom-client-for-conferences-cis',
    ]
})

with pandas.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df_data['data'].value_counts()
          .sort_index(ascending=True)
          .sort_values(ascending=False))

Expected output (by count descending, then alphabetically ascending):

discord                            2
zoom                               2
14209adobepremiere                 1
adobe-flash-player                 1
adobe-flash-player-cis             1
adobe-photoshop-cc-cis             1
driverpack                         1
freeoffice                         1
freeoffice2018                     1
generals                           1
tiktok-for-pc-cis                  1
tlauncher                          1
utorrent                           1
viber                              1
winrar                             1
zoom-client-for-conferences        1
zoom-client-for-conferences-cis    1
Name: data, dtype: int64

Actual output (by count descending, but not alphabetically ascending):

zoom                               2
discord                            2
14209adobepremiere                 1
tiktok-for-pc-cis                  1
zoom-client-for-conferences        1
winrar                             1
viber                              1
utorrent                           1
tlauncher                          1
generals                           1
adobe-flash-player                 1
freeoffice2018                     1
freeoffice                         1
driverpack                         1
adobe-photoshop-cc-cis             1
adobe-flash-player-cis             1
zoom-client-for-conferences-cis    1
Name: data, dtype: int64

Thanks in advance for any help.

2 Answers

1

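I don't think you can reliably chain .sort_index() and then .sort_values(): the second sort is not guaranteed to preserve the order produced by the first one among equal counts, because the default algorithm is not stable. One approach is to reset the index, sort by both columns at once, and then restore the index.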

# In pandas 1.3, reset_index() yields the columns 'index' (names) and 'data' (counts).
df_data['data'].value_counts() \
    .reset_index() \
    .sort_values(['data', 'index'], ascending=[False, True]) \
    .set_index('index')

                                data
index
discord                             2
zoom                                2
14209adobepremiere                  1
adobe-flash-player                  1
adobe-flash-player-cis              1
adobe-photoshop-cc-cis              1
driverpack                          1
freeoffice                          1
freeoffice2018                      1
generals                            1
tiktok-for-pc-cis                   1
tlauncher                           1
utorrent                            1
viber                               1
winrar                              1
zoom-client-for-conferences         1
zoom-client-for-conferences-cis     1
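
As a possible alternative, the chained version from the question can be kept if the second sort is stable. The sketch below assumes that passing kind='mergesort' to Series.sort_values makes equal counts keep the alphabetical order left by sort_index; the variable names data, counts and result are only illustrative and not part of the original code.

```python
import pandas

data = ['14209adobepremiere', 'adobe-flash-player', 'adobe-flash-player-cis',
        'adobe-photoshop-cc-cis', 'discord', 'discord', 'driverpack',
        'freeoffice', 'freeoffice2018', 'generals',
        'tiktok-for-pc-cis', 'tlauncher', 'utorrent', 'viber',
        'winrar', 'zoom', 'zoom', 'zoom-client-for-conferences',
        'zoom-client-for-conferences-cis']

counts = pandas.Series(data).value_counts()

# Step 1: order the names alphabetically.
# Step 2: order by count with a stable algorithm, so that names with the
#         same count stay in the alphabetical order produced by step 1.
result = counts.sort_index().sort_values(ascending=False, kind='mergesort')

with pandas.option_context('display.max_rows', None):
    print(result)
```

The default kind='quicksort' makes no promise about the relative order of equal values, which may be why the chained version happens to work on short inputs and breaks on longer ones.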

1

For counting frequencies only, you could use the collections.Counter object on the list directly and, if needed, convert the result to a pandas.DataFrame:

from collections import Counter

import pandas

data = ['14209adobepremiere', 'adobe-flash-player', 'adobe-flash-player-cis',
        'adobe-photoshop-cc-cis', 'discord', 'discord', 'driverpack',
        'freeoffice', 'freeoffice2018', 'generals',
        'tiktok-for-pc-cis', 'tlauncher', 'utorrent', 'viber',
        'winrar', 'zoom', 'zoom', 'zoom-client-for-conferences',
        'zoom-client-for-conferences-cis']

# sorted() is stable, so names with equal counts keep their first-seen order,
# which is alphabetical here only because the input list is already sorted.
pairs = sorted(Counter(data).items(), key=lambda x: x[1], reverse=True)
pandas.DataFrame(pairs, columns=['index', 'data']).set_index('index')

Output

                                 data
index                                
discord                             2
zoom                                2
14209adobepremiere                  1
adobe-flash-player                  1
adobe-flash-player-cis              1
adobe-photoshop-cc-cis              1
driverpack                          1
freeoffice                          1
freeoffice2018                      1
generals                            1
tiktok-for-pc-cis                   1
tlauncher                           1
utorrent                            1
viber                               1
winrar                              1
zoom-client-for-conferences         1
zoom-client-for-conferences-cis     1
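
If the input list is not already in alphabetical order, the tie-break can be made explicit in the sort key. This is a minimal sketch using only the standard library; the helper name sort_by_count_then_name and the sample list are hypothetical, chosen just for illustration.

```python
from collections import Counter


def sort_by_count_then_name(items):
    """Return (name, count) pairs: highest count first, ties in alphabetical order."""
    counts = Counter(items)
    # Negating the count sorts it in descending order, while the name,
    # as the second element of the key, stays in ascending order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Deliberately unsorted input to show that the tie-break does not rely on input order.
sample = ['zoom', 'viber', 'discord', 'zoom', 'adobe-flash-player', 'discord']
ordered = sort_by_count_then_name(sample)
print(ordered)
# [('discord', 2), ('zoom', 2), ('adobe-flash-player', 1), ('viber', 1)]
```

The resulting list of pairs can be passed to pandas.DataFrame(ordered, columns=['index', 'data']).set_index('index') in the same way as above.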
