Stripping strings in order to sort rows in a DataFrame

Question

I have a master CSV file that looks something like this -- it's only 1 column.

column_name
string1010string
string1013string
string1014string
string1015string
string1016string
string1018string
string1020string

Then I have a temporary CSV that I would like to keep track of separately, but also merge with the master CSV and sort in ascending order, taking only the integers into consideration. I am aware that I have to strip the strings (from the start and the end) of each row to isolate the integers and then sort, but I'm not quite sure how to approach it after that.

column_name
string1011string
string1012string
string1017string
string1019string

My function looks something like this:

import pandas as pd

def output_master_concatenated(list1, list2):
    master = pd.concat([list1, list2])
    # sorting system goes here
    master.to_csv('master.csv', index=False, sep=' ')
    return master

Ideally, this is what I would like it to look like.

column_name
string1010string
string1011string
string1012string
string1013string
string1014string
string1015string
string1016string
string1017string
string1018string
string1019string
string1020string

Update: string(integer)string is actually a link; each row is basically the same link with only the integer changing.

Is the string part literal or variable? Fixed length? Alpha only? etc. Can you update your question to reflect this?
– mozway, Commented Feb 15, 2022 at 12:14
@mozway I have updated my post. Sorry for not being clear, it's my 2nd ever post and I wanted to be brief.
– liebestod, Commented Feb 15, 2022 at 12:20
Can you provide an example? It matters if the link contains numbers.
– mozway, Commented Feb 15, 2022 at 12:20
@mozway is right. You can take advantage of the string format like http://www.company.tld/path/to/1010/page.php or http://www.compagny.tld/path/to/page.php?id=1010
– Corralien, Commented Feb 15, 2022 at 12:25
Or if your link is unfortunately like 'http://www2.example3.tld/4/x/1234'
– mozway, Commented Feb 15, 2022 at 12:26
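
The comments above point out that a real link may contain digits other than the one to sort on (in the host name or the path, for example). One possible way to handle that, sketched here under the assumption that the number of interest is always the last run of digits in the link, is to take the final match rather than the first. The helper name `last_number` and the sample links are hypothetical and only serve as an illustration.

```python
import re

def last_number(url):
    """Return the last run of digits in `url` as an int.

    Assumes the URL contains at least one digit; otherwise the [-1]
    lookup raises IndexError.
    """
    return int(re.findall(r"\d+", url)[-1])

# Hypothetical links modelled on the example given in the comments
links = [
    "http://www2.example3.tld/4/x/1234",
    "http://www2.example3.tld/4/x/999",
    "http://www2.example3.tld/4/x/1011",
]

# Sort on the trailing integer, ignoring the digits in the host and path
ordered = sorted(links, key=last_number)
print(ordered)
```

The same function could be applied to a pandas column with `Series.map(last_number)` inside a `sort_values` key, if the links follow this pattern.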

Corralien · Accepted Answer · 2022-02-15 12:15:50Z

Use sort_values with a custom key:

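import pandas as pd

df = pd.concat([df1, df2])

# Extract the first run of digits and zero-pad it so string order matches numeric order
num_sort = lambda x: x.str.extract(r'(\d+)', expand=False).str.zfill(10)
df = df.sort_values('column_name', key=num_sort, ignore_index=True)
print(df)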

# Output
         column_name
0   string1010string
1   string1011string
2   string1012string
3   string1013string
4   string1014string
5   string1015string
6   string1016string
7   string1017string
8   string1018string
9   string1019string
10  string1020string
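
The key above zero-pads the extracted digits so that string comparison agrees with numeric order. A variant, sketched here under the assumption that every row contains at least one run of digits, casts the extracted digits to integers instead, which avoids having to pick a padding width. The helper name `sort_by_embedded_int` and the sample frames are hypothetical.

```python
import pandas as pd

def sort_by_embedded_int(df, column="column_name"):
    """Sort rows by the first run of digits found in `column`.

    Assumes every value contains digits; a row without any would
    produce NaN and make the int cast fail.
    """
    # The key receives the whole column as a Series and must return
    # a Series of the same length: here, the extracted digits as ints.
    key = lambda s: s.str.extract(r"(\d+)", expand=False).astype(int)
    return df.sort_values(column, key=key, ignore_index=True)

# Hypothetical stand-ins for the master and temporary CSV contents
master = pd.DataFrame({"column_name": ["string1010string", "string1013string", "string1020string"]})
temp = pd.DataFrame({"column_name": ["string1011string", "string999string"]})

merged = sort_by_embedded_int(pd.concat([master, temp]))
print(merged["column_name"].tolist())
# ['string999string', 'string1010string', 'string1011string', 'string1013string', 'string1020string']
```

Note that `string999string` sorts first here, whereas a plain string sort would place it last.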

mozway Over a year ago

blind guessing I see :p I was waiting to get more info ;) I would use the same approach +1
