
I have a master CSV file that looks something like this -- it's only 1 column.

column_name
string1010string
string1013string
string1014string
string1015string
string1016string
string1018string
string1020string

Then I have a temporary CSV that I would like to keep track of separately, but also merge with the master CSV, with the result sorted in ascending order taking only the integers into consideration. I am aware that I have to strip the strings (from the start and the end) of each row to isolate the integers and then sort, but I'm not quite sure how to approach it after that.

column_name
string1011string
string1012string
string1017string
string1019string

My function looks something like this:

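import pandas as pd

def output_master_concatenated(list1, list2):
    # Combine the master and temporary DataFrames into one
    master = pd.concat([list1, list2])
    # TODO: sorting system (sort rows by the integer embedded in each string)
    master.to_csv('master.csv', index=False, sep=' ')
    return master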

Ideally, this is what I would like it to look like.

column_name
string1010string
string1011string
string1012string
string1013string
string1014string
string1015string
string1016string
string1017string
string1018string
string1019string
string1020string

Update: string(integer)string is actually a link; each row is basically the same link with only the integer changing.

  • is the string part literal or variable? Fixed length? Alpha only? etc. Can you update your question to reflect this? Commented Feb 15, 2022 at 12:14
  • @mozway I have updated my post. Sorry for not being clear, it's my 2nd ever post and I wanted to be brief. Commented Feb 15, 2022 at 12:20
  • can you provide an example? It matters if the link contains numbers Commented Feb 15, 2022 at 12:20
  • @mozway is right. You can take advantage of the string format like http://www.company.tld/path/to/1010/page.php or http://www.compagny.tld/path/to/page.php?id=1010 Commented Feb 15, 2022 at 12:25
  • or if your link is unfortunately like 'http://www2.example3.tld/4/x/1234' Commented Feb 15, 2022 at 12:26
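
To illustrate why the link format matters: with a URL like the last one above, a plain `(\d+)` pattern picks up the first run of digits (the `2` in `www2`), not the ID. A minimal sketch, assuming the ID is the last run of digits in the link; the first URL is the one from the comment, the second is a made-up variant for illustration:

```python
import pandas as pd

# First URL is taken from the comment above; the second is hypothetical.
links = pd.Series([
    'http://www2.example3.tld/4/x/1234',
    'http://www2.example3.tld/4/x/999',
])

# First run of digits in each link: this is the "2" from "www2".
first_run = links.str.extract(r'(\d+)', expand=False)

# Last run of digits: digits followed only by non-digits up to the end.
last_run = links.str.extract(r'(\d+)\D*$', expand=False)

print(first_run.tolist())
print(last_run.tolist())
```

Which pattern is right depends entirely on the real link format, so the regex would need adjusting once that is known.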

1 Answer


Use sort_values with a custom key:

df = pd.concat([df1, df2])

num_sort = lambda x: x.str.extract(r'(\d+)', expand=False).str.zfill(10)
df = df.sort_values('column_name', key=num_sort, ignore_index=True)
print(df)

# Output
         column_name
0   string1010string
1   string1011string
2   string1012string
3   string1013string
4   string1014string
5   string1015string
6   string1016string
7   string1017string
8   string1018string
9   string1019string
10  string1020string
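
For completeness, the same idea can be folded into a function like the one in the question. This is only a sketch: it assumes every row contains at least one run of digits and that the first run is the one to sort on. `numeric_key` and `merge_and_sort` are hypothetical names, and the sample frames just mirror part of the question's data. It compares the digits as integers instead of zero-padding them, which avoids having to pick a padding width.

```python
import pandas as pd


def numeric_key(col: pd.Series) -> pd.Series:
    # Pull the first run of digits out of each value and compare as integers.
    # Assumption: every value contains digits; otherwise astype(int) raises.
    return col.str.extract(r'(\d+)', expand=False).astype(int)


def merge_and_sort(master: pd.DataFrame, temp: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: concatenate both frames, then sort on the embedded number.
    merged = pd.concat([master, temp])
    return merged.sort_values('column_name', key=numeric_key, ignore_index=True)


master = pd.DataFrame({'column_name': ['string1010string',
                                       'string1013string',
                                       'string1020string']})
temp = pd.DataFrame({'column_name': ['string1011string',
                                     'string1012string']})

result = merge_and_sort(master, temp)
print(result)
```

The sorted frame can then be written out the same way as in the question, for example with `result.to_csv('master.csv', index=False)`.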

1 Comment

blind guessing I see :p I was waiting to get more info ;) I would use the same approach +1
