I have a DataFrame; a sample looks like this:
Type SKU Description FullDescription Size Price
Variable 2 Boots Shoes on sale XL,S,M
Variation 2.5 Boots XL XL 330
Variation 2.6 Boots S S 330
Variation 2.7 Boots M M 330
Variable 3 Helmet Helmet Sizes E42,E41
Variation 3.8 Helmet E42 E42 89
Variation 3.2 Helmet E41 E41 89
I want to sort the rows by Size within each Variable group, and sort the comma-separated Size list on the Variable row the same way, so the final DataFrame should look like this:
Type SKU Description FullDescription Size Price
Variable 2 Boots Shoes on sale S,M,XL
Variation 2.6 Boots S S 330
Variation 2.7 Boots M M 330
Variation 2.5 Boots XL XL 330
Variable 3 Helmet Helmet Sizes E41,E42
Variation 3.2 Helmet E41 E41 89
Variation 3.8 Helmet E42 E42 89
I can get that result with this code:
sizes, dig = ['S','M','XL','L',], ['000','111','333','222'] #make sure dig values do not exist as a substring anywhere in your dataframe
df = (df.assign(Size=df['Size'].replace(sizes, dig, regex=True))
.assign(grp=(df['Type'] == 'Variable').cumsum())
.sort_values(['grp', 'Type', 'Size']).drop('grp', axis=1))
df['Size'] = df['Size'].apply(lambda x: ','.join(sorted(x.split(',')))).replace(dig, sizes, regex=True)
df
The issue is that this code doesn't work on the following DataFrame:
Type SKU Description FullDescription Size Price
Variable 2 Boots Shoes on sale XL,S,3XL
Variation 2.5 Boots XL XL 330
Variation 2.6 Boots 3XL 3XL 330
Variation 2.7 Boots S S 330
Variable 3 Helmet Helmet Sizes S19, S9
Variation 3.8 Helmet E42 S19 89
Variation 3.2 Helmet E41 S9 89
It gives 'S,3XL,XL' and 'S19,S9', whereas I want this result:
Type SKU Description FullDescription Size Price
Variable 2 Boots Shoes on sale S,XL,3XL
Variation 2.7 Boots S S 330
Variation 2.5 Boots XL XL 330
Variation 2.6 Boots 3XL 3XL 330
Variable 3 Helmet Helmet Sizes S9,S19
Variation 3.2 Helmet E41 S9 89
Variation 3.8 Helmet E42 S19 89
Also, when there are more sizes, the order should be 'XXS,XS,S,M,L,XL,XXL,3XL,4XL,5XL', and for the second example 'S9,S19,M9,M19,L9' and so on.
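The order described above can be written down as a sort key. This is a minimal sketch in plain Python, assuming every size is either one of the listed labels or a letter code followed by digits (as in 'S9'). `BASE_ORDER`, `RANK` and `size_key` are names made up for this illustration, and anything unrecognised is simply placed last.

```python
import re

# Assumed base order, copied from the list in the question.
BASE_ORDER = ['XXS', 'XS', 'S', 'M', 'L', 'XL', 'XXL', '3XL', '4XL', '5XL']
RANK = {label: i for i, label in enumerate(BASE_ORDER)}


def size_key(size):
    """Return a tuple that sorts sizes in the intended order (hypothetical helper)."""
    size = size.strip()
    # Exact labels such as 'S' or '3XL' sort by their position in BASE_ORDER.
    if size in RANK:
        return (RANK[size], 0, size)
    # Letter code plus number, e.g. 'S19': rank by the letters, then by the number.
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', size)
    if match:
        letters, digits = match.groups()
        return (RANK.get(letters, len(RANK)), int(digits), letters)
    # Anything else goes last, in plain string order.
    return (len(RANK), 0, size)


print(sorted(['XL', 'S', '3XL'], key=size_key))                 # ['S', 'XL', '3XL']
print(sorted(['S19', 'S9', 'M9', 'L9', 'M19'], key=size_key))   # ['S9', 'S19', 'M9', 'M19', 'L9']
print(sorted(['E42', 'E41'], key=size_key))                     # ['E41', 'E42']
```

Comparing the numeric part as an integer is what puts 'S9' before 'S19'; a plain string sort compares '1' with '9' and gets it the other way round.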
This is what I have tried so far, but it does not work and gives the wrong order:
sizes, dig = ['XS','S','M','L','XL','XXL','3XL','4XL','5XL'], ['000','111','222','333','444','555','666','777','888'] #make sure dig values do not exist as a substring anywhere in your dataframe
df = (df.assign(Size=df['Size'].replace(sizes, dig, regex=True))
.assign(grp=(df['Type'] == 'variable').cumsum())
.sort_values(['grp', 'Type', 'Size']).drop('grp', axis=1))
df['Size'] = df['Size'].apply(lambda x: ','.join(sorted(x.split(',')))).replace(dig, sizes, regex=True)
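One possible way to get the order I am after, without the placeholder replacement, would be to sort with an explicit key. The following is only a sketch under assumptions: the Size column holds strings, every 'Variable' row starts a new group, and sizes are either one of the listed labels or a letter code followed by digits. `BASE_ORDER`, `size_key` and `sort_by_size` are illustrative names, not part of pandas, and the sample frame is cut down to the columns that matter.

```python
import re

import pandas as pd

# Assumed base order, copied from the list in the question.
BASE_ORDER = ['XXS', 'XS', 'S', 'M', 'L', 'XL', 'XXL', '3XL', '4XL', '5XL']
RANK = {label: i for i, label in enumerate(BASE_ORDER)}


def size_key(size):
    """Sort key for one size string (hypothetical helper)."""
    size = size.strip()
    if size in RANK:                      # exact label, e.g. 'S' or '3XL'
        return (RANK[size], 0, size)
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', size)
    if match:                             # letter code plus number, e.g. 'S19'
        letters, digits = match.groups()
        return (RANK.get(letters, len(RANK)), int(digits), letters)
    return (len(RANK), 0, size)           # unknown sizes go last


def sort_by_size(df):
    """Sort rows inside each Variable group, and the Size list on the Variable row."""
    is_parent = (df['Type'] == 'Variable').tolist()
    group = (df['Type'] == 'Variable').cumsum().tolist()
    sizes = df['Size'].tolist()

    out = df.copy()
    # Re-order the comma-separated list on Variable rows only.
    out['Size'] = [
        ','.join(sorted((p.strip() for p in s.split(',')), key=size_key)) if parent else s
        for parent, s in zip(is_parent, sizes)
    ]
    # Order row positions by group, keeping each Variable row ahead of its Variations.
    parent_key = (-1, 0, '')
    order = sorted(
        range(len(df)),
        key=lambda i: (group[i], parent_key if is_parent[i] else size_key(sizes[i])),
    )
    return out.iloc[order].reset_index(drop=True)


# Reduced version of the failing sample from the question.
sample = pd.DataFrame({
    'Type': ['Variable', 'Variation', 'Variation', 'Variation',
             'Variable', 'Variation', 'Variation'],
    'SKU': [2, 2.5, 2.6, 2.7, 3, 3.8, 3.2],
    'Size': ['XL,S,3XL', 'XL', '3XL', 'S', 'S19, S9', 'S19', 'S9'],
})
result = sort_by_size(sample)
print(result)
```

The row order is computed with Python's `sorted` on row positions rather than with `sort_values`, which keeps the key logic in one place at the cost of not being vectorised.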