0

I have a csv file with 1600 lines from which top ten lines given below

N,EQ,ADANIPORTS,ADANI PORT & SEZ LTD,384.5,385,387.8,375,376.75,792818726.1,2085488,Y, ,40850,452.35,350.45
N,EQ,ASIANPAINT,ASIAN PAINTS LIMITED,1394.75,1395,1411,1385.05,1393.5,1284559258,919355,Y, ,36117,1490.6,1090.1
N,EQ,AXISBANK,AXIS BANK LIMITED,631.75,638.05,643.4,634,639.9,9599936309,15035968,Y, ,144038,644.65,447.5
N,EQ,BAJAJ-AUTO,BAJAJ AUTO LIMITED,2685.55,2683.9,2697,2664,2682.25,1476618943,551229,Y, ,23611,3468.35,2605
N,EQ,BAJAJFINSV,BAJAJ FINSERV LTD.,7092.1,7092,7129,7025.25,7050.65,909166393.3,128111,Y, ,19707,7200,4500
N,EQ,BAJFINANCE,BAJAJ FINANCE LIMITED,2893.85,2892,2943.4,2891.05,2916.6,3884349778,1327710,Y, ,52356,2943.4,1511.2
N,EQ,BHARTIARTL,BHARTI AIRTEL LIMITED,369.9,370,370.8,365,368.95,768282183.8,2089422,Y, ,26515,564.8,331
N,EQ,BPCL,BHARAT PETROLEUM CORP LT,357.75,358.25,362,353.5,356.95,1738725370,4865929,Y, ,77863,551.55,353.5
N,EQ,CIPLA,CIPLA LTD,657.95,658,658,645,651.2,1235846442,1904031,Y, ,38575,665,507.2
N,EQ,COALINDIA,COAL INDIA LTD,289.05,287.85,293.6,287.8,291,791484837,2713583,Y, ,55421,316.95,235.85

I wanted to sort 10 the column in descending order so that top 20 I can find out. The file name is Pd240818.csv my powershell code is as below.

# To remove unwanted few lines

sls ",BE,",",EQ," .\Pd240818.csv | select -exp line | Where-Object {$_ -notmatch ',EQ, ,'} > .\temp.csv

#Sorting line is as follows

gc .\temp.csv | Where-Object {$_ -notmatch 'MKT,'}|%{$_.split(",")[9]}|Sort-Object -Descending| Select-Object -first 20 > temp.txt

Sorted

I get temp.txt as follows:

99988.7
99896.5
9989273.6
99769.75
996134.55
9933960.45
99228.65
99199.95
989418.15
988423057.7
9884111.1
98572145.2
982146.5
981497584.9
97982.75
9786178.9
9775915.05
9760482.5
97384498.85
971033.85

Where as if I sort the same column in excel, I get as below.

28818819313
9599936309
8459873415
6175554483
5889553012
5690666055
5439638100
5121938441
5079530750
5042021707
4972762046
4889394601
4742835986
3884349778
3690976213
3486309023
3388956937
3336437125
3206801588
3114870807

Where am I doing wrong. How to correct it?

2
  • 1
    It looks like 9 isn't the correct column. Either that, or the top 10 you're showing us here isn't actually the top 10 that's in your file. If you're going to show us 20 output records, at least show us the 20 input records that correspond to your output data. The output from your temp.txt doesn't even appear to be part of your input set. Commented Aug 25, 2018 at 21:46
  • Thanks for response. It is tenth column. I am given to understand in splitting csv file, it starts with 0,1,2 ,...Further my file is nearly having 1600 lines. I have shown only first 10 lines. When I sorted full file for top 20 in column 10 with powershell code and excel, results are given above. Commented Aug 25, 2018 at 21:56

1 Answer 1

5

The clue is seeing numbers of different lengths, all sorted together:

This is a common problem, where numbers are sorted as text, instead of number values - when we sort words it does not matter how long they are, we put all the a together, then all the b together ... do that with numbers and put all the 9 together, then all the 8 together, you see this varying length sort:

99896.5
9989273.6
99769.75

The solution is to convert the text to numbers, while sorting, then they will sort on the value:

.. | Sort-Object -Descending -Property { $_ -as [decimal] } | ..

Then the output is more like you want:

988423057.7
981497584.9
98572145.2
97384498.85
9989273.6
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.