
I'm having a problem sorting a numpy array that has numbers as strings. I need to keep these as strings because there are other words after the integers.

It's sorting negative numbers in reverse order:

>>> import numpy as np
>>> a = np.array(["3", "-2", "-1", "0", "2"])
>>> a.sort()
>>> a
array(['-1', '-2', '0', '2', '3'], dtype='|S2')

I would have expected the output to be:

array(['-2', '-1', '0', '2', '3'], dtype='|S2')

Any suggestions?

  • So you are keeping two types of data in a single string? Doesn't seem particularly suited to numpy. Commented Oct 3, 2011 at 18:05
  • "I need to keep these as strings because there are other words after the integers". So you have a string like "76 trombones", and you want to treat it like the number 76 followed by the word "trombones"? Then do that. Parse the strings and create 2-tuples of (number, rest of string). Commented Oct 4, 2011 at 0:11
  • No, it's not well-behaved. Sometimes it's a number and string, sometimes it's just a string. The "natural sorting" approach works. Commented Oct 4, 2011 at 0:33

2 Answers


You could use natural sorting:

import numpy as np
import re

def atoi(text):
    try:
        return int(text)
    except ValueError:
        return text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    '''    
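    # Raw string so that \d reaches the regex engine unchanged; the capturing
    # group keeps each (optionally negative) integer in the split result.
    return [atoi(c) for c in re.split(r'(-?\d+)', text)]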

a = np.array(["3", "-2", "-1", "0", "2", "word"])
print(sorted(a,key=natural_keys))
# ['-2', '-1', '0', '2', '3', 'word']

a = np.array(["3", "-2", "-1", "0", "2", "word", "-1 word", "-2 up"])
print(sorted(a,key=natural_keys))
# ['-2', '-2 up', '-1', '-1 word', '0', '2', '3', 'word']
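
Note that sorted() returns a plain Python list. If a numpy array is needed afterwards, one option is to sort the indices with the same key and then index the original array. The following is only a minimal sketch under that assumption: natural_sort_array is a hypothetical helper name, not part of numpy, and natural_keys is restated here so the snippet runs on its own.

```python
import re
import numpy as np

def natural_keys(text):
    # Split on optionally negative integers; because of the capturing group,
    # odd positions hold the digit runs and even positions hold the text.
    parts = re.split(r'(-?\d+)', text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def natural_sort_array(arr):
    # Hypothetical helper: order the indices by natural key, then use
    # fancy indexing so the result is still a numpy array.
    items = arr.tolist()
    order = sorted(range(len(items)), key=lambda i: natural_keys(items[i]))
    return arr[order]

a = np.array(["3", "-2", "-1", "0", "2", "word", "-1 word", "-2 up"])
result = natural_sort_array(a)
print(result.tolist())
# ['-2', '-2 up', '-1', '-1 word', '0', '2', '3', 'word']
```

One caveat with this regex: a hyphen directly before digits is always read as a minus sign, so a string such as "item-3" would be keyed with the integer -3.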

3 Comments

That will get the wrong order if you try to sort ["-1 word", "-2 up"], which is what I think the OP meant by "other words after the integers".
I posted the output when the array contains ["-1 word", "-2 up"]. I think the order is correct, no?
You're right. I misread your regex. Looks good to me, depending on how you want to handle the case where no integer appears at the beginning of a string! (Mine raises a ValueError.)

Assuming there's a space after the integer and before the other words, then if a were a regular Python list you'd do:

a.sort(key = lambda s: int(s.split()[0]))

Not sure what the equivalent is in numpy (don't see how to specify a key), but one possibility is to convert to a list and back to an array.
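
The list round trip mentioned above might look like the sketch below. It assumes every string starts with an integer followed by whitespace or the end of the string (otherwise int() raises ValueError); leading_int is a hypothetical helper name and the sample data is made up for illustration. The second variant avoids the round trip by building an integer key array and using np.argsort.

```python
import numpy as np

a = np.array(["3 apples", "-2 up", "-1 word", "0", "2"])

def leading_int(s):
    # Assumption: the first whitespace-separated token is an integer.
    return int(s.split()[0])

# Variant 1: convert to a list, sort with a key, convert back to an array.
sorted_a = np.array(sorted(a.tolist(), key=leading_int))

# Variant 2: build an integer key array and reorder the original with argsort.
keys = np.array([leading_int(s) for s in a.tolist()])
order = np.argsort(keys, kind="stable")
sorted_b = a[order]

print(sorted_a.tolist())
# ['-2 up', '-1 word', '0', '2', '3 apples']
```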

1 Comment

Your version works with sorted(a, key = lambda s: int(s.split()[0])). Thanks!
