I've tried to use Ned Batchelder's code to sort a NumPy matrix in human (natural) order, as proposed in the following post:

Sort numpy string array with negative numbers?

The code runs on a one-dimensional array, the command being:

print (sorted(a, key=natural_keys))

Now, my problem is that my data is a 10-column matrix and I want to sort it by one column (let's say MyColumn). I can't find a way to modify the code so that it prints the whole matrix sorted by that column. All I could come up with is this:

print (sorted(a['MyColumn'], key=natural_keys))

But, of course, only MyColumn shows up in the output, although it is correctly sorted.

Is there a way to print the whole matrix, with every row reordered to match?

Here is the command I used to load my array (I simplified my original input file to a 3-column array):

data = np.loadtxt(inputfile, dtype={'names': ('ID', 'MyColumn', 'length'),
'formats': ('int32', 'S40', 'int32')},skiprows=1, delimiter='\t')

ID  MyColumn    length
164967  BFT_job13_q1_type2  426
197388  BFT_job8_q0_type2   244
164967  BFT_job13_q0_type1  944
72406   BFT_job1_q0_type3   696

Here is what the output would ideally look like:

ID  MyColumn    length
72406   BFT_job1_q0_type3   696
197388  BFT_job8_q0_type2   244
164967  BFT_job13_q0_type1  944
164967  BFT_job13_q1_type2  426

1 Answer

If you have a np.matrix, called m:

col = 1
m[np.array(m[:,col].argsort(axis=0).tolist()).ravel()]

If you have a np.ndarray, called a:

col = 1
a[a[:,col].argsort(axis=0)]
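As a small illustration of the np.ndarray case, here is a sketch on a made-up 3x3 integer array (the values are invented for the example, not taken from the question). It only shows that indexing with the argsort of one column reorders whole rows; the comparison is the dtype's ordinary one, so on a string column this would give lexicographic order, not natural order.

```python
import numpy as np

# Made-up data: three rows, to be sorted by the middle column
a = np.array([[3, 30, 300],
              [1, 10, 100],
              [2, 20, 200]])

col = 1
# argsort returns the row indices that would put column `col` in order
order = a[:, col].argsort()
# Indexing with that index array reorders entire rows
sorted_a = a[order]

print(sorted_a)
```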

If you have a structured array with named columns:

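def mysort(data, col_name, key=None):
    # Work on a copy so the caller's array is left untouched
    d = data.copy()
    # Field names of the structured array, e.g. ('ID', 'MyColumn', 'length')
    cols = d.dtype.names
    if key:
        # Sort order of the transformed values returned by key()
        argsort = np.array([key(i) for i in d[col_name]]).argsort()
    else:
        argsort = d[col_name].argsort()
    # Reorder every field with the same permutation
    for col in cols:
        d[col] = d[col][argsort]
    return d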

For your specific case you need the following key function:

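def key(x):
    # Under Python 3, 'S40' fields come back as bytes; decode them first
    if isinstance(x, bytes):
        x = x.decode()
    x = ''.join([i for i in x if i.isdigit() or i=='_'])
    return '{1:{f}{a}10}_{2:{f}{a}10}_{3:{f}{a}10}'.format(*x.split('_'), f='0', a='>')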

d = mysort(data, 'MyColumn', key)
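If you would rather not depend on a fixed padding width, one alternative is to sort the row indices with a key that turns each digit run into an integer, then index the structured array with that order. The sketch below is only an illustration: natural_key and sort_by_field are hypothetical names (natural_key is written in the spirit of the natural_keys recipe from the question, not copied from it), and the array is built in memory from the sample rows instead of being read with np.loadtxt.

```python
import re
import numpy as np

# Sample rows copied from the question, built in memory for the example
data = np.array(
    [(164967, b'BFT_job13_q1_type2', 426),
     (197388, b'BFT_job8_q0_type2', 244),
     (164967, b'BFT_job13_q0_type1', 944),
     (72406, b'BFT_job1_q0_type3', 696)],
    dtype=[('ID', 'int32'), ('MyColumn', 'S40'), ('length', 'int32')])


def natural_key(value):
    """Hypothetical helper: split a label into text and integer chunks."""
    if isinstance(value, bytes):
        value = value.decode()
    # re.split with a capturing group alternates text chunks and digit runs
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r'(\d+)', value)]


def sort_by_field(arr, field, key):
    """Hypothetical helper: return a copy of arr with rows ordered by key(field)."""
    column = arr[field]
    # Sort the row indices, comparing rows through the key of the chosen field
    order = sorted(range(len(arr)), key=lambda i: key(column[i]))
    # Indexing with a list of indices reorders whole records
    return arr[order]


result = sort_by_field(data, 'MyColumn', natural_key)
for row in result:
    print(row['ID'], row['MyColumn'].decode(), row['length'])
```

Because the key compares integers, not zero-padded strings, there is no upper limit on the job number to keep in mind.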

11 Comments

Thanks. I still can't sort it correctly: (1) I get "IndexError: too many indices" pointing at [:,col]; (2) I changed [:,col] to [:][col], which seems to do the job, but I don't know where to insert key=natural_keys to sort it the right way.
I might add... I don't know how to check whether I have a np.ndarray or a np.matrix! I loaded a text file with np.loadtxt
you have a np.ndarray when loading with np.loadtxt()
@Sara It should work; you don't need natural_keys. You just have to specify which column you want to sort by using col, remembering that 0 is the 1st column.
For large job numbers, just use a wider field in the formatting string. For example, '{1:{f}{a}10}_{2:{f}{a}10}_{3:{f}{a}10}' pads each number to 10 digits, so it can handle a job number up to 9999999999.
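On the "IndexError: too many indices" mentioned in the comments: an array created with a structured dtype (names and formats) is one-dimensional, with one record per row, so a[:,col] has nothing to index along a second axis. Fields are selected by name instead, and data[:][1] returns the second record, not the second column. A minimal sketch for checking what kind of array you have, using a two-row stand-in for the loaded file:

```python
import numpy as np

# Two sample rows from the question, standing in for the np.loadtxt result
data = np.array(
    [(164967, b'BFT_job13_q1_type2', 426),
     (72406, b'BFT_job1_q0_type3', 696)],
    dtype=[('ID', 'int32'), ('MyColumn', 'S40'), ('length', 'int32')])

print(type(data).__name__)           # class of the object
print(isinstance(data, np.matrix))   # True only for np.matrix instances
print(data.ndim)                     # number of dimensions
print(data.dtype.names)              # field names, or None for a plain array

# Two-index access fails on a 1-D structured array
raised = False
try:
    data[:, 1]
except IndexError as exc:
    raised = True
    print('IndexError:', exc)

# Select a whole column by field name instead
print(data['MyColumn'])
```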