I would like to use numpy to reorder the rows of a square matrix so that the rows after the first follow the order of the labels in the first row. For example:

import numpy as np
a = np.array([['','z','b','d'],
              ['b','2','5','7'],
              ['d','0','1','3'],
              ['z','3','9','2']])

return:

[['','z','b','d']
 ['z','3','9','2']
 ['b','2','5','7']
 ['d','0','1','3']]

4 Comments

  • So you are looking to sort lexicographically by the 1st column? Commented Nov 6, 2012 at 23:50
  • @Bitwise: the sort should be based on the contents of the first row. I've revised the question's text and example to be more clear. Commented Nov 7, 2012 at 0:10
  • Your example seems confusing because the result is not simply a reorder of the rows (ie, the row starting with 'z' is ['z','3','9','2'] but in your return it is ['z','0','1','3']) Commented Nov 7, 2012 at 0:57
  • Not only did it seem confusing, it was confusing - when I renamed rows to clarify the goal, I entered the wrong values to return, as you point out. Sorry about that. Commented Nov 7, 2012 at 1:29

2 Answers

Here's another way, assuming that what you want is indeed to reorder the rows based on the first row:

>>> a[[list(a[:, 0]).index(i) for i in a[0]]]
array([['', 'z', 'b', 'd'],
       ['z', '3', '9', '2'],
       ['b', '2', '5', '7'],
       ['d', '0', '1', '3']], 
       dtype='|S1')
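
The one-liner can also be unpacked into separate steps. This is only a minimal sketch, assuming the labels in column 0 are unique and that every label in the first row also appears in column 0; the names `row_of` and `order` are illustrative and not part of the original answer. It swaps the repeated `list.index` scan for a dictionary lookup.

```python
import numpy as np

a = np.array([['', 'z', 'b', 'd'],
              ['b', '2', '5', '7'],
              ['d', '0', '1', '3'],
              ['z', '3', '9', '2']])

# Map each row label (column 0) to the index of the row that carries it.
# Assumption: labels are unique; a duplicate would silently keep the last row.
row_of = {label: i for i, label in enumerate(a[:, 0].tolist())}

# For every label in the header row, look up which row should come next.
order = [row_of[label] for label in a[0].tolist()]

# Fancy indexing with a list of row indices returns the rows in that order.
result = a[order]
```

Here `order` comes out as `[0, 3, 1, 2]`, the same index list the comprehension above builds.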

It is unclear why you want to have this data in a numpy array, when a dictionary would probably be more appropriate. I assume you want to do some calculations on the data, for which you probably don't want a string dtype.
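
As a rough illustration of the dictionary suggestion, assuming the labels are unique and the values are meant to be integers (the names `header`, `rows`, and `ordered` are made up for this sketch):

```python
# Desired row order, taken from the first row of the question's matrix.
header = ['z', 'b', 'd']

# Each label maps directly to its row of values.
rows = {'b': [2, 5, 7],
        'd': [0, 1, 3],
        'z': [3, 9, 2]}

# Pull the rows out in header order; no sorting step is needed.
ordered = [(name, rows[name]) for name in header]
```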

In your example you want to sort by a key in the first row, presumably strings. If you want to access the array in a 'square' form (e.g. slices like a[:, 2]), all the elements have to be converted to strings. Structured arrays allow better sorting, but at the expense of having to slice through a field, like a['values'][:, 2]. Here's an example with a structured array that puts your labels into a string field 'names' and the values as integers into a field 'values'. You can then sort by the strings in 'names':

import numpy as np

a = np.array([('b', [2, 5, 7]),
              ('d', [0, 1, 3]),
              ('z', [3, 9, 2])],
             dtype=[('names', 'S1'),         # one-character label per record
                    ('values', int, (3,))])  # three integers per record

You can access the names and the values records separately:

>>> a['names']
array(['b', 'd', 'z'],
      dtype='|S1')

>>> a['values']
array([[2, 5, 7],
       [0, 1, 3],
       [3, 9, 2]])

And you can sort the values array based on a lexicographic sort of the names:

>>> a['values'][np.argsort(a['names'])]
array([[2, 5, 7],
       [0, 1, 3],
       [3, 9, 2]])

Or reorder the array with the indices that argsort returns for a different list of names (here [1, 2, 0], the permutation that would sort ['z', 'b', 'd']):

>>> a['values'][np.argsort(['z', 'b', 'd'])]
array([[0, 1, 3],
       [3, 9, 2],
       [2, 5, 7]])
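
To place the rows in a specific label order such as z, b, d, the argsort indices need one more step. A minimal sketch, assuming every wanted label exists in 'names' and labels are unique; the names `wanted`, `sorter`, `pos`, and `idx` are illustrative, and the 'names' field uses a unicode dtype ('U1') here rather than the 'S1' above so that plain Python strings compare cleanly:

```python
import numpy as np

a = np.array([('b', [2, 5, 7]),
              ('d', [0, 1, 3]),
              ('z', [3, 9, 2])],
             dtype=[('names', 'U1'), ('values', int, (3,))])

# Target order, e.g. the labels from the first row of the question's matrix.
wanted = np.array(['z', 'b', 'd'])

# Indices that sort the stored names, then the position of each wanted
# name within that sorted view.
sorter = np.argsort(a['names'])
pos = np.searchsorted(a['names'], wanted, sorter=sorter)

# Translate positions in the sorted view back to row indices in `a`.
idx = sorter[pos]
reordered = a['values'][idx]
```

With this data `idx` is `[2, 0, 1]`, so `reordered` holds the z row, then b, then d.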

3 Comments

My reading of the question suggests that the sort order is determined by the contents of row 0, whereas your code appears to sort rows lexicographically based on column 0.
@NPE That is in fact the case.
True, I just expanded on the answer to address the question. It is unclear from the question why numpy arrays are necessary.
