I am trying to find which points in my data set (a NumPy array called "matrix") are closest to a vector (an array called "vector") in n-dimensional space. Then I want to extract those same rows from a data set that is identical to "matrix" but includes additional labels ("matrix_with_labels").

vector=([1,2,3,...])
matrix=[[1,2,3,...], [2,4,6,...], ...]
matrix_with_labels=[[a,1,2,3,...], [b,2,4,6,...], ...]

Thus, I compute the distances between the vector and each item in the matrix:

dist=scipy.spatial.distance.cdist(matrix,vector,'euclidean')

Then I sort these distances to identify the closest neighbors:

sorted_index=np.argsort(dist, axis=0)

Then I try to sort the "matrix_with_labels" by "sorted_index", using numpy.take as explained in this post on SO.

result= matrix_with_labels.take(sorted_index, 0)

The outcome looks just fine until I try to process it further - it seems to have changed shape:

print(result.shape)
(20, 1, 11)

When I look at the shape of the initial "matrix_with_labels", however:

matrix_with_labels.shape
(20, 11)
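
For reference, the shape change can be reproduced with a minimal sketch. The random data, the 10 feature columns, and the numeric label column are assumptions made only for illustration; the vector is given shape (1, 10) because cdist expects two-dimensional inputs.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
matrix = rng.random((20, 10))          # 20 points, 10 features (made-up data)
vector = rng.random((1, 10))           # 2-D row vector, as cdist requires
labels = np.arange(20).reshape(-1, 1)  # hypothetical numeric labels
matrix_with_labels = np.hstack([labels, matrix])

dist = cdist(matrix, vector, 'euclidean')
sorted_index = np.argsort(dist, axis=0)
result = matrix_with_labels.take(sorted_index, 0)

print(dist.shape)          # (20, 1)
print(sorted_index.shape)  # (20, 1)
print(result.shape)        # (20, 1, 11)
```

Under these assumptions the extra axis is already present in dist and sorted_index, before take is ever called.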

The documentation on take says:

subarray : ndarray The returned array has the same type as a.

What am I doing wrong? Any help is appreciated!

  • "same type" doesn't mean "same shape". Commented May 4, 2016 at 23:20
  • What is the shape of dist and x (aka sorted_index)? matrix_with_labels.take(x, 0) is the same as matrix_with_labels[x,:]. Commented May 5, 2016 at 2:28
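
The equivalence mentioned in the second comment can be checked on a small stand-in array. This is only a sketch: the array `a` and the column-shaped index `x` are made up for illustration and are not the asker's data.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)     # stand-in for matrix_with_labels
x = np.array([[2], [0], [3], [1]])  # column-shaped index, like argsort(dist, axis=0)

taken = a.take(x, 0)
indexed = a[x, :]

print(taken.shape)                     # (4, 1, 3)
print(np.array_equal(taken, indexed))  # True
```

In both forms the result shape is the shape of the index array followed by the remaining axes of `a`, which is where the extra axis of length 1 comes from.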

1 Answer

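If you're starting with a (20, 11) shape, I think the only way to get a (20, 1, 11) shape is if the index array x (sorted_index in the question) has shape (20, 1). take returns an array whose shape is the shape of the index array followed by the remaining axes of the source, and cdist against a single vector returns a (20, 1) column, so argsort along axis 0 keeps that shape.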

Try result = matrix_with_labels.take(x.reshape(-1), 0).
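
A minimal sketch of that suggestion, assuming random data, 10 feature columns, and a numeric label column (none of which come from the question):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
matrix = rng.random((20, 10))   # made-up data: 20 points, 10 features
vector = rng.random((1, 10))    # 2-D row vector, as cdist requires
matrix_with_labels = np.hstack([np.arange(20).reshape(-1, 1), matrix])

dist = cdist(matrix, vector, 'euclidean')  # shape (20, 1)
sorted_index = np.argsort(dist, axis=0)    # shape (20, 1)

# Flatten the index to 1-D so take returns plain rows
flat_index = sorted_index.reshape(-1)      # shape (20,)
result = matrix_with_labels.take(flat_index, 0)

print(result.shape)  # (20, 11)
```

With a one-dimensional index the rows come back ordered from nearest to farthest, so under these assumptions result[:k] would give the k closest labelled points.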
