
I have a large array and need to compare the distances of a set of sample rows from this array against all the other rows of the array. Below is a very simple example of my data set.

import numpy as np
import scipy.spatial.distance as sd

data = np.array(
    [[ 0.93825827,  0.26701143],
     [ 0.99121108,  0.35582816],
     [ 0.90154837,  0.86254049],
     [ 0.83149103,  0.42222948],
     [ 0.27309625,  0.38925281],
     [ 0.06510739,  0.58445673],
     [ 0.61469637,  0.05420098],
     [ 0.92685408,  0.62715114],
     [ 0.22587817,  0.56819403],
     [ 0.28400409,  0.21112043]]
)


sample_indexes = [1,2,3]

# I'd rather not make this
other_indexes = list(set(range(len(data))) - set(sample_indexes))

sample_data = data[sample_indexes]
other_data = data[other_indexes]

# compare them
dists = sd.cdist(sample_data, other_data)

Is there a way to index a numpy array for indexes that are NOT the sample indexes? In my example above I make a list called other_indexes. I'd rather not have to do this for various reasons (large data set, threading, very VERY little memory on the system this is running on, etc.). Is there a way to do something like...

other_data = data[ indexes not in sample_indexes]

I read that numpy masks can do this, but I tried...

other_data = data[~sample_indexes]

And this gives me an error. Do I have to create a mask?

2 Comments
  • Can data be arranged so that the first N rows form the sample_data and the remainder form the other_data? If so, you could define sample_data and other_data using basic slices, which return views. This would require very little extra memory since the views share the same underlying data. Commented Aug 15, 2014 at 18:32
  • Also, if you are very memory-constrained, you might consider making the arrays file-based using np.memmap (a small sketch of both ideas follows below). Commented Aug 15, 2014 at 18:33
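
A minimal sketch of the two suggestions above, assuming the rows can be arranged so that the samples come first; the file name, dtype, and shape passed to np.memmap are illustrative and must match how the file was actually written:

import numpy as np

# If the array lives in a binary file on disk, np.memmap avoids loading it
# into RAM at all (file name, dtype, and shape here are illustrative).
data = np.memmap('data.dat', dtype=np.float64, mode='r', shape=(10, 2))

N = 3                    # number of sample rows, assuming they occupy rows 0..N-1
sample_data = data[:N]   # basic slice: a view, no extra memory
other_data = data[N:]    # also a view sharing the same underlying buffer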

4 Answers

mask = np.ones(len(data), dtype=bool)  # start with every row selected
mask[sample_indexes] = False           # deselect the sample rows
other_data = data[mask]                # boolean indexing yields the remaining rows

Not the most elegant solution for what should perhaps be a single-line statement, but it's fairly efficient, and the memory overhead is minimal too.

If memory is your prime concern, np.delete would avoid creating the mask, and fancy indexing creates a copy anyway.

On second thought: np.delete does not modify the existing array, so it's pretty much exactly the single-line statement you are looking for.
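
For reference, that one-liner would look something like the sketch below (axis=0 removes whole rows; the result is a new array and data itself is untouched):

import numpy as np

# np.delete returns a new array with the listed rows removed;
# the original `data` is left as-is.
other_data = np.delete(data, sample_indexes, axis=0)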


5 Comments

Doesn't np.delete create a new copy of the array with the specified elements deleted? I'd rather not create a new array; I'd rather read from the existing one in place.
Yes, delete creates a copy. If memory is really that tight, have you considered storing your data in a PyTables array and operating on that? See for instance pytables.github.io/usersguide/…
Note: if the number of deletions is truly small, a Python loop that swaps those rows to the end of the array, followed by taking a view of the remaining leading rows, would be a simple and efficient solution (a sketch of this idea follows after these comments).
Okay, that works for my purposes. You said I can use a Python loop to swap elements to the end of an array. Can elements be swapped in place in a numpy array? How would I do this? Thanks
This is precisely what the official documentation recommends: docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html
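
A minimal sketch of the swap-to-the-end idea from the comments above. It assumes sample_indexes contains no duplicates and that the order of the remaining rows does not matter (the swaps reorder them in place); partition_to_end is an illustrative name, not a numpy function:

import numpy as np
import scipy.spatial.distance as sd

def partition_to_end(data, sample_indexes):
    """Swap the sampled rows to the tail of `data` in place and return a
    view of the leading rows, i.e. everything not in sample_indexes."""
    end = len(data)
    # Go from the largest index down, so a row swapped in from the tail can
    # never itself be an unprocessed sample row.
    for idx in sorted(sample_indexes, reverse=True):
        end -= 1
        data[[idx, end]] = data[[end, idx]]  # in-place row swap
    return data[:end]  # basic slice: a view, no copy of the kept rows

# With `data` and `sample_indexes` as in the question:
other_data = partition_to_end(data, sample_indexes)    # view of the 7 other rows
sample_data = data[len(data) - len(sample_indexes):]   # the sample rows, now in the tail
dists = sd.cdist(sample_data, other_data)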

You may want to try np.in1d:

In [5]: select = np.in1d(range(data.shape[0]), sample_indexes)

In [6]: print(data[select])
[[ 0.99121108  0.35582816]
 [ 0.90154837  0.86254049]
 [ 0.83149103  0.42222948]]
In [7]: print(data[~select])
[[ 0.93825827  0.26701143]
 [ 0.27309625  0.38925281]
 [ 0.06510739  0.58445673]
 [ 0.61469637  0.05420098]
 [ 0.92685408  0.62715114]
 [ 0.22587817  0.56819403]
 [ 0.28400409  0.21112043]]

2 Comments

Run this: a = np.array([[1,2],[3,4],[5,6]]); a[~np.array([0,1])]
For data with shape[0] == 25,000, @eelco-hoogendoorn's way of creating the boolean mask is 25x faster on my machine, which makes sense because you simply index the relevant positions, whereas here you do a lookup for each index. By the way, this lookup can be sped up 7x by using np.arange instead of range (see the sketch below).
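
A small sketch of the speed-up mentioned in the previous comment, using np.arange instead of range; np.isin is the element-wise successor to np.in1d in newer NumPy versions:

import numpy as np

select = np.isin(np.arange(data.shape[0]), sample_indexes)
sample_data = data[select]
other_data = data[~select]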

You may also use setdiff1d:

In [11]: data[np.setdiff1d(np.arange(data.shape[0]), sample_indexes)]
Out[11]: 
array([[ 0.93825827,  0.26701143],
       [ 0.27309625,  0.38925281],
       [ 0.06510739,  0.58445673],
       [ 0.61469637,  0.05420098],
       [ 0.92685408,  0.62715114],
       [ 0.22587817,  0.56819403],
       [ 0.28400409,  0.21112043]])
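
If both index collections are known to be free of duplicates, setdiff1d can skip its internal de-duplication by passing assume_unique=True; a small sketch:

import numpy as np

other_idx = np.setdiff1d(np.arange(data.shape[0]), sample_indexes,
                         assume_unique=True)
other_data = data[other_idx]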



I'm not familiar with the specifics of numpy, but here's a general solution. Suppose you have the following list:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You create another list of the indices you don't want:

inds = [1, 3, 6]

Now simply do this:

good_data = [x for x in a if x not in inds]

resulting in good_data = [0, 2, 4, 5, 7, 8, 9].
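
Applied to the question's data, the same idea would filter row indices rather than values and then index the array once; a sketch (this still builds a Python list and copies the selected rows):

keep = [i for i in range(len(data)) if i not in sample_indexes]
other_data = data[keep]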

1 Comment

This will create lots of Python objects and thus will be wildly memory-inefficient compared to numpy.
