Pandas data frame filter by list values - most efficient

Question

I have the following pandas DataFrame that I have built:

      dark  Mystery  adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000  0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0004  0.89    0.678 -0.423   0.12    0.00   0.000     0.00    0.00  0.00   0.000
0005  0.00    0.000  0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000

I also have a list that contains some of the DataFrame's row index values. After filtering, I want a new DataFrame containing only the rows whose index matches a value in the list.

l = [001,005]

This is a large DataFrame, so I am trying to do this without iterating in a loop.

[df.index[idx] for idx in l]

This is wrong, but I feel I am close to the answer, or maybe not.

Result should be:

      dark  Mystery  adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000  0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0005  0.00    0.000  0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000

df.ix[l] will return a view of the underlying data, where l is your list. Note that idx may be a more readable name than l.
– Alexander, Commented Mar 18, 2015 at 23:34

ASGM · Accepted Answer · 2015-03-18 23:37:53Z

3

How about using .loc:

df.loc[l]

Note that in your actual example, your indices are probably strings rather than integers. When you declare l = [0001, 0005], Python 2 evaluates it as [1, 5] (Python 3 rejects integer literals with leading zeros as a syntax error). So you might want to use l = ["0001", "0005"] or use string formatting to convert the integers (as Jonathan Eunice shows in his answer).

As an aside, you should also avoid using lowercase l as a variable name, since it looks similar to 1 in many monospace typefaces.
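To make the .loc suggestion concrete, here is a minimal sketch. The frame is a hypothetical three-column cut of the one in the question, and the name `wanted` is an assumption used in place of l. It assumes the index really holds strings such as "0001". Note that the `.ix` indexer mentioned in the comment on the question was removed in pandas 1.0, so `.loc` is the label-based indexer to use on current versions.

```python
import pandas as pd

# Hypothetical cut of the question's frame; the index labels are strings.
df = pd.DataFrame(
    {
        "dark": [0.00, 0.89, 0.00],
        "comedy": [0.000, 0.000, 0.678],
        "winter": [0.56, 0.00, 0.00],
    },
    index=["0001", "0004", "0005"],
)

# String labels, so they match the dtype of the index.
wanted = ["0001", "0005"]

# Label-based selection: rows come back in the order listed in `wanted`.
result = df.loc[wanted]
print(result)
```

On recent pandas versions, `df.loc[wanted]` raises a KeyError if any label in the list is missing from the index, so this form suits cases where every label is expected to exist.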

edited Mar 18, 2015 at 23:37

answered Mar 18, 2015 at 23:31

ASGM


2 Comments

add-semi-colons Over a year ago

The values in my list are in the following format: u'0001'

ASGM Over a year ago

@Null-Hypothesis, I see. Your question seemed to indicate differently. Did it work for you?

Jonathan Eunice · Accepted Answer · 2015-03-18 23:30:57Z

1

If your DataFrame is in df:

newdf = df[df.index.isin(l)]

Of course, you have to be careful here. None of your items in l are truly in the index. l = [001,005] is the same as l = [1,5], whereas your index is really strings a la ['0001', '0002', ...]. Given that, you may want to "upgrade" your selection list l to be parallel to your index first:

l = ["{:04d}".format(i) for i in l]
newdf = df[df.index.isin(l)]
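As a runnable sketch of the two lines above, under the assumption that the selection list starts out as integers and the index holds four-character strings. The frame, the name `ids`, and the deliberately missing id 9 are illustrative and not from the question.

```python
import pandas as pd

# Hypothetical two-column cut of the question's frame, with string index labels.
df = pd.DataFrame(
    {"action": [0.00, 0.00, 0.12], "snow": [0.65, 0.00, 0.00]},
    index=["0001", "0004", "0005"],
)

ids = [1, 5, 9]  # integers; 9 is deliberately absent from the index

# Zero-pad each integer to four characters so it matches the index labels.
labels = ["{:04d}".format(i) for i in ids]

# Boolean mask over the index: True where the label is in the selection list.
mask = df.index.isin(labels)
newdf = df[mask]
print(newdf)
```

Unlike `df.loc[labels]`, the `isin` mask ignores labels that are not in the index and keeps the rows in the frame's own order, which may or may not be what you want.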

answered Mar 18, 2015 at 23:30

Jonathan Eunice


2 Comments

add-semi-colons Over a year ago

The values in my list are in the following format: u'0001'. Based on that, I don't think the formatting step is required, but that's just my guess.

Jonathan Eunice Over a year ago

@Null-Hypothesis If your values are in that format, I agree. But in your question, you state l = [001,005] which would make them integers. Even if you assume they're strings, they have the wrong number of leading zeros. So, if the selection list is already more on-target than the question suggests, great! If not, you will need to homogenize the selection list with the DataFrame index.
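One possible way to homogenize a selection list whose entries may be integers or strings with too few leading zeros is sketched below. The helper `normalize_label`, the four-character width, and the sample values are assumptions made for illustration; they are not taken from the thread.

```python
import pandas as pd

def normalize_label(value, width=4):
    # Hypothetical helper: zero-pad an int or str to `width` characters.
    return str(value).zfill(width)

# Hypothetical one-column cut of the question's frame.
df = pd.DataFrame({"skiing": [0.789, 0.000, 0.000]}, index=["0001", "0004", "0005"])

raw = [1, "005", u"0001"]  # mixed input: an int, a short string, a unicode string

# Normalize, then drop duplicates while preserving first-seen order.
labels = list(dict.fromkeys(normalize_label(v) for v in raw))

selected = df[df.index.isin(labels)]
print(selected)
```

If the list already contains correctly padded strings such as u'0001', the normalization step leaves them unchanged.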
