2

Related: Selecting with complex criteria from pandas.DataFrame

I have some DataFrame:

df = pd.DataFrame({'name': ['apple1', 'apple2', 'apple3', 'apple4', 'orange1', 'orange2', 'orange3', 'orange4'], 
                   'A': [0, 0, 0, 0, 0, 0 ,0, 0], 
                  'B': [0.10, -0.15, 0.25, -0.55, 0.50, -0.51, 0.70, 0], 
                  'C': [0, 0, 0.25, -0.55, 0.50, -0.51, 0.70, 0.90],
                  'D': [0.10, -0.15, 0.25, 0, 0.50, -0.51, 0.70, 0.90]})
df
name    A   B   C   D
0   apple1  0   0.10    0.00    0.10
1   apple2  0   -0.15   0.00    -0.15
2   apple3  0   0.25    0.25    0.25
3   apple4  0   -0.55   -0.55   0.00
4   orange1 0   0.50    0.50    0.50
5   orange2 0   -0.51   -0.51   -0.51
6   orange3 0   0.70    0.70    0.70
7   orange4 0   0.00    0.90    0.90

Now let's say I want to select all the rows with values less than 0.25 in A, B, C, and D:

df[(df['A'] < 0.25) & 
  (df['B'] < 0.25) &
  (df['C'] < 0.25) &
  (df['D'] < 0.25)]
    name    A   B   C   D
0   apple1  0   0.10    0.00    0.10
1   apple2  0   -0.15   0.00    -0.15
3   apple4  0   -0.55   -0.55   0.00
5   orange2 0   -0.51   -0.51   -0.51

Great, but can I achieve the same thing using a list of columns as input?

Imagine that I wanted to filter on 100 columns instead of 4.

2 Answers 2

4

In this case, because you have the same condition over multiple columns, you can use .all over axis=1 to check if the condition is True for all the selected columns:

df[df.loc[:, 'A':].lt(0.25).all(axis=1)]

      name  A     B     C     D
0   apple1  0  0.10  0.00  0.10
1   apple2  0 -0.15  0.00 -0.15
3   apple4  0 -0.55 -0.55  0.00
5  orange2  0 -0.51 -0.51 -0.51

If your columns arn't so well ordered, use double brackets for selection of your data:

df[df[['A', 'B', 'C', 'D']].lt(0.25).all(axis=1)]

Note: .lt is same as < and stands for "less than"


In case you have lot of columns with a single letter, which would be too much to select manually, we can use DataFrame.filter with regex:

df[df.filter(regex='[A-Z]').lt(0.25).all(axis=1)]
Sign up to request clarification or add additional context in comments.

Comments

2

Use DataFrame.all with axis=1 to performance a boolean indexing.

df[(df[['A','B','C','D']]<0.25).all(axis=1)]

Output

      name  A     B     C     D
0   apple1  0  0.10  0.00  0.10
1   apple2  0 -0.15  0.00 -0.15
3   apple4  0 -0.55 -0.55  0.00
5  orange2  0 -0.51 -0.51 -0.51

Another approach: pd.Index.difference

df[df[df.columns.difference(['name'])].lt(0.25).all(axis=1)]

1 Comment

nice, i'd add in filter to show how he can filter columns by regex (since he possible has 100+) or iloc or .loc to integer or name slice columns

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.