1

I have this Python Pandas DataFrame DF :

DICT = {  'letter': ['A','B','C','A','B','C','A','B','C'],
          'number': [1,1,1,2,2,2,3,3,3],
          'word'  : ['one','two','three','three','two','one','two','one','three']}

DF = pd.DataFrame(DICT)

Which looks like :

  letter  number   word
0      A       1    one
1      B       1    two
2      C       1  three
3      A       2  three
4      B       2    two
5      C       2    one
6      A       3    two
7      B       3    one
8      C       3  three

And I want to extract the lines

  letter  number   word
       A       1    one
       B       2    two
       C       3  three

First I tired :

DF[(DF['letter'].isin(("A","B","C"))) & 
    DF['number'].isin((1,2,3))        &
    DF['word'].isin(('one','two','three'))]

Of course it didn't work, and everything has been selected

Then I tested :

Bool = DF[['letter','number','word']].isin(("A",1,"one"))
DF[np.all(Bool,axis=1)]

Good, it works ! but only for one line ... If we take the next step and give an iterable to .isin() :

Bool = DF[['letter','number','word']].isin((("A",1,"one"),
                                            ("B",2,"two"),
                                            ("C",3,"three")))

Then it fails, the Boolean array is full of False ...

What I'm doing wrong ? Is there a more elegant way to do this selection based on several columns ?

(Anyway, I want to avoid a for loop, because the real DataFrames I'm using are really big, so I'm looking for the fastest optimal way to do the job)

1 Answer 1

1

Idea is create new DataFrame with all triple values and then merge with original DataFrame:

L = [("A",1,"one"),
     ("B",2,"two"),
     ("C",3,"three")]

df1 = pd.DataFrame(L, columns=['letter','number','word'])
print (df1)
  letter  number   word
0      A       1    one
1      B       2    two
2      C       3  three

df = DF.merge(df1)
print (df)
  letter  number   word
0      A       1    one
1      B       2    two
2      C       3  three

Another idea is create list of tuples, convert to Series and then compare by isin:

s = pd.Series(list(map(tuple, DF[['letter','number','word']].values.tolist())),index=DF.index)
df1 = DF[s.isin(L)]
print (df1)
  letter  number   word
0      A       1    one
4      B       2    two
8      C       3  three
Sign up to request clarification or add additional context in comments.

1 Comment

Alternative solution with a conversion to tuples worked ... but it is not really intuitive, a better solution for this common problem should be implemented by pandas' developers ;)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.