I have a sample dataframe df and an array n, both shown below. I want to select the rows whose positions are listed in the array; the expected output is shown below as well. I have tried Out = df[df.index == n], Out = df.loc[df.index == n] and df.loc[n], none of which work; the comparison fails with the error "Lengths must match to compare". Can anyone help me solve this? The array holds the row numbers of the dataframe.

df = 
             Open   High    Low    Close    Adj Close   Volume
2007-06-18  0.33979 0.33979 0.33979 0.33979 0.33979 1591888
2007-06-29  0.33074 0.33074 0.33074 0.33074 0.33074 88440
2007-06-20  0.33526 0.33526 0.33526 0.33526 0.33526 3538
2007-06-21  0.32113 0.32113 0.32113 0.32113 0.32113 3550
2007-06-22  0.34713 0.34713 0.34713 0.34713 0.34713 670
2007-06-16  0.33979 0.33979 0.33979 0.33979 0.33979 1591888
2007-06-30  0.33074 0.33074 0.33074 0.33074 0.33074 88440
2007-06-31  0.33526 0.33526 0.33526 0.33526 0.33526 3538
2007-06-44  0.32113 0.32113 0.32113 0.32113 0.32113 3550
2007-06-22  0.34713 0.34713 0.34713 0.34713 0.34713 670

n = array([0, 1, 2, 3])

Out  = 
            Open      High  Low     Close   Adj Close   Volume
2007-06-18  0.33979 0.33979 0.33979 0.33979 0.33979 1591888
2007-06-29  0.33074 0.33074 0.33074 0.33074 0.33074 88440
2007-06-20  0.33526 0.33526 0.33526 0.33526 0.33526 3538
2007-06-21  0.32113 0.32113 0.32113 0.32113 0.32113 3550
  • @Ben I already tried the above statement, but it gives me an empty dataframe. Commented Jul 17, 2018 at 13:59
  • Possible duplicate of Select Pandas rows based on list index Commented Jul 17, 2018 at 14:04

3 Answers

Use DataFrame.iloc to select rows by position:

import numpy as np
n = np.array([0, 1, 2, 3])
df = df.iloc[n]  # select rows by integer position
print(df)
               Open     High      Low    Close  Adj Close   Volume
2007-06-18  0.33979  0.33979  0.33979  0.33979    0.33979  1591888
2007-06-29  0.33074  0.33074  0.33074  0.33074    0.33074    88440
2007-06-20  0.33526  0.33526  0.33526  0.33526    0.33526     3538
2007-06-21  0.32113  0.32113  0.32113  0.32113    0.32113     3550

Pandas notation for slicing:

df.iloc[0:4,:]

2 Comments

Thanks for your answer. Can you please explain what is going on inside this code and where I made a mistake?
Sure. iloc stands for integer location: you give the position a row would have if the index ran from 0 to the size of your dataframe. So '2007-06-18' is at position 0, and to recover that row you could use either df.loc['2007-06-18', :] or df.iloc[0, :]. Your mistake was using loc instead of iloc. loc requires labels of the same datatype as the dataframe's index, which is why df.loc[n] didn't work.
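To make the loc/iloc distinction in the comment above concrete, here is a minimal sketch. The small frame, its string date labels and the variable names are assumptions made up for illustration, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with string date labels as the index.
df = pd.DataFrame(
    {
        "Open": [0.33979, 0.33074, 0.33526, 0.32113, 0.34713],
        "Volume": [1591888, 88440, 3538, 3550, 670],
    },
    index=["2007-06-18", "2007-06-19", "2007-06-20", "2007-06-21", "2007-06-22"],
)
n = np.array([0, 1, 2, 3])

# iloc selects by integer position, so the array of row numbers works directly.
by_position = df.iloc[n]

# loc selects by label; the same row is reachable either way.
by_label = df.loc["2007-06-18"]
same_row = df.iloc[0]

# To use loc with row numbers, translate the positions into labels first.
via_loc = df.loc[df.index[n]]

# Passing the integers straight to loc fails here, because none of them
# is a label in this string index.
try:
    df.loc[n]
    loc_failed = False
except KeyError:
    loc_failed = True
```

If the index happened to consist of the integers 0, 1, 2, ... then df.loc[n] and df.iloc[n] would return the same rows, which is why the difference is easy to miss.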

Replace everything between the <> with your own values:

# slice by column position
df.iloc[<start_row>:<end_row>, <column_start_position>:<column_end_position>]
# for everything in a column
df.iloc[:, <column_position>]


# slice by column name
df.loc[<start_row>:<end_row>, <column_name>]
# for everything in a column
df.loc[:, <column_name>]

Review "Indexing and selecting data" in the pandas docs too. It is very informative, if a bit confusing on the first pass.
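One detail the slicing templates above do not spell out is that iloc slices exclude the end position, while loc slices include the end label. A minimal sketch, using a made-up frame and column names chosen only for illustration:

```python
import pandas as pd

# Hypothetical frame; the labels and values are illustrative only.
df = pd.DataFrame(
    {
        "Open": [0.33979, 0.33074, 0.33526, 0.32113, 0.34713],
        "Close": [0.33979, 0.33074, 0.33526, 0.32113, 0.34713],
        "Volume": [1591888, 88440, 3538, 3550, 670],
    },
    index=["2007-06-18", "2007-06-19", "2007-06-20", "2007-06-21", "2007-06-22"],
)

# Positional slice: rows 0..3 and columns 0..1 (end positions are excluded).
pos_slice = df.iloc[0:4, 0:2]

# Label slice: both end labels are included, so this also returns four rows.
label_slice = df.loc["2007-06-18":"2007-06-21", "Open"]

# Everything in one column, by position and by name.
first_col = df.iloc[:, 0]
open_col = df.loc[:, "Open"]
```

Both slices cover the same four rows here, even though the positional slice ends at 4 and the label slice ends at the fourth label.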
