
Good morning,

I have some error and time data in two columns:

edf = pd.DataFrame({'error':error, 'time':time})

Which gives:

            error    time
0     0.000000e+00 -10.000
1     4.219215e-28  -9.995
2     8.870728e-28  -9.990
3     1.398745e-27  -9.985
4     1.960445e-27  -9.980
5     2.575915e-27  -9.975
6     3.249142e-27  -9.970
7     3.984379e-27  -9.965
8     4.786157e-27  -9.960
9     5.659303e-27  -9.955
10    6.608959e-27  -9.950

According to the documentation, I can use edf.set_index('time', drop=True) to set the time column as my index and drop it from its previous place in the data frame (I believe it drops by default). However, this does absolutely nothing. In fact, I was so confused that I copied and pasted the code example straight from the documentation, and it does not work either.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

Which gives:

   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

After that, running df.set_index('month') still gives:

   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Instead of what the documentation advertises:

       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

What gives?


2 Answers


set_index returns a new DataFrame by default rather than modifying the original one. So use:

# recommended
edf.set_index('time', drop=True, inplace=True)

or

edf = edf.set_index('time', drop=True)
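A minimal sketch of both forms, assuming the time grid from the question. The error values are placeholders (all zeros), since the original error array isn't shown. It also illustrates what drop controls: with the default drop=True the column moves into the index, while drop=False keeps it as a regular column too.

```python
import numpy as np
import pandas as pd

# Placeholder data: same time grid as in the question; error values are hypothetical zeros.
time = np.linspace(-10.0, -9.95, 11)
error = np.zeros_like(time)
edf = pd.DataFrame({'error': error, 'time': time})

# Calling set_index without keeping the result leaves edf untouched.
edf.set_index('time', drop=True)
print(edf.columns.tolist())  # ['error', 'time']

# Option 1: keep the returned DataFrame.
indexed = edf.set_index('time', drop=True)
print(indexed.index.name, indexed.columns.tolist())  # time ['error']

# Option 2: modify a frame in place (set_index then returns None).
edf_inplace = edf.copy()
edf_inplace.set_index('time', drop=True, inplace=True)
print(edf_inplace.index.name, edf_inplace.columns.tolist())  # time ['error']

# drop=False keeps 'time' as a column in addition to making it the index.
kept = edf.set_index('time', drop=False)
print(kept.columns.tolist())  # ['error', 'time']
```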

Comments

Personally, I always prefer to be explicit and never use inplace (i.e. I would always use the second method). In fact, inplace is expected to be deprecated. github.com/pandas-dev/pandas/issues/16529
The linked GitHub issue is still discussion and not final. In my experience, inplace=True does sometimes save a lot of memory.
It is certainly debatable, which is why I highlighted the difference of opinion. I don't believe there would be any difference in memory usage. Do you have any references where I could learn more about that? Here is some more SO discussion on inplace: stackoverflow.com/questions/45570984/…
@Alexander I don't have any reference; that came purely from my experience. However, the "inplace is good" link you shared does show the difference in memory usage.
And the "in-place is bad!" link shows that the example may not hold in practice. I think we can agree that best practice here is a point of disagreement.
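Whichever style you prefer, the two should not be combined: with inplace=True, set_index modifies the frame and returns None, so assigning its result replaces the DataFrame with None. A small sketch using the documentation's month/year/sale example:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# inplace=True mutates df and returns None.
result = df.set_index('month', inplace=True)
print(result)             # None
print(df.index.tolist())  # [1, 4, 7, 10]

# Pitfall: assigning the result of an inplace call discards the DataFrame.
df2 = pd.DataFrame({'month': [1, 4], 'sale': [55, 40]})
df2 = df2.set_index('month', inplace=True)
print(df2)                # None
```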

Most dataframe operations don't modify the original dataframe by default. Instead, they return a new dataframe as a result.

You could assign that result to a new variable, or to the same one:

df = df.set_index('month')

Or you could pass a parameter to the function to tell it to modify the original dataframe in place:

df.set_index('month', inplace=True)

This tripped me up a lot when I started working with Pandas.
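The same pattern applies to many other DataFrame methods, for example sort_values, rename and drop: each returns a new object and leaves the original alone unless you reassign it (or pass inplace=True, which these three also accept). A brief sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'month': [10, 1, 7, 4], 'sale': [31, 55, 84, 40]})

# Each call returns a new DataFrame; df itself is not modified.
sorted_df = df.sort_values('month')
renamed = df.rename(columns={'sale': 'sales'})
dropped = df.drop(columns='sale')

print(df['month'].tolist())         # [10, 1, 7, 4]  (original order kept)
print(sorted_df['month'].tolist())  # [1, 4, 7, 10]
print(renamed.columns.tolist())     # ['month', 'sales']
print(dropped.columns.tolist())     # ['month']
```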
