Using the DataFrame.set_index() method [duplicate]

Question

Good morning,

I have some error and time data in two columns:

edf = pd.DataFrame({'error':error, 'time':time})

Which gives:

            error    time
0     0.000000e+00 -10.000
1     4.219215e-28  -9.995
2     8.870728e-28  -9.990
3     1.398745e-27  -9.985
4     1.960445e-27  -9.980
5     2.575915e-27  -9.975
6     3.249142e-27  -9.970
7     3.984379e-27  -9.965
8     4.786157e-27  -9.960
9     5.659303e-27  -9.955
10    6.608959e-27  -9.950

According to the documentation, I can use edf.set_index('time', drop=True) to set the time column as my index and drop it from its previous place in the data frame (I believe it drops by default). However, this does absolutely nothing. In fact, I was so confused that I copied and pasted the code example straight from the documentation, and it does not work either.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

Which gives,

   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

After which, df.set_index('month') also gives:

   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Instead of what the documentation advertises:

       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

What gives?

Quang Hoang · Accepted Answer · 2019-10-15 18:32:34Z

1

set_index returns a new dataframe by default; it does not modify edf itself. So use:

# recommended
edf.set_index('time', drop=True, inplace=True)

or

edf = edf.set_index('time', drop=True)
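As a minimal sketch of the difference, assuming pandas is installed and using three rows copied from the question's table in place of the full error and time arrays (the name indexed is only for illustration):

```python
import pandas as pd

# Three sample rows taken from the table in the question.
edf = pd.DataFrame({'error': [0.0, 4.219215e-28, 8.870728e-28],
                    'time': [-10.000, -9.995, -9.990]})

# Calling set_index without keeping the result leaves edf untouched.
edf.set_index('time', drop=True)
print(list(edf.columns))      # 'time' is still an ordinary column

# Keeping the return value gives the re-indexed frame.
indexed = edf.set_index('time', drop=True)
print(indexed.index.name)     # the index is now named after the column
print(list(indexed.columns))  # only 'error' remains as a column
```

Assigning back to the same name (edf = edf.set_index('time')) has the same effect as the illustrative indexed variable, just without keeping the original frame around.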


7 Comments

Alexander Over a year ago

Personally, I always prefer to be explicit and never use inplace (i.e. I would always use the second method). In fact, inplace is expected to be deprecated. github.com/pandas-dev/pandas/issues/16529

Quang Hoang Over a year ago

The linked GitHub issue is still under discussion and not final. In my experience, inplace=True does sometimes save a lot of memory.

Alexander Over a year ago

It is certainly debatable, which is why I highlighted the difference of opinion. I don't believe there would be any difference in memory usage. Do you have any references where I could learn more about that? Here is some more SO discussion on inplace: stackoverflow.com/questions/45570984/…

Quang Hoang Over a year ago

@Alexander I don't have any reference. That came purely from my experience. However, among your links, the "inplace is good" answer does show a difference in memory usage.

Alexander Over a year ago

And the "in-place is bad!" link shows that that example may not hold in practice. I think we can agree that it is a point of disagreement as to best practice.

Dave Costa · Accepted Answer · 2019-10-15 18:34:03Z

1

Most dataframe operations don't modify the original dataframe by default. Instead, they return a new dataframe as a result.

You could assign that result to a new variable, or to the same one:

df = df.set_index('month')

Or you could pass a parameter to the function to tell it to modify the original dataframe in place:

df.set_index('month', inplace=True)

This tripped me up a lot when I started working with Pandas.
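One related pitfall is mixing the two styles. The sketch below reuses the documentation example from the question and assumes current pandas behavior, where set_index with inplace=True returns None (worth checking against your installed version). The name result is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# With inplace=True the frame is modified and nothing useful is returned.
result = df.set_index('month', inplace=True)

print(result)             # no dataframe comes back from the call
print(df.index.name)      # df itself now uses 'month' as its index
print(list(df.columns))   # 'month' is no longer a regular column
```

So writing df = df.set_index('month', inplace=True) would replace the dataframe with None; pick either reassignment or inplace=True, not both.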
