6

I would like to know how to find the difference between the maximum and minimum values of three columns in Python (the columns are named POPESTIMATE2010 through POPESTIMATE2012). Then I need to find the maximum result among all my records. In other words: which county has had the largest absolute change in population within the period 2010-2012?

E.g. if a county's population over the 3-year period is 100, 80, 130, then its largest change in the period would be |130-80| = 50.

Here is my code:

import pandas as pd
census_df = pd.read_csv('census.csv')

def answer_one():
    return ((census_df['POPESTIMATE2010'],census_df ['POPESTIMATE2011'],census_df ['POPESTIMATE2012']).max()-(census_df['POPESTIMATE2010'],census_df ['POPESTIMATE2011'],census_df ['POPESTIMATE2012']).min()).max()

answer_one()
1
  • Are those the only three columns in the DataFrame? Commented Dec 4, 2016 at 14:42

5 Answers

7

I'm not sure what the end result should be, but if you want the column with the biggest difference between its max and min values, you can do it like this:

>>> df = pd.DataFrame({'a':[3,4,6], 'b':[22,15,6], 'c':[7,18,9]})
>>> df
   a   b   c
0  3  22   7
1  4  15  18
2  6   6   9
>>> diff = df.max() - df.min()
>>> diff
a     3
b    16
c    11
dtype: int64
>>> diff.nlargest(1)
b    16
dtype: int64

And if you need just the number:

>>> diff.max()
16

And if you want the difference between the max and min values in each row, do the same thing along the other axis:

>>> diff = df.max(axis=1) - df.min(axis=1)
>>> diff
0    19
1    14
2     3
dtype: int64
>>> diff.max()
19

17 Comments

But I believe, using your numbers, saeed would want the result to be 19 (22 - 3), not 16.
@pshep123 I was editing the answer as you wrote the comment :) The description is not totally clear so I decided to give more options
What if 22 and 3 were not in the same column? Would that yield the correct result?
Thanks for your reply. What does axis=1 do here? Does it mean the column?
axis=1 makes the aggregate calculate the max / min values for each row instead of for each column.
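To make the axis comment concrete, here is a small sketch using the same DataFrame as the answer. It only shows `max`; `min` behaves the same way.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 4, 6], 'b': [22, 15, 6], 'c': [7, 18, 9]})

# axis=0 (the default): collapse the rows, giving one result per column.
per_column = df.max(axis=0)
print(per_column.tolist())   # [6, 22, 18]

# axis=1: collapse the columns, giving one result per row.
per_row = df.max(axis=1)
print(per_row.tolist())      # [22, 18, 9]
```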
3
import pandas as pd
d = {'a':[1,2,3], 'b':[4,5,6], 'c':[7,8,9]}
df = pd.DataFrame(d)

def answer_one():
    max_1 = max(df.max())
    min_1 = min(df.min())
    return max_1 - min_1

print(answer_one())  # print is a function in Python 3

and if you want to use a select group of columns:

max_1 = max(df[['a','b']].max())

2 Comments

Why list? max(df.max()) works the same, and the same applies to min.
You're absolutely right Copperfield. Thanks. Edited the answer.
1

max(list) gives you the max element in the list.

min(list) gives you the min element in the list.

The rest I assume should be fairly straightforward to understand!
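As a minimal sketch, here are the built-ins applied to the example figures from the question (100, 80, 130). The variable names are made up for illustration.

```python
# One county's population estimates over the three years (example from the question).
populations = [100, 80, 130]

# Largest absolute change within the period: biggest value minus smallest value.
largest_change = max(populations) - min(populations)
print(largest_change)  # 50
```

This handles one county at a time; for a whole DataFrame it would have to be repeated per row.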

2 Comments

I used max and min according to my code, but I couldn't extract it.
You have to use it like max(list), not list.max().
1

You need to clean your data first and keep only the columns you need. Then transpose your data frame, and get the difference between max and min from them, and finally from the diff series get idxmax.

import pandas as pd
census_df = pd.read_csv('census.csv')
ans_df = census_df[census_df["SUMLEV"] == 50]    
ans_df = ans_df[["STNAME", "CTYNAME", "POPESTIMATE2010", "POPESTIMATE2011", "POPESTIMATE2012"]]
ans_df = ans_df.set_index(["STNAME", "CTYNAME"])
diff = ans_df.T.max() - ans_df.T.min()
diff.idxmax()[1]
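The trailing `[1]` works because the index has two levels, so `idxmax()` returns a `(STNAME, CTYNAME)` tuple. A small sketch with made-up rows (only the column names mirror census.csv) shows this, and also that `max(axis=1)` gives the same result without transposing:

```python
import pandas as pd

# Invented data for illustration; the real values live in census.csv.
ans_df = pd.DataFrame({
    'STNAME': ['State X', 'State X', 'State Y'],
    'CTYNAME': ['County A', 'County B', 'County C'],
    'POPESTIMATE2010': [100, 500, 40],
    'POPESTIMATE2011': [80, 510, 45],
    'POPESTIMATE2012': [130, 505, 43],
}).set_index(['STNAME', 'CTYNAME'])

# Row-wise max minus min; equivalent to ans_df.T.max() - ans_df.T.min().
diff = ans_df.max(axis=1) - ans_df.min(axis=1)

label = diff.idxmax()   # a (state, county) tuple because of the two-level index
print(label)            # ('State X', 'County A')
print(label[1])         # County A
```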


0

I had the same problem; here is how I solved it:

def answer_one():
    # .ix and max(level=...) were removed from pandas; use .loc and groupby instead
    cols = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012',
            'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
    f1 = census_df[census_df['SUMLEV'] == 50].set_index(['STNAME', 'CTYNAME'])
    f1 = f1.loc[:, cols].stack()
    by_county = f1.groupby(level=['STNAME', 'CTYNAME'])
    f2 = by_county.max() - by_county.min()
    return f2.idxmax()[1]
