1

I am trying to sort a pandas dataframe. The data looks like-

year state district Party rank share in votes
2010 haryana kaithal Winner 1 40.12
2010 haryana kaithal bjp 2 30.52
2010 haryana kaithal NOTA 3 29
2010 goa panji Winner 3 10
2010 goa panji INC 2 40
2010 goa panji BJP 1 50
2013 up meerut Winner 2 40
2013 up meerut SP 1 60
2015 haryana kaithal Winner 2 15
2015 haryana kaithal BJP 3 35
2015 haryana kaithal INC 1 50

This data is for multiple states for multiple years. In this dataset, there are multiple values for each district. I want to calculate the margin of share for each district in this manner. I have tried this, but not able to write fully. I am not able to write code for defining the margin of share and get a dataframe with only one (margin of share) value corresponding to each district instead of party wise shares.

for year in df['YEAR']:
 for state in df['STATE']:
    for district in df['DISTRICT']:
        for rank in df['RANK']:
            for party in df['PARTY']:
                if rank==1 and party=='WINNER': 

then margin of share =Share of Winner-Share of party at rank 2. If share WINNER does not have rank 1 then Margin of Share= share of winner - share of party at rank 1.

I am basically trying to get this output-

|      year     |     state   |district| margin of share|
|---------------|-------------|--------|----------------|
|          2010    | haryana  |kaithal | 9.6            |
|          2010    | goa      |panji   | -40            |
|          2010    | up       |kaithal | -20            |
|          2015    | haryana  |kaithal | -35            |

I wish to have create a different data frame with columns Year, State, District and margin of SHARE.

2 Answers 2

2

Create MultiIndex by first 3 columns by DataFrame.set_index, create masks, filter with DataFrame.loc and subtract values, last use Series.fillna for replace not matched values by condition m3:

df1 = df.set_index(['year', 'state', 'district'])
m1 = df1.Party=='Winner'
m2 = df1['rank']==2
m3 = df1['rank']==1

s1 = (df1.loc[m1 & m3,'share in votes']
        .sub(df1.loc[m2,'share in votes']))
print (s1)
year  state    district
2010  goa      panji       NaN
      haryana  kaithal     9.6
2013  up       meerut      NaN
2015  haryana  kaithal     NaN
Name: share in votes, dtype: float64

s2 = (df1.loc[m1,'share in votes']
        .sub(df1.loc[m3,'share in votes']))
print (s2)
year  state    district
2010  haryana  kaithal      0.0
      goa      panji      -40.0
2013  up       meerut     -20.0
2015  haryana  kaithal    -35.0
Name: share in votes, dtype: float64

df = s1.fillna(s2).reset_index()
print (df)
   year    state district  share in votes
0  2010      goa    panji           -40.0
1  2010  haryana  kaithal             9.6
2  2013       up   meerut           -20.0
3  2015  haryana  kaithal           -35.0
Sign up to request clarification or add additional context in comments.

3 Comments

thanks a ton. This is also working for me in the desired manner. I appreciate your help.
@NoobCoderPy - You are welcome, should be faster, because not used groupby , per groups working by MultiIndex ;)
hi, I have another question, related to this dataset, I have posted it here <stackoverflow.com/q/74701564/20147620>, please go through it and comment of you could help.
1

use groupby and where with conditions

g = df.groupby(['year', 'state', 'district'])
cond1 = df['Party'].eq('Winner')
cond2 = df['rank'].eq(1)
cond3 = df['rank'].eq(2)
df1 = g['share in votes'].agg(lambda x: (x.where(cond1).sum() - x.where(cond3).sum()) if x.where(cond1 & cond2).sum() != 0 else (x.where(cond1).sum() - x.where(cond2).sum())).reset_index()

result(df1):

    year    state   district    share in votes
0   2010    goa     panji       -40.0
1   2010    haryana kaithal     9.6
2   2013    up      meerut      -20.0
3   2015    haryana kaithal     -35.0



if you want sort like df use following code:

df.iloc[:, :3].drop_duplicates().merge(df1)

result:

    year    state   district    share in votes
0   2010    haryana kaithal     9.6
1   2010    goa     panji       -40.0
2   2013    up      meerut      -20.0
3   2015    haryana kaithal     -35.0

1 Comment

hi, I have another question, related to this dataset, I have posted it here <stackoverflow.com/q/74701564/20147620>, please go through it and comment of you colud help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.