Python Pandas: How To Set Columns as an Index?

Question

I was wondering if I might be missing an easy way to pull in a set of column names in as an index in a data frame.

The following is the example code I set up with my current (messy) solution:

df1 = pd.DataFrame({
'A' : ['a1', 'a1', 'a2', 'a3'],
'B' : ['b1', 'b2', 'b3', 'b4'],
'D1' : [1,0,0,0],
'D2' : [0,1,1,0],
'D3' : [0,0,1,1],
})

df1 = df1.set_index(['A','B'])
b = df1.unstack().unstack()
c = b.reset_index()
c.columns = ['D','B','A','Value']
d = c.set_index(['A','B','D'])
final1 = d.unstack()

df2 = pd.DataFrame({
'A' : ['a1', 'a1', 'a2', 'a3'],
'B' : ['b1', 'b2', 'b3', 'b4'],
'D1' : [1,0,0,0],
'D2' : [0,0,0,0],
'D3' : [0,0,0,1],
})

df2 = df2.set_index(['A','B'])
b = df2.unstack().unstack()
c = b.reset_index()
c.columns = ['D','B','A','Value']
d = c.set_index(['A','B','D'])
final2 = d.unstack()

result = (final1*final2).dropna()

So just by way of more background, the actual problem I am trying to solve is as follows: I have N number of data frames (e.g. df1, df2) which consist of 1s and 0s and I am trying to find a way to use Pandas to multiply them all together based on a 3-dimensional index in order to find the intersection of them (i.e. result).

In order to do so, I thought why not convert the data set into Pandas data frames and then set the index to be the 3 dimensions. Then as shown above it should just be an easy multiplication job and Pandas will take care of the rest.

However, the data comes in the format shown in df1/df2. As such, the code above highlights my messy attempt at converting the data into a Pandas data frame with 3 indices. So, again, was wondering if there was an easier way to move a set of column names into an index.

Thanks!

Jeff · Accepted Answer · 2014-05-20 17:44:28Z

1

I think that you can just put all of your frames in a list and reduce. They will align each time; including the fill_value=1 will propogate the values when multiplied vs a NaN (which is what I think you want).

In [39]: list_of_dfs = [df1,df2]

In [40]: reduce(lambda x,y: x.mul(y,fill_value=1), list_of_dfs[1:], list_of_dfs[0])
Out[40]: 
       D1  D2  D3
A  B             
a1 b1   1   0   0
   b2   0   0   0
a2 b3   0   0   0
a3 b4   0   0   1

answered May 20, 2014 at 17:44

Jeff

130k21 gold badges223 silver badges189 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

slee Over a year ago

Ah...cool, thanks. That's a much cleaner way of solving the problem. Out of curiosity though, if one were ever to want to convert a set of columns into an index (i.e. the little block of code with the unstacking), is there a better way to do that? I can't articulate a specific example of why I'd want to do that atm, but it feels like that might be a helpful bit of knowledge.

Jeff Over a year ago

df.T.set_index(.....).T is the idiom (until we allow an axis argument to set_index)

Collectives™ on Stack Overflow

Python Pandas: How To Set Columns as an Index?

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related