1

I was wondering if I might be missing an easy way to pull in a set of column names in as an index in a data frame.

The following is the example code I set up with my current (messy) solution:

df1 = pd.DataFrame({
'A' : ['a1', 'a1', 'a2', 'a3'],
'B' : ['b1', 'b2', 'b3', 'b4'],
'D1' : [1,0,0,0],
'D2' : [0,1,1,0],
'D3' : [0,0,1,1],
})

df1 = df1.set_index(['A','B'])
b = df1.unstack().unstack()
c = b.reset_index()
c.columns = ['D','B','A','Value']
d = c.set_index(['A','B','D'])
final1 = d.unstack()

df2 = pd.DataFrame({
'A' : ['a1', 'a1', 'a2', 'a3'],
'B' : ['b1', 'b2', 'b3', 'b4'],
'D1' : [1,0,0,0],
'D2' : [0,0,0,0],
'D3' : [0,0,0,1],
})

df2 = df2.set_index(['A','B'])
b = df2.unstack().unstack()
c = b.reset_index()
c.columns = ['D','B','A','Value']
d = c.set_index(['A','B','D'])
final2 = d.unstack()

result = (final1*final2).dropna()

So just by way of more background, the actual problem I am trying to solve is as follows: I have N number of data frames (e.g. df1, df2) which consist of 1s and 0s and I am trying to find a way to use Pandas to multiply them all together based on a 3-dimensional index in order to find the intersection of them (i.e. result).

In order to do so, I thought why not convert the data set into Pandas data frames and then set the index to be the 3 dimensions. Then as shown above it should just be an easy multiplication job and Pandas will take care of the rest.

However, the data comes in the format shown in df1/df2. As such, the code above highlights my messy attempt at converting the data into a Pandas data frame with 3 indices. So, again, was wondering if there was an easier way to move a set of column names into an index.

Thanks!

1 Answer 1

1

I think that you can just put all of your frames in a list and reduce. They will align each time; including the fill_value=1 will propogate the values when multiplied vs a NaN (which is what I think you want).

In [39]: list_of_dfs = [df1,df2]

In [40]: reduce(lambda x,y: x.mul(y,fill_value=1), list_of_dfs[1:], list_of_dfs[0])
Out[40]: 
       D1  D2  D3
A  B             
a1 b1   1   0   0
   b2   0   0   0
a2 b3   0   0   0
a3 b4   0   0   1
Sign up to request clarification or add additional context in comments.

2 Comments

Ah...cool, thanks. That's a much cleaner way of solving the problem. Out of curiosity though, if one were ever to want to convert a set of columns into an index (i.e. the little block of code with the unstacking), is there a better way to do that? I can't articulate a specific example of why I'd want to do that atm, but it feels like that might be a helpful bit of knowledge.
df.T.set_index(.....).T is the idiom (until we allow an axis argument to set_index)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.