0

I have two dataframes as:

df1.ix[1:3]
DateTime
2018-01-02    [-0.0031537018416199097, 0.006451397621428631,...
2018-01-03    [-0.0028882814454597745, -0.005829869983964528...


df2.ix[1:3]
DateTime
2018-01-02    [-0.03285881500135208, -0.027806145786217932, ...
2018-01-03    [-0.0001314381449719178, -0.006278235444742629...

len(df1.ix['2018-01-02'][0])
500

len(df2.ix['2018-01-02'][0])
500

When I do df1 + df2 I get:

len((df1 + df2).ix['2018-01-02'][0])
1000

So the lists are being concatenated instead of summed.

How do I add the lists in the dataframes df1 and df2 element-wise?

1
  • Your dataframes have just one column? Commented Aug 29, 2018 at 17:04

2 Answers

1

When an operator is applied to two dataframes, it is applied element by element to the cells with matching row and column labels. In your case each element is a list, and the '+' operator between two Python lists concatenates them. That is why the resulting dataframe contains concatenated lists.
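The concatenation can be reproduced on a small scale. The following is a minimal sketch with made-up data: the three-element lists and their values are assumptions standing in for the 500-element lists in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data: one column (label 0) of lists.
idx = pd.DatetimeIndex(["2018-01-02", "2018-01-03"], name="DateTime")
df1 = pd.DataFrame({0: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}, index=idx)
df2 = pd.DataFrame({0: [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]}, index=idx)

# '+' is applied cell by cell, and list + list concatenates.
concatenated = df1 + df2
print(len(concatenated.loc[idx[0], 0]))
```

Each cell of the result holds both input lists joined end to end, so its length is the sum of the two input lengths.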

There are several ways to sum the list elements instead of concatenating them.

One approach is to expand the list elements into columns, add the dataframes, and then merge the columns back into a single list. (The other answer suggests this too, but its code does not work as written.)

Step 1: Converting list elements to columns

import pandas as pd
df1 = df1.apply(lambda row: pd.Series(row[0]), axis=1)
df2 = df2.apply(lambda row: pd.Series(row[0]), axis=1)

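We pass row[0] instead of row because each row arrives as a Series whose only entry, under column label 0, is the list. pd.Series(row[0]) turns that list into one value per column, whereas pd.Series(row) would keep the column index and leave the list intact.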

Step 2: Add dataframes

df = df1 + df2  # this dataframe will have 500 columns

Step 3: Merge columns back to lists

df = df.apply(lambda row: pd.Series({0: list(row)}), axis=1)
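Put together, the three steps can be sketched end to end as follows. The data is made up (three-element lists instead of 500), and the names wide1, wide2, summed and merged are only illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data: one column (label 0) of lists.
idx = pd.DatetimeIndex(["2018-01-02", "2018-01-03"], name="DateTime")
df1 = pd.DataFrame({0: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}, index=idx)
df2 = pd.DataFrame({0: [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]}, index=idx)

# Step 1: expand each list into one column per element.
wide1 = df1.apply(lambda row: pd.Series(row[0]), axis=1)
wide2 = df2.apply(lambda row: pd.Series(row[0]), axis=1)

# Step 2: ordinary numeric addition, column by column.
summed = wide1 + wide2

# Step 3: collapse every row back into a single list under column 0.
merged = summed.apply(lambda row: pd.Series({0: list(row)}), axis=1)
print(merged)
```

The result has the same shape as the inputs: one column, with each cell holding the element-wise sum of the two original lists.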

Step 3 raises an interesting question: why return a Series here? Why does returning just list(row) not work, leaving the 500 columns in place?

The reason is that if the returned list has the same length as the number of columns, its values are fitted back into those columns, so it looks as if nothing happened. If the list's length differs from the number of columns, each row's list is returned as a single value. (This is the behavior of pandas releases before 0.23. From 0.23 onward, a list returned from apply with axis=1 gives a Series of lists unless result_type='expand' is passed.)

Let's look at an example.

Suppose I have a dataframe with columns 0, 1 and 2.

df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})

   0  1  2
0  1  4  7
1  2  5  8
2  3  6  9

The original dataframe has 3 columns. If I return a list of two elements from each row, it works and a Series of lists is returned:

df1=df.apply(lambda row:[row[0],row[1]],axis=1)

0  [1, 4]
1  [2, 5]
2  [3, 6]
dtype: object

If I instead return a list of three numbers, the values get fitted back into the three columns:

df1=df.apply(list,axis=1)

   0  1  2
0  1  4  7
1  2  5  8
2  3  6  9

So if we want to return a list whose size equals the number of columns, we have to wrap it in a Series whose single value is that list.
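How apply treats list results depends on the pandas version, so it is worth checking on your own installation. The sketch below assumes pandas 0.23 or later, where apply accepts a result_type parameter; the names as_lists and expanded are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})

# Assumption: pandas >= 0.23, where a list returned per row is kept as one value.
as_lists = df.apply(list, axis=1)

# result_type="expand" asks apply to spread each list across columns instead.
expanded = df.apply(list, axis=1, result_type="expand")

print(type(as_lists).__name__, type(expanded).__name__)
```

If as_lists comes back as a DataFrame with three columns on your installation, you are on an older release and need the Series wrapper shown in Step 3.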

Another approach is to copy the list column of one dataframe into the other and then add the two columns row by row with apply.

import numpy as np
df1[1] = df2[0]  # put both list columns side by side in df1
df = df1.apply(lambda r: list(np.array(r[0]) + np.array(r[1])), axis=1)

We can take advantage of numpy arrays here: the '+' operator on two numpy arrays adds corresponding values and returns a single numpy array.
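The numpy approach can be tried on small made-up data like this. Working on a copy named combined (an illustrative name) so that df1 is left unmodified is an assumption of this sketch, not part of the original snippet, and the output is a Series of lists on pandas 0.23 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: one column (label 0) of lists.
idx = pd.DatetimeIndex(["2018-01-02", "2018-01-03"], name="DateTime")
df1 = pd.DataFrame({0: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}, index=idx)
df2 = pd.DataFrame({0: [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]}, index=idx)

# Put both list columns side by side without touching df1 itself.
combined = df1.copy()
combined[1] = df2[0]

# Convert each pair of lists to arrays, add them, and convert back to a list.
result = combined.apply(
    lambda r: (np.array(r[0]) + np.array(r[1])).tolist(), axis=1
)
print(result)
```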

1

Cast them to series so that they become columns, then add your dfs:

df1 = df1.apply(pd.Series, axis=1)
df2 = df2.apply(pd.Series, axis=1)

df1 + df2
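Whether a row-wise apply(pd.Series, axis=1) actually expands the lists depends on what each row contains. The sketch below therefore swaps in a different technique: it builds the wide frames with the DataFrame constructor and rebuilds the lists from the underlying values. The data and the names wide1, wide2, summed and result are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data: one column (label 0) of lists.
idx = pd.DatetimeIndex(["2018-01-02", "2018-01-03"], name="DateTime")
df1 = pd.DataFrame({0: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}, index=idx)
df2 = pd.DataFrame({0: [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]}, index=idx)

# Expand the list column into one numeric column per list element.
wide1 = pd.DataFrame(df1[0].tolist(), index=df1.index)
wide2 = pd.DataFrame(df2[0].tolist(), index=df2.index)

summed = wide1 + wide2

# Convert each row of sums back into a list, keeping the original index.
result = pd.Series(summed.values.tolist(), index=summed.index)
print(result)
```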

5 Comments

How do I convert (df1 + df2) back to lists ?
(df1 + df2).apply(list, axis=1)
this list function still retains 500 columns
With axis=1? I've tested it and it works on some sample data on my machine.
Each row is passed to apply as a Series indexed by column label. Calling pd.Series on it again returns the same thing, so the method above won't work.
