7

I'm trying to replace the values of a pandas DataFrame with a NumPy array of the same shape. Something like this does not work:

import pandas as pd
# create 2d numpy array, called arr
df = pd.DataFrame(arr, columns=some_list_of_names)
df.values = myfunction(arr)

Are there any alternatives?

1
  • Why don't you just do myfunction first and then pass the result to DataFrame when you initially create it? Commented Feb 6, 2015 at 22:33

3 Answers

16

The .values attribute often returns a copy, especially for mixed dtypes, so assigning to it is not guaranteed to work. In newer versions of pandas the assignment raises an error.

Instead, assign to the specific columns. Note that the order matters: the columns of the new data are matched to the column names in the order you list them.

df = pd.DataFrame(arr, columns=some_list_of_names)
df[some_list_of_names] = myfunction(arr)

Example (in pandas 0.15.2):

In [11]: df = pd.DataFrame([[1, 2.], [3, 4.]], columns=['a', 'b'])

In [12]: df.values = [[5, 6], [7, 8]]
AttributeError: can't set attribute

In [13]: df[['a', 'b']] = [[5, 6], [7, 8]]

In [14]: df
Out[14]:
   a  b
0  5  6
1  7  8

In [15]: df[['b', 'a']] = [[5, 6], [7, 8]]

In [16]: df
Out[16]:
   a  b
0  6  5
1  8  7
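
For completeness, here is a minimal self-contained sketch of the same column assignment using the names from the question. The body of myfunction and the contents of arr and some_list_of_names are assumptions made up for illustration; any function that returns an array of the same shape should behave the same way.

```python
import numpy as np
import pandas as pd


def myfunction(a):
    # Hypothetical stand-in for the question's myfunction: it only needs
    # to return an array with the same shape as its input.
    return a * 2


arr = np.array([[1.0, 2.0], [3.0, 4.0]])  # assumed example data
some_list_of_names = ["a", "b"]           # assumed column names

df = pd.DataFrame(arr, columns=some_list_of_names)

# Assign the transformed array back, column by column, in the listed order.
df[some_list_of_names] = myfunction(arr)

result = df.to_numpy()
```

Listing the names in a different order on the left-hand side would write the array's columns into the frame in that order, as in the In [15] example above.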

1 Comment

In the same spirit, you can also use: df.iloc[:, :] = [[5, 6], [7, 8]]

3

I think this is the method you are looking for:

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.applymap.html

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

Example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(3,4), columns = list('abcd'))
>>> df
          a         b         c         d
0  0.394819  0.662614  0.752139  0.396745
1  0.802134  0.934494  0.652150  0.698127
2  0.518531  0.582429  0.189880  0.168490
>>> f = lambda x: x*100
>>> df.applymap(f)
           a          b          c          d
0  39.481905  66.261374  75.213857  39.674529
1  80.213437  93.449447  65.215018  69.812667
2  51.853097  58.242895  18.988020  16.849014
>>>
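
Note that applymap returns a new DataFrame and leaves the original untouched, so the result has to be assigned back if you want to replace the values. A minimal sketch of that, assuming a small all-float frame made up for illustration; in recent pandas releases DataFrame.map is the newer name for applymap, so the sketch uses whichever one is available.

```python
import numpy as np
import pandas as pd

# Assumed example data: a 3x2 float frame with columns 'a' and 'b'.
df = pd.DataFrame(np.arange(6, dtype=float).reshape(3, 2), columns=list("ab"))

# Element-wise function, same idea as the lambda above.
f = lambda x: x * 100

# Prefer DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# The call returns a new frame; df itself is not modified.
scaled = elementwise(f)

# Keep a copy of the original values, then overwrite df's columns.
original = df.copy()
df[list(df.columns)] = scaled.to_numpy()
```

For simple arithmetic like this, df * 100 gives the same result without calling a Python function per element.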


1

Hopefully this is clear:

import pandas as pd
df = pd.DataFrame(columns=some_list_of_names)
df.loc[:] = arr  # use this to replace the values with the numpy array
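
A hedged sketch of the same idea on a frame that already has rows: the snippet above creates the frame without any rows, so this version assumes the frame is first built with the same shape as the array. The contents of arr and some_list_of_names are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])  # assumed example data
some_list_of_names = ["a", "b"]           # assumed column names

# Start from a frame with the same shape as arr (filled with zeros here).
df = pd.DataFrame(np.zeros_like(arr), columns=some_list_of_names)

# Overwrite every row and column with the values from the array.
df.loc[:, :] = arr
```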

