
I'm using the excellent read_csv() function from pandas, which gives:

In [31]: data = pandas.read_csv("lala.csv", delimiter=",")

In [32]: data
Out[32]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12083 entries, 0 to 12082
Columns: 569 entries, REGIONC to SCALEKER
dtypes: float64(51), int64(518)

But when I apply a function from scikit-learn, I lose the information about the columns:

from sklearn import preprocessing
preprocessing.scale(data)

returns a plain NumPy array.

Is there a way to apply scikit-learn or NumPy functions to DataFrames without losing the index and column information?

2 Answers


This can be done by wrapping the returned array in a new DataFrame and passing the original index and columns to it:

import pandas as pd
pd.DataFrame(preprocessing.scale(data), index=data.index, columns=data.columns)
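The same idea can be packaged as a small reusable function. This is a sketch assuming only pandas and NumPy: `apply_keep_labels` is a hypothetical helper name, and `zscore` is a stand-in for `preprocessing.scale` (each column centred to mean 0 and scaled to unit population variance) so the example runs without scikit-learn. The shape check catches functions that add or drop rows or columns; it cannot detect a reordering, so the function must still keep rows and columns in place.

```python
import numpy as np
import pandas as pd


def zscore(values: np.ndarray) -> np.ndarray:
    """Stand-in for sklearn.preprocessing.scale: per-column mean 0 and
    unit variance (population std, ddof=0). Unlike scale, constant
    columns are not guarded against and would divide by zero."""
    return (values - values.mean(axis=0)) / values.std(axis=0)


def apply_keep_labels(func, df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: run an array -> array function on the
    DataFrame's values and re-attach the original index and columns."""
    result = np.asarray(func(df.to_numpy()))
    if result.shape != df.shape:
        # The old labels only make sense if the shape is unchanged.
        raise ValueError(f"shape changed from {df.shape} to {result.shape}")
    return pd.DataFrame(result, index=df.index, columns=df.columns)


# Toy data reusing two column names from the question.
df = pd.DataFrame(
    {"REGIONC": [1, 2, 3, 4], "SCALEKER": [10.0, 20.0, 30.0, 40.0]},
    index=["a", "b", "c", "d"],
)
scaled = apply_keep_labels(zscore, df)
print(scaled)
```

With scikit-learn installed, `apply_keep_labels(preprocessing.scale, data)` should behave the same way on the DataFrame from the question.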


A (slightly naive) way would be to store the structure of your data frame, i.e. its columns and index, separately, and then create a new data frame from your preprocessed results like so:

In [14]: import numpy as np

In [15]: data = np.zeros((2,2))

In [16]: data
Out[16]: 
array([[ 0.,  0.],
       [ 0.,  0.]])

In [17]: from pandas import DataFrame

In [21]: df  = DataFrame(data, index = ['first', 'second'], columns=['c1','c2'])

In [22]: df
Out[22]: 
        c1  c2
first    0   0
second   0   0

In [26]: i = df.index

In [27]: c = df.columns

# generate new data as a numpy array    
In [29]: df  = DataFrame(np.random.rand(2,2), index=i, columns=c)

In [30]: df
Out[30]: 
              c1        c2
first   0.821354  0.936703
second  0.138376  0.482180

As you can see in Out[22], we start off with a data frame, and then in In [29] we place new data inside the frame, leaving the row and column labels unchanged. This assumes that your preprocessing does not reorder the rows or columns of the data.
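Whether the rewrap step is needed at all depends on the function. As a hedged sketch on toy data (not taken from the question): NumPy ufuncs such as `np.log` return a labelled DataFrame when called on one directly, and a column-wise z-score like `preprocessing.scale` can be written with pandas operations, so the labels are never detached and the reordering concern does not arise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(1.0, 7.0).reshape(3, 2),
    index=["first", "second", "third"],
    columns=["c1", "c2"],
)

# NumPy ufuncs applied to a DataFrame return a DataFrame with the same labels.
logged = np.log(df)

# Column-wise standardisation in pandas: df.mean() and df.std() are Series
# indexed by column name, so subtraction/division align on the columns and
# the index and columns are preserved. ddof=0 gives the population std.
standardised = (df - df.mean()) / df.std(ddof=0)

print(type(logged).__name__)  # DataFrame
print(standardised)
```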
