How to encode several columns (but not all columns) of a DataFrame in Python using pandas

Question

I want to build a Naive Bayes model using two DataFrames (a train DataFrame and a test DataFrame).

The DataFrame contains 13 columns, but I only want to encode 5-6 of them from str to int values. How can I do that in one piece of code so that all 6 columns are encoded at once? I followed this answer:

https://stackoverflow.com/a/37159615/12977554

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

def encode_df(dataframe):
    le = LabelEncoder()
    # Re-fits the same encoder on every column of the DataFrame in turn
    for column in dataframe.columns:
        dataframe[column] = le.fit_transform(dataframe[column])
    return dataframe

# encode the dataframe
encode_df(df)

But it only encodes 1 column, whereas what I want is to encode 6 columns with one piece of code.

imdevskp · Accepted Answer · 2021-04-26 07:49:28Z

1

You can loop through only the columns you want to encode and call fit_transform on each one:

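from sklearn.preprocessing import LabelEncoder

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']  # only the columns to encode

for col in cols:
    le = LabelEncoder()
    # astype('str') makes the column uniformly strings before encoding
    df[col] = le.fit_transform(df[col].astype('str'))

df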
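As a minimal sketch of that loop on the sample data from the question (the numeric score column is a hypothetical addition, included only to show that columns outside cols are left unchanged):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data from the question, plus a hypothetical numeric 'score' column
# that should not be encoded.
df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python",
               "SQL", "C++", "Java", "SQL", "Java"],
    'score': [3, 5, 2, 4, 1, 5, 3, 2, 4, 1, 5],
})

cols = ['colors', 'skills']  # only these columns are encoded

for col in cols:
    le = LabelEncoder()  # a fresh encoder per column
    df[col] = le.fit_transform(df[col].astype('str'))

print(df.head(3))
```

LabelEncoder assigns codes in sorted label order, so B, G and R become 0, 1 and 2.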

Ideally you want to use the same transformer for both the train and test datasets.
For that, fit each encoder on the training data only and use it to transform both:

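for col in cols:
    le = LabelEncoder()
    # Fit on the training data only, then apply the same mapping to both sets
    le.fit(df_train[col].astype('str'))
    df_train[col] = le.transform(df_train[col].astype('str'))
    # Raises ValueError if df_test contains a label not seen in df_train
    df_test[col] = le.transform(df_test[col].astype('str'))

df_train, df_test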
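A minimal sketch of the train/test version, assuming small hypothetical df_train and df_test frames with an extra numeric age column. It also keeps each fitted encoder in a dict (an illustrative choice, not part of the answer above) so the codes can be decoded later, and shows that transform rejects labels that never appeared in the training data:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical train/test frames; 'age' is numeric and is not encoded.
df_train = pd.DataFrame({'colors': ['R', 'G', 'B', 'G'],
                         'skills': ['Java', 'SQL', 'Java', 'C++'],
                         'age': [25, 32, 41, 29]})
df_test = pd.DataFrame({'colors': ['B', 'R'],
                        'skills': ['SQL', 'Java'],
                        'age': [38, 22]})

cols = ['colors', 'skills']
encoders = {}  # one fitted encoder per column, kept for later reuse

for col in cols:
    le = LabelEncoder()
    le.fit(df_train[col].astype('str'))  # learn the mapping from train only
    df_train[col] = le.transform(df_train[col].astype('str'))
    df_test[col] = le.transform(df_test[col].astype('str'))
    encoders[col] = le

# The stored encoders can map codes back to the original labels.
print(encoders['skills'].inverse_transform(df_test['skills']))

# A label that never appeared in df_train cannot be transformed.
try:
    encoders['colors'].transform(['Y'])
    unseen_error = None
except ValueError as exc:
    unseen_error = exc
    print('unseen label:', exc)
```

If the test set can contain new categories, they have to be handled (for example mapped to a placeholder) before calling transform.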

edited Apr 26, 2021 at 7:49

answered Apr 26, 2021 at 7:06

imdevskp



10 Comments

18818181881 Over a year ago

Is the result saved automatically, or do I need to pass inplace=True, like when using fillna?

imdevskp Over a year ago

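It is saved automatically: df_train[col] = le.transform(df_train[col]) assigns the encoded values back to the column. With pandas methods such as fillna you can choose between the assignment operator '=' and inplace=True (but don't use them together, since inplace=True returns None). LabelEncoder.transform has no inplace parameter, though; it always returns a new array, so the assignment is what stores the result.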
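A minimal sketch illustrating this, using a hypothetical Gender column: transform returns a separate NumPy array and leaves the DataFrame untouched until the result is assigned back.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data with one categorical column.
df_train = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male']})

le = LabelEncoder()
le.fit(df_train['Gender'])

encoded = le.transform(df_train['Gender'])  # a new NumPy array
original = df_train['Gender'].tolist()      # the DataFrame is still unchanged
print(type(encoded), original)

df_train['Gender'] = encoded  # the assignment writes the codes back
print(df_train['Gender'].tolist())
```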

18818181881 Over a year ago

It produces "TypeError: argument must be a string or number" when I write the code like this:

18818181881 Over a year ago

from sklearn.preprocessing import LabelEncoder
cols = ['Length_Employed', 'Home_Owner', 'Income_Verified', 'Purpose_Of_Loan', 'Gender']
for col in cols:
    le = LabelEncoder()
    le.fit(df[col])
    df[col] = le.transform(df[col])
    df1[col] = le.transform(df1[col])

18818181881 Over a year ago

Hi, when I tried it, I also changed le.fit(df_train[col]) to le.fit(df_train[col].astype('str')), and it worked.
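A minimal sketch of why astype('str') helps, assuming a hypothetical column that mixes strings and integers (as can happen after reading a messy CSV). LabelEncoder sorts the unique values while fitting, and Python cannot compare str with int, so fit raises TypeError; the exact message depends on the scikit-learn version. Converting the column to strings first makes it uniform:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column mixing strings and integers (object dtype).
col = pd.Series(['a', 1, 'b', 2])

le = LabelEncoder()
try:
    le.fit(col)  # sorting 'a' against 1 fails
    fit_error = None
except TypeError as exc:
    fit_error = exc
    print('fit failed:', exc)

# After astype('str') every value is a string, so sorting works.
codes = le.fit_transform(col.astype('str'))
print(dict(zip(le.classes_, range(len(le.classes_)))))
print(codes)
```

One side effect: after astype('str'), the integer 1 and the string '1' become the same label.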


Ludi · Accepted Answer · 2021-04-26 07:08:47Z

0

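Have you tried apply() yet? Call it on the DataFrame of selected columns, so that fit_transform receives one whole column at a time:

cols = ['colors', 'skills']
le = LabelEncoder()
# DataFrame.apply passes each selected column (a Series) to fit_transform
df[cols] = df[cols].apply(le.fit_transform)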
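A minimal sketch contrasting the two forms of apply on the question's sample data (shortened, with a hypothetical numeric score column). Series.apply calls the function once per element, so LabelEncoder receives a single string rather than a 1-D column and raises an error; DataFrame.apply on the selected columns passes each whole column instead. Creating a fresh LabelEncoder inside the lambda gives every column its own encoder:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python"],
    'score': [3, 5, 2, 4, 1],  # hypothetical numeric column, not encoded
})
cols = ['colors', 'skills']

# Element-wise: fit_transform gets a single string such as "R" and fails.
le = LabelEncoder()
try:
    df['colors'].apply(lambda x: le.fit_transform(x))
    elementwise_error = None
except (ValueError, TypeError) as exc:
    elementwise_error = exc

# Column-wise: each selected column is encoded with its own encoder.
df[cols] = df[cols].apply(lambda s: LabelEncoder().fit_transform(s))
print(df)
```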

answered Apr 26, 2021 at 7:08

Ludi

