I want to build a naive Bayes model using two dataframes (a train dataframe and a test dataframe).

Each dataframe contains 13 columns, but I only want to encode 5-6 of them from str to int values. How can I encode those 6 columns with a single piece of code? I followed this answer:

https://stackoverflow.com/a/37159615/12977554

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

def encode_df(dataframe):
    le = LabelEncoder()
    for column in dataframe.columns:
        dataframe[column] = le.fit_transform(dataframe[column])
    return dataframe

# encode the dataframe
encode_df(df)

but it only encodes 1 column. What I want is to encode 6 columns with a single piece of code.

2 Answers

You can loop through the columns you want and call fit_transform on each one:

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']

for col in cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype('str'))
    
df
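
If you later need to map the integers back to the original labels, one option is to keep each fitted encoder in a dictionary keyed by column name. The following is only a minimal sketch on made-up data: the sample frame, the `age` column, and the `encoders` dict are illustrative assumptions, not part of the question's dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data standing in for the real dataframe (assumed for illustration).
df = pd.DataFrame({
    "colors": ["R", "G", "B", "B", "G"],
    "skills": ["Java", "C++", "SQL", "Java", "Python"],
    "age": [31, 25, 40, 28, 35],  # numeric column, left untouched
})

cols = ["colors", "skills"]  # only the columns that need encoding
encoders = {}                # hypothetical helper: column name -> fitted encoder

for col in cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype(str))
    encoders[col] = le       # keep the encoder so the mapping is not lost

# Recover the original labels of one column from its integer codes.
original_colors = encoders["colors"].inverse_transform(df["colors"])
```

LabelEncoder assigns codes in sorted order of the labels, so the integers carry no meaning beyond identifying the category.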

Ideally you want to use the same transformer for both the train and test datasets.
For that, fit each encoder on the training data only and reuse it to transform both:

for col in cols:
    le = LabelEncoder()
    le.fit(df_train[col].astype('str'))
    df_train[col] = le.transform(df_train[col].astype('str'))
    df_test[col] = le.transform(df_test[col].astype('str'))

df_train
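
One caveat with this approach is that LabelEncoder.transform raises a ValueError when the test data contains a label that was not seen during fit. A possible alternative, sketched below under the assumption that scikit-learn 0.24 or newer is installed, is OrdinalEncoder, which encodes several feature columns in one call and can map unseen categories to a placeholder value. The sample frames and the choice of -1 as the placeholder are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy train/test frames (assumed for illustration).
df_train = pd.DataFrame({
    "colors": ["R", "G", "B", "G"],
    "skills": ["Java", "C++", "SQL", "Python"],
})
df_test = pd.DataFrame({
    "colors": ["B", "R", "Y"],        # "Y" never appears in df_train
    "skills": ["SQL", "Java", "C++"],
})

cols = ["colors", "skills"]

# Categories missing from the training data are mapped to -1 instead of raising.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(df_train[cols].astype(str))   # fit on the training data only

# Reuse the same fitted encoder for both frames.
df_train[cols] = enc.transform(df_train[cols].astype(str)).astype(int)
df_test[cols] = enc.transform(df_test[cols].astype(str)).astype(int)
```

Whether -1 is an acceptable code depends on the model that consumes it; some naive Bayes variants reject negative feature values, so treat the placeholder as something to adapt rather than a fixed recommendation.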

10 Comments

Is it saved automatically, or do I need to pass inplace=True like when you use fillna?
The assignment is what saves it. le.transform returns a new array and does not modify the dataframe, so df_train[col] = le.transform(df_train[col]) writes the encoded values back into the column. Unlike fillna, LabelEncoder.transform has no inplace parameter, so the assignment operator '=' is the only way to keep the result.
I get "TypeError: argument must be a string or number" when I write the code like this:
from sklearn.preprocessing import LabelEncoder
cols = ['Length_Employed', 'Home_Owner', 'Income_Verified', 'Purpose_Of_Loan', 'Gender']
for col in cols:
    le = LabelEncoder()
    le.fit(df[col])
    df[col] = le.transform(df[col])
    df1[col] = le.transform(df1[col])
Hi, when I tried it, I also changed le.fit(df_train[col]) to le.fit(df_train[col].astype('str')) and it worked.

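Have you tried apply() yet? Call it on the DataFrame rather than on a single column: DataFrame.apply passes each selected column to the encoder as a whole, whereas Series.apply would pass one value at a time and fit_transform would fail on a scalar.

le = LabelEncoder()
df[['colors', 'skills']] = df[['colors', 'skills']].apply(le.fit_transform)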
