Classifying comments into positive and negative using Scikit-Learn with Python

Question

I am trying to write code that classifies comments as positive or negative (0 for negative, 1 for positive).

I have a pandas DataFrame with two columns, comments and results. I am using Logistic Regression from the Scikit-Learn library (I will try other classifiers such as Decision Tree, SVM, and KNN later), but it gives me an error. I want to do this without sentiment analysis. I think the problem is that I am passing a string rather than a number. My program should take a comment (a string) and predict whether it is 0 or 1. This is the code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model



full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

cv = CountVectorizer()  
features = cv.fit_transform(features)


logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"] #This value should be evaluated

prediction = logistic_regression.predict([input_values]) #adding values for prediction
prediction = prediction[0]
print(prediction)

This is the error that I get:

ValueError: X has 1 features per sample; expecting 5155

I have also tried this:

input_values = ["I love this comment"]

prediction = logistic_regression.predict(cv.fit_transform(input_values)) #adding values for prediction
prediction = prediction[0]

And I get this error:

ValueError: X has 3 features per sample; expecting ...

You need to do cv.fit_transform(input_values) and then feed its output to logistic_regression.predict();
– vb_rises, Commented Jul 12, 2019 at 11:57
@Vishal I have tried that, but it does not work. I have also updated the question, please check it.
– taga, Commented Jul 12, 2019 at 12:05
Can you show me that in code? I have updated the question and added the sample data.
– taga, Commented Jul 12, 2019 at 12:11

vb_rises · Accepted Answer · 2019-07-12 12:15:59Z

5

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

full_comment_data = pd.DataFrame({"Comment": ["This is bad", "Good job", "I do not like this"],
                                  "Result": [0, 1, 0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

# Learn the vocabulary from the training comments and build the count matrix
cv = CountVectorizer()
features = cv.fit_transform(features)

logistic_regression = LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"]  # This value should be evaluated

# Use transform (not fit_transform) so the input is encoded with the vocabulary learned above
prediction = logistic_regression.predict(cv.transform(input_values))
prediction = prediction[0]
print(prediction)

Output: 0
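
The key change is calling cv.transform on the new comment instead of cv.fit_transform. fit_transform relearns the vocabulary from whatever text it is given, so the resulting matrix no longer has the same columns the model was trained on. The sketch below illustrates this with the sample data from the question, and also shows one possible way to avoid the mistake by wrapping both steps in a scikit-learn Pipeline. The variable names (comments, labels, n_train_features, and so on) are made up for this illustration, and the Pipeline is a suggested alternative, not part of the code above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Sample data from the question
comments = ["This is bad", "Good job", "I do not like this"]
labels = [0, 1, 0]
new_comment = ["I love this comment"]

# Fitting on the training comments learns an 8-word vocabulary
# (the default tokenizer drops one-letter tokens such as "I").
cv = CountVectorizer()
n_train_features = cv.fit_transform(comments).shape[1]

# transform reuses that vocabulary, so the column count matches the model.
n_transform_features = cv.transform(new_comment).shape[1]

# A fresh fit_transform on the new comment learns only its own 3 words,
# which is where "X has 3 features per sample" comes from.
n_refit_features = CountVectorizer().fit_transform(new_comment).shape[1]

print(n_train_features, n_transform_features, n_refit_features)

# Alternative: a Pipeline keeps the vectorizer and classifier together,
# so predict() always applies transform with the fitted vocabulary.
pipeline = make_pipeline(CountVectorizer(), LogisticRegression(solver="lbfgs"))
pipeline.fit(comments, labels)
prediction = pipeline.predict(new_comment)[0]
print(prediction)
```

With the pipeline you pass raw strings to both fit and predict, which makes it harder to refit the vectorizer by accident. With only three training comments the prediction itself is not meaningful; the point is that the shapes line up.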

answered Jul 12, 2019 at 12:15
