I am trying to write code that classifies comments as negative or positive (0 for negative, 1 for positive).

I have a pandas DataFrame with two columns, comments and results. I am using Logistic Regression from the scikit-learn library in Python (I will try other classifiers later, such as Decision Tree, SVM, and KNN), but it raises an error. (I want to do this without sentiment analysis.) I think the problem is that I am passing a string rather than a number. My program should take a comment (a string value) and evaluate whether it is 0 or 1. This is the code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model



full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

cv = CountVectorizer()  
features = cv.fit_transform(features)


logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"] #This value should be evaluated

prediction = logistic_regression.predict([input_values]) #adding values for prediction
prediction = prediction[0]
print(prediction)

This is the error that I get:

ValueError: X has 1 features per sample; expecting 5155

I have also tried this:

input_values = ["I love this comment"]

prediction = logistic_regression.predict(cv.fit_transform(input_values)) #adding values for prediction
prediction = prediction[0]

And I get this error:

ValueError: X has 3 features per sample; expecting ...
  • You need to do cv.fit_transform(input_values) and then feed its output to logistic_regression.predict(). Commented Jul 12, 2019 at 11:57
  • @Vishal I have tried that, but it does not work. I have also updated the question, please check it. Commented Jul 12, 2019 at 12:05
  • Sorry, just use the cv.transform() method. Commented Jul 12, 2019 at 12:09
  • Can you show me that in code? I have updated the question and added the sample data. Commented Jul 12, 2019 at 12:11

1 Answer

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

full_comment_data = pd.DataFrame({"Comment": ["This is bad", "Good job", "I do not like this"],
                                  "Result": [0, 1, 0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

# Learn the vocabulary from the training comments and build the count matrix
cv = CountVectorizer()
features = cv.fit_transform(features)

logistic_regression = LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"]  # This value should be evaluated

# Use transform(), not fit_transform(), so the new comment is encoded with the
# vocabulary learned during training and has the same number of features
prediction = logistic_regression.predict(cv.transform(input_values))
prediction = prediction[0]
print(prediction)

Output: 0
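
The difference between the two calls can be sketched as follows. This is a minimal illustration on the same three sample comments, not a definitive implementation. The names train_matrix, reused, refit and text_clf are made up for the example, and the make_pipeline part is an optional alternative that the code above does not use. fit_transform() learns a vocabulary from whatever text it is given, so calling it on the new comment alone produces a matrix with a different number of columns than the model was trained on, while transform() reuses the training vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["This is bad", "Good job", "I do not like this"]
labels = [0, 1, 0]
new_comments = ["I love this comment"]

# Fit the vocabulary once, on the training comments only.
cv = CountVectorizer()
train_matrix = cv.fit_transform(comments)

# transform() reuses the training vocabulary, so the column count matches.
reused = cv.transform(new_comments)

# A fresh fit_transform() learns a new vocabulary from the new text alone,
# which is what triggers the "X has 3 features per sample" error.
refit = CountVectorizer().fit_transform(new_comments)

print(train_matrix.shape[1], reused.shape[1], refit.shape[1])

# Optional alternative (an assumption, not from the answer above): bundle the
# vectorizer and the classifier so that predict() accepts raw strings.
text_clf = make_pipeline(CountVectorizer(), LogisticRegression(solver="lbfgs"))
text_clf.fit(comments, labels)
prediction = text_clf.predict(new_comments)[0]
print(prediction)
```

With the pipeline, the vectorizer is fitted only when text_clf.fit() is called, so predict() can take raw strings directly and the feature count always matches what the classifier expects.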
