I've been using NLTK in Python for sentiment analysis, but it only gives positive, neutral and negative classes. What if we want a number that shows how negative or positive a sentence is, treating it as a sort of regression problem? Is there a pre-trained library that does this?
1 Answer
I know of a few ways to do this:
- VADER returns graded scores: neg, neu and pos each lie between zero and one, and the compound score lies between -1 and 1.
- Stanford NLP returns a categorical classification (i.e. 0, 1, 2, 3, 4, from very negative to very positive).
An NLTK way:
# Requires the VADER lexicon: run nltk.download('vader_lexicon') once beforehand.
from nltk.sentiment.vader import SentimentIntensityAnalyzer as sia

sentences = ['This is the worst lunch I ever had!',
             'This is the best lunch I have ever had!!',
             'I don\'t like this lunch.',
             'I eat food for lunch.',
             'Red is a color.',
             'A really bad, horrible book, the plot was .']

hal = sia()
for sentence in sentences:
    print(sentence)
    ps = hal.polarity_scores(sentence)
    for k in sorted(ps):
        print('\t{}: {:>1.4}'.format(k, ps[k]), end=' ')
    print()
Example output:
This is the worst lunch I ever had!
compound: -0.6588 neg: 0.423 neu: 0.577 pos: 0.0
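If a single number per text is wanted, the compound value is the natural candidate, since it already lies between -1 and 1. Here is a minimal sketch, assuming the dictionaries come from polarity_scores as above. The helper names document_score and to_unit_interval are made up for illustration, and averaging sentence scores is just one simple way to aggregate:

```python
from statistics import mean


def document_score(polarity_dicts):
    """Average the 'compound' values of several polarity_scores() results."""
    compounds = [ps['compound'] for ps in polarity_dicts]
    # An empty input has no sentiment to average, so treat it as neutral.
    return mean(compounds) if compounds else 0.0


def to_unit_interval(compound):
    """Rescale a compound score from [-1, 1] to [0, 1]."""
    return (compound + 1.0) / 2.0


# Hand-written dictionaries standing in for hal.polarity_scores(...) output.
scores = [{'compound': -0.5}, {'compound': 0.5}, {'compound': 0.6}]
overall = document_score(scores)
print(overall, to_unit_interval(overall))
```

Whether a plain mean is appropriate depends on the data; long texts with a few strongly worded sentences may call for a different aggregation.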
A Stanford-NLP, Python way:
(Note that this approach requires a running CoreNLP server instance, started with e.g.: java -mx1g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000)
from pycorenlp import StanfordCoreNLP

stanford = StanfordCoreNLP('http://localhost:9000')
for sentence in sentences:
    print(sentence)
    result = stanford.annotate(sentence,
                               properties={
                                   'annotators': 'sentiment',
                                   'outputFormat': 'json',
                                   'timeout': '5000'
                               })
    for s in result['sentences']:
        score = (s['sentimentValue'], s['sentiment'])
        print(f'\tScore: {score[0]}, Value: {score[1]}')
Example output:
This is the worst lunch I ever had!
Score: 0, Value: Verynegative
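To get something closer to a regression value out of CoreNLP, one option is to take the expected class index from the per-class probabilities. This is only a sketch, and it rests on an assumption: that each sentence in the JSON result also carries a sentimentDistribution list of five probabilities (very negative to very positive), which may depend on the CoreNLP version. expected_sentiment is a hypothetical helper, not part of pycorenlp:

```python
def expected_sentiment(distribution):
    """Probability-weighted mean of the class indices 0..n-1."""
    total = sum(distribution)
    if total <= 0:
        raise ValueError('distribution must contain positive probabilities')
    # Dividing by the total guards against lists that do not sum exactly to 1.
    return sum(i * p for i, p in enumerate(distribution)) / total


# Made-up distribution for illustration; with a running server it would be
# read from s['sentimentDistribution'] inside the loop above (assumed key).
example = [0.5, 0.5, 0.0, 0.0, 0.0]
print(expected_sentiment(example))
```

The result is a float between 0 and 4, so a sentence split evenly between "very negative" and "negative" lands halfway between the two classes instead of being forced into one.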