simple nltk sentiment analysis code using python3

Question

I am trying to classify customer emails in two ways:

Is the email happy or sad? (sentiment analysis)
Is the email related to billing or not?

I am using Python 3 and think I need both NLTK and scikit-learn: NLTK to read and understand the text, and (I believe) scikit-learn to do the classification (happy vs. sad, billing vs. not).

Training data set 1: a few phrases, anywhere from a single word to a sentence of 5 or 6 words, labeled 1 (happy) or 0 (not happy). A few examples:

Appreciate the help: 1
great job: 1
Awesome: 1
terrible: 0
confusing: 0
slow down: 0

Training data set 2: a few phrases indicating a billing-related question. A few examples:

question on my bill
billing fee
my bill is too high
payment rejected

This seems straightforward conceptually. Where can I find some basic code that shows me:

how to use my own training data
how to load the email text as input and output an answer: happy or sad, and billing or not.

Useful: github.com/hb20007/hands-on-nltk-tutorial/blob/master/… (hb20007, commented May 17, 2018 at 12:50)

clemtoy · Accepted Answer · 2015-07-15 17:22:31Z · Score: 3

Regarding your data sets: since each item contains very few words, your approach is close to a lexicon-based one.

For billing, a lexicon-based approach should work well. You should also give extra weight to the subject lines of the emails.
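A minimal sketch of the lexicon-based billing check, giving extra weight to the subject line. The term list, the weights (SUBJECT_WEIGHT, BODY_WEIGHT) and THRESHOLD are hypothetical values chosen for illustration, not tuned settings; in practice you would derive the terms from your own billing phrases.

```python
import re

# Hypothetical billing lexicon; build yours from "training data set 2".
BILLING_TERMS = {"bill", "billing", "invoice", "payment", "fee", "charge", "refund"}

# Hypothetical weights: a hit in the subject counts more than one in the body.
SUBJECT_WEIGHT = 3.0
BODY_WEIGHT = 1.0
THRESHOLD = 2.0


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def billing_score(subject, body):
    """Weighted count of billing terms found in the subject and the body."""
    subject_hits = sum(tok in BILLING_TERMS for tok in tokenize(subject))
    body_hits = sum(tok in BILLING_TERMS for tok in tokenize(body))
    return SUBJECT_WEIGHT * subject_hits + BODY_WEIGHT * body_hits


def is_billing(subject, body):
    return billing_score(subject, body) >= THRESHOLD


print(is_billing("Question on my bill", "Hi, can you explain this?"))  # True
print(is_billing("Thanks", "Great job on the new app."))  # False
```

With these numbers one billing term in the subject is enough, while the body needs two. Note that "bills" would not match "bill" without stemming (for example with NLTK's PorterStemmer), which is a natural next step.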

For sentiment analysis you have two options:

Machine learning: in this case you should use a bigger data set (in my view, each item should be a full email). You can implement a Naive Bayes classifier by following this tutorial.
Lexicon-based approach: there are several sentiment lexicons, e.g. SentiWordNet (downloadable via nltk.download()), MPQA, SentiStrength, WordNet-Affect (via WNAffect), etc. Preprocessing: tokenization (nltk.word_tokenize(text)) and POS tagging (nltk.pos_tag(tokens)). You should also handle negation; polarity shifting is a good way to deal with it.
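A minimal sketch of the lexicon-based option with polarity shifting for negation. The tiny LEXICON dict is a hypothetical stand-in for a real resource such as SentiWordNet or MPQA, and NEGATION_WINDOW (how many words after a negation get flipped) is an assumption; tokenization uses a simple regex instead of nltk.word_tokenize so the sketch runs without downloading NLTK data.

```python
import re

# Hypothetical mini-lexicon (word -> polarity); a real system would load
# SentiWordNet, MPQA, etc. instead of this hand-written dict.
LEXICON = {
    "appreciate": 1.0, "great": 1.0, "awesome": 1.0, "happy": 1.0,
    "terrible": -1.0, "confusing": -1.0, "slow": -0.5, "bad": -1.0,
}
NEGATIONS = {"not", "no", "never", "don't", "isn't", "wasn't", "didn't"}
# Assumption: negation flips the polarity of the next 3 tokens.
NEGATION_WINDOW = 3


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum of word polarities, flipping words that follow a negation."""
    score = 0.0
    flip_remaining = 0
    for tok in tokenize(text):
        if tok in NEGATIONS:
            flip_remaining = NEGATION_WINDOW
            continue
        polarity = LEXICON.get(tok, 0.0)
        if flip_remaining > 0:
            polarity = -polarity  # polarity shifting
            flip_remaining -= 1
        score += polarity
    return score


def classify(text):
    """1 = happy, 0 = not happy (same labels as the question's data)."""
    return 1 if sentiment_score(text) > 0 else 0


print(classify("great job"))         # 1
print(classify("not great at all"))  # 0
print(classify("this is not bad"))   # 1
```

A score of 0 (no known words) maps to 0, matching the question's "0 = not happy" convention. A more careful version would also end the negation window at punctuation.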

Machine learning usually gives the best results, so if you have enough annotated emails it is the better choice.
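A minimal sketch of the Naive Bayes route: a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing, written with the standard library only so it runs without NLTK data downloads. In practice you would more likely use a library implementation such as nltk.NaiveBayesClassifier or scikit-learn's MultinomialNB. The training phrases are the question's data set 1; with only six examples the predictions are illustrative, not meaningful. The same class could serve the billing question, assuming you also collect non-billing examples as negatives.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # examples: list of (text, label) pairs
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        self.total_docs = sum(self.class_counts.values())
        return self

    def log_posterior(self, text, label):
        """Unnormalized log P(label) + sum of log P(word | label)."""
        logp = math.log(self.class_counts[label] / self.total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + len(self.vocab)
        for tok in tokenize(text):
            if tok not in self.vocab:
                continue  # ignore words never seen in training
            logp += math.log((self.word_counts[label][tok] + 1) / denom)
        return logp

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


train = [
    ("Appreciate the help", 1), ("great job", 1), ("Awesome", 1),
    ("terrible", 0), ("confusing", 0), ("slow down", 0),
]
model = NaiveBayes().fit(train)
print(model.predict("great help, awesome"))     # 1
print(model.predict("terrible and confusing"))  # 0
```

Working in log space avoids numeric underflow when a full email contributes many small word probabilities.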

answered Jul 11, 2015 at 9:56 by clemtoy, edited Jul 15, 2015 at 17:22



3 Comments

Kumar Over a year ago

Thanks, clemtoy! A follow-up question on the lexicon-based approach (billing): I'm going to use NLTK to derive meaningful data from my text (remove stop words, etc.). Do I then simply compare words to my own training data (the billing phrases)? 1. Compare single words with the single words in my training data. 2. Compare bigrams with the 2-word phrases from my data. 3. Compare 3-grams with 3-word phrases, then 4-grams with 4-word phrases, and so on up to 7-word phrases, which is the longest I have for now, e.g. "I have a question on my bill". So I guess I generate n-grams and compare them?

clemtoy Over a year ago

You can try to do this yes!

Kumar Over a year ago

By the way, emails will be only a small portion of my data; the majority will be phone calls transcribed to text. I'll keep my fingers crossed!
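A minimal sketch of the n-gram comparison Kumar describes in the first comment: every 1- to 7-gram of the email is checked against the billing phrases from training data set 2. MAX_N = 7 follows the comment; the regex tokenizer and the helper names are hypothetical, and nltk.util.ngrams could replace the hand-written ngrams helper.

```python
import re

# The question's billing phrases (training data set 2).
BILLING_PHRASES = [
    "question on my bill",
    "billing fee",
    "my bill is too high",
    "payment rejected",
]
MAX_N = 7  # longest phrase length mentioned in the comment


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return zip(*(tokens[i:] for i in range(n)))


# Store each phrase as a token tuple so it compares directly with n-grams.
PHRASE_SET = {tuple(tokenize(p)) for p in BILLING_PHRASES}


def matched_phrases(text):
    """Return the billing phrases that occur verbatim in the text."""
    tokens = tokenize(text)
    found = set()
    for n in range(1, MAX_N + 1):
        for gram in ngrams(tokens, n):
            if gram in PHRASE_SET:
                found.add(" ".join(gram))
    return found


print(matched_phrases("Hi, I have a question on my bill from March."))
# {'question on my bill'}
```

If you remove stop words from the email, apply exactly the same preprocessing to the phrases; otherwise "question on my bill" loses "on" and "my" on one side only and will never match.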
