I am trying to classify customer emails in two ways:
- Is the email happy or sad (sentiment analysis)?
- Is the email related to billing or not?
I am using Python 3 and think I have to use NLTK and scikit-learn:
- NLTK: will help read and understand the text, I believe
- scikit-learn: will do the classification (happy or sad, and billing or not)
Training data set 1: a few phrases, anywhere from one word to a sentence of 5 to 6 words, each labeled 1 (happy) or 0 (not happy). A few examples:
- Appreciate the help: 1
- great job: 1
- Awesome: 1
- terrible: 0
- confusing: 0
- slow down: 0
Training data set 2: a few phrases indicating a billing-related question. A few examples:
- question on my bill
- billing fee
- my bill is too high
- payment rejected
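A minimal sketch of how these two data sets might be fed to scikit-learn, assuming a TF-IDF bag-of-words pipeline with logistic regression (one reasonable choice, not the only one). The helper name `train_text_classifier` is made up for illustration. Data set 2 only contains billing phrases, so the non-billing phrases below are invented placeholders: a classifier needs examples of both classes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_text_classifier(phrases, labels):
    """Fit a bag-of-words text classifier on phrases and their labels."""
    # TfidfVectorizer turns each phrase into word-weight features;
    # LogisticRegression learns which words point to which label.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(phrases, labels)
    return model


# Data set 1: 1 = happy, 0 = not happy.
sentiment_phrases = ["Appreciate the help", "great job", "Awesome",
                     "terrible", "confusing", "slow down"]
sentiment_labels = [1, 1, 1, 0, 0, 0]

# Data set 2: 1 = billing. The phrases labeled 0 are invented
# placeholders (assumption), not part of the original data.
billing_phrases = ["question on my bill", "billing fee",
                   "my bill is too high", "payment rejected",
                   "cannot log in", "app keeps crashing",
                   "reset password", "shipping was late"]
billing_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sentiment_model = train_text_classifier(sentiment_phrases, sentiment_labels)
billing_model = train_text_classifier(billing_phrases, billing_labels)

# predict() takes a list of texts and returns one label per text.
print(sentiment_model.predict(["great job"]))
print(billing_model.predict(["payment rejected"]))
```

With only a handful of phrases, predictions on text that shares no words with the training data are close to arbitrary, so a real setup would need many more labeled examples.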
This seems straightforward conceptually. Where can I find some basic code that shows:
- how I can use my own training data
- how I can load the email text as input and output an answer: happy or sad, and billing or not
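The second point could be sketched roughly as follows, assuming the email arrives as a raw RFC 5322 message string and that both classifiers expose a scikit-learn style `predict` method returning 1 or 0. The function names `extract_body` and `classify_email` are hypothetical, and only the plain-text body is classified (the subject line is ignored here).

```python
from email import message_from_string, policy


def extract_body(raw_email):
    """Return the plain-text body of a raw email string ('' if none)."""
    # policy.default yields an EmailMessage, which provides get_body().
    msg = message_from_string(raw_email, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content().strip() if body is not None else ""


def classify_email(raw_email, sentiment_model, billing_model):
    """Run both classifiers on the email body and return readable answers."""
    text = extract_body(raw_email)
    # Assumption: label 1 means happy / billing, label 0 means the opposite.
    happy = int(sentiment_model.predict([text])[0])
    billing = int(billing_model.predict([text])[0])
    return {
        "sentiment": "happy" if happy == 1 else "sad",
        "billing": "billing" if billing == 1 else "not billing",
    }
```

Passing the models in as arguments keeps the email handling separate from the training step, so any trained classifier with a `predict` method can be swapped in.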