Sentiment Analysis
A Descriptive Approach Towards Its Classification Techniques
Sangeeth Nagarajan 
MR1-CSE Roll No :13 
Guided By 
Asst Prof: Rejimoan R 
July 29, 2014 
Sangeeth Nagarajan (SCTEC) Sentiment Analysis July 29, 2014 1 / 54
Content 
1 Introduction
2 Sentiment Analysis 
3 Related Works 
4 General Procedure 
5 Data Preparation And Feature Extraction 
6 Fuzzy Control System for Sentiment Analysis 
7 Neuro Fuzzy Inference System for Sentiment Analysis 
8 Hidden Markov Model for Sentiment Analysis 
9 Hybrid Structure 
10 Conclusion 
11 References 
Case Study 1 
In the late 1980s, a person is planning to buy a black-and-white television. What
can be done to verify the quality and performance of the system?
Case Study 1 
Solutions: 
He can check with people who are already using the system.
He can discuss it directly with a customer care representative.
Case Study 2 
Imagine you want to buy a smartphone with the latest features. What will you
do to learn about the features offered by different companies?
Case Study 2 
Solutions: 
You can check with people who are already using the product.
You can discuss it directly with a customer care representative.
In addition, you can browse websites that provide comparative features
of smartphones and user reviews.
Sentiment Analysis 
The process of determining the overall rating of a commodity from user
reviews is called sentiment analysis.
Introduction I 
A process used to determine the attitude, opinion, or emotion expressed
by a person about a particular topic.
Uses natural language processing and text analytics to identify and
extract subjective information in source materials.
Automatically characterizes the overall feeling or mood of consumers
toward a specific brand or company and determines whether they are
viewed positively or negatively.
Companies and organizations are interested in finding out customers'
opinions about products and services via social media.
Introduction II 
The main goals are:
Detecting whether a segment of text contains an expression of
opinion.
Detecting the overall polarity of the text: positive or negative.
Sentiment Analysis I 
Sentiment analysis has many other names:
1 Opinion Extraction
2 Opinion Mining
3 Sentiment Mining
4 Subjectivity Analysis
Sentiment Analysis II 
Why Sentiment Analysis? 
Movie: Is this review positive or negative?
Products: What do people think about the new iPhone?
Politics: What do people think about the candidate or issue?
Prediction: Predict election outcomes or market trends from
sentiment.
Related Works I 
1 Learning Methods: The different learning types are as follows:
Supervised learning: Learns a classifier from training data and assigns
class labels to test data.
Unsupervised learning: Learns without labeled training data.
Semi-supervised learning: Combines labeled and unlabeled
training data.
2 Classification Methods: These include Natural Language Processing and
pattern-based methods and machine learning algorithms, such as:
Naive Bayes (NB)
Maximum Entropy (ME)
Support Vector Machines (SVM)
Fuzzy Inference System Classification
Neuro-Fuzzy Inference System Classification
Hidden Markov Model (HMM)
Related Works II 
3 Feature Extraction Methods: Four categories of features are used in
sentiment analysis studies:
Syntactic features
Semantic features
Link-based features
Stylistic features
Features can also be based on the occurrences of words in the corpus.
General Procedure I 
Figure : Steps And Techniques used in Sentiment Analysis 
General Procedure II 
1 Text Preprocessing: Divided into two subcategories.
Tokenization: The documents are split into tokens, which are used for
further processing.
Removal of Stop Words: Frequently used stop words in English include
"a", "of", "the", "I", "it", "you", and "and". These are generally
regarded as 'functional words' that carry little meaning on their own,
so it is practical to remove words that appear too often.
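The two preprocessing steps above can be sketched in a few lines of Python. This is only an illustration: the regular-expression tokenizer and the seven-word stop list (taken from the examples on the slide) are assumptions, and the function names `tokenize` and `remove_stop_words` are hypothetical.

```python
import re

# Assumed stop-word list: only the seven examples named on the slide.
# Real stop lists are considerably longer.
STOP_WORDS = {"a", "of", "the", "i", "it", "you", "and"}

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

tokens = remove_stop_words(tokenize("I loved the plot of the movie, and you will too."))
print(tokens)
```

Lower-casing before the stop-word lookup is a design choice of this sketch, so that "I" and "i" are treated alike.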
General Procedure III 
2 Text Transformation 
The score of each sentence is the sum of the weights of the terms
in that sentence.
The weight of each term is the product of its TF and IDF, computed over
the adjectives extracted from part-of-speech tags.
\[ \mathrm{TF}(t) = \frac{P}{N} \tag{1} \]
where
P = number of times the adjective term t occurs in document d
N = total number of adjectives in document d
\[ \mathrm{IDF}(t) = \log \frac{\mathrm{ND}}{\mathrm{DF}(t)} \tag{2} \]
where
ND = total number of documents in the document collection
DF(t) = number of documents in the collection in which the adjective term t occurs
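A minimal sketch of equations (1) and (2) follows. It assumes that a part-of-speech tagger (not shown) has already reduced each document to its list of adjectives, and it uses the natural logarithm because the slides do not state a base. The toy `corpus` and the function names are hypothetical.

```python
import math

def tf(term, doc_adjectives):
    """Equation (1): occurrences of the term / number of adjectives in the document."""
    if not doc_adjectives:
        return 0.0
    return doc_adjectives.count(term) / len(doc_adjectives)

def idf(term, corpus):
    """Equation (2): log(ND / DF(t)); returns 0.0 for a term that never occurs."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return math.log(len(corpus) / df)

def sentence_score(sentence_adjectives, doc_adjectives, corpus):
    """Sum of TF * IDF weights over the adjectives of one sentence."""
    return sum(tf(t, doc_adjectives) * idf(t, corpus) for t in sentence_adjectives)

# Hypothetical corpus: each document is the list of adjectives found in it.
corpus = [["good", "great", "good"], ["bad", "good"], ["boring", "bad"]]
print(sentence_score(["great"], corpus[0], corpus))
```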
General Procedure IV 
3 Feature Selection 
The simplest statistical approach for feature selection is to use the 
most frequently occurring words in the corpus as polarity indicators. 
The majority of the approaches for sentiment analysis involve a 
two-step process: 
i. Identify the parts of the document that contribute positive or
negative sentiment.
ii. Combine these parts of the document in ways that increase the odds of the
document falling into one of these two polar categories.
General Procedure V 
4 Sentiment Classification
Classification of sentences into positive, negative and neutral polarity.
There is no clear boundary between the concepts of positive, negative
and neutral.
Fuzzy set classification or machine learning methods can be used for
this process.
In the fuzzy set approach, a membership function is defined for each set, and
each item is assigned to the set in which it has the highest membership degree.
General Procedure VI 
5 Parameters for Evaluation 
Table : Contingency Table
                             Correct labels
                             Positive              Negative
Classified labels  Positive  TP (True Positive)    FP (False Positive)
                   Negative  FN (False Negative)   TN (True Negative)
General Procedure VII 
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{4} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{5} \]
\[ F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6} \]
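Equations (3) to (6) translate directly into code. The sketch below is illustrative only: the counts passed in are made-up numbers, the function name `evaluation_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F from contingency-table counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f": f_score}

# Hypothetical counts for a binary positive/negative classifier.
metrics = evaluation_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```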
Data Preparation And Feature Extraction I 
Feature Extraction algorithm consists of two parts: 
i. Data Preparation. 
ii. Calculation of Feature Vectors. 
Data Preparation 
Uses the sentiment polarity dataset v2.0.
Machine learning based classification uses two sets:
1) Training set: used by an automatic classifier to learn the differentiating
characteristics of documents.
2) Test set: used to validate the performance of the automatic classifier.
Data Preparation And Feature Extraction II 
Operations carried out are as follows: 
* Combine all files from the corpus into one text file.
* Convert the text to an array of words.
* Sort the array of words in ascending order.
* Code: $V = \{v_1, \ldots, v_M\}$, where M is the number of different words
(terms) in the corpus.
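The operations listed above amount to building a sorted vocabulary. A minimal sketch, assuming the corpus is already loaded as a list of strings, that words are separated by whitespace, and that lower-casing is acceptable (none of which the slides state); `build_vocabulary` is a hypothetical name.

```python
def build_vocabulary(documents):
    """Merge all documents, split into words, and return the sorted distinct terms."""
    words = []
    for text in documents:
        words.extend(text.lower().split())
    vocabulary = sorted(set(words))          # V = {v_1, ..., v_M}
    index = {word: i for i, word in enumerate(vocabulary)}
    return vocabulary, index

vocabulary, index = build_vocabulary(["good movie", "bad movie"])
print(vocabulary, len(vocabulary))
```

The `index` mapping is an addition of this sketch; it makes it easy to encode each review as a sequence of term numbers later on.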
Data Preparation And Feature Extraction III 
Calculation of Feature Vectors 
N is the number of classes.
M is the number of different words (terms) in the corpus.
R is the number of observed sequences in the training process.
$W = \{w^r_1, w^r_2, \ldots, w^r_{T_r}\}$ are the reviews in the training dataset, where
$T_r$ is the length of the r-th review, $r = 1, 2, \ldots, R$.
$\mu_{i,j}$ describes the association between the i-th term (word) and the j-th
class.
$c_{i,j}$ is the number of times the i-th term occurred in the j-th class.
Data Preparation And Feature Extraction IV 
$e_i$ is the normalized entropy of the i-th term in the corpus:
\[ e_i = -\frac{1}{\lg N} \sum_{j=1}^{N} \frac{c_{i,j}}{t_i} \lg \frac{c_{i,j}}{t_i}, \quad i = 1, \ldots, M \]
where $t_i = \sum_j c_{i,j}$ is the number of occurrences of the i-th term in the
corpus.
Calculate the membership degree of each term by an analytical formula:
\[ \mu_{i,j} = \begin{cases} \dfrac{\dfrac{c_{i,j}}{\sum_{v=1}^{M} c_{v,j}} (1 - e_i)}{\sum_{t=1}^{N} \left\{ \dfrac{c_{i,t}}{\sum_{v=1}^{M} c_{v,t}} (1 - e_i) \right\}}, & n_i \ge n_{\min} \\ 0, & n_i < n_{\min} \end{cases} \tag{7} \]
where $n_{\min} = 40$.
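The entropy and membership calculations can be sketched as follows. Several points are assumptions rather than statements from the slides: $n_i$ is taken to be the total count $t_i$ of the term, the membership degrees of a term are normalised across classes, and a term whose weight $1 - e_i$ is zero receives zero membership everywhere. The count matrix is invented for illustration.

```python
import math

def normalized_entropy(counts):
    """Normalized entropy e_i of one term, given its per-class counts c_{i,j}."""
    total = sum(counts)
    n_classes = len(counts)
    if total == 0 or n_classes < 2:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:                      # 0 * log(0) is treated as 0
            p = c / total
            h -= p * math.log(p)
    return h / math.log(n_classes)

def membership_degrees(count_matrix, n_min=40):
    """count_matrix[i][j] = c_{i,j}; returns mu[i][j] (assumed reading of eq. 7)."""
    n_classes = len(count_matrix[0])
    class_totals = [sum(row[j] for row in count_matrix) for j in range(n_classes)]
    mu = []
    for row in count_matrix:
        if sum(row) < n_min:           # assumption: n_i is the total count t_i
            mu.append([0.0] * n_classes)
            continue
        weight = 1.0 - normalized_entropy(row)
        scores = [(row[j] / class_totals[j]) * weight if class_totals[j] else 0.0
                  for j in range(n_classes)]
        denom = sum(scores)
        mu.append([s / denom if denom else 0.0 for s in scores])
    return mu

# Hypothetical counts: 3 terms, 2 classes (positive, negative).
counts = [[60, 20], [3, 1], [30, 50]]
mu = membership_degrees(counts)
print(mu)
```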
Fuzzy Control System for Sentiment Analysis I 
Fuzzy inference is the process of formulating the mapping from given 
input(s) to output(s) using fuzzy logic. 
The process involves membership functions, logic operations, and 
if-then rules. 
At the first stage the membership functions are estimated; fuzzy
operations are then applied and the parameters are modified by the
back-propagation algorithm.
Fuzzy Control System for Sentiment Analysis II 
Figure : Realization scheme of fuzzy control process 
1 The membership degrees of the terms ($\mu^r_{i,j}$) of the r-th review are
calculated by (7).
Fuzzy Control System for Sentiment Analysis III 
2 The maximum membership degree with respect to the classes is found for
every term of the r-th review:
\[ \bar{\mu}^r_{i,j} = \mu^r_{i,j}, \quad j = \arg\max_{1 \le v \le N} \mu^r_{i,v}, \quad i = 1, \ldots, M \tag{8} \]
3 Means of maxima are calculated for all classes:
\[ \bar{\mu}^r_j = \frac{\sum_{k \in Z^r_j} \bar{\mu}^r_{k,j}}{l^r_j}, \quad Z^r_j = \{ i : \mu^r_{i,j} = \max_{1 \le v \le N} \mu^r_{i,v} \}, \quad j = 1, \ldots, N \tag{9} \]
where $l^r_j = |Z^r_j|$ is the number of elements of the set $Z^r_j$.
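Steps 2 and 3 can be sketched for a single review as below. The input is assumed to be one row of membership degrees per term of the review; skipping terms whose memberships are all zero and breaking ties in favour of the first class are choices of this sketch, not taken from the slides.

```python
def means_of_maxima(term_memberships):
    """For one review: average, per class, the maximal memberships assigned to it.

    term_memberships[i][j] is the membership of the i-th term in class j.
    """
    n_classes = len(term_memberships[0])
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for row in term_memberships:
        best = max(row)
        if best == 0:                  # assumption: pruned terms are ignored
            continue
        j = row.index(best)            # first class that attains the maximum
        sums[j] += best
        counts[j] += 1
    return [sums[j] / counts[j] if counts[j] else 0.0 for j in range(n_classes)]

# Hypothetical review with four terms and two classes.
mu_bar = means_of_maxima([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.0, 0.0]])
print(mu_bar)
```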
Fuzzy Control System for Sentiment Analysis IV 
The centre of gravity method is used for the defuzzification
operation.
It avoids the ambiguities that may arise when an output degree of
membership comes from more than one crisp output value.
The objective function is defined as follows:
\[ E(y) = \frac{1}{2} \sum_{r=1}^{R} \left( \frac{\sum_{j=1}^{N} \bar{\mu}^r_j y_j}{\sum_{j=1}^{N} \bar{\mu}^r_j} - d_r \right)^2 \to \min_{y \in \mathbb{R}^N} \tag{10} \]
where $y = (y_1, y_2, \ldots, y_N)$ and $d_r \in \{1, 2, \ldots, N\}$.
Fuzzy Control System for Sentiment Analysis V 
The partial derivatives of this function are calculated in the following
form:
\[ \frac{\partial E(y)}{\partial y_t} = \sum_{r=1}^{R} \frac{\bar{\mu}^r_t}{\sum_{j=1}^{N} \bar{\mu}^r_j} \left( \frac{\sum_{j=1}^{N} \bar{\mu}^r_j y_j}{\sum_{j=1}^{N} \bar{\mu}^r_j} - d_r \right), \quad t = 1, 2, \ldots, N \tag{11} \]
Rounding $\bar{y}$ gives the index of the class obtained as the result:
\[ \bar{y} = \frac{\sum_{j=1}^{N} \bar{\mu}_j y_j}{\sum_{j=1}^{N} \bar{\mu}_j} \tag{12} \]
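The objective (10), its gradient (11) and the defuzzified output (12) can be sketched with plain gradient descent. The learning rate, the number of steps, the initial point (class indices) and the two-review training set are all assumptions made for illustration; the slides only say that the parameters are tuned by back-propagation.

```python
def defuzzify(mu_bar, y):
    """Equation (12): centre-of-gravity output for one review."""
    return sum(m * v for m, v in zip(mu_bar, y)) / sum(mu_bar)

def objective(mu_bars, labels, y):
    """Equation (10): half the sum of squared errors over all reviews."""
    return 0.5 * sum((defuzzify(m, y) - d) ** 2 for m, d in zip(mu_bars, labels))

def gradient(mu_bars, labels, y):
    """Equation (11): partial derivatives of E with respect to each y_t."""
    grad = [0.0] * len(y)
    for mu_bar, d in zip(mu_bars, labels):
        denom = sum(mu_bar)
        err = defuzzify(mu_bar, y) - d
        for t in range(len(y)):
            grad[t] += mu_bar[t] / denom * err
    return grad

def train(mu_bars, labels, n_classes, learning_rate=0.1, steps=500):
    """Plain gradient descent on E(y), starting from y_j = j (assumed init)."""
    y = [float(j + 1) for j in range(n_classes)]
    for _ in range(steps):
        g = gradient(mu_bars, labels, y)
        y = [v - learning_rate * gv for v, gv in zip(y, g)]
    return y

# Hypothetical training data: means of maxima for two reviews and their class labels.
mu_bars = [[0.7, 0.3], [0.2, 0.8]]
labels = [1, 2]
y = train(mu_bars, labels, n_classes=2)
print(y, [round(defuzzify(m, y)) for m in mu_bars])
```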
Fuzzy Control System for Sentiment Analysis VI 
Acceptance strategy (s):
\[ s = \begin{cases} i_s \in I, & \text{if } \bar{y} \in (i_s - \Delta_1,\ i_s + \Delta_1) \\ \text{reject}, & \text{otherwise} \end{cases} \tag{13} \]
where $i_s$ is the index of the appropriate class and $I = \{1, 2, \ldots, N\}$. Here
$\Delta_1 \in [0, 0.5)$ is the main quantity that influences the reliability of
the system. Results of sentiment analysis of movie reviews with
different values of $\Delta_1$ are shown in Table 2.
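The acceptance rule (13) can be sketched as a small function. Taking the nearest integer as the candidate class $i_s$ and returning `None` for a rejection are assumptions of this sketch; `accept_or_reject` is a hypothetical name.

```python
def accept_or_reject(y_bar, n_classes, delta=0.4):
    """Return the class index if y_bar lies within delta of it, otherwise None."""
    candidate = round(y_bar)           # nearest class index (assumed choice of i_s)
    if 1 <= candidate <= n_classes and abs(y_bar - candidate) < delta:
        return candidate
    return None                        # review is rejected

print(accept_or_reject(1.3, n_classes=2))   # close to class 1
print(accept_or_reject(1.5, n_classes=2))   # halfway between classes
```

A smaller `delta` rejects more reviews but leaves fewer errors among the accepted ones, which is the trade-off the result tables explore.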
Fuzzy Control System for Sentiment Analysis VII
Table : Results of FCS for classification of movie reviews
        Δ1 = 0.4                   Δ2 = 0.45                  No Rejection
Folds   Corr(%)  Rej(%)   Err(%)   Corr(%)  Rej(%)   Err(%)   Correct(%)
1       65       24.5     10.5     74.5     12       13.5     81
2       73.5     19       7.5      79.5     9        11.5     84
3       66       24.5     9.5      73.5     12.5     14       81
4       71.5     22       6.5      77       11.5     11.5     83.5
5       72.5     19.5     8        81       7.5      11.5     84.5
6       70       19.5     10.5     77.5     7.5      15       81.5
7       70.5     19       10.5     76.5     9        14.5     81
8       69       22       9        75.5     11.5     13       82
9       66       24       10       73       12       15       81.5
10      71.5     19.5     9        80       7.5      12.5     84.5
Average 69.55    21.35    9.1      76.8     10       13.2     82.45
Neuro Fuzzy Inference System for Sentiment Analysis I
Fuzzy logic systems are good at explaining their decisions, but they
cannot automatically acquire the rules they use to make those decisions.
Neural networks are good at recognizing patterns, but they are not good at
explaining how they reach their decisions.
Intelligent hybrid systems, in which two or more techniques are
combined in an appropriate manner, can overcome the limitations of the
individual techniques.
Neuro Fuzzy Inference System for Sentiment Analysis II
Figure : The Structure of ANFIS
Neuro Fuzzy Inference System for Sentiment Analysis III
Table : Results of ANFIS for classification of movie reviews
        Δ2 = 0.5; Δ3 = 0.5         Δ2 = 0.1; Δ3 = 0.5         No Rejection
Folds   Corr(%)  Rej(%)   Err(%)   Corr(%)  Rej(%)   Err(%)   Correct(%)
1       63.5     26.5     10       73.5     12.5     14       81
2       68       26       6        79       10       11       85.5
3       65       27       8        72.5     15.5     12       81
4       70.5     23.5     6        77       11       12       83.5
5       64       29.5     6.5      80       9        11       86
6       69       21       10       76       10       14       82.5
7       70       21       9        77       8        15       81.5
8       65.5     26       8.5      75       12       13       82.5
9       66       22.5     11.5     73.5     13       13.5     81
10      68.5     23       8.5      80       7.5      12.5     85
Average 67       24       8.4      76.35    10.85    12.8     83
Hidden Markov Model for Sentiment Analysis I
Bayes' theorem
Bayes' theorem gives the relationship between the probabilities of A and B,
P(A) and P(B), and the conditional probabilities of A given B and B given A,
P(A|B) and P(B|A). In its most common form, it is:
\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{14} \]
It helps to use a known outcome to infer the events leading up to that outcome.
Example
We want to know which party is in power, given that a tax cut occurred.
Let there be two parties, A and B.
Hidden Markov Model for Sentiment Analysis II
Since there are two parties, the probability of each party ruling is taken to be 0.5, i.e.
* P(A) = 0.5
* P(B) = 0.5
From previous history we can obtain the probability of a tax cut (t) given that
party A or B was elected. Let
* P(t|A) = 0.25, then P(t'|A) = 0.75
* P(t|B) = 0.85, then P(t'|B) = 0.15
Figure : Tree Model
Hidden Markov Model for Sentiment Analysis III
From the tree model, we can calculate the probability P(t):
P(t) = P(t|A) P(A) + P(t|B) P(B)
     = (0.25 × 0.5) + (0.85 × 0.5)
     = 0.125 + 0.425
     = 0.55
Therefore, by Bayes' theorem,
P(B|t) = P(t|B) P(B) / P(t) = (0.85 × 0.5) / 0.55 ≈ 0.773
P(A|t) = P(t|A) P(A) / P(t) = (0.25 × 0.5) / 0.55 ≈ 0.227
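The tax-cut example can be checked numerically with a few lines of Python. The function name `posterior` and the dictionary layout are hypothetical; the probabilities are the ones given in the example.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem for a set of mutually exclusive hypotheses."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)   # P(t)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

priors = {"A": 0.5, "B": 0.5}            # P(A), P(B)
likelihoods = {"A": 0.25, "B": 0.85}     # P(t|A), P(t|B)
post = posterior(priors, likelihoods)
print({party: round(p, 3) for party, p in post.items()})
```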
Hidden Markov Model for Sentiment Analysis IV
Markov Model
Figure : Markov Model
3 states: Bull, Bear and Even.
3 observations: Up, Down and Unchanged.
For a given sequence of observations, up-down-down, the state sequence is
Bull-Bear-Bear.
Hidden Markov Model for Sentiment Analysis V
Figure : Hidden Markov Model
The key difference is that if we have the observation sequence
up-down-down, we cannot say exactly which state sequence produced it;
the state sequence is therefore hidden.
Hidden Markov Model for Sentiment Analysis VI
Calculates the probability that the model produced the sequence, as well as
which state sequence was most likely to have produced the observations.
Applied in many areas of signal processing, and in particular speech
processing.
Has also been applied with success to low-level NLP tasks such as
part-of-speech tagging, phrase chunking, and extracting target information
from documents.
Hidden Markov Model for Sentiment Analysis VII
The parameters of the HMM applied in the system are as follows:
1 N is the number of states.
2 M is the number of different words (terms) of the reviews taking part in the
training process for the given problem.
3 V is the set of all possible observations, $V = \{v_1, \ldots, v_M\}$ (in this
problem, the elements of this set are the different words that occur in the
reviews taking part in the training process).
4 $\pi = \{\pi_i\}_{i=1}^{N}$ are the initial state distributions: $\pi_i = P(q_1 = i)$.
Hidden Markov Model for Sentiment Analysis VIII
5 $A = [a_{i,j}]$ is the state transition probability matrix,
$a_{i,j} = P(q_{t+1} = j \mid q_t = i)$, $1 \le i, j \le N$.
6 $B = \{b_j(o_t)\}_{j=1}^{N}$ are the state-dependent observation probabilities.
Here, for every state j, $b_j(o_t) = P(o_t \mid q_t = j)$ is the probability
distribution of words occurring in that state.
7 $O^{(r)} = [o^{(r)}_1, o^{(r)}_2, \ldots, o^{(r)}_T]$ are the observation sequences, where R
is the number of observed sequences, $T_r$ is the length of the r-th observed
sequence, $T_r \le T$, T is a given quantity, and $r = 1, 2, \ldots, R$.
The parameters of the HMM are estimated for each of the corresponding
classes and trained by the Baum-Welch algorithm. The calculated
probabilities are passed to a decision-making block.
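One way the decision-making block could work is to score a review under each class's HMM and pick the most likely class. The sketch below uses the forward algorithm for scoring. It assumes the parameters $(\pi, A, B)$ have already been estimated (Baum-Welch training is not shown), that reviews are encoded as sequences of term indices, and that the decision is a simple maximum; the numeric values are invented.

```python
def forward_probability(pi, A, B, observations):
    """Probability that the HMM (pi, A, B) produced the observation sequence.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][o] : probability of emitting observation index o in state j
    """
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(models, observations):
    """Return the label whose model gives the sequence the highest probability."""
    return max(models,
               key=lambda label: forward_probability(*models[label], observations))

# Hypothetical two-state models over a two-word vocabulary (indices 0 and 1).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
models = {
    "positive": (pi, A, [[0.9, 0.1], [0.7, 0.3]]),   # favours word 0
    "negative": (pi, A, [[0.2, 0.8], [0.1, 0.9]]),   # favours word 1
}
print(classify(models, [0, 0, 1]))
```

For long reviews the products underflow, so a practical version would work with scaled or log probabilities.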
Hidden Markov Model for Sentiment Analysis IX
Table : Results of HMM for Polarity Reviews
Folds    1 state  2 state  3 state  5 state
1        79.5     81       84       78
2        83.5     83.5     82       82
3        77.5     81.5     81       79.5
4        80.5     82.5     84.5     81
5        84.5     83       86.5     81
6        82.5     82.5     83.5     81
7        82.5     83       82       80
8        83.5     84.5     84.5     84
9        77       79       78.5     77
10       83.5     83.5     83       83
Average  81.45    82.4     82.95    80.65
Hybrid Structure I
Hybrid-I. This system accepts a decision only when it is confirmed by the FCS,
ANFIS and HMM approaches. If any of these models rejects a review, the
system does not accept any decision. This prevents errors in the
understanding process, so the system is more reliable.
Table : Results of Hybrid-I
                                        Hybrid-I
Folds    FCS(%)    ANFIS(%)  HMM-3(%)   Corr(%)   Rej(%)    Err(%)
1        81        81.5      84         75.5      14        10.5
2        84        85.5      82         77.5      12        10.5
3        81        81        81         74.5      12.5      13
4        83.5      83.5      84.5       76.5      15        8.5
5        84.5      86        86.5       80        11        9
6        81.5      82.5      83.5       78        9         13
7        81        81.5      82         74.5      14        11.5
8        82        82.5      84.5       78.5      10        11.5
9        81.5      81        78.5       74        12        14
10       84.5      85        83         80        7.5       12.5
Average  82.45     83        82.95      76.9      11.7      11.4
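The Hybrid-I decision rule can be sketched as an agreement check. Treating a disagreement between classifiers the same way as a rejection is an interpretation made here, and `None` is used as a hypothetical marker for "rejected".

```python
def hybrid_one(decisions):
    """Accept a label only if every classifier accepted the review and all agree.

    decisions: one class label per classifier, or None where a classifier rejected.
    """
    if not decisions or any(d is None for d in decisions):
        return None
    first = decisions[0]
    return first if all(d == first for d in decisions) else None

print(hybrid_one([1, 1, 1]))      # FCS, ANFIS and HMM agree
print(hybrid_one([1, None, 1]))   # one model rejected the review
```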
Hybrid Structure II
Hybrid-II. The method suggested in this system is sequential.
If one classifier fails to classify a document, it passes the document on to
the next classifier, until the document is classified or no other classifier
remains. This approach minimizes the number of rejected reviews.
Table : Results of Hybrid-II
         ANFIS
Folds    Corr(%)   Rej(%)    Err(%)    HMM-3(%)  Hybrid-II(%)
1        60        32        8         84        84
2        61        35        4         82        84
3        60.5      34.5      5         81        81.5
4        66.5      30        3.5       84.5      87
5        58        37        5         86.5      87.5
6        66        26        8         83.5      84
7        63.5      29.5      7         82        83
8        63.5      30        6.5       84.5      84.5
9        61        30        9         78.5      79.5
10       63        30        7         83        84.5
Average  62.3      31.4      6.3       82.95     83.95
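The sequential procedure of Hybrid-II can be sketched as a simple cascade. Each classifier is assumed to be a callable that returns a class label, or `None` when it rejects the document; the stand-in classifiers in the usage lines are hypothetical.

```python
def hybrid_two(classifiers, document):
    """Try each classifier in order and return the first label that is not a rejection."""
    for classify in classifiers:
        label = classify(document)
        if label is not None:
            return label
    return None                        # every classifier rejected the document

# Hypothetical stand-ins: a strict first stage that rejects, then a fallback.
strict_stage = lambda doc: None
fallback_stage = lambda doc: 2
print(hybrid_two([strict_stage, fallback_stage], "some review text"))
```

Placing the classifier with the lowest error rate on accepted reviews first, and one that never rejects last, mirrors the ANFIS then HMM ordering suggested by the table above.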
  • 80.
    Questions
  • 81.
    Conclusion
    Sentiment analysis is the process used to determine the attitude, opinion or emotion expressed by a person about a particular topic.
    Sentiment analysis, or opinion mining, uses natural language processing and text analytics to identify and extract subjective information in source materials.
    Several machine learning algorithms can be used for document-level sentiment classification: SVM, NB, ME, HMM, FCS and ANFIS.
    The combination of multiple classifiers can result in better accuracy than that achieved by any individual classifier: Hybrid-II averages 83.95%, against 83% for ANFIS, the best single classifier.