“Hybrid Sentiment
Analysis Utilizing Multiple
Indicators To Determine
Temporal Shifts of
Opinion in OSNs”
April 19th, 2016
Joshua White, Robert Hall,
Jeremy Fields, Holly White
2

Introduction

Shifts in Opinions

Dataset
– Dataset Storage Schema

Analysis
– Language Characteristics
– Demographic Characteristics
• Gender
• Location
• Group Affiliation

Conclusion / Future Work

References / Contact Info
Overview
3

Social networks allow individuals to share ideals with like-minded
people faster and more broadly than ever before.
– This is true for “extreme” ideals as well (Danger)

We continue to work toward understanding the mechanisms of change
in opinion
– Both public opinion and individual opinion (over time, not suddenly)

Two Major Findings:
– We find that groups are affected most by high-confidence “experts”,
typically male, who inspire trust
• Equally, undecided or uninformed individuals have a positive effect on these
groups (increasing group rationality)
– We find that clusters of low-confidence, like-minded individuals increase
overall confidence in a group through positive feedback mechanisms
• Women are more likely to comprise the other two groups [1]
Introduction
4

Shifts in public opinion have been an object of
research for some time (psychology / sociology)
– Studying them at scale is fairly new
– Most progress in the area has resulted from increased
computational capabilities
• The ability to simulate or replay long-term changes
– Actual lab investigations at this level would be impractical
– Researchers have identified three primary actors in
change of sentiment (as discussed previously):
• The expert
• The undecided/uninformed
• Clusters of low confidence individuals
Shifts in Opinions
5

Experts are actors with a high level of certainty
(confidence)
– An actor does not need to actually be an expert
– If the percentage of experts within a group reaches ~15%, then
they can affect group opinion
– Often they offer only vague amounts of actual knowledge

Shifts by uninformed or undecided individuals that are
not due to expert influence are considered to be noise

Clusters of low-confidence individuals with congruent
opinions create a stable state (majority rule)
– This also creates a positive “boost” feedback in their own
confidence.
Shifts in Opinions
6

Trust
– In this work, trust was found to be important when
combined with distance of similarity
– Higher trust = larger shifts in opinion
• Especially if the trust was in an “Expert”
– Actors with similar interests were found to increase confidence in
a bidirectional manner
– Actors with high dissimilarity between ideas were found to have
negligible effects on each other's opinions [2]
– Example:
• Democrats and Independents who trusted scientists became
increasingly concerned with global warming, whereas increased
knowledge was uncorrelated with concern among skeptics of scientists and
among Republicans [3]
Shifts in Opinions
7

Starting with a series of political hashtags that
were collected as part of a previous research
project, researchers at SUNY Polytechnic
collected 9 million+ tweets from the trickler API.
Dataset Selection

This dataset is available upon request in full or summarized form,
under a data sharing agreement. A complete summary of the dataset
is also available in report form.
8

As will be discussed in another presentation:
– We represent the data within a semantic model which
expresses relationships within the social network
– We define this model as Fine-Grained User Diffusion
(FGUD)
– This model allows for analytic traversal at the user level
– Sample: (:Post attribute)
Dataset Storage Schema
9

“Simple” Language Analysis
– K-Means Clustering of Shannon's Entropy
• Language Agnostic Calculation [9, 10]
• Represent the calculated entropy of each message
in the dataset as a 1-dimensional array in R and
compute the initial graph
[Figure: Entropy K-Means]
Analysis
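
The per-message entropy calculation named on this slide can be sketched roughly as follows. This is a minimal Python illustration, not the authors' R code: it assumes entropy is computed over the UTF-8 bytes of each message (which caps the value at 8 bits), and the function name `shannon_entropy` and the sample messages are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Return the Shannon entropy of a message in bits per byte.

    Assumption: symbols are the UTF-8 bytes of the message, which keeps
    the calculation independent of the language the message is written in.
    """
    data = message.encode("utf-8")
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over symbols of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical messages; the result is the 1-dimensional array of entropies
# that would be handed to the clustering step.
messages = ["aaaa", "abab", "abcd"]
entropies = [shannon_entropy(m) for m in messages]
```

Because only symbol frequencies are used, no tokenizer or language model is needed, which is what makes the measure language agnostic.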
10
● Entropy scale 1-8
● Previous work has shown that Twitter has 3 distinct
groups: Human, Bot, Cyborg
Analysis
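
As a rough illustration of clustering 1-dimensional entropy values into three groups, the sketch below runs plain Lloyd-style K-Means in Python. It is not the authors' R workflow: the choice of k = 3 simply mirrors the Human/Bot/Cyborg split, and the function name `kmeans_1d`, the deterministic initialisation, and the sample values are assumptions made for the example.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int,
              iterations: int = 100) -> Tuple[List[float], List[int]]:
    """Cluster 1-D values with Lloyd's algorithm; return (centroids, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[j])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical entropy values forming three loose bands.
sample = [1.0, 1.2, 0.8, 4.0, 4.2, 3.8, 7.0, 7.2, 6.8]
centroids, labels = kmeans_1d(sample, k=3)
```

Which band corresponds to which account type is not decided by the clustering itself and would still need to be interpreted against labelled examples.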
11

Allowing K-Means to select the number of clusters
automatically, we get 27 distinct groups:
Analysis
12

Gender Detection
– Both the author's name (if known) and the message content
are used
– Uses a naive Bayes classifier based on Mustafa
Atik and Nejdet Yucesoy’s Genderizer [13]
• Gender was determined for 82.05% of all messages
– Did not use the combined gender inference method of
S. Sakaki et al. because of the 6-fold increase in computation
for a 0.48% increase in detection
Analysis
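
To make the classifier idea concrete, here is a minimal multinomial naive Bayes sketch in Python with add-one smoothing. It does not reproduce Genderizer or its API: the class name `NaiveBayes`, the `name:` token prefix for the author's first name, and the two training examples are hypothetical and only show how name and message tokens could feed one model.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes over token lists (illustrative only)."""

    def __init__(self):
        self.class_counts = Counter()             # examples seen per label
        self.token_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def train(self, tokens, label):
        self.class_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_examples in self.class_counts.items():
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(n_examples / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: a name token plus message tokens.
clf = NaiveBayes()
clf.train(["name:maria", "hello", "world"], "female")
clf.train(["name:james", "hello", "there"], "male")
prediction = clf.predict(["name:maria", "hello"])
```

A production classifier would also need a confidence cut-off below which no gender is reported, which is one way a coverage figure under 100% can arise.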
13

Time Zone subdivision
– Only 0.116% of the dataset was geo-tagged
– Cheap geo-inferencing
– Concentrated on US time zones only
– Split into male/female for each time zone
Analysis
14

Still working to implement the work of M. Conover et al.,
“Predicting the Political Alignment of
Twitter Users” [15].
– This is a TF-IDF (Term Frequency – Inverse
Document Frequency) method
– Allows categorization of “Left” and “Right”
affiliations
– This method has not been implemented on data
subsets like ours (human-only, gender-specific, and
geography-specific)
Analysis (Group Affiliation Issue)
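
For readers unfamiliar with TF-IDF, the weighting can be sketched as below. This is a generic Python illustration and does not reproduce the feature pipeline of Conover et al.: the function name `tf_idf`, the treatment of each user's hashtags as one "document", and the sample hashtags are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        term_counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


# Hypothetical per-user hashtag lists.
users = [
    ["#taga", "#shared"],
    ["#tagb", "#shared"],
]
weights = tf_idf(users)
```

Terms used by every user (here `#shared`) receive a weight of zero, so only distinguishing hashtags contribute to a Left/Right categorization.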
15

The work of M. Conover et al. only addresses network
membership and use of specific hashtags
– Leaves out a number of scenarios:
• Joining a network just to troll it or try to sway others
• Frequent communication with a group/network that
they are not a part of, etc.
Analysis (Group Affiliation Issue)
16

Presented a down-selection approach for selecting posts

Examined group affiliation detection and found that more
work is needed in this area to reduce inaccuracies before
methods can be implemented

We are continuing this work currently
– Traversing and collecting “snapshots” of all posts,
following/followed relationships, and profiles at moments in
time
– One complete snapshot of the same accounts each quarter
for 1.5 years before and after the 2016 US presidential
election
– Measuring resultant changes in individuals
Conclusion / Future Work
17
For more information
contact:
Joshua S. White
Josh@rsignia.com
References / Contact Info
