ONE DOES NOT SIMPLY CROWDSOURCE THE SEMANTIC WEB
TECHNOLOGY DESIGN AND INCENTIVES
Elena Simperl
e.simperl@soton.ac.uk
@esimperl
January 26th, 2016
CROWDSOURCING
PROBLEM SOLVING VIA OPEN CALLS
“Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”
[Howe, 2006]
THE SEMANTIC WEB
WEB OF DATA THAT CAN BE PROCESSED BY MACHINES
“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”
[W3C, 2011]
MAKING THE SEMANTIC WEB HUMANLY POSSIBLE
Crowdsourcing is increasingly used to help algorithms solve Semantic Web problems.
Great challenges:
• How to run a crowdsourcing project effectively?
• Which form of crowdsourcing for which task?
• How to combine crowd and machine intelligence?
• How to encourage participation?
DESIGNING CROWDSOURCING PROJECTS
DIFFERENT FORMS AND PLATFORMS TO CHOOSE FROM
• Macrotasks
• Microtasks
• Challenges
• Self-organized crowds
• Crowdfunding
Source: [Prpić et al., 2015]
MANY QUESTIONS TO ANSWER
• TASK DESIGN
• WORKFLOW DESIGN AND EXECUTION
• TASK INTERFACES
• QUALITY ASSURANCE
• TASK ASSIGNMENT
• CROWD TRAINING AND FEEDBACK
• INCENTIVES ENGINEERING
• COLLABORATION, COMPETITION, SELF-ORGANIZATION
• REAL-TIME DELIVERY
• NICHESOURCING
• EXTENSIONS TO TECHNOLOGIES
• SOCIAL MACHINES ENGINEERING
SOME ANSWERS
IMPROVING PAID MICROTASKS @WWW15
Compared the effectiveness of microtasks on CrowdFlower vs. a self-developed game
• Image labelling, with the ESP data set as gold standard
• Evaluated accuracy, #labels, cost per label, avg/max #labels per contributor
• For three types of tasks:
  • Nano: 1 image
  • Micro: 11 images
  • Small: up to 2000 images
• Probabilistic reasoning to personalize furtherance incentives (a sketch follows below)
Findings
• Gamification and payments work well together
• Furtherance incentives particularly interesting
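The slide mentions probabilistic reasoning over furtherance incentives but does not spell the model out, so the following is only a minimal Python sketch of one plausible reading: estimate, per contributor, the probability that they will keep labelling, and trigger a furtherance incentive only when that estimate drops. The `Contributor` class, the Beta-Bernoulli estimate, and the 0.6 threshold are illustrative assumptions, not the WWW15 paper's method.

```python
# Minimal sketch (illustrative, not the paper's model): decide when to offer
# a furtherance incentive based on an estimate of whether the contributor
# will keep working after the current task.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Contributor:
    # True = the contributor started another task after finishing one
    continued_history: List[bool] = field(default_factory=list)

    def continuation_probability(self, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
        """Beta-Bernoulli posterior mean of 'continues after the next task'."""
        successes = sum(self.continued_history)
        trials = len(self.continued_history)
        return (successes + prior_a) / (trials + prior_a + prior_b)


def should_offer_incentive(worker: Contributor, threshold: float = 0.6) -> bool:
    """Offer a bonus or game element only when the contributor looks likely
    to stop, i.e. the continuation estimate falls below the threshold."""
    return worker.continuation_probability() < threshold


if __name__ == "__main__":
    w = Contributor(continued_history=[True, True, False, True, False])
    print(f"P(continue) ~ {w.continuation_probability():.2f}")  # ~0.57
    print("Offer incentive:", should_offer_incentive(w))        # True
```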
HYBRID NER ON TWITTER @ESWC15
Identified content and crowd factors that impact effectiveness (a routing sketch follows the factor list below)
Findings
• Shorter tweets with fewer entities work better
• The crowd is more familiar with people and places from recent news
• MISC as a NER category is sometimes confusing, but useful for identifying partial and implicitly named entities
Factors examined: #entities in post, types of entities, content sentiment, skipped TP posts, avg. time per task, UI interaction
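To make the hybrid setup concrete, the sketch below shows one way, not necessarily the ESWC15 pipeline, to route automatically extracted Twitter entities to the crowd: mentions the tagger is unsure about become verification questions, and posts that are too entity-dense are held back, echoing the finding that tweets with fewer entities work better. `Mention`, `route_to_crowd`, the confidence threshold, and the per-post cap are hypothetical names and parameters.

```python
# Minimal sketch (hypothetical) of hybrid NER on tweets: keep confident
# automatic annotations, send uncertain ones to crowd verification.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mention:
    tweet: str         # the post the mention occurs in
    text: str          # surface form, e.g. "Paris"
    ner_type: str      # PER / LOC / ORG / MISC
    confidence: float  # tagger confidence in [0, 1]


def route_to_crowd(mentions: List[Mention],
                   confidence_threshold: float = 0.8,
                   max_entities_per_post: int = 3) -> List[Mention]:
    """Return the mentions that should become crowd verification tasks."""
    per_post: Dict[str, List[Mention]] = {}
    for m in mentions:
        per_post.setdefault(m.tweet, []).append(m)

    tasks: List[Mention] = []
    for post, ms in per_post.items():
        if len(ms) > max_entities_per_post:
            continue  # too entity-dense for one microtask; handle separately
        tasks.extend(m for m in ms if m.confidence < confidence_threshold)
    return tasks


if __name__ == "__main__":
    mentions = [
        Mention("Obama visits Paris today", "Obama", "PER", 0.95),
        Mention("Obama visits Paris today", "Paris", "LOC", 0.55),
    ]
    for task in route_to_crowd(mentions):
        print(f'Is "{task.text}" a {task.ner_type} in: "{task.tweet}"?')
```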
CROWD-EMPOWERED SPARQL QUERIES @KCAP2015
A hybrid machine/human SPARQL query engine that enhances query answers (a sketch follows below)
• Uses a novel RDF completeness model to identify portions of a query with missing values
• Resorts to microtask crowdsourcing to resolve the missing values
• Evaluated number of answers, delivery time, and accuracy
• 50 queries against DBpedia in five domains: History, Life Sciences, Movies, Music, and Sports
Findings
• Size of the query answer set increased 3.13 times on average
• 12 minutes to get 98% of all answers
• Accuracy between 84% and 96%
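The K-CAP 2015 engine itself is not reproduced here; the toy sketch below only illustrates the general idea of pairing a completeness model with microtask generation: answer a triple pattern from the local data and, where the completeness metadata does not guarantee full coverage, emit a crowd question for the potentially missing values. The `complete` set, `hybrid_answer`, and the DBpedia-style example triples are illustrative assumptions, not the paper's data model or API.

```python
# Toy sketch (illustrative) of crowd-empowered query answering: local answers
# plus crowd microtasks wherever completeness is not guaranteed.

from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def hybrid_answer(pattern: Tuple[str, str, Optional[str]],
                  data: Set[Triple],
                  complete: Set[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (answers found locally, microtask questions for the crowd).
    `complete` holds (subject, predicate) pairs the dataset is known to
    cover completely, standing in for the completeness model."""
    s, p, _ = pattern
    local = [o for (subj, pred, o) in data if subj == s and pred == p]
    crowd_tasks: List[str] = []
    if (s, p) not in complete:
        crowd_tasks.append(f"List all values of {p} for {s} "
                           f"(known so far: {local or 'none'})")
    return local, crowd_tasks


if __name__ == "__main__":
    data = {("dbr:Psycho", "dbo:director", "dbr:Alfred_Hitchcock")}
    complete = {("dbr:Psycho", "dbo:director")}  # directors listed completely
    print(hybrid_answer(("dbr:Psycho", "dbo:starring", None), data, complete))
```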
OPEN QUESTIONS
NOT CROWDSOURCING AS USUAL
Knowledge-intensive tasks
Structured, interlinked content
Content meant for machine consumption
Scale, shape, and quality of the data
Context is critical
Open-set answers
FUNDAMENTAL CHALLENGES
SCALE: No ‘Big Crowd’
TIME: From one-off and short-term to mid- and long-term
SCOPE: Problems technology cannot solve
PATHWAYS TO SOLUTIONS
SCALE
• Aligning incentives
• Better reuse of crowd outputs
TIME
• Sustaining engagement
• Building relationships
• Better integration
SCOPE
• New problems and problem-solving paradigms
• Novel human-
THANKS
e.simperl@soton.ac.uk
@esimperl
