ONE DOES NOT SIMPLY CROWDSOURCE THE SEMANTIC WEB
TECHNOLOGY DESIGN AND INCENTIVES
Elena Simperl
e.simperl@soton.ac.uk
@esimperl
January 26th, 2016
CROWDSOURCING
PROBLEM SOLVING VIA OPEN CALLS
“Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”
[Howe, 2006]
THE SEMANTIC WEB
WEB OF DATA THAT CAN BE PROCESSED BY MACHINES
“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”
[W3C, 2011]
MAKING THE SEMANTIC WEB HUMANLY POSSIBLE
Crowdsourcing is increasingly used to help algorithms solve Semantic Web problems.
Great challenges:
• How to run a crowdsourcing project effectively?
• Which form of crowdsourcing for which task?
• How to combine crowd and machine intelligence?
• How to encourage participation?
DESIGNING CROWDSOURCING PROJECTS
DIFFERENT FORMS AND PLATFORMS TO CHOOSE FROM
• Macrotasks
• Microtasks
• Challenges
• Self-organized crowds
• Crowdfunding
Source: [Prpić et al., 2015]
MANY QUESTIONS TO ANSWER
• TASK DESIGN
• WORKFLOW DESIGN AND EXECUTION
• TASK INTERFACES
• QUALITY ASSURANCE
• TASK ASSIGNMENT
• CROWD TRAINING AND FEEDBACK
• INCENTIVES ENGINEERING
• COLLABORATION, COMPETITION, SELF-ORGANIZATION
• REAL-TIME DELIVERY
• NICHESOURCING
• EXTENSIONS TO TECHNOLOGIES
• SOCIAL MACHINES ENGINEERING
SOME ANSWERS
IMPROVING PAID MICROTASKS @WWW15
Compared the effectiveness of microtasks on CrowdFlower vs. a self-developed game
• Image labelling, with the ESP data set as gold standard
• Evaluated accuracy, #labels, cost per label, avg/max #labels per contributor
• For three types of tasks:
  • Nano: 1 image
  • Micro: 11 images
  • Small: up to 2000 images
• Probabilistic reasoning to personalize furtherance incentives (a sketch follows below)
Findings
• Gamification and payments work well together
• Furtherance incentives particularly interesting
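The slide mentions probabilistic reasoning over furtherance incentives but does not spell the model out, so the following is only a minimal Python sketch of one plausible reading: estimate, per contributor, the probability that they will keep labelling, and trigger a furtherance incentive only when that estimate drops. The `Contributor` class, the Beta-Bernoulli estimate, and the 0.6 threshold are illustrative assumptions, not the WWW15 paper's method.

```python
# Minimal sketch (illustrative, not the paper's model): decide when to offer
# a furtherance incentive based on an estimate of whether the contributor
# will keep working after the current task.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Contributor:
    # True = the contributor started another task after finishing one
    continued_history: List[bool] = field(default_factory=list)

    def continuation_probability(self, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
        """Beta-Bernoulli posterior mean of 'continues after the next task'."""
        successes = sum(self.continued_history)
        trials = len(self.continued_history)
        return (successes + prior_a) / (trials + prior_a + prior_b)


def should_offer_incentive(worker: Contributor, threshold: float = 0.6) -> bool:
    """Offer a bonus or game element only when the contributor looks likely
    to stop, i.e. the continuation estimate falls below the threshold."""
    return worker.continuation_probability() < threshold


if __name__ == "__main__":
    w = Contributor(continued_history=[True, True, False, True, False])
    print(f"P(continue) ~ {w.continuation_probability():.2f}")  # ~0.57
    print("Offer incentive:", should_offer_incentive(w))        # True
```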
HYBRID NER ON TWITTER @ESWC15
Identified content and crowd factors that impact effectiveness (a routing sketch follows the factor list below)
Findings
• Shorter tweets with fewer entities work better
• The crowd is more familiar with people and places from recent news
• MISC as a NER category is sometimes confusing, but useful for identifying partial and implicitly named entities
Factors examined: #entities in post, types of entities, content sentiment, skipped TP posts, avg. time per task, UI interaction
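To make the hybrid setup concrete, the sketch below shows one way, not necessarily the ESWC15 pipeline, to route automatically extracted Twitter entities to the crowd: mentions the tagger is unsure about become verification questions, and posts that are too entity-dense are held back, echoing the finding that tweets with fewer entities work better. `Mention`, `route_to_crowd`, the confidence threshold, and the per-post cap are hypothetical names and parameters.

```python
# Minimal sketch (hypothetical) of hybrid NER on tweets: keep confident
# automatic annotations, send uncertain ones to crowd verification.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mention:
    tweet: str         # the post the mention occurs in
    text: str          # surface form, e.g. "Paris"
    ner_type: str      # PER / LOC / ORG / MISC
    confidence: float  # tagger confidence in [0, 1]


def route_to_crowd(mentions: List[Mention],
                   confidence_threshold: float = 0.8,
                   max_entities_per_post: int = 3) -> List[Mention]:
    """Return the mentions that should become crowd verification tasks."""
    per_post: Dict[str, List[Mention]] = {}
    for m in mentions:
        per_post.setdefault(m.tweet, []).append(m)

    tasks: List[Mention] = []
    for post, ms in per_post.items():
        if len(ms) > max_entities_per_post:
            continue  # too entity-dense for one microtask; handle separately
        tasks.extend(m for m in ms if m.confidence < confidence_threshold)
    return tasks


if __name__ == "__main__":
    mentions = [
        Mention("Obama visits Paris today", "Obama", "PER", 0.95),
        Mention("Obama visits Paris today", "Paris", "LOC", 0.55),
    ]
    for task in route_to_crowd(mentions):
        print(f'Is "{task.text}" a {task.ner_type} in: "{task.tweet}"?')
```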
CROWD-EMPOWERED SPARQL QUERIES @KCAP2015
A hybrid machine/human SPARQL query engine that enhances query answers (a sketch follows below)
• Uses a novel RDF completeness model to identify portions of a query with missing values
• Resorts to microtask crowdsourcing to resolve the missing values
• Evaluated number of answers, delivery time, and accuracy
• 50 queries against DBpedia in five domains: History, Life Sciences, Movies, Music, and Sports
Findings
• Size of the query answer set increased 3.13 times on average
• 12 minutes to get 98% of all answers
• Accuracy between 84% and 96%
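The K-CAP 2015 engine itself is not reproduced here; the toy sketch below only illustrates the general idea of pairing a completeness model with microtask generation: answer a triple pattern from the local data and, where the completeness metadata does not guarantee full coverage, emit a crowd question for the potentially missing values. The `complete` set, `hybrid_answer`, and the DBpedia-style example triples are illustrative assumptions, not the paper's data model or API.

```python
# Toy sketch (illustrative) of crowd-empowered query answering: local answers
# plus crowd microtasks wherever completeness is not guaranteed.

from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def hybrid_answer(pattern: Tuple[str, str, Optional[str]],
                  data: Set[Triple],
                  complete: Set[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (answers found locally, microtask questions for the crowd).
    `complete` holds (subject, predicate) pairs the dataset is known to
    cover completely, standing in for the completeness model."""
    s, p, _ = pattern
    local = [o for (subj, pred, o) in data if subj == s and pred == p]
    crowd_tasks: List[str] = []
    if (s, p) not in complete:
        crowd_tasks.append(f"List all values of {p} for {s} "
                           f"(known so far: {local or 'none'})")
    return local, crowd_tasks


if __name__ == "__main__":
    data = {("dbr:Psycho", "dbo:director", "dbr:Alfred_Hitchcock")}
    complete = {("dbr:Psycho", "dbo:director")}  # directors listed completely
    print(hybrid_answer(("dbr:Psycho", "dbo:starring", None), data, complete))
```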
OPEN QUESTIONS
NOT CROWDSOURCING AS USUAL
Knowledge-intensive tasks
Structured, interlinked content
Content meant for machine consumption
Scale, shape, and quality of the data
Context is critical
Open-set answers
FUNDAMENTAL CHALLENGES
SCALE: No ‘Big Crowd’
TIME: From one-off and short-term to mid- and long-term
SCOPE: Problems technology cannot solve
PATHWAYS TO SOLUTIONS
SCALE
• Aligning incentives
• Better reuse of crowd outputs
TIME
• Sustaining engagement
• Building relationships
• Better integration
SCOPE
• New problems and problem-solving paradigms
• Novel human-
THANKS
e.simperl@soton.ac.uk
@esimperl
