Linked Data Query Processing
Tutorial at the 22nd International World Wide Web Conference (WWW 2013)
May 14, 2013
http://db.uwaterloo.ca/LDQTut2013/
Olaf Hartig
University of Waterloo
WWW 2013 Tutorial on Linked Data Query Processing [ Introduction ] 2
Tutorial Outline
(1) Introduction
(2) Theoretical Foundations
(3) Source Selection Strategies
(4) Execution Process
(5) Query Planning and Optimization
1. Introduction
Outline
● The Linked Data Principles
● Paradigms for Querying Linked Data
● Characteristics of the “Database System”
The Traditional, Hypertext Web
Data sources: MovieDB, CIA World Factbook
Data exposed to the Web via HTML
Towards a Web of Linked Data
Data model: RDF
MovieDB:
( War Child , release date , 12 July 1999 )
( War Child , filming location , Albania )
( Michael Davie , directed , War Child )
:
CIA World Factbook:
( Albania , unemployment rate , 13.2% )
:
Towards a Web of Linked Data
Data model: RDF
Global identifier: URI
Access mechanism: HTTP
Connection: data links
MovieDB:
( http://...imdb.../WarChild , release date , 12 July 1999 )
( http://...imdb.../WarChild , filming location , http://cia.../Albania )
( http://...imdb.../MichaelDavie , directed , http://...imdb.../WarChild )
:
CIA World Factbook:
( http://cia.../Albania , unemployment rate , 13.2% )
:
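The four ingredients above (RDF triples, URIs, HTTP, data links) can be illustrated with a minimal Python sketch. The `.example` URIs are hypothetical stand-ins for the elided URIs on the slide, predicates are kept as plain labels for readability (in real RDF they are URIs as well), and "data link" is read here in one simple way: a triple whose subject and object are HTTP URIs on different hosts.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the elided URIs shown on the slide.
WAR_CHILD = "http://moviedb.example/WarChild"
MICHAEL_DAVIE = "http://moviedb.example/MichaelDavie"
ALBANIA = "http://factbook.example/Albania"

# Each RDF triple is a (subject, predicate, object) tuple.
# Predicates are plain labels here only to mirror the slide.
movie_db = [
    (WAR_CHILD, "release date", "12 July 1999"),
    (WAR_CHILD, "filming location", ALBANIA),
    (MICHAEL_DAVIE, "directed", WAR_CHILD),
]


def is_http_uri(term):
    """True if the term looks like an HTTP(S) URI that could be looked up."""
    parsed = urlparse(term)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def data_links(triples):
    """Return the triples that connect URIs served by different hosts."""
    links = []
    for s, p, o in triples:
        if is_http_uri(s) and is_http_uri(o):
            if urlparse(s).netloc != urlparse(o).netloc:
                links.append((s, p, o))
    return links
```

In this toy dataset only the "filming location" triple crosses from the movie dataset into the factbook dataset, so it is the only data link the function reports.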
Supplementary Access Methods
● RDF dump: the whole dataset provided as a single large file
● SPARQL endpoint: Web service that allows for executing
SPARQL queries over the dataset
● Caveat: these access methods cannot be assumed
to be available for all datasets
  ● Creating dumps is not feasible if data changes very frequently
  ● Dumps or endpoints may not be feasible if the Linked Data
  interface is simply a wrapper for some other data source
  ● Providing and maintaining a reliable SPARQL endpoint
  requires significant additional effort
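For contrast with plain URI lookups, the following sketch shows how a client might prepare a query for a SPARQL endpoint. It relies on the SPARQL protocol convention of passing the query text in a `query` parameter of an HTTP GET request. The endpoint URL, the predicate URI, and the helper name `build_sparql_request` are assumptions made for illustration, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# Hypothetical endpoint; many datasets do not offer one at all.
ENDPOINT = "http://moviedb.example/sparql"

# Hypothetical query using made-up URIs.
QUERY = (
    "SELECT ?film WHERE { "
    "?film <http://moviedb.example/filmingLocation> "
    "<http://factbook.example/Albania> }"
)


def build_sparql_request(endpoint, query):
    """Build (but do not send) a GET request carrying the SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a real endpoint, which is exactly the service that cannot be assumed to exist for every dataset.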
Outline
● The Linked Data Principles √
● Paradigms for Querying Linked Data
● Characteristics of the “Database System”
Traditional Paradigm 1: Warehousing
● Copy data into a centralized repository
● Query this repository
+ Almost instant results
– Misses unknown or new sources
– Collection possibly out of date
Traditional Paradigm 2: Federation
● Distribute query execution over a
federation of SPARQL endpoints
+ Current data
– Misses sources without
SPARQL endpoint
Linked Data Query Processing
Principle 1: Rely on the Linked Data principles only
Principle 2: On-line execution
Consequence: Obtain data for executing a given query by
looking up URIs during the query execution process itself
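The consequence stated above can be sketched as a small traversal loop: start from seed URIs, look each one up, add the returned triples to a query-local dataset, and follow the URIs those triples mention. To keep the sketch runnable without network access, `WEB` is a hypothetical in-memory stand-in for the Web and `lookup` stands in for an HTTP GET. Following every URI is only one possible strategy, and the `max_lookups` bound is an assumption added so that the loop always terminates.

```python
from collections import deque

WAR_CHILD = "http://moviedb.example/WarChild"   # hypothetical URI
ALBANIA = "http://factbook.example/Albania"     # hypothetical URI

# Simulated Web: URI -> triples served when that URI is looked up.
WEB = {
    WAR_CHILD: [(WAR_CHILD, "filming location", ALBANIA)],
    ALBANIA: [(ALBANIA, "unemployment rate", "13.2%")],
}


def lookup(uri):
    """Stand-in for an HTTP GET on the URI; unknown URIs yield no data."""
    return WEB.get(uri, [])


def traverse(seed_uris, max_lookups=10):
    """Collect query-local data by looking up URIs discovered on the way."""
    seen = set()
    queue = deque(seed_uris)
    query_local = set()
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        lookups += 1
        for triple in lookup(uri):
            query_local.add(triple)
            # Subject and object URIs are candidates for further lookups.
            for term in (triple[0], triple[2]):
                if term.startswith("http://") and term not in seen:
                    queue.append(term)
    return query_local
```

Starting from the movie URI alone, the traversal reaches the factbook data through the data link, without knowing about that source in advance.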
“Ingredients” for LD Query Execution
● Data retrieval approach
  ● Data source selection
  ● Data source ranking (optional, for optimization)
  Example lookup: GET http://.../movie2449
● Result construction approach
  ● i.e., query-local data processing
  Example result over the query-local data:
  ?actor              ?loc
  http://mdb.../Paul  http://geo.../Berlin
  http://mdb.../Ric   http://geo.../Rome
● Combining data retrieval and result construction
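Result construction over query-local data can be sketched as matching triple patterns against the retrieved triples and joining the resulting variable bindings. The query, the predicates "starring" and "lives in", and the `.example` URIs below are hypothetical; they are chosen only so that the output resembles the ?actor / ?loc table on the slide. This is a naive nested-loop evaluation, not the approach of any particular system.

```python
MOVIE = "http://mdb.example/movie2449"   # hypothetical URIs throughout
PAUL = "http://mdb.example/Paul"
RIC = "http://mdb.example/Ric"
BERLIN = "http://geo.example/Berlin"
ROME = "http://geo.example/Rome"

# Query-local data, as it might look after some URI lookups.
query_local = [
    (MOVIE, "starring", PAUL),
    (MOVIE, "starring", RIC),
    (PAUL, "lives in", BERLIN),
    (RIC, "lives in", ROME),
]

# Two triple patterns; terms starting with "?" are variables.
query = [
    (MOVIE, "starring", "?actor"),
    ("?actor", "lives in", "?loc"),
]


def match_pattern(pattern, triple, binding):
    """Extend the binding so the pattern matches the triple, else None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns, triples):
    """Join the patterns one by one over the query-local triples."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```

Calling `evaluate(query, query_local)` yields one binding per actor with the matching location. Interleaving such evaluation with further lookups is what the last bullet, combining data retrieval and result construction, refers to.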
Properties of LD Query Processing
+ Current data
+ May make use of any Linked Data available on the Web
– Least efficient due to data shipping
Use cases: live querying where freshness and discovery of
results are more important than an almost instant answer
Combination with other Paradigms
● Linked Data query processing with a query-local dataset
  ● Query-local dataset contains additional data [LT11]
  ● Query-local dataset for caching [Har11b, HH11]
● Linked Data query processing with a SPARQL endpoint
  ● SPARQL endpoint exposes a cache of Linked Data [UKH+12]
Our Topic Today …
… pure Linked Data query processing
Linked Data query: a query that ranges over data made available using the Linked Data principles
Web of Linked Data: network of data that evolves by publishing data according to the Linked Data principles
Outline
● The Linked Data Principles √
● Paradigms for Querying Linked Data √
● Characteristics of the “Database System”
An Analogy ...
Traditional, Central Database Systems
Distributed Database Systems
The Web of Linked Data
● Number of potential data sources infinite
● No (a priori) information
● Number of actual data sources infinite
Issues due to the Openness
● Data quality issues
  ● Accuracy
  ● Freshness / timeliness
  ● Believability / trustworthiness
● Data source quality issues
  ● Availability
  ● Reliability
● Data integration issues
  ● Coreferences: Publishers may use different URIs for denoting the same entity
  ● Schema heterogeneity: Publishers may use different vocabularies for their data
For the purpose of discussing execution of queries in this tutorial, we largely ignore these issues.
Outline
● The Linked Data Principles √
● Paradigms for Querying Linked Data √
● Characteristics of the “Database System” √
Next part: 2. Theoretical Foundations ...
These slides have been created by Olaf Hartig
for the WWW 2013 tutorial on Linked Data Query Processing
Tutorial Website: http://db.uwaterloo.ca/LDQTut2013/
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)

Tutorial "Linked Data Query Processing" Part 1 "Introduction" (WWW 2013 Ed.)
