The Web Observatory Extension: Facilitating Web Science Collaboration through Semantic Markup
Dominic DiFranzo, John S. Erickson, Marie Joan Kristine T. Gloria,
Joanne S. Luciano, Deborah McGuinness, James Hendler
The Tetherless World Constellation &
Institute for Data Exploration and Applications
Rensselaer Polytechnic Institute, Troy, NY
Introduction
•  Web Science involves using and producing large amounts of heterogeneous data about and from the Web.
•  As Web Science researchers strive to collaborate and work together, we must find ways to share, link and reuse each other’s data and tools.
•  To do this, we are building “Web Observatories”: a common infrastructure for enhancing this sharing, extended to also include tools, research project results (papers & experiments), etc.
Tiropanis, T., Hall, W., Shadbolt, N., De Roure, D., Contractor, N. and Hendler, J., The Web Science Observatory, IEEE Intelligent Systems, March/April 2013.
Web Observatory Concept
WO Portal
•  Engaging communities with analytics
•  Publication of catalogues (schema.org)
•  Access with/without credentials
•  Searching and indexing
•  Distributed queries
•  Plugged-in datastores and app servers
WO Datastores
•  Harvesting
•  Dataset enrichment/curation
•  Dataset management
•  Provenance
•  Optimisation
WO Apps
•  Hosting of analytic apps
•  Hosting of visualisation apps
•  Monitoring dependency on datasets
•  Monitoring dependency on tools
•  Explicit links between tools & datasets used
Links to resources in other Web Observatories
Thanassis Tiropanis – University of Southampton
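The WO Apps layer calls for explicit links between tools and the datasets they use, so that dependencies in either direction can be monitored. A minimal sketch of such a link index, assuming a simple in-memory store; the class name `DependencyRegistry` and all tool and dataset identifiers are hypothetical and not part of any Web Observatory implementation.

```python
from collections import defaultdict


class DependencyRegistry:
    """Hypothetical in-memory index of explicit tool-to-dataset links.

    Both directions are stored so that "which datasets does this tool use?"
    and "which tools depend on this dataset?" are each a single lookup.
    """

    def __init__(self):
        self._datasets_by_tool = defaultdict(set)
        self._tools_by_dataset = defaultdict(set)

    def link(self, tool, dataset):
        # Record that `tool` uses `dataset`, in both directions.
        self._datasets_by_tool[tool].add(dataset)
        self._tools_by_dataset[dataset].add(tool)

    def datasets_used_by(self, tool):
        # .get() avoids creating empty entries for unknown tools.
        return sorted(self._datasets_by_tool.get(tool, set()))

    def tools_using(self, dataset):
        return sorted(self._tools_by_dataset.get(dataset, set()))


# Illustrative identifiers only (hypothetical).
registry = DependencyRegistry()
registry.link("tweet-sentiment-app", "indian-election-tweets")
registry.link("hashtag-trends-app", "indian-election-tweets")
registry.link("hashtag-trends-app", "open-govt-metadata")

print(registry.tools_using("indian-election-tweets"))
print(registry.datasets_used_by("hashtag-trends-app"))
```

Keeping both directions means a change to one dataset can be traced to every dependent tool without scanning all links.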
RPI Observatory Themes
•  Science Data Observatory (example: Deep Carbon Obs. datasets)
•  Health & Life Sciences Observatory (example: cancer treatment datasets)
•  Open Government Observatory (example: Int’l Open Govt metadata)
•  Social Spaces Observatory (example: Indian Election Twitter dataset)
Data use (Social Spaces)
Data use (Open Govt Data)
Problem: putting these together across
laboratories (and fields)
Schema.org
•  An initiative launched by the leading search engine providers to create and support a common set of schemas for structured data markup on Web pages.
•  These vocabularies make the metadata more machine-readable, allowing search engines to better search, discover and display this information.
Example RDFa Lite
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <span>Director:
    <span property="director">James Cameron</span>
  </span>
  <span property="genre">Science fiction</span>
  <a property="trailer" href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>
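To show what “machine-readable” buys in practice, here is a minimal sketch that reads the property/value pairs back out of the Avatar example, with its RDFa Lite attributes written out, using only Python's standard `html.parser`. It is a deliberate simplification (one `vocab`/`typeof` scope, element text as the value, `href` for links), not a conforming RDFa processor; the class name `RDFaLiteParser` is hypothetical.

```python
from html.parser import HTMLParser

SNIPPET = """
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  <span>Director:
    <span property="director">James Cameron</span>
  </span>
  <span property="genre">Science fiction</span>
  <a property="trailer" href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>
"""


class RDFaLiteParser(HTMLParser):
    """Simplified sketch: collects (property, value) pairs from RDFa Lite."""

    def __init__(self):
        super().__init__()
        self.vocab = None
        self.types = []
        self.pairs = []
        # One [property-or-None, text-chunks] entry per open element.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "typeof" in a:
            self.types.append(a["typeof"])
        prop = a.get("property")
        if prop and tag == "a" and "href" in a:
            # For links, take the value from href instead of the text.
            self.pairs.append((prop, a["href"]))
            prop = None
        self._stack.append([prop, []])

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a property.
        for prop, chunks in self._stack:
            if prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        prop, chunks = self._stack.pop()
        if prop:
            # Collapse internal whitespace in the collected text.
            self.pairs.append((prop, " ".join("".join(chunks).split())))


parser = RDFaLiteParser()
parser.feed(SNIPPET)
print(parser.vocab, parser.types)
for prop, value in parser.pairs:
    print(f"{parser.vocab}{prop} -> {value}")
```

The unmarked “Director:” wrapper contributes nothing, which illustrates why the explicit `property` attributes matter: only marked-up values become data.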
Schema.org in action
http://datasets.schema-labs.appspot.com/
Goals
•  Describe Web Observatories
•  Interconnect Web Observatories
•  Facilitate discovery of tools, datasets, and projects for researchers
Overview
[Figure: Web Observatory, Project, Dataset, and Tool resources and Search, compared “Without Schema.org” and “With Schema.org”]
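The overview contrasts isolated observatory resources with resources that become findable through search once they share Schema.org markup. A minimal sketch of that cross-observatory search, assuming each observatory exposes its records as dictionaries with schema.org-style keys (`@type`, `name`, `keywords`); the `CATALOGUES` contents and the `search` function are hypothetical, the dataset names are taken from the slides, and the keywords are made up for illustration. A real deployment would rely on a search engine or crawler rather than an in-memory list.

```python
# Illustrative catalogues keyed by observatory (hypothetical contents;
# dataset names from the slides, keywords invented for the example).
CATALOGUES = {
    "Social Spaces WO": [
        {"@type": "Dataset", "name": "Indian Election Twitter Dataset",
         "keywords": ["twitter", "elections"]},
    ],
    "Health/Life Science WO": [
        {"@type": "Dataset", "name": "Health Data Challenge",
         "keywords": ["health"]},
    ],
    "Science Data WO": [
        {"@type": "Dataset", "name": "Deep Carbon Obs. Datasets",
         "keywords": ["carbon", "geoscience"]},
    ],
}


def search(catalogues, wanted_type=None, keyword=None):
    """Return (observatory, name) pairs whose records match the filters."""
    results = []
    for observatory, records in catalogues.items():
        for record in records:
            # Filter on the shared schema.org type, if one was requested.
            if wanted_type is not None and record.get("@type") != wanted_type:
                continue
            # Match the keyword against the keywords list or the name.
            if keyword is not None:
                needle = keyword.lower()
                keywords = [k.lower() for k in record.get("keywords", [])]
                if needle not in keywords and needle not in record.get("name", "").lower():
                    continue
            results.append((observatory, record["name"]))
    return results


print(search(CATALOGUES, wanted_type="Dataset", keyword="twitter"))
print(len(search(CATALOGUES, wanted_type="Dataset")))
```

The point of the sketch is that one query works across all three observatories only because every record uses the same type and property names.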
Schema.org vocabulary extension
•  Web Observatory Class
•  Web Observatory Project
•  Web Observatory Dataset
•  Web Observatory Tool
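The slides name the extension's classes but not their IRIs or properties. The following minimal sketch shows one way a WO dataset description could be emitted, using JSON-LD rather than the RDFa Lite syntax of the earlier example, and only core schema.org properties that exist (`additionalType`, `isPartOf`, `includedInDataCatalog`). The namespace `http://example.org/wo#`, the term names under it, the choice to model the observatory as a `DataCatalog`, and the `wo_dataset` helper are all hypothetical placeholders, not the extension's actual terms.

```python
import json

# Placeholder namespace (hypothetical): stands in for the WO extension's
# actual term IRIs, which are not given on the slides.
WO_NS = "http://example.org/wo#"


def wo_dataset(name, project, observatory):
    """Build a schema.org Dataset description tagged with WO extension types."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        # additionalType lets a core schema.org type point at an extension class.
        "additionalType": WO_NS + "Dataset",
        "name": name,
        # The project the dataset belongs to.
        "isPartOf": {
            "@type": "CreativeWork",
            "additionalType": WO_NS + "Project",
            "name": project,
        },
        # The observatory, modelled here (an assumption) as the catalogue
        # that lists the dataset.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "additionalType": WO_NS + "WebObservatory",
            "name": observatory,
        },
    }


doc = wo_dataset("Health Data Challenge", "Health Data Challenge",
                 "Health/Life Science WO")
print('<script type="application/ld+json">')
print(json.dumps(doc, indent=2))
print("</script>")
```

Because the core `@type` stays `Dataset`, consumers that know only schema.org can still read the description, while WO-aware tools can key on `additionalType`.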
Schema.org vocabulary demo
Schema.org vocabulary demo: Social Spaces WO
•  WO Project: Cosmic
•  WO Project: First Responder
Schema.org vocabulary demo: Health/Life Science WO
•  WO Project: Mobile Health
•  WO Project: Health Data Challenge
•  WO Dataset: Health Data Challenge
Conclusions
•  Data integration on the Web, in general, is growing
•  Schema.org is a data-embedding model that is showing great success
•  Schema.org/Dataset became an official schema.org type in April 2013
•  Search engine tools are increasingly making use of embedded markup
•  The Web Observatory extension is aimed at use in the (Web) scientific community
•  It is also being used by the AGU (American Geophysical Union) and DCO (Deep Carbon Observatory) scientific communities
Future Work
•  Further extend the vocabulary to fit more Web Observatories
•  Subcommunities can extend the terminology
•  Build better tools to use and embed the schema.org vocabulary in Web Observatories
•  Integrate into the “telescope” toolbox
•  Build tools that make use of schema.org WO metadata (search engines, crawlers, etc.)
•  Google Domain Search is underway
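One of the planned tools is a crawler that makes use of embedded WO metadata. A minimal sketch of its extraction step, using only Python's standard `html.parser` and `json`: it collects `<script type="application/ld+json">` blocks from a page and keeps items of a requested `@type`. It assumes the metadata is embedded as JSON-LD (RDFa Lite markup would need a different parser), does no fetching, and its names (`JsonLdExtractor`, `find_by_type`, the sample page and `example-tool`) are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD items from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; buffer it.
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buf))
            except json.JSONDecodeError:
                return  # skip malformed blocks rather than failing the crawl
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def _types(item):
    # "@type" may be a single string or a list of strings.
    t = item.get("@type", [])
    return t if isinstance(t, list) else [t]


def find_by_type(html, wanted):
    """Return the JSON-LD items in `html` whose @type includes `wanted`."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return [it for it in extractor.items if isinstance(it, dict) and wanted in _types(it)]


PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset", "name": "Indian Election Twitter Dataset"}
</script>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "SoftwareApplication", "name": "example-tool"}
</script>
</head><body></body></html>"""

for item in find_by_type(PAGE, "Dataset"):
    print(item["name"])
```

A full crawler would add fetching, politeness rules, and de-duplication around this step; the sketch covers only turning embedded markup into filterable records.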
