biofin-project.eu 1
FAIR data principles and Metadata:
or why build ontologies
Christopher Brewster (University of Maastricht):
Christopher.Brewster@maastrichtuniversity.nl
Of metadata, of ontologies, and of FAIR Data Principles
∙ Strange terms, strange concepts for most
people
∙ Why are we interested in this in the BioFIN
project?
∙ What is this?
∙ How do we do it?
∙ Why do we do this?
∙ ... but first a little story ... almost a history
lesson
'The time has come,' the Walrus said,
'To talk of many things:
Of shoes — and ships — and sealing-wax —
Of cabbages — and kings —
And why the sea is boiling hot —
And whether pigs have wings.'
-- Lewis Carroll
Story 1 – The Web of Data
∙ 1989: Tim Berners-Lee (TBL) proposes the World Wide Web, first implemented in 1990-91. It was designed as a web of documents.
∙ TBL realised that a web of documents was insufficient and that what was needed was a "web of data".
∙ From this realisation arose a series of technologies generally called the "Semantic Web", intended to gradually turn the web of documents into a web of data.
∙ These include standards such as RDF, RDFS, OWL and SPARQL, and many more, developed under the aegis of the W3C.
∙ In the 2000s, TBL proposed the idea of "Linked Data", later rated on a five-star scale:
∙ One star: online in any format, under an open licence ("open data")
∙ Two stars: online in a machine-readable format, e.g. Excel
∙ Three stars: online in a non-proprietary format, e.g. CSV
∙ Four stars: online, in a non-proprietary format, using open standards to identify things (i.e. URIs, RDF etc.)
∙ Five stars: online, in a non-proprietary format, using open standards, and linked to other data sets
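The five-star scale can be read as a cumulative checklist: each star presupposes the ones before it. A minimal sketch in Python, assuming a hypothetical `Dataset` record whose field names are invented for illustration (they are not part of any W3C specification):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields)."""
    online_open_licence: bool   # 1 star: on the web under an open licence
    machine_readable: bool      # 2 stars: structured data, e.g. a spreadsheet
    non_proprietary: bool       # 3 stars: open format, e.g. CSV
    uses_uris: bool             # 4 stars: things identified with URIs / RDF
    links_to_other_data: bool   # 5 stars: links out to other data sets


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        ds.online_open_licence,
        ds.machine_readable,
        ds.non_proprietary,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


csv_on_portal = Dataset(True, True, True, False, False)
print(star_rating(csv_on_portal))  # 3
```

A CSV file on an open data portal stops at three stars because nothing in it is identified by a URI; moving to four or five stars is where RDF and links to other data sets come in.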
Story 2 – Linked Open Data Cloud
Story 3 – Open Science
∙ Two contrary movements:
∙ Panic about "open data": a problem especially in health, but part of a general move towards respecting privacy, ownership etc.
∙ Frustration that research is paid for but not open or accessible, felt by scientists, funding agencies, and parts of the general public and politicians
∙ Frustration also that research gets lost or becomes inaccessible, loses its context etc.
∙ Result (cutting a long story short)
∙ European Open Science Cloud - from the EC
∙ FAIR Data Principles - from the Life Science community
https://unsplash.com/photos/PdDBTrkGYLo
The FAIR Data Principles
∙ Important paper laid the foundations:
Wilkinson, M. D., et al. (2016). The FAIR
Guiding Principles for scientific data
management and stewardship. Scientific
Data, 3, 160018.
https://doi.org/10.1038/sdata.2016.18
∙ It has had a huge impact and has been generally adopted by the EC and many other funding agencies
∙ What does it mean?
FAIR Consequences
https://www.dtls.nl/fair-data/fair-data-knowledge-expertise/
https://www.dtls.nl/fair-data/data-stewardship/
FAIR Consequences – semantic technologies
∙ Metadata for an NbS to be findable
∙ Need to use a widely used ontology/taxonomy/vocabulary to label the NbS/data with appropriate keywords/concepts
∙ Need to have unique identifiers
∙ Metadata for an NbS to be accessible
∙ Need to use a commonly used protocol to access the NbS/data
∙ Need to have access controls: who is allowed to have access to that data?
∙ Metadata for an NbS to be interoperable
∙ Need an agreed ontology to describe the NbS/data, even more important if the data is to be machine readable
∙ The ontology must itself follow the FAIR principles
∙ Metadata for an NbS to be reusable
∙ Need provenance data: where did this NbS come from? Who made it?
∙ Need suitable machine-readable licences
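The four groups of requirements above can be turned into a simple completeness check on a metadata record. The sketch below is illustrative only: the field names (`identifier`, `keywords`, `access_protocol` and so on) are assumptions made for this example, not the BioFIN schema and not part of the FAIR principles themselves.

```python
# Hypothetical field names, grouped by the FAIR principle they support.
FAIR_FIELDS = {
    "findable": ["identifier", "keywords"],
    "accessible": ["access_protocol", "access_rights"],
    "interoperable": ["ontology"],
    "reusable": ["provenance", "licence"],
}


def missing_fair_fields(record: dict) -> dict:
    """Return, per principle, the fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [f for f in fields if not record.get(f)]
        if missing:
            gaps[principle] = missing
    return gaps


example = {
    "identifier": "https://example.org/nbs/42",  # made-up identifier
    "keywords": ["wetland restoration"],
    "access_protocol": "https",
    "ontology": "ENVO",
    "licence": "CC-BY-4.0",
}
print(missing_fair_fields(example))
# {'accessible': ['access_rights'], 'reusable': ['provenance']}
```

A check like this only tests that the fields are present. Whether a keyword really comes from a widely used vocabulary, or a licence is really machine readable, needs validation against the vocabulary or licence list itself.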
Example Metadata
Zenodo – some article - DC format
Wikidata – Maastricht – in RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:schema="http://schema.org/"
         xmlns:cc="http://creativecommons.org/ns#"
         xmlns:wikibase="http://wikiba.se/ontology#">
  <rdf:Description rdf:about="https://www.wikidata.org/wiki/Special:EntityData/Q1309">
    <rdf:type rdf:resource="http://schema.org/Dataset"/>
    <schema:about rdf:resource="http://www.wikidata.org/entity/Q1309"/>
    <cc:license rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
    <schema:softwareVersion>1.0.0</schema:softwareVersion>
    <schema:version rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1831142180</schema:version>
    <schema:dateModified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-02-10T18:20:04Z</schema:dateModified>
    <wikibase:statements rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">178</wikibase:statements>
    <wikibase:sitelinks rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">121</wikibase:sitelinks>
    <wikibase:identifiers rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">51</wikibase:identifiers>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.wikidata.org/entity/Q1309">
    <rdf:type rdf:resource="http://wikiba.se/ontology#Item"/>
  </rdf:Description>
  <rdf:Description rdf:about="https://af.wikipedia.org/wiki/Maastricht">
    <rdf:type rdf:resource="http://schema.org/Article"/>
    <schema:about rdf:resource="http://www.wikidata.org/entity/Q1309"/>
    <!-- excerpt truncated here -->
  </rdf:Description>
</rdf:RDF>
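Metadata in RDF/XML of this kind is just a list of subject-predicate-object statements. As a rough sketch, the Python standard library can pull those statements out of a small, well-formed sample. The `SAMPLE` string and the `triples` helper are written for this illustration; the helper handles only flat `rdf:Description` blocks, and real work would normally use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened, well-formed sample in the style of the Wikidata export.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:schema="http://schema.org/"
    xmlns:wikibase="http://wikiba.se/ontology#">
  <rdf:Description rdf:about="http://www.wikidata.org/entity/Q1309">
    <rdf:type rdf:resource="http://wikiba.se/ontology#Item"/>
  </rdf:Description>
  <rdf:Description rdf:about="https://www.wikidata.org/wiki/Special:EntityData/Q1309">
    <schema:about rdf:resource="http://www.wikidata.org/entity/Q1309"/>
    <wikibase:statements>178</wikibase:statements>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml: str):
    """Yield (subject, predicate, object) from flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores tags as "{namespace}local"; join into a full URI.
            predicate = prop.tag[1:].replace("}", "")
            # The object is either a linked resource or a literal value.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            yield subject, predicate, obj


for s, p, o in triples(SAMPLE):
    print(s, p, o)
```

Each printed line is one statement: for example, that the entity Q1309 is of type `wikibase:Item`, or that the data page is about Q1309.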
Ontologies
∙ What is an ontology? Just a machine-readable, formal way of describing a part of the world.
∙ There are lots of ontologies. They are central to Linked Data and to any form of "knowledge representation".
∙ They typically use the RDF/RDFS/OWL formalisms to be machine readable.
∙ There are lots and lots of agriculture, forestry and environment ontologies, e.g. AGROVOC, FOODON, AGRO, ENVO
∙ Too many: often a lack of agreement means every organisation goes and creates another one …
∙ However, they are necessary to achieve interoperability
Examples pictured: http://finto.fi/yso/en/page/p5454, AGROVOC, ENVO
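To make "a machine-readable, formal way of describing a part of the world" concrete, the sketch below stores a tiny class hierarchy as subject-predicate-object triples and infers superclasses by following `subClassOf`, in the spirit of RDFS. The class and instance names are invented for illustration and are not taken from AGROVOC, ENVO or any other real ontology.

```python
# A toy ontology as (subject, predicate, object) triples. All names are made up.
TRIPLES = {
    ("PeatBog", "subClassOf", "Wetland"),
    ("Wetland", "subClassOf", "Ecosystem"),
    ("Forest", "subClassOf", "Ecosystem"),
    ("site1", "type", "PeatBog"),
}


def superclasses(cls: str) -> set:
    """All classes reachable from cls via subClassOf (transitive closure)."""
    found = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(instance: str) -> set:
    """Asserted types plus every superclass they imply."""
    asserted = {o for s, p, o in TRIPLES if s == instance and p == "type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls)
    return inferred


print(sorted(types_of("site1")))  # ['Ecosystem', 'PeatBog', 'Wetland']
```

The data only says that `site1` is a `PeatBog`; the hierarchy lets a machine conclude that it is also a `Wetland` and an `Ecosystem`. This is why two parties who agree on an ontology can exchange data without agreeing on every individual label.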
Why do we do this?
• For the platform, an NbS is a bundle
of data.
• We need to be able to specify clearly
what data is necessary and what is
optional.
• We need to be able to share this
data, e.g. with financial institutions, in
a machine-readable format.
• We need to get other actors to
provide data in as standardized a
manner as possible.
• Creating a standard enables other
people to provide services that
interact with the platform.
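One way to picture "a bundle of data" with necessary and optional parts is a small record type that refuses incomplete input and serialises to a machine-readable format. The sketch below is an assumption-laden illustration: `NbSRecord` and all of its field names are hypothetical and do not describe the actual BioFIN platform schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class NbSRecord:
    """Hypothetical NbS data bundle; field names are illustrative only."""
    identifier: str                          # required
    title: str                               # required
    licence: str                             # required
    area_hectares: Optional[float] = None    # optional
    funder: Optional[str] = None             # optional

    def __post_init__(self):
        # Reject records whose required fields are empty.
        for name in ("identifier", "title", "licence"):
            if not getattr(self, name):
                raise ValueError(f"required field missing: {name}")

    def to_json(self) -> str:
        """Serialise to JSON, leaving out optional fields that are unset."""
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(data, sort_keys=True)


record = NbSRecord("https://example.org/nbs/42", "Riverbank restoration", "CC-BY-4.0")
print(record.to_json())
```

Once such a definition is published as a standard, a third party (a financial institution, say) can produce or consume the same JSON without any bilateral agreement on field names.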

BIOFIN-EU: Training "FAIR Data Principles and Metadata: or Why Build Ontologies"
