Introduction to Open Data Policies in H2020
Nancy Pontika, PhD
Open Access Aggregation Officer, CORE
The Open University, UK
25th October 2017
Research Lifecycle
Stages: Idea, Methodology, Data Collection, Analysis, Publish
• Methodology: experiments, interviews, observations, etc.
• Data Collection: numbers, code, text, images, sound records, etc.
• Analysis: statistics, processes, analysis, documentation, etc.
• Publish: journal article, dissertation, book, source code, etc.
Open Access routes
• Gold Route: Journals
– Pure Open Access Journals
– Hybrid Open Access Journals
• Green Route: Repositories
– Institutional repositories
– Disciplinary repositories
Open-access (OA) literature is digital, online, free of charge, and free of most
copyright and licensing restrictions. What makes it possible is the internet and
the consent of the author or copyright-holder.
(Source: https://legacy.earlham.edu/~peters/fos/brief.htm)
Research Lifecycle: focus on the publications
Stages: Idea, Methodology, Data Collection, Analysis, Publish
• Methodology: experiments, interviews, observations, etc.
• Data Collection: numbers, code, text, images, sound records, etc.
• Analysis: statistics, processes, analysis, documentation, etc.
• Publish: journal article, dissertation, book, source code, etc.
Open Access in Horizon2020
“The European Commission sees open access not as an end in itself but as a tool to facilitate and improve
the circulation of information in the European Research Area (ERA) and beyond.”
(Source: https://ec.europa.eu/programmes/horizon2020/sites/horizon2020/files/FactSheet_Open_Access.pdf)
It is mandated:
• Immediate deposit of peer reviewed scientific publications
– i.e. journal articles
• Deposit in a repository (machine readable form and even Gold OA)
– Respect embargo periods
• If possible use an open license
– e.g. Creative Commons Attribution, CC-BY
• When possible deposit research data
(Source: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf)
Research Lifecycle: focus on data
Stages: Idea, Methodology, Data Collection, Analysis, Publish
• Methodology: experiments, interviews, observations, etc.
• Data Collection: numbers, code, text, images, sound records, etc.
• Analysis: statistics, processes, analysis, documentation, etc.
• Publish: journal article, dissertation, book, source code, etc.
Supporting tools:
• Version control, storage & management
• Workflow management systems
• Interactive computing
• Wikis, blogs, social media
What constitutes research data?
‘Research data’ refers to information, in particular facts or numbers,
collected to be examined and considered as a basis for reasoning,
discussion or calculation.
In a research context, examples of data include statistics, results of
experiments, measurements, observations resulting from fieldwork,
survey results, interview recordings and images. The focus is on
research data that is available in digital form.
(Source: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf)
H2020 areas participating in pilot (2016-2017)
• Future and Emerging Technologies
• Research Infrastructures
• Leadership in enabling and industrial technologies – Information and Communication Technologies
• Nanotechnologies, Advanced Materials, Advanced Manufacturing and Processing, and Biotechnology
• Societal Challenge: Food security, sustainable agriculture and forestry, marine and maritime and inland water research and the bioeconomy
• Societal Challenge: ‘Climate action, Environment, Resource Efficiency and Raw Materials’ – except raw materials
• Societal Challenge: ‘Europe in a changing world – inclusive, innovative and reflective Societies’
• Science with and for Society
• Cross-cutting activities – focus areas – part Smart and Sustainable Cities.
* Projects in other areas can participate on a voluntary basis
(Source: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf)
The scope of participation is growing...
• In the 2014-15 work programme, 7 areas participated in the pilot.
• In the 2016 work programme, new topics joined in 3 areas (research
infrastructures, nanotechnologies and food security)
• All calls covered by the 2017 work programme will be part of the pilot. A move
from a pilot to a mandate.
H2020 Open Research Data Pilot (ORD)
(Source: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf and https://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf)
Open access to research data refers to the right to access and reuse digital research data under the terms and conditions set out in the Grant Agreement.
Why have Open Access to both Publications and Data?
• build on previous research results (improved quality of results)
• encourage collaboration and avoid duplication of effort (greater efficiency)
• speed up innovation (faster progress to market means faster growth)
• involve citizens and society (improved transparency of the scientific process).
FAIR Data (Findable, Accessible, Interoperable, Reusable)
Which data does the ORD pilot apply to?
• The data, including associated metadata, needed to validate the results presented in scientific publications
• Other curated and/or raw data, including associated metadata, as specified in the data management plan
Doesn’t apply to all data (researchers to define as appropriate)
Don’t have to share data if inappropriate – exemptions apply
Key requirements of the ORD pilot
Beneficiaries participating in the ORD pilot will:
• Deposit data in a research data repository
• Take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data
• Provide information via the chosen repository about tools and instruments necessary for validating the results (where possible, provide the tools and instruments themselves)
Exemptions – reasons for opting out
• If results are expected to be commercially or industrially exploited
• If participation is incompatible with the need for confidentiality in connection with security issues
• If participation is incompatible with existing rules on the protection of personal data
• If participation would jeopardise the achievement of the main aim of the action
• If the project will not generate / collect any research data
• If there are other legitimate reasons to not take part in the Pilot
Projects can opt out at any stage, totally or partially (i.e. for some data only), and should describe the issues in the project DMP.
Data Management Plans
Projects participating in the pilot will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open.
Note that the Commission does NOT require applicants to submit a DMP at the proposal stage. A DMP is therefore NOT part of the evaluation.
DMPs are a deliverable.
Information on RDM: what and when
PROPOSAL STAGE
Where relevant*, H2020 proposals can include a section on data management, which is evaluated under the criterion ‘Impact’:
• What types of data will the project generate/collect?
• What standards will be used?
• How will this data be shared/made available? If not, why?
• How will this data be curated and preserved?
* For “Research and Innovation Actions” and “Innovation Actions”
IN PROJECT
DMPs are a project deliverable for those participating in the open data pilot.
Not a fixed document – should evolve and gain precision
– Deliver the first version within the initial 6 months of the project
– Deliver more elaborate versions whenever important changes to the project occur, and at least at the mid-term and final review
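The DMP timing can be sketched as a small date calculation. This is only an illustration, not an official EC tool: the function names `add_months` and `dmp_milestones` are hypothetical, the start date and duration are made up, and placing the mid-term review at half the project duration is an assumption (actual review dates are set per project).

```python
from datetime import date
import calendar


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`, clamping the day."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def dmp_milestones(project_start: date, duration_months: int) -> dict:
    """Latest dates for DMP versions, under the assumptions stated above."""
    return {
        # First version is due within the initial 6 months of the project.
        "first_version": add_months(project_start, 6),
        # Assumption: the mid-term review falls at half the project duration.
        "mid_term_update": add_months(project_start, duration_months // 2),
        # Final review is taken to coincide with the end of the project.
        "final_update": add_months(project_start, duration_months),
    }


if __name__ == "__main__":
    # Hypothetical 36-month project starting on 1 November 2017.
    for name, due in dmp_milestones(date(2017, 11, 1), 36).items():
        print(f"{name}: {due.isoformat()}")
```

Updates triggered by "important changes to the project" cannot be computed in advance, so they are left out of the sketch.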
How can researchers make data open?
1. Choose the dataset(s) to share
– What can be made open? This step may need to be revisited if problems
are encountered later.
2. Apply an open license
– Determine what IP exists. Apply a suitable licence e.g. CC-BY
3. Make the data available
– Provide the data in a suitable format. Use repositories.
4. Make it discoverable
– Post on the web, get a unique ID, register in catalogues…
https://okfn.org
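The four steps above can be tracked per dataset with a simple checklist. The sketch below is one possible way to do that; the class `DatasetRelease` and all of its field names are hypothetical and do not come from the EC guidelines or from any repository's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRelease:
    """Tracks the four steps for one dataset (all field names are illustrative)."""
    name: str
    cleared_for_sharing: bool = False      # step 1: dataset chosen for sharing
    licence: Optional[str] = None          # step 2: e.g. "CC-BY"
    repository_url: Optional[str] = None   # step 3: where the data was deposited
    identifier: Optional[str] = None       # step 4: unique ID, such as a DOI

    def missing_steps(self) -> List[str]:
        """Return the steps that have not been completed yet, in order."""
        checks = [
            ("1. Choose the dataset(s) to share", self.cleared_for_sharing),
            ("2. Apply an open license", self.licence is not None),
            ("3. Make the data available", self.repository_url is not None),
            ("4. Make it discoverable", self.identifier is not None),
        ]
        return [step for step, done in checks if not done]


# Hypothetical dataset that has been chosen and licensed but not yet deposited.
release = DatasetRelease("survey-responses", cleared_for_sharing=True, licence="CC-BY")
print(release.missing_steps())
```

Because step 1 may need to be revisited if problems are encountered later, `cleared_for_sharing` is kept as a flag that can be switched back rather than a one-way state.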
Licensing research data openly
This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement a data licence.
Source: http://www.dcc.ac.uk/resources/how-guides/license-research-data
Which licenses are appropriate?
Creative Commons clauses that limit sharing:
• NC (NonCommercial): what counts as commercial?
• ND (NoDerivatives): severely restricts use
Licences that include these clauses are not open licences.
Horizon 2020 Open Access guidelines point to:
or
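The rule about NC and ND clauses can be expressed as a short check. This is a sketch under stated assumptions: the function name `is_open_licence` is hypothetical, the input is assumed to be a Creative Commons licence code such as "CC-BY-NC", and only the two clauses named on this slide are tested. It is not a full test of openness.

```python
import re

# The two Creative Commons clauses the slide identifies as limiting sharing.
RESTRICTIVE_CLAUSES = {"NC", "ND"}


def is_open_licence(code: str) -> bool:
    """Return True if a Creative Commons licence code has no NC or ND clause.

    Assumes `code` is a CC licence code; other licence strings are not validated.
    """
    parts = re.split(r"[\s_-]+", code.strip().upper())
    return not RESTRICTIVE_CLAUSES.intersection(parts)


for licence in ("CC-BY", "CC-BY-SA", "CC-BY-NC", "CC BY-ND 4.0"):
    print(licence, "open" if is_open_licence(licence) else "not open")
```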
Deposit in research data repositories
The EC guidelines point to Re3data as one of the registries that can be
searched to find a home for data
Source: http://service.re3data.org/search
Zenodo
Zenodo is a multidisciplinary repository that can be used for the long tail of research data.
• Accepts multiple data types, publications and software
• Assigns a Digital Object Identifier (DOI)
• Links funding, publications, data & software
(Source: www.zenodo.org)
Metadata and documentation
Metadata is basic descriptive information to help others identify and understand the structure of the data
e.g. title, author...
Documentation provides the wider context e.g. the methodology / workflow, software, tools and any
information needed to understand and reuse the data
Relevant standards should be used for interoperability – check out the Metadata Standards Directory from
the Research Data Alliance
(Source: http://rd-alliance.github.io/metadata-directory)
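A basic descriptive record of the kind described above might look like the following sketch. The function `build_metadata` is hypothetical, and the only required fields are the two the slide gives as examples (title and author). A real deposit should follow a disciplinary standard from the Metadata Standards Directory rather than this ad hoc structure, and every value shown is a placeholder.

```python
import json

# The slide's own examples of basic descriptive fields; extend per your standard.
REQUIRED_FIELDS = ("title", "author")


def build_metadata(title: str, author: str, **extra: str) -> dict:
    """Build a minimal metadata record and reject empty required fields."""
    record = {"title": title, "author": author, **extra}
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return record


# Placeholder values for illustration only.
record = build_metadata(
    "Example survey responses",
    "A. Researcher",
    licence="CC-BY",
    documentation="README describing methodology, workflow and software used",
)
print(json.dumps(record, indent=2))
```

Keeping the documentation pointer inside the record is a design choice made here so that the wider context travels with the data when it is deposited.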
FOSTER project
Facilitate Open Science Training for European Research
• Network of open access trainers
• Programme of open science courses
• Portal to training materials
• E-learning courses on open access and open data
(Source: www.fosteropenscience.eu)
FOSTER Course
Source: https://www.fosteropenscience.eu/content/horizon-2020-open-research-data-pilot-0
Slides attribution:
Slides 12-24 are based on “The Horizon2020 Open Data Pilot” by Sara Jones, Digital Curation Centre: https://www.slideshare.net/sjDCC/h2020-open-data-pilot
Thank you!

General introduction to Open Data Policies H2020, influence of OD policies on Open Science workflows


Editor's Notes

  • #17 Although DMPs are a project deliverable and not required at the application stage, proposals can include a section on data management if desired. The info suggested here is similar to the preliminary DMP, so essentially gets that started.
  • #18 These steps align with what the EC asks for:
      – Choose which data to share: researchers are asked to define this in the DMP
      – Apply an open licence: take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data
      – Make the data available: deposit data in a research data repository
      – Make it discoverable: provide associated metadata, and provide information on the tools and instruments necessary for validating the results
  • #21 Make the data available by depositing in repositories
  • #23 Remember that the data need to be discoverable and understandable, so the associated metadata needs to be deposited too