Indexing & Discovering a Record of Versions
The NIH Preprint Pilot
Kathryn Funk, MLIS, US National Library of Medicine
NISO Webinar - 21 April 2021
November 2019
How to maximize the impact of NIH-supported interim research products?
PMC: full-text archiving, preservation
PubMed: citations, abstracts, metadata
Discovery, connections, & reuse
NLM's Article Databases: Indexing for Discovery
Indexing & Discovery
What is needed to index for discovery? (2019 presentation)
What is the scope?
How to curate? present? maintain? connect?
Scope Considerations
Build on PMC's role as the repository for NIH-funded peer-reviewed articles (2019)
Respond to stakeholder need for accelerated discovery of COVID-19 literature (2020)
Take a measured approach to a new content type to identify and resolve workflow and data challenges
COVID-19 Portfolio indexed preprints: arXiv, bioRxiv, ChemRxiv, medRxiv, SSRN, Research Square
NIH Preprint Pilot, Phase 1
Launched June 9, 2020
Pilot to run for a minimum of 12 months
Preprints with identifiable NIH support
PMC and PubMed
Metadata for curation: author affiliations, funding metadata
Phase 1 Workflow
1. Start with COVID-19 Portfolio content
2. Text mine & affiliation search
3. Review and curate candidates
4. Load to PMC and PubMed
5. Check for updates
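The text-mining and affiliation-search step (step 2) can be pictured as a simple filter over the starting content. This is a minimal sketch under stated assumptions, not NLM's actual pipeline: the `Candidate` type, the simplified grant-number pattern (activity code, two-letter institute code, six-digit serial, e.g. "R01 GM123456"), and the NIH name patterns are all hypothetical. Anything it selects would still go to human review and curation (step 3).

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified NIH grant-number core, e.g. "R01 GM123456":
# activity code (letter + 2 letters/digits), optional space,
# 2-letter institute code, 6-digit serial. Real strings vary more.
NIH_GRANT = re.compile(r"\b[A-Z][0-9A-Z]{2}\s?[A-Z]{2}\d{6}\b")

# Hypothetical name patterns; word boundaries keep "Nihon" from matching "NIH".
NIH_NAME = re.compile(r"\bNIH\b|National Institutes of Health", re.IGNORECASE)


@dataclass
class Candidate:
    """A preprint from the starting content set (hypothetical shape)."""
    title: str
    funding_text: str = ""
    affiliations: list[str] = field(default_factory=list)


def is_candidate(c: Candidate) -> bool:
    """True if the funding text or any affiliation suggests NIH support."""
    return (
        bool(NIH_GRANT.search(c.funding_text))
        or bool(NIH_NAME.search(c.funding_text))
        or any(NIH_NAME.search(a) for a in c.affiliations)
    )


def select_candidates(pool: list[Candidate]) -> list[Candidate]:
    """Keep the preprints that move on to human review and curation."""
    return [c for c in pool if is_candidate(c)]


# Placeholder examples, not real pilot data.
pool = [
    Candidate("Preprint A", funding_text="Supported by grant R01 GM123456."),
    Candidate("Preprint B", affiliations=["Nihon University, Tokyo"]),
    Candidate("Preprint C", affiliations=["National Institutes of Health, Bethesda, MD"]),
]
print([c.title for c in select_candidates(pool)])  # ['Preprint A', 'Preprint C']
```

Text matching like this trades precision for recall, which is why a curation step follows it rather than loading matches directly.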
Preprint records in PMC and PubMed through April 18, 2021: 2,183
Figure: Preprint server distribution
Preprints in PMC
Preprint banner and filters
Archive and index full-text XML as license terms allow
All previously indexed versions available (under a single PMCID)
Preprints in PubMed
Preprint banner and filters
Make the most current record discoverable
LinkOut to preprint server
Metadata for indexing
Preprint indicator
Preprint server name
Preprint server owner
Preprint posting date
Article title*
Article type
Authors*
Persistent identifier (preferably, DOI)*
Abstract*
*Metadata for connecting a record of versions
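One way to picture this metadata is as a record type whose starred fields must be filled before versions can be connected. The sketch below is hypothetical: the field names, placeholder values, and the `missing_connecting_fields` helper are illustrative only and do not reflect an NLM or JATS schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PreprintRecord:
    # Hypothetical field names mirroring the list above (not an NLM schema).
    is_preprint: bool                 # preprint indicator
    server_name: str                  # preprint server name
    server_owner: str                 # preprint server owner
    posted: date                      # preprint posting date
    article_type: str
    # Starred fields: needed to connect a record of versions.
    title: Optional[str] = None
    authors: tuple[str, ...] = ()
    doi: Optional[str] = None         # persistent identifier, preferably a DOI
    abstract: Optional[str] = None


CONNECTING_FIELDS = ("title", "authors", "doi", "abstract")


def missing_connecting_fields(rec: PreprintRecord) -> list[str]:
    """Return the starred fields that are empty, in the listed order."""
    return [name for name in CONNECTING_FIELDS if not getattr(rec, name)]


rec = PreprintRecord(
    is_preprint=True,
    server_name="Example Server",     # placeholder values, not real data
    server_owner="Example Org",
    posted=date(2020, 6, 9),
    article_type="preprint",
    title="An example preprint",
    authors=("Doe J",),
)
print(missing_connecting_fields(rec))  # ['doi', 'abstract']
```

A check like this separates metadata that is merely descriptive from metadata that later matching depends on.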
Indexing a record of versions
Preprint v1 → Preprint v2 → Published article
Around 40% of the preprint records in PMC had been matched to a published journal article by the end of Q3.
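Connecting versions requires deciding when two records describe the same work. The sketch below assumes a deliberately simple, hypothetical heuristic (identical normalized titles plus the same first author, deduplicated by DOI); it is not the pilot's matching method. Titles often change between a preprint and the published article, so a real matcher would need fuzzier comparison or explicit links supplied by servers and publishers.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    label: str                 # e.g. "Preprint v1", "Published article"
    doi: str                   # persistent identifier
    title: str
    authors: tuple[str, ...]   # surnames in byline order (assumed)
    posted: str                # ISO 8601 date string, so it sorts chronologically


def normalize(title: str) -> str:
    """Casefold, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title).casefold()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def same_work(a: Version, b: Version) -> bool:
    """Hypothetical heuristic: same normalized title and same first author."""
    return (
        normalize(a.title) == normalize(b.title)
        and bool(a.authors) and bool(b.authors)
        and a.authors[0].casefold() == b.authors[0].casefold()
    )


def record_of_versions(anchor: Version, pool: list[Version]) -> list[Version]:
    """Collect versions of the anchor's work, one per DOI, oldest first."""
    by_doi: dict[str, Version] = {}
    for v in pool:
        if same_work(anchor, v):
            by_doi.setdefault(v.doi, v)
    return sorted(by_doi.values(), key=lambda v: v.posted)


# Placeholder records with made-up DOIs, not real pilot data.
v1 = Version("Preprint v1", "10.0000/example-v1", "Viral Shedding in Mild COVID-19", ("Doe", "Roe"), "2020-05-01")
v2 = Version("Preprint v2", "10.0000/example-v2", "Viral shedding in mild COVID-19", ("Doe", "Roe"), "2020-06-15")
pub = Version("Published article", "10.0000/journal-x", "Viral shedding in mild COVID-19.", ("Doe", "Roe", "Poe"), "2020-09-30")
other = Version("Preprint v1", "10.0000/other-v1", "An unrelated study", ("Smith",), "2020-07-01")

chain = record_of_versions(pub, [other, pub, v2, v1])
print([v.label for v in chain])  # ['Preprint v1', 'Preprint v2', 'Published article']
```

Ordering by posting date reproduces the v1 → v2 → published sequence; deduplicating by DOI keeps one entry per identifier.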
Keeping current
Discovering a record of versions
PMC: 1.5M preprint page views
PubMed: 1.1M preprint page views
Authors selecting NIH-recommended licenses increased each quarter.
Indicators of open science
                                      Full-text preprints in PMC    Comparable journal articles in PMC
Have supplemental materials           73%                           39%
Have a data availability statement    19%                           20%
Citing preprints
Preprints in the pilot have been cited approximately 10,000 times.
Two-thirds of those citations are to preprints that have gone on to be published in a journal.
What's next?
01 Continue to monitor workflows for scalability and pain points
02 Monitor Phase 1 impact on discovery, dissemination, & trust
03 Ongoing stakeholder engagement
A record of versions in a universe of research objects.
Thank you!
kathryn.funk@nih.gov
This work was supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.

Funk "Indexing & Discovering a Record of Versions"