Most scientific journals request that the complete set of research data be published together with the peer-reviewed paper. The research data is usually published either as "Supplementary Material" attached to the original paper or in a "Research Data Repository". In both cases the data is usually unstructured and not in a uniform, machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. The presentation describes a concept in which the data is digitally recorded, following the FAIR data principles, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, and diploma and doctoral theses, and is particularly suitable for open access publications. The presentation also highlights related initiatives described in recent scientific publications.
AI-SDV 2022: Scientific publishing in the age of data mining and artificial intelligence Dieter Küry (Kuery Knowledge Management, Switzerland)
1. Scientific publishing in the age of data mining and artificial intelligence
October 10, 2022, AI-SDV Conference, Vienna
Dr. Dieter Küry
Küry Knowledge Management GmbH, Riehen, Switzerland
2. Abstract
Most scientific journals request that the complete set of research data be published
together with the peer-reviewed paper. The research data is usually published either as
"Supplementary Material" attached to the original paper or in a "Research Data
Repository". In both cases the data is usually unstructured and not in a uniform,
machine-processable format. This makes its further use in electronic tools for AI or in
text and data mining unnecessarily difficult or even impossible.
A concept for a publication process is presented in which the data is digitally recorded,
following the FAIR data principles, as part of that process. This digital capture makes
the data available to the scientific community for easy use in data mining and AI tools.
The data in the repository contains links to the publication to document its origin. The
concept is applicable to preprints, peer-reviewed papers, and diploma and doctoral theses,
and is particularly suitable for open access publications. The presentation also
highlights related initiatives described in recent scientific publications.
11.10.2022 Kuery Knowledge Management GmbH 2
3. Recently published
• Kearnes S. M., Maser M. R., Wleklinski M., Kast A., Doyle A. G., Dreher S. D., Hawkins J. M., Jensen K. F., and Coley C. W. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021).
• Baldi P. Call for a Public Open Database of All Chemical Reactions. J. Chem. Inf. Model. (2021). doi:10.1021/acs.jcim.1c01140.
• Jablonka K. M., Patiny L., and Smit B. Making the collective knowledge of chemistry open and machine actionable. Nat. Chem. 14, 365–376 (2022).
4. Excerpt
From: Kearnes S. M., Maser M. R., Wleklinski M., Kast A., Doyle A. G., Dreher S. D., Hawkins J. M., Jensen K. F., and Coley C. W. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021). Publication date: November 2, 2021.
“To be clear: we believe that PDFs without accompanying structured data should no longer clear the bar for publication.”
5. Where can research data be found?
Current solutions (I)
Universal solutions
• Supporting/supplementary material for many journal titles
• Unstructured repositories such as Dryad or FigShare (PLOS)
• Sequences in WIPO applications, which must be listed in a standardized form
Country specific solutions
• RADAR by FIZ Karlsruhe for Germany
6. Where can research data be found?
Current solutions (II)
Subject area repositories
• Crystallographic data in the Cambridge Structural Database (CSD) of the Cambridge Crystallographic Data Centre (CCDC) (founded 1965)
• GenBank for sequences
• Clinicaltrials.gov for clinical trials data
• PubChem for chemical structures
7. Where can research data be found?
Supporting services
Directories
• Registry of Research Data Repositories https://www.re3data.org/
Aggregator
• CORE https://core.ac.uk/
  • Aggregator of open access research papers from repositories and journals
Guidelines
• FAIRsharing.org https://fairsharing.org/
  • Catalogue of standards, databases and policies to help ensure that 'experiments are reported with enough information to be comprehensible and (in principle) reproducible, compared or integrated'
8. Research data issues
DOI
• Enable easy citation
• Provide a permanent link to the data
Structure and formatting
• Data published in an unstructured way, e.g., PDF documents
• No minimum standardization, e.g., FAIR data principles
Inconsistent guidelines by publishers
• Authors can choose how and where to deposit
• Poor linking between data and publication
Storage of data
• In repositories, hosted by various organizations
• Government funded institutions
• Non-profit societies and organizations
• Commercial publishers
9. Purpose of scientific publishing
Demand, so far
• Sharing research results with the scientific community
Goal
• Contribution to the advancement of science
10. Purpose of scientific publishing
Additional demands, today
• Provide data for analysis, text and data mining
• Provide data for AI tools
Goals
• New insights from scientific findings
• Use of available and published data for one's own scientific research
11. Area of conflict
Structured data: easily reusable
Unstructured data: laborious to adapt
12. Area of conflict
Structured data: easily reusable
Unstructured data: laborious to adapt
Reinforce with a modified publication process
13. Fragmentation of scientific publishing
• Publication with peer review
• Publication with peer review; supplementary material
• Preprint; publication with peer review; supplementary material / research data
• Preprint; publication with peer review; supplementary material / research data; notification and discussion in social media
14. Elements of a scientific publication
Publication
• Preprint
• Comments/discussion in social media
• Research data
• Peer review
• Final paper
• Notifications in social media by author
15. Toward a new process of publishing
• Integrates all elements of a publication
• Addresses challenges and trends
• Increases value of research data
[Diagram: forward-looking process of publishing, combining data, preprint, feedback, peer review, and final paper]
16. The forward-looking process of publishing
[Diagram: preprint, research data, comments, community feedback, peer review, revision, and final paper, with the roles author, scientist, and data scientist]
All elements hosted by one organization/publisher
17. Goals/advantages
Publication process
All elements integrated in one process, managed by one organization/publisher
Full interlinking between all elements
Early publication of results as preprint
Community comments and feedback complement peer review
Transparent peer review with reviewers' comments linked to the publication
Final paper, authoritative for the advancement of science
A missing final paper becomes apparent; a retraction also covers preprints and data in the repository
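The interlinking and retraction goals listed on this slide could be modelled roughly as follows. This is only a minimal sketch: the class names `Element` and `PublicationBundle`, the element kinds, and the identifiers are hypothetical illustrations and do not come from the presentation or from any existing publisher system.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One element of a publication (all names are illustrative assumptions)."""
    kind: str                 # e.g. "preprint", "research_data", "peer_review", "final_paper"
    identifier: str           # placeholder for a persistent identifier such as a DOI
    retracted: bool = False
    links: set = field(default_factory=set)  # identifiers of all other elements


class PublicationBundle:
    """All elements of one publication, managed in one place."""

    def __init__(self) -> None:
        self.elements: dict[str, Element] = {}

    def add(self, kind: str, identifier: str) -> Element:
        # Link the new element with every existing element, in both directions.
        new = Element(kind, identifier)
        for other in self.elements.values():
            other.links.add(identifier)
            new.links.add(other.identifier)
        self.elements[identifier] = new
        return new

    def has_final_paper(self) -> bool:
        # A missing final paper becomes apparent with a simple check.
        return any(e.kind == "final_paper" for e in self.elements.values())

    def retract(self) -> None:
        # A retraction covers every element, including preprint and data.
        for element in self.elements.values():
            element.retracted = True


bundle = PublicationBundle()
bundle.add("preprint", "pre-1")
bundle.add("research_data", "data-1")
print(bundle.has_final_paper())
```

Because every element is registered in the same bundle, the links stay complete as elements are added, and a retraction cannot leave a preprint or dataset behind unmarked.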
18. Goals/advantages
Research data
Standardized data repositories
Source of data identifiable
Following FAIR data principles
Managed and supported by publisher
Data suitable for processing by AI and other software tools
Open data or DaaS subscription
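As an illustration of what a structured, machine-processable data record with an identifiable source might look like, here is a minimal sketch. The field names, the `REQUIRED` list, and the example identifiers and URL are assumptions made for this example; they are not a published FAIR schema and are not taken from the presentation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetRecord:
    """Metadata for one dataset (field names are illustrative assumptions)."""
    identifier: str        # persistent identifier of the dataset, e.g. a DOI
    title: str
    publication_doi: str   # link back to the publication, documenting the origin
    media_type: str        # machine-processable format, e.g. "text/csv"
    license: str           # reuse conditions (open data or subscription)
    access_url: str        # where the data can be retrieved


# Hypothetical minimum set of fields a repository might insist on.
REQUIRED = ("identifier", "publication_doi", "media_type", "license", "access_url")


def missing_fields(record: DatasetRecord) -> list[str]:
    """Return the names of required fields that are empty."""
    data = asdict(record)
    return [name for name in REQUIRED if not data[name].strip()]


def to_json(record: DatasetRecord) -> str:
    """Serialize the record so that other tools can read it."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


example = DatasetRecord(
    identifier="10.1234/example-dataset",      # placeholder, not a real DOI
    title="Reaction yields (example)",
    publication_doi="10.1234/example-paper",   # placeholder, not a real DOI
    media_type="text/csv",
    license="CC-BY-4.0",
    access_url="https://repository.example.org/datasets/1",
)

print(missing_fields(example))
print(to_json(example))
```

A check like `missing_fields` could run at submission time, so that a dataset without a link to its publication or without a declared format is flagged before it enters the repository.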
19. Thank you
Dr. Dieter Küry
Kuery Knowledge Management GmbH, Riehen, Switzerland
dieter.kuery@kuery-km.ch