Improving Editorial Workflow and Metadata
Quality at Springer Nature
Angelo Salatino (1), Francesco Osborne (1), Aliaksandr Birukou (2), Enrico Motta (1)
(1) Knowledge Media Institute, The Open University, United Kingdom
(2) Springer Nature, Heidelberg, Germany
ISWC 2019
Open University and Springer Nature Collaboration
The Open University and Springer Nature have been collaborating since 2014 on
the development of an array of semantically enhanced solutions for:
• Semi-automatic classification of proceedings
and other editorial products.
• Automatic selection of the most appropriate
books, journals, and proceedings to market at a
scientific event.
• Analysis of SN codes, with the aim of evolving
marked codes and detecting fields that deserve
further attention.
• Joint release of the Computer Science Ontology.
Osborne et al. (2017) Supporting Springer Nature Editors by means of Semantic Technologies. ISWC 2017. Vienna, Austria.
Generation of Metadata
It is a crucial task to enable scholars, students, companies and other stakeholders to
discover and access this knowledge.
Traditionally, editors choose a list of related
keywords and categories in relevant taxonomies
according to:
• their own experience of similar conferences;
• a visual exploration of titles and abstracts;
• a list of terms given by the curators or derived
from calls for papers.
Classification of Publications – A Complex Problem
Classifying publications manually presents a number of issues for
a large publisher such as Springer Nature.
• It is a complex process that requires expert editors.
• It is a time-consuming process that can hardly scale.
• It is easy to miss the emergence of new topics.
• It is easy to assume that some traditional topics are still
popular when this is no longer the case.
• The keywords used in the call for papers are often a reflection
of what a venue aspires to be, rather than the real contents of
the proceedings.
Smart Topic Miner 1.0 - 2016
Presented at ISWC 2016
Osborne, F., Salatino, A., Birukou, A. and Motta,
E.: Automatic Classification of Springer Nature
Proceedings with Smart Topic Miner. ISWC 2016
A success story
• Since 2016, STM has been regularly used by editors in Germany,
China, Brazil, India, and Japan.
• It is used to classify more than 800 conference proceedings
volumes per year, including the Lecture Notes in Computer Science
(LNCS) as well as LNBIP, CCIS, IFIP-AICT, and LNICST.
• It completely changed SN's internal workflow: the task is now
semi-automatic and monitored by junior editors.
• It is constantly evolving and gaining new functionalities,
following the feedback from the editorial team.
Smart Topic Miner 1.0 - 2016
Smart Topic Miner 1.2 - 2017
Smart Topic Miner 2.0 - 2019
Business Value
• STM halves the time needed for classifying proceedings from
30 to 15 minutes.
• It also allows junior editors to work on the classification of
proceedings, distributing the load and reducing costs.
• The adoption of a controlled vocabulary makes the process
more robust and facilitates the identification of related
editorial products.
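As a rough back-of-the-envelope check, the sketch below combines the 15 minutes saved per volume with the "more than 800" volumes per year quoted earlier in the deck. The constant and function names are illustrative, and assuming that every volume saves the full 15 minutes is a simplification, not a figure from the slides.

```python
# Figures quoted in the slides; treating them as exact is an assumption.
MINUTES_BEFORE = 30      # manual classification time per volume
MINUTES_AFTER = 15       # classification time per volume with STM
VOLUMES_PER_YEAR = 800   # "more than 800", so this gives a lower bound


def yearly_hours_saved(before: int, after: int, volumes: int) -> float:
    """Return the editor hours saved per year."""
    return (before - after) * volumes / 60


hours_saved = yearly_hours_saved(MINUTES_BEFORE, MINUTES_AFTER, VOLUMES_PER_YEAR)
print(f"At least {hours_saved:.0f} editor hours saved per year")
```

Under these assumptions the saving is on the order of a few hundred editor hours per year, before counting the cost difference between senior and junior editors.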
Retrievability
About 9M additional downloads thanks to STM.
[Figure: Average number of yearly downloads for books in SpringerLink, 2009 to 2018 (y-axis 0 to 20,000). Series: downloads (CS Proceedings); expected downloads (CS Proceedings); downloads (CS Proceedings) with STM; downloads (other books in CS); downloads (overall).]
Smart Topic Miner 2.0 - 2019
http://stm-demo.kmi.open.ac.uk
Demo 462
Smart Topic Miner 2.0 - 2019
• New GUI.
• New Knowledge Base (CSO).
• New Topic Detection Engine
(CSO Classifier).
• Ability to compare with
previous editions.
• Integrated with SN system
and CSO Portal.
http://stm-demo.kmi.open.ac.uk
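The comparison with previous editions can be pictured as a set difference between the topics assigned to two editions of the same conference. The sketch below is only an illustration of that idea: the function name and the topic lists are made up, and the slides do not describe how STM implements the comparison.

```python
def compare_editions(current: set[str], previous: set[str]) -> dict[str, list[str]]:
    """Split topics into those that are new, dropped, or kept between editions."""
    return {
        "new": sorted(current - previous),
        "dropped": sorted(previous - current),
        "kept": sorted(current & previous),
    }


# Hypothetical topic sets for two consecutive editions of one conference.
topics_2018 = {"semantic web", "ontology", "linked data"}
topics_2019 = {"semantic web", "ontology", "knowledge graph"}

diff = compare_editions(topics_2019, topics_2018)
for status, topics in diff.items():
    print(status, topics)
```

A view like this lets an editor see at a glance which topics emerged and which faded, addressing the earlier point about missing new topics.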
STM 2.0 - architecture
[Figure: architecture diagram. Components shown: SN Editors; HTML GUI; Parser; Generate Visualizations; STM Engine; CSO; SNCs; Historical Data; word2vec model. Processing steps shown: i) CSO Classifier, ii) Topic Explanation, iii) Taxonomy Generation, iv) SN Tags Inference, v) Previous Classification.]
A new knowledge base - The Computer Science
Ontology
The Computer Science Ontology (CSO) is a large-scale, automatically generated
ontology of research areas. It is the largest ontology in the field of Computer Science,
including about 14K topics and 162K semantic relationships.
Salatino et al. (2019) The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas. Data Intelligence.
http://cso.kmi.open.ac.uk/
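To make "topics and semantic relationships" concrete, the sketch below stores a tiny hand-made hierarchy as a mapping from each topic to its direct broader topics and walks it to infer all broader areas. The topic names and the single "broader" relation are assumptions for illustration; they are not taken from CSO, which is far larger and richer.

```python
# Hypothetical mini-taxonomy: each topic maps to its direct broader topics.
BROADER = {
    "linked data": ["semantic web"],
    "ontology matching": ["ontology"],
    "ontology": ["semantic web"],
    "semantic web": ["computer science"],
    "computer science": [],
}


def ancestors(topic: str, broader: dict[str, list[str]]) -> set[str]:
    """Collect every broader topic reachable from `topic`."""
    found: set[str] = set()
    stack = list(broader.get(topic, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(broader.get(parent, []))
    return found


print(sorted(ancestors("ontology matching", BROADER)))
```

Inferring broader areas in this way is what allows a paper tagged with a narrow topic to also be found under its parent research area.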
A new topic detection engine - The CSO Classifier
The CSO Classifier is an unsupervised approach for automatically classifying documents
according to CSO.
Salatino et al. (2019) The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles.
Demo: https://cso.kmi.open.ac.uk/classify/
Download: https://github.com/angelosalatino/cso-classifier
pip install cso-classifier
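The idea of classifying a document against an ontology without training data can be illustrated with a much simpler stand-in: matching word n-grams in the text against a list of topic labels. The sketch below is not the CSO Classifier's algorithm and does not use the cso-classifier package; the label set, function name, and sample abstract are all hypothetical.

```python
import re

# Hypothetical topic labels; a real ontology would hold thousands of them.
TOPIC_LABELS = {"semantic web", "ontology", "topic detection", "digital libraries"}


def match_topics(text: str, labels: set[str], max_n: int = 3) -> set[str]:
    """Return the labels that occur in `text` as word n-grams (n <= max_n)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            ngram = " ".join(words[i:i + n])
            if ngram in labels:
                found.add(ngram)
    return found


abstract = "We present an ontology-driven approach to topic detection for the Semantic Web."
topics = match_topics(abstract, TOPIC_LABELS)
print(sorted(topics))
```

Exact string matching of this kind misses paraphrases and synonyms, which is one reason a purely syntactic approach tends to favour precision over recall.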
The CSO Classifier - Architecture
Evaluation - Performance
Classifier            Description                               Prec.   Rec.    F1
TF-IDF                TF-IDF.                                   16.7%   24.0%   19.7%
TF-IDF-M              TF-IDF mapped to CSO concepts.            40.4%   24.1%   30.1%
LDA100                LDA with 100 topics.                       5.9%   11.9%    7.9%
LDA500                LDA with 500 topics.                       4.2%   12.5%    6.3%
LDA1000               LDA with 1000 topics.                      3.8%    5.0%    4.3%
LDA100-M              LDA with 100 topics mapped to CSO.         9.4%   19.3%   12.6%
LDA500-M              LDA with 500 topics mapped to CSO.         9.6%   21.2%   13.2%
LDA1000-M             LDA with 1000 topics mapped to CSO.       12.0%   11.5%   11.7%
W2V-W                 W2V on windows of words.                  41.2%   16.7%   23.8%
STM - 2016            Classifier used by STM 1.0.               80.8%   58.2%   67.6%
STM - 2017 (CSO-SYN)  CSO Classifier, syntactic module.         78.3%   63.8%   70.3%
CSO-SEM               CSO Classifier, semantic module.          70.8%   72.2%   71.5%
STM - 2019 (CSO-C)    The CSO Classifier.                       73.0%   75.3%   74.1%
Computed on a gold standard (GS) of 70 publications, each annotated by 3 researchers.
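The F1 column can be sanity-checked against the precision and recall columns using the usual harmonic-mean definition. The sketch below does this for the four STM and CSO rows; the slides do not say how the scores were averaged across the 70 publications, so small gaps between computed and reported values may come from rounding or from per-paper averaging.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the table above.
ROWS = {
    "STM 2016": (80.8, 58.2, 67.6),
    "CSO-SYN": (78.3, 63.8, 70.3),
    "CSO-SEM": (70.8, 72.2, 71.5),
    "CSO-C": (73.0, 75.3, 74.1),
}

for name, (p, r, reported) in ROWS.items():
    print(f"{name}: computed {f1(p, r):.2f}, reported {reported}")
```

The rows also show the trade-off behind the 2019 result: the syntactic module has higher precision, the semantic module higher recall, and the combined classifier the best balance of the two.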
Evaluation - Usability
System     SUS score   Grade   Percentile
STM 2016   76.6        B       80%
STM 2019   82.8        A       93%
[Figure: SUS score per editor (Editors 1 to 9), scale 0 to 100.]
[Figure: SUS categories per editor (Editors 1 to 9), scale 0 to 5. Categories: Want to use frequently; Easy to use; Easy to learn; Too complex.]
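For readers unfamiliar with the System Usability Scale (SUS), the sketch below shows the standard scoring of one questionnaire: ten items answered from 1 to 5, with odd-numbered items positively worded and even-numbered items negatively worded. The sample answers are hypothetical and are not data from the study.

```python
def sus_score(responses: list[int]) -> float:
    """Score one 10-item SUS questionnaire answered on a 1-5 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten answers between 1 and 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:      # items 1, 3, 5, 7, 9 are positively worded
            total += answer - 1
        else:                   # items 2, 4, 6, 8, 10 are negatively worded
            total += 5 - answer
    return total * 2.5          # rescale the 0-40 sum to 0-100


# Hypothetical answers from one editor, not data from the study.
example = [4, 2, 5, 1, 4, 2, 4, 2, 4, 2]
print(sus_score(example))
```

A system-level score such as those in the table is then typically the mean of the individual questionnaire scores.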
Conclusion and Future Work
• “A little semantics goes a long way”
• Semantic explainability is crucial in this domain
• We are working on an application that will support authors in
annotating their own papers.
• Typing of scientific entities: approaches, tasks, domains,
resources.
• Automatic extraction of Scientific Knowledge Graphs.
Francesco Osborne, Angelo Salatino, Aliaksandr Birukou, Enrico Motta
Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic
Classification of Springer Nature Proceedings with Smart Topic
Miner. In: ISWC 2016. Available at http://rdcu.be/wEHY
Email: francesco.osborne@open.ac.uk
Twitter: FraOsborne
Site: people.kmi.open.ac.uk/francesco
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Nature

  • 1. Improving Editorial Workflow and Metadata Quality at Springer Nature Angelo Salatino1, Francesco Osborne1, Aliaksandr Birukou2, Enrico Motta1 1 Knowledge Media Institute, The Open University, United Kingdom 2 Springer Nature, Heidelberg, Germany ISWC 2019
  • 2. Open University and Springer Nature Collaboration The Open University and Springer Nature have been collaborating since 2014 in the development of an array of semantically-enhanced solutions for: Osborne et al. (2017) Supporting Springer Nature Editors by means of Semantic Technologies. ISWC 2017. Vienna, Austria. • Semi-automatic classification of proceedings and other editorial products. • Automatic selection of the most appropriate books, journals, and proceedings to market at a scientific event. • Analysis of SN codes, with the aim of evolving marked codes and detecting fields that deserve further attention. • Joint release of the Computer Science Ontology.
  • 3. Generation of Metadata It is a crucial task to enable scholars, students, companies and other stakeholders to discover and access this knowledge. Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to: • their own experience of similar conferences; • a visual exploration of titles and abstracts; • a list of terms given by the curators or derived by calls for papers.
  • 4. Classification of Publications – A Complex Problem Classify publications manually presents a number of issues for a large editor such as Springer Nature. • It a complex process that require expert editors • It is time-consuming process which can hardly scale • It is easy to miss the emergence of new topics • It is easy to assume that some traditional topics are still popular when this is no longer the case • The keywords used in the call of papers are often a reflection of what a venue aspires to be, rather than the real contents of the proceedings.
  • 5. Smart Topic Miner 1.0 - 2016
  • 6. Smart Topic Miner 1.0 - 2016 Presented at ISWC 2016 Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. ISWC 2016
  • 7. A success story • Since 2016 STM had been regularly used by editors in Germany, China, Brazil, India, and Japan. • It is used to classify more than 800 conference proceedings volume per year including the Lecture Notes in Computer Science (LNCS) as well as LNBIP, CCIS, IFIP-AICT, LNICST. • It changed completely SN internal workflow: now the task is semi- automatic and monitored by junior editors. • It is constantly evolving and including new functionalities, following the feedback from the editorial team.
  • 8. Smart Topic Miner 1.0 - 2016
  • 9. Smart Topic Miner 1.2 - 2017
  • 10. Smart Topic Miner 2.0 - 2019
  • 11. Business Value • STM halves the time needed for classifying proceedings, from 30 to 15 minutes. • It also allows junior editors to work on the classification of proceedings, distributing the load and reducing costs. • The adoption of a controlled vocabulary makes the process more robust and facilitates the identification of related editorial products.
  • 12. Retrievability About 9M additional downloads thanks to STM. [Chart: average number of yearly downloads for books in SpringerLink, 2009 to 2018, y-axis from 0 to 20,000. Series: downloads (CS Proceedings); expected downloads (CS Proceedings); downloads (CS Proceedings) with STM; downloads (other books in CS); downloads (overall).]
  • 13. Smart Topic Miner 2.0 - 2019 Demo: http://stm-demo.kmi.open.ac.uk
  • 14. Smart Topic Miner 2.0 - 2019 • New GUI. • New Knowledge Base (CSO). • New Topic Detection Engine (CSO Classifier). • Ability to compare with previous editions. • Integrated with SN system and CSO Portal. http://stm-demo.kmi.open.ac.uk
  • 15. STM 2.0 - architecture [Diagram labels: SN Editors; HTML - GUI; Parser; Generate Visualizations; STM Engine with the steps i) CSO Classifier, ii) Topic Explanation, iii) Taxonomy Generation, iv) SN Tags Inference, v) Previous Classification; resources: CSO, SNCs, Historical Data, word2vec model.]
  • 16. A new knowledge base - The Computer Science Ontology The Computer Science Ontology (CSO) is a large-scale, automatically generated ontology of research areas. It is the largest ontology in the field of Computer Science, including about 14K topics and 162K semantic relationships. Salatino et al (2019) The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas. Data Intelligence. http://cso.kmi.open.ac.uk/
  • 17. A new topic detection engine - The CSO Classifier The CSO Classifier is an unsupervised approach for automatically classifying documents according to CSO. Salatino et al. (2019) The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles. Demo: https://cso.kmi.open.ac.uk/classify/ Download: https://github.com/angelosalatino/cso-classifier Install: pip install cso-classifier
  • 18. The CSO Classifier - Architecture
  • 19. Evaluation - Performance
    Classifier | Description | Prec. | Rec. | F1
    TF-IDF | TF-IDF | 16.7% | 24.0% | 19.7%
    TF-IDF-M | TF-IDF mapped to CSO concepts. | 40.4% | 24.1% | 30.1%
    LDA100 | LDA with 100 topics. | 5.9% | 11.9% | 7.9%
    LDA500 | LDA with 500 topics. | 4.2% | 12.5% | 6.3%
    LDA1000 | LDA with 1000 topics. | 3.8% | 5.0% | 4.3%
    LDA100-M | LDA with 100 topics mapped to CSO. | 9.4% | 19.3% | 12.6%
    LDA500-M | LDA with 500 topics mapped to CSO. | 9.6% | 21.2% | 13.2%
    LDA1000-M | LDA with 1000 topics mapped to CSO. | 12.0% | 11.5% | 11.7%
    W2V-W | W2V on windows of words. | 41.2% | 16.7% | 23.8%
    STM - 2016 | Classifier used by STM 1.0. | 80.8% | 58.2% | 67.6%
    STM – 2017 (CSO-SYN) | CSO Classifier - Syntactic module. | 78.3% | 63.8% | 70.3%
    CSO-SEM | CSO Classifier - Semantic module. | 70.8% | 72.2% | 71.5%
    STM – 2019 (CSO-C) | The CSO Classifier. | 73.0% | 75.3% | 74.1%
    Computed on a gold standard (GS) of 70 publications, each annotated by 3 researchers.
  • 20. Evaluation - Usability
    System | SUS score | Grade | Percentile
    STM 2016 | 76.6 | B | 80%
    STM 2019 | 82.8 | A | 93%
    [Charts: SUS score per editor (Editors 1 to 9, scale 0 to 100); SUS categories per editor (scale 0 to 5): Want to use frequently, Easy to use, Easy to Learn, Too complex.]
  • 21. Conclusion and Future Work • “A little semantic goes a long way” • Semantic explainability is crucial in this domain • We are working on an application that will support authors in annotating their own papers. • Typing of scientific entities: approaches, tasks, domains, resources. • Automatic extraction of Scientific Knowledge Graph.
  • 22. Francesco Osborne, Angelo Salatino, Aliaksandr Birukou, Enrico Motta. See also: Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. In ISWC 2016. Available at http://rdcu.be/wEHY Email: francesco.osborne@open.ac.uk Twitter: FraOsborne Site: people.kmi.open.ac.uk/francesco
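
The F1 column in the performance table on slide 19 can be sanity-checked from the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the slides do not state how the scores were averaged, so the helper name `f1_score` and the choice of rows are illustrative only. Because the published precision and recall are rounded to one decimal place, the recomputed values can differ from the reported ones by about 0.1 percentage points.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the table on slide 19.
ROWS = {
    "TF-IDF": (16.7, 24.0, 19.7),
    "STM - 2016": (80.8, 58.2, 67.6),
    "CSO-SEM": (70.8, 72.2, 71.5),
    "STM - 2019 (CSO-C)": (73.0, 75.3, 74.1),
}

for name, (precision, recall, reported) in ROWS.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed:.2f}%, reported F1 = {reported}%")
```

Read this way, the gain from STM 2016 to the 2019 CSO Classifier comes from recall (58.2% to 75.3%), at the cost of some precision (80.8% to 73.0%).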

Editor's Notes

  1-7. Smart Topic Miner (STM) is the system that we created to assist the Springer Nature editorial team in classifying scholarly publications in the field of Computer Science. It takes as input one or more books and returns a representation of their research topics, a description of each chapter, and an explanation for each inferred topic. STM has been used by Springer Nature since January 2017 to annotate several book series in Computer Science (e.g., LNCS), for a total of about 800 volumes each year. During this period, the adoption of STM has halved the time needed for classifying proceedings and allowed a more robust and comprehensive representation of the research areas in the Springer Nature catalogue.
  8. In the scholarly domain, ontologies are often used to facilitate the integration of large datasets of research data, the exploration of the academic landscape, information extraction from scientific articles, and so on. In January 2019, KMi released, in conjunction with Springer Nature, the Computer Science Ontology (CSO), which is the largest taxonomy of research areas in the field. This resource was automatically generated by mining a dataset of 16M publications and using a combination of machine learning and semantic technologies to extract 14K research topics and 162K semantic relationships. CSO includes a much larger number of research topics than the alternatives (e.g., ACM Classification), enabling a very granular characterisation of the content of research papers, and it can be easily updated by running our ontology learning approach on recent corpora of publications. It attracted the attention of several institutions and companies, such as Digital Science, Elsevier, and ACM, interested in adopting CSO for characterizing their datasets of research publications. We are currently developing a similar ontology in the field of Engineering and we plan to apply our technology to several other fields (Biomedical, Economics).
  9. Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. The CSO Classifier is an application for automatically classifying research papers according to CSO. We are currently using it to enrich the description of 150K publications in the Springer Nature online library. We also started a collaboration with Digital Science, the creators of Dimensions, with the aim of automatically annotating their dataset of scholarly data. The resulting characterization of research papers can be used for supporting tasks such as identifying research communities, forecasting research trends, detecting relevant reviewers, and so on.
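
To make the idea of ontology-driven topic detection more concrete, here is a minimal toy sketch in Python. It is not the CSO Classifier's implementation and does not use the `cso-classifier` package: the four-entry `TOY_ONTOLOGY`, the stopword list, the 0.9 similarity threshold, and the use of `difflib.SequenceMatcher` for string similarity are all assumptions made for illustration. It extracts word n-grams from a text, matches them against topic labels, adds the broader topics of each match, and records the text span behind each matched topic as a simple explanation.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-ontology: topic label -> broader topic (not real CSO data).
TOY_ONTOLOGY = {
    "semantic web": "world wide web",
    "ontology": "knowledge representation",
    "machine learning": "artificial intelligence",
    "topic detection": "natural language processing",
}

# Small illustrative stopword list.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with", "we", "is", "are"}


def ngrams(tokens, max_n=3):
    """Yield all word n-grams of length 1..max_n as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def classify(text, ontology=TOY_ONTOLOGY, min_similarity=0.9):
    """Return topics whose label closely matches an n-gram, plus their broader topics."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    matched = {}  # topic label -> n-gram that triggered it (the "explanation")
    for gram in ngrams(tokens):
        for topic in ontology:
            if SequenceMatcher(None, gram, topic).ratio() >= min_similarity:
                matched.setdefault(topic, gram)
    # Broader topics inferred from the ontology, excluding those already matched.
    broader = sorted({ontology[t] for t in matched} - set(matched))
    return {"matched": matched, "broader": broader}


abstract = "We present an approach for topic detection using ontologies and machine learning."
result = classify(abstract)
for topic, evidence in result["matched"].items():
    print(f"{topic} (matched on: '{evidence}')")
print("broader topics:", result["broader"])
```

With this strict threshold the plural "ontologies" does not match the label "ontology". A purely string-based match misses such variants unless the threshold is loosened or a second, embedding-based matching step is added.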