Francesco Osborne1, Angelo Salatino1,
Aliaksandr Birukou2, Enrico Motta1
1 KMi, The Open University, United Kingdom
2 Springer Nature
ISWC 2016
Automatic Classification of Springer Nature
Proceedings with Smart Topic Miner
Classifying Scholarly publications
Classifying scholarly publications is crucial to enable scholars,
students, companies and other stakeholders to discover and access the
knowledge they contain.
Traditionally, editors choose a list of related keywords and
categories in relevant taxonomies according to:
• their own experience of similar conferences;
• a visual exploration of titles and abstracts;
• a list of terms given by the curators or derived from calls for
papers.
Classifying Scholarly publications
Classifying publications manually presents a number of issues for a
large publisher such as Springer Nature.
• It is a complex process that requires expert editors
• It is a time-consuming process which can hardly scale (1.5M
papers/year)
• It is easy to miss the emergence of a new topic
• It is easy to assume that some traditional topics are still
popular when this is no longer the case
• The keywords used in the call for papers are often a reflection
of what a venue aspires to be, rather than the real contents of
the proceedings.
Osborne, F., Motta, E. and Mulholland, P.: Exploring scholarly data with Rexplore.
In International semantic web conference (pp. 460-477). (2013)
technologies.kmi.open.ac.uk/rexplore/
The Smart Topic Miner
The Smart Topic Miner (STM) is a semantic application designed
to support the Springer Nature Computer Science editorial
team in classifying scholarly publications.
http://rexplore.kmi.open.ac.uk/STM_demo
STM Architecture
Background Data - The Computer Science Ontology 1
Standard research area taxonomies, classifications and ontologies,
such as the ACM 2012 classification, are not suited to the task:
• Not fine-grained enough.
– E.g., only 2 topics are classified under Semantic Web
• Static and manually defined, hence prone to becoming obsolete very
quickly.
Background Data - The Computer Science Ontology 2
The Computer Science Ontology was automatically created and
updated by applying the Klink-2 algorithm.
Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate
semantic topic networks. In ISWC 2015. (2015)
Background Data - The Computer Science Ontology 3
• We automatically generated a large-scale ontology consisting of about
15,000 topics linked by about 70,000 semantic relationships.
• It includes very granular and low-level research areas, e.g., Linked
open data, Probabilistic packet marking, Synthetic aperture radar
imaging
• It can be regularly updated by running Klink-2 on a new set of
publications.
• It allows for a research topic to have multiple super-areas – i.e., the
taxonomic structure is a graph rather than a tree, e.g., Inductive Logic
Programming is a sub-area of both Machine Learning and Logic
Programming.
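The multi-parent structure described above can be sketched as a small directed graph. This is only an illustration, not the actual CSO data model: the class name TopicGraph and its methods are hypothetical, and the few relationships loaded into it are the ones named in these slides.

```python
from collections import defaultdict


class TopicGraph:
    """Toy topic network in which a topic may have several super-areas."""

    def __init__(self):
        # topic -> set of direct super-areas (a graph, not a tree)
        self.super_areas = defaultdict(set)

    def add_sub_area(self, topic, super_area):
        self.super_areas[topic].add(super_area)

    def ancestors(self, topic):
        """Return every direct or indirect super-area of `topic`."""
        seen = set()
        stack = [topic]
        while stack:
            current = stack.pop()
            for parent in self.super_areas.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = TopicGraph()
# Two super-areas for the same topic, as in the example on the slide.
graph.add_sub_area("inductive logic programming", "machine learning")
graph.add_sub_area("inductive logic programming", "logic programming")
# Relationships taken from the topic tree shown later in the deck.
graph.add_sub_area("machine learning", "artificial intelligence")
graph.add_sub_area("artificial intelligence", "computer science")

print(sorted(graph.ancestors("inductive logic programming")))
```

Because super-areas are stored as a set per topic, a paper tagged with a sub-area can be counted under every branch that leads to it.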
STM Approach – 1 Topic extraction
The initial keywords are enriched with terms extracted from the
publications and then mapped to a list of research areas in the CSO
ontology.
Initial Keywords
(from authors and editors)
linked data:3, relational constraints:1, semantical regularizations:1,
question answering:1, graph traversal:1, non-aggregation questions:1,
implicit information:1, knowledge base completion:1, dbpedia:1,
recommender system:1, relation extraction:1, weakly supervised:1,
baidu encyclopedia:1, svm:1, path ranking:1, medical events:1,
competitor mining:1, description logics:1, multi-strategy learning:1,
distant supervision:1, relation reasoning:1, non-standard reasoning
services:1, concept similarity measures:1, semantic data:1, medical
guidelines:1, rdf:1, prolog:1, preference profile:1, similarity
measure:1, ontology development:1, knowledge representation:1, graph
simplification:1, rdf visualization:1, triple ranking:1, sparql-rank:1,
rank-join operator:1, “shaowei” (稍微 ‘a little’):1, minimal degree
adverb:1, a little:1, rdf native storage:1, news analysis:1, meta-data
extraction:1, database integration:1, elderly nursing care:1 […]
Enriched Keywords
(extracted from abstract, titles, etc)
semantic:24, rdf:7, applications:5, semantic web:5, knowledge base:4,
linked data:4, ontology:4, ontologies:4, language:3, knowledge bases:3,
algorithms:2, integration:2, architecture:2, semantics:2, knowledge
management:2, query answering:2, recommendation:2, question answering
system:2, semantic similarity:2, question answering:2, vocabulary:2,
svm:1, graph traversal:1, information needs:1, path ranking:1, baidu
encyclopedia:1, non-aggregation questions:1, support vector machine:1,
implicit information:1, construction:1, knowledge base completion:1,
relational constraints:1, semantical regularizations:1, support vector
machine (svm):1, machine learning:1, support vector:1, facts:1, logic
programming:1, multi-strategy learning:1, distant supervision:1,
competitor mining:1, lossy compression:1, comprehensive evaluation:1,
relation reasoning:1, websites:1, competition:1, decision support:1,
learning algorithm:1 […]
CSO Ontology topics
(1) Computer Science [21]
--- (2) Internet [18]
-------- (3) World wide web [16]
------------- (4) Semantic web [16]
------------------ (5) Rdf [7]
------------------ (5) Linked data [5]
-------- (3) NLP systems [3]
------------- (4) Question answering [2]
-------- (3) Recommender systems [2]
--- (2) Artificial intelligence [12]
-------- (3) Knowledge based systems [8]
------------- (4) Knowledge representation [4]
------------------ (5) Description logic [3]
-------- (3) Machine learning [4]
(1) Semantics [24]
--- (2) Ontology [10]
--- (2) Metadata [7]
-------- (3) Rdf [7]
--- (2) Semantic web [16]
(1) Language [5]
--- (2) Vocabulary [2] […]
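One way to picture the mapping step is to match each keyword to an ontology topic and then count the paper under that topic and all of its super-areas, which is what the bracketed numbers in the topic tree suggest. The sketch below works under those assumptions only. The tables SUPER_AREAS and KEYWORD_TO_TOPIC are hypothetical fragments built from relationships visible in the tree, and the matching is a plain dictionary lookup, which is far simpler than what STM may actually do.

```python
from collections import defaultdict

# Hypothetical ontology fragment: topic -> direct super-areas.
SUPER_AREAS = {
    "rdf": {"semantic web", "metadata"},
    "linked data": {"semantic web"},
    "semantic web": {"world wide web", "semantics"},
    "world wide web": {"internet"},
    "internet": {"computer science"},
    "metadata": {"semantics"},
}

# Hypothetical label table: keyword string -> ontology topic.
KEYWORD_TO_TOPIC = {
    "rdf": "rdf",
    "linked data": "linked data",
    "linked open data": "linked data",
    "semantic web": "semantic web",
}


def with_super_areas(topic):
    """Return the topic together with all its direct and indirect super-areas."""
    found = {topic}
    stack = [topic]
    while stack:
        for parent in SUPER_AREAS.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def papers_per_topic(paper_keywords):
    """Map {paper_id: keywords} to {topic: number of distinct papers}."""
    papers = defaultdict(set)
    for paper_id, keywords in paper_keywords.items():
        for keyword in keywords:
            topic = KEYWORD_TO_TOPIC.get(keyword.strip().lower())
            if topic is None:
                continue  # keyword has no match in the ontology fragment
            for covered in with_super_areas(topic):
                papers[covered].add(paper_id)
    return {topic: len(ids) for topic, ids in papers.items()}


# Made-up input, not data from the slides.
example = {
    "p1": ["RDF", "sparql-rank"],
    "p2": ["Linked Open Data"],
    "p3": ["semantic web", "rdf"],
}
counts = papers_per_topic(example)
print(counts["semantic web"], counts["rdf"], counts["linked data"])
```

Counting sets of paper identifiers, rather than summing keyword frequencies, keeps a paper from being counted twice under a super-area it reaches through two different keywords.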
STM Approach – 2 Topic Selection
A greedy set-covering algorithm is used to reduce the topics to a
user-friendly number.
• We run the algorithm separately on the set of topics at each level of
the ontology, to preserve both high level and granular research areas.
• The standard version of the greedy set-covering algorithm did not
work well in this domain: multiple high level topics cover a similar set
of papers.
• Our version assigns an initial weight to each paper; at each iteration it
selects the topic that covers the publications with the highest total
weight and reduces the weight of every covered paper.
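The weighted selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name select_topics, the halving factor decay=0.5, the initial weight of 1.0 and the fixed max_topics stopping rule are all assumptions, since the slides do not give these details.

```python
def select_topics(topic_papers, max_topics, decay=0.5, initial_weight=1.0):
    """Greedy weighted cover over {topic: set of paper ids}.

    Each paper starts with `initial_weight`. At every iteration the topic
    whose papers have the highest total weight is selected, and the weight
    of each paper it covers is multiplied by `decay` (assumed value).
    """
    weights = {
        paper: initial_weight
        for papers in topic_papers.values()
        for paper in papers
    }
    remaining = dict(topic_papers)
    selected = []
    while remaining and len(selected) < max_topics:
        # Sorting the names first makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda topic: sum(weights[p] for p in remaining[topic]),
        )
        selected.append(best)
        for paper in remaining.pop(best):
            weights[paper] *= decay
    return selected


# Made-up example: two topics cover the same four papers.
topic_papers = {
    "semantic web": {"p1", "p2", "p3", "p4"},
    "world wide web": {"p1", "p2", "p3", "p4"},
    "machine learning": {"p5", "p6", "p7"},
}
print(select_topics(topic_papers, max_topics=2))
```

After "semantic web" is chosen, the papers it covers count for less, so "machine learning" outranks "world wide web" even though the latter covers more papers. To follow the slide, the function would be called once per ontology level, with topic_papers restricted to the topics at that level.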
STM Approach – 3 Tag Selection
The selected topics are used to infer a number of SNC tags, using the
mapping between CSO ontology and SNC.
I00001 : computer science, general
I23001 : computer applications
I23050 : computational biology/bioinformatics
I13006 : computer systems organization an communication networks
I13014 : processor architectures
I13022 : computer comm. networks
I21009 : computing methodologies
I21017 : artificial intelligence
I1200X : computer hardware
I12050 : logic design
I14002 : software engineering/programming and operating systems
I22005 : computer imaging, vision, pattern recognition and graphics
I22021 : image processing
I18008 : information sys. and comm. servic
I18030 : data mining, knowledge discove
(1) Computer Science [69]
(2) Bioinformatics [69]
(2) Artificial intelligence [16]
(3) Machine learning [9]
(4) Support vector machines [7]
(2) Computer architecture [13]
(3) Program processors [13]
(4) Graphics Processing Unit (GPU) [7]
(5) Cuda [3]
(2) Image processing [12]
(3) Image reconstruction [6]
(2) Data mining [9]
[…]
(3) Telecommunication networks [5]
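The tag inference step can be pictured as a lookup through the CSO-to-SNC mapping. The sketch below is illustrative only: the slides do not show the mapping itself, so the pairs in CSO_TO_SNC are assumed (the tag codes come from the list above, the topic-to-tag pairing does not), and ranking tags by the summed paper counts of their topics is an assumed scoring rule.

```python
from collections import Counter

# Assumed excerpt of a CSO-to-SNC mapping; the real mapping is not shown.
CSO_TO_SNC = {
    "bioinformatics": ["I23050"],
    "artificial intelligence": ["I21017"],
    "machine learning": ["I21017"],
    "image processing": ["I22021"],
    "data mining": ["I18030"],
}


def infer_snc_tags(selected_topics):
    """Map {topic: paper count} to a list of (tag, score), best first.

    The score of a tag is the sum of the paper counts of the topics mapped
    to it (an assumed rule; papers shared by two topics are counted twice).
    """
    scores = Counter()
    for topic, paper_count in selected_topics.items():
        for tag in CSO_TO_SNC.get(topic.lower(), ()):
            scores[tag] += paper_count
    # Sort by descending score, then by tag code for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Topic counts copied from the example tree on the slide.
selected = {
    "Bioinformatics": 69,
    "Artificial intelligence": 16,
    "Machine learning": 9,
    "Image processing": 12,
    "Data mining": 9,
    "Cuda": 3,
}
for tag, score in infer_snc_tags(selected):
    print(tag, score)
```

Topics without an entry in the mapping, such as "Cuda" here, contribute nothing, which is one reason the mapping would need to cover the ontology broadly.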
User Trial 1
We conducted individual sessions with 8 experienced SN editors.
We introduced STM for about 15 minutes and then asked them to
classify a number of proceedings in their fields of expertise for about 45
minutes.
The expertise of the editors included: Theoretical Computer Science,
Computer Networks, Software Engineering, HCI, AI, Bioinformatics, and
Security.
After the hands-on session the editors filled in a three-part survey:
• Background and expertise
• Five questions about the strengths and weaknesses of STM and three
about the quality of the results
• SUS questionnaire
User Trial 2
Background and expertise
• On average 13 years of experience (7 out of 8 having at least 5 years)
• All of them stated that they had extensive knowledge of the main topic
classifications in their fields
• Four of them also considered themselves experts at working with digital
proceedings.
Open questions about STM strengths and weaknesses
• STM had a positive effect on their work.
• They estimated the accuracy of the results between 75% and 90%.
• Limitations: the scope is limited to the Computer Science field, and
results are occasionally noisy for books with very few chapters.
• Suggested features: analytics about the evolution of a venue or a
journal, and a way for users to find the most significant proceedings for a
topic.
User Trial 3
Quality of results and usability
SUS: 77/100, 80% percentile rank
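For readers unfamiliar with how a SUS score out of 100 is obtained, the standard scoring rule for the ten-item questionnaire is sketched below. The function name sus_score is hypothetical and the sample responses are invented; they are not data from the trial.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten answers on a 1-5 scale.

    Odd-numbered items contribute (answer - 1), even-numbered items
    contribute (5 - answer); the sum is multiplied by 2.5 to reach 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if not 1 <= answer <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += answer - 1
        else:
            total += 5 - answer
    return total * 2.5


# Invented answers for one participant.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

A figure reported for a whole study, such as the one above, is normally the mean of the per-participant scores.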
Conclusions
Key Lessons
• Allow users to know the rationale behind a suggestion.
• Value of Semantic Technologies for helping users in addressing noisy data.
Future work
• Discussing a project to further integrate STM into Springer Nature
workflows.
• Extending STM to characterize the evolution of conferences and
venues in time.
– e.g. highlighting new emerging topics, as well as the fact that some traditional
topics are fading out
• Using STM for directly supporting authors in defining the set of
topics which best describe their paper.
Francesco Osborne Angelo Salatino Aliaksandr Birukou Enrico Motta
Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic
Classification of Springer Nature Proceedings with Smart Topic
Miner. In International Semantic Web Conference (pp. 383-399).
Springer International Publishing. (2016)
Email: francesco.osborne@open.ac.uk
Twitter: FraOsborne
Site: people.kmi.open.ac.uk/francesco

More Related Content

What's hot

The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics
Angelo Salatino
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
Andre Freitas
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Andre Freitas
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...
Salam Shah
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval
GESIS
 

What's hot (20)

Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
 
Algorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasetsAlgorithms for the thematic analysis of twitter datasets
Algorithms for the thematic analysis of twitter datasets
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked Data
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIA
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
SelQA: A New Benchmark for Selection-based Question Answering
SelQA: A New Benchmark for Selection-based Question AnsweringSelQA: A New Benchmark for Selection-based Question Answering
SelQA: A New Benchmark for Selection-based Question Answering
 
Data Science Education at JHSPH
Data Science Education at JHSPHData Science Education at JHSPH
Data Science Education at JHSPH
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
 
Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
QALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic WebQALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic Web
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
Recommender Systems and Linked Open Data
Recommender Systems and Linked Open DataRecommender Systems and Linked Open Data
Recommender Systems and Linked Open Data
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
 
Research Statement
Research StatementResearch Statement
Research Statement
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval
 

Similar to Automatic Classification of Springer Nature Proceedings with Smart Topic Miner

ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...
Francesco Osborne
 

Similar to Automatic Classification of Springer Nature Proceedings with Smart Topic Miner (20)

Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...
 
JCDL 2013 DOCTORAL CONSORTIUM
JCDL 2013 DOCTORAL CONSORTIUMJCDL 2013 DOCTORAL CONSORTIUM
JCDL 2013 DOCTORAL CONSORTIUM
 
Cse 8th sem syllabus
Cse 8th sem syllabusCse 8th sem syllabus
Cse 8th sem syllabus
 
Data science syllabus
Data science syllabusData science syllabus
Data science syllabus
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
Diary of a Wimpy Model Manager
Diary of a Wimpy Model ManagerDiary of a Wimpy Model Manager
Diary of a Wimpy Model Manager
 
Thirteen Years of SysML: A Systematic Mapping Study
Thirteen Years of SysML: A Systematic Mapping StudyThirteen Years of SysML: A Systematic Mapping Study
Thirteen Years of SysML: A Systematic Mapping Study
 
0 computers and social sciences pmy 2330 lectures notes 2017
0 computers and social sciences pmy 2330 lectures notes 20170 computers and social sciences pmy 2330 lectures notes 2017
0 computers and social sciences pmy 2330 lectures notes 2017
 
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...ISWC 2019 -  Improving Editorial Workflow and Metadata Quality at Springer Na...
ISWC 2019 - Improving Editorial Workflow and Metadata Quality at Springer Na...
 
A Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdfA Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdf
 
Integrating Semantic Systems
Integrating Semantic SystemsIntegrating Semantic Systems
Integrating Semantic Systems
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...
 
NUS PhD e-open day 2020
NUS PhD e-open day 2020NUS PhD e-open day 2020
NUS PhD e-open day 2020
 
Be computer-engineering-2012
Be computer-engineering-2012Be computer-engineering-2012
Be computer-engineering-2012
 
qualitative.ppt
qualitative.pptqualitative.ppt
qualitative.ppt
 
Using Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchUsing Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative Research
 
Topic map for Topic Maps case examples
Topic map for Topic Maps case examplesTopic map for Topic Maps case examples
Topic map for Topic Maps case examples
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 

Recently uploaded

Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
Silpa
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
ANSARKHAN96
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 

Recently uploaded (20)

PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 

Automatic Classification of Springer Nature Proceedings with Smart Topic Miner

  • 1. Francesco Osborne1, Angelo Salatino1, Aliaksandr Birukou2, Enrico Motta1 1 KMi, The Open University, United Kingdom 2 Springer Nature ISWC 2016 Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
  • 2. Classifying Scholarly publications It is a crucial task to enable scholars, students, companies and other stakeholders to discover and access this knowledge. 2 • their own experience of similar conferences; • a visual exploration of titles and abstracts; • a list of terms given by the curators or derived by calls for papers. Traditionally, editors choose a list of related keywords and categories in relevant taxonomies according to:
  • 3. Classifying Scholarly publications Classify publication manually presents a number of issue for a big editor such as Springer Nature. • It a complex process that require expert editors • It is time-consuming process which can hardly scale (1.5M papers/year) • It is easy to miss the emergence of a new topic • It is easy to assume that some traditional topics are still popular when this is no longer the case • The keywords used in the call of papers are often a reflection of what a venue aspires to be, rather than the real contents of the proceedings. 3
  • 4. 44 Osborne, F., Motta, E. and Mulholland, P.: Exploring scholarly data with Rexplore. In International semantic web conference (pp. 460-477). (2013) technologies.kmi.open.ac.uk/rexplore/
  • 5. The Smart Topic Miner The Smart Topic Miner (STM) is a semantic application designed to support the Springer Nature Computer Science editorial team in classifying scholarly publications. 5 http://rexplore.kmi.open.ac.uk/STM_demo
  • 7. Background Data - The Computer Science Ontology 1 • Not fine-grained enough. – E.g., only 2 topics are classified under Semantic Web • Static, manually defined, hence prone to get obsolete very quickly. 7 Standard research areas taxonomies/classifications/ontologies such as ACM are not apt to the task. ACM 2012
  • 8. The Computer Science Ontology was automatically created and updated by applying the Klink-2 algorithm. Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015. (2015) Background Data - The Computer Science Ontology 2
  • 9. • We automatically generated a large-scale ontology consist of about 15,000 topics linked by about 70,000 semantic relationships. • It included very granular and low level research areas, e.g., Linked open data, Probabilistic packet marking, Synthetic aperture radar imaging • It can be regularly updated by running Klink-2 on a new set of publications. • It allows for a research topic to have multiple super-areas – i.e., the taxonomic structure is a graph rather than a tree, e.g., Inductive Logic Programming is a sub-area of both Machine Learning and Logic Programming. 9 Background Data - The Computer Science Ontology 3
  • 10. The initial keywords are enriched with terms extracted from the publications and then mapped to a list of research areas in the CSO ontology; Initial Keywords (from authors and editors) (1) Computer Science [21] --- (2) Internet [18] -------- (3) World wide web [16] ------------- (4) Semantic web [16] ------------------ (5) Rdf [7] ------------------ (5) Linked data [5] ---------- (3) NLP systems [3] --------------- (4) Question answering [2] ---------- (3) Recommender systems [2] --- (2) Artificial intelligence [12] -------- (3) Knowledge based systems [8] ------------- (4) Knowledge representation [4] ------------------ (5) Description logic [3] -------- (3) Machine learning [4] (1) Semantics [24] --- (2) Ontology [10] --- (2) Metadata [7] -------- (3) Rdf [7] --- (2) Semantic web [16] (1) Language [5] --- (2) Vocabulary [2] […] semantic:24, rdf:7, applications:5, semantic web:5, knowledge base:4, linked data:4, ontology:4, ontologies:4, language:3, knowledge bases:3, algorithms:2, integration:2, architecture:2, semantics:2, knowledge management:2, query answering:2, recommendation:2, question answering system:2, semantic similarity:2, question answering:2, vocabulary:2, svm:1, graph traversal:1, information needs:1, path ranking:1, baidu encyclopedia:1, non- aggregation questions:1, support vector machine:1, implicit information:1, construction:1, knowledge base completion:1, relational constraints:1, semantical regularizations:1, support vector machine (svm):1, machine learning:1, support vector:1, facts:1, logic programming:1, multi-strategy learning:1, distant supervision:1, competitor mining:1, lossy compression:1, comprehensive evaluation:1, relation reasoning:1, websites:1, competition:1, decision support:1, learning algorithm:1 […] linked data:3, relational constraints:1, semantical regularizations:1, question answering:1, graph traversal:1, non- aggregation questions:1, implicit information:1, knowledge base completion:1, dbpedia:1, recommender 
system:1, relation extraction:1, weakly supervised:1, baidu encyclopedia:1, svm:1, path ranking:1, medical events:1, competitor mining:1, description logics:1, multi-strategy learning:1, distant supervision:1, relation reasoning:1, non-standard reasoning services:1, concept similarity measures:1, semantic data:1, medical guidelines:1, rdf:1, prolog:1, preference profile:1, similarity measure:1, ontology development:1, knowledge representation:1, graph simplification:1, rdf visualization:1, triple ranking:1, sparql-rank:1, rank-join operator:1, “shaowei” (稍微 ‘a little’):1, minimal degree adverb:1, a little:1, rdf native storage:1, news analysis:1, meta-data extraction:1, database integration:1, elderly nursing care:1 […] Enriched Keywords (extracted from abstract, titles, etc) CSO Ontology topics STM Approach – 1 Topic extraction
  • 11. STM Approach – 2 Topic Selection • A greedy set-covering algorithm is used to reduce the topics to a user-friendly number. • We run the algorithm separately on the set of topics at each level of the ontology, to preserve both high-level and granular research areas. • The standard version of the greedy set-covering algorithm did not work well in this domain, because multiple high-level topics cover a similar set of papers. • STM therefore uses a weighted variant: it assigns an initial weight to each paper and, at each iteration, selects the topic that covers the publications with the highest total weight and then reduces the weight of every covered paper.
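The weighted selection described on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the STM code: the function name `select_topics`, the initial weight of 1.0, the `decay` factor of 0.5, the `max_topics` stopping rule and the toy data are all hypothetical, since the slide gives none of these details.

```python
def select_topics(topic_papers, max_topics, decay=0.5):
    """Pick up to max_topics topics with a weighted greedy set-cover heuristic.

    topic_papers: dict mapping a topic to the set of paper ids it covers.
    decay: assumed factor by which the weight of a covered paper shrinks.
    """
    # Every paper starts with the same weight (assumed to be 1.0).
    weights = {p: 1.0 for papers in topic_papers.values() for p in papers}
    remaining = dict(topic_papers)
    selected = []
    while remaining and len(selected) < max_topics:
        # Score each candidate by the total current weight of its papers;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(remaining),
            key=lambda t: sum(weights[p] for p in remaining[t]),
        )
        selected.append(best)
        # Covered papers keep a reduced weight instead of being removed,
        # so overlapping topics are penalised but can still be chosen later.
        for paper in remaining.pop(best):
            weights[paper] *= decay
    return selected


# Toy data for a single ontology level (the paper ids are made up).
level_topics = {
    "bioinformatics": {"p1", "p2", "p3", "p4"},
    "data mining": {"p1", "p2", "p3"},
    "image processing": {"p5", "p6"},
}
top_two = select_topics(level_topics, max_topics=2)
top_three = select_topics(level_topics, max_topics=3)
```

Here "image processing" is picked before "data mining" even though it covers fewer papers, because the papers under "data mining" were already covered by "bioinformatics" and carry a reduced weight. To follow the slide, the function would be called once per ontology level.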
  • 12. STM Approach – 3 Tag Selection
    The selected topics are used to infer a number of SNC tags, using the mapping between the CSO ontology and SNC.
    SNC tags:
      I00001 : computer science, general
      I23001 : computer applications
      I23050 : computational biology/bioinformatics
      I13006 : computer systems organization and communication networks
      I13014 : processor architectures
      I13022 : computer comm. networks
      I21009 : computing methodologies
      I21017 : artificial intelligence
      I1200X : computer hardware
      I12050 : logic design
      I14002 : software engineering/programming and operating systems
      I22005 : computer imaging, vision, pattern recognition and graphics
      I22021 : image processing
      I18008 : information sys. and comm. servic
      I18030 : data mining, knowledge discove
    Selected topics:
      (1) Computer Science [69]
        (2) Bioinformatics [69]
        (2) Artificial intelligence [16]
          (3) Machine learning [9]
            (4) Support vector machines [7]
        (2) Computer architecture [13]
          (3) Program processors [13]
            (4) Graphics Processing Unit (GPU) [7]
              (5) Cuda [3]
        (2) Image processing [12]
          (3) Image reconstruction [6]
        (2) Data mining [9]
        […]
          (3) Telecommunication networks [5]
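The tag-selection step amounts to a lookup through a topic-to-tag mapping, which can be sketched in Python as below. The tag codes are taken from the slide, but the pairing of each topic with a tag in `CSO_TO_SNC` is an assumption made for illustration, as is the function name `infer_tags`; the real CSO-to-SNC mapping is not shown in the slides.

```python
# Hypothetical excerpt of a CSO-topic -> SNC-tag mapping. The tag codes appear
# on the slide, but which topic maps to which tag is assumed here.
CSO_TO_SNC = {
    "bioinformatics": ["I23050"],           # computational biology/bioinformatics
    "artificial intelligence": ["I21017"],  # artificial intelligence
    "image processing": ["I22021"],         # image processing
}


def infer_tags(selected_topics, mapping=CSO_TO_SNC):
    """Return the SNC tags of the selected topics, in order, without duplicates."""
    tags = []
    for topic in selected_topics:
        # Topics without an entry in the mapping contribute no tag.
        for tag in mapping.get(topic.lower(), []):
            if tag not in tags:
                tags.append(tag)
    return tags


tags = infer_tags(
    ["Bioinformatics", "Artificial intelligence", "Cuda", "Image processing"]
)
```

In this sketch "Cuda" yields no tag because the toy mapping has no entry for it; how STM treats unmapped topics or parent tags is not stated on the slide.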
  • 13. User Trial 1 We conducted individual sessions with 8 experienced SN editors. We introduced STM for about 15 minutes and then asked them to classify a number of proceedings in their fields of expertise for about 45 minutes. The expertise of the editors included: Theoretical Computer Science, Computer Networks, Software Engineering, HCI, AI, Bioinformatics, and Security. After the hands-on session the editors filled in a three-part survey: • Background and expertise • Five questions about the strengths and weaknesses of STM and three about the quality of the results • SUS questionnaire
  • 14. User Trial 2 Background and expertise • On average 13 years of experience (7 out of 8 having at least 5 years) • All of them reported extensive knowledge of the main topic classifications in their fields • Four of them also considered themselves experts at working with digital proceedings. Open questions about STM strengths and weaknesses • STM had a positive effect on their work. • They estimated the accuracy of the results at between 75% and 90%. • Limitations: the scope is limited to the Computer Science field, and results are occasionally noisy when examining books with very few chapters. • Suggested features: producing analytics about the evolution of a venue or a journal in terms of topics; allowing users to find the most significant proceedings for a topic.
  • 15. User Trial 3 Quality of results and usability SUS: 77/100, 80th percentile rank
  • 16. Conclusions Key Lessons • Allow users to know the rationale behind a suggestion. • Semantic technologies are valuable in helping users deal with noisy data. Future work • Discussing a project to further integrate STM into Springer Nature workflows. • Extending STM to characterize the evolution of conferences and venues over time. – e.g. highlighting newly emerging topics, as well as the fact that some traditional topics are fading out • Using STM to directly support authors in defining the set of topics that best describe their paper.
  • 17. Francesco Osborne, Angelo Salatino, Aliaksandr Birukou, Enrico Motta. Osborne, F., Salatino, A., Birukou, A. and Motta, E.: Automatic Classification of Springer Nature Proceedings with Smart Topic Miner. In International Semantic Web Conference (pp. 383-399). Springer International Publishing. (2016) Email: francesco.osborne@open.ac.uk Twitter: FraOsborne Site: people.kmi.open.ac.uk/francesco