SlideShare uma empresa Scribd logo
1 de 25
Scientific Table Type
Classification in Digital Library
Seongchan Kim, Keejun Han, Ying Liu D
ept. of Knowledge Service Engineering K
AIST, Korea
Soon Young Kim
Dept. of Overseas Information
KISTI, Korea
Sept. 6, 2012 (3:35-3:55)
Outline
Introduction
Table Type Taxonomy
IMRAD-Base
d Fine-Graine
d
Table Type Distribution
Classification
Conclusion
Outline
Introduction
Table Type Taxonomy
IMRAD-Base
d Fine-Graine
d
Classification
Conclusion
4 / 20
Introduction
• Are there any special types of tables in papers in scientific papers?
If yes? What are they?
5 / 20
Introduction
• Are there any special types of tables in papers in scientific papers?
If yes? What are they?
Outline
Introduction
Table Type Taxonomy
IMRAD-Base
d Fine-Graine
d
Classification
Conclusion
7 / 20
Table Type Taxonomy
2,500 tables randomly from 25 randomly selected scie
ntific journals published by Springer from 2006 to 2010
Biomedical and Life Science, Chemistry and Materials S
cience, Computer Science, Electrical Engineering, and
Medicine
TableSeer
We found
 IMRAD-based table taxonomy
 Fine-grained table taxonomy
IMRAD-Based Table Taxonomy
Considering the structural position of tables within a
document
Table type is simply decided by the location of the
table
 E.g) if a table is in the introduction part of the paper
 Introduction table Other functions
Scientific
Table
Introduction
Table
Method
Table
Resul
t Tabl
e
Discussio
n Table
8 / 20
Fine-Grained Table Taxonomy
Table type is decided by table contents and purposes
Scientific
Table
Definition
Table
Statistic
Table
Survey
Table
Example
Table
Procedure
Table
Experiment
Setting T
able
Experiment
Result T
able
9 / 20
Fine-Grained Table Taxonomy
Definition Tables
 Consist of defining terms
and their explanations
 Usually appears before
the experiment
10 / 20
Fine-Grained Table Taxonomy
Statistics/Distribution
Tables
 Common statistical or
distribution data
 Not related with the c
urrent experiment being c
arried out in the paper
11 / 20
Fine-Grained Table Taxonomy
Survey Question/Result Table
 Contain questionnaires of the
those questionnaires
survey and the results of
12 / 20
Fine-Grained Table Taxonomy
Example Table
 Show instances that introduce and emphasize something
that needs to be explained clearly
13 / 20
Fine-Grained Table Taxonomy
Procedure Tables
 Describe the sequence
,
methods
step, flow, or schedule of the
14 / 20
Fine-Grained Table Taxonomy
Experiment Setting Tables
 Describe items required for the experiment
 configurations, parameters, data, apparatus, etc.
15 / 20
Fine-Grained Table Taxonomy
Experiment Setting Tables
 accompanied with a summary describing the output of the
experiment
 Some are shown comparing the other results
16 / 20
Table Type Distribution
By IMRAD Taxonomy
3.7
%
20.2
%
74.6
%
1.5
%
Introduction
Methods
Result
s
Discussion
2.9% 5.0%
0.3%
3.3%
1.6%
14.9%
72.0%
Definition St
atistics Surv
ey Example
Procedure
Exp. Setting
Exp. Result
17 / 20
By Fine-Grained Taxonomy
Annotation
 2,380 and 2,324 tables that had agreed on labeling from
more that two annotators out of 2,500 tables
 Inter-Annotator Agreement
• = 0.64 for IMRAD annotation
• = 0.53 for fine-grained annotation
Outline
Introduction
Table Type Taxonomy
IMRAD-Base
d Fine-Graine
d
Classification
Conclusion
19 / 20
Experiment
A preliminary classification
 Only textual feature from Table
• Table Caption
• Table Reference Text
 Textual Information obtained from metadata of TableSeer [ ]
Settings
 DataSet
• 2,380 tables for IMRAD classification
• 2,324 tables for fine-grained classification
 10-fold Cross validation
 SVM and Decision Tree in Weka toolkit (default setting)
Experiment
20 / 20
C1: T1, T
2
C2: T3, T
4
W1 : appears T1 and T2
W2 : appears T1 and T3
21 / 20
Experiment
Result
Performance of IMRAD Classification by Features
Performance of Fine-grained Classification by Features
Features SVM Decision Tree
P R F P R F
Cap.(Baseline) 0.836 0.506 0.543 0.947 0.550 0.792
Ref. 0.875 0.705 0.761 0.930 0.639 0.73
Cap.+Ref. 0.967 0.784 0.866 0.938 0.746 0.831
Features SVM Decision Tree
P R F P R F
Cap.(Baseline) 0.627 0.333 0.397 0.522 0.271 0.302
Ref. 0.707 0.615 0.649 0.790 0.673 0.716
Cap.+Ref. 0.701 0.657 0.668 0.764 0.62 0.671
22 / 20
Experiment
Result
Performance of IMRAD Classification by Types
Type SVM Decision Tree
P R F P R F
Introduction 0.968 0.6 0.741 0.907 0.68 0.777
Methods 0.901 0.996 0.943 0.912 0.992 0.95
Results 1 1 1 1 1 1
Discussion 1 0.543 0.704 0.933 0.314 0.44
Macro Avg. 0.967 0.784 0.866 0.938 0.746 0.831
Micro Avg. 0.977 0.976 0.973 0.973 0.975 0.972
23 / 20
Experiment
Result
Performance of Fine-grained Classification by Types
Type SVM Decision Tree
P R F P R F
Definition 0.689 0.609 0.646 0.837 0.522 0.643
Statistics 0.699 0.879 0.779 0.898 0.681 0.775
Survey 0 0 0 0 0 0
Example 0.716 0.725 0.72 0.875 0.525 0.656
Procedure 0.9 0.486 0.632 1 0.649 0.787
Exp. Setting 0.905 0.9 0.902 0.74 0.963 0.837
Exp. Result 1 1 1 1 1 1
Macro Avg. 0.701 0.657 0.668 0.764 0.62 0.671
Micro Avg. 0.947 0.947 0.946 0.94 0.936 0.987
Outline
Introduction
Table Type Taxonomy
IMRAD-Base
d Fine-Graine
d
Classification
Conclusion
25 / 20
Conclusion
Introduced our study of table types and classifications
in scientific papers
 IMRAD-based Taxonomy
 Fine-Grained Taxonomy
Future Work
 Developing various features from table layout and contents

Mais conteúdo relacionado

Semelhante a Scientific table type classification in digital library (DocEng 2012)

Pasolli-Improving_active_learning_methods_using_spatial_information.ppt
Pasolli-Improving_active_learning_methods_using_spatial_information.pptPasolli-Improving_active_learning_methods_using_spatial_information.ppt
Pasolli-Improving_active_learning_methods_using_spatial_information.ppt
grssieee
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Maxim Kazantsev
 
HCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.ppt
HCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.pptHCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.ppt
HCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.ppt
MadeeshShaik
 
AS/A2 AQA Biology and Human Biology
AS/A2 AQA Biology and Human BiologyAS/A2 AQA Biology and Human Biology
AS/A2 AQA Biology and Human Biology
TeachersFirst
 

Semelhante a Scientific table type classification in digital library (DocEng 2012) (20)

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISFUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
 
Extending BM25 with multiple query operators
Extending BM25 with multiple query operatorsExtending BM25 with multiple query operators
Extending BM25 with multiple query operators
 
Efficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matchingEfficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matching
 
Lesson03
Lesson03Lesson03
Lesson03
 
Data collection,tabulation,processing and analysis
Data collection,tabulation,processing and analysisData collection,tabulation,processing and analysis
Data collection,tabulation,processing and analysis
 
ADaM - Where Do I Start?
ADaM - Where Do I Start?ADaM - Where Do I Start?
ADaM - Where Do I Start?
 
ADaM
ADaMADaM
ADaM
 
ictir2016
ictir2016ictir2016
ictir2016
 
Error Propagation in Forest Planning Models
Error Propagation in Forest Planning ModelsError Propagation in Forest Planning Models
Error Propagation in Forest Planning Models
 
Lecture 1 - Overview.pptx
Lecture 1 - Overview.pptxLecture 1 - Overview.pptx
Lecture 1 - Overview.pptx
 
Pasolli-Improving_active_learning_methods_using_spatial_information.ppt
Pasolli-Improving_active_learning_methods_using_spatial_information.pptPasolli-Improving_active_learning_methods_using_spatial_information.ppt
Pasolli-Improving_active_learning_methods_using_spatial_information.ppt
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptx
 
HCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.ppt
HCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.pptHCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.ppt
HCLSIG$$Drug_Safety_and_Efficacy$CDISCs_SDTM_basics.ppt
 
CDISCs_SDTM_basics.ppt
CDISCs_SDTM_basics.pptCDISCs_SDTM_basics.ppt
CDISCs_SDTM_basics.ppt
 
Sophia He
Sophia HeSophia He
Sophia He
 
AS/A2 AQA Biology and Human Biology
AS/A2 AQA Biology and Human BiologyAS/A2 AQA Biology and Human Biology
AS/A2 AQA Biology and Human Biology
 
Intro to SPSS.ppt
Intro to SPSS.pptIntro to SPSS.ppt
Intro to SPSS.ppt
 
BRM Unit 3 Data Analysis-1.pptx
BRM Unit 3 Data Analysis-1.pptxBRM Unit 3 Data Analysis-1.pptx
BRM Unit 3 Data Analysis-1.pptx
 
Statistic report
Statistic reportStatistic report
Statistic report
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Scientific table type classification in digital library (DocEng 2012)

  • 1. Scientific Table Type Classification in Digital Library Seongchan Kim, Keejun Han, Ying Liu D ept. of Knowledge Service Engineering K AIST, Korea Soon Young Kim Dept. of Overseas Information KISTI, Korea Sept. 6, 2012 (3:35-3:55)
  • 2. Outline Introduction Table Type Taxonomy IMRAD-Base d Fine-Graine d Table Type Distribution Classification Conclusion
  • 3. Outline Introduction Table Type Taxonomy IMRAD-Base d Fine-Graine d Classification Conclusion
  • 4. 4 / 20 Introduction • Are there any special types of tables in papers in scientific papers? If yes? What are they?
  • 5. 5 / 20 Introduction • Are there any special types of tables in papers in scientific papers? If yes? What are they?
  • 6. Outline Introduction Table Type Taxonomy IMRAD-Base d Fine-Graine d Classification Conclusion
  • 7. 7 / 20 Table Type Taxonomy 2,500 tables randomly from 25 randomly selected scie ntific journals published by Springer from 2006 to 2010 Biomedical and Life Science, Chemistry and Materials S cience, Computer Science, Electrical Engineering, and Medicine TableSeer We found  IMRAD-based table taxonomy  Fine-grained table taxonomy
  • 8. IMRAD-Based Table Taxonomy Considering the structural position of tables within a document Table type is simply decided by the location of the table  E.g) if a table is in the introduction part of the paper  Introduction table Other functions Scientific Table Introduction Table Method Table Resul t Tabl e Discussio n Table 8 / 20
  • 9. Fine-Grained Table Taxonomy Table type is decided by table contents and purposes Scientific Table Definition Table Statistic Table Survey Table Example Table Procedure Table Experiment Setting T able Experiment Result T able 9 / 20
  • 10. Fine-Grained Table Taxonomy Definition Tables  Consist of defining terms and their explanations  Usually appears before the experiment 10 / 20
  • 11. Fine-Grained Table Taxonomy Statistics/Distribution Tables  Common statistical or distribution data  Not related with the c urrent experiment being c arried out in the paper 11 / 20
  • 12. Fine-Grained Table Taxonomy Survey Question/Result Table  Contain questionnaires of the those questionnaires survey and the results of 12 / 20
  • 13. Fine-Grained Table Taxonomy Example Table  Show instances that introduce and emphasize something that needs to be explained clearly 13 / 20
  • 14. Fine-Grained Table Taxonomy Procedure Tables  Describe the sequence , methods step, flow, or schedule of the 14 / 20
  • 15. Fine-Grained Table Taxonomy Experiment Setting Tables  Describe items required for the experiment  configurations, parameters, data, apparatus, etc. 15 / 20
  • 16. Fine-Grained Table Taxonomy Experiment Setting Tables  accompanied with a summary describing the output of the experiment  Some are shown comparing the other results 16 / 20
  • 17. Table Type Distribution By IMRAD Taxonomy 3.7 % 20.2 % 74.6 % 1.5 % Introduction Methods Result s Discussion 2.9% 5.0% 0.3% 3.3% 1.6% 14.9% 72.0% Definition St atistics Surv ey Example Procedure Exp. Setting Exp. Result 17 / 20 By Fine-Grained Taxonomy Annotation  2,380 and 2,324 tables that had agreed on labeling from more that two annotators out of 2,500 tables  Inter-Annotator Agreement • = 0.64 for IMRAD annotation • = 0.53 for fine-grained annotation
  • 18. Outline Introduction Table Type Taxonomy IMRAD-Base d Fine-Graine d Classification Conclusion
  • 19. 19 / 20 Experiment A preliminary classification  Only textual feature from Table • Table Caption • Table Reference Text  Textual Information obtained from metadata of TableSeer [ ] Settings  DataSet • 2,380 tables for IMRAD classification • 2,324 tables for fine-grained classification  10-fold Cross validation  SVM and Decision Tree in Weka toolkit (default setting)
  • 20. Experiment 20 / 20 C1: T1, T 2 C2: T3, T 4 W1 : appears T1 and T2 W2 : appears T1 and T3
  • 21. 21 / 20 Experiment Result Performance of IMRAD Classification by Features Performance of Fine-grained Classification by Features Features SVM Decision Tree P R F P R F Cap.(Baseline) 0.836 0.506 0.543 0.947 0.550 0.792 Ref. 0.875 0.705 0.761 0.930 0.639 0.73 Cap.+Ref. 0.967 0.784 0.866 0.938 0.746 0.831 Features SVM Decision Tree P R F P R F Cap.(Baseline) 0.627 0.333 0.397 0.522 0.271 0.302 Ref. 0.707 0.615 0.649 0.790 0.673 0.716 Cap.+Ref. 0.701 0.657 0.668 0.764 0.62 0.671
  • 22. 22 / 20 Experiment Result Performance of IMRAD Classification by Types Type SVM Decision Tree P R F P R F Introduction 0.968 0.6 0.741 0.907 0.68 0.777 Methods 0.901 0.996 0.943 0.912 0.992 0.95 Results 1 1 1 1 1 1 Discussion 1 0.543 0.704 0.933 0.314 0.44 Macro Avg. 0.967 0.784 0.866 0.938 0.746 0.831 Micro Avg. 0.977 0.976 0.973 0.973 0.975 0.972
  • 23. 23 / 20 Experiment Result Performance of Fine-grained Classification by Types Type SVM Decision Tree P R F P R F Definition 0.689 0.609 0.646 0.837 0.522 0.643 Statistics 0.699 0.879 0.779 0.898 0.681 0.775 Survey 0 0 0 0 0 0 Example 0.716 0.725 0.72 0.875 0.525 0.656 Procedure 0.9 0.486 0.632 1 0.649 0.787 Exp. Setting 0.905 0.9 0.902 0.74 0.963 0.837 Exp. Result 1 1 1 1 1 1 Macro Avg. 0.701 0.657 0.668 0.764 0.62 0.671 Micro Avg. 0.947 0.947 0.946 0.94 0.936 0.987
  • 24. Outline Introduction Table Type Taxonomy IMRAD-Base d Fine-Graine d Classification Conclusion
  • 25. 25 / 20 Conclusion Introduced our study of table types and classifications in scientific papers  IMRAD-based Taxonomy  Fine-Grained Taxonomy Future Work  Developing various features from table layout and contents