SlideShare uma empresa Scribd logo
1 de 27
Structural analysis of documents Functional Extension Parser (FEP) Günter Mühlberger University Innsbruck Library (UIBK)
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT EVA/MINERVA 12 th  Nov. 2008
Features ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object]
[object Object],[object Object]
Benefits (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Benefits (2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object]
Architecture ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The FEP Core ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT EVA/MINERVA 12 th  Nov. 2008
Results ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Results on Evaluation Set Recall Precision F-measure Running text 0,99 0,98 0,98 Footnotes 0,83 0,89 0,86 Page numbers 0,97 1 0,98 Running titles 0,97 1 0,98 Heading 0,85 0,80 0,82 Signature marks 0,68 0,89 0,77
Roadmap ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Business offers ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
IMPACT EVA/MINERVA 12 th  Nov. 2008
Results: TOC ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],IMPACT EVA/MINERVA 12 th  Nov. 2008
Thank you for your attention!

Mais conteúdo relacionado

Semelhante a BL Demo Day - July2011 - (8) IMPACT Functional Extension Parser

Introduction to microsoft office
Introduction to microsoft officeIntroduction to microsoft office
Introduction to microsoft officeTimesRide
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inKumari Naveen
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.pptHaHa501620
 
IMPACT Final Conference - USAL - Text line and word segmentation
IMPACT Final Conference - USAL - Text line and word segmentationIMPACT Final Conference - USAL - Text line and word segmentation
IMPACT Final Conference - USAL - Text line and word segmentationIMPACT Centre of Competence
 
A Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationA Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationDarian Pruitt
 
Introduction to Microsoft Office
Introduction to Microsoft OfficeIntroduction to Microsoft Office
Introduction to Microsoft OfficeTimesRide
 
PhD Presentation
PhD PresentationPhD Presentation
PhD Presentationmskayed
 
What LSPs can do to support post-editors for addressing pain-points in nmt
What LSPs can do to support post-editors for addressing pain-points in nmtWhat LSPs can do to support post-editors for addressing pain-points in nmt
What LSPs can do to support post-editors for addressing pain-points in nmttoru shishido
 
Full text search
Full text searchFull text search
Full text searchdeleteman
 
Excel Beginners.docx
Excel Beginners.docxExcel Beginners.docx
Excel Beginners.docxIrishCaimbon
 
RPE - Template formating, style and stylesheet usage
RPE - Template formating, style and stylesheet usageRPE - Template formating, style and stylesheet usage
RPE - Template formating, style and stylesheet usageGEBS Reporting
 
Standard Grade Administration - Software Applications
Standard Grade Administration - Software ApplicationsStandard Grade Administration - Software Applications
Standard Grade Administration - Software ApplicationsMusselburgh Grammar School
 
Style based templates_demo_diannetheeditor
Style based templates_demo_diannetheeditorStyle based templates_demo_diannetheeditor
Style based templates_demo_diannetheeditorDianne Dickinson
 
wordprocessing-150308082138-conversion-gate01.pdf
wordprocessing-150308082138-conversion-gate01.pdfwordprocessing-150308082138-conversion-gate01.pdf
wordprocessing-150308082138-conversion-gate01.pdfVikasTuwar1
 

Semelhante a BL Demo Day - July2011 - (8) IMPACT Functional Extension Parser (20)

TAUS QE Summit 2017 eBay EN-DE MT Pilot
TAUS QE Summit 2017   eBay EN-DE MT PilotTAUS QE Summit 2017   eBay EN-DE MT Pilot
TAUS QE Summit 2017 eBay EN-DE MT Pilot
 
word.pptx
word.pptxword.pptx
word.pptx
 
Introduction to microsoft office
Introduction to microsoft officeIntroduction to microsoft office
Introduction to microsoft office
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful in
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.ppt
 
IMPACT Final Conference - USAL - Text line and word segmentation
IMPACT Final Conference - USAL - Text line and word segmentationIMPACT Final Conference - USAL - Text line and word segmentation
IMPACT Final Conference - USAL - Text line and word segmentation
 
A Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationA Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference Annotation
 
Microsoft Office 2007
Microsoft Office 2007Microsoft Office 2007
Microsoft Office 2007
 
PPT-1.pptx
PPT-1.pptxPPT-1.pptx
PPT-1.pptx
 
Introduction to Microsoft Office
Introduction to Microsoft OfficeIntroduction to Microsoft Office
Introduction to Microsoft Office
 
PhD Presentation
PhD PresentationPhD Presentation
PhD Presentation
 
What LSPs can do to support post-editors for addressing pain-points in nmt
What LSPs can do to support post-editors for addressing pain-points in nmtWhat LSPs can do to support post-editors for addressing pain-points in nmt
What LSPs can do to support post-editors for addressing pain-points in nmt
 
LLM.pdf
LLM.pdfLLM.pdf
LLM.pdf
 
Full text search
Full text searchFull text search
Full text search
 
Europe PMC Section Tagger
Europe PMC Section TaggerEurope PMC Section Tagger
Europe PMC Section Tagger
 
Excel Beginners.docx
Excel Beginners.docxExcel Beginners.docx
Excel Beginners.docx
 
RPE - Template formating, style and stylesheet usage
RPE - Template formating, style and stylesheet usageRPE - Template formating, style and stylesheet usage
RPE - Template formating, style and stylesheet usage
 
Standard Grade Administration - Software Applications
Standard Grade Administration - Software ApplicationsStandard Grade Administration - Software Applications
Standard Grade Administration - Software Applications
 
Style based templates_demo_diannetheeditor
Style based templates_demo_diannetheeditorStyle based templates_demo_diannetheeditor
Style based templates_demo_diannetheeditor
 
wordprocessing-150308082138-conversion-gate01.pdf
wordprocessing-150308082138-conversion-gate01.pdfwordprocessing-150308082138-conversion-gate01.pdf
wordprocessing-150308082138-conversion-gate01.pdf
 

Mais de IMPACT Centre of Competence

Mais de IMPACT Centre of Competence (20)

Session6 01.helmut schmid
Session6 01.helmut schmidSession6 01.helmut schmid
Session6 01.helmut schmid
 
Session1 03.hsian-an wang
Session1 03.hsian-an wangSession1 03.hsian-an wang
Session1 03.hsian-an wang
 
Session7 03.katrien depuydt
Session7 03.katrien depuydtSession7 03.katrien depuydt
Session7 03.katrien depuydt
 
Session7 02.peter kiraly
Session7 02.peter kiralySession7 02.peter kiraly
Session7 02.peter kiraly
 
Session6 04.giuseppe celano
Session6 04.giuseppe celanoSession6 04.giuseppe celano
Session6 04.giuseppe celano
 
Session6 03.sandra young
Session6 03.sandra youngSession6 03.sandra young
Session6 03.sandra young
 
Session6 02.jeremi ochab
Session6 02.jeremi ochabSession6 02.jeremi ochab
Session6 02.jeremi ochab
 
Session5 04.evangelos varthis
Session5 04.evangelos varthisSession5 04.evangelos varthis
Session5 04.evangelos varthis
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
 
Session5 02.tom derrick
Session5 02.tom derrickSession5 02.tom derrick
Session5 02.tom derrick
 
Session5 01.rutger vankoert
Session5 01.rutger vankoertSession5 01.rutger vankoert
Session5 01.rutger vankoert
 
Session4 04.senka drobac
Session4 04.senka drobacSession4 04.senka drobac
Session4 04.senka drobac
 
Session3 04.arnau baro
Session3 04.arnau baroSession3 04.arnau baro
Session3 04.arnau baro
 
Session3 03.christian clausner
Session3 03.christian clausnerSession3 03.christian clausner
Session3 03.christian clausner
 
Session3 02.kimmo ketunnen
Session3 02.kimmo ketunnenSession3 02.kimmo ketunnen
Session3 02.kimmo ketunnen
 
Session3 01.clemens neudecker
Session3 01.clemens neudeckerSession3 01.clemens neudecker
Session3 01.clemens neudecker
 
Session2 04.ashkan ashkpour
Session2 04.ashkan ashkpourSession2 04.ashkan ashkpour
Session2 04.ashkan ashkpour
 
Session2 03.juri opitz
Session2 03.juri opitzSession2 03.juri opitz
Session2 03.juri opitz
 
Session2 02.christian reul
Session2 02.christian reulSession2 02.christian reul
Session2 02.christian reul
 
Session2 01.emad mohamed
Session2 01.emad mohamedSession2 01.emad mohamed
Session2 01.emad mohamed
 

Último

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Último (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

BL Demo Day - July2011 - (8) IMPACT Functional Extension Parser

  • 1. Structural analysis of documents Functional Extension Parser (FEP) Günter Mühlberger University Innsbruck Library (UIBK)
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. Results on Evaluation Set Recall Precision F-measure Running text 0,99 0,98 0,98 Footnotes 0,83 0,89 0,86 Page numbers 0,97 1 0,98 Running titles 0,97 1 0,98 Heading 0,85 0,80 0,82 Signature marks 0,68 0,89 0,77
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. IMPACT EVA/MINERVA 12 th Nov. 2008
  • 25.
  • 26.
  • 27. Thank you for your attention!

Notas do Editor

  1. Down the Islands. A voyage to the Caribbees ... With illustrations. (1888) BL Demonstrator Set, [prima ids 465024-465278] William Agnew Paton The images TOC1 input.png and TOC2 input.png show the embedded fulltext (OCR output) within the pdf output of ABBYY Finereader. It is interessting to see that in "TOC1 input.png" there are 3 errors from the ocr analysis which have a strong impact on quality of the fep analysis results. a) The link to pagenumbers from the first two TOC entries, Introduction and Chapter I, are not detected by the OCR. b) The third Toc entry (Chapter II) links according to the OCR to the page labelled with the pagenumber 2 (instead of 22) These errors have the following impact on the analysis (which can be seen on Image TOC1 output.png): a) The entry Introduction is missed completely b) The second toc entry ends after the two centered lines and has no link to the book content c) the second part of the second toc entry is grouped together with the third toc entry and has a wrong link to pagenumber 2 instead of 22. d) The fourth toc entry contains no ocr errors and is therefore grouped and also linked correctly. The seccond toc page (TOC2 input.png) does not contain any ocr errors and also the analysis results of the fep are correct. Concerning the TOC reconstruction the fep performs as follows: 25 TOC entries in total: 1 TOC entry was missed, 2 TOC entries are grouped incorrectly 1 TOC entry has no link 1 TOC entry has a wrong link 22 TOC entries are completely correct. The Images Example1.png and Example2.png show the results of the logical structure analysis of the fep. Correct labels are marked with a green, wrong labels with a red border.