The IMPACT Interoperability Framework: 
Workflows for OCR and beyond 
Clemens Neudecker, KB National Library of the Netherlands 
2nd IMPACT Conference, British Library, London 24/25 October 2011
Background 
 > 20 individual software components for specific challenges 
 Prototyping new algorithms, improving commercial solutions 
 Different frameworks (C, C++, Java, etc.), platforms (Win/Linux) 
 Extensible with 3rd party applications 
 IMPACT Interoperability Framework (IIF)
Architecture 
 Java 
 Web Services 
 Apache 
 Taverna 
Open Source available on https://github.com/impactcentre 
Free Hackathon 14/15 November, University of Manchester 
http://impact-mygrid-taverna-hackathon.wikispaces.com/
Integration
 Only requirement: command line executable
 Generic command line wrapper produces web service
 Web service exposed as workflow module with documentation
 Quick & easy integration: developers can focus on their application and have to worry less about integration = higher quality software
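The wrapping idea on this slide can be illustrated with a minimal sketch. It is not the IIF wrapper itself: the function names `wrap_command` and `call`, the `{text}` placeholder convention and the JSON response layout are all assumptions made for illustration. A command line is described once as a template, and each web request fills in the parameters and runs the tool.

```python
import json
import subprocess
import sys
from urllib.parse import parse_qs


def wrap_command(argv_template):
    """Return a WSGI app that runs a command line tool once per request.

    argv_template is a list of arguments; an item such as "{text}" is
    filled in from the query string parameter of the same name.
    (Hypothetical convention, not taken from the IIF.)
    """
    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        params = {key: values[0] for key, values in query.items()}
        try:
            argv = [part.format(**params) for part in argv_template]
        except KeyError as missing:
            body = json.dumps({"error": f"missing parameter {missing}"})
            start_response("400 Bad Request",
                           [("Content-Type", "application/json")])
            return [body.encode("utf-8")]
        # No shell is involved: parameter values are passed as plain arguments.
        result = subprocess.run(argv, capture_output=True, text=True)
        body = json.dumps({"returncode": result.returncode,
                           "stdout": result.stdout,
                           "stderr": result.stderr})
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body.encode("utf-8")]
    return app


def call(app, query_string):
    """Invoke a WSGI app in-process and return (status, decoded JSON body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app({"QUERY_STRING": query_string}, start_response)
    return captured["status"], json.loads(b"".join(chunks))


# Stand-in for a real tool: the Python interpreter upper-casing its argument.
upper = wrap_command([sys.executable, "-c",
                      "import sys; print(sys.argv[1].upper())", "{text}"])
```

The resulting `upper` object is an ordinary WSGI application, so it could be served with `wsgiref.simple_server.make_server` from the standard library, or exercised in-process with `call(upper, "text=impact")`.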
Workflows
 OCR workflow = data pipeline
 Building blocks = processing modules (nodes)
 Integration = interaction between nodes (mashups)
 Collaboration with
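The "workflow = data pipeline" view can be sketched in a few lines. The three nodes below are hypothetical text clean-up steps chosen only so the example runs on its own. In the IIF the nodes are web services wired together in Taverna, not local functions.

```python
from functools import reduce


def pipeline(*nodes):
    """Chain processing nodes so each one receives the previous node's output."""
    def run(data):
        return reduce(lambda value, node: node(value), nodes, data)
    return run


# Hypothetical nodes for illustration only.
def join_hyphenated(text):
    """Rejoin words that were split across a line break with a hyphen."""
    return text.replace("-\n", "")


def normalise_whitespace(text):
    """Collapse runs of spaces and line breaks into single spaces."""
    return " ".join(text.split())


def tokenise(text):
    """Split the cleaned text into words."""
    return text.split(" ")


post_ocr = pipeline(join_hyphenated, normalise_whitespace, tokenise)
words = post_ocr("OCR work-\nflows  are\ndata pipelines")
```

Because every node takes one value and returns one value, a node can be swapped or reordered without touching the others, which is the property that makes mashups of independent modules possible.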
Evaluation features
 Text comparison of result with ground truth, using the Levenshtein distance method
 Word evaluation (with reading order)
 Layout-based comparison of result with ground truth, using the Page Analysis and Ground Truth Elements (PAGE) framework
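For readers unfamiliar with the text comparison mentioned above, here is a minimal sketch of Levenshtein distance between an OCR result and its ground truth. The `error_rate` helper, and in particular its normalisation by ground truth length, is an assumption for illustration. The slides do not say how the IMPACT evaluation tools normalise their scores.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any two sequences: strings give a character-level distance,
    lists of words give a word-level distance.
    """
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def error_rate(result, ground_truth):
    """Edit distance divided by ground truth length (0.0 means identical)."""
    if not ground_truth:
        return 0.0 if not result else 1.0
    return levenshtein(result, ground_truth) / len(ground_truth)


# A typical OCR confusion: "m" recognised as "rn".
character_errors = levenshtein("Irnpact", "Impact")
word_errors = levenshtein("the quick brown fox".split(),
                          "the quick brown fax".split())
```

Passing lists of words instead of strings gives a simple word-level comparison. Note that this sketch compares words in the order given, so it depends on the reading order being correct.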
Community
 Web 2.0 style workflow registry
 Ready-to-use and documented resources
 Community of experts
 Sharing of experiments and know-how
Local client: Taverna Workbench
 Background: BioSciences
 Developed and maintained by myGrid, UK
 Open source
 GUI for design and execution of web services & workflows
Remote client: Portal 
 SOAP/REST API 
 Remote execution of web services & workflows
Results Repository
Custom service for IMPACT:
 Automatic storage of workflow outputs and provenance via WebDAV
 Fully interoperable, since HTTP-based
 Configurable storage of result sets
 Create reports using POI
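Because WebDAV is built on HTTP, storing a workflow output comes down to an HTTP PUT. The sketch below only prepares such a request with the Python standard library. The server URL, the run identifier and the path layout are hypothetical and not taken from the IMPACT repository.

```python
import urllib.request


def build_put_request(base_url, workflow_run, filename, payload,
                      content_type="text/xml"):
    """Prepare (but do not send) an HTTP PUT that stores one workflow output."""
    url = f"{base_url.rstrip('/')}/{workflow_run}/{filename}"
    return urllib.request.Request(url, data=payload, method="PUT",
                                  headers={"Content-Type": content_type})


# Hypothetical server URL and run identifier, for illustration only.
request = build_put_request("http://localhost:8080/webdav/results",
                            "run-0001", "page-1.xml", b"<page/>")

# Sending is one call once a WebDAV server is reachable:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Any HTTP client in any language can do the same, which is the interoperability point made on the slide. On a real WebDAV server the parent collection (here `run-0001`) has to exist before the PUT, typically created with an MKCOL request.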
Scalability
 Central ESB proxy manages multiple service copies
 Process parallelization, load distribution, failover, security
 Served >2M requests
 Throughput improvements of 94% with every additional instance
 Tested on Dutch Supercomputing Cloud (“Enlighten Your Research”)
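Load distribution and failover across service copies can be sketched as a round-robin dispatcher. This is a simplified stand-in, not the ESB used in IMPACT: the class name `RoundRobinProxy` is hypothetical, and service copies are modelled as plain callables that raise `ConnectionError` when unavailable.

```python
from itertools import cycle


class RoundRobinProxy:
    """Spread calls over service copies and skip copies that fail."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._order = cycle(range(len(self._instances)))

    def call(self, request):
        last_error = None
        # Try each copy at most once per request, starting at the next in turn.
        for _ in range(len(self._instances)):
            instance = self._instances[next(self._order)]
            try:
                return instance(request)
            except ConnectionError as error:
                last_error = error
        raise RuntimeError("all service copies failed") from last_error


# Hypothetical service copies: two healthy ones and one that is down.
def healthy(name):
    return lambda request: f"{name}:{request}"


def broken(request):
    raise ConnectionError("instance unavailable")


proxy = RoundRobinProxy([healthy("a"), broken, healthy("b")])
responses = [proxy.call(page) for page in range(4)]
```

Callers see a single entry point, so copies can be added or removed behind the proxy without changing the workflows that use the service.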
Outlook 
 Online service for testing/evaluation 
 Specification & Guidelines 
 Extending the scope: 
Workflows for linguistic analysis: CLARIN 
Workflows for preservation: SCAPE 
 Even better scalability: Map/Reduce 
 Supported by a community of developers & practitioners
xkcd.com/688 
“Anyway, the thing about progress is 
that it always seems greater than it really is.”
Ludwig Wittgenstein, Philosophical Investigations 
(quoting Johann Nestroy)

 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

The IMPACT Interoperability Framework - Workflows for OCR and beyond

  • 1. The IMPACT Interoperability Framework: Workflows for OCR and beyond
      Clemens Neudecker, KB National Library of the Netherlands
      2nd IMPACT Conference, British Library, London, 24/25 October 2011
  • 2. Background
      • > 20 individual software components for specific challenges
      • Prototyping new algorithms, improving commercial solutions
      • Different frameworks (C, C++, Java, etc.), platforms (Win/Linux)
      • Extensible with 3rd party applications
      • IMPACT Interoperability Framework (IIF)
  • 3. Architecture
      • Java
      • Web Services
      • Apache
      • Taverna
      Open Source available on https://github.com/impactcentre
      Free Hackathon 14/15 November, University of Manchester: http://impact-mygrid-taverna-hackathon.wikispaces.com/
  • 4. Integration
      • Only requirement: command line executable
      • Generic command line wrapper produces web service
      • Web service exposed as workflow module with documentation
      • Quick & easy integration: developers can focus on their application and have to worry less about integration = higher quality software
  • 5. Workflows
      • OCR workflow = data pipeline
      • Building blocks = processing modules (nodes)
      • Integration = interaction between nodes (mashups)
      • Collaboration with
  • 6.
  • 7. Evaluation features
      • Text comparison of result with ground truth, using Levenshtein distance method
      • Word evaluation (with reading order)
      • Layout-based comparison of result with ground truth, using the Page Analysis And Ground Truth Elements Framework
  • 8. Community
      • Web 2.0-style workflow registry
      • Ready-to-use and documented resources
      • Community of experts
      • Sharing of experiments and know-how
  • 9. Local client: Taverna Workbench
      • Background: BioSciences
      • Developed and maintained by myGrid, UK
      • Open source
      • GUI for design and execution of web services & workflows
  • 10. Remote client: Portal
      • SOAP/REST API
      • Remote execution of web services & workflows
  • 11. Results Repository
      Custom service for IMPACT:
      • Automatic storage of workflow outputs and provenance via WebDAV
      • Fully interoperable, since HTTP-based
      • Configurable storage of result sets
      • Create reports using POI
  • 12. Scalability
      • Central ESB proxy manages multiple service copies
      • Process parallelization, load distribution, failover, security
      • Served >2M requests
      • Throughput improvements of 94% with every additional instance
      • Tested on Dutch Supercomputing Cloud (“Enlighten Your Research”)
  • 13. Outlook
      • Online service for testing/evaluation
      • Specification & Guidelines
      • Extending the scope:
          • Workflows for linguistic analysis: CLARIN
          • Workflows for preservation: SCAPE
      • Even better scalability: Map/Reduce
      • Supported by a community of developers & practitioners
  • 14.
  • 15. xkcd.com/688
      “Anyway, the thing about progress is that it always seems greater than it really is.”
      Ludwig Wittgenstein, Philosophical Investigations (quoting Johann Nestroy)
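
Slide 4 names a command line executable as the only integration requirement, with a generic wrapper turning it into a web service. The sketch below illustrates only the core of that idea: filling a command-line template with request parameters and capturing the result. It is not the IIF wrapper itself. The names `run_tool`, `ToolResult` and the `{placeholder}` template syntax are assumptions made for this example, and the web service layer is left out.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Outcome of one tool invocation (hypothetical structure)."""
    exit_code: int
    stdout: str
    stderr: str


def run_tool(command_template: str, params: dict, timeout: float = 60.0) -> ToolResult:
    """Fill a command-line template with parameters and run the tool.

    The template is split into arguments before substitution, so a
    parameter value containing spaces stays a single argument.
    """
    argv = [part.format(**params) for part in shlex.split(command_template)]
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return ToolResult(completed.returncode, completed.stdout, completed.stderr)
```

A service front end would map each incoming request to one `run_tool` call. For instance, `run_tool("{python} -c {code}", {"python": sys.executable, "code": "print('done')"})` runs the Python interpreter as a stand-in for an OCR tool.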
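
Slide 7 mentions text comparison of OCR results with ground truth using the Levenshtein distance. A minimal sketch of that distance follows, using the standard dynamic-programming formulation. The `character_error_rate` helper is an assumption added for illustration (normalising by ground-truth length is one common choice). The slides do not say how IMPACT reports the score.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the row
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance divided by ground-truth length (illustrative metric)."""
    if not ground_truth:
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)
```

For example, `levenshtein("kitten", "sitting")` is 3, and an OCR output of `"lihrary"` against the ground truth `"library"` gives an error rate of 1/7.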