IMPACT final event – The Hague – 26 June 2012

Metadata extraction from title pages
Evaluation of the FEP pilot
at the German National Library

Christa Schöning-Walter

Outline

1. Institutional background
2. IMPACT test case
3. Strategic goals
4. Preliminary work
5. Results
6. Perspective

2 | IMPACT event | June 26, 2012 |




The German National Library (DNB) – some facts and figures (I)

− Legal deposit:
  Collecting, cataloguing, archiving and making available to the general public all German and German-language publications, publications about Germany etc. from 1913
− Bibliographic services:
   − National Bibliography
   − Authority files
   − Bibliographic standards
− 2 sites: Leipzig, Frankfurt am Main

The German National Library (DNB) – some facts and figures (II)

− Collection size (January 2012): 27 million media units
− Daily input: 1,500 physical units (each with 2 copies)
− Since 2006:
  Collection mandate includes non-physical media (online publications)
   − DNBG = Law regarding the German National Library
   − PflAV = Legal Deposit Regulation
− Since 2009:
  Considerations on and implementation of automated cataloguing processes




Target of the IMPACT scenario

Opening questions (summer 2011):
− Can metadata extraction from title pages be done successfully by a rule engine for simply structured monographic publications?
− Is this useful for accelerating the cataloguing processes if no machine-readable metadata from other sources is available?

Test case: Theses
− 14,000 print units annually
− simple structure !?

Starting point

Since January 2012:
− Experimental application studies in collaboration with the University of Innsbruck
− Using the rule-based exploitation features of FEP (Functional Extension Parser)

What is FEP?
− Software platform for the purpose of analysing the logical structure of documents
− Developed within IMPACT work package EE4 (Goal: enrichment of OCR output with structure information)




Strategic goals

In particular:
− Making descriptive cataloguing less time-consuming and literature processing of printed media faster by
   − Partial digitisation
   − Automated metadata extraction
   − Result transfer into the bibliographic record
   − Quality check and completion of cataloguing by the staff

Generally:
− Gaining experience in the area of automated metadata extraction / automated cataloguing

Conceptual design of the workflow

Example: http://d-nb.info/1017138931

[Workflow diagram. Elements shown: Accession (printed media units); Bibliographic record; Service partner; Scan service (title page + ToC); Repository; OAI-Harvester; Data Provider; OCR output / Indexing; FEP results; Cataloguing; Quality check; Statistics; Stack]




The idea:
Taking bibliographic data over from metadata mining tools.

The Objective: Automated exploitation of descriptive bibliographic data

− Specification, implementation, evaluation and gradual improvement of
   − Appropriate structure types
   − Dictionaries (controlled vocabulary, indicating keywords, abbreviations etc.)
   − Expert rules
   − Etc.

Illustration: University of Innsbruck




Preliminary work (I)

− Specification of the bibliographic statements to be mined from the title page

     Attribute           Value
     Publication year    2010
     Language code       /1ger
                         /1eng
     Creator             <last name>,<first name>
     Title               <full title>:<additional title information>/<author statement>
     Size                30 cm
                         21 cm
     Theses statement    <city name>, <corporate body name>, <type of publication>,<year of graduation>

Preliminary work (II)

− Going over some hundreds of title pages of theses (scans from 2009-2011 + documents from daily business)
− Exploring typical structural patterns / regularities etc., such as
   − Prefixes
   − Phrases
   − Notation
   − Position

[Callout: Expert rules]

Examples of indicating phrases to find out the creator:
   von
   von <Verfasser> vorgelegte Dissertation
   von Herrn/Frau:
   vorgelegt von(:)
   vorgelegt JJJJ von
   vorgelegt dem Fachbereich ... von
   Name:
   Name des Verfassers:
   Name der Verfasserin:
   verfasst von(:)
   eingereicht von
   ...
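The indicating phrases listed above could be turned into a simple pattern matcher along the following lines. This is only a minimal sketch: the regular expressions, the name pattern and the function name `extract_creator` are illustrative assumptions, not the expert rules actually used in FEP. The bare prefix "von" is left out because it is too ambiguous without positional context.

```python
import re

# Indicating phrases taken from the slide. The regex forms and their order
# are assumptions for illustration, not the rules used in the FEP pilot.
INDICATING_PHRASES = [
    r"vorgelegt\s+dem\s+Fachbereich\s+.+?\s+von",
    r"vorgelegt\s+\d{4}\s+von",
    r"vorgelegt\s+von\s*:?",
    r"verfasst\s+von\s*:?",
    r"eingereicht\s+von",
    r"Name\s+de[rs]\s+Verfasser(?:in|s)\s*:",
    r"von\s+(?:Herrn|Frau)\s*:?",
    r"Name\s*:",
]

# Deliberately simple name pattern (assumption): two or more capitalised
# words on the same line, hyphens allowed.
NAME = r"(?P<name>[A-ZÄÖÜ][\w\-]+(?:[ \t]+[A-ZÄÖÜ][\w\-]+)+)"

CREATOR_RE = re.compile(r"(?:%s)\s*%s" % ("|".join(INDICATING_PHRASES), NAME))


def extract_creator(ocr_text):
    """Return the creator as '<last name>,<first name>', or None if no phrase matches."""
    match = CREATOR_RE.search(ocr_text)
    if match is None:
        return None
    # Treat the last word as the family name, everything before it as given names.
    *first_names, last_name = match.group("name").split()
    return "%s,%s" % (last_name, " ".join(first_names))


# Hypothetical title page text, not taken from the test sample.
title_page = "Zur Theorie der Beispiele\nDissertation\nvorgelegt von Anna Beispiel\nBerlin 2010"
print(extract_creator(title_page))
```

A pattern list like this only covers the "Phrases" regularity. Position and notation on the page would need the layout information that the OCR output provides.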




Preliminary work (III)

Choosing / preparing dictionaries for tagging, matching and mapping purposes:
− List of universities entitled to award degrees (identifying the corporate bodies)
− Name Authority File subset (identifying personal names)
− List of academic degrees

Theses statement items (examples):
   …
   Berlin, ESCP Europe Wirtschaftshochschule
   Berlin, Freie Univ.
   Berlin, Humboldt-Univ.
   Berlin, Steinbeis-Hochsch.
   Berlin, Techn. Univ.
   Berlin, Univ. der Künste
   …

Academic degrees (examples):
   …
   M.A.   Master of Arts / Magister Artium
   M.Sc.  Master of Science
   M.Eng. Master of Engineering
   LL.M.  Master of Laws / Legum Magister
   M.F.A. Master of Fine Arts
   M.Mus. Master of Music
   M.Ed.  Master of Education
   …

Preliminary work (IV)

− Setting up a sample of documents for evaluation purposes:
   − 1,000 theses from several universities
   − Publication year: 2010 – 2011
   − Different dimensions (A- and B-size)
   − Scans: 300 dpi, bitonal
   − Transfer format: PDF (in future: XML files)
− Ground truth determination:
   − Manual region tagging on image files (done in Vietnam with the Aletheia tool)
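The dictionary lookups described under Preliminary work (III) might look roughly like the sketch below. The normalised forms and degree expansions are copied from the slide examples. The full university names used as surface forms, and the function names, are assumptions added for illustration. Real dictionaries would be far larger and would need proper token matching rather than substring tests.

```python
# Surface form on the title page (assumed) -> normalised theses statement item (from the slide).
UNIVERSITY_FORMS = {
    "Freie Universität Berlin": "Berlin, Freie Univ.",
    "Humboldt-Universität zu Berlin": "Berlin, Humboldt-Univ.",
    "Technische Universität Berlin": "Berlin, Techn. Univ.",
    "Universität der Künste Berlin": "Berlin, Univ. der Künste",
}

# Abbreviation -> expansion, as listed on the slide.
ACADEMIC_DEGREES = {
    "M.A.": "Master of Arts / Magister Artium",
    "M.Sc.": "Master of Science",
    "M.Eng.": "Master of Engineering",
    "LL.M.": "Master of Laws / Legum Magister",
    "M.F.A.": "Master of Fine Arts",
    "M.Mus.": "Master of Music",
    "M.Ed.": "Master of Education",
}


def find_corporate_body(text):
    """Return the normalised form of the first listed university found in the text."""
    lowered = text.casefold()
    for surface_form, normalised in UNIVERSITY_FORMS.items():
        if surface_form.casefold() in lowered:
            return normalised
    return None


def find_degrees(text):
    """Return (abbreviation, expansion) pairs for every listed degree that occurs in the text."""
    return [(abbr, name) for abbr, name in ACADEMIC_DEGREES.items() if abbr in text]


# Hypothetical OCR text, not taken from the test sample.
ocr_text = "Masterarbeit zur Erlangung des Grades M.Sc.\nTechnische Universität Berlin"
print(find_corporate_body(ocr_text))
print(find_degrees(ocr_text))
```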




Document processing in brief

− Database: Storage of all available information (OCR output, automatically or manually produced annotations, dictionaries, facts etc.)
− Input of expert rules
− Rule engine: Stepwise proceeding taking intermediary results into account

Illustration: University of Innsbruck

Results

Second test phase with a revised list of universities (June 2012):

[Chart legend: (1) total conformity; (2) complete title + noise (just to be deleted by the staff)]
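The stepwise proceeding of a rule engine that takes intermediary results into account can be pictured as a loop that keeps applying rules until no new facts appear. The sketch below is a generic illustration of that idea, not FEP's actual engine. All rule functions, fact names and the abbreviation "Diss." for the type of publication are assumptions.

```python
import re


def rule_thesis_statement(facts):
    """Depends on intermediary results produced by the other rules."""
    needed = ("city", "corporate_body", "thesis_type", "year")
    if all(key in facts for key in needed):
        return {"thesis_statement": ", ".join(facts[key] for key in needed)}
    return {}


def rule_year(facts):
    match = re.search(r"\b(19|20)\d{2}\b", facts.get("ocr_text", ""))
    return {"year": match.group(0)} if match else {}


def rule_university(facts):
    # Stand-in for a dictionary lookup against the list of universities.
    if "Freie Universität Berlin" in facts.get("ocr_text", ""):
        return {"city": "Berlin", "corporate_body": "Freie Univ."}
    return {}


def rule_thesis_type(facts):
    if "Dissertation" in facts.get("ocr_text", ""):
        return {"thesis_type": "Diss."}
    return {}


def run_rules(facts, rules):
    """Apply all rules repeatedly until a full pass adds no new fact."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for key, value in rule(facts).items():
                if key not in facts:
                    facts[key] = value
                    changed = True
    return facts


# The dependent rule is listed first on purpose: it yields nothing on the
# first pass and fires on the second, once the other facts exist.
rules = [rule_thesis_statement, rule_year, rule_university, rule_thesis_type]
result = run_rules({"ocr_text": "Dissertation\nFreie Universität Berlin\n2010"}, rules)
print(result["thesis_statement"])
```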
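The two result categories named in the legend, (1) total conformity and (2) complete title + noise, suggest a comparison of each extracted title against the ground truth. A minimal sketch of such a comparison follows. The normalisation, the substring test and the fallback category "other" are assumptions, since the slides do not describe how the evaluation was carried out.

```python
def normalise(text):
    """Collapse whitespace and ignore case before comparing."""
    return " ".join(text.split()).casefold()


def classify_title(extracted, ground_truth):
    """Assign an extracted title to one of the result categories."""
    e, g = normalise(extracted), normalise(ground_truth)
    if e == g:
        return "total conformity"
    if g and g in e:
        # The whole title is present, plus extra text the staff would delete.
        return "complete title + noise"
    return "other"  # assumed catch-all, not named on the slide


# Hypothetical titles, not taken from the test sample.
truth = "Zur Theorie der Beispiele"
print(classify_title("Zur Theorie  der Beispiele", truth))
print(classify_title("Dissertation Zur Theorie der Beispiele", truth))
print(classify_title("Zur Theorie", truth))
```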




Forecast: Feasibility study

− Technical and organisational requirements: Operational aspects, technical workflow, interfaces etc.
− Further functional enhancements needed:
   − Dictionary maintenance: Expanding controlled vocabulary, sorting out unsuitable items etc.
   − Taking additional facts into account: Ground truth etc.
   − Additional expert rules (?)
   − Additional functions: Language guesser, document size etc.
   − Customising FEP (?)

New ideas

− Extraction of defined structures from the body of monographic publications, such as table of contents, abstracts, pure text (without any introductory remarks, footers, references etc.)

Target:
− Improvement of the results of current automated subject cataloguing projects, such as
   − Thematic classification by machine learning techniques
   − Obtaining subject headings by text analysis techniques

Reducing the noise via preceding structure analysis processes




       Thank you for your attention.

       Christa Schöning-Walter
       Staff position 'Automated Cataloguing'
       c.schoening@dnb.de

       Sandra Hamm
       Project leader
       s.hamm@dnb.de


       German National Library
       Digital Services
       Frankfurt am Main, Germany



YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 

IMPACT Final Event 26-06-2012 - Automated metadata extraction from title pages. Report on the FEP pilot at the German National Library by Christa Schöning-Walter (DNB)

  • 1. IMPACT final event – The Hague – 26 June 2012
       Metadata extraction from title pages
       Evaluation of the FEP pilot at the German National Library
       Christa Schöning-Walter

       Outline
       1. Institutional background
       2. IMPACT test case
       3. Strategic goals
       4. Preliminary work
       5. Results
       6. Perspective

       IMPACT event | June 26, 2012

       The German National Library (DNB) – some facts and figures (I)
       − Legal deposit: collecting, cataloguing, archiving and making available to the general public all German and German-language publications, publications about Germany etc from 1913
       − Bibliographic services:
          − National Bibliography
          − Authority files
          − Bibliographic standards
       − 2 sites: Leipzig, Frankfurt am Main

       The German National Library (DNB) – some facts and figures (II)
       − Collection size (January 2012): 27 million media units
       − Daily input: 1,500 physical units (each with 2 copies)
       − Since 2006: collection mandate includes non-physical media (online publications)
          − DNBG = Law regarding the German National Library
          − PflAV = Legal Deposit Regulation
       − Since 2009: considerations on and implementation of automated cataloguing processes
  • 2. Target of the IMPACT scenario
       Since January 2012:
       − Experimental application studies in collaboration with the University of Innsbruck
       − Using the rule-based exploitation features of FEP (Functional Extension Parser)
       What is FEP?
       − Software platform for analysing the logical structure of documents
       − Developed within IMPACT work package EE4 (goal: enrichment of OCR output with structure information)

       Starting point
       Opening questions (summer 2011):
       − Can metadata extraction from title pages be done successfully by a rule engine in the case of simply structured monographic publications?
       − Is this useful for accelerating the cataloguing processes if no machine-readable metadata from other sources is available?
       Test case: Theses
       − 14,000 print units annually
       − Simple structure !?

       Strategic goals
       In particular:
       − Making descriptive cataloguing less time-consuming and processing of printed media faster by
          − Partial digitisation
          − Automated metadata extraction
          − Result transfer into the bibliographic record
          − Quality check and completion of cataloguing by the staff
       Generally:
       − Gaining experience in the area of automated metadata extraction / automated cataloguing

       Conceptual design of the workflow
       Example: http://d-nb.info/1017138931
       [Workflow diagram. Labels: Accession (printed media units), Repository, FEP results, literature, OAI-Harvester, Cataloguing, Bibliographic record, Data Provider, Quality check, Statistics, Service partner, Scan service (title page + ToC), OCR output / Indexing, Stack]
  • 3. The Objective: Automated exploitation of descriptive bibliographic data
       − Specification, implementation, evaluation and gradual improvement of
          − Appropriate structure types
          − Dictionaries (controlled vocabulary, indicating keywords, abbreviations etc)
          − Expert rules
          − Etc

       The idea: Taking bibliographic data over from metadata mining tools.
       Illustration: University of Innsbruck

       Preliminary work (I)
       − Specification of the bibliographic statements to be mined from the title page

       Attribute          Value
       Publication year   2010
       Language code      /1ger
                          /1eng
       Creator            <last name>,<first name>
       Title              <full title>:<additional title information>/<author statement>
       Size               30 cm
                          21 cm
       Thesis statement   <city name>, <corporate body name>, <type of publication>,<year of graduation>

       Preliminary work (II)
       − Going over some hundreds of title pages of theses (scans from 2009-2011 + documents from daily business)
       − Exploring typical structural patterns / regularities etc, which feed into the expert rules, such as
          − Prefixes
          − Phrases
          − Notation
          − Position

       Examples of indicating phrases to find out the creator:
          von
          von <Verfasser> vorgelegte Dissertation
          von Herrn/Frau:
          vorgelegt von(:)
          vorgelegt JJJJ von (JJJJ = four-digit year)
          vorgelegt dem Fachbereich ... von
          Name:
          Name des Verfassers:
          Name der Verfasserin:
          verfasst von(:)
          eingereicht von
          ...
  • 4. Preliminary work (III)
       Choosing / preparing dictionaries for tagging, matching and mapping purposes:
       − List of universities which have the right to award degrees (identifying the corporate bodies)
       − Name Authority File subset (identifying personal names)
       − List of academic degrees

       Thesis statement items (examples):
          …
          Berlin, ESCP Europe Wirtschaftshochschule
          Berlin, Freie Univ.
          Berlin, Humboldt-Univ.
          Berlin, Steinbeis-Hochsch.
          Berlin, Techn. Univ.
          Berlin, Univ. der Künste
          …

       Academic degrees (examples):
          …
          M.A.     Master of Arts / Magister Artium
          M.Sc.    Master of Science
          M.Eng.   Master of Engineering
          LL.M.    Master of Laws / Legum Magister
          M.F.A.   Master of Fine Arts
          M.Mus.   Master of Music
          M.Ed.    Master of Education
          …

       Preliminary work (IV)
       − Setting up a sample of documents for evaluation purposes:
          − 1,000 theses from several universities
          − Publication year: 2010 – 2011
          − Different dimensions (A- and B-size)
          − Scans: 300 dpi, bitonal
          − Transfer format: PDF (in future: XML files)
       − Ground truth determination:
          − Manual region tagging on image files (done in Vietnam with the Aletheia tool)

       Document processing in brief
       − Database: storage of all available information (OCR output, automatically or manually produced annotations, dictionaries, facts etc)
       − Input of expert rules
       − Rule engine: stepwise processing that takes intermediate results into account
       Illustration: University of Innsbruck

       Results
       Second test phase with a revised list of universities (June 2012):
       [Results chart. Legend: (1) total conformity; (2) complete title + noise (just to be deleted by the staff)]
  • 5. Forecast: Feasibility study
       − Technical and organisational requirements: operational aspects, technical workflow, interfaces etc
       − Further functional enhancements needed:
          − Dictionary maintenance: expanding controlled vocabulary, sorting out unsuitable items etc
          − Taking additional facts into account: ground truth etc
          − Additional expert rules (?)
          − Additional functions: language guesser, document size etc
          − Customising FEP (?)

       New ideas
       − Extraction of defined structures from the body of monographic publications, such as table of contents, abstracts, pure text (without any introductory remarks, footers, references etc)
       Target:
       − Improvement of the results of current automated subject cataloguing projects, such as
          − Thematic classification by machine learning techniques
          − Obtaining subject headings by text analysis techniques
       Reducing the noise via preceding structure analysis processes

       Thank you for your attention.
       Christa Schöning-Walter, Staff position 'Automated Cataloguing', c.schoening@dnb.de
       Sandra Hamm, Project leader, s.hamm@dnb.de
       German National Library, Digital Services, Frankfurt am Main, Germany
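
The indicating phrases listed under Preliminary work (II) suggest how a rule for finding the creator might look. The following Python sketch is only an illustration under stated assumptions and is not FEP's rule language or the DNB's actual expert rules: the function name `extract_creator`, the regular expressions, and the simple "first names, then last name" pattern are all hypothetical. It searches OCR text for one of the phrases from the slides and returns the name that follows it in the `<last name>,<first name>` notation used for the Creator attribute.

```python
import re
from typing import Optional

# Indicating phrases taken from the slides, written as regular expressions.
# "JJJJ" is read as a four-digit year. More specific phrases come first so
# that they are tried before the bare "von". (Ordering is an assumption.)
INDICATING_PHRASES = [
    r"vorgelegt dem Fachbereich .*? von",
    r"vorgelegt \d{4} von",
    r"vorgelegt von:?",
    r"verfasst von:?",
    r"eingereicht von",
    r"von (?:Herrn|Frau):?",
    r"Name des Verfassers:",
    r"Name der Verfasserin:",
    r"Name:",
    r"von",
]

# Hypothetical name pattern: one or more capitalised first names followed by
# a capitalised last name, all on one line. Real title pages need far more.
NAME = (
    r"(?P<first>[A-ZÄÖÜ][\w\-]+(?: [A-ZÄÖÜ][\w\-]+)*)"
    r" (?P<last>[A-ZÄÖÜ][\w\-]+)"
)


def extract_creator(ocr_text: str) -> Optional[str]:
    """Return '<last name>,<first name>' for the first phrase that matches."""
    for phrase in INDICATING_PHRASES:
        # The phrase is matched case-insensitively; the name is not, because
        # its capital letters are what distinguishes it from ordinary words.
        pattern = re.compile(rf"(?i:\b{phrase})\s+{NAME}")
        match = pattern.search(ocr_text)
        if match:
            return f"{match.group('last')},{match.group('first')}"
    return None
```

Calling `extract_creator` on the OCR text of a title page returns `None` when no phrase is followed by something that looks like a name. A rule engine as described in the slides would additionally check the candidate against dictionaries such as the Name Authority File subset, which this sketch leaves out.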