[poster] Extracting Information From Classics Scholarly Texts

Matteo Romanello

Poster presented at the Research Fair 2009 at King's College (London).

EXTRACTING INFORMATION FROM CLASSICS SCHOLARLY TEXTS
Matteo Romanello, matteo.romanello@kcl.ac.uk

The project at a glance
● PhD research project in Digital Humanities (DH)
● discipline: DH, Classics (Greek and Latin literature)
● topic: extracting structured information from a corpus of unstructured texts

Goal
Devising an automatic system to improve information retrieval over a discipline-specific corpus of unstructured texts.
CORPUS: Open Access collection of Classics journal papers
Why automatic? Because automatic also means scalable when you are dealing with a huge quantity of data.
Information retrieval: the task of retrieving information (most of the time accomplished by using search engines).
Corpus of unstructured texts: a collection of plain texts, without any kind of mark-up (such as XML).
Information can be accessed using multiple access points that are meaningful for scholars in a specific field.

Gone digital. What changed?
We are moving from books to e-books and from journals to e-journals, which we now use almost daily.
Has our way of accessing information actually changed with the use of digital tools?
Did just the format change, or are we provided with innovative ways of accessing information based on digital technologies?

Access points to information in Classics
Print resources
● Table of Contents (TOC)
● Indexes (index of citations, index of Greek words, index of geographic places, index of names, etc.)
Electronic resources
● TOCs
● Access through search engines
● ?
* usually provided only for monographs, because expensive to produce

Method
1. Reuse existing data resources containing structured information (such as gazetteers, authority lists, etc.) stored in different data formats (relational databases, XML files, etc.)
2. Apply Computational Linguistics and Natural Language Processing algorithms for the information extraction
3. Use structured data as training data for the algorithms that “mine” the unstructured text corpus

HIDDEN WORD PUZZLE
To solve the puzzle, find the words in the schema by using a word list as a clue. At the end you'll have added information to the initially chaotic picture.

Steps
1. Building the corpus (OCR, preprocessing)
2. Making the data sources interoperable (when the same entity E appears in DB1 and DB2, the information about E in DB1 has to be added to the information about E in DB2)
3. Finding in the corpus the mentions of REALIA (places, names, passages of works, etc.)
4. Disambiguating the mentions of REALIA
5. Automatic creation of new indices to the texts

Expected results
● Automatically providing multiple meaningful entry points to information
● Enriching the corpus with links to navigate through resources
● Exploiting extracted information to improve user access to the corpus
● Demonstrating the scalability of the approach

Centre of Computing in the Humanities (CCH), King's College London
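
As a rough illustration of steps 2, 3 and 5, the sketch below merges two toy records for the same entity, looks up known name forms in plain text, and builds an entity-to-document index. Everything in it is an assumption made for the example: the DB1 and DB2 contents, the sample sentences, and the function names are hypothetical. Exact string lookup stands in for the trained NLP algorithms the poster proposes, and disambiguation (step 4) is not attempted.

```python
import re
from collections import defaultdict

# Hypothetical records for the same entities held in two data sources.
DB1 = {"athens": {"type": "place", "names": ["Athens"]}}
DB2 = {
    "athens": {"names": ["Athenae"], "region": "Attica"},
    "homer": {"type": "person", "names": ["Homer"]},
}


def merge_sources(db1, db2):
    """Step 2: when an entity appears in both sources, combine what each knows."""
    merged = {}
    for key in sorted(db1.keys() | db2.keys()):
        record = {}
        for source in (db1, db2):
            for field, value in source.get(key, {}).items():
                if isinstance(value, list):
                    # Union of list values, keeping first-seen order.
                    existing = record.setdefault(field, [])
                    for item in value:
                        if item not in existing:
                            existing.append(item)
                else:
                    # For scalar fields the first source seen wins.
                    record.setdefault(field, value)
        merged[key] = record
    return merged


def find_mentions(text, entities):
    """Step 3: locate known name forms in plain text by exact whole-word match."""
    mentions = []
    for key, record in entities.items():
        for name in record.get("names", []):
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                mentions.append((match.start(), key, name))
    return sorted(mentions)


def build_index(documents, entities):
    """Step 5: map each entity to the documents that mention it."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        found = {key for _, key, _ in find_mentions(text, entities)}
        for key in sorted(found):
            index[key].append(doc_id)
    return dict(index)


ENTITIES = merge_sources(DB1, DB2)
DOCUMENTS = {
    "paper-1": "Homer never mentions Athenae in this passage.",
    "paper-2": "The walls of Athens were rebuilt.",
}
INDEX = build_index(DOCUMENTS, ENTITIES)
```

Because the merged record for the hypothetical "athens" entry carries both name forms, a paper that writes "Athenae" and one that writes "Athens" end up under the same index entry, which is the kind of additional access point the poster aims for.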

