SlideShare uma empresa Scribd logo
1 de 1
Baixar para ler offline
EXTRACTING INFORMATION FROM CLASSICS SCHOLARLY TEXTS
                                                                    Matteo Romanello, matteo.romanello@kcl.ac.uk


                                                           Goal                                                                   HIDDEN WORD PUZZLE
    The project at a glance
    ●  PhD research project in Digital Humanities
                                                           Devising an automatic system to improve                                To solve the puzzle find the
    (DH)                                                   information retrieval over a discipline-specif c
                                                                                                        i                         words in the schema by
    ● discipline: DH, Classics (Greek and Latin
                                                           corpus of unstructured texts.                                          using a word list as clue.
    literature)
                                                                        CORPUS: Open Access                                       At the end you'll have added
    ● topic: extracting structured information from a                                                                             information to the initially
                                                                  collection of Classics journal papers
    corpus of unstructured texts                                                                                                  chaotic picture.
                                                           Why automatic? Because automatic means also          Steps
Gone digital. What changed?                                scalable when you are dealing with a huge quantity
                                                           of data.                                             1. Building the corpus (OCR, preprocessing)
                  We are moving from books to e-           Information retrieval: the task of retrieving
                  books, and from journals to e-           information (most of the times accomplished by       2. Making the data sources interoperable
                  journals as we are using them            using search engines)                                (when the same entity E appears in DB1 and DB2,
                  almost daily.                                                                                 the information about E in DB1 have to be added
                                                           Corpus of unstructured texts: collection of plain
                  Is our way of accessing                  texts, without any kind of mark-up (such as XML).    to information about E in DB2)
                  information actually changed
                  with the use of digital tools?                                                                3. Finding in the corpus the mentions of
                                                                                Information can be              REALIA (place, names, work passages, etc.)
                  Did just the format change or                                 accessed using multiple
                  are we provided with innovative                               access points that are          4. Disambiguating the mentions of REALIA
                  ways of accessing information                                 meaningful for scholars
                  based on digital technologies?                                in a specif c f eld.
                                                                                          i i                   5. Automatic creation of new indices to the
                                                                                                                texts

Access points to information in Classics                   Method
                                                                                                                Expected results
Print resources                                            1. Reuse existing data resources containing
                                                           structured information (such as gazetteers,          ●Providing automatically multiple
● Table of Content (TOC)
                                                           authority lists, etc.) stored using different data   meaningful entry points to information
● Indexes (index of citations, index of greek word,

index of geographic place, index of names, etc.)           formats (Relational DataBases, XML f les, i          ● Enrich the corpus with links to navigate
                                                           etc.)
                    Electronic resources                                                                        through resources
                    ● TOCs                                 2. Apply Computational Linguistic and
                    ● Access through search engines        Natural Language Processing algorithms               ● Exploiting extracted information to
                    ● ?                                    for the information extraction                       improve user access to the corpus

* usually provided just for monographs because expensive   3. Use structured data as training data for          ●Demonstrate the scalability of the
to be produced                                             the algorithms which “mines” the unstructured        approach
                                                           text corpus


                                           Centre of Computing in the Humanities (CCH), King's College London

Mais conteúdo relacionado

Semelhante a [poster] Extracting Information From Classics Scholarly Texts

IASSIT Kansa Presentation
IASSIT Kansa PresentationIASSIT Kansa Presentation
IASSIT Kansa Presentationekansa
 
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly TextsStuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly TextsMatteo Romanello
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep LearningAndre Freitas
 
Archiving and managing a million or more data files on BiG Grid
Archiving and managing a million or more data files on BiG GridArchiving and managing a million or more data files on BiG Grid
Archiving and managing a million or more data files on BiG Gridpkdoorn
 
Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Richard Zijdeman
 
Post 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxPost 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxstilliegeorgiana
 
Post 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text miniPost 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text minianhcrowley
 
Linked Open data: CNR
Linked Open data: CNRLinked Open data: CNR
Linked Open data: CNRDatiGovIT
 
OAI7 Research Objects
OAI7 Research ObjectsOAI7 Research Objects
OAI7 Research Objectsseanb
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYcseij
 
Text mining introduction-1
Text mining   introduction-1Text mining   introduction-1
Text mining introduction-1Sumit Sony
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain OntologyKeerti Bhogaraju
 
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI PresentationOpen Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentationekansa
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic WebNuxeo
 
Invited talk @ DCC09 workshop
Invited talk @ DCC09 workshopInvited talk @ DCC09 workshop
Invited talk @ DCC09 workshopPaolo Missier
 
Inquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map DatabaseInquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map Databasetmra
 
Adding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylenAdding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylenDynamic People B.V.
 

Semelhante a [poster] Extracting Information From Classics Scholarly Texts (20)

IASSIT Kansa Presentation
IASSIT Kansa PresentationIASSIT Kansa Presentation
IASSIT Kansa Presentation
 
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly TextsStuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
 
Archiving and managing a million or more data files on BiG Grid
Archiving and managing a million or more data files on BiG GridArchiving and managing a million or more data files on BiG Grid
Archiving and managing a million or more data files on BiG Grid
 
Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven
 
Post 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxPost 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docx
 
Post 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text miniPost 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text mini
 
Linked Open data: CNR
Linked Open data: CNRLinked Open data: CNR
Linked Open data: CNR
 
OAI7 Research Objects
OAI7 Research ObjectsOAI7 Research Objects
OAI7 Research Objects
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
 
Text mining introduction-1
Text mining   introduction-1Text mining   introduction-1
Text mining introduction-1
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
 
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI PresentationOpen Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
Information Quality in the Web Era
Information Quality in the Web EraInformation Quality in the Web Era
Information Quality in the Web Era
 
Invited talk @ DCC09 workshop
Invited talk @ DCC09 workshopInvited talk @ DCC09 workshop
Invited talk @ DCC09 workshop
 
Inquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map DatabaseInquiry Optimization Technique for a Topic Map Database
Inquiry Optimization Technique for a Topic Map Database
 
03 Object Dbms Technology
03 Object Dbms Technology03 Object Dbms Technology
03 Object Dbms Technology
 
A spatio-temporal visual analysis tool for historical dictionaries.
A spatio-temporal visual analysis tool for historical dictionaries. A spatio-temporal visual analysis tool for historical dictionaries.
A spatio-temporal visual analysis tool for historical dictionaries.
 
Adding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylenAdding structure to unstructured content for enhanced findability hakan tylen
Adding structure to unstructured content for enhanced findability hakan tylen
 

Mais de Matteo Romanello

Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...Matteo Romanello
 
Scaling up the Extraction of Canonical Citations in Classics
Scaling up the Extraction of Canonical Citations in ClassicsScaling up the Extraction of Canonical Citations in Classics
Scaling up the Extraction of Canonical Citations in ClassicsMatteo Romanello
 
Transforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation NetworksTransforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation NetworksMatteo Romanello
 
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...Matteo Romanello
 
Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014Matteo Romanello
 
Exploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in ClassicsExploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in ClassicsMatteo Romanello
 
DARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceDARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceMatteo Romanello
 
Greedy Enough for the Grid?
Greedy Enough for the Grid?Greedy Enough for the Grid?
Greedy Enough for the Grid?Matteo Romanello
 
DIGITAL HUMANITIES E FILOLOGIA Un'introduzione
DIGITAL HUMANITIES   E FILOLOGIA   Un'introduzioneDIGITAL HUMANITIES   E FILOLOGIA   Un'introduzione
DIGITAL HUMANITIES E FILOLOGIA Un'introduzioneMatteo Romanello
 
Rethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by OntologiesRethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by OntologiesMatteo Romanello
 
Presentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, TorontoPresentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, TorontoMatteo Romanello
 
Linking Primary and Secondary by Microformats
Linking Primary and Secondary by MicroformatsLinking Primary and Secondary by Microformats
Linking Primary and Secondary by MicroformatsMatteo Romanello
 
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...Matteo Romanello
 
M.Romanello Ecal Presentation
M.Romanello Ecal PresentationM.Romanello Ecal Presentation
M.Romanello Ecal PresentationMatteo Romanello
 

Mais de Matteo Romanello (16)

Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
 
Scaling up the Extraction of Canonical Citations in Classics
Scaling up the Extraction of Canonical Citations in ClassicsScaling up the Extraction of Canonical Citations in Classics
Scaling up the Extraction of Canonical Citations in Classics
 
Transforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation NetworksTransforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation Networks
 
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
 
Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014
 
Exploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in ClassicsExploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in Classics
 
DARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceDARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and Space
 
Greedy Enough for the Grid?
Greedy Enough for the Grid?Greedy Enough for the Grid?
Greedy Enough for the Grid?
 
Romanello tokyo
Romanello tokyoRomanello tokyo
Romanello tokyo
 
DIGITAL HUMANITIES E FILOLOGIA Un'introduzione
DIGITAL HUMANITIES   E FILOLOGIA   Un'introduzioneDIGITAL HUMANITIES   E FILOLOGIA   Un'introduzione
DIGITAL HUMANITIES E FILOLOGIA Un'introduzione
 
Ht159 Poster
Ht159 PosterHt159 Poster
Ht159 Poster
 
Rethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by OntologiesRethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by Ontologies
 
Presentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, TorontoPresentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, Toronto
 
Linking Primary and Secondary by Microformats
Linking Primary and Secondary by MicroformatsLinking Primary and Secondary by Microformats
Linking Primary and Secondary by Microformats
 
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
 
M.Romanello Ecal Presentation
M.Romanello Ecal PresentationM.Romanello Ecal Presentation
M.Romanello Ecal Presentation
 

Último

How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 

Último (20)

How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 

[poster] Extracting Information From Classics Scholarly Texts

  • 1. EXTRACTING INFORMATION FROM CLASSICS SCHOLARLY TEXTS Matteo Romanello, matteo.romanello@kcl.ac.uk Goal HIDDEN WORD PUZZLE The project at a glance ● PhD research project in Digital Humanities Devising an automatic system to improve To solve the puzzle find the (DH) information retrieval over a discipline-specif c i words in the schema by ● discipline: DH, Classics (Greek and Latin corpus of unstructured texts. using a word list as clue. literature) CORPUS: Open Access At the end you'll have added ● topic: extracting structured information from a information to the initially collection of Classics journal papers corpus of unstructured texts chaotic picture. Why automatic? Because automatic means also Steps Gone digital. What changed? scalable when you are dealing with a huge quantity of data. 1. Building the corpus (OCR, preprocessing) We are moving from books to e- Information retrieval: the task of retrieving books, and from journals to e- information (most of the times accomplished by 2. Making the data sources interoperable journals as we are using them using search engines) (when the same entity E appears in DB1 and DB2, almost daily. the information about E in DB1 have to be added Corpus of unstructured texts: collection of plain Is our way of accessing texts, without any kind of mark-up (such as XML). to information about E in DB2) information actually changed with the use of digital tools? 3. Finding in the corpus the mentions of Information can be REALIA (place, names, work passages, etc.) Did just the format change or accessed using multiple are we provided with innovative access points that are 4. Disambiguating the mentions of REALIA ways of accessing information meaningful for scholars based on digital technologies? in a specif c f eld. i i 5. Automatic creation of new indices to the texts Access points to information in Classics Method Expected results Print resources 1. Reuse existing data resources containing structured information (such as gazetteers, ●Providing automatically multiple ● Table of Content (TOC) authority lists, etc.) stored using different data meaningful entry points to information ● Indexes (index of citations, index of greek word, index of geographic place, index of names, etc.) formats (Relational DataBases, XML f les, i ● Enrich the corpus with links to navigate etc.) Electronic resources through resources ● TOCs 2. Apply Computational Linguistic and ● Access through search engines Natural Language Processing algorithms ● Exploiting extracted information to ● ? for the information extraction improve user access to the corpus * usually provided just for monographs because expensive 3. Use structured data as training data for ●Demonstrate the scalability of the to be produced the algorithms which “mines” the unstructured approach text corpus Centre of Computing in the Humanities (CCH), King's College London