Information Extraction
Joan Hurtado
Ignacio Delgado
Contents
• INTRODUCTION
• IE TASKS
• IE WITH CASCADED FINITE-STATE TRANSDUCERS
• LEARNING-BASED APPROACHES TO IE
• HOW GOOD IS IE
INTRODUCTION
What is IE?
• Information Extraction is the process of scanning text for information relevant to some interest
• Extract:
  • Entities
  • Relations
  • Events
  • Who did what to whom, when and where
Why IE?
• Need for efficient processing of texts in specialized domains
• Focus on the relevant parts, ignore the rest
• Typical applications:
  • Gleaning business, government or military intelligence
  • Web searches (more specific than keywords)
  • Scientific literature searches
  • …
Most common uses
• Named Entity Recognition (NER)
  • Identifies names and special entities (dates, times)
  • Uses textual patterns
  • Important in biomedical applications
• IE is more than NER
  • Recognition of events and their participants
How to measure performance
• Recall
  • What percentage of the correct answers did the system get?
• Precision
  • What percentage of the system's answers were correct?
• F-score
  • Weighted harmonic mean of recall and precision
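The three measures can be computed directly from a set of gold-standard answers and a set of system answers. The following is a minimal sketch, assuming extractions are compared by exact match; the function name, the `beta` weighting parameter and the sample data are illustrative and do not come from the slides.

```python
def precision_recall_f(gold, predicted, beta=1.0):
    """Score predicted extractions against a gold-standard set (exact match)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    # Weighted harmonic mean: beta > 1 favours recall, beta < 1 favours precision.
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Made-up example: 4 correct answers exist, the system returns 3, 2 of them right.
gold = {("ORG", "Acme Corp"), ("PER", "Ann Lee"), ("LOC", "Lisbon"), ("DATE", "2 May")}
predicted = {("ORG", "Acme Corp"), ("PER", "Ann Lee"), ("LOC", "Madrid")}

p, r, f1 = precision_recall_f(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# precision=0.67 recall=0.50 F1=0.57
```

With `beta=1.0` the F-score weights recall and precision equally, which is the common F1 setting.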
IE TASKS
Unstructured vs. Semi-structured text
Unstructured
• Natural language sentences
• Semantics depends on linguistic analysis
• Examples:
  • News stories
  • Magazine articles
  • Books
  • …
Semi-structured
• Structured data
• Semantics defined by its organization
• Physical layout plays a role in interpretation
• Examples:
  • Job postings
  • Rental ads
  • …
Single-document vs. Multi-document
• IE systems were originally designed for individual documents
• Newer systems extract facts from the Web
• Both use similar techniques
• Distinguishing issue: redundancy
  • Multi-document systems can exploit redundancy
  • However, they need to address cross-document coreference resolution
• Multi-document IE systems are also referred to as open-domain
Assumptions about Incoming Documents
• Relevant-only: all incoming documents are assumed to be relevant to the domain
• Single-event: each document is assumed to describe a single event
IE WITH CASCADED FINITE-STATE TRANSDUCERS
Complex Words
Basic Phrases
Complex Phrases
Domain Events
Template Generation
Complex Words
• Identify multiwords, company names, people's names, locations, dates, times and other basic entities
• Recognition strategies:
  • Patterns
  • Dictionaries
  • Context
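Two of these strategies, patterns and dictionaries, can be sketched in a few lines. This is only an illustration under stated assumptions: the regular expressions, the tiny location dictionary and the sample sentence are made up for the example, and a real system would use far larger rule sets and also exploit context.

```python
import re

# Hypothetical dictionary (gazetteer) of known multiword locations.
LOCATIONS = {"New York", "San Francisco", "Barcelona"}

# Hypothetical surface patterns for dates and company names.
DATE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b"
)
COMPANY = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)\b\.?")


def find_complex_words(text):
    """Return (label, matched text) pairs from pattern matching and dictionary lookup."""
    found = [("DATE", m.group()) for m in DATE.finditer(text)]
    found += [("COMPANY", m.group()) for m in COMPANY.finditer(text)]
    found += [("LOCATION", name) for name in sorted(LOCATIONS) if name in text]
    return found


text = "Bridgestone Sports Corp. opened an office in New York on 3 March 1994."
print(find_complex_words(text))
# [('DATE', '3 March 1994'), ('COMPANY', 'Bridgestone Sports Corp.'), ('LOCATION', 'New York')]
```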
Basic Phrases
• Some syntactic constructs can be identified with reasonable reliability:
  • Noun groups
  • Verb groups
• Strategies:
  • Simple finite-state grammars
• Ambiguities
  • Noun-verb ambiguity
  • Verbs that are locally ambiguous
• Problems
  • Not all languages make a clear distinction between noun and verb groups
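A simple finite-state grammar for noun and verb groups can be written as regular expressions over part-of-speech tags. The sketch below assumes the input is already tagged with a small hypothetical tag set (DET, ADJ, NOUN, AUX, ADV, VERB, PREP); the two grammars are deliberately tiny and are not taken from any particular system.

```python
import re

# Finite-state grammars over a space-terminated tag string.
# Assumed tag set: DET, ADJ, NOUN, AUX, ADV, VERB, PREP (no tag is a suffix of another).
NOUN_GROUP = re.compile(r"(?:DET )?(?:ADJ )*(?:NOUN )+")
VERB_GROUP = re.compile(r"(?:AUX )*(?:ADV )*(?:VERB )+")


def chunk(tagged):
    """Group (word, tag) pairs into noun groups (NG) and verb groups (VG)."""
    tags = "".join(tag + " " for _, tag in tagged)
    # Character offset in `tags` at which each token's tag starts.
    starts, pos = [], 0
    for _, tag in tagged:
        starts.append(pos)
        pos += len(tag) + 1
    groups = []
    for label, pattern in (("NG", NOUN_GROUP), ("VG", VERB_GROUP)):
        for m in pattern.finditer(tags):
            words = [w for (w, _), s in zip(tagged, starts) if m.start() <= s < m.end()]
            groups.append((starts.index(m.start()), label, " ".join(words)))
    # Sort by the index of the first token so groups come out in text order.
    return [(label, words) for _, label, words in sorted(groups)]


sentence = [
    ("the", "DET"), ("new", "ADJ"), ("company", "NOUN"),
    ("has", "AUX"), ("already", "ADV"), ("opened", "VERB"),
    ("a", "DET"), ("factory", "NOUN"),
]
print(chunk(sentence))
# [('NG', 'the new company'), ('VG', 'has already opened'), ('NG', 'a factory')]
```

The sketch sidesteps the noun-verb ambiguity listed above by assuming each word already carries a single tag.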
Complex Phrases
• Recognize complex noun and verb groups
• Complex noun groups
  • Appositives
  • Measure phrases
  • Prepositional attachments (of, for)
  • Noun group conjunction
• Complex verb groups
  • Verb conjunction
  • Verb groups with the same significance
• Domain-relevant entities can be recognized
Domain Events
• Ignore anything not identified in previous phases
• Domain events require domain-specific patterns for identification
• Strategy:
  • Finite-state machines
  • A certain kind of "pseudo-syntax" analysis can be done
• More recent IE systems have begun to rely on full-sentence parsing
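A domain-specific pattern can be applied to the labelled chunks produced by earlier stages. The sketch below is purely illustrative: the acquisition event type, the chunk labels (`COMPANY`, `VG`, `OTHER`), the verb list and the slot names are all assumptions made for the example, not something defined in the slides.

```python
# Hypothetical trigger verbs for a made-up "acquisition" domain.
ACQUIRE_VERBS = {"acquired", "bought", "purchased"}


def match_acquisition(chunks):
    """Scan (label, text) chunks for the pattern COMPANY, VG(acquire), COMPANY.

    Chunks with any other label are skipped, mirroring the idea of
    ignoring material not identified by earlier phases.
    """
    relevant = [c for c in chunks if c[0] in ("COMPANY", "VG")]
    events = []
    for (l1, t1), (l2, t2), (l3, t3) in zip(relevant, relevant[1:], relevant[2:]):
        is_pattern = (l1, l2, l3) == ("COMPANY", "VG", "COMPANY")
        # The head verb is taken to be the last word of the verb group.
        if is_pattern and t2.split()[-1] in ACQUIRE_VERBS:
            events.append({"event": "ACQUISITION", "buyer": t1, "target": t3})
    return events


chunks = [
    ("COMPANY", "Acme Corp."),
    ("VG", "has acquired"),
    ("OTHER", "nearly all of"),
    ("COMPANY", "Widget Ltd."),
    ("OTHER", "for an undisclosed sum"),
]
print(match_acquisition(chunks))
# [{'event': 'ACQUISITION', 'buyer': 'Acme Corp.', 'target': 'Widget Ltd.'}]
```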
Template Generation: Merging Structures
• Previous stages operate within the bounds of single sentences
• This stage operates over the whole text to combine previously collected information into a unified whole
• If recognizing multiple events:
  • Determine how many distinct events there are
  • Assign each entity to the appropriate event
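One simple way to picture the merging step is to treat each sentence-level result as a partial template and fold compatible partials together. The sketch below assumes a greedy strategy in which two templates are compatible when no shared slot holds conflicting values; the slot names and data are hypothetical, and real systems rely on richer evidence such as coreference.

```python
def compatible(a, b):
    """Two partial templates are compatible if no shared slot has conflicting values."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def merge_templates(partials):
    """Greedily fold each partial template into the first compatible merged event."""
    events = []
    for partial in partials:
        for event in events:
            if compatible(event, partial):
                event.update(partial)
                break
        else:
            # No existing event fits, so this partial starts a new distinct event.
            events.append(dict(partial))
    return events


partials = [
    {"type": "ACQUISITION", "buyer": "Acme Corp."},
    {"type": "ACQUISITION", "target": "Widget Ltd.", "date": "3 March 1994"},
    {"type": "ACQUISITION", "buyer": "Globex Inc."},
]
for event in merge_templates(partials):
    print(event)
```

In this made-up input the first two partials merge into one event, while the third conflicts on the `buyer` slot and therefore becomes a second event.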
LEARNING-BASED APPROACHES TO IE
Supervised Learning of Extraction Patterns & Rules
• Reduces the knowledge engineering bottleneck of creating an IE system for a new domain
• Examples:
  • AutoSlog → creates lexico-syntactic patterns
  • PALKA → generalizes patterns based on word semantics
  • LIEP → identifies syntactic paths related to roles
  • CRYSTAL → "concept nodes" with lexical, syntactic and semantic constraints
  • WHISK → learns regular expressions
  • Many others: SRV, RAPIER, …
Supervised Learning of Sequential Classifier Models
• View IE as a classification problem that can be tackled using sequential learning models
• Read the text sequentially and label each word as an extraction or a non-extraction
• Typical labeling scheme: IOB
  • Inside a desired extraction
  • Outside
  • Beginning of a desired extraction
• Strategies:
  • Hidden Markov Models
  • Maximum Entropy Classifiers
  • Support Vector Machines
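The IOB scheme turns span annotations into one label per word, which is the form a sequential classifier is trained on. The following sketch shows the encoding and decoding only, not a classifier; the `B-`/`I-` label prefixes, the entity types and the sentence are assumptions chosen for the example.

```python
def spans_to_iob(tokens, spans):
    """Encode (start, end, label) token spans (end exclusive) as one IOB tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def iob_to_spans(tags):
    """Decode IOB tags back into (start, end, label) spans.

    Simplification: an I- tag extends the current span whatever its label.
    """
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


tokens = ["Ann", "Lee", "joined", "Acme", "Sports", "Corp", "yesterday"]
spans = [(0, 2, "PER"), (3, 6, "ORG")]
tags = spans_to_iob(tokens, spans)
print(list(zip(tokens, tags)))
print(iob_to_spans(tags))
# [(0, 2, 'PER'), (3, 6, 'ORG')]
```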
Weakly supervised and unsupervised approaches
• Annotating training text is still time-consuming and complex
• Further techniques learn extraction patterns using weakly supervised and unsupervised methods
• Examples
  • AutoSlog-TS (preclassified corpus in which texts are identified as relevant or irrelevant)
  • Ex-Disco (manually defined seed; patterns are ranked and the best ones are added to the seed)
  • Meta-bootstrapping (seed nouns that belong to a semantic class)
  • On-Demand Information Extraction (dynamically learns from queries)
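The idea of learning from a corpus that is only preclassified as relevant or irrelevant can be sketched as a pattern-ranking step. The scoring used here, relevance rate multiplied by a log of frequency, is an illustrative choice and is not claimed to be the exact formula of any system named above; the patterns and documents are made up.

```python
import math
from collections import Counter


def rank_patterns(docs):
    """Rank patterns by how strongly they are associated with relevant documents.

    docs: list of (is_relevant, patterns_fired) pairs, one per document.
    """
    total, relevant = Counter(), Counter()
    for is_relevant, patterns in docs:
        for p in set(patterns):  # count each pattern once per document
            total[p] += 1
            if is_relevant:
                relevant[p] += 1
    # Assumed score: relevance rate, boosted by (log) document frequency.
    scores = {p: (relevant[p] / total[p]) * math.log2(total[p] + 1) for p in total}
    return sorted(scores, key=lambda p: (-scores[p], p))


docs = [
    (True, ["<subj> was bombed", "<subj> said"]),
    (True, ["<subj> was bombed", "attack on <np>"]),
    (False, ["<subj> said"]),
    (False, ["<subj> said", "attack on <np>"]),
]
print(rank_patterns(docs))
# ['<subj> was bombed', 'attack on <np>', '<subj> said']
```

A generic pattern such as `<subj> said` fires often but mostly in irrelevant documents, so it ranks last.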
Discourse-oriented approaches to IE
• Most IE systems' patterns focus only on the local context surrounding a potential extraction
• Extend systems to take a more global view
• Strategy:
  • Add constraints to connect entities in different clauses
  • Decision trees (WRAP-UP)
  • Set of classifiers to identify new templates (ALICE)
HOW GOOD IS IE
How are IE systems progressing?
• The 60% barrier in performance
• Biggest mistakes are in entity and event coreference
• Implicit knowledge in natural language is not stated explicitly in texts
• Problems in the training data are not found in the test data
• Good IE systems typically recognize 90% of entities
• An event requires about 4 entities
• 0.9 × 0.9 × 0.9 × 0.9 = 0.6561 (65.61%)
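The arithmetic behind this estimate is a product of per-entity accuracies. The sketch below assumes that entity recognition errors are independent of each other, which is a simplification; the function name is illustrative.

```python
def event_accuracy(entity_accuracy, entities_per_event):
    """Chance that every entity of an event is right, assuming independent errors."""
    return entity_accuracy ** entities_per_event


print(f"{event_accuracy(0.9, 4):.4f}")  # 0.6561
```

Under the same assumption, raising entity accuracy to 95% would give roughly 0.81 for four-entity events, which shows how strongly event-level results depend on entity-level accuracy.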
THANKS

Editor's Notes

  1. Semi-structured → classified ads