On the Semantic Representation and Extraction of Complex Category Descriptors

Natural language descriptors used for categorizations are present from folksonomies to ontologies. While some descriptors are composed of simple expressions, other descriptors have complex compositional patterns (e.g. ‘French Senators Of The Second Empire’, ‘Churches Destroyed In The Great Fire Of London And Not Rebuilt’). As conceptual models get more complex and decentralized, more content is transferred to unstructured natural language descriptors, increasing the terminological variation, reducing the conceptual integration and the structure level of the model. This work describes a formal representation for complex natural language category descriptors (NLCDs). In the representation, complex categories are decomposed into a graph of primitive concepts, supporting their interlinking and semantic interpretation. A category extractor is built and the quality of its extraction under the proposed representation model is evaluated.

On the Semantic Representation and Extraction of Complex Category Descriptors
André Freitas, Rafael Vieira, Edward Curry, Danilo Carvalho, João C. Pereira da Silva
Insight Centre for Data Analytics
NLDB 2014
Montpellier, France

Outline
• Motivation
• Extracting Natural Language Category Descriptors (NLCDs)
• Evaluation
• Summary

Big Data
• Vision: a more complete data-based picture of the world for systems and users.

“Schema” Growth & Complexity
• Fundamental shift in the database landscape
• How to build large ‘schemas’?
• Schema size: from 10s-100s of attributes to 1,000s-1,000,000s of attributes

Target Motivational Scenario: Wikipedia
• Decentralized content generation
• 300,000 editors have edited Wikipedia more than 10 times
• More than 280,000 distinct Natural Language Category Descriptors (NLCDs)

Natural Language Category Descriptors (NLCDs)

NLCDs
• Natural Language Category Descriptors (NLCDs) are natural language descriptors for sets
• Simple NLCDs:
- ‘People’
- ‘Countries’
- ‘Films’
• Complex NLCDs:
- ‘French Senators Of The Second Empire’
- ‘United Kingdom Parliamentary Constituencies Represented By A Sitting Prime Minister’
• Goal:
- Parse NLCDs into an integrated structured graph
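As a hedged illustration of the goal stated on this slide, the Python sketch below builds a small graph by hand for one of the complex NLCDs. The class names (`Edge`, `DescriptorGraph`), the relation labels and the decomposition itself are assumptions made for illustration; the slides do not specify the extractor's data structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A labelled, directed edge between two concept nodes."""
    source: str
    relation: str
    target: str


@dataclass
class DescriptorGraph:
    """Hand-built graph for one category descriptor (hypothetical structure)."""
    descriptor: str
    core: str  # the concept naming the kind of thing the category contains
    edges: list[Edge] = field(default_factory=list)

    def add(self, source: str, relation: str, target: str) -> None:
        """Append one labelled edge to the graph."""
        self.edges.append(Edge(source, relation, target))

    def targets(self, source: str, relation: str) -> list[str]:
        """Return every node reachable from `source` over `relation`."""
        return [e.target for e in self.edges
                if e.source == source and e.relation == relation]


# Manual decomposition of one example from the slides; the relation
# labels ("specializes", "of") are illustrative assumptions.
graph = DescriptorGraph(
    descriptor="French Senators Of The Second Empire",
    core="Senators",
)
graph.add("French Senators", "specializes", "Senators")
graph.add("French Senators", "of", "Second Empire")

print(graph.targets("French Senators", "specializes"))
```

Keeping the core concept as its own node is what would let two descriptors that share a core, or share an entity such as the Second Empire, be linked through that node.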

Assumptions
• NLCDs as a more syntactically tractable subset of natural language
• NLCDs as a low-effort interface for structuring a domain of discourse

Formality vs. Usability Spectrum
[Figure: NLCDs mapped to NLCD graphs via Information Extraction]

Applications
• Database Creation
• Semantic Annotation
• Entity/Semantic Search

Other Examples
• IFRS and US GAAP
- ‘Partially owned properties’
- ‘Residential portfolio segment’
- ‘Assets arising from exploration for and evaluation of mineral resources’
- ‘Key management personnel compensation’
- ‘Other long-term employee benefits’

Extracting Natural Language Category Descriptors (NLCDs)

Natural Language Category Descriptors
What is Big Data?

Core Features
• Manual analysis of 10,000 NLCDs.

Features/Core Lexical Categories Distribution

Number of distinct POS Tag patterns

Graph Representation Model

Focus of the Representation
• Taxonomic Structure
• Context Representation (Open Relation Extraction)
- Reification-based
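The slide names a reification-based context representation without giving the vocabulary. A minimal sketch, assuming the standard RDF reification terms (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) and plain Python tuples instead of an RDF library, is shown below. The `http://example.org/` names, the `reify` helper and the `qualifier` property are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # placeholder namespace, not from the slides


def reify(statement, subject, predicate, obj):
    """Describe the triple (subject, predicate, obj) as an rdf:Statement node."""
    return [
        (statement, RDF + "type", RDF + "Statement"),
        (statement, RDF + "subject", subject),
        (statement, RDF + "predicate", predicate),
        (statement, RDF + "object", obj),
    ]


# Relation taken from 'United Kingdom Parliamentary Constituencies
# Represented By A Sitting Prime Minister'; the split into subject,
# predicate and object is an illustrative assumption.
stmt = EX + "stmt1"
triples = reify(
    stmt,
    EX + "Parliamentary_Constituencies",
    EX + "representedBy",
    EX + "Prime_Minister",
)

# Because the relation is now a node, context can be attached to it.
triples.append((stmt, EX + "qualifier", "sitting"))

for triple in triples:
    print(triple)
```

The point of reifying is the last step: once the relation has its own identifier, additional context hangs off the statement rather than off either entity.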

NLCD Extractor
• POS Tagging
• Segmentation
• Named Entity Recognition
• Core Detection
• Word Sense Disambiguation (WSD)
• Entity Linking (DBpedia)
• RDF Representation (DBpedia)

Evaluation Setup
• Total of 287,957 English Wikipedia categories (Open Domain scenario)
• Selected random sample of 2,696 categories
• Manual evaluation of the core extraction features
- Entity segmentation
- Relation identification
- Unary operators
- Specialization relations
- Category core identification
- Entity core identification
- Word Sense Disambiguation (WordNet)
- Entity linking (DBpedia)
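The setup above can be sketched in Python as a seeded random draw followed by a per-feature accuracy over manual judgements. This is only an illustration: the slides do not describe the sampling procedure or the exact metric, so the helpers `draw_sample` and `accuracy`, the seed, and the placeholder category names are all assumptions.

```python
import random


def draw_sample(categories, k, seed=0):
    """Draw k distinct categories with a fixed seed so the draw is repeatable."""
    rng = random.Random(seed)
    return rng.sample(categories, k)


def accuracy(judgements):
    """Fraction of manually checked items that were judged correct."""
    return sum(judgements) / len(judgements)


# Placeholder names standing in for the 287,957 English Wikipedia categories.
categories = [f"Category_{i}" for i in range(287_957)]
sample = draw_sample(categories, 2_696)

# Share of the full category set covered by the sample.
coverage = len(sample) / len(categories)
print(f"sample size: {len(sample)}, coverage: {coverage:.4f}")

# Made-up judgements for one feature, only to show the calculation.
print(accuracy([True, True, False, True]))
```

The sample covers a little under 1% of the categories, so each per-feature figure is an estimate for the full set rather than an exhaustive count.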

Results
• Performance:
- (i) graph extraction time: 9.8 ms per graph
- (ii) word sense disambiguation: 121.0 ms per word
- (iii) entity linking: 530.0 ms per link
* Measured on an i5-3317U (1.70GHz) CPU computer with 4GB RAM (4 cores, 2 threads per core).
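To put the per-item timings in perspective, the short Python calculation below extrapolates the graph extraction time to all 287,957 categories. It assumes sequential processing, one graph per category, and a constant per-graph cost, none of which the slides state. The function name `total_minutes` is illustrative. Word sense disambiguation and entity linking are not extrapolated because the slides do not give the number of words or links per category.

```python
GRAPH_MS = 9.8      # graph extraction, ms per graph (from the slide)
WSD_MS = 121.0      # word sense disambiguation, ms per word (from the slide)
LINK_MS = 530.0     # entity linking, ms per link (from the slide)
TOTAL_CATEGORIES = 287_957


def total_minutes(per_item_ms, items):
    """Convert a per-item cost in milliseconds into total minutes."""
    return per_item_ms * items / 1000 / 60


graph_minutes = total_minutes(GRAPH_MS, TOTAL_CATEGORIES)
print(f"graph extraction for all categories: {graph_minutes:.1f} min")

# Relative cost of one linking or one WSD call against one graph extraction.
print(f"entity linking vs graph extraction: {LINK_MS / GRAPH_MS:.0f}x")
print(f"WSD vs graph extraction: {WSD_MS / GRAPH_MS:.0f}x")
```

Under these assumptions graph extraction over the whole category set takes roughly 47 minutes, and a single entity-linking call costs about as much as 54 graph extractions.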

Summary
• NLCDs can provide a more tractable (from the IE perspective) natural language interface for structuring large KBs
• We developed an approach for the representation, extraction and integration of NLCDs
- ~75% extraction accuracy
• Limitations:
- Need for a more principled and formal definition of an NLCD
- Need for a better entity recognition and linking approach
• Future Work: evaluation under a domain-specific scenario

