Survey on Discourse Annotation for Arabic
A. Algarni, H. Alharbi and N. Almutairy
Supervisor: Dr. A. Alsaif
April 23, 2013
Kingdom of Saudi Arabia
Ministry of Higher Education
Imam Mohammed Ibn Saud Islamic University
College of Computer and Information Sciences
CS465 - Natural Language Processing

Outline
• Introduction
• The Leeds Arabic Discourse Treebank
• Discourse Connective Recognition
• Discourse Relation Recognition
• Semantic-Based Segmentation
• Discourse Segmentation Based on Rhetorical Methods
• A Comprehensive Taxonomy of Arabic Discourse Coherence Relations

Introduction
• Linguistic annotation covers any descriptive or analytic notation applied to raw language data.
• Annotated discourse corpora facilitate theoretical studies and contribute to the development of NLP applications.

Applications
• Information extraction
• Question answering
• Summarization
• Machine translation and generation

Discourse Relations and Discourse Connectives
• Discourse relation: the way in which two arguments (text segments) are logically connected, e.g. Temporal, Comparison, Causal, Expansion, etc.
• Discourse Connective (DC): a lexical marker used to link two abstract objects in a text.
• Abstract Object (AO): an entity in discourse such as a proposition, event, fact or opinion.
• Argument (Arg): a text span that expresses an abstract object and is linked to another argument by a DC.

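The definitions above can be made concrete with a small data structure. This is only an illustrative sketch: the class and field names (`DiscourseRelation`, `arg1`, `arg2`, `connective`, `sense`) are hypothetical and do not reproduce the Leeds Arabic Discourse Treebank file format. It stores the English example used on the connective recognition slides, with an assumed sense label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscourseRelation:
    """Two arguments, an optional connective (DC) and a sense label."""
    arg1: str
    arg2: str
    connective: Optional[str] = None  # None models a relation with no DC
    sense: Optional[str] = None       # e.g. "TEMPORAL", "COMPARISON", ...

    @property
    def is_explicit(self) -> bool:
        # The relation is signalled explicitly when a connective links the args.
        return self.connective is not None


rel = DiscourseRelation(
    arg1="Children might be tired",
    arg2="feel sleepy",
    connective="and",
    sense="EXPANSION",  # illustrative label, not taken from the corpus
)
print(rel.is_explicit)  # True
```
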
The Leeds Arabic Discourse Treebank
• The first effort towards producing an Arabic discourse treebank was introduced in 2011 by A. Alsaif and K. Markert.
• They collected a large set of Arabic discourse connectives using text analysis and corpus-based techniques.
• The final list contains 107 discourse connectives.

Types of Discourse Connectives

Types of Relations

Types of Relations (cont.)
• COMPARISON.Similarity:

Arabic Discourse Annotation Tool (ADA) and Annotation Process

Annotation Methodology
1. Measuring whether annotators agree on the binary decision of whether an item constitutes a discourse connective in context.
2. Measuring whether annotators agree on which discourse relation an identified connective expresses; annotators can assign a set of relations to a single connective.

Results
• Agreement in task 1 (connective identification) is highly reliable (N = 23,331): percentage agreement of 0.95 and kappa of 0.88.
• Agreement in task 2 (relation assignment) is relatively low (N = 5,586): percentage agreement of 0.66, kappa of 0.57 and alpha of 0.58.

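Kappa corrects raw percentage agreement for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e), which is why it is lower than percentage agreement in both tasks. The survey does not say which kappa variant was used; the sketch below assumes Cohen's kappa for two annotators and uses made-up toy labels, not treebank data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both annotators only ever use one label
    return (p_o - p_e) / (1 - p_e)


# Toy task-1 style decisions: is the token a discourse connective?
ann1 = ["conn", "conn", "not", "not", "conn", "not", "not", "not"]
ann2 = ["conn", "not", "not", "not", "conn", "not", "not", "conn"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.467
```

Here the annotators agree on 6 of 8 items (p_o = 0.75), but chance agreement is already 0.53125, so kappa drops to about 0.47.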
Discourse Connective Recognition
• Goal: to distinguish between discourse and non-discourse usage of a connective.
• Example: in English, "once" and "while" each have both discourse and non-discourse uses.
• A. Alsaif and K. Markert (2011) introduced a connective identifier for Arabic based on syntactic features.

Discourse Connective Recognition by A. Alsaif and K. Markert (2011)
Features:
• Surface features (SConn)
• Lexical features of surrounding words (Lex)
• Example: [Children might be tired]Arg1 [and]DC [feel sleepy]Arg2 during school time if they did not sleep well.

Discourse Connective Recognition by A. Alsaif and K. Markert (2011) (cont.)
Features:
• Part-of-speech features (POS)
• Syntactic category of related phrases (Syn), e.g. "the school is very large and beautiful"
• Al-Masdar feature (masdar: the Arabic verbal noun)

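A minimal sketch of how the surface (SConn) and lexical (Lex) features could be encoded for one candidate connective. The feature names, the choice of sentence-initial position as a surface feature, and whitespace tokenization are assumptions for illustration; POS, Syn and Masdar features would need a tagger or the Arabic Treebank and are left out.

```python
def connective_features(tokens, i):
    """Hypothetical SConn/Lex features for the candidate connective tokens[i]."""
    return {
        "conn": tokens[i].lower(),                # the connective string itself
        "sconn_sentence_initial": i == 0,         # assumed surface feature: position
        "lex_prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "lex_next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }


sentence = "Children might be tired and feel sleepy during school time".split()
print(connective_features(sentence, sentence.index("and")))
```

Feature dictionaries of this shape can be vectorized and passed to any standard binary classifier (discourse vs. non-discourse usage).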
Discourse Connective Recognition by A. Alsaif and K. Markert (2011) (cont.)
Results
Model | Features | Accuracy (%) | Kappa
- | Baseline (not Conn) | 68.9 | 0
M1 | Conn only | 75.7 | 0.48
Tokenization by white space + automatic tagger:
M2 | Conn+SConn+Lex | 85.6 | 0.62
M3 | Conn+SConn+Lex+POS | 87.6 | 0.69
M4 | Conn+SConn+Lex+POS+Masdar | 88.5 | 0.70
ATB-based features:
M5 | Conn+SConn+Lex | 86.2 | 0.65
M6 | Conn+SConn+Lex+Syn/POS | 91.2 | 0.79
M7 | Conn+SConn+Lex+Syn/POS+Masdar | 92.4 | 0.82
M8 | Conn+SConn+Syn | 91.2 | 0.79
M9 | SConn+Lex+Syn+Masdar | 91.2 | 0.79

Discourse Relation Recognition
• Goal: to identify the type of the relation.
• A. Alsaif and K. Markert (2011) introduced the first algorithms for automatically identifying discourse relations in Arabic.

Discourse Relation Recognition by A. Alsaif and K. Markert (2011)
Features:
• Connective features
• Words and POS of arguments
• Masdar
• Tense and negation
• Length, distance and order features
• Argument parent
• Production rules

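Production-rule features are commonly built by listing every rewrite rule (parent -> children) in the parse trees of the arguments and using each rule as a binary feature. The sketch below assumes a toy constituency tree encoded as nested tuples with English-style labels; it does not reproduce the Arabic Treebank format or the authors' exact feature definition.

```python
def productions(tree):
    """Return the set of rules 'PARENT -> CHILD1 CHILD2 ...' in a tree.

    A tree is a (label, child, child, ...) tuple and leaves are plain strings,
    so lexical rules such as 'VB -> feel' are included as well.
    """
    label, *children = tree
    child_labels = [c if isinstance(c, str) else c[0] for c in children]
    rules = {f"{label} -> {' '.join(child_labels)}"}
    for child in children:
        if not isinstance(child, str):
            rules |= productions(child)
    return rules


# Toy parse of the argument "feel sleepy" (labels are illustrative).
arg2 = ("VP", ("VB", "feel"), ("ADJP", ("JJ", "sleepy")))
print(sorted(productions(arg2)))
# ['ADJP -> JJ', 'JJ -> sleepy', 'VB -> feel', 'VP -> VB ADJP']
```
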
Results
Model | Features | Accuracy (%) | Kappa
All connectives (6039):
- | Baseline (CONJUNCTION) | 52.5 | 0
M1 | Conn only (1) | 77.2 | 0.60
M2 | Conn+Conn f+Arg f (37) | 78.7 | 0.66
M3 | Conn+Conn f+Arg f+Production rules (1237) | 78.3 | 0.65
Excluding wa at BOP (3813):
- | Baseline (CONJUNCTION) | 35 | 0
M1 | Conn only (1) | 74.3 | 0.65
M2 | Conn+Conn f+Arg f (37) | 77.0 | 0.69
M3 | Conn+Conn f+Arg f+Production rules (1237) | 76.7 | 0.69

Results
Model | Features | Accuracy (%) | Kappa
All connectives (6039):
- | Baseline (EXPANSION) | 62.4 | 0
M1 | Conn only (1) | 88.7 | 0.78
M2 | Conn+Conn f+Arg f (37) | 88.7 | 0.78
Excluding wa at BOP (3813):
- | Baseline (EXPANSION) | 41.8 | 0
M1 | Conn only (1) | 82.7 | 0.74
M2 | Conn+Conn f+Arg f (37) | 83.5 | 0.75

Semantic-Based Segmentation of Arabic Texts
• Corpus analysis
• Definition: let L be a list of candidate segment connectors; each element c in L is classified, according to its effect on text segmentation, as either active or passive.
• Examples: [Examples 1 and 2: Arabic text with segments marked by brackets]

Segmentation Process
1. Identifying the connectors that indicate complete segments.
2. Locating the active connectors.
3. Resolving the case where adjacent active connectors exist.
4. Setting the segment boundaries.
5. Creating the final list of segments.

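The five steps of the segmentation process can be sketched as follows. Everything concrete here is an assumption made for illustration: the connector lists are hypothetical English stand-ins for Arabic connectors, and the rule for adjacent active connectors (keep only the first of a run) is a guess, since the survey does not give the authors' lexicon or resolution rule.

```python
# Hypothetical connector lexicon: English stand-ins for Arabic connectors.
ACTIVE = {"however", "therefore", "then"}   # assumed to open a new segment
PASSIVE = {"and", "or"}                     # assumed never to split a segment


def segment(tokens):
    """Split a token list into segments at active connectors (illustrative)."""
    lowered = [t.lower() for t in tokens]
    # Step 1: identify candidate connectors from the lexicon.
    candidates = [i for i, t in enumerate(lowered) if t in ACTIVE | PASSIVE]
    # Step 2: locate the active connectors among the candidates.
    active = [i for i in candidates if lowered[i] in ACTIVE]
    # Step 3: in a run of adjacent active connectors, keep only the first.
    kept, prev = [], None
    for i in active:
        if prev is None or i != prev + 1:
            kept.append(i)
        prev = i
    # Step 4: set a segment boundary before each kept connector.
    cuts = [0] + [i for i in kept if i > 0] + [len(tokens)]
    # Step 5: create the final list of non-empty segments.
    return [tokens[s:e] for s, e in zip(cuts, cuts[1:]) if tokens[s:e]]


text = "The plan was simple and cheap however then it failed therefore we stopped"
for seg in segment(text.split()):
    print(" ".join(seg))
```

This prints three segments: "The plan was simple and cheap", "however then it failed" and "therefore we stopped". The passive "and" does not split, and the adjacent pair "however then" yields a single boundary.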
Discussion
• To evaluate the segmentation process, the authors collected ten essays.
• Each essay is between 500 and 700 words long.
• After running the segmentation process, they gave the output to judges, who evaluated it in terms of two factors: correct hits and incorrect hits.

Discussion (cont.)
Essay | Correct hits | Incorrect hits
1 | 33 | 0
2 | 15 | 1
3 | 25 | 0
4 | 23 | 1
5 | 20 | 0
6 | 29 | 1
7 | 26 | 1
8 | 33 | 2
9 | 26 | 0
10 | 22 | 0

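The per-essay counts can be aggregated with simple arithmetic. The sketch below reads the counts from the table above (essay, correct hits, incorrect hits) and treats correct / (correct + incorrect) as the overall share of correct hits; this aggregate is a derived figure, not one reported in the survey.

```python
# (correct hits, incorrect hits) per essay, read from the table above.
hits = {
    1: (33, 0), 2: (15, 1), 3: (25, 0), 4: (23, 1), 5: (20, 0),
    6: (29, 1), 7: (26, 1), 8: (33, 2), 9: (26, 0), 10: (22, 0),
}

total_correct = sum(c for c, _ in hits.values())
total_incorrect = sum(w for _, w in hits.values())
share_correct = total_correct / (total_correct + total_incorrect)

print(total_correct, total_incorrect, round(share_correct, 3))  # 252 6 0.977
```
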
Arabic Discourse Segmentation Based on Rhetorical Methods
• This method depends on the meaning of the connector "wa" (و) in Arabic.
• There are six types of "wa", classified into two classes, "Fasl" and "Wasl":
  - "Fasl": a segmenting position.
  - "Wasl": not segmenting, but connecting the text.

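Once each occurrence of the connector has been classified as Fasl or Wasl, segmentation reduces to splitting at the Fasl occurrences. In this sketch the connector is assumed to be "wa" (و) and already separated from the following word as its own token; the classifier is a hypothetical stand-in (a lookup of precomputed labels) for the trained model, and the tokens and labels are made up.

```python
from typing import Callable, List, Optional


def split_on_fasl(tokens: List[str],
                  label_of: Callable[[int], Optional[str]]) -> List[List[str]]:
    """Start a new segment at every 'wa' token labelled "Fasl".

    label_of(i) returns "Fasl" or "Wasl" for the 'wa' token at index i
    (hypothetical interface standing in for the trained classifier).
    """
    segments, current = [], []
    for i, tok in enumerate(tokens):
        if tok == "wa" and label_of(i) == "Fasl" and current:
            segments.append(current)  # close the segment before the Fasl 'wa'
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments


# Transliterated toy input; labels are invented for illustration.
tokens = ["A", "wa", "B", "wa", "C"]
labels = {1: "Wasl", 3: "Fasl"}
print(split_on_fasl(tokens, labels.get))
# [['A', 'wa', 'B'], ['wa', 'C']]
```

A Wasl "wa" stays inside its segment because it connects rather than separates. Keeping a Fasl "wa" at the start of the new segment is a design choice; it could equally be dropped.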
Types of Connector "wa"
Type | Example | Class
Three of the six types belong to the class Fasl and three to the class Wasl.

The Arabic Sentence Segmentation System

Feature Extraction
• The features of "wa" include X3 = noun and X7 = accusative mark.

Experiment and Results
• 1200 instances were used for training.
• 293 instances were used for testing: 290 were classified correctly and 3 incorrectly.
• Results:
Recall | 94.68%
Precision | 96.82%
Accuracy | 98.98%

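The reported accuracy follows directly from the test counts, 290 correct out of 293 instances. The snippet below checks this and, as an extra derived figure not reported in the survey, combines the reported precision and recall into an F1 score (their harmonic mean).

```python
correct, incorrect = 290, 3
accuracy = correct / (correct + incorrect)
print(f"accuracy = {accuracy:.2%}")  # accuracy = 98.98%

precision, recall = 0.9682, 0.9468  # values reported on the slide
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
print(f"F1 = {f1:.2%}")
```
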
A Comprehensive Taxonomy of Arabic Discourse Coherence Relations
• Coherence relations are classified into two types: explicit relations and implicit relations.
Coherence relation | Example
Explicit relation | I am very happy because I got excellent marks in exams.
Implicit relation | I am very happy. I got excellent marks in exams.

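A crude way to tell the two types apart in the examples above is to check whether a connective from a lexicon links the clauses. The sketch assumes a tiny hypothetical English connective list; a real Arabic system would use a connective lexicon (such as the 107 connectives collected for the Leeds Arabic Discourse Treebank) together with a connective recognition step.

```python
import re

# Hypothetical, tiny English connective lexicon for illustration only.
CONNECTIVES = {"because", "but", "so", "although", "while"}


def relation_type(text: str) -> str:
    """Label a two-clause text 'explicit' if a lexicon connective occurs."""
    words = re.findall(r"[a-z]+", text.lower())
    return "explicit" if any(w in CONNECTIVES for w in words) else "implicit"


print(relation_type("I am very happy because I got excellent marks in exams."))
# explicit
print(relation_type("I am very happy. I got excellent marks in exams."))
# implicit
```

Because it only matches words, this check cannot tell discourse from non-discourse uses of an ambiguous word such as "while", which is exactly the problem connective recognition addresses.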
The Procedure of Creating an Arabic Taxonomy of Coherence Relations

Examples of Implicit Arabic Relations
• "Impossible condition":
• "Cascaded questioning":

Results
• They obtained a set of 47 Arabic coherence relations:
Coherence relations | Count
From English coherence relations | 31
Additional Arabic explicit coherence relations | 12
Arabic implicit relations | 4
Total | 47

Conclusion
Discourse annotation is a fertile field with many NLP applications. For Arabic, it faces challenges due to the lack of annotated corpora and studies.

Thank You
