A SURVEY OF ARABIC DISCOURSE
ANNOTATION
By:
Abeer Al-Qahtani
Afnan Al-Moadi
Nujoud Al-Ghamdi
INTRODUCTION
Arabic discourse annotation and segmentation have become
a popular area of research.
The aim of this presentation is to survey and summarize
some techniques used in discourse annotation and
segmentation, and to show their methods and results.
CLAUSE-BASED DISCOURSE SEGMENTATION OF
ARABIC TEXTS
Discourse parsing consists of two steps:
1- Discourse segmentation, which aims at identifying
Elementary Discourse Units (EDUs).
2- Building the discourse structure by linking EDUs using a
set of rhetorical or discursive relations.
Arabic language characteristics:
- It is agglutinative.
- It does not have capital letters.
- Diacritics are usually absent from written text.
METHODOLOGY
- Their analysis was carried out on two different corpus
genres: news articles and elementary school textbooks.
- They proposed a three-step segmentation algorithm:
  - Step 1: punctuation marks.
  - Step 2: lexical cues.
  - Step 3: a mix of punctuation marks and lexical cues.
METHODOLOGY CONT.
- Step 1: punctuation marks:
[Dr. Tarak Swiden has treated various diseases.]
- Step 2: lexical cues:
[They will know when we start][but they don't know when
we finish]
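The first two steps can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the punctuation set, the abbreviation list, and the cue list are assumptions (the slides do not list them), and English stand-ins are used next to one Arabic cue for readability.

```python
# Assumed inventories; the slides do not give the actual lists.
STRONG_PUNCTUATION = ".!?؟"       # sentence-final marks, including the Arabic question mark
ABBREVIATIONS = {"Dr."}           # tokens whose final period does not end a segment
LEXICAL_CUES = {"but", "لكن"}     # hypothetical cue words that open a new segment


def split_on_punctuation(text):
    """Step 1: close a segment after each strong punctuation mark."""
    segments, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in STRONG_PUNCTUATION and token not in ABBREVIATIONS:
            segments.append(" ".join(current))
            current = []
    if current:
        segments.append(" ".join(current))
    return segments


def split_on_cues(segment):
    """Step 2: open a new segment in front of each lexical cue."""
    segments, current = [], []
    for word in segment.split():
        if word in LEXICAL_CUES and current:
            segments.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        segments.append(" ".join(current))
    return segments


def segment(text):
    """Apply step 1, then step 2 inside each resulting segment."""
    return [edu for part in split_on_punctuation(text) for edu in split_on_cues(part)]
```

Applied to the two example sentences above, `segment` keeps "Dr." inside the first segment and splits the second sentence in front of "but".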
METHODOLOGY CONT.
- Step 3: a mix of punctuation marks and lexical cues:
  - If a comma is followed by the conjunction "و" (waw) or "ف" (fā)
and then by a preposition of localization,
it indicates the end of a segment.
Example:
[Like Tunisian families, her family left Marsa city,]
[then, they found themselves at Marsa’s wonderful beach.]
METHODOLOGY CONT.
- If a comma is followed by the conjunction "و" (waw) or "ف" (fā)
and then by a possessive noun, it indicates the end of a segment.
Example:
[I saw my sister outside,] [with a talking doll]
- If a comma is followed by a demonstrative pronoun and then by a word that is
not a verb, there is no segment boundary.
Example:
[Mr. Hamed, our teacher, was standing up, looking at us.]
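The three comma rules of step 3 can be expressed over part-of-speech tags. The sketch below assumes a hypothetical tagset (`COMMA`, `CONJ`, `LOC_PREP`, `POSS_NOUN`, `DEM`, `VERB`, `NOUN`) and assumes that a comma matched by no rule is not a boundary; neither assumption comes from the slides.

```python
def comma_rule(tags, i):
    """Decide the comma at index i.

    Returns True (segment ends here), False (no boundary),
    or None when none of the three rules applies.
    """
    nxt = tags[i + 1] if i + 1 < len(tags) else None
    nxt2 = tags[i + 2] if i + 2 < len(tags) else None
    # Rules 1 and 2: comma + conjunction (waw/fa) + localization preposition
    # or possessive noun -> end of segment.
    if nxt == "CONJ" and nxt2 in ("LOC_PREP", "POSS_NOUN"):
        return True
    # Rule 3: comma + demonstrative pronoun + non-verb -> no boundary.
    if nxt == "DEM" and nxt2 is not None and nxt2 != "VERB":
        return False
    return None


def segment(tokens):
    """tokens: list of (word, tag) pairs. Split after commas marked as boundaries."""
    tags = [tag for _, tag in tokens]
    segments, current = [], []
    for i, (word, tag) in enumerate(tokens):
        current.append(word)
        if tag == "COMMA" and comma_rule(tags, i) is True:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```

For example, a comma followed by a `CONJ` and a `LOC_PREP` token closes the current segment, while a comma followed by `DEM` and `NOUN` leaves it open.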
THE RESULT
SEMANTIC-BASED SEGMENTATION FOR ARABIC
TEXT
In this approach, the aim is to divide the text into
complete, meaningful parts that can stand
independently of the text before and after them.
- Connectors classification:
  - Active: words that indicate the beginning of a new
segment, the end of a segment, or a complete
segment.
  - Passive: words that do not by themselves indicate a new
segment, the end of a segment, or a complete segment,
but that help determine where segments start or end
when they occur with active elements.
METHODOLOGY
- Identifying the connectors that indicate complete segments (with
S instances in the SegBoundary property).
- Locating the active connectors.
- Resolving the case where adjacent active connectors exist.
- Setting the segment boundaries.
- Creating the final list of segments.
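The steps above can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the connector table, the reading of the SegBoundary values ("B" begins a segment, "E" ends one, "S" is a complete segment), and the choice to keep only the first of several adjacent active connectors. Passive connectors are not modelled.

```python
# Hypothetical connector table: word -> assumed SegBoundary value.
ACTIVE = {"then": "B", "finally": "B", "anyway": "E", "yes": "S"}


def boundaries(tokens):
    """Return sorted cut positions; a cut at k falls just before tokens[k]."""
    cuts = set()
    previous_active = False
    for i, token in enumerate(tokens):
        kind = ACTIVE.get(token)
        if kind is None:
            previous_active = False
            continue
        if previous_active:
            # Adjacent active connectors: assume only the first one counts.
            continue
        previous_active = True
        if kind in ("B", "S"):
            cuts.add(i)          # a segment starts at this connector
        if kind in ("E", "S"):
            cuts.add(i + 1)      # a segment ends after this connector
    # Cuts at the very start or end of the text add no information.
    cuts.discard(0)
    cuts.discard(len(tokens))
    return sorted(cuts)


def segments(tokens):
    """Build the final list of segments from the cut positions."""
    cuts = [0] + boundaries(tokens) + [len(tokens)]
    return [tokens[a:b] for a, b in zip(cuts, cuts[1:])]
```

With this table, `segments("we studied then finally we rested".split())` yields two segments, the second starting at "then".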
THE RESULT
ARABIC DISCOURSE SEGMENTATION BASED ON
RHETORICAL METHOD
- This technique is derived from Arabic rhetoric.
- It focuses on the connector waw “و”.
- It groups the six known rhetorical types of “و” into two classes:
“Fasl” and “Wasl”.
- They use a Support Vector Machine (SVM) classifier.
“Fasl”: types 1, 2 and 3
“Wasl”: types 4, 5 and 6
EXAMPLES
Waw 1
[Professors teach students sciences and virtue, I swear to God, they have done a
great mission for their nation]
Waw 2
[Young people are not the only ones who suffer, but their crises are part of the crises
of the whole society and someone may ask: Why focus only on youth
and not on the divisions of the whole society?]
Waw 3
[Adolescents suffer from some psychological problems and there are, in general,
numerous other problems in the society.]
Waw 4
[The teacher came smiling into the classroom.]
Waw 5
[The couple sat together with the light of the moon.]
Waw 6
[The study started and students and teachers enrolled in schools.]
METHODOLOGY
- Preprocessing
  - Diacritization
  - Discriminating the connector “و” from the letter “و”
- Feature extraction
  - They extract 22 features to distinguish each type of “و”.
- Classification
FEATURE EXTRACTION
- Waw1:
  - X1 = “ ” and X7 = genitive mark.
  - X3 = noun, X7 = genitive mark and X16 = no.
- Waw2:
  - X1 = “ ” and X7 = accusative mark.
  - X3 = noun, X5 = indefinite, X6 ≠ genitive mark and X7 = genitive mark.
- Waw3:
  - X12 ≠ X13.
  - X14 ≠ X15.
  - X19 ≠ X20.
  - X21 = no and X22 = no.
- Waw4:
  - X16 = yes.
  - X1 = “ ”, X10 = verb and X11 = past tense.
- Waw5:
  - X3 = noun and X7 = accusative mark.
- Waw6:
  - X2 = X3, X6 = X7, and (X4 = X5 OR X8 = X9 OR X17 = X18).
  - X12 = X13, X14 = X15, X19 = X20 and (X21 = yes OR X22 = yes).
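A few of the feature patterns above can be written as predicates over a feature dictionary. This sketch covers only patterns whose values survive in the slides (the patterns involving X1 are left out), treats each listed pattern as sufficient on its own, and returns every type whose pattern matches instead of choosing one. Those choices, the value strings, and the default feature values are assumptions; the original work feeds the features to an SVM and does not apply hand-written rules like these.

```python
FASL = {1, 2, 3}   # types grouped as "Fasl" in the slides
WASL = {4, 5, 6}   # types grouped as "Wasl" in the slides

# (waw type, predicate over the feature dict); a subset of the listed patterns.
RULES = [
    (1, lambda x: x["X3"] == "noun" and x["X7"] == "genitive" and x["X16"] == "no"),
    (4, lambda x: x["X16"] == "yes"),
    (5, lambda x: x["X3"] == "noun" and x["X7"] == "accusative"),
    (6, lambda x: x["X12"] == x["X13"] and x["X14"] == x["X15"]
                  and x["X19"] == x["X20"]
                  and (x["X21"] == "yes" or x["X22"] == "yes")),
]


def make_features(**overrides):
    """Hypothetical default feature values chosen so that no rule fires."""
    features = {
        "X3": "verb", "X7": "nominative", "X16": "no",
        "X12": "a", "X13": "b", "X14": "a", "X15": "b",
        "X19": "a", "X20": "b", "X21": "no", "X22": "no",
    }
    features.update(overrides)
    return features


def matching_types(features):
    """Return the sorted list of waw types whose pattern matches."""
    return sorted({waw for waw, rule in RULES if rule(features)})


def segmentation_class(waw_type):
    """Map a waw type (1 to 6) to its class as grouped in the slides."""
    return "Fasl" if waw_type in FASL else "Wasl"
```

For instance, `matching_types(make_features(X16="yes"))` returns `[4]`, which `segmentation_class` maps to "Wasl".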
THE RESULT
- The Corpus of Arabic Discourse Segmentation was used in this
experiment.
- They used 1200 instances for training and 293 for testing.
- Class Waw5 did not appear in either the training or the testing data.
- Classes Waw3 and Waw6 appeared most often.
Segmentation accuracy = 98.98%
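As a quick consistency check on the reported figure, the snippet below looks for the number of correctly classified test instances that would round to 98.98% on 293 test instances. The slides do not state that count; the check only shows which count is compatible with the two reported numbers.

```python
def accuracy(correct, total):
    """Accuracy in percent."""
    return 100.0 * correct / total


def consistent_counts(total, reported, digits=2):
    """All counts of correct instances whose accuracy rounds to the reported value."""
    return [c for c in range(total + 1)
            if round(accuracy(c, total), digits) == reported]


print(consistent_counts(293, 98.98))  # prints [290]
```

Under this reading, 290 of the 293 test instances would have been classified correctly.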
THE LEEDS ARABIC DISCOURSE TREEBANK: ANNOTATING
DISCOURSE CONNECTIVES FOR ARABIC
- The first effort toward producing an Arabic discourse treebank.
- Discourse connectives are defined as lexical expressions that relate two text
segments.
- These segments are called arguments.
- Discourse relations play an important role in producing a coherent
discourse.
- Collecting Arabic connectives:
  - They used text analysis and corpus-based techniques.
  - Connectives were manually extracted from 50 randomly selected texts from the
Penn Arabic Treebank (PATB) and from 10 different websites.
  - The resulting list was manually tested by two native speakers.
  - The final list contains 107 discourse connectives.
CONT.
- Types of relations:
CONT.
- Agreement studies:
  - The corpus: PATB
  - ADA tool and annotation process.
After annotating
METHODOLOGY
- Annotation was done by two independent native speakers of Arabic.
- Agreement is measured on two tasks:
  - Task 1: whether the annotators agree on the binary decision of
whether an item constitutes a discourse connective in context.
  - Task 2: whether the annotators agree on which discourse
relation an identified connective expresses.
THE RESULT
- Agreement on Task 1 (connective identification) is highly reliable.
- Agreement on Task 2 (relation assignment) is relatively low.
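The slides do not say which agreement measure was used. One common choice for two annotators is Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The sketch below is a plain implementation of that statistic; the label lists are hypothetical inputs, one label per annotated item.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For Task 1 the labels would be binary (connective or not); for Task 2 they would be relation names, so the same function applies to both tasks.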
MODELLING DISCOURSE RELATIONS FOR
ARABIC
- Discourse connective recognition:
  - Discourse connective recognition distinguishes between
the discourse usage and the non-discourse usage of
potential connectives.
  - Conjunctions such as /w/ (and) and /āw/ (or) can have a
discourse usage or can simply conjoin two non-abstract entities,
as in /ʿmr w sārh/ (Omar and Sarah).
CONT.
- Features:
1. Surface features (SConn).
2. Part-of-speech features (POS).
3. Lexical features of surrounding words (Lex).
4. Syntactic category of related phrases (Syn).
5. Al-Masdar feature.
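The first three feature groups can be illustrated with a small extractor over POS-tagged tokens. The feature names, the position values, and the tag strings are assumptions made for this sketch; the syntactic (Syn) and Al-Masdar features are left out because they need a parse tree and details that the slides do not give.

```python
def connective_features(tokens, i):
    """Features for the potential connective at index i.

    tokens: list of (word, pos) pairs for one sentence.
    """
    word, pos = tokens[i]
    prev_word, prev_pos = tokens[i - 1] if i > 0 else ("<s>", "<s>")
    next_word, next_pos = tokens[i + 1] if i + 1 < len(tokens) else ("</s>", "</s>")
    return {
        # Surface features (SConn): the connective string and where it occurs.
        "sconn": word,
        "sconn_position": "initial" if i == 0 else "medial",
        # Part-of-speech features (POS).
        "pos": pos,
        "pos_prev": prev_pos,
        "pos_next": next_pos,
        # Lexical features of the surrounding words (Lex).
        "lex_prev": prev_word,
        "lex_next": next_word,
    }


# /w/ conjoining two names: a non-discourse usage.
sentence = [("Omar", "NOUN"), ("w", "CONJ"), ("Sarah", "NOUN")]
features = connective_features(sentence, 1)
```

A classifier would take dictionaries like `features` as input and learn that, for example, /w/ between two nouns is usually not a discourse connective.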
RESULTS AND DISCUSSION
- Discourse relation recognition:
1. Connective features.
2. Words and POS of arguments. E.g. when the
first word of Arg2 is /qd/ (might/may) or /kān/ (had), the
relation is likely to be EXPANSION.BACKGROUND or
EXPANSION.CONJUNCTION.
3. Tense and negation.
4. Masdar.
5. Argument parent.
6. Production rules.
Performance of different models for identifying fine-grained
discourse relations on two datasets
Performance of different models for identifying
class-level discourse relations on two datasets
CONCLUSION
In this survey, we presented several approaches to annotating
discourse connectives and several segmentation techniques
for Arabic. They rely on different corpora and methods
and, accordingly, report different results.
THANKS!

 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 

Discourse annotation

  • 1. A SURVEY OF ARABIC DISCOURSE ANNOTATION. By: Abeer Al-Qahtani, Afnan Al-Moadi, Nujoud Al-Ghamdi
  • 2. INTRODUCTION
    Discourse annotation and segmentation of Arabic have become a popular area of research. The aim of this presentation is to survey and summarize some of the techniques used in discourse annotation and segmentation, and to show their methods and results.
  • 3. CLAUSE-BASED DISCOURSE SEGMENTATION OF ARABIC TEXTS
    Discourse parsing consists of two steps:
    1- discourse segmentation, which aims at identifying Elementary Discourse Units (EDUs);
    2- building the discourse structure by linking EDUs using a set of rhetorical or discursive relations.
    Arabic language characteristics:
    - It is agglutinative.
    - It does not have capital letters.
    - Diacritics are usually absent.
  • 4. METHODOLOGY
    - The analysis was carried out on two different corpus genres: news articles and elementary school textbooks.
    - The authors proposed a three-step segmentation algorithm:
      - Step 1: punctuation marks.
      - Step 2: lexical cues.
      - Step 3: a mix of punctuation marks and lexical cues.
  • 5. METHODOLOGY CONT.
    - Step 1, punctuation marks:
      [Dr. Tarak Swiden has treated various diseases.]
    - Step 2, lexical cues:
      [They will know when we start][but they don't know when we finish]
  • 6. METHODOLOGY CONT.
    - Step 3, a mix of punctuation marks and lexical cues:
      - If a comma is followed by the conjunction "و" (waw) or "ف" (fā) and then by a preposition of localization, it indicates the end of a segment.
        Example: [Like Tunisian families, her family left Marsa city,] [then, they found themselves at the wonderful Marsa’s beach.]
  • 7. METHODOLOGY CONT.
    - If a comma is followed by the conjunction "و" (waw) or "ف" (fā) and then by a possessive noun, it indicates the end of a segment.
      Example: [I saw my sister outside,] [with a talking doll]
    - If a comma is followed by a demonstrative pronoun and then by a word that is not a verb, there is no segment boundary.
      Example: [Mr. Hamed, our teacher, was standing up, looking at us.]
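The three comma rules of Step 3 can be sketched as a small lookahead check. This is a minimal illustration, not the authors' implementation: it assumes the text is already tokenized and tagged, and the tag names (`COMMA`, `CONJ`, `PREP_LOC`, `POSS_NOUN`, `DEM`, `VERB`) are hypothetical stand-ins for the Arabic word lists that the slides give for each category.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, coarse tag); the tag set is hypothetical


def comma_rule(tokens: List[Token], i: int) -> Optional[bool]:
    """Return True (segment ends here), False (no boundary) or None (no rule applies)."""
    if tokens[i][1] != "COMMA" or i + 2 >= len(tokens):
        return None
    nxt, after = tokens[i + 1][1], tokens[i + 2][1]
    # comma + waw/fa + preposition of localization or possessive noun: end of segment
    if nxt == "CONJ" and after in {"PREP_LOC", "POSS_NOUN"}:
        return True
    # comma + demonstrative pronoun + non-verb: no boundary
    if nxt == "DEM" and after != "VERB":
        return False
    return None


def segment(tokens: List[Token]) -> List[List[str]]:
    """Split a tagged token list at every comma that the rules mark as a boundary."""
    segments, current = [], []
    for i, (word, _) in enumerate(tokens):
        current.append(word)
        if comma_rule(tokens, i) is True:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


example = [("left", "VERB"), ("city", "NOUN"), (",", "COMMA"),
           ("then", "CONJ"), ("at", "PREP_LOC"), ("beach", "NOUN")]
print(segment(example))
```

Commas that match none of the rules are left undecided here; the slides do not say how such cases are handled.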
  • 9. SEMANTIC-BASED SEGMENTATION FOR ARABIC TEXT
    In this approach the aim is to divide the text into complete, meaningful parts that can stand on their own without the parts that precede or follow them.
    - Connector classification:
      - Active: words that indicate the beginning of a new segment, the end of a segment or a complete segment.
      - Passive: words that do not by themselves indicate a new segment, the end of a segment or a complete segment, but that help determine where a segment starts or ends when they occur with active elements.
  • 10. METHODOLOGY
    - Identifying the connectors that indicate complete segments (with S instances in the SegBoundary property).
    - Locating the active connectors.
    - Resolving the case where adjacent active connectors exist.
    - Setting the segment boundaries.
    - Creating the final list of segments.
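The last four steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: the connector set is a hypothetical placeholder, every active connector is assumed to open a new segment, and a run of adjacent active connectors is resolved by keeping only the first one. The slides do not specify the actual resolution policy, and the first step (connectors that form complete segments on their own) is not modelled.

```python
from typing import List, Set


def segment_by_connectors(tokens: List[str], active: Set[str]) -> List[List[str]]:
    """Open a new segment at each active connector (simplifying assumption)."""
    # Locate the active connectors.
    positions = [i for i, tok in enumerate(tokens) if tok in active]
    # Resolve adjacent active connectors: keep only the first of each run (assumption).
    starts = [p for k, p in enumerate(positions)
              if k == 0 or p != positions[k - 1] + 1]
    # Set the segment boundaries: the text start plus every retained connector.
    bounds = sorted(set([0] + starts))
    # Create the final list of segments, skipping empty ones.
    ends = bounds[1:] + [len(tokens)]
    return [tokens[b:e] for b, e in zip(bounds, ends) if tokens[b:e]]


words = "he came but then she left because it rained".split()
print(segment_by_connectors(words, {"but", "then", "because"}))
```

Passive connectors would need extra context rules on top of this, since they only matter in combination with active elements.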
  • 12. ARABIC DISCOURSE SEGMENTATION BASED ON RHETORICAL METHOD
    - This technique is derived from Arabic rhetoric.
    - It focuses on the connector Waw "و".
    - It groups the six known rhetorical types of "و" into two classes: “Fasl” and “Wasl”.
    - Classification uses a support vector machine (SVM).
    “Fasl”: types 1, 2 and 3. “Wasl”: types 4, 5 and 6.
  • 13. EXAMPLES
    - Waw 1: [Professors teach students sciences and virtue, I swear to God, they have done a great mission for their nation]
    - Waw 2: [Young people are not the only ones who suffer, but their crises are part of the crises of the whole society and someone may ask: Why have you focused only on youth and not on the divisions of the whole society?]
    - Waw 3: [Adolescents suffer from some psychological problems and there are, in general, numerous other problems in the society.]
    - Waw 4: [The teacher came smiling into the classroom.]
    - Waw 5: [The couple sat together with the light of the moon.]
    - Waw 6: [The study started and students and teachers enrolled in schools.]
  • 14. METHODOLOGY
    - Preprocessing
      - Diacritization.
      - Discriminating the connector "و" from the letter "و".
    - Feature extraction
      - 22 features are extracted to distinguish each type of "و".
    - Classification
  • 15. FEATURE EXTRACTION
    - Waw1:
      - X1 = “ ” and X7 = genitive mark.
      - X3 = noun, X7 = genitive mark and X16 = no.
    - Waw2:
      - X1 = “ ” and X7 = accusative mark.
      - X3 = noun, X5 = indefinite, X6 ≠ genitive mark and X7 = genitive mark.
    - Waw3:
      - X12 ≠ X13.
      - X14 ≠ X15.
      - X19 ≠ X20.
      - X21 = no and X22 = no.
    - Waw4:
      - X16 = yes.
      - X1 = “ ”, X10 = verb and X11 = past tense.
    - Waw5:
      - X3 = noun and X7 = accusative mark.
    - Waw6:
      - X2 = X3, X6 = X7, and (X4 = X5 OR X8 = X9 OR X17 = X18).
      - X12 = X13, X14 = X15, X19 = X20 and (X21 = yes OR X22 = yes).
  • 16. THE RESULT
    - The experiment used the Corpus of Arabic Discourse Segmentation.
    - 1200 instances were used for training and 293 for testing.
    - Class Waw5 did not appear in the training or test data.
    - Classes Waw3 and Waw6 were the most frequent.
    Segmentation accuracy = 98.98%
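As a quick sanity check on the figure above, the reported accuracy can be related to the size of the test set. The slides give only the percentage and the 293 test instances; the number of correctly classified instances computed below is an inference from those two values, not a reported result.

```python
TEST_INSTANCES = 293        # reported on the slide
REPORTED_ACCURACY = 98.98   # percent, reported on the slide


def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# Find every correct-count that rounds to the reported percentage.
consistent = [c for c in range(TEST_INSTANCES + 1)
              if abs(accuracy_percent(c, TEST_INSTANCES) - REPORTED_ACCURACY) < 1e-9]
print(consistent)
```

Under this reading, the only count consistent with the reported accuracy is 290 correct out of 293, that is, three misclassified test instances.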
  • 17. THE LEEDS ARABIC DISCOURSE TREEBANK: ANNOTATING DISCOURSE CONNECTIVES FOR ARABIC
    - The first effort toward producing an Arabic discourse treebank.
    - Discourse connectives are defined as lexical expressions that relate two text segments.
    - The segments are called arguments.
    - Discourse relations play an important role in producing a coherent discourse.
    - Collecting Arabic connectives:
      - Text analysis and corpus-based techniques were used.
      - Connectives were manually extracted from 50 randomly selected texts from the Penn Arabic Treebank (PATB) and from 10 different websites.
      - The resulting list was manually checked by two native speakers.
      - The final list contains 107 discourse connectives.
  • 18. CONT.
    - Types of relations:
  • 19. CONT.
    - Agreement studies:
      - The corpus: PATB.
      - ADA tool and annotation process.
      After annotating
  • 20. METHODOLOGY
    - Annotation was done by two independent native speakers of Arabic.
    - Agreement is measured on two tasks:
      - Task 1 measures whether the annotators agree on the binary decision of whether an item constitutes a discourse connective in context.
      - Task 2 measures whether the annotators agree on which discourse relation an identified connective expresses.
  • 21. THE RESULT
    - Agreement on Task 1 is highly reliable.
    - Agreement on Task 2 (relation assignment) is relatively low.
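The slides do not state which agreement statistic was used. One common choice for two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a generic implementation of that statistic and assumes each annotator's decisions are given as a parallel list of labels; the label values in the example are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Task 1 style input: 1 = discourse connective, 0 = not a connective (toy data).
annotator_1 = [1, 1, 0, 0, 1, 0]
annotator_2 = [1, 1, 0, 1, 1, 0]
print(cohen_kappa(annotator_1, annotator_2))
```

The same function applies to Task 2 by passing relation labels instead of binary decisions, which is where a larger label set typically pulls agreement down.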
  • 22. MODELLING DISCOURSE RELATIONS FOR ARABIC
    - Discourse connective recognition:
      - It distinguishes between the discourse usage and the non-discourse usage of potential connectives.
      - Conjunctions such as /w/ (and) and /āw/ (or) can have a discourse usage or can simply conjoin two non-abstract entities, as in /ʿmr w sārh/ (Omar and Sarah).
  • 23. CONT.
    - Features:
      1. Surface features (SConn).
      2. Part-of-speech features (POS).
      3. Lexical features of surrounding words (Lex).
      4. Syntactic category of related phrases (Syn).
      5. Al-Masdar feature.
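A feature extractor for a candidate connective might look roughly like the sketch below. It covers only the surface, POS and lexical groups from the list above; the syntactic and Al-Masdar features need a parse and morphological analysis and are left out. The feature names, the boundary markers and the POS tags are illustrative assumptions, not the feature set used in the surveyed work.

```python
from typing import Dict, List, Tuple


def connective_features(tagged: List[Tuple[str, str]], i: int) -> Dict[str, str]:
    """Build a feature dictionary for the candidate connective at index i."""
    word, pos = tagged[i]
    # Neighbouring tokens, with hypothetical sentence-boundary markers.
    prev_word, prev_pos = tagged[i - 1] if i > 0 else ("<s>", "<s>")
    next_word, next_pos = tagged[i + 1] if i + 1 < len(tagged) else ("</s>", "</s>")
    return {
        "sconn": word,                                   # surface form of the candidate
        "sconn_position": "initial" if i == 0 else "medial",
        "pos": pos,                                      # POS of the candidate
        "prev_pos": prev_pos,                            # POS context
        "next_pos": next_pos,
        "prev_word": prev_word,                          # lexical context
        "next_word": next_word,
    }


# Non-discourse use of /w/ from the slides: it conjoins two names.
sentence = [("Omar", "NOUN"), ("w", "CONJ"), ("Sarah", "NOUN")]
print(connective_features(sentence, 1))
```

A classifier trained on such dictionaries could learn, for instance, that /w/ between two nouns tends to be a non-discourse use.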
  • 25. Discourse relation recognition:
      1. Connective features.
      2. Words and POS of arguments. E.g. when the first word of Arg2 is /qd/ (might/may) or /kān/ (had), the relation is likely to be EXPANSION.BACKGROUND or EXPANSION.CONJUNCTION.
      3. Tense and negation.
      4. Masdar.
      5. Argument parent.
      6. Production rules.
  • 26. Performance of different models for identifying fine-grained discourse relations on two datasets.
    Performance of different models for identifying class-level discourse relations on two datasets.
  • 27. CONCLUSION
    In this survey we presented several approaches to annotating discourse connectives and several segmentation techniques for Arabic. They rely on different corpora and methods and, accordingly, report different results.