Addaall Arabic Search Engine: Improving Search based on Combination of
      Morphological Analysis and Generation Considering Semantic Patterns

   Mamoun Hattab*, Bassam Haddad†, Mustafa Yaseen‡, Asem Duraidi* and A. Abu Shmais*

                                             *Arabic Textware Inc., Amman-Jordan
                                        {m.hattab, a.duraidi, a.abushmais}@arabtext.ws
                              †University of Petra, Department of Computer Science, Amman-Jordan
                                                      haddad@uop.edu.jo
                              ‡Amman University, Department of Computer Science, Amman-Jordan
                                                    myaseen@ammanu.edu.jo



                                                               Abstract
This paper addresses some issues involved in utilizing different levels of Arabic morphological knowledge to improve the
search process in a search engine. The different levels of morphological knowledge can be considered as an incremental process
that moves to the next sensitive word pattern, with the aim of achieving higher recall and precision.


1. Introduction

This paper attempts to demonstrate how utilizing different levels of Arabic morphological knowledge
can improve the search process in a search engine. We show how combining morphological analysis and
generation can be used to retrieve information based on a semi-semantic linguistic search in Arabic.
The technique extends the use of roots and stems into a method for categorizing morphological
patterns into semantic groups, and uses a morphological generator to produce the different
inflections of a word, which are retrieved according to their semantic distance from the word.

Google's approach is primarily based on matching the user's keywords against the words under which
the texts are indexed (Brin S. and Page L., 1998). Words are treated as sets of symbols, not as
words with meanings to human users. While plain linguistic measures, such as stemming and
structured data search, are used to enhance results, Google and similar search engines are by
nature still "Symbolic Computing" machines.

In the third generation of search engines, Natural Language Processing technologies are applied
extensively, because the search is seen in the first place as a language understanding process.
This approach to search is paradigmatically different and a level higher in terms of system
difficulty and complexity. It offers more accurate and consistent search results than
second-generation search engines through its intelligence in language understanding. Many
researchers have considered this aspect (Abuleil and Alsamra, 2004; Chen and Gey, 2002;
Aljlayl M., 2002).

For a language like Arabic, where the structure of a word can change according to many factors
while maintaining the same meaning, "Symbolic Computing" does not retrieve an accurate set of
results, and there is an immense need for a linguistic approach to handle the search process
(Brin S. and Page L., 1998).

The Addaall [1] Arabic Search Engine, built by Arabic Textware Inc. [2], utilizes a morphological
analyzer and generator to construct different indices based on both the root and the stem of a
word. Retrieving information based on the root floods the user with a complete but less relevant
set of results, while stem-based search not only retrieves fewer results but also yields a less
accurate set.

[1] http://www.addaall.com
[2] http://www.arabtext.ws/

2. Morphological Knowledge implies Syntactic and Semantic Indicators

Morphological analysis produces much information that is useful in the next steps of processing,
such as the syntactic and the semantic analysis. Some of this information is:

     The morphological type of a word
     By "morphological type" we mean determining whether the word is a noun, verb, or particle,
     and which subtype of nouns, verbs, or particles it belongs to. The syntactic function of a
     word is highly dependent on its morphological type. For example, the syntactic function of
     a "subject" requires that the morphological type be a noun. The morphological type also
     provides the semantic analysis with the semantic properties of each word. For example,
     determining (/فَعّال/, fa''āl) to be an exaggeration form tells that the meaning is much
     emphasized.

     The Determiner
     By "determiner" we mean any indeclinable noun, verb, determiner or particle, in addition to
     the affixes. The variety of syntactic functions that determiners can perform is wide. The
     preposition (/في/, fī, in) indicates that the word that follows is a noun in the genitive
     (Haddad B., 2007). Even a letter such as (/و/, wāw) in (/مُرسِلو/), which is a pronoun, tells
     that the whole word is a sound masculine plural, which affects the whole sentence.
     One example of how determiners benefit the semantic analysis is the prefix (/ال/, 'l) [3],
     which tells that the word that follows is a definite noun. Furthermore, the states of
     indefiniteness and duality, which can be extracted from the morphology, can also be
     considered a source of semantic knowledge, such as unique quantification. Another example
     is the prefix (/س/, s), which shifts the meaning of a present-tense verb to the future.

     The Morphological Pattern of a word
     Although it is not obvious to everyone, the patterns of Arabic words carry syntactic and
     semantic content. The pattern (/فَعُلَ/) of the past verb tells that the verb is intransitive,
     which means that the sentence is complete without an object. An example of the semantic
     benefit is the meaning of request conveyed by the pattern (/استفعل/) and its derivatives.

     The root
     All roots have a semantic aspect, because the root in Arabic is the part that contains the
     meaning; it is the core of the language. Some roots, however, also have syntactic
     properties, like (/حسب/), which tells that the verb needs two objects.

[3] Please note that a deep semantic analysis considering the logical compositionality of
determiners such as (/ال/, 'l) is out of the scope of this paper. Determiners in the logical sense
(Haddad B., 2007) represent quantifiers and a special case of the Generalized Arabic Quantifier
(GAQ), such as:

\[
\left[\begin{array}{l} (/\text{معظم-ال}/,\ \textit{most-of-the}) \\ \mathrm{CAT}\ \ \mathrm{DET}_{Arab} \end{array}\right]_{sem}
\equiv \lambda R.\lambda S.\,(/\text{معظم-ال}(x)/,\ \textit{most-of-the}(x)).(R, S),
\quad \text{where } |R \cap S| > |R - S|,
\]

whereas in this context (/ال/, 'l) can be regarded as a numeric quantifier:

\[
\left[\begin{array}{l} (/\text{ال}/,\ \textit{The}) \\ \mathrm{CAT}\ \ \mathrm{DET} \\ \mathrm{AG}\ \ [\,\mathrm{NUM}\ \ sing\,] \end{array}\right]_{sem}
\ \text{as}\ \ \left[ (/\text{ال}_1/,\ \textit{The}_1) \right]_{sem},
\]

where

\[
\left[ (/\text{ال}_1/,\ \textit{The}_1) \right]_{sem}
\equiv \lambda P.\lambda Q.\,\exists x.\,(\forall y\,(P(y) \Leftrightarrow x = y) \wedge Q(x))
\]

2.1 Morphological Levels for Search

To measure the closeness between the meaning of the word being searched for and the suggested
alternative words, we adopted the concept of utilizing different levels of morphological
knowledge. The lowest level denotes the strongest relationship between the original word and the
alternatives.

In the Zero Level, only the identical word is considered in the search process.

Level One:
   A. For the verbs: verb inflections referring to the same tense (e.g. ضربَ / ضربوا / ضربتم).
   B. For the derivative nouns: its grammatical states
      (قائلٌ، قائلاً، قائلٍ، القائلُ، القائلَ، القائلِ).
   C. For the gerund: its grammatical states (قولٌ، قوْلاً، قوْلٍ).

Level Two:
   A. For the verb: verb inflections referring to all tenses
      (e.g. ضربَ / يضرب / اضرب / تضربان ...).
   B. For the derivative noun: its grammatical and morphological states
      (قائلٌ / قائلةٌ / قائلان / قائلين، قائلاتٌ، قائلاتٍ، القائلون، القائلاتُ، القائلاتِ).
   C. For the gerund: its grammatical and morphological states
      (قولٌ، قوْلاً، قوْلٍ، القوْلُ، القوْلَ، القوْلِ).

Level Three:
   A. Connecting the verb only to its gerund and vice versa (e.g. ضرَبَ / ضرْب / الضربِ).
   B. Connecting the derivative noun to its corresponding morphological counterparts that
      share the same morphological category. For example, connecting exaggeration patterns
      to each other, such as (قوّالٌ / مِقوال ...).

Level Four:
   A. Connecting the verb to its gerund and its derivative nouns and vice versa
      (e.g. استقال / مستقيل / مستقال).
   B. Connecting the gerunds and the derivative nouns to each other
      (مستفيد / مستفاد / استفادة).
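Levels Zero to Four can be read as progressively wider sets of alternatives around the query word. The following is a minimal sketch of that reading, not the Addaall implementation: the `ALTERNATIVES` table is a hypothetical, hand-written stand-in for the output of a morphological generator, the names `Level` and `expand` are illustrative, and treating each level as including all lower ones is an assumption. The forms of the verb ضرب are unvocalized for simplicity.

```python
from enum import IntEnum


class Level(IntEnum):
    """Search levels, ordered from the closest to the most distant alternatives."""
    ZERO = 0   # the identical word only
    ONE = 1    # verb inflections within the same tense
    TWO = 2    # verb inflections across all tenses
    THREE = 3  # the verb connected to its gerund
    FOUR = 4   # the verb connected to its derivative nouns as well


# Hypothetical, hand-written table standing in for a morphological generator.
# The unvocalized gerund coincides with the verb, so only its definite form is listed.
ALTERNATIVES = {
    "ضرب": {
        Level.ZERO: ["ضرب"],
        Level.ONE: ["ضربوا", "ضربتم", "ضربا", "ضربت"],
        Level.TWO: ["يضرب", "اضرب", "تضربان"],
        Level.THREE: ["الضرب"],
        Level.FOUR: ["ضارب", "مضروب", "مضروبتين"],
    }
}


def expand(word: str, max_level: Level) -> list[str]:
    """Return the query word plus all alternatives up to max_level, closest first."""
    by_level = ALTERNATIVES.get(word, {})
    forms = [word]
    for level in Level:
        if level > max_level:
            break
        for form in by_level.get(level, []):
            if form not in forms:  # keep each form once, at its closest level
                forms.append(form)
    return forms


if __name__ == "__main__":
    # Show how the candidate set grows as the level is raised.
    for level in Level:
        print(level.name, len(expand("ضرب", level)))
```

Because the result is ordered from the closest level outwards, a caller could stop at any level or rank hits by the level at which the matching form was introduced. A word missing from the table simply expands to itself.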
And finally there is the Root Level, representing the broadest possible search in the context of
semantic dependency.

3. Enhancing Search based on different levels of Morphological Knowledge

To enhance the quality of results by minimizing the number of results while maintaining more
accurate relevancy and precision, a hybrid approach combining morphological analysis and
generation was applied. In this approach a query goes through the following processes:
         Morpho-syntactic analysis to define its root/stem and part of speech.
         Morphological generation to produce the nearest inflections to that word, based on a
         semantic categorization of the different patterns that may apply to that root/stem.

When you search based on a word's root, you are sure to get all of the relevant results, but
together with irrelevant ones: although each retrieved result includes at least one inflection of
the word, this inflection is not necessarily relevant.

This feature of the Arabic language cannot be controlled without a lexicon that stores semantic
knowledge. In that case, all the words would be defined with their semantic relationships to other
words in the lexicon, extending the possibilities to cover not only inflections but also synonyms
and even related conceptual and ontological knowledge.

In the case of stem search, stemming techniques for Arabic are still not very well defined. The
definition of a stem in Arabic is often confused with that of the root. Many researchers even
argue that if the stem has a different meaning than the root, then it does not apply to the Arabic
language. They consider it a foreign concept.

When applying stemming to an Arabic word, we mean that we remove prefixes and suffixes from that
word. What is left is a partial word that in many cases has no meaning and does not exist in the
dictionary as it is. We then use this partial word to do a partial search, looking for words that
include this part.

The outcome of such a search has fewer results than the search by root. But how significant are
these results in terms of precision and recall? Because the search is driven by a partial word, we
believe that Arabic stem search is neither accurate nor comprehensive.

This led us to suggest a method to improve Arabic search techniques by using the morphological
generator. In order to do so, we classified the morphological patterns of Arabic semantically
according to their meanings. For each word we want to search for, the most morphologically and
semantically related word forms are generated and become the subject of the improved search
process.

An Example:
If the user is searching for (/ضَرَبَ/, hits), then we notice the following:

At the same (zero) level, (/ضَرَبَ/, hits) is the only suggested alternative.
At the next level, the first one, the suggested alternatives are
(ضَرَبَ / ضَرَبوا / ضَرَبْتم / ضَرَبَا / ضَرَبْتَ … etc.)
At the second level, the suggested candidates are (ضربَ / يضرب / اضرب / تضربان ... etc.)
At the third level, the suggested alternatives are (ضرَبَ / ضرْب / الضربِ … etc.)
At the fourth level, the suggested alternatives are (ضَرَبَ / ضارب / مضروب / مضروبتيْن … etc.)
At the last level, the suggested alternatives are all the nouns and verbs that have the root
(ضَرَبَ).

4. Conclusion and Outlook

In this paper, we have tried to stress the importance of utilizing different levels of
morphological knowledge for improving the quality and quantity of the results of the Addaall
Arabic Search Engine. The results are very promising, as the coverage and the sensitivity of this
approach have proven practical (see the Search Engine at http://www.addaall.com).

Furthermore, our approach has considered the concept of categorizing Arabic patterns according to
their meanings, whereby some semi-semantic information has been useful in the search process. In
this context, we have established a matrix correlating each pattern of speech to semantically
related patterns. The different levels of morphological knowledge can be considered as an
incremental process that moves to the next sensitive word pattern, i.e. to increase the recall,
and as a predictive and heuristic process to increase the precision, as heuristically related
words might occur at a near distance.

Finally, although we understand that this approach produces a relative and not a complete set of
results, we can see that it in fact represents a significant improvement for Arabic search
engines in view of its practical and pragmatic use in real implementations.

References

Abuleil S. and Alsamra K. (2004). New Technique to Support Arabic Noun Morphology: Arabic Noun
   Classifier System (ANCS). International Journal of Computer Processing of Oriental Languages,
   Vol. 17, Issue 2.

Chen A. and Gey F. (2002). Building an Arabic stemmer for information retrieval. The Eleventh
   Text Retrieval Conference (TREC 2002), National Institute of Standards and Technology (NIST).

Aljlayl M. (2002). Arabic Search: Improving the Retrieval
    Effectiveness via a Light Stemming Approach. ACM
    Eleventh Conference on Information and Knowledge
    Management.

Brin S. and Page L. (1998). The anatomy of a large
   hypertextual web search engine. Web publication,
   Stanford University.

Haddad B. (2007). Semantic Representation of Arabic: A Logical Approach towards Compositionality
   and Generalized Arabic Quantifiers. International Journal of Computer Processing of Oriental
   Languages, Vol. 20, Nr. 1.
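
The stem search discussed in Section 3 (strip the affixes, then look for every word that contains what is left) can be contrasted with a search for explicitly generated forms. The sketch below is only an illustration under stated assumptions: the affix lists, the toy vocabulary and the function names `light_stem`, `partial_search` and `generated_search` are hypothetical, and far smaller than anything a real Arabic stemmer or the Addaall engine would use.

```python
# Hypothetical, deliberately tiny affix lists; a real light stemmer uses many more.
PREFIXES = ("ال", "و", "س")
SUFFIXES = ("وا", "تم", "ون", "ات")
MIN_STEM_LENGTH = 3  # assumption: never strip a word below three letters


def light_stem(word: str) -> str:
    """Strip at most one listed prefix and one listed suffix from the word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LENGTH:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            word = word[:-len(suffix)]
            break
    return word


def partial_search(stem: str, vocabulary: list[str]) -> list[str]:
    """Stem search: every indexed word that contains the stem as a substring."""
    return [term for term in vocabulary if stem in term]


def generated_search(forms: list[str], vocabulary: list[str]) -> list[str]:
    """Generation-based search: only indexed words equal to a generated form."""
    wanted = set(forms)
    return [term for term in vocabulary if term in wanted]


# Toy vocabulary of unvocalized index terms (hypothetical).
VOCABULARY = ["ضرب", "ضربوا", "يضرب", "مضرب", "إضراب", "كتب"]

if __name__ == "__main__":
    stem = light_stem("ضربوا")
    print(len(partial_search(stem, VOCABULARY)))
    print(len(generated_search(["ضرب", "ضربوا", "ضربتم"], VOCABULARY)))
```

In this toy vocabulary the substring match returns مضرب, which was never proposed as an alternative, and misses إضراب although it shares the root. The generated-form search returns only forms that were explicitly produced, so its reach is controlled by whatever set of forms is passed in.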




A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
Gwc 2016
Gwc 2016Gwc 2016
Gwc 2016
 
Assignment on morphology
Assignment on morphologyAssignment on morphology
Assignment on morphology
 
CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMER
 
Deterministic Finite State Automaton of Arabic Verb System: A Morphological S...
Deterministic Finite State Automaton of Arabic Verb System: A Morphological S...Deterministic Finite State Automaton of Arabic Verb System: A Morphological S...
Deterministic Finite State Automaton of Arabic Verb System: A Morphological S...
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
Assessing grammar report
Assessing grammar reportAssessing grammar report
Assessing grammar report
 
Developmental reading
Developmental readingDevelopmental reading
Developmental reading
 
