SlideShare a Scribd company logo
1 of 12
Download to read offline
UNIBA: Super Sense Tagging at EVALITA 2011

                                           Pierpaolo Basile

                                           basilepp@di.uniba.it
                                    Department of Computer Science
                                  University of Bari “A. Moro” (ITALY)




                                   EVALITA 2011, Rome, 24-25 January 2012



Pierpaolo Basile (basilepp@di.uniba.it)         SST Uniba                24/01/2012   1 / 12
Motivation


Motivation




         Super Sense Tagging as sequence labelling problem [1]
         Supervised approach
                 lexical/linguistic features
                 distributional features
         Main motivation: tackle the data sparseness problem using word
         similarity in a WordSpace




Pierpaolo Basile (basilepp@di.uniba.it)         SST Uniba        24/01/2012   2 / 12
WordSpace


WordSpace




                You shall know a word
                by the company it
                keeps! [5]
                Words are represented as
                points in a geometric
                space
                Words are related if they
                are close in that space




Pierpaolo Basile (basilepp@di.uniba.it)         SST Uniba   24/01/2012   3 / 12
WordSpace   Random Indexing


WordSpace: Random Indexing


Random Indexing [4]
         builds WordSpace using document as context
         no matrix factorization required
         word-vectors are inferred using an incremental strategy
            1    a random vector is assigned to each context
                         sparse, high-dimensional and ternary ({-1, 0, 1})
                         a small number of randomly distributed non-zero elements
            2    random vectors are accumulated incrementally by analyzing contexts in
                 which terms occur
                         word-vector assigned to each word is the sum of the random vectors of
                         the contexts in which the term occur




Pierpaolo Basile (basilepp@di.uniba.it)         SST Uniba                      24/01/2012   4 / 12
WordSpace   Random Indexing


WordSpace: Random Indexing

Formally Random Indexing is based on Random Projection [2]

                                          An,m · R m,k = B n,k      k<m                    (1)

where An,m is, for example, a term-doc matrix




After projection the distance between points is preserved: d = c· dr

Pierpaolo Basile (basilepp@di.uniba.it)               SST Uniba               24/01/2012   5 / 12
WordSpace     Random Indexing


WordSpace: context

        Two WordSpaces using a
        different definition of context
                Wikipediap : a random                            Table: WordSpaces info
                vector is assigned to
                each Wikipedia page
                                                              WordSpace            C         D
                Wikipediac : a random
                vector is assigned to                         Wikipediap    1,617,449     4,000
                each Wikipedia                                Wikipediac       98,881     1,000
                category
                        categories can identify
                        more general concepts                     C=number of contexts
                        in the same way of                        D=vector dimension
                        super-senses



Pierpaolo Basile (basilepp@di.uniba.it)           SST Uniba                       24/01/2012   6 / 12
Methodology


Methodology



         Learning method: LIBLINEAR (SVM) [3]
         Features
            1    word, lemma, PoS-tag, the first letter of the PoS-tag
            2    the super-sense assigned to the most frequent sense of the word
                 computed according to sense frequency in MultiSemCor
            3    word starts with an upper-case character
            4    grammatical conjugation (e.g. -are, -ere and -ire for Italian verbs)
            5    distributional features: word-vector in the WordSpace




Pierpaolo Basile (basilepp@di.uniba.it)           SST Uniba                 24/01/2012   7 / 12
Evaluation


Evaluation




                                          Table: Results of the evaluation
                   System                       A             P        R        F
                   close                     0.8696        0.7485   0.7583   0.7534
                   no distr feat             0.8822        0.7728   0.7818   0.7773
                   Wikipediac                0.8877        0.7719   0.8020   0.7866
                   Wikipediap                0.8864        0.7700   0.7998   0.7846




Pierpaolo Basile (basilepp@di.uniba.it)                 SST Uniba                24/01/2012   8 / 12
Final Remarks


Final Remarks




         Main motivation: distributional features tackle data sparseness
         problem in SST task
                 increment in recall proves our idea
         Further work: try a different supervised approach more suitable for
         sequence labelling task
                 in this first attempt we are not interested in the learning method
                 performance itself




Pierpaolo Basile (basilepp@di.uniba.it)            SST Uniba              24/01/2012   9 / 12
Final Remarks




                               That’s all folks!




Pierpaolo Basile (basilepp@di.uniba.it)            SST Uniba   24/01/2012   10 / 12
Final Remarks


For Further Reading I


        Ciaramita, M., Altun, Y.: Broad-coverage sense disambiguation and
        information extraction with a supersense sequence tagger. In:
        Proceedings of the 2006 Conference on Empirical Methods in Natural
        Language Processing. pp. 594–602. Association for Computational
        Linguistics (2006)
        Dasgupta, S., Gupta, A.: An elementary proof of a theorem of
        Johnson and Lindenstrauss. Random Structures & Algorithms 22(1),
        60–65 (2003)
        Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: Liblinear: A library
        for large linear classification. The Journal of Machine Learning
        Research 9, 1871–1874 (2008)



Pierpaolo Basile (basilepp@di.uniba.it)            SST Uniba     24/01/2012   11 / 12
Final Remarks


For Further Reading II



        Sahlgren, M.: An introduction to random indexing. In: Methods and
        Applications of Semantic Indexing Workshop at the 7th International
        Conference on Terminology and Knowledge Engineering, TKE. vol. 5
        (2005)
        Sahlgren, M.: The Word-Space Model: Using distributional analysis to
        represent syntagmatic and paradigmatic relations between words in
        high-dimensional vector spaces. Ph.D. thesis, Stockholm: Stockholm
        University, Faculty of Humanities, Department of Linguistics (2006)




Pierpaolo Basile (basilepp@di.uniba.it)            SST Uniba   24/01/2012   12 / 12

More Related Content

Similar to Sst evalita2011 basile_pierpaolo

Semantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course EquivalenciesSemantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course EquivalenciesBeibei Yang
 
2.7.king
2.7.king2.7.king
2.7.kingafacct
 
AsiaCALL 2017 presentation
AsiaCALL 2017 presentationAsiaCALL 2017 presentation
AsiaCALL 2017 presentationTakeshi Sato
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for LexicographyLeiden University
 
Verbal or written metadiscourse presentation-mexico
Verbal or written metadiscourse presentation-mexicoVerbal or written metadiscourse presentation-mexico
Verbal or written metadiscourse presentation-mexicoRolf K. Baltzersen
 
CLaSIC 2016 presentation
CLaSIC 2016 presentationCLaSIC 2016 presentation
CLaSIC 2016 presentationTakeshi Sato
 
Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...
Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...
Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...Ashish Gupta
 
Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...
Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...
Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...Antonio Lieto
 
Expanding the metadiscourse concept
Expanding the metadiscourse conceptExpanding the metadiscourse concept
Expanding the metadiscourse conceptRolf K. Baltzersen
 
Multiple representations talk, Middlesex University. February 23, 2018
Multiple representations talk, Middlesex University. February 23, 2018Multiple representations talk, Middlesex University. February 23, 2018
Multiple representations talk, Middlesex University. February 23, 2018University of Huddersfield
 
Text Structures and Thinking Maps
Text Structures and Thinking MapsText Structures and Thinking Maps
Text Structures and Thinking MapsMichael Reginald
 
CALL 2012 conference presentation
CALL 2012 conference presentationCALL 2012 conference presentation
CALL 2012 conference presentationTakeshi Sato
 
GloCALL 2013 conference presentation
GloCALL 2013 conference presentationGloCALL 2013 conference presentation
GloCALL 2013 conference presentationTakeshi Sato
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfdevangmittal4
 
Communicative-discursive models and cognitive linguistics
Communicative-discursive models and cognitive linguisticsCommunicative-discursive models and cognitive linguistics
Communicative-discursive models and cognitive linguisticsalaidarindira0202
 
Towards Scientific Collaboration in a Semantic Wiki
Towards Scientific Collaboration in a Semantic WikiTowards Scientific Collaboration in a Semantic Wiki
Towards Scientific Collaboration in a Semantic WikiChristoph Lange
 
Vocabulary Instruction in Science_Burzynski.pptx
Vocabulary Instruction in Science_Burzynski.pptxVocabulary Instruction in Science_Burzynski.pptx
Vocabulary Instruction in Science_Burzynski.pptxTcherReaQuezada
 

Similar to Sst evalita2011 basile_pierpaolo (20)

Semantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course EquivalenciesSemantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course Equivalencies
 
2.7.king
2.7.king2.7.king
2.7.king
 
AsiaCALL 2017 presentation
AsiaCALL 2017 presentationAsiaCALL 2017 presentation
AsiaCALL 2017 presentation
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for Lexicography
 
Semeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic SimilaritySemeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic Similarity
 
Verbal or written metadiscourse presentation-mexico
Verbal or written metadiscourse presentation-mexicoVerbal or written metadiscourse presentation-mexico
Verbal or written metadiscourse presentation-mexico
 
CLaSIC 2016 presentation
CLaSIC 2016 presentationCLaSIC 2016 presentation
CLaSIC 2016 presentation
 
Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...
Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...
Towards Learning a Semantically Relevant Dictionary for Visual Category Recog...
 
esbb presentation
esbb presentationesbb presentation
esbb presentation
 
Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...
Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...
Cognitive Agents with Commonsense - Invited Talk at Istituto Italiano di Tecn...
 
Expanding the metadiscourse concept
Expanding the metadiscourse conceptExpanding the metadiscourse concept
Expanding the metadiscourse concept
 
Multiple representations talk, Middlesex University. February 23, 2018
Multiple representations talk, Middlesex University. February 23, 2018Multiple representations talk, Middlesex University. February 23, 2018
Multiple representations talk, Middlesex University. February 23, 2018
 
Text Structures and Thinking Maps
Text Structures and Thinking MapsText Structures and Thinking Maps
Text Structures and Thinking Maps
 
CALL 2012 conference presentation
CALL 2012 conference presentationCALL 2012 conference presentation
CALL 2012 conference presentation
 
Design-based Research
Design-based ResearchDesign-based Research
Design-based Research
 
GloCALL 2013 conference presentation
GloCALL 2013 conference presentationGloCALL 2013 conference presentation
GloCALL 2013 conference presentation
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
 
Communicative-discursive models and cognitive linguistics
Communicative-discursive models and cognitive linguisticsCommunicative-discursive models and cognitive linguistics
Communicative-discursive models and cognitive linguistics
 
Towards Scientific Collaboration in a Semantic Wiki
Towards Scientific Collaboration in a Semantic WikiTowards Scientific Collaboration in a Semantic Wiki
Towards Scientific Collaboration in a Semantic Wiki
 
Vocabulary Instruction in Science_Burzynski.pptx
Vocabulary Instruction in Science_Burzynski.pptxVocabulary Instruction in Science_Burzynski.pptx
Vocabulary Instruction in Science_Burzynski.pptx
 

More from Pierpaolo Basile

Diachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisionsDiachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisionsPierpaolo Basile
 
Come l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storiaCome l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storiaPierpaolo Basile
 
EVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language gamesEVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language gamesPierpaolo Basile
 
Buon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian TweetsBuon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian TweetsPierpaolo Basile
 
Detecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexingDetecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexingPierpaolo Basile
 
Bi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence LabelingBi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence LabelingPierpaolo Basile
 
INSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street FighterINSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street FighterPierpaolo Basile
 
QuestionCube DigithON 2017
QuestionCube DigithON 2017QuestionCube DigithON 2017
QuestionCube DigithON 2017Pierpaolo Basile
 
Diachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google NgramDiachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google NgramPierpaolo Basile
 
La macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing MachineLa macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing MachinePierpaolo Basile
 
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...Pierpaolo Basile
 
Building WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spacesBuilding WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spacesPierpaolo Basile
 
Analysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random IndexingAnalysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random IndexingPierpaolo Basile
 
A Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional SpacesA Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional SpacesPierpaolo Basile
 
Exploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question AnsweringExploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question AnsweringPierpaolo Basile
 
AI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHOAI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHOPierpaolo Basile
 
Encoding syntactic dependencies by vector permutation
Encoding syntactic dependencies by vector permutationEncoding syntactic dependencies by vector permutation
Encoding syntactic dependencies by vector permutationPierpaolo Basile
 

More from Pierpaolo Basile (19)

Diachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisionsDiachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisions
 
Come l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storiaCome l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storia
 
EVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language gamesEVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language games
 
Buon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian TweetsBuon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian Tweets
 
Detecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexingDetecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexing
 
Bi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence LabelingBi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
 
INSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street FighterINSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
 
QuestionCube DigithON 2017
QuestionCube DigithON 2017QuestionCube DigithON 2017
QuestionCube DigithON 2017
 
Diachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google NgramDiachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google Ngram
 
Diachronic Analysis
Diachronic AnalysisDiachronic Analysis
Diachronic Analysis
 
(Open) data hacking
(Open) data hacking(Open) data hacking
(Open) data hacking
 
La macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing MachineLa macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing Machine
 
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
 
Building WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spacesBuilding WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spaces
 
Analysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random IndexingAnalysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random Indexing
 
A Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional SpacesA Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional Spaces
 
Exploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question AnsweringExploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question Answering
 
AI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHOAI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHO
 
Encoding syntactic dependencies by vector permutation
Encoding syntactic dependencies by vector permutationEncoding syntactic dependencies by vector permutation
Encoding syntactic dependencies by vector permutation
 

Recently uploaded

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Recently uploaded (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Sst evalita2011 basile_pierpaolo

  • 1. UNIBA: Super Sense Tagging at EVALITA 2011 Pierpaolo Basile basilepp@di.uniba.it Department of Computer Science University of Bari “A. Moro” (ITALY) EVALITA 2011, Rome, 24-25 January 2012 Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 1 / 12
  • 2. Motivation Motivation Super Sense Tagging as sequence labelling problem [1] Supervised approach lexical/linguistic features distributional features Main motivation: tackle the data sparseness problem using word similarity in a WordSpace Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 2 / 12
  • 3. WordSpace WordSpace You shall know a word by the company it keeps! [5] Words are represented as points in a geometric space Words are related if they are close in that space Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 3 / 12
  • 4. WordSpace Random Indexing WordSpace: Random Indexing Random Indexing [4] builds WordSpace using document as context no matrix factorization required word-vectors are inferred using an incremental strategy 1 a random vector is assigned to each context sparse, high-dimensional and ternary ({-1, 0, 1}) a small number of randomly distributed non-zero elements 2 random vectors are accumulated incrementally by analyzing contexts in which terms occur word-vector assigned to each word is the sum of the random vectors of the contexts in which the term occur Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 4 / 12
  • 5. WordSpace Random Indexing WordSpace: Random Indexing Formally Random Indexing is based on Random Projection [2] An,m · R m,k = B n,k k<m (1) where An,m is, for example, a term-doc matrix After projection the distance between points is preserved: d = c· dr Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 5 / 12
  • 6. WordSpace Random Indexing WordSpace: context Two WordSpaces using a different definition of context Wikipediap : a random Table: WordSpaces info vector is assigned to each Wikipedia page WordSpace C D Wikipediac : a random vector is assigned to Wikipediap 1,617,449 4,000 each Wikipedia Wikipediac 98,881 1,000 category categories can identify more general concepts C=number of contexts in the same way of D=vector dimension super-senses Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 6 / 12
  • 7. Methodology Methodology Learning method: LIBLINEAR (SVM) [3] Features 1 word, lemma, PoS-tag, the first letter of the PoS-tag 2 the super-sense assigned to the most frequent sense of the word computed according to sense frequency in MultiSemCor 3 word starts with an upper-case character 4 grammatical conjugation (e.g. -are, -ere and -ire for Italian verbs) 5 distributional features: word-vector in the WordSpace Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 7 / 12
  • 8. Evaluation Evaluation Table: Results of the evaluation System A P R F close 0.8696 0.7485 0.7583 0.7534 no distr feat 0.8822 0.7728 0.7818 0.7773 Wikipediac 0.8877 0.7719 0.8020 0.7866 Wikipediap 0.8864 0.7700 0.7998 0.7846 Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 8 / 12
  • 9. Final Remarks Final Remarks Main motivation: distributional features tackle data sparseness problem in SST task increment in recall proves our idea Further work: try a different supervised approach more suitable for sequence labelling task in this first attempt we are not interested in the learning method performance itself Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 9 / 12
  • 10. Final Remarks That’s all folks! Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 10 / 12
  • 11. Final Remarks For Further Reading I Ciaramita, M., Altun, Y.: Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. pp. 594–602. Association for Computational Linguistics (2006) Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms 22(1), 60–65 (2003) Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: Liblinear: A library for large linear classification. The Journal of Machine Learning Research 9, 1871–1874 (2008) Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 11 / 12
  • 12. Final Remarks For Further Reading II Sahlgren, M.: An introduction to random indexing. In: Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE. vol. 5 (2005) Sahlgren, M.: The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. thesis, Stockholm: Stockholm University, Faculty of Humanities, Department of Linguistics (2006) Pierpaolo Basile (basilepp@di.uniba.it) SST Uniba 24/01/2012 12 / 12