“Al. I. Cuza”, University of Iasi, Romania

Faculty of Computer Science




                           Adrian Iftene, Alexandru Lucian Gînscă




       ICCCC 2012, 8-12 May, Băile Felix, Oradea, Romania
   System overview
   Data acquisition
   Topic detection
   Data processing
   Identification of opinions
   Results
   Visualization
   Conclusions



   Scenario: Street protests in Romania (between 13
    and 26 January, 2012)

   Crawler component, RSS feeds

   Scraping: removed links, photos, menus, special
    characters

   Data locally stored
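
The scraping step above (removing links, photos, menus and special characters) can be sketched with Python's standard library alone. This is an illustration, not the authors' crawler: the class name `ArticleTextExtractor`, the list of skipped tags and the set of characters kept are all assumptions.

```python
import re
from html.parser import HTMLParser

# Tags whose content is dropped entirely (assumed list: links, menus, scripts).
SKIPPED_TAGS = {"a", "nav", "script", "style"}


class ArticleTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Images carry no text data, so they disappear without special handling.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return " ".join(self._chunks)


def clean_article(html):
    """Return plain text with links, menus, images and special characters removed."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    # Keep letters (including Romanian diacritics), digits, whitespace
    # and basic punctuation; replace everything else with a space.
    text = re.sub(r"[^\w\s.,;:!?'\"()-]", " ", parser.text())
    return re.sub(r"\s+", " ", text).strip()
```

The cleaned string is what would then be stored locally for the later processing steps.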



   The topic is very important in detecting articles
    referring to a crisis situation

   Latent Dirichlet Allocation: state-of-the-art topic model

   Problems:
     • The number of topics needs to be specified in advance
     • The results are lists of representative words for each topic, so
       human intervention is needed to interpret them

   Solution: WordNet-based similarity measures
     • Wu and Palmer
     • Lin
     • Resnik (best results)

   Computing the similarity between 2 sets of words




T1, T2 = two sets of words.
sim(t1, t2) = one of the Wu and Palmer, Resnik or Lin similarity measures.
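
The formula from the original slide did not survive extraction. The sketch below shows one common way to lift a word-level measure sim(t1, t2) to two word sets T1 and T2: a symmetric average of best matches. This aggregation is an assumption and may differ from the authors' formula. `toy_sim` is a hypothetical stand-in for a WordNet-based measure (Wu and Palmer, Resnik or Lin), used only so the example runs without external resources.

```python
from typing import Callable, Iterable


def set_similarity(T1: Iterable[str], T2: Iterable[str],
                   sim: Callable[[str, str], float]) -> float:
    """Symmetric average of best-match word similarities (assumed aggregation)."""
    T1, T2 = list(T1), list(T2)
    if not T1 or not T2:
        return 0.0
    # For each word in one set, take its best match in the other set.
    best_1 = sum(max(sim(t1, t2) for t2 in T2) for t1 in T1) / len(T1)
    best_2 = sum(max(sim(t2, t1) for t1 in T1) for t2 in T2) / len(T2)
    return (best_1 + best_2) / 2


def toy_sim(a: str, b: str) -> float:
    """Hypothetical stand-in: character-set Jaccard overlap, not a WordNet measure."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```

Passing `sim` as a parameter keeps the set-level computation independent of which of the three word-level measures is plugged in.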




   LDA results for our street protests corpus when tracking 3
    topics




   Language-specific resources that contain cities (Iasi,
    Bucuresti, Ploiesti, etc.) and regions (Bucovina, Moldova,
    Transilvania, etc.) (Iftene et al., 2011)

   A more localized approach: new resources and rules for
    identifying streets (Bulevardul Independentei in Iasi,
    Calea Victoriei in Bucuresti, etc.) and smaller inner-city
    regions (Pacurari district, center of Iasi, Arch of
    Triumph Square)
   Examples of rules: to identify streets (Street + entity,
    Boulevard + entity, etc.); to identify small regions (the
    area between street A and street B, or the area of
    building A)
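
A rule of the form "Street + entity" can be sketched as a regular expression. The trigger-word list and the definition of an entity as a run of capitalized words are assumptions for illustration. The authors' actual rules and resources are not shown in the slides.

```python
import re

# Trigger words: an assumed, incomplete list (Romanian and English forms).
STREET_TRIGGERS = ["Strada", "Bulevardul", "Calea", "Street", "Boulevard"]

# Rule "trigger + entity": a trigger word followed by one or more capitalized words.
STREET_RULE = re.compile(
    r"\b(?:%s)(?:\s+[A-ZĂÂÎȘȚ][\w-]*)+" % "|".join(STREET_TRIGGERS)
)


def find_streets(text):
    """Return all spans matching the 'trigger + capitalized entity' rule."""
    return [m.group(0) for m in STREET_RULE.finditer(text)]
```

A rule this simple cannot tell a street named after a person from the person, which is one of the ambiguity problems a classification step has to handle.
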
   538 files with 2,806 entities of "street" and "area"
    types

   The overall quality of the NE identification component
    is around 92%, and the quality of the NE classification
    component is around 67%

   Problems:
    ◦ incorrect spelling
    ◦ anaphora resolution
    ◦ ambiguous situations where the context does not
      indicate whether the NE is a person name or a street
      name

   Rule-based opinion mining system (Gînscă et al., 2011)

   Easily adaptable from one crisis scenario to another, in
    contrast to a statistical approach

   Uses manually built resources to identify opinion
    keywords (good, bad, etc.), amplifiers (most, more, etc.),
    diminishers (less, etc.) and negations (not, never, etc.)

   Calculates valences for groups of feelings and pairs
    named entities with scores based on distance,
    punctuation and context

   Use a dedicated vocabulary for a specific crisis situation
    with 21 initial words (protest, conflict, fight, etc.) + similar
    words from WordNet (synonyms, hypernyms, etc.)
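
A minimal sketch of how opinion keywords, amplifiers, diminishers and negations could combine into a valence score. The word lists reuse the examples above. The numeric weights and the rule that modifiers apply to the next keyword only are assumptions, not the system of Gînscă et al. (2011).

```python
OPINION_WORDS = {"good": 1.0, "bad": -1.0}   # valences are assumed values
AMPLIFIERS = {"most": 2.0, "more": 1.5}      # assumed multipliers
DIMINISHERS = {"less": 0.5}                  # assumed multiplier
NEGATIONS = {"not", "never"}


def sentence_valence(tokens):
    """Sum keyword valences, adjusting each by modifiers seen since the last keyword."""
    total = 0.0
    factor, negated = 1.0, False
    for tok in (t.lower() for t in tokens):
        if tok in AMPLIFIERS:
            factor *= AMPLIFIERS[tok]
        elif tok in DIMINISHERS:
            factor *= DIMINISHERS[tok]
        elif tok in NEGATIONS:
            negated = not negated
        elif tok in OPINION_WORDS:
            value = OPINION_WORDS[tok] * factor
            total += -value if negated else value
            factor, negated = 1.0, False  # modifiers apply to one keyword only
    return total
```

Adapting such a system to another crisis scenario means editing the dictionaries rather than retraining a model.
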
   Greedy approach: iteratively add intermediate green
    points to the current path until the solution cannot
    be improved

   Advantages: we reduce the search space for
    optimal routes, and the Greedy solution is
    obtained very fast

   Disadvantages: the Greedy solution is only close
    to the optimal solution
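
The greedy loop described above can be sketched as follows. The function names and the representation of points as (x, y) tuples are assumptions. `cost` is a hypothetical caller-supplied function that scores a whole path, lower being better.

```python
import math


def path_length(path):
    """Total Euclidean length of a path given as a list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def greedy_route(start, end, green_points, cost):
    """Insert green points one at a time while the path cost keeps decreasing."""
    path = [start, end]
    remaining = list(green_points)
    while remaining:
        best = None  # (cost, point, insert position)
        for point in remaining:
            for pos in range(1, len(path)):
                candidate = path[:pos] + [point] + path[pos:]
                c = cost(candidate)
                if best is None or c < best[0]:
                    best = (c, point, pos)
        if best[0] >= cost(path):
            break  # no insertion improves the current solution
        path.insert(best[2], best[1])
        remaining.remove(best[1])
    return path
```

Each round evaluates every remaining point at every position and commits the single best insertion. Earlier choices are never revisited, which is why the result is fast to compute but not guaranteed to be optimal.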

   Cumulated sentiment values by day
    [Chart: x-axis shows days 13 to 23 and 25; y-axis ranges from -40 to 30]


   Location type entities mentions by day
    [Chart: x-axis shows days 13 to 23 and 25; y-axis ranges from 0 to 250]


   GoogleMaps API

   Our algorithm is able to find a different, longer path
    that passes near the red islands and prefers
    routes near the green islands

   Thus, at every step it is possible to apply penalties
    when the partial solution crosses red islands (with
    potential risks) and add bonuses when the partial
    solution crosses green islands (without potential
    risk)
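
One way to express the penalties and bonuses is a score over the segments of a partial solution. In this sketch, modelling islands as circles (center, radius) and the default `penalty` and `bonus` values are assumptions. The slides do not specify how islands are represented or weighted.

```python
import math


def segment_hits_circle(a, b, center, radius):
    """True if the segment a-b passes within `radius` of `center`."""
    ax, ay = a
    bx, by = b
    cx, cy = center
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0
    else:
        # Projection of the center onto the segment, clamped to its endpoints.
        t = max(0.0, min(1.0, ((cx - ax) * dx + (cy - ay) * dy) / seg_len_sq))
    nearest = (ax + t * dx, ay + t * dy)
    return math.dist(nearest, center) <= radius


def route_score(path, red_islands, green_islands, penalty=5.0, bonus=2.0):
    """Path length plus a penalty per red crossing, minus a bonus per green crossing."""
    score = 0.0
    for a, b in zip(path, path[1:]):
        score += math.dist(a, b)
        score += penalty * sum(segment_hits_circle(a, b, c, r) for c, r in red_islands)
        score -= bonus * sum(segment_hits_circle(a, b, c, r) for c, r in green_islands)
    return score
```

With a lower score being better, a longer detour can beat the direct route whenever the direct route crosses a red island.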

   When there are no green islands, we must specify another
    method for selecting intermediate points in order to
    improve the quality of the current solution

   The GoogleMaps API is able to place streets and
    boulevards on the map, but not specific squares and
    areas. For these cases we built an additional resource
    that specifies their GIS coordinates




   We present a system that can be easily adapted from one
    crisis situation to another (by changing the dictionaries
    and the topics of interest)

   Efficient topic identification using LDA

   Suggestive visualization using the GoogleMaps API





Using opinion mining techniques for early crisis detection
