Harnessing Linked Knowledge Sources for Topic Classification in Social Media


Presented at Hypertext'13.
Topic classification (TC) of short text messages offers an effective and fast way to reveal events happening around the world ranging from those related to Disaster (e.g. Sandy hurricane) to those related to Violence (e.g. Egypt revolution). Previous approaches to TC have mostly focused on exploiting individual knowledge sources (KS) (e.g. DBpedia or Freebase) without considering the graph structures that surround concepts present in KSs when detecting the topics of Tweets. In this paper we introduce a novel approach for harnessing such graph structures from multiple linked KSs, by: (i) building a conceptual representation of the KSs, (ii) leveraging contextual information about concepts by exploiting semantic concept graphs, and (iii) providing a principled way for the combination of KSs. Experiments evaluating our TC classifier in the context of Violence detection (VD) and Emergency Responses (ER) show promising results that significantly outperform various baseline models including an approach using a single KS without linked data and an approach using only Tweets.

Published in: Technology


  1. A. Elizabeth Cano (1), Andrea Varga (2), Matthew Rowe (3), Fabio Ciravegna (2), and Yulan He (4)
     (1) Knowledge Media Institute, The Open University, Milton Keynes; (2) University of Sheffield, Sheffield; (3) Lancaster University, Lancaster; (4) Aston University, Birmingham. UK, 2013.
     Harnessing Linked Knowledge Sources for Topic Classification in Social Media
  2. INTRODUCTION: Social Media Streams - Risk in violent and criminal activities
  3. INTRODUCTION: Research Questions
     • Can semantic features help in topic classification (TC)?
     • Which knowledge source (KS) data and KS taxonomies provide useful information for improving the TC of tweets?
  4. OUTLINE
     • Introduction
       - Topic Classification (TC) of Microposts
       - Related Work
       - State-of-the-art limitations
     • Proposed Approach
     • Experiments
     • Findings
     • Conclusions
  5. INTRODUCTION: Difficulties of Topic Classification of microposts
     • Restricted number of characters
     • Irregular and ill-formed words
       - Mixing of upper- and lowercase letters, which makes it difficult to detect proper nouns and other part-of-speech tags
       - Wide variety of language, e.g. “see u soon”
     • Event-dependent emerging jargon
       - Volatile jargon relevant to particular events, e.g. “Jan.25” (used during the Egyptian revolution)
     • High topical diversity
     • Sparse data
  6. INTRODUCTION: Social Knowledge Sources (KS)

                   DBpedia*       Yago2          Freebase
     Resources     2.35 million   447 million    3.6 million
     Classes       359            562,312        1,450
     Properties    1,820          253,213,842    7,000
     *Using the DBpedia ontology

     • Structured Semantic Web representation of data
       - Maintained by thousands of editors (e.g. DBpedia, derived from Wikipedia; Freebase)
       - Evolves and adapts as knowledge changes [Syed et al., 2008]
     • Cover a broad range of topics
     • Characterise topics with a large number of resources
  7. INTRODUCTION: Local and External Metadata of a Tweet
  8. INTRODUCTION: Local and External Metadata of a Tweet
     • NER annotations: Country, Person, Person
  9. INTRODUCTION: Local and External Metadata of a Tweet
     • NER annotations: Country, Person, Person
     • Linked resources: <http://dbpedia.org/resource/Barack_Obama>, <http://dbpedia.org/resource/Egypt>, <http://dbpedia.org/resource/Hosni_Mubarak>
  10. PROPOSED APPROACH: Exploiting Knowledge Sources for the Topic Classification of Microposts
     • State-of-the-art limitations
       - Use of single knowledge sources
       - Entities’ metadata is constrained by the NER service used (e.g. OpenCalais, Alchemy)
     • Our approach
       - Exploits multiple knowledge sources
       - Enhances the entity metadata by deriving semantic graphs
       - Leverages the graph structures surrounding entities present in a KS for the TC task
  11. OUTLINE
     • Introduction
     • Proposed Approach
       - Semantic Meta-graphs
       - Weighting Schemas
       - Enhancing TC with Semantic Features
     • Experiments
     • Findings
     • Conclusions
  12. PROPOSED APPROACH: Rationale… (figure labels: 1, 2)
  13. PROPOSED APPROACH: Rationale… (figure labels: 1, 2)
     • Could be more indicative of War and Conflict
  14. PROPOSED APPROACH: Rationale… (figure label: 2)
     • Not necessarily a good indicator of War and Conflict
  15. PROPOSED APPROACH: Rationale… (figure labels: 1, 2)
     • Can the graph structure of existing knowledge sources provide an abstraction of the use of these entity types for representing a topic?
  16. PROPOSED APPROACH: Framework for Topic Classification of Tweets
     [Framework diagram labels: Retrieve Articles (DB, FB, DB-FB); Retrieve Tweets (TW); Concept Enrichment; Derive Semantic Features; Build Cross-Source Topic Classifier; Annotate Tweets]
     • Step 1, Datasets Collection: SPARQL query for all resources from a given topic (e.g. War).
  17. PROPOSED APPROACH: Framework for Topic Classification of Tweets
     [Framework diagram, same labels as slide 16]
     • Step 2, Datasets Enrichment: from the tweets and the articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
  20. PROPOSED APPROACH: Framework for Topic Classification of Tweets
     [Framework diagram, same labels as slide 16]
     • Step 3, Semantic Features Derivation
  21. PROPOSED APPROACH: Framework for Topic Classification of Tweets
     [Framework diagram, same labels as slide 16]
     • Step 4: Build a topic classifier based on features derived from crossed sources.
  23. PROPOSED APPROACH: Deriving Semantic Meta-Graphs
     Example triples:
     • <dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates>
     • <dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
  25. PROPOSED APPROACH: Definition 1 - Resource Meta-graph
     A resource meta-graph is a tuple $G := (R, P, C, Y)$ where
     • $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes;
     • $Y \subseteq R \times P \times C$ is a ternary relation representing a hypergraph with ternary edges;
     • the hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$ whose vertices $V$ are the elements of $R$, $P$ and $C$ and whose edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$.
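As a minimal sketch of Definition 1, the meta-graph can be held as three sets plus the ternary relation Y, with one hyperedge per tuple in Y. The names `MetaGraph`, `build_meta_graph` and `hyperedges` are illustrative assumptions, not taken from the slides.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (resource, property, class)


class MetaGraph(NamedTuple):
    """G := (R, P, C, Y); the field names are illustrative."""
    resources: FrozenSet[str]
    properties: FrozenSet[str]
    classes: FrozenSet[str]
    relation: FrozenSet[Triple]  # Y, a subset of R x P x C


def build_meta_graph(triples: Iterable[Triple]) -> MetaGraph:
    """Derive R, P and C from the ternary relation Y."""
    y = frozenset(triples)
    return MetaGraph(
        resources=frozenset(r for r, _, _ in y),
        properties=frozenset(p for _, p, _ in y),
        classes=frozenset(c for _, _, c in y),
        relation=y,
    )


def hyperedges(graph: MetaGraph) -> Set[FrozenSet[str]]:
    """D = {{r, p, c} | (r, p, c) in Y}: one ternary edge per tuple."""
    return {frozenset(t) for t in graph.relation}
```

How a (resource, property, class) tuple is assembled from raw RDF triples is not specified on the slide, so the input format here is an assumption.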
  26. PROPOSED APPROACH: Resource Meta-graph
     The meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity.
     • Projecting on properties (Obama): birthPlace, author, spouse
     • Projecting on classes (Obama): LivingPeople, PresidentOfTheUnitedStates, Person, Author
  27. PROPOSED APPROACH: Resource Meta-graph
     The meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity.
     • Projecting on properties (Obama): birthPlace, author, spouse
     • Projecting on classes (Obama): LivingPeople, PresidentOfTheUnitedStates, Person, Author
     • How can we weight these graphs to reveal the semantic features that characterise Obama in the context of Violence?
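The two projections can be illustrated with a short sketch over (resource, property, class) tuples. The function names and the toy tuples are assumptions for illustration only; the pairing of properties with classes in the sample data is arbitrary.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (resource, property, class)


def project_on_properties(triples: Iterable[Triple], entity: str) -> Set[str]:
    """All properties attached to the entity in the meta-graph."""
    return {p for r, p, _ in triples if r == entity}


def project_on_classes(triples: Iterable[Triple], entity: str) -> Set[str]:
    """All classes attached to the entity in the meta-graph."""
    return {c for r, _, c in triples if r == entity}


# Toy meta-graph tuples (hypothetical data echoing the slide's example).
TOY_TRIPLES = [
    ("Obama", "birthPlace", "Person"),
    ("Obama", "spouse", "LivingPeople"),
    ("Obama", "author", "Author"),
    ("Obama", "birthPlace", "PresidentOfTheUnitedStates"),
    ("Egypt", "populationDensity", "PopulatedPlace"),
]
```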
  28. PROPOSED APPROACH: Weighting Semantic Features - Specificity
     Measures the relative importance of a property $p \in G(e)$ to a given class $c \in G(e)$ in a KS graph $G_{KS}$:
     $$\mathrm{specificity}_{KS}(p, c) = \frac{N_p(R(c))}{N(R(c))}$$
  29. PROPOSED APPROACH: Weighting Semantic Features - Generality
     Captures the specialisation of a property $p$ to a given class $c$, by computing the property’s frequency among other semantically related classes $R'(c)$:
     $$\mathrm{generality}_{KS}(p, c) = \frac{N(R'(c))}{N_p(R'(c))}$$
     where $N(R'(c))$ is the number of resources whose type is either $c$ or a specialisation of $c$’s parent classes.
  30. PROPOSED APPROACH: Weighting Semantic Features
     $$SG(p, c) = \mathrm{specificity}_{KS}(p, c) \times \mathrm{generality}_{KS}(p, c)$$
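A rough sketch of the three weights, assuming a toy in-memory KS. Here N_p(.) is taken to be the number of resources that carry property p, and the set of classes related to c is supplied by the caller; both choices, and all names and data below, are assumptions rather than the authors' implementation.

```python
from typing import Dict, Set


def _resources_of(types: Dict[str, Set[str]], classes: Set[str]) -> Set[str]:
    """Resources having at least one of the given classes as a type."""
    return {r for r, ts in types.items() if ts & classes}


def _count_with_property(resources: Set[str], props: Dict[str, Set[str]], p: str) -> int:
    return sum(1 for r in resources if p in props.get(r, set()))


def specificity(p: str, c: str, types, props) -> float:
    """N_p(R(c)) / N(R(c)); 0.0 when no resource has type c."""
    rc = _resources_of(types, {c})
    return _count_with_property(rc, props, p) / len(rc) if rc else 0.0


def generality(p: str, c: str, related: Dict[str, Set[str]], types, props) -> float:
    """N(R'(c)) / N_p(R'(c)); 0.0 when p never occurs (an assumption)."""
    rc = _resources_of(types, related.get(c, set()) | {c})
    n_p = _count_with_property(rc, props, p)
    return len(rc) / n_p if n_p else 0.0


def sg(p: str, c: str, related, types, props) -> float:
    """SG(p, c) = specificity(p, c) * generality(p, c)."""
    return specificity(p, c, types, props) * generality(p, c, related, types, props)


# Hypothetical toy knowledge source.
TYPES = {"obama": {"President"}, "bush": {"President"},
         "merkel": {"Chancellor"}, "cameron": {"PrimeMinister"}}
PROPS = {"obama": {"commander"}, "bush": {"commander", "birthPlace"},
         "merkel": {"birthPlace"}, "cameron": {"birthPlace"}}
RELATED = {"President": {"Chancellor", "PrimeMinister"}}
```

In this toy data `commander` occurs on every President and nowhere else, so it scores higher for the class President than the widely shared `birthPlace`.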
  31. PROPOSED APPROACH: Enhancing Feature Space with Semantic Features - Semantic Augmentation (A1)
     • Class features: $F_{A1+C} = F + F_C$
     • Property features: $F_{A1+P} = F + F_P$
     • Class + property features: $F_{A1+C+P} = F + F_C + F_P$
  32. PROPOSED APPROACH: Enhancing Feature Space with Semantic Features - Semantic Augmentation (A1)
     • Class features: $F_{A1+C} = F + F_C$
     • Property features: $F_{A1+P} = F + F_P$
     • Class + property features: $F_{A1+C+P} = F + F_C + F_P$
     Example:
     • $F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
     • $F_{A1+P}$: dbpedia:birth, dbpedia:state, …, dbpedia-owl:PopulatedPlace/populationDensity, …
     • $F_{A1+C}$: PopulatedPlace, Office_holder, PresidentOfTheUnitedStates, Politician, …
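A minimal sketch of the A1 augmentation, assuming the tweet is already tokenised and its entities are already linked to KS resources. The function name, the lookup dictionaries and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def augment_a1(
    tokens: Sequence[str],
    entities: Sequence[str],
    classes_of: Dict[str, Set[str]],
    properties_of: Dict[str, Set[str]],
    use_classes: bool = True,
    use_properties: bool = True,
) -> List[str]:
    """Append class and/or property features of each linked entity to F."""
    features = list(tokens)
    for entity in entities:
        if use_classes:
            features.extend(sorted(classes_of.get(entity, set())))
        if use_properties:
            features.extend(sorted(properties_of.get(entity, set())))
    return features


# Hypothetical lookups for one linked entity.
CLASSES_OF = {"dbpedia:Barack_Obama": {"Politician", "Office_holder"}}
PROPERTIES_OF = {"dbpedia:Barack_Obama": {"dbo:birthPlace"}}
```

Switching `use_classes` and `use_properties` on or off yields the class-only, property-only, and combined variants.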
  33. PROPOSED APPROACH: Enhancing Feature Space with Semantic Features - Semantic Augmentation with Generalisation (A2)
     This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this case we consider the set of parent classes of c.
     • Parent(c) features: $F_{A2+C} = F + F_{parent(c)}$
     • Parent(c) + property features: $F_{A2+C+P} = F + F_P + F_{parent(c)}$
  34. PROPOSED APPROACH: Enhancing Feature Space with Semantic Features - Semantic Augmentation with Generalisation (A2)
     This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this case we consider the set of parent classes of c.
     • Parent(c) features: $F_{A2+C} = F + F_{parent(c)}$
     • Parent(c) + property features: $F_{A2+C+P} = F + F_P + F_{parent(c)}$
     Example:
     • $F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
     • $F_{A2+parent(c)}$: Place, Office_holder, President, Politician, …
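The generalisation step can be sketched by replacing each class with its parent before augmenting. The `parent_of` mapping below is a hypothetical fragment of a class hierarchy, and the function name is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Set


def augment_a2(
    tokens: Sequence[str],
    entities: Sequence[str],
    classes_of: Dict[str, Set[str]],
    parent_of: Dict[str, str],
    properties_of: Optional[Dict[str, Set[str]]] = None,
) -> List[str]:
    """Append parent-class features (and optionally property features) to F."""
    features = list(tokens)
    for entity in entities:
        if properties_of is not None:
            features.extend(sorted(properties_of.get(entity, set())))
        # Classes with no known parent contribute nothing in this sketch.
        parents = {parent_of[c] for c in classes_of.get(entity, set()) if c in parent_of}
        features.extend(sorted(parents))
    return features


# Hypothetical subsumption fragment.
PARENT_OF = {"PresidentOfTheUnitedStates": "President", "PopulatedPlace": "Place"}
```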
  35. OUTLINE
     • Introduction
     • Proposed Approach
     • Experiments
       - Dataset
       - Baseline Features
       - Results
     • Findings
     • Conclusions
  36. PROPOSED APPROACH: Datasets
     • Twitter dataset [Abel et al., 2011] (TW)
       - Collected over two months starting in November 2010
       - Topically annotated
       - Uses tweets labelled as “War & Conflict” (War), “Law & Crime” (Cri) and “Disaster & Accident” (DisAcc)
       - Multi-labelled dataset comprising 10,189 tweets
     • DBpedia (DB) and Freebase (FB) datasets
       - Queried SPARQL endpoints for all resources from the categories and subcategories of the skos:Concept of War, Cri and DisAcc
       - DBpedia: 9,465 articles
       - Freebase: 16,915 articles
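The slides do not show the query itself. As a hedged illustration of what a category-and-subcategory lookup might look like against a DBpedia-style endpoint, the sketch below builds a SPARQL 1.1 query string using `dcterms:subject` and a `skos:broader*` property path. The function name, the example category URI and the `LIMIT` are assumptions, and the string is only constructed, not sent to an endpoint.

```python
def build_topic_query(category_uri: str, limit: int = 1000) -> str:
    """Return a SPARQL query for resources filed under a category or any subcategory."""
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        # skos:broader* follows zero or more subcategory-to-category links.
        f"  ?resource dcterms:subject/skos:broader* <{category_uri}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Illustrative category URI; the categories actually used are not listed here.
EXAMPLE_QUERY = build_topic_query("http://dbpedia.org/resource/Category:War", limit=10)
```

An unbounded `skos:broader*` path can reach a very large part of a category graph, so a real collection step would likely cap the depth or filter the results.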
  37. PROPOSED APPROACH: Datasets
  38. PROPOSED APPROACH: Experimental Setup A
     1. Use annotated tweets for training (TW)
        - Baseline: Bag of Words (BoW), Bag of Entities (BoE) and Part-of-Speech tags (PoS)
        - Enhance features using the DBpedia and Freebase graphs
     2. Train an SVM classifier on the TW corpus, with 80%-20% training/test splits over five independent runs
     3. Compute Precision, Recall and F-measure
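The evaluation protocol (80%-20% splits over five runs, then Precision, Recall and F-measure) can be sketched in plain Python. The SVM training step is deliberately left out; the helper names, the seeds and the binary-label simplification are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(items: Sequence, seed: int) -> Tuple[List, List]:
    """Shuffle with a fixed seed and cut into 80% train / 20% test."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 for one topic label (1 = topic present)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Five independent runs, one seed per run (the seeds are arbitrary).
RUNS = [split_80_20(range(10), seed) for seed in range(5)]
```

In a full pipeline each run would train a classifier on the first part of the split, predict on the second, and average the three scores across the five runs.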
  39. PROPOSED APPROACH: Results for TW dataset
  40. PROPOSED APPROACH: Experimental Setup B
     1. Use labelled articles from DBpedia (DB) and Freebase (FB) for training
        - Baseline: Bag of Words (BoW), Bag of Entities (BoE) and Part-of-Speech tags (PoS)
        - Enhance features using the DBpedia and Freebase graphs
     2. Train an SVM classifier on the DB, FB, DB+FB and DB+FB+TW training corpora and test on TW, with 80%-20% training/test splits over five independent runs
     3. Compute Precision, Recall and F-measure
  41. PROPOSED APPROACH: Results for Training on KS articles, and Testing on TW
  42. PROPOSED APPROACH: Factors contributing to the performance of a KS graph for TC
     1. Topic-Class Entropy
     2. Entity-Class Entropy
     3. Topic-Class-Property Entropy
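The slide names the three entropy metrics without giving their formulas. Purely as background, the sketch below computes plain Shannon entropy over a list of class labels, which is the generic quantity such metrics build on; treating it as any one of the three named metrics would be an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(labels: Iterable[str]) -> float:
    """Entropy in bits of the empirical distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        prob = count / total
        entropy -= prob * math.log2(prob)
    return entropy
```

A label list concentrated on one class gives an entropy of zero, while an even spread over many classes gives a high value.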
  43. PROPOSED APPROACH: Correlating entropy metrics with the performance of the cross-source TC classifiers
  44. PROPOSED APPROACH: Correlating entropy metrics with the performance of the cross-source TC classifiers
     • Indicates that the higher the number of ambiguous entities in a topic within a KS graph, the lower the performance of the TC.
  45. FINDINGS
     1. KSs combined with Twitter data provide complementary information for TC of tweets, outperforming the KS approaches and the approach using tweets only.
     2. A KS’s performance on TC depends on the coverage of the entities within that KS.
     3. When entities have low coverage in a KS, exploiting the mapping between corresponding KSs’ ontologies is beneficial.
  46. CONCLUSIONS
     • Explored the task of topic classification of tweets.
     • Exploited information in KSs (e.g. DBpedia, Freebase) using semantic graphs for concepts and properties surrounding an entity.
     • Presented the importance of considering graph structures in KSs for the supervised classification of tweets, by achieving significant improvement over various state-of-the-art approaches using both single KSs and tweets only.
  47. CONTACT US
     A. Elizabeth Cano: http://people.kmi.open.ac.uk/cano/
     B. Andrea Varga: http://sites.google.com/site/missandreavarga/
     C. Matthew Rowe: http://lancs.ac.uk/staff/rowem/
     D. Fabio Ciravegna: http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna
     E. Yulan He: http://www1.aston.ac.uk/eas/staff/dr-yulan-he