
Topic Models


  1. Topic Models. Claudia Wagner, Graz, 16.9.2010
  2. Semantic Representation of Text (Griffiths, 2007)
     - a) Network model (nodes and edges)
     - b) Space model (points and proximity)
     - c) Probabilistic models (words belong to a set of probabilistic topics)
  3. Topic Models
     - Probabilistic models for uncovering the underlying semantic structure of a document collection based on a hierarchical Bayesian analysis of the original texts (Blei, 2003)
     - Aim: discover patterns of word use and connect documents that exhibit similar patterns
     - Idea: documents are mixtures of topics, and a topic is a probability distribution over words (see the sketch below)
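     Illustration (not from the slides): a minimal sketch of the "documents are mixtures of topics, topics are distributions over words" idea, assuming numpy and purely illustrative topic and word distributions.

     ```python
     import numpy as np

     rng = np.random.default_rng(0)
     vocab = ["money", "bank", "loan", "river", "stream"]
     phi = np.array([[0.35, 0.35, 0.30, 0.00, 0.00],   # topic 1: finance words
                     [0.00, 0.30, 0.00, 0.35, 0.35]])  # topic 2: nature words
     theta_doc = np.array([0.7, 0.3])                  # this doc is mostly topic 1

     doc = []
     for _ in range(10):
         z = rng.choice(2, p=theta_doc)                # pick a topic for the token
         w = rng.choice(len(vocab), p=phi[z])          # pick a word from that topic
         doc.append(vocab[w])
     print(doc)
     ```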
  4. Topic Models (source: http://www.cs.umass.edu/~wallach/talks/priors.pdf)
  5. Topic Models (Steyvers, 2006). Three latent variables: the word distribution per topic (word-topic matrix), the topic distribution per document (topic-document matrix), and the topic assignment of each word.
  6. Summary
     - Observed variable: word distribution per document
     - 3 latent variables:
       - Topic distribution per document: P(z) = θ(d)
       - Word distribution per topic: P(w|z) = φ(z)
       - Word-topic assignment: P(z|w)
     - Training: learn the latent variables on a training collection of documents
     - Test: predict the topic distribution θ(d) of an unseen document d
  7. Topic Models
     - pLSA (Hofmann, 1999)
     - LDA (Blei, 2003)
     - Author Model (McCallum, 1999)
     - Author-Topic Model (Rosen-Zvi, 2004)
     - Author-Recipient-Topic Model (McCallum, 2004)
     - Group-Topic Model (Wang, 2005)
     - Community-Author-Recipient Topic (CART) Model (Pathak, 2008)
     - Semi-supervised topic models
       - Labeled LDA (Ramage, 2009)
  8. pLSA (Hofmann, 1999)
     - Problem: not a proper generative model for new documents.
     - Why? We do not learn any corpus-level parameter; instead, we learn a separate topic distribution for each document of the training set.
     - (Plate diagram: P(z|θ) and P(w|z), plated over the number of documents and the number of words; θ is the topic distribution of a document.)
  9. Latent Dirichlet Allocation (LDA) (Blei, 2003)
     - Advantage: we learn the topic distribution of a corpus, so we can predict the topic distribution of an unseen document from this corpus by observing its words (see the example below).
     - Hyper-parameters α and β are corpus-level parameters and are only sampled once.
     - (Plate diagram: P(φ(z)|β) and P(w|z, φ(z)), plated over the number of documents and the number of words.)
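     Illustration (not from the slides): training LDA and predicting the topic distribution θ(d) of an unseen document with gensim; the toy corpus, parameter values, and variable names are just examples, assuming a recent gensim version.

     ```python
     from gensim import corpora
     from gensim.models import LdaModel

     texts = [["money", "bank", "loan", "bank"],
              ["river", "stream", "bank", "river"],
              ["loan", "money", "money", "bank"]]
     dictionary = corpora.Dictionary(texts)
     corpus = [dictionary.doc2bow(t) for t in texts]

     # alpha = 50/T and beta (called eta in gensim) = 0.01, as recommended later in the slides
     lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                    alpha=[25.0, 25.0], eta=0.01, passes=50, random_state=0)
     print(lda.print_topics())

     unseen = dictionary.doc2bow(["stream", "bank", "river"])
     print(lda.get_document_topics(unseen))   # predicted topic distribution theta(d)
     ```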
  10. Dirichlet Prior α
     - α is a prior on the topic distribution of documents (of a corpus)
     - α is a corpus-level parameter (chosen once)
     - α acts as a force on the topic combinations; the amount of smoothing is determined by α
     - Higher α: more smoothing, less "distinct" topics per document
     - Low α: the pressure is to pick, for each document, a topic distribution favoring just a few topics
     - Recommended value: α = 50/T (or less if T is very small)
     - Example: with high α, each document's topic distribution θ is a smooth mix of all topics, e.g. θ(Doc1) = (1/3, 1/3, 1/3); with low α, each document must favor few topics, e.g. θ(Doc2) = (1, 0, 0)
  11. Dirichlet Prior β
     - β is a prior on the word distribution of topics
     - β is a corpus-level parameter (chosen once)
     - β acts as a force on the word combinations; the amount of smoothing is determined by β
     - Higher β: more smoothing
     - Low β: the pressure is to pick, for each topic, a word distribution favoring just a few words
     - Recommended value: β = 0.01
     - Example: with high β, a topic's word distribution is smooth, e.g. φ(Topic1) = (1/3, 1/3, 1/3); with low β, a topic favors few words, e.g. φ(Topic2) = (1, 0, 0)
     - (A small numerical demonstration of both priors follows below.)
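     Illustration (not from the slides): the smoothing effect of α (and, analogously, β) on samples from a Dirichlet distribution, using numpy; T and the α values are just examples.

     ```python
     import numpy as np

     rng = np.random.default_rng(0)
     T = 3
     for alpha in (25.0, 0.1):              # high alpha vs. low alpha
         theta = rng.dirichlet(np.full(T, alpha), size=3)
         print(f"alpha={alpha}:")
         print(theta.round(2))              # high alpha: rows close to (1/3, 1/3, 1/3)
                                            # low alpha:  rows concentrated on few topics
     # beta plays the same role for the word distribution phi of each topic,
     # e.g. rng.dirichlet(np.full(vocab_size, 0.01)) favors just a few words.
     ```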
  12. Matrix Representation of LDA: the observed word-document matrix is decomposed into two latent matrices, the word-topic matrix φ(z) and the topic-document matrix θ(d).
  13. Statistical Inference and Parameter Estimation (Blei, 2003)
     - Key problem: compute the posterior distribution of the hidden variables given a document
     - The posterior distribution is intractable for exact inference
     - (Figure: latent variables vs. observed variables and priors.)
  14. Statistical Inference and Parameter Estimation
     - How can we estimate the posterior distribution of the hidden variables given a corpus of training documents?
       - Directly (e.g. via expectation maximization, variational inference, or expectation propagation algorithms)
       - Indirectly, i.e. estimate the posterior distribution over z (i.e. P(z))
     - Gibbs sampling, a form of Markov chain Monte Carlo, is often used to estimate the posterior probability over a high-dimensional random variable z
  15. Markov Chain Example (source: http://en.wikipedia.org/wiki/Examples_of_Markov_chains)
     - The random variable X refers to the weather; X_t is the value of X at time point t
     - State space of X = {sunny, rain}
     - Transition probability matrix:
       - P(sunny|sunny) = 0.9
       - P(rain|sunny) = 0.1
       - P(sunny|rain) = 0.5
       - P(rain|rain) = 0.5
     - Today is sunny. What will the weather be tomorrow? The day after tomorrow?
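     Illustration (not from the slides): one- and two-step weather predictions with numpy, using the transition matrix from this slide.

     ```python
     import numpy as np

     P = np.array([[0.9, 0.1],    # from sunny: P(sunny), P(rain)
                   [0.5, 0.5]])   # from rain:  P(sunny), P(rain)
     x0 = np.array([1.0, 0.0])    # today is sunny
     x1 = x0 @ P                  # tomorrow:            [0.9, 0.1]
     x2 = x1 @ P                  # day after tomorrow:  [0.86, 0.14]
     print(x1, x2)
     ```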
  16. Markov Chain Example
     - With an increasing number of days n, the predictions for the weather tend towards a "steady state vector" q:
       - q is independent of the initial conditions
       - q is unchanged when transformed by P
       - This makes q an eigenvector of P (with eigenvalue 1), so it can be derived from P (see the computation below)
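     Illustration (not from the slides): computing the steady-state vector q as the left eigenvector of P with eigenvalue 1, using numpy.

     ```python
     import numpy as np

     P = np.array([[0.9, 0.1],
                   [0.5, 0.5]])
     vals, vecs = np.linalg.eig(P.T)          # left eigenvectors of P
     q = np.real(vecs[:, np.argmax(np.isclose(vals, 1.0))])
     q = q / q.sum()                          # normalize to a probability vector
     print(q)                                 # ~ [0.833, 0.167], independent of the start state
     ```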
  17. Gibbs Sampling
     - Generates a sequence of samples from the joint probability distribution of two or more random variables
     - Aim: compute the posterior distribution over the latent variable z
     - Prerequisite: we must know the conditional probability of z: P(z_i = j | z_-i, w_i, d_i, .)
     - Why do we need to estimate P(z|w) via a random walk?
       - z is a high-dimensional random variable
       - If the number of topics T = 50 and the number of word tokens is 1000, we would have to visit 50^1000 points and compute P(z) for all of them
  18. Gibbs Sampling for LDA
     - Random start, then iterate
     - For each word we compute:
       - How dominant is a topic z in the doc d? How often was the topic z already used in doc d?
       - How likely is a word for a topic z? How often was the word w already assigned to topic z?
  19. Run Gibbs Sampling Example (1)
     - Random topic assignments for the 24 word tokens, e.g.: 1 1 2 2 2 2 1 1 1 2 1 2 1 2 1 1 2 1 2 2 1 2 1 2
     - 2 count matrices:
       - C_WT: words per topic
       - C_DT: topics per document
     - C_WT after the random initialization:
                 topic1  topic2
        money       3       2
        bank        3       6
        loan        2       1
        river       2       2
        stream      2       1
     - C_DT after the random initialization:
                 topic1  topic2
        doc1        4       4
        doc2        4       4
        doc3        4       4
  20. Gibbs Sampling for LDA
     - We need the probability that topic j is chosen for word w_i, conditioned on all other topic assignments of words in this doc and all other observed variables
     - C_WT counts the number of times word token w_i was assigned to topic j across all docs
     - C_DT counts the number of times topic j was already assigned to some word token in doc d_i
     - The product is unnormalized: divide the probability of assigning topic j to word w_i by the sum over all topics T
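     The conditional described by these bullets is only shown as an image on the slide; written out (a reconstruction, following Steyvers & Griffiths, 2006) it is usually given as:

     ```latex
     P(z_i = j \mid z_{-i}, w) \;\propto\;
         \frac{C^{WT}_{w_i j} + \beta}{\sum_{w=1}^{W} C^{WT}_{w j} + W\beta}
         \cdot
         \frac{C^{DT}_{d_i j} + \alpha}{\sum_{t=1}^{T} C^{DT}_{d_i t} + T\alpha}
     ```

     where both count matrices exclude the current assignment of token i, W is the vocabulary size, and T the number of topics.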
  21. Run Gibbs Sampling
     - Start: assign each word token to a random topic
     - C_WT = number of times a word token w_i was assigned to a topic j
     - C_DT = number of times a topic j was already assigned to some word token in doc d_i
     - First iteration:
       - For each word token, the count matrices C_WT and C_DT are first decremented by one for the entries that correspond to the current topic assignment
       - Then a new topic is sampled from the conditional distribution above, and C_WT and C_DT are incremented for the new topic assignment
     - Each Gibbs sample consists of the set of topic assignments to all N word tokens in the corpus, obtained by a single pass through all documents (see the sketch below)
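     Illustration (not from the slides): a minimal collapsed Gibbs sweep for LDA following the update described above; all names are illustrative, and the document-length denominator is dropped because it is constant across topics.

     ```python
     import numpy as np

     def gibbs_sweep(docs, z, C_WT, C_DT, n_z, alpha, beta, rng):
         """docs[d]: list of word ids; z[d][i]: current topic of token i in doc d;
         C_WT: (W, T) word-topic counts; C_DT: (D, T) doc-topic counts;
         n_z: (T,) total tokens per topic."""
         W, T = C_WT.shape
         for d, words in enumerate(docs):
             for i, w in enumerate(words):
                 t_old = z[d][i]
                 C_WT[w, t_old] -= 1          # remove the token's current assignment
                 C_DT[d, t_old] -= 1
                 n_z[t_old] -= 1
                 # unnormalized P(z_i = j | z_-i, w) for all topics j
                 p = (C_WT[w, :] + beta) / (n_z + W * beta) * (C_DT[d, :] + alpha)
                 p = p / p.sum()
                 t_new = rng.choice(T, p=p)   # sample the new topic
                 C_WT[w, t_new] += 1          # add the new assignment back
                 C_DT[d, t_new] += 1
                 n_z[t_new] += 1
                 z[d][i] = t_new
         return z
     ```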
  22. Run Gibbs Sampling Example (2)
     - First iteration:
       - Decrement C_DT and C_WT for the current topic j
       - Sample a new topic from the current topic distribution of the doc
     - (Count tables from the previous slide, with the entries of the token currently being resampled decremented by one.)
  23. Run Gibbs Sampling Example (2)
     - First iteration:
       - Decrement C_DT and C_WT for the current topic j
       - Sample a new topic from the current topic distribution of the doc
     - (Count tables after resampling: a token of "money" in doc1 was reassigned from topic 1 to topic 2, so the "money" row of C_WT changes from (3, 2) to (2, 3) and the doc1 row of C_DT from (4, 4) to (3, 5).)
  24. Run Gibbs Sampling Example (3)
     - α = 50/T = 25 and β = 0.01
     - Example: a token of "bank" is assigned to topic 2
     - The conditional weighs how often topic j was already used in doc d_i against how often all other topics were used in doc d_i (together with the corresponding word-topic counts)
  25. Summary: Run Gibbs Sampling
     - Gibbs sampling is used to estimate the topic assignment of each word of each doc
     - Factors affecting topic assignments:
       - How likely is a word w for a topic j? (The probability of word w under topic j)
       - How dominant is a topic j in a doc d? (The probability of topic j under the current topic distribution of document d)
     - Once many tokens of a word have been assigned to topic j (across documents), the probability of assigning any particular token of that word to topic j increases; all other topics become less likely for word w (explaining away).
     - Once a topic j has been used multiple times in one document, the probability that any word from that document will be assigned to topic j increases; all other topics become less likely for words in that document (explaining away).
  26. Gibbs Sampling Convergence (figure: token assignments, black = topic 1, white = topic 2)
     - Random start, N iterations; each iteration updates the count matrices
     - Convergence: the count matrices stop changing, and the Gibbs samples start to approximate the target distribution (i.e., the posterior distribution over z)
  27. Gibbs Sampling Convergence
     - Ignore some number of samples at the beginning (burn-in period)
     - Consider only every n-th sample when averaging values to compute an expectation
     - Why?
       - Successive Gibbs samples are not independent; they form a Markov chain with some amount of correlation
       - The stationary distribution of the Markov chain is the desired joint distribution over the latent variables, but it may take a while for that stationary distribution to be reached
     - Techniques that may reduce autocorrelation between latent variables: simulated annealing, collapsed Gibbs sampling, or blocked Gibbs sampling
  28. Gibbs Sampling Parameter Estimation
     - Gibbs sampling estimates the posterior distribution over z, but we also need the word distribution φ of each topic and the topic distribution θ of each document.
     - φ is estimated from how often word w_i was assigned to topic j, relative to how often all other words were assigned to topic j.
     - θ is estimated from how often topic j was assigned in doc d, relative to how often all other topics were assigned in doc d.
     - These are the predictive distributions of sampling a new token of word i from topic j, and of sampling a new token in document d from topic j.
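     The slide shows these estimators only as an image; written out (a reconstruction, following Steyvers & Griffiths, 2006) they are:

     ```latex
     \phi^{(j)}_{w} = \frac{C^{WT}_{wj} + \beta}{\sum_{w'=1}^{W} C^{WT}_{w'j} + W\beta},
     \qquad
     \theta^{(d)}_{j} = \frac{C^{DT}_{dj} + \alpha}{\sum_{t=1}^{T} C^{DT}_{dt} + T\alpha}
     ```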
  29. Author-Topic (AT) Model (Rosen-Zvi, 2004)
     - Aim: discover patterns of word use and connect authors that exhibit similar patterns
     - Idea/intuition: the words in a multi-author paper are assumed to be the result of a mixture of the authors' topic mixtures
     - Each author = a distribution over topics
     - Each topic = a distribution over words
     - Each document with multiple authors = a distribution over topics that is a mixture of the distributions associated with its authors
  30. AT-Model Algorithm (sketched in code below)
     - Sample author: for each doc d and each word w of that doc, an author x is sampled from the doc's author distribution/set a_d
     - Sample topic: for each doc d and each word w of that doc, a topic z is sampled from the topic distribution θ(x) of the author x assigned to that word: P(z | x, θ(x))
     - Sample word: from the word distribution φ(z) of the sampled topic z, a word w is sampled: P(w | z, φ(z))
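     Illustration (not from the slides): a toy sketch of the Author-Topic generative process under illustrative variable names, assuming numpy arrays theta (authors x topics) and phi (topics x vocabulary).

     ```python
     import numpy as np

     def generate_doc(authors, theta, phi, n_words, rng):
         """authors: list of author ids of the doc; theta[a]: topic dist of author a;
         phi[z]: word dist of topic z."""
         tokens = []
         for _ in range(n_words):
             x = rng.choice(authors)                    # sample an author from the doc's author set
             z = rng.choice(len(phi), p=theta[x])       # sample a topic from that author's topic dist
             w = rng.choice(phi.shape[1], p=phi[z])     # sample a word from that topic's word dist
             tokens.append((x, z, w))
         return tokens
     ```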
  31. AT Model Latent Variables
     - 1) Author-topic assignment for each word
     - 2) Author-topic counts (which topics are used by which authors): count matrix C_AT
     - 3) Word distribution of each topic: count matrix C_WT
  32. Matrix Representation of the Author-Topic Model (source: http://www.ics.uci.edu/~smyth/kddpapers/UCI_KD-D_author_topic_preprint.pdf)
     - Observed: the word-document matrix and the author set a_d of each document
     - Latent: the topic-author matrix θ(x) and the word-topic matrix φ(z)
  33. Example (1)
     - Random topic and author assignments for the word tokens
     - 2 count matrices:
       - C_WT: words per topic (the same toy counts as in the LDA example)
       - C_AT: authors per topic, e.g.:
                  topic1  topic2
         author1     4       0
         author2     8       8
         author3     0       4
  34. Gibbs Sampling for the Author-Topic Model
     - Estimate the posterior distribution of 2 random variables: z and x
     - For each word, we draw an author x_i and a topic z_i (or the pair (z_i, x_i) as a block) conditioned on all other variables
     - Blocked Gibbs sampling improves convergence of the Gibbs sampler when the variables are highly dependent
     - C_AT counts the number of times an author k was already assigned to topic j; C_WT counts the number of times a word token w_i was assigned to topic j across all docs
  35. Problems of the AT Model
     - The AT model learns each author's topic distribution from a document corpus
     - But we do not learn the topic distribution of documents
     - The AT model cannot model idiosyncratic aspects of a document
  36. AT Model with Fictitious Authors
     - Add one fictitious author for each document: a_d + 1
     - Uniform or non-uniform distribution over authors (including the fictitious author)
     - Each word is sampled either from a real author's or the fictitious author's topic distribution
     - I.e., we learn topic distributions for real authors and for the fictitious "authors" (= the documents)
     - Problem reported in (Hong, 2010): the topic distribution of each Twitter message learnt via the AT model was worse than LDA with the USER schema, because messages are sparse and not all words of a message are used to learn the document's topic distribution
  37. Predictive Power of Different Models (Rosen-Zvi, 2005)
     - Experiment: training data: 1,557 papers; test data: 183 papers (102 of them single-authored).
     - Test documents were chosen such that each author of a test-set document also appears in the training set as an author.
  38. Author-Recipient-Topic (ART) Model (McCallum, 2004)
     - Observed variables: words per message, authors per message, recipients per message
     - Sample for each word:
       - an author-recipient pair, and
       - a topic conditioned on the author-recipient pair's topic distribution θ(A,R): P(z | x, a_d, θ(A,R)) and P(w | z, φ(z))
     - Learn 2 corpus-level variables:
       - the topic distribution of each author-recipient pair
       - the word distribution of each topic
     - 2 count matrices: pair-topic and word-topic
  39. Gibbs Sampling for the ART Model
     - Random start: sample an author-recipient pair and a topic for each word
     - Compute for each word w_i a conditional that combines:
       - the number of times topic t was assigned to the author-recipient pair, relative to the number of times all other topics were assigned to that pair
       - the number of times the current word token was assigned to topic t, relative to the number of times all other words were assigned to topic t (smoothed by the number of words times β)
       - a factor for the number of recipients of the message to which word w_i belongs (the recipient is drawn uniformly from that set)
  40. Labeled LDA (Ramage, 2009)
     - Word-topic assignments are drawn from a document's topic distribution θ, which is restricted to the set Λ of labels observed in d. The topic of a label l is shared across all documents containing label l.
     - The document's labels Λ are first generated using a Bernoulli coin toss for each topic k with a labeling prior φ.
     - The topic model is constrained to use only those topics that correspond to a document's (observed) label set: topic assignments are limited to the document's labels (see the sketch below).
     - This yields a one-to-one correspondence between LDA's latent topics and user tags/labels.
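     Illustration (not from the slides): a sketch of the Labeled-LDA restriction, reusing the collapsed Gibbs conditional from the LDA sketch above; `labels[d]` is an assumed mapping from a document to its set of allowed topic/label ids.

     ```python
     import numpy as np

     def constrained_topic_probs(w, d, C_WT, C_DT, n_z, alpha, beta, labels):
         """Collapsed-Gibbs conditional for token w in doc d, restricted to the doc's labels."""
         W, T = C_WT.shape
         p = (C_WT[w, :] + beta) / (n_z + W * beta) * (C_DT[d, :] + alpha)
         mask = np.zeros(T)
         mask[list(labels[d])] = 1.0        # zero out topics outside the doc's label set
         p *= mask
         return p / p.sum()
     ```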
  41. Group-Topic Model (Wang, 2005)
     - The discovery of groups is guided by the emerging topics, and the discovery of topics is guided by the emerging groups
     - The GT model is an extension of the blockstructure model: group membership is conditioned on a latent variable associated with the attributes of the relation (i.e., the words); this latent variable represents the topic that generated the words
     - The GT model discovers topics relevant to relationships between entities in the social network
  42. Group-Topic Model (Wang, 2005)
     - Generative process:
       - For each event (an interaction between entities), pick the topic t of the event and then generate all the words describing the event according to the topic's word distribution φ
       - For each entity s that interacts within this event, the group assignment g is chosen conditionally on a multinomial (discrete) distribution θ over groups for topic t
       - For each event, a matrix V stores whether each pair of entities behaved the same or not during the event
     - (Plate diagram: number of events (= interactions between entities) and number of entities.)
  43. CART Model (Pathak, 2008)
     - Generative process:
       - To generate email e_d, a community c_d is chosen uniformly at random
       - Based on the community c_d, the author a_d and the set of recipients ρ_d are chosen
       - To generate every word w_(d,i) in that email, a recipient r_(d,i) is chosen uniformly at random from the set of recipients ρ_d
       - Based on the community c_d, author a_d, and recipient r_(d,i), a topic z_(d,i) is chosen
       - The word w_(d,i) itself is chosen based on the topic z_(d,i)
     - Gibbs sampling alternates between updating latent communities c conditioned on the other variables, and updating recipient-topic tuples (r, z) for each word conditioned on the other variables
  44. Copycat Model (Dietz, 2007)
     - The topics of a citing document are a "weighted sum" of the topics of the documents it cites; the weights capture the notion of influence
     - Generative process:
       - For each word of the citing publication d, a cited publication c' is picked from the set of all cited publications γ
       - For each word in the citing publication d, a topic is picked from the topic distribution of its assigned cited publication c', so the citing document's topics are a mix of the topic distributions of the cited documents
  45. Copycat Model (Dietz, 2007)
     - Example: a publication c is cited by two publications d1 and d2.
     - The topic mixture of c is determined not only by the words in the cited publication c, but also by the words in d1 and d2 that are associated with c.
     - This way, the topic mixture of c is influenced by the citing publications d1 and d2.
     - The topic distribution of the cited document c in turn influences the association of words in d1 and d2 with c.
     - All tokens that are associated with a cited publication are called the topical atmosphere of that cited publication.
     - (Figure: d1 and d2 cite c.)
  46. Copycat Model (Dietz, 2007)
     - Bipartite citation graph:
       - 2 disjoint node sets D and C
       - D contains only nodes with outgoing citation links (the citing publications)
       - C contains nodes with incoming links (the cited publications)
       - Documents in the original citation graph with both incoming and outgoing links are represented as two nodes
  47. Copycat Model (Dietz, 2007)
     - Problem: bidirectional interdependence of links and topics caused by the topical atmosphere. Publications that originated in one research area (such as Gibbs sampling, which originated in physics) will also be associated with the topics they are often cited from (such as machine learning).
     - Problem: every word in a citing publication is forced to be associated with a cited publication, which introduces noise.
  48. Citation Influence Model (Dietz, 2007)
     - The Copycat model forces each word in a citing publication to be associated with a cited publication, which introduces noise.
     - In the Citation Influence model, a citing publication may draw a word's topic either from the topic mixture θ(c) of a cited publication (the topical atmosphere) or from its own "innovation" topic mixture ψ(d).
     - The choice is modeled by the flip of an unfair coin s. The parameter λ of the coin is learned by the model, given an asymmetric beta prior that prefers the topic mixture θ of a cited publication.
     - The parameter λ yields an estimate of how well a publication fits its citations.
     - (Plate diagram labels: innovation topic mixture of the citing publication; distribution of citation influences; parameter of the coin flip choosing whether topics are drawn from θ or ψ.)
  49. References
     - David M. Blei, Andrew Y. Ng, Michael I. Jordan: Latent Dirichlet Allocation. Journal of Machine Learning Research 3: 993-1022 (2003).
     - Laura Dietz, Steffen Bickel, Tobias Scheffer: Unsupervised Prediction of Citation Influences. Proc. ICML (2007).
     - Thomas Hofmann: Probabilistic Latent Semantic Analysis. Proc. of Uncertainty in Artificial Intelligence, UAI'99 (1999).
     - Thomas L. Griffiths, Joshua B. Tenenbaum, Mark Steyvers: Topics in Semantic Representation (2007).
     - Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas Griffiths, Padhraic Smyth, Mark Steyvers: Learning Author-Topic Models from Text Corpora (2010).
     - Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth: The Author-Topic Model for Authors and Documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (2004).
     - Andrew McCallum, Andres Corrada-Emmanuel, Xuerui Wang: The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email. Technical report (2004).
     - Nishith Pathak, Colin DeLong, Arindam Banerjee, Kendrick Erickson: Social Topic Models for Community Extraction. In The 2nd SNA-KDD Workshop '08 (2008).
     - Mark Steyvers, Tom Griffiths: Probabilistic Topic Models (2006).
     - Daniel Ramage, David Hall, Ramesh Nallapati, Christopher D. Manning: Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (2009).
     - Xuerui Wang, Natasha Mohanty, Andrew McCallum: Group and Topic Discovery from Relations and Text (2005).
     - Hanna M. Wallach, David Mimno, Andrew McCallum: Rethinking LDA: Why Priors Matter (2009).
