All good things

And out again I curve and flow
To join the gleaming seer,
For models may come and models may go,
But I go on for ever.


  1. 1. Natural Language Summarization of Text and Videos using Topic Models. Pradipto Das, PhD Dissertation Defense, CSE Department, SUNY at Buffalo. Primary committee members: Rohini K. Srihari (Professor and Committee Chair, CSE Dept., SUNY Buffalo), Sargur N. Srihari (Distinguished Professor, CSE Dept., SUNY Buffalo), Aidong Zhang (Professor and Chair, CSE Dept., SUNY Buffalo). Download this presentation from http://bit.ly/pdasthesispptx or http://bit.ly/pdasthesispptxpdf
  2. 2. The Road Ahead (modulo presenter): Introduction to LDA; Discovering Voter Preferences using Mixtures of Topic Models [AND Wkshp 2009]; Learning to Summarize using Coherence [NIPS Wkshp 2009]; Simultaneous Joint and Conditional Modeling of documents Tagged from Two Perspectives [CIKM 2011]; A Thousand Frames in just a Few Words: Lingual Descriptions of Videos through Latent Topic Models and Sparse Object Stitching [CVPR 2013]; Translating Related Words to Videos and Back through Latent Topics [WSDM 2013]; Using Tag-Topic Models and Rhetorical Structure Trees to Generate Bulleted List Summaries [journal submission]
  3. 3. • Stay hungry • Stay foolish The answers are coming within the next 60-75 minutes.. so.. Steve Jobs: Stanford Commencement Speech, 2005 there is great food, green tea and coffee at the back! But if you stay hungry I will happily grab the leftovers!
  4. 4. Contributions of this thesis: Can we find topics from a corpus without human intervention? Can we use these topics to annotate documents and use annotations to organize, summarize and search text? Well, yes, LDA does that for us! That is so 2003!  Well, can LDA model documents tagged from at least two different viewpoints or perspectives? No!  Can we do that after reading this thesis? Yes we can!  Can we generate bulleted lists from multiple documents after reading this thesis? Yes we can!  Can we go further and translate videos into text and vice versa after reading this thesis? Yes we can! Bottomline: We can explore our data, extrapolate from our data and use context to guide decisions about new information
  5. 5. http://www.cs.princeton.edu/~blei/kdd-tutorial.pdf DavidBlei’stalkatKDD2012 DavidBlei’stalkatICML2012
  6. 6. Explore and extrapolate from context • Unsupervised topic exploration using LDA – Full text of the first 50 patents from uspto.gov matching the search keyword “rocket” & full text of 50 scientific papers from the American Journal of Aerospace Engineering – Vocabulary size: 10,102 words; total word count: 219,568. Discovered themes (labeled on the slide as topics from the patent documents or from the journal papers): Theme 1: insulation, composition, fiber, system, sensor, fire, water; Theme 2: fuel, matter, A-B, engineer, tower, magnetic, electron; Theme 3: launch, mission, space, system, vehicle, earth, orbit; Theme 4: rocket, assembly, nozzle, surface, portion, ring, motor; Theme 5: system, fuel, engine, combustion, propulsion, pump, oxidize
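The slide above explores themes with plain LDA. A minimal sketch of the same kind of exploration, using gensim's LdaModel on a made-up three-document corpus (the patent/journal corpus and the number of topics used in the thesis are not reproduced here):

```python
# Toy unsupervised topic exploration with gensim's LDA; documents are illustrative stand-ins.
from gensim import corpora, models

docs = [
    "insulation composition fiber basalt rubber rocket motor".split(),
    "launch mission space vehicle earth orbit".split(),
    "fuel combustion propulsion pump oxidizer engine nozzle".split(),
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# K = 3 themes for the toy corpus; the slide fit a larger model on ~220k tokens.
lda = models.LdaModel(corpus, num_topics=3, id2word=dictionary,
                      passes=20, random_state=0)

for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
```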
  7. 7. Power of LDA: Language independence. Topics over words and topics over a controlled vocabulary, shown here through their English translations: (1) Tsunami, earthquake, Chile, Pichilemu, gone, warning, news, city; (2) flight, Air, France, Brazil, A, 447, disappear, ocean, France; (3) China, Olympic, Beijing, Gore, function, stadium, games. Translations of topics over the controlled vocabulary: (1) Tsunami, earthquake, city, local, UTC, Mayor; (2) Brazil, A, disappeared, search, flight, aircraft, ocean, ship, air, space; (3) China, Olympic, Gore, gold, Beijing, National
  8. 8. How does LDA look at documents? A boring view of Wikipedia
  9. 9. What about other perspectives? Words forming other Wiki articles Article specific content words Words forming section titles An exciting view of Wikipedia
  10. 10. Explore and extrapolate from context. Theme word groups: insulation, composition, fiber, system, sensor, fire, water | fuel, matter, A-B, engineer, tower, magnetic, electron | rocket, assembly, nozzle, surface, portion, ring, motor | launch, mission, space, system, vehicle, earth, orbit. We are identifying the landscape from within the landscape – similar to finding the map of a maze from within the maze!
  11. 11. Success of LDA: a Generative Model (timeline of publications, mostly from premier topic model research groups, from August of the year I joined UB until today)
  12. 12. Success of LDA • Fitting themes to an UNSEEN patent document on insulating a rocket motor using basalt fibers, nanoclay compositions etc.: “What is claimed is: 1. An insulation composition comprising: a polymer comprising at least one of a nitrile butadiene rubber and polybenzimidazole fibers; basalt fibers having a diameter that is at least 5 .mu.m 2. (lots more) …” The fitted themes are the same five as on slide 6: Theme 1: insulation, composition, fiber, system, sensor, fire, water; Theme 2: fuel, matter, A-B, engineer, tower, magnetic, electron; Theme 3: launch, mission, space, system, vehicle, earth, orbit; Theme 4: rocket, assembly, nozzle, surface, portion, ring, motor; Theme 5: system, fuel, engine, combustion, propulsion, pump, oxidize (each labeled as a patent-document or journal-paper topic)
  13. 13. Model Complexities (modulo presenter): K-Means, GMM, Hierarchical Clustering, LDA: VB, LDA: Gibbs, Dynamic LDA, MMLDA, Corr-LDA, Hierarchical LDA, Markov LDA, Syntactic LDA, Suffix Tree LDA, TagLDA, Corr-METag2LDA, Corr-MMGLDA
  14. 14. Model Complexities (modulo presenter): K-Means, GMM, Hierarchical Clustering, LDA: VB, LDA: Gibbs, Dynamic LDA, MMLDA, Corr-LDA, Hierarchical LDA, Markov LDA, Syntactic LDA, Suffix Tree LDA, TagLDA, Corr-METag2LDA, Corr-MMGLDA, Hair Loss
  15. 15. Why do we want to explore? Master Yoda, how do I find wisdom from so many things happening around us? Go to the center of the data and find your wisdom you will
  16. 16. parkour perform traceur area flip footage jump park urban run outdoor outdoors kid group pedestrian playground lobster burger dress celery Christmas wrap roll mix tarragon steam season scratch stick live water lemon garlic floor parkour wall jump handrail locker contestant school run interview block slide indoor perform build tab duck make dog sandwich man outdoors guy bench black sit park white disgustingly toe cough feed rub contest parody Can you find your wisdom? Corr- MMGLDA
  17. 17. Corr-MMGLDA parkour perform traceur area flip footage jump park urban run outdoor outdoors kid group pedestrian lobster burger dress celery Christmas wrap roll mix tarragon steam season scratch stick live water lemon floor parkour wall jump handrail locker contestant school run interview block slide indoor perform build tab duck make dog sandwich man outdoors guy bench black sit park white disgustingly toe cough feed rub contest parody tutorial: man explains how to make lobster rolls from scratch One guy is making sandwich outdoors montage of guys free running up a tree and through the woods interview with parkour contestants Kid does parkour around the park Footage of group of performing parkour outdoors A family holds a strange burger assembly and wrapping contest at Christmas Actual ground-truth synopses overlaid Man performs parkour in various locations Are these what you were thinking?
  18. 18. The Classical Partitioning Problem (figure: 14 unlabeled data points numbered 1–14) • No ground truth label assignments are known
  19. 19. The Classical Partitioning Problem (the same 14 points, one candidate partition shown) • Then, select the one with the lowest loss; for example the one shown – blue = +1, red = -1 • But we don’t really have a good way to measure loss here! Distance from or closeness to a central point
  20. 20. Let’s sample one more point (the same 14 points plus a new one) • Then, select the one with the lowest loss; for example the one shown – blue = +1, red = -1 • But we don’t really have a good way to measure loss here! Distance from or closeness to a central point
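The partitioning slides above note that, with no ground truth labels, we fall back on a surrogate loss such as distance to a central point. A minimal k-means sketch (toy coordinates, not the 14 points from the figure) showing the within-cluster distance it actually minimizes:

```python
# Cluster unlabeled 2-D points into two groups; the data here are made up for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc=-2.0, scale=0.5, size=(7, 2)),
                    rng.normal(loc=+2.0, scale=0.5, size=(7, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)    # a +1 / -1 style partition, up to label permutation
print(km.inertia_)   # within-cluster sum of squared distances: the loss k-means minimizes
```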
  21. 21. The Ground Truth – Two “Topics” The seven virtues The seven vices Assume, now, that we have some vocabulary V of English words X is a set of positions and each element of X is labeled with an element from V
  22. 22. If X is a multi-set of words (set of positions), then it has an inherent structure in it: for e.g. • We no longer see: • We are used to: and #pow is in #doing Additional Partitioning: Documents The seven virtues The seven vices
  23. 23. Success behind LDA  Allocate as few topics as possible to a document  Allocate as few words as possible to each topic – a balancing act (I am Nik WalLenDA!) This checkerboard pattern has a significance – in general it is NP-Hard to figure out the correct pattern from limited samples, even for 2 topics  The topic ALLOCATION is controlled by the parameter of a DIRICHLET distribution governing a LATENT proportion of topics over each document
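A small numeric illustration of that Dirichlet-controlled allocation (the concentration values below are illustrative, not the ones used in the thesis): small concentrations produce the sparse, "few topics per document" behavior the slide describes.

```python
# Sample document-level topic proportions from a symmetric Dirichlet at different concentrations.
import numpy as np

rng = np.random.default_rng(0)
K = 10
for alpha in (0.05, 1.0, 10.0):
    theta = rng.dirichlet([alpha] * K)
    print(f"alpha={alpha:5.2f}  topic proportions={np.round(theta, 2)}")
# alpha << 1  -> mass concentrated on one or two topics (few topics per document)
# alpha >> 1  -> nearly uniform proportions (many topics per document)
```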
  24. 24. Current Timeline Consequent Timeline Event Categories: Accidents/Natural Disasters; Attacks (Criminal/Terrorist); Health & Safety; Endangered Resources; Investigations (Criminal/Legal/Other) Previously, long long time ago
  25. 25. Previously, a long long time ago: Sparse Coherence Flows. Centers of an utterance – entities serving to link that utterance to other utterances in the current discourse segment [Barbara J. Grosz, Scott Weinstein, and Arvind K. Joshi. Centering: A framework for modeling the local coherence of discourse. In Computational Linguistics, volume 21, pages 203–225, 1995]. a. Bob opened a new dealership last week. [Cf=Bob, dealership; Cp=Bob; Cb=undefined] b. John took a look at the Fords in his lot. [Cf=John, Fords; Cp=John; Cb=Bob] {Retain} c. He ended up buying one. i. [Cf=John; Cp=John; Cb=John] {Smooth-Shift} OR ii. [Cf=Bob; Cp=Bob; Cb=Bob] {Continue} – Algorithmically / By inspection; for n+1 = 3 and case ii. Center approximation = the (word, [Grammatical/Semantic] role) pair (GSR), e.g. (Bob, Subject), (John, Subject), (dealership, Noun)
  26. 26. Global (document/section level) focus Problems with Centering Theory a. The house appeared to have been burgled. [Cf=house ] b. The door was ajar. [ Cb=house; Cf=door, house; Cp=door] c. The furniture was in disarray. [ Cb=house; Cf=furniture, house; Cp=furniture] {?} Previously, long long time ago For n+1 = 3  Utterances like these are the majority in most free text documents [redundancy reduction]  In general, co-reference resolution is very HARD
  27. 27. An example summary sentence from folder D0906B-A of TAC2009 A timeline: • “A fourth day of thrashing thunderstorms began to take a heavier toll on southern California on Sunday with at least three deaths blamed on the rain, as flooding and mudslides forced road closures and emergency crews carried out harrowing rescue operations.” The next two contextual sentences in the document of the previous sentence are: • “In Elysian Park, just north of downtown, a 42-year-old homeless man was killed and another injured when a mudslide swept away their makeshift encampment.” • “Another man was killed on Pacific Coast Highway in Malibu when his sport utility vehicle skidded into a mud patch and plunged into the Pacific Ocean.” If the query is, “Describe the effects and responses to the heavy rainfall and mudslides in Southern California,” observe the focus of attention on mudslides as subject in the first two sentences in the table below: Sentence-GSR grid for a sample summary document slice Summarization using Coherence  Incorporating coherence this way does not necessarily lead to the final summary being coherent  Coherence is best obtained in a post processing step using the Traveling Salesman Problem
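The last bullet above says coherence is best imposed as a post-processing step via a Traveling-Salesman-style ordering. A minimal sketch of such an ordering step, with a toy word-overlap similarity and exhaustive search over permutations (the thesis' actual objective and solver are not given on the slide):

```python
# Order selected summary sentences so that adjacent sentences are maximally similar (a TSP path).
import itertools

def similarity(a, b):
    # crude word-overlap (Jaccard) similarity; a real system would use something stronger
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def order_sentences(sentences):
    # exhaustive search is fine for a handful of summary sentences
    best, best_score = None, -1.0
    for perm in itertools.permutations(range(len(sentences))):
        score = sum(similarity(sentences[i], sentences[j]) for i, j in zip(perm, perm[1:]))
        if score > best_score:
            best, best_score = perm, score
    return [sentences[i] for i in best]

summary = [
    "Another man was killed on Pacific Coast Highway in Malibu.",
    "A fourth day of thunderstorms took a heavier toll on southern California.",
    "In Elysian Park a homeless man was killed by a mudslide.",
]
print(order_sentences(summary))
```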
  28. 28. measure project lady tape indoor sew marker pleat highwaist zigzag scissor card mark teach cut fold stitch pin woman skirt machine fabric inside scissors make leather kilt man beltloop sew woman fabric make machine show baby traditional loom blouse outdoors blanket quick rectangle hood knit indoor stitch scissors pin cut iron studio montage measure kid penguin dad stuff thread One lady is doing sewing project indoors Woman demonstrating different stitches using a serger/sewing machine dad sewing up stuffed penguin for kids Woman makes a bordered hem skirt A pair of hands do a sewing project using a sewing machine ground-truth synopses overlaid But what we really want is this
  29. 29. ground-truth synopses overlaid clock mechanism repair computer tube wash machine lapse click desk mouse time front wd40 pliers reattach knob make level video water control person clip part wire inside indoor whirlpool man gear machine guy repair sew fan test make replace grease vintage motor box indoor man tutorial fuse bypass brush wrench repairman lubricate workshop bottom remove screw unscrew screwdriver video wire How to repair the water level control mechanism on a Whirlpool washing machine a man is repairing a whirlpool washer how to remove blockage from a washing machine pump Woman demonstrates replacing a door hinge on a dishwasher A guy shows how to make repairs on a microwave How to fix a broken agitator on a Whirlpool washing machine A guy working on a vintage box fan And this
  30. 30. And this
  31. 31. And this
  32. 32. Roadmap Introduction to LDA Discovering Voter Preferences Using Mixtures of Topic Models [AND’09 Oral] Learning to Summarize Using Coherence [NIPS 09 Poster] Core NLP including summarization, information extraction, unsupervised grammar induction, dependency parsing, rhetorical parsing, sentiment and polarity analysis… Non-parametric Bayes Applied Statistics Exit 2 Exit 1 Uncharted territory – proceed at your own risk
  33. 33. Why When Who Where TagLDA: More Observed Constraints Domain knowledge Topic distribution over words Annotation/ Tag distribution over words Is there a model which can take additional clues and attempt to correct the misclassifications?
  34. 34. Why When Who Where Domain knowledge Incorporating Prior Knowledge Topic distribution over words but conditioned over tags Number of parameters = (K+T)V TagLDA switches to this view for partial normalization of some weights - x5 and x10 are annotated with the orange label and x5 co-occurs with x9 both in documents d1 and d2 - It is thus likely that x5, x9 and x10 belong to the same class since both d1 and d2 should contain as few topics as possible
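The parameter count (K+T)V and the mention of partial normalization above are consistent with a log-linear word distribution in which a K x V topic factor and a T x V tag factor are combined and renormalized. A hedged reconstruction (the exact TagLDA parameterization may differ in details):

```latex
% One plausible log-linear form matching the (K+T)V parameter count:
% beta is the K x V topic factor, gamma is the T x V tag factor.
p(w \mid z = k, t) \;=\;
  \frac{\exp\!\big(\beta_{k,w} + \gamma_{t,w}\big)}
       {\sum_{w'=1}^{V} \exp\!\big(\beta_{k,w'} + \gamma_{t,w'}\big)}
```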
  35. 35. Why When Who Where Domain knowledge Incorporating Prior Knowledge LDA TagLDA
  36. 36. Incorporating Prior Knowledge With Additional Perspectives Why When Who Where Domain knowledge LDA TagLDA LDA
  37. 37. Words indicative of important Wiki concepts Actual human generated Wiki category tags – words that summarize/ categorize the document Wikipedia Ubiquitous Bi-Perspective Document Structure
  38. 38. Words indicative of questions Actual tags for the forum post – even frequencies are available! Words indicative of answers StackOverflow Ubiquitous Bi-Perspective Document Structure
  39. 39. Words indicative of document title Actual tags given by users Words indicative of image description Yahoo! Flickr Ubiquitous Bi-Perspective Document Structure
  40. 40. News Article What if the documents are plain text files? Understanding the Two Perspectives
  41. 41. It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago was the only prosecuting lawyer on the German case. News Article Imagine browsing over many reports on an event Understanding the Two Perspectives
  42. 42. It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago, was the only prosecuting lawyer on the German case. News Article The “document level” perspective What words can we remember after a first browse? German, US, investigations, GM, Dorothea Holland, Lopez, prosecute Understanding the Two Perspectives
  43. 43. Important Verbs and Dependents Named Entities What helped us remember? ORGANIZATION It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago was the only prosecuting lawyer on the German case. News Article LOCATION MISC PERSON WHAT HAPPENED? The “word level” perspective The “document level” perspective German, US, investigations, GM, Dorothea Holland, Lopez, prosecute Understanding the Two Perspectives
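The word-level perspective above is supplied by named-entity annotations (PERSON, ORGANIZATION, LOCATION, MISC). A minimal sketch of producing such annotations with spaCy — an assumption of this example, since the slide does not name the tagger used, and spaCy's label set uses ORG/GPE rather than ORGANIZATION/LOCATION:

```python
# Extract the word-level (named-entity) perspective for one sentence of the news article.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
text = ("US investigators have asked for evidence accumulated by German prosecutors "
        "probing allegations that former GM director Mr. Lopez stole industrial secrets "
        "and took them with him when he joined VW last year.")

doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. GM -> ORG, Lopez -> PERSON, VW -> ORG
```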
  44. 44. Summarization power of the perspectives It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago was the only prosecuting lawyer on the German case. German, US, investigations, GM, Dorothea Holland, Lopez, prosecute Sentence Boundaries What if we turn the document off? (Begin / Middle / End)
  45. 45. A young man climbs an artificial rock wall indoors Adjective modifier (What kind of wall?) Direct Object Direct Subject Adverb modifier (climbing where?) Major Topic: Rock climbing Sub-topics: artificial rock wall, indoor rock climbing gym And as if that wasn’t enough!
  46. 46. A Wikipedia Article on “fog”. Categories: Weather hazards to aircraft | Accidents involving fog | Snow or ice weather phenomena | Fog | Psychrometrics – labeled by human editors. (Article layout labels: Beginning, Middle, End)
  47. 47.  Take the first category label – “weather hazards to aircraft”  “aircraft” doesn’t occur in the document body!  “hazard” only appears in a section title read as “Visibility hazards”  “Weather” appears only 6 out of 15 times in the main body  However, the images suggest that fog is related to concepts like fog over the Golden Gate bridge, fog in streets, poor visibility and quality of air Wiki categories: Abstract or specific? Labeled by a Tag2LDA model from title and image captions Categories: Weather hazards to aircraft | Accidents involving fog | Snow or ice weather phenomena | Fog | Psychrometrics Labeled by human editors Categories: fog, San Francisco, visible, high, temperature, streets, Bay, lake, California, bridge, air
  48. 48. • How do we model such a document collection?
  49. 49. Made Possible with Tag2LDA Models: E-Harmony! METag2LDA (combines TagLDA and MMLDA), Corr-METag2LDA (combines TagLDA and Corr-MMLDA), MMLDA, CorrMMLDA, TagLDA. MM = Multinomial + Multinomial; ME = Multinomial + Exponential
  50. 50. Topic ALLOCATION is controlled by the parameter of a DIRICHLET distribution governing a LATENT proportion of topics over each document I am Nik WalLenDA Bi-Perspective Topic Model – METag2LDA And this balancing act got a whole lot tougher
  51. 51. Exponential State Space Bayes Ball
  52. 52. Constructing Variational Dual
  53. 53. Mean Field Distributions
  54. 54. Mean Field Distributions
  55. 55. Mean Field Distributions Hmmm… a smudge… wipe.. wipe.. wipe.. 2 plates, 2 arrows, 4 circles… no smudges… even and nice!
  56. 56. Mixture Model: Real valued data
  57. 57. y x Mixture Model: Real valued data
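Slides 56–57 introduce a mixture model over real-valued data. A minimal sketch with scikit-learn's GaussianMixture on toy 2-D data, fit by EM and standing in for the variational treatment developed on the following slides:

```python
# Fit a two-component Gaussian mixture to toy real-valued data and inspect soft responsibilities.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(200, 2)),
               rng.normal([3, 3], 0.8, size=(200, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
print(gmm.means_)                   # recovered component means
print(gmm.predict_proba(X[:3]))     # soft responsibilities (cf. mean parameters)
```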
  58. 58. Mean Parameters
  59. 59. Mean Field Optimization Empirical mean p belongs to exponential family by MaxEnt
  60. 60. Forward Mapping Backward Mapping Mean Field Optimization Sufficient statistics
  61. 61. Mean Field Optimization Very similar to finding the basic feasible solution (BFS) in linear programming • Start with pivot at the origin (only slack variables as solution) • Cycle the pivot through the extreme points i.e. replace slacks in BFS until solution is found
  62. 62. Mean Field Optimization However, mean field optimization space is inherently non-convex over the set of tractable distributions due to the delta functions which match the extreme points of the convex hull of sufficient statistics of the original discrete distributions
  63. 63. ELBO: Evidence Lower BOund
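For reference, the Evidence Lower BOund named above in its standard form (the textbook definition, not a formula copied from the slide):

```latex
\log p(\mathbf{x}) \;\ge\; \mathcal{L}(q)
  \;=\; \mathbb{E}_{q(\mathbf{z})}\!\left[\log p(\mathbf{x},\mathbf{z})\right]
      - \mathbb{E}_{q(\mathbf{z})}\!\left[\log q(\mathbf{z})\right]
  \;=\; \log p(\mathbf{x})
      - \mathrm{KL}\!\left(q(\mathbf{z}) \,\|\, p(\mathbf{z}\mid\mathbf{x})\right)
```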
  64. 64. Mean Field Inference
  65. 65. Mean Field Inference
  66. 66. Mean Field Inference ELBO
  67. 67. Topics conditioned on different section identifiers (WL tag categories) Topic Marginals Topics over image captions Correspondence of DL tag words with content words Topic Labeling Faceted Bi-Perspective Document Organization All of the inference machinery *is needed* to generate exploratory outputs like this!
  68. 68. • METag2LDA: A topic generating all DL tags in a document does not necessarily mean that the same topic generates all words in the document • Corr-METag2LDA: A topic generating *all* DL tags in a document does mean that the same topic generates all words in the document - a considerable strongpoint Topic concentration parameter Document specific topic proportions Document content words Document Level (DL) tags Word Level (WL) tags Indicator variables Topic Parameters Tag Parameters CorrME- Tag2LDA METag2LDA The Family of Tag2LDA Models
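A hedged sketch of the Corr-style bi-perspective generative story summarized above, in which content words inherit topics from the document-level (DL) tags via correspondence indicators. All dimensions, symbols, and the simplified handling of word-level (WL) tags below are illustrative only, not the thesis' exact model:

```python
# Generate one synthetic bi-perspective document: DL tags draw topics from theta_d,
# and each content word's topic is copied from a randomly chosen DL tag's topic.
import numpy as np

rng = np.random.default_rng(0)
K, V_word, V_tag, T = 5, 50, 20, 3                # topics, word vocab, DL-tag vocab, WL-tag types
alpha = 0.1
beta = rng.dirichlet([0.05] * V_word, size=K)     # topic-to-word parameters
rho  = rng.dirichlet([0.05] * V_tag,  size=K)     # topic-to-(DL tag) parameters

def generate_document(n_tags=4, n_words=30):
    theta = rng.dirichlet([alpha] * K)                    # document topic proportions
    tag_topics = rng.choice(K, size=n_tags, p=theta)      # one topic per DL tag
    dl_tags = [rng.choice(V_tag, p=rho[z]) for z in tag_topics]
    words, wl_tags = [], []
    for _ in range(n_words):
        y = rng.integers(n_tags)                          # correspondence indicator
        z = tag_topics[y]                                 # word topic = some DL tag's topic
        words.append(rng.choice(V_word, p=beta[z]))
        wl_tags.append(rng.integers(T))                   # WL tag (e.g. section bin); in the real
                                                          # model WL tags also condition the word
    return dl_tags, words, wl_tags

print(generate_document())
```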
  69. 69. Experiments  Wikipedia articles with images and captions manually collected along {food, animal, countries, sport, war, transportation, nature, weapon, universe and ethnic groups} concepts  Annotations/tags used:  DL Tags – image caption words and the article titles  WL Annotations – Positions of sections binned into 5 bins  Objective: to generate category labels for test documents  Evaluation – ELBO: to see performance among various TagLDA models – WordNet based similarity evaluation between actual category labels and proxies for them from caption words
  70. 70. Held-out ELBO Selected Wikipedia Articles  WL annotations – Section positions in the document  DL tags – image caption words and article titles  TagLDA perplexity is comparable to MM(METag2)LDA  The (image caption words + article titles) and the content words are independently discriminative enough  Corr-MM(METag2)LDA performs best since almost all image caption words and the article title for a Wikipedia document are about a specific topic (bar chart: held-out ELBO, in millions, at K=20, 50, 100, 200 for MMLDA, TagLDA, corrLDA, METag2LDA and corrMETag2LDA)
  71. 71. Held-out ELBO, DUC05 Newswire Dataset (Recent Experiments with TagLDA Included)  WL annotations – Named Entities  DL tags – abstract coherence tuples like (subject, object), e.g. “Mary(Subject) taught the class. Everybody liked Mary(Object).” [Ignoring coref resolution]  Abstract markers like (“subj”, “obj”) acting as the DL perspective are not document discriminative or even topical markers  Rather, they indicate a semantic perspective of coherence which is intricately linked to words  Ignoring the DL perspective completely leads to a better fit by TagLDA due to variations in word distributions only (bar charts: held-out ELBO, in millions, at K=40, 60, 80, 100 for MMLDA, METag2LDA, corrLDA, corrMETag2LDA and TagLDA)
  72. 72. Are Categories more abstract or specific? Inverse hop distance in the WordNet ontology  Top 5 words from the caption vocabulary are chosen  Max Weighted Average = 5, Max Best = 1  METag2LDA almost always wins by narrow margins  METag2LDA reweights the vocabulary of caption words and article titles that are about a topic and hence may miss specializations relevant to the document within the top (5) ones  In the WordNet ontology, specializations lead to more hop distance  Ontology based scoring helps explain connections of caption words to ground truths, e.g. Skateboard → skate, glide, snowboard (bar chart: average and best WordNet distances at K=20, 50, 100, 200 for METag2LDA and corrMETag2LDA)
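A small sketch of the WordNet-based scoring used above: rate a predicted caption/category word against a ground-truth label word by path similarity, which is the inverse of one plus the hop distance between synsets. Requires the NLTK WordNet data:

```python
# Score two words by the best path similarity over all of their WordNet synset pairs.
# Requires: pip install nltk && python -c "import nltk; nltk.download('wordnet')"
from nltk.corpus import wordnet as wn

def best_path_similarity(word_a, word_b):
    best = 0.0
    for sa in wn.synsets(word_a):
        for sb in wn.synsets(word_b):
            sim = sa.path_similarity(sb)   # 1 / (1 + shortest hop distance), or None
            if sim is not None and sim > best:
                best = sim
    return best

print(best_path_similarity("skateboard", "skate"))      # specializations are close
print(best_path_similarity("skateboard", "snowboard"))  # siblings sit a few hops apart
```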
  73. 73. • Applications – Document classification using reduced dimensions – Find faceted topics automatically through word level tags – Learn correspondences between perspectives – Label topics through document level multimedia – Create recommendations based on perspectives – Video analysis: word prediction given video features – Tying “multilingual comparable corpora” through topics – Multi-document summarization using coherence – E-Textbook aided discussion forum mining: • Explore topics through the lens of students and teachers • Label topics from posts through concepts in the e-textbook Model Usefulness and Applications
  74. 74. Roadmap Introduction to LDA Discovering Voter Preferences Using Mixtures of Topic Models [AND’09 Oral] Learning to Summarize Using Coherence [NIPS 09 Poster] Core NLP including summarization, information extraction, unsupervised grammar induction, dependency parsing, rhetorical parsing, sentiment and polarity analysis… Non-parametric Bayes Computer Vision and Applications – Core Technologies Applied Statistics Supervised Learning, Structured Prediction Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives [CIKM 2011 Oral]
  75. 75. Success of LDA: Image Annotation (timeline of publications, mostly from premier topic model research groups, from August of the year I joined UB until today)
  76. 76. Previously Words forming other Wiki articles Article specific content words Caption corresponding to the embedded multimedia [P. Das, R. K. Srihari and Y. Fu. “Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives,” CIKM, Glasgow, Scotland, 2011]
  77. 77. Afterwards Words forming other Wiki articles Article specific content words Caption corresponding to the embedded multimedia [P. Das, R. K. Srihari and J. J. Corso. “Translating Related Words to Videos and Back through Latent Topics,” WSDM, Rome, Italy, 2013]
  78. 78.  Expensive frame-wise manual annotation efforts by drawing bounding boxes  Difficulties: camera shakes, camera motion, zooming  Careful consideration to which objects/concepts to annotate?  Focus on object/concept detection – noisy for videos in-the-wild  Does not answer which objects/concepts are important for summary generation? Man with microphone Climbing person Annotations for training object/concept models Trained Models Information Extraction from Videos
  79. 79. Learning latent translation spaces a.k.a topics A young man is climbing an artificial rock wall indoors Human Synopsis  Mixed membership of latent topics  Some topics capture observations that co- occur commonly  Other topics allow for discrimination  Different topics can be responsible for different modalities No annotations needed – only need clip level summary Translating across modalities MMGLDA model
  80. 80. Translating across modalities. Using learnt translation spaces for prediction: text translation of an unseen clip, with topics marginalized out, $p(w_v \mid \mathbf{w}^{O}_d, \mathbf{w}^{H}_d) \propto \big(\sum_{o=1}^{O}\sum_{i=1}^{K} p(w_v \mid i)\,\phi^{O}_{d,o,i}\big)\big(\sum_{h=1}^{H}\sum_{i=1}^{K} p(w_v \mid i)\,\phi^{H}_{d,h,i}\big)$  Topics are marginalized out to permute the vocabulary for predictions  The lower the correlation among topics, the better the permutation  Sensitive to priors for real valued data. MMGLDA model
  81. 81. Translating across modalities. Use learnt translation spaces for prediction with the same rule, whose factors are the probability of learnt topic i explaining words in the text vocabulary, the responsibility of topic i over discrete video features, and the responsibility of topic i over real valued observations.  Topics are marginalized out to permute the vocabulary for predictions  The lower the correlation among topics, the better the permutation  Sensitive to priors for real valued data. MMGLDA model
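A small sketch of that prediction rule with made-up numbers: `phi_text` stands for p(word | topic) over the text vocabulary, and `resp_discrete` / `resp_real` stand for the per-topic responsibilities aggregated over a clip's discrete and real-valued video observations. All names and values are illustrative, not taken from the thesis:

```python
# Marginalize topics out to rank the text vocabulary for an unseen video clip.
import numpy as np

rng = np.random.default_rng(0)
K, V = 4, 12
phi_text = rng.dirichlet([0.1] * V, size=K)     # p(word | topic i), shape (K, V)
resp_discrete = rng.dirichlet([0.5] * K)        # topic responsibilities from discrete video features
resp_real = rng.dirichlet([0.5] * K)            # topic responsibilities from real-valued features

# score(w) proportional to (sum_i p(w|i) * resp_discrete[i]) * (sum_i p(w|i) * resp_real[i])
scores = (phi_text.T @ resp_discrete) * (phi_text.T @ resp_real)
ranking = np.argsort(-scores)                   # permuted vocabulary, best predictions first
print(ranking[:5], np.round(scores[ranking[:5]], 4))
```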
  82. 82. • We first formulated the MMGLDA model just two rooms left of where I am standing now! An aside
  83. 83. 1. There is a guy climbing on a rock-climbing wall. Multiple Human Summaries: (Max 10 words i.e. imposing a length constraint) 2. A man is bouldering at an indoor rock climbing gym. 3. Someone doing indoor rock climbing. 4. A person is practicing indoor rock climbing. 5. A man is doing artificial rock climbing. To understand whether we speak all that we see?
  84. 84. 1. There is a guy climbing on a rock-climbing wall. Multiple Human Summaries: (Max 10 words for imposing a length constraint) Hand holding climbing surface How many rocks? The sketch in the board Wrist-watch What’s there in the back? Color of the floor/wall Dress of the climber Not so important! 2. A man is bouldering at an indoor rock climbing gym. Empty slots 3. Someone doing indoor rock climbing. 4. A person is practicing indoor rock climbing. 5. A man is doing artificial rock climbing. Summaries point toward information needs! Center of Attentions: Central Objects and Actions
  85. 85. Skateboarding Feeding animals Landing fishes Wedding ceremony Woodworking project Multimedia Topic Model – permute event specific vocabularies Bag of keywords multi-document summaries Sub-events e.g. skateboarding, snowboarding, surfing Multiple sets of documents (sets of frames in videos) Natural language multi-document summaries Multiple sentences (group of segments in frames) Once again: A Summarization Perspective
  86. 86. Evaluation: Held out ELBOs  In a purely multinomial MMLDA model, failures of independent events contribute highly negative terms to the log likelihoods  NOT a measure of keyword summary generation power  Test ELBOs on events 1-5 in the Dev-T set  Prediction ELBOs on events 1-5 in the Dev-T set
  87. 87. Skateboarding Feeding animals Landing fishes Wedding ceremony Woodworking project Multimedia Topic Model – permute event specific vocabularies Bag of words multi-document summaries Sub-events e.g. skateboarding, snowboarding, surfing Multiple sets of documents (sets of frames in videos) Natural language multi-document summaries Multiple sentences (group of segments in frames)  A c-SVM classifier from the libSVM package is used with default settings for multiclass (15 classes) classification  55% test accuracy easily achievable (completely off-the-shelf) Evaluate using ROUGE-1 HEXTAC 2009: 100-word human references vs. 100-word manually extracted summaries Average Recall: 0.37916 (95%-confidence interval 0.37187 - 0.38661) Average Precision: 0.39142 (95%-confidence interval 0.38342 - 0.39923) Event Classification and Summarization
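The event-classification step above uses a C-SVM with default settings over 15 classes. A minimal sketch with scikit-learn's SVC, which wraps libSVM; the features below are random stand-ins, since the real per-video representations are not reproduced on the slide:

```python
# Default C-SVM (RBF kernel) for 15-way event classification on made-up features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 100))          # 600 videos, 100-dim features (illustrative)
y = rng.integers(0, 15, size=600)        # 15 event classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC().fit(X_tr, y_tr)              # defaults, as on the slide
print("test accuracy:", clf.score(X_te, y_te))   # ~chance here; ~55% on the real features
```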
  88. 88. Skateboarding Feeding animals Landing fishes Wedding ceremony Woodworking project Multimedia Topic Model – permute event specific vocabularies Bag of words multi-document summaries Sub-events e.g. skateboarding, snowboarding, surfing Multiple sets of documents (sets of frames in videos) Natural language multi-document summaries Multiple sentences (group of segments in frames)  A c-SVM classifier from the libSVM package is used with default settings for multiclass (15 classes) classification  55% test accuracy easily achievable (completely off-the-shelf) Event Classification and Summarization Evaluate using ROUGE-1 HEXTAC 2009: 100-word human references vs. 100-word manually extracted summaries Average Recall: 0.37916 (95%-confidence interval 0.37187 - 0.38661) Average Precision: 0.39142 (95%-confidence interval 0.38342 - 0.39923)  If we can achieve 10% of this for 10-word summaries, we are doing pretty good!  Caveat – The text multi-document summarization task is much more complex
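Since the slides above report ROUGE-1 recall and precision, here is a bare-bones sketch of ROUGE-1 scoring of a candidate keyword summary against a single human reference (no stemming, stopword removal, or multiple references, all of which the official ROUGE toolkit handles):

```python
# Clipped unigram overlap between a candidate summary and one reference (ROUGE-1).
from collections import Counter

def rouge_1(candidate, reference):
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped unigram matches
    recall = overlap / max(1, sum(ref.values()))
    precision = overlap / max(1, sum(cand.values()))
    return recall, precision

reference = "a man is bouldering at an indoor rock climbing gym"
candidate = "man climbing indoor rock wall"
print(rouge_1(candidate, reference))
```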
  89. 89.  MMLDA can show poor ELBO – a bit misleading  Performs quite well on predicting summary worthy keywords  Sum-normalizing the real valued data to lie in [0,1]^P distorts reality for Corr-MGLDA w.r.t. quantitative evaluation  Summary worthiness of predicted keywords is not good but topics are good  MMGLDA produces better topics and higher ELBO  Summary worthiness of keywords almost same as MMLDA for lower n Evaluation: ROUGE-1 Performance
  90. 90. • Simply predicting more and more keywords (or creating sentences out of them) does not improve the relevancy of the generated summaries • Instead, selecting sentences from the training set in an intuitive way almost doubles the relevancy of the lingual descriptions Improving ROUGE-1/2 performance
  91. 91. YouCook, iAnalyze – ROUGE scores for the “YouCook” dataset [Corso et al.]: Das et al. WSDM 2013 – Precision 2-gram: 0.006, Precision 1-gram: 15.47, Recall 2-gram: 0.006, Recall 1-gram: 19.02; Das et al. CVPR 2013 – Precision 2-gram: 5.14, Precision 1-gram: 25.76, Recall 2-gram: 6.49, Recall 1-gram: 32.87
  92. 92. Roadmap Introduction to LDA Discovering Voter Preferences Using Mixtures of Topic Models [AND’09 Oral] Learning to Summarize Using Coherence [NIPS 09 Poster] Non-parametric Bayes Computer Vision and Applications – Core Technologies Translating Related Words to Videos and Back through Latent Topics [WSDM 2013 Oral] Applied Statistics Supervised Learning, Structured Prediction Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives [CIKM 2011 Oral] Core NLP including summarization, information extraction, unsupervised grammar induction, dependency parsing, rhetorical parsing, sentiment and polarity analysis… Using Tag-Topic Models and Rhetorical Structure Trees to Generate Bulleted List Summaries[to be submitted to TOIS] Linear, Quadratic and Conic Programming Variants A Thousand Frames in just a Few Words: Lingual Descriptions of Videos through Latent Topic Models and Sparse Object Stitching [CVPR 2013 Spotlight]
  93. 93. Just one last thing… • We want to analyze documents not only for topic discovery but also for turning these
  94. 94. Just one last thing… • into this  A previous study on sleep deprivation found that less sleep resulted in impaired glucose metabolism.  Women who slept less than or equal to 5 hours a night were twice as likely to suffer from hypertension than women. [*]  Children ages 3 to 5 years get 11-13 hours of sleep per night.  Chronic sleep deprivation can do more: it can also stress your heart.  Sleeping less than eight hours at night, frequent nightmares and difficulty initiating sleep were significantly associated with drinking.  A single night of sleep deprivation can limit the consolidation of memory the next day.  Women’s health is much more at risk. [*] [*] means that the sentences belong to the same document
  95. 95. Just one last thing… • using these Accidents and Natural Disasters Attacks Health and Safety Endangered Resources Investigations and Trials Document sets or “Docsets” Global Tag-Topic Model Local Models Documents and sentences Local Models Local Models Local Models Training using documents Fitting sentences from Docsets to the learnt model Candidate summary sentence for a Docset Weighting a summary sentence from local and global models Candidate summary sentence for a Docset
  96. 96. • and these Attribution Cause Elaboration Just one last thing… distractions such as computers or video games in kids ' bedrooms may lessen sleep quality. that only 20 percent of adolescents get the recommended nine hours of sleep ; The National Sleep Foundation reported in 2006 Satellite (Leaf: Span 1) Nucleus (Leaf: Span 2) Nucleus (Leaf: Span 3) Nucleus [2] Root [2, 3] Attribution Joint and need more than eight hours of sleep per day . because they 're nocturnal Sleep-deprived teens crash just about anywhere Nucleus (Leaf: Span 1) Nucleus (Leaf: Span 2) Nucleus (Leaf: Span 3) Satellite [2,3] Root [1, 3] Explanation Joint early-risers are actually at a higher risk of developing heart problems. but a Japanese study says Generations have praised the wisdom of getting up early in the morning, Nucleus (Leaf: Span 1) Satellite (Leaf: Span 2) Nucleus (Leaf: Span 3) Nucleus [2,3] Root [1, 3] Contrast Attribution Fortunately for sleepy women , a Penn State College of Medicine study found, Satellite (Leaf: Span 1) Nucleus [2,4] Root [1, 4] that they 're much better than men at enduring sleep deprivation, Nucleus (Leaf: Span 2) possibly because of '' profound demands of infant and child care Nucleus (Leaf: Span 3) placed on them for most of mankind 's history. Satellite (Leaf: Span 4) Satellite [2,3]
  97. 97. • With scores like these Just one last thing…
  98. 98. Just one last thing… • and these
  99. 99. • We want to analyze documents not only for topic discovery but also for turning these • into this • using these • and these • with scores like these • and these The final song: Recap
  100. 100. The ending… Interviewer: Do you agree with President Obama’s approach towards Libya? Presidential Candidate: [Libya??] I just wanted to make sure we're talking about the same thing before I say, 'Yes, I agreed' or 'No I didn't agree.' I do not agree with the way he handled it for the following reason -- nope, that's a different one. I got all this stuff twirling around in my head • So that we can always have the right information at our fingertips
  101. 101. Summary • Topic models can now talk to structured prediction models • Efficient text summarization/translation of domain specific videos is now possible • With multi-document summarization systems which exploit meaning in text, we are getting closer to our ultimate dream: – Construct an artificial assistant who can summarize a task using contextual exploratory analysis tools as well as deep NLP and make decisions for us!
  102. 102. Future Directions • Core Algorithms – Non-parametric Tag2LDA family models – Address sparsity in tags and scaling of real-valued variables in mixed domain topic models – Efficient inference with more structure among hidden variables • Applications – Type in text and get an object detector [borrowed from VPML] – Intention analysis of videographers in social networks and the evolution of intentions over time – Large scale visualization using rhetorics and topic analysis – Large scale multi-media multi-document summarization
  103. 103. Thank You All for Listening Questions?
