CSE5230/DMS/2004/9




                     Data Mining - CSE5230


                       Hidden Markov Models (HMMs)




CSE5230 - Data Mining, 2004                                Lecture 9.1




                              Lecture Outline
         • Time- and space-varying processes
         • First-order Markov models
         • Hidden Markov models
         • Examples: coin toss experiments
         • Formal definition
         • Use of HMMs for classification
         • The three HMM problems
             - The evaluation problem
             - The Forward algorithm
             - The Viterbi and Forward-Backward algorithms
         • HMMs for web-mining
         • References



CSE5230 - Data Mining, 2004                                Lecture 9.2




Time- and Space-varying Processes (1)
     • The data mining techniques we have discussed so far have focused on the
       classification, prediction or characterization of single data points, e.g.:
         - Assigning a record to one of a set of classes
             » Decision trees, back-propagation neural networks, Bayesian classifiers, etc.
         - Predicting the value of a field in a record given the values of the other fields
             » Regression, back-propagation neural networks, etc.
         - Finding regions of feature space where data points are densely grouped
             » Clustering, self-organizing maps

CSE5230 - Data Mining, 2004                                 Lecture 9.3




  Time- and Space-varying Processes (2)

     • In the methods we have considered so far, we have assumed that each observed
       data point is statistically independent of the observation that preceded it, e.g.:
         - Classification: the class of data point x_t (from time t) is not influenced
           by the class of x_(t-1) (from time t – 1), or indeed by any other data point
         - Prediction: the value of a field for a record depends only on the values of
           the other fields of that record, not on values in any other records.
     • Several important real-world data mining problems cannot be modeled in this way.

CSE5230 - Data Mining, 2004                                 Lecture 9.4




Time- and Space-varying Processes (3)
     • We often encounter sequences of observations, where each observation may depend
       on the observations which preceded it
     • Examples:
         - Sequences of phonemes (fundamental sounds) in speech (speech recognition)
         - Sequences of letters or words in text (text categorization, information
           retrieval, text mining)
         - Sequences of web page accesses (web usage mining)
         - Sequences of bases (C, G, A, T) in DNA (genome projects [human, fruit fly, etc.])
         - Sequences of pen-strokes (hand-writing recognition)
     • In all these cases, the probability of observing a particular value in the
       sequence can depend on the values which came before it


CSE5230 - Data Mining, 2004                                                                       Lecture 9.5




                                       Example: web log
     • Consider the following extract from a web log:

xxx - - [16/Sep/2002:14:50:34 +1000] "GET /courseware/cse5230/ HTTP/1.1" 200 13539
xxx - - [16/Sep/2002:14:50:42 +1000] "GET /courseware/cse5230/html/research_paper.html HTTP/1.1" 200 11118
xxx - - [16/Sep/2002:14:51:28 +1000] "GET /courseware/cse5230/html/tutorials.html HTTP/1.1" 200 7750
xxx - - [16/Sep/2002:14:51:30 +1000] "GET /courseware/cse5230/assets/images/citation.pdf HTTP/1.1" 200 32768
xxx - - [16/Sep/2002:14:51:31 +1000] "GET /courseware/cse5230/assets/images/citation.pdf HTTP/1.1" 206 146390
xxx - - [16/Sep/2002:14:51:40 +1000] "GET /courseware/cse5230/assets/images/clustering.pdf HTTP/1.1" 200 17100
xxx - - [16/Sep/2002:14:51:40 +1000] "GET /courseware/cse5230/assets/images/clustering.pdf HTTP/1.1" 206 14520
xxx - - [16/Sep/2002:14:51:56 +1000] "GET /courseware/cse5230/assets/images/NeuralNetworksTute.pdf HTTP/1.1" 200 17137
xxx - - [16/Sep/2002:14:51:56 +1000] "GET /courseware/cse5230/assets/images/NeuralNetworksTute.pdf HTTP/1.1" 206 16017
xxx - - [16/Sep/2002:14:52:03 +1000] "GET /courseware/cse5230/html/lectures.html HTTP/1.1" 200 9608
xxx - - [16/Sep/2002:14:52:05 +1000] "GET /courseware/cse5230/assets/images/week03.ppt HTTP/1.1" 200 121856
xxx - - [16/Sep/2002:14:52:24 +1000] "GET /courseware/cse5230/assets/images/week06.ppt HTTP/1.1" 200 527872

     • Clearly the URL which is requested depends on the URL which was requested before
         - If the user uses the "Back" button in his/her browser, the requested URL may
           depend on earlier URLs in the sequence too
     • Given a particular observed URL, we can calculate the probabilities of observing
       all the other possible URLs next.
         - Note that we may even observe the same URL next.

CSE5230 - Data Mining, 2004                                                                       Lecture 9.6




First-Order Markov Models (1)

     • In order to model processes such as these, we make use of the idea of states.
       At any time t, we consider the system to be in state q(t).
     • We can consider a sequence of successive states of length T:

                     q^T = (q(1), q(2), …, q(T))

     • We will model the production of such a sequence using transition probabilities:

                     P(q_j(t+1) | q_i(t)) = a_ij

       which is the probability that the system will be in state q_j at time t+1 given
       that it was in state q_i at time t
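
     As an illustrative sketch only (in Python, with made-up two-state labels and
     probabilities that are not part of the lecture), the transition probabilities a_ij
     can be stored as a row-stochastic matrix, and the probability of a given state
     sequence is then a product of matrix entries:

         import numpy as np

         # Illustrative two-state first-order Markov model: a[i, j] = P(q_j at t+1 | q_i at t).
         # Each row of the transition matrix must sum to 1.
         a = np.array([[0.7, 0.3],
                       [0.4, 0.6]])
         initial = np.array([0.5, 0.5])        # assumed initial state distribution

         def sequence_probability(states, a, initial):
             """Probability of a state sequence (a list of state indices) under the model."""
             p = initial[states[0]]
             for prev, nxt in zip(states, states[1:]):
                 p *= a[prev, nxt]
             return p

         print(sequence_probability([0, 0, 1, 1], a, initial))   # 0.5 * 0.7 * 0.3 * 0.6 = 0.063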


CSE5230 - Data Mining, 2004                                  Lecture 9.7




           First-Order Markov Models (2)

     • A model of states and transition probabilities, such as the one we have just
       described, is called a Markov model.
     • Since we have assumed that the transition probabilities depend only on the
       previous state, this is a first-order Markov model
         - Higher-order Markov models are possible, but we will not consider them here.
     • For example, Markov models for human speech could have states corresponding
       to phonemes
         - A Markov model for the word "cat" would have states for /k/, /a/, /t/ and a
           final silent state


CSE5230 - Data Mining, 2004                                  Lecture 9.8




Example: Markov model for “cat”



        [Diagram: left-to-right Markov chain with states /k/ → /a/ → /t/ → /silent/]




CSE5230 - Data Mining, 2004                                    Lecture 9.9




                     Hidden Markov Models
     • In the preceding example, we have said that the states correspond to phonemes
     • In a speech recognition system, however, we don't have access to phonemes – we
       can only measure properties of the sound produced by a speaker
     • In general, our observed data does not correspond directly to a state of the
       model: the data corresponds to the visible states of the system
         - The visible states are directly accessible for measurement.
     • The system can also have internal "hidden" states, which cannot be observed
       directly
         - For each hidden state, there is a probability of observing each visible state.
     • This sort of model is called a Hidden Markov Model (HMM)


CSE5230 - Data Mining, 2004                                   Lecture 9.10




Example: coin toss experiments
     • Let us imagine a scenario where we are in a room which is divided in two by
       a curtain.
     • We are on one side of the curtain; on the other is a person who will carry out
       a procedure using coins, resulting in a head (H) or a tail (T).
     • When the person has carried out the procedure, they call out the result, H or T,
       which we record.
     • This system will allow us to generate a sequence of Hs and Ts, e.g.
         HHTHTHTTHTTTTTHHTHHHHTHHHTTHHHHHHTTT
         TTTTTHTHHTHTTTTTHHTHTHHHTHTHHTTTTHHT
         TTHHTHHTTTHTHTHTHTHHHTHHTTHT…

CSE5230 - Data Mining, 2004                              Lecture 9.11




                   Example: single fair coin
     • Imagine that the person behind the curtain has a single fair coin (i.e. it has
       equal probabilities of coming up heads or tails). This generates sequences such as
         THTHHHTTTTHHHHHTHHTTHHTTHHTHHHHHHHTTHTTHHHH
         THTTTHHTHTTHHHHTHTHHTTHTHTTHHTHTHHHTHHTHT…
     • We could model the process producing the sequence of Hs and Ts as a Markov
       model with two states, and equal transition probabilities:

        [Diagram: two states H and T, with all four transition probabilities
         (H→H, H→T, T→H, T→T) equal to 0.5]

     • Note that here the visible states correspond exactly to the internal states –
       the model is not hidden
     • Note also that states can transition to themselves

CSE5230 - Data Mining, 2004                              Lecture 9.12




Example: a fair and a biased coin
     • Now let us imagine a more complicated scenario. The person behind the curtain
       has three coins, two fair and one biased (for example, P(T) = 0.9)
         - One fair coin and the biased coin are used to produce output – these are the
           "output coins". The other fair coin is used to decide whether to switch
           output coins.
         1. The person starts by picking an output coin at random
         2. The person tosses the coin, and calls out the result (H or T)
         3. The person tosses the other fair coin. If the result was H, the person
            switches output coins
         4. Go back to step 2, and repeat.
     • This process generates sequences like:
         HHHTTTTTTTHTHTTHTHTTTTTTHTTTTTTTTTTTTHHTHT
         TTHHTTHTHTHTTTHTTTTTTTTHTHTTTTHTTTTHTTTHTH
         HTTHTTTHTTHTTTTTTTHTTTTTHT…
     • Note this looks quite different from the sequence for the fair coin example.
CSE5230 - Data Mining, 2004                                                Lecture 9.13




         Example: a fair and a biased coin
     • In this scenario, the visible state no longer corresponds exactly to the hidden
       state of the system:
         - Visible state: output of H or T
         - Hidden state: which coin was tossed
     • We can model this process using an HMM:

        [Diagram: hidden states "Fair" and "Biased", each with self-transition
         probability 0.5 and probability 0.5 of switching to the other state.
         Emission probabilities: Fair emits H and T with probability 0.5 each;
         Biased emits H with probability 0.1 and T with probability 0.9.]
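
     Reading the probabilities off the diagram above, one possible Python encoding of
     this model's parameters is sketched below; the array names A, B and pi are
     illustrative only, anticipating the λ = (A, B, π) notation defined on the
     following slides:

         import numpy as np

         # Hidden states: 0 = Fair output coin, 1 = Biased output coin (P(T) = 0.9).
         # Visible symbols: 0 = H, 1 = T.
         A  = np.array([[0.5, 0.5],     # the switching coin is fair, so every transition
                        [0.5, 0.5]])    # (stay or switch) has probability 0.5
         B  = np.array([[0.5, 0.5],     # Fair coin emits H and T with probability 0.5 each
                        [0.1, 0.9]])    # Biased coin: P(H) = 0.1, P(T) = 0.9
         pi = np.array([0.5, 0.5])      # the first output coin is picked at random

         lam = (A, B, pi)               # the full model, matching lambda = (A, B, pi)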

CSE5230 - Data Mining, 2004                                                Lecture 9.14




Example: a fair and a biased coin
     • We see from the diagram on the preceding slide that we have extended our model
         - The visible states are shown in blue, and the emission probabilities are
           shown too.
     • As well as internal states q_j(t) and state transition probabilities a_ij, we
       have visible states v_k(t) and emission probabilities b_jk

                     P(v_k(t) | q_j(t)) = b_jk

     • A full model such as this is called a Hidden Markov Model


CSE5230 - Data Mining, 2004                                  Lecture 9.15




                HMM: formal definition (1)
     • We can now give a more formal definition of a first-order Hidden Markov Model
       (adapted from [RaJ1986]):
         - There is a finite number of (internal) states, N
         - At each time t, a new state is entered, based upon a transition probability
           distribution which depends on the state at time t – 1. Self-transitions are
           allowed
         - After each transition is made, a symbol is output, according to a probability
           distribution which depends only on the current state. There are thus N such
           probability distributions.
     • If we want to build an HMM to model a real sequence, we have to solve several
       problems. We must estimate:
         - the number of states N
         - the transition probabilities a_ij
         - the emission probabilities b_jk


CSE5230 - Data Mining, 2004                                  Lecture 9.16




HMM: formal definition (2)
     • When an HMM is "run", it produces a sequence of symbols. This is called an
       observation sequence O of length T:

                     O = (O(1), O(2), …, O(T))

     • In order to talk about using, building and training an HMM, we need some
       definitions:
         N — the number of states in the model
         M — the number of different symbols that can be observed
         Q = {q_1, q_2, …, q_N} — the set of internal states
         V = {v_1, v_2, …, v_M} — the set of observable symbols
         A = {a_ij} — the set of state transition probabilities
         B = {b_jk} — the set of symbol emission probabilities
         π = {π_i = P(q_i(1))} — the initial state probability distribution
         λ = (A, B, π) — a particular HMM model


CSE5230 - Data Mining, 2004                                       Lecture 9.17




                      Generating a Sequence

     • To generate an observation sequence using an HMM, we use the following
       algorithm (a minimal code sketch follows the list):

         1. Set t = 1
         2. Choose an initial state q(1) according to π_i
         3. Output a symbol O(t) according to b_{q(t)k}
         4. Choose the next state q(t + 1) according to a_{q(t)q(t+1)}
         5. Set t = t + 1; if t < T, go to 3
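
     A minimal Python sketch of this generator, reusing the (A, B, pi) array layout
     from the coin-toss example (the function name and symbol encoding are
     illustrative assumptions, not part of the lecture):

         import numpy as np

         def generate_sequence(A, B, pi, T, rng=None):
             """Sample an observation sequence of length T from an HMM lambda = (A, B, pi)."""
             if rng is None:
                 rng = np.random.default_rng()
             observations = []
             state = rng.choice(len(pi), p=pi)                 # step 2: initial state from pi
             for _ in range(T):
                 symbol = rng.choice(B.shape[1], p=B[state])   # step 3: emit a symbol using b_{q(t)k}
                 observations.append(int(symbol))
                 state = rng.choice(len(pi), p=A[state])       # step 4: next state from a_{q(t)q(t+1)}
             return observations

         # e.g. generate_sequence(A, B, pi, 50) with the coin-example arrays gives a
         # list of 0/1 symbols that can be decoded as H/T.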



     • In applications, however, we don't actually do this. We assume that the process
       that generates our data does this. The problem is to work out which HMM is
       responsible for a data sequence


CSE5230 - Data Mining, 2004                                       Lecture 9.18




Use of HMMs
     • We have now seen what sorts of processes can be modeled using HMMs, and how an
       HMM is specified mathematically.
     • We now consider how HMMs are actually used.
     • Consider the two H and T sequences we saw in the previous examples:
         - How could we decide which coin-toss system was most likely to have produced
           each sequence?
     • To which system would you assign these sequences?

         1:    TTHHHTHHHTTTTTHTTTTTTHTHTHTTHHHHTHTH
         2:    TTTTTTHTHHTHTTHTTTTHHHTHHHHTTHTHTTTT
         3:    THTTHTTTTHTTHHHTHTTTHTHHHHTTHTHHHTHT
         4:    TTTHHTTTHHHTTTTTTTHTTTTTHHTHTTHTTTTH

     • We can answer this question using a Bayesian formulation (see last week's lecture)

CSE5230 - Data Mining, 2004                                        Lecture 9.19




            Use of HMMs for classification
     • HMMs are often used to classify sequences
     • To do this, a separate HMM is built and trained (i.e. the parameters are
       estimated) for each class of sequence in which we are interested
         - e.g. we might have an HMM for each word in a speech recognition system. The
           hidden states would correspond to phonemes, and the visible states to
           measured sound features
     • This gives us a set of HMMs, {λ_l}
     • For a given observed sequence O, we estimate the probability that each HMM λ_l
       generated it:

                     P(λ_l | O) = P(O | λ_l) P(λ_l) / P(O)

     • We assign the sequence to the model with the highest posterior probability
         - i.e. the probability given the evidence, where the evidence is the sequence
           to be classified
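
     As a rough sketch, choosing the model with the highest posterior can be written
     as follows; P(O) is the same for every model, so it can be dropped from the
     comparison. The helper names and the pluggable likelihood function are
     assumptions for illustration, not part of the lecture:

         import numpy as np

         def classify(O, models, priors, likelihood):
             """Index of the HMM with the highest posterior P(lambda_l | O).

             models:     a list of (A, B, pi) tuples, one per class
             priors:     the class priors P(lambda_l)
             likelihood: any function returning P(O | lambda_l), e.g. the forward
                         algorithm described on the following slides
             """
             scores = [likelihood(O, m) * p for m, p in zip(models, priors)]
             return int(np.argmax(scores))   # P(O) is common to all models, so it is dropped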

CSE5230 - Data Mining, 2004                                        Lecture 9.20




The three HMM problems
     • If we want to apply HMMs to real problems with real data, we must solve three
       problems:
         - The evaluation problem: given an observation sequence O and an HMM model λ,
           compute P(O|λ), the probability that the sequence was produced by the model
         - The decoding problem: given an observation sequence O and a model λ, find
           the most likely state sequence q(1), q(2), …, q(T) to have produced O
         - The learning problem: given a training sequence O, find the model λ,
           specified by parameters A, B, π, that maximizes P(O|λ) (we assume for now
           that Q and V are known)
     • The evaluation problem has a direct solution. The others are harder, and
       involve optimization

CSE5230 - Data Mining, 2004                                           Lecture 9.21




                     The evaluation problem
     • The simplest way to solve the evaluation problem is to go over all possible
       state sequences I_r of length T and calculate the probability that each of them
       produced O:

                     P(O | λ) = Σ_{r=1}^{r_max} P(O | I_r, λ) P(I_r)

       where

                     P(O | I_r, λ) = b_{i1,O1} b_{i2,O2} … b_{iT,OT}
                     P(I_r | λ) = π_{i1} a_{i1,i2} a_{i2,i3} … a_{i(T-1),iT}

     • While this could in principle be done, there is a problem: computational
       complexity. Our model λ has N states, so there are r_max = N^T possible state
       sequences of length T, and the computational complexity is O(N^T T).
         - Even for small N and T this is not feasible: for N = 5 and T = 100, roughly
           10^72 computations are needed!
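
     For intuition only, the brute-force sum can be written directly; this is a
     sketch (names are illustrative), usable only for tiny N and T, precisely because
     of the N^T blow-up described above:

         from itertools import product

         def evaluate_brute_force(O, A, B, pi):
             """P(O | lambda) by summing over every possible hidden state sequence I_r.

             There are N**T sequences for T observations, so this is only usable for
             very small N and T.
             """
             N, T = len(pi), len(O)
             total = 0.0
             for seq in product(range(N), repeat=T):               # every state sequence I_r
                 p = pi[seq[0]] * B[seq[0]][O[0]]                   # pi_{i1} * b_{i1,O1}
                 for t in range(1, T):
                     p *= A[seq[t - 1]][seq[t]] * B[seq[t]][O[t]]   # a_{i(t-1),i(t)} * b_{it,Ot}
                 total += p
             return total
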
CSE5230 - Data Mining, 2004                                           Lecture 9.22




The Forward Algorithm (1)
     • Luckily, there is a solution to this problem: we do not need to do the full
       calculation
     • We can do a recursive evaluation, using an auxiliary variable α_t(i), called
       the forward variable:

                     α_t(i) = P((O(1), O(2), …, O(t)), q(t) = q_i | λ)

       This is the probability of the partial observation sequence (up until time t)
       and being in internal state q_i at time t, given the model λ
     • Why does this help? Because in a first-order HMM, the transition and emission
       probabilities only depend on the current state. This makes a recursive
       calculation possible.

CSE5230 - Data Mining, 2004                                        Lecture 9.23




                The Forward Algorithm (2)
     • We can calculate α_{t+1}(j) – the next step – using the previous one:

                     α_{t+1}(j) = b_{j,O(t+1)} Σ_{i=1}^{N} a_ij α_t(i)

     • This just says that the probability of the observation sequence up to time
       t + 1 and being in state q_j at time t + 1 is:
         the probability of observing symbol O(t+1) when in state q_j, b_{j,O(t+1)},
           times the sum of
             the probabilities of getting to state q_j from state q_i
               times the probability of the observation sequence up to time t and
               being in state q_i
     • Note that we have to keep track of α_t(i) for all N possible internal states

CSE5230 - Data Mining, 2004                                        Lecture 9.24




The Forward Algorithm (3)
     • If we know α_T(i) for all the possible states, we can calculate the overall
       probability of the sequence given the model (as we wanted on slide 9.22):

                     P(O | λ) = Σ_{i=1}^{N} α_T(i)

     • We can now specify the forward algorithm, which will let us calculate the α_T(i):

         for i = 1 to N { α_1(i) = π_i b_{i,O(1)} }   /* initialize */
         for t = 1 to T – 1 {
             for j = 1 to N {
                 α_{t+1}(j) = b_{j,O(t+1)} Σ_{i=1}^{N} a_ij α_t(i)
             }
         }
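
     A direct Python translation of this pseudocode (a sketch using the same
     (A, B, pi) array layout as the earlier examples; in practice the α values are
     usually rescaled at each step to avoid numerical underflow):

         import numpy as np

         def forward(O, A, B, pi):
             """P(O | lambda) via the forward algorithm; O is a list of symbol indices."""
             N, T = len(pi), len(O)
             alpha = np.zeros((T, N))
             alpha[0] = pi * B[:, O[0]]                     # alpha_1(i) = pi_i * b_{i,O(1)}
             for t in range(T - 1):
                 for j in range(N):
                     # alpha_{t+1}(j) = b_{j,O(t+1)} * sum_i a_{ij} * alpha_t(i)
                     alpha[t + 1, j] = B[j, O[t + 1]] * np.dot(A[:, j], alpha[t])
             return float(alpha[T - 1].sum())               # P(O | lambda) = sum_i alpha_T(i)

     On short sequences this agrees with the brute-force sum sketched on the previous
     slide, but its cost grows only linearly with T; it can also be passed to the
     classify() sketch from slide 9.20 as the likelihood function.
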
CSE5230 - Data Mining, 2004                                        Lecture 9.25




                The Forward Algorithm (4)

     • The forward algorithm allows us to calculate P(O|λ), and has computational
       complexity O(N^2 T), as can be seen from the algorithm
         - This is linear in T, rather than exponential, as the direct calculation was.
           This means that it is feasible
     • We can use P(O|λ) in the Bayesian equation on slide 9.20 to use a set of HMMs
       as a classifier
         - The other terms in the equation can be estimated from the training data, as
           with the Naïve Bayesian Classifier




CSE5230 - Data Mining, 2004                                        Lecture 9.26




                 The Viterbi and Forward-Backward Algorithms

     • The most common solution to the decoding problem uses the Viterbi algorithm
       [RaJ1986], which also uses partial sequences and recursion
     • There is no known method for finding an optimal solution to the learning
       problem. The most commonly used optimization technique is known as the
       forward-backward algorithm, or the Baum-Welch algorithm [RaJ1986,DHS2000]. It
       is a generalized expectation-maximization algorithm. See the references for
       details
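
     As a rough illustration only (a sketch, not the exact formulation in [RaJ1986]),
     the Viterbi recursion replaces the sum in the forward algorithm with a max and
     keeps back-pointers so the best state sequence can be recovered:

         import numpy as np

         def viterbi(O, A, B, pi):
             """Most likely hidden state sequence for observations O under lambda = (A, B, pi)."""
             N, T = len(pi), len(O)
             delta = np.zeros((T, N))            # delta_t(j): best score of a path ending in state j
             back = np.zeros((T, N), dtype=int)  # back-pointers for recovering the path
             delta[0] = pi * B[:, O[0]]
             for t in range(1, T):
                 for j in range(N):
                     scores = delta[t - 1] * A[:, j]
                     back[t, j] = int(np.argmax(scores))
                     delta[t, j] = scores[back[t, j]] * B[j, O[t]]
             path = [int(np.argmax(delta[T - 1]))]          # best final state
             for t in range(T - 1, 0, -1):                  # walk the back-pointers
                 path.append(int(back[t, path[-1]]))
             return list(reversed(path))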


CSE5230 - Data Mining, 2004                    Lecture 9.27




                 HMMs for web-mining (1)

     • HMMs can be used to analyse the clickstreams that web users leave in the log
       files of web servers (see slide 9.6).
     • Ypma and Heskes (2002) report the application of HMMs to:
         - Web page categorization
         - User clustering
     • They applied their system to real-world web logs from a large Dutch commercial
       web site [YpH2002]


CSE5230 - Data Mining, 2004                    Lecture 9.28




HMMs for web-mining (2)


     • A mixture of HMMs is used to learn page categories (the hidden state variables)
       and inter-category transition probabilities
         - The page URLs are treated as the observations
     • Web surfer types are modeled by HMMs
     • A clickstream is modeled as a mixture of HMMs, to account for several types of
       user being present at once



CSE5230 - Data Mining, 2004                                  Lecture 9.29




                 HMMs for web-mining (3)
     • When applied to data from a commercial web site, page categories were learned
       as hidden states
         - Inspection of the emission probabilities B showed that several page
           categories were discovered:
             » Shop info
             » Start, customer/corporate/promotion
             » Tools
             » Search, download/products
     • Four user types were discovered too (HMMs with different state prior and
       transition probabilities)
         - Two types dominated:
             » General interest users
             » Shop users
         - The starting state was the most important difference between the types


CSE5230 - Data Mining, 2004                                  Lecture 9.30




References
  • [DHS2000] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern
    Classification (2nd Edn), Wiley, New York, NY, 2000, pp. 128-138.
  • [RaJ1986] L. R. Rabiner and B. H. Juang, An introduction to hidden Markov models,
    IEEE Magazine on Acoustics, Speech and Signal Processing, 3(1), pp. 4-16,
    January 1986.
  • [YpH2002] Alexander Ypma and Tom Heskes, Categorization of web pages and user
    clustering with mixtures of hidden Markov models, in Proceedings of the
    International Workshop on Web Knowledge Discovery and Data Mining (WEBKDD'02),
    Edmonton, Canada, July 17, 2002.


CSE5230 - Data Mining, 2004                 Lecture 9.31



