Model-based clustering for BSS usage mining:
a case study with the Velib’ system of Paris




Etienne Côme

15/10/2012



Bike Sharing Systems (BSS)


What makes BSS interesting?

    Relatively new systems
    Spreading rapidly (EU and US nowadays, Hangzhou, ...)
    Notable successes
    Abundant usage data
    In interesting and original forms :
           Origins / Destinations + timestamps
           Real-time station balances
    New and interesting research problems




  Etienne Côme (IFSTTAR)   Model-based clustering for BSS usage mining   15/10/2012   2 / 75



Outline
1   Introduction
       Problem statement
       Usage data : trip records
       Velib’ in a few numbers and pictures
       Tools and approach
2   Stations clustering using temporal usage profiles
      Data representation : count time series
      Generative model : naive Poisson mixture
      Analysis of the results on the Velib’ dataset
3   Latent Dirichlet Allocation (LDA) for trips activity recognition
      Data representation : dynamical O/D matrices
      Generative model under LDA
      Analysis of the results on the Velib’ dataset




Problem statement

Operational objectives

    Planning new systems : position and size of the stations
    Quality of service : bike redistribution, ...
    ...


Mining objectives

    Building predictive models of usage
    Finding spatio-temporal patterns
    Better understanding of usage
    ...





Raw data
Trip data

     departure timestamp
     departure station
     arrival timestamp
     arrival station
     type of subscription
⇒ Converted into contingency tables (i.e. tensors of counts)

Data sources
     Velib’ : 2 months
     Open data : Barclays (London), Boston, ...
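A minimal sketch of the conversion into a contingency table, assuming each trip record is stored in a small dataclass whose field names are hypothetical. Here the three made-up trips are simply counted per (departure station, arrival station) couple:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trip:
    # Hypothetical field names mirroring the five attributes of a trip record.
    departure_time: datetime
    departure_station: int
    arrival_time: datetime
    arrival_station: int
    subscription: str  # e.g. "day" or "year"


def od_counts(trips):
    """Contingency table: number of trips for each (origin, destination) couple."""
    return Counter((t.departure_station, t.arrival_station) for t in trips)


trips = [
    Trip(datetime(2012, 9, 3, 8, 5), 1, datetime(2012, 9, 3, 8, 20), 2, "year"),
    Trip(datetime(2012, 9, 3, 8, 7), 1, datetime(2012, 9, 3, 8, 30), 2, "day"),
    Trip(datetime(2012, 9, 3, 9, 0), 2, datetime(2012, 9, 3, 9, 12), 3, "year"),
]
print(od_counts(trips))  # Counter({(1, 2): 2, (2, 3): 1})
```

Adding the departure hour or the day to the key gives the higher-order count tensors used in the rest of the presentation.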





               Velib’ in a few numbers


BSS size :

    1200 stations
    ≈ 40000 slots
    ≈ 16000 bikes
    ≈ 100 000 trips/day
    27% of trips = day subscriptions
    73% of trips = year subscriptions







Global behavior

[Figure: two histograms of the number of trips, by distance (km) and by duration (min); the duration histogram marks the free-use limits of the day and year subscriptions.]

FIG. 1: Histograms of trip lengths and durations





Temporal effects
[Figure: number of trips per hour over a week, Monday to Sunday, for short and long subscriptions.]

FIG. 2: Number of trips per hour (short / long subscriptions)





Temporal effects


[Figure: average number of trips per hour of the day (0 to 24 h).]

FIG. 3: Number of trips on week days / during the week-end




Spatial effects




FIG. 4: Map of incoming trips between 6h and 7h on week days




Spatial effects


[Figure: mean activity per hour of each station against its distance (km) from the center, "Les Halles".]

FIG. 5: Station activity vs. distance to "Les Halles"




Approach, exploratory data analysis


General methodology

       Use clustering algorithms to find interesting patterns in the data
       Compare the clusters found with the geography and sociology of the city
       ⇒ Extract the main factors that influence BSS behavior.


Two developments :

 1     Find clusters of stations with similar temporal usage patterns
 2     Find latent activities that govern the BSS dynamics







Tools : model-based clustering

General methodology
Imagine a data generation process
⇒ which includes unobserved, or latent, variables
Latent variables can be discrete or continuous

Examples of latent variables

     Species for flowers
     Topics for texts
     Communities for graph vertices
     ...






Generative approach
Clustering
Model-based clustering :
 1     Draw the cluster of sample (i)
 2     Depending on the cluster, draw the observed values of (i)

[Figure: density f(x) of a 1D Gaussian mixture, plotted against x.]

FIG. 6: Example of a 1D Gaussian mixture model
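A minimal sketch of the two-step process above for a 1D Gaussian mixture, using only the Python standard library. The function names are hypothetical, and the parameter values are illustrative only, not those used for FIG. 6:

```python
import math
import random


def sample_gaussian_mixture(n, pi, mu, sigma, seed=0):
    """Two-step generative process of a 1D Gaussian mixture.

    1. draw the cluster z of each sample with probabilities pi,
    2. draw x from the Gaussian N(mu[z], sigma[z]^2) of that cluster.
    """
    rng = random.Random(seed)
    clusters = rng.choices(range(len(pi)), weights=pi, k=n)
    values = [rng.gauss(mu[z], sigma[z]) for z in clusters]
    return clusters, values


def mixture_density(x, pi, mu, sigma):
    """Marginal density f(x) = sum_k pi_k N(x; mu_k, sigma_k^2)."""
    return sum(
        p * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for p, m, s in zip(pi, mu, sigma)
    )


# Illustrative parameters (not those of FIG. 6).
pi, mu, sigma = [0.3, 0.7], [-40.0, 0.0], [10.0, 8.0]
z, x = sample_gaussian_mixture(1000, pi, mu, sigma)
```

Only the observed values x would be available in practice. Recovering z from them is the clustering task.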




Data generation process
Graphical model representation




1. Draw the cluster of sample (i)

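$$Z_i \sim \mathcal{M}(1, \pi)$$
⇒ π : prior proportions of the clusters ($\mathcal{M}(1, \pi)$ : multinomial distribution with a single draw, so $Z_i$ is a binary indicator vector).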




Data generation process
Graphical model representation




2. Depending on the cluster, draw the observed values of (i)

$$p(x \mid Z_{ik} = 1) = f(x; \theta_k), \quad \forall k \in \{1, \dots, K\}.$$
⇒ f can be chosen to exploit the specificities of the problem.
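Combining steps 1 and 2 and summing over the latent cluster gives the marginal (mixture) density of an observation. Bayes' rule gives the posterior probability of each cluster, from which a cluster can be assigned to sample i, for instance by taking the k that maximizes it:

$$p(x) = \sum_{k=1}^{K} \pi_k \, f(x; \theta_k), \qquad P(Z_{ik} = 1 \mid x_i) = \frac{\pi_k \, f(x_i; \theta_k)}{\sum_{k'=1}^{K} \pi_{k'} \, f(x_i; \theta_{k'})}.$$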

Model-based clustering framework



Task and tools
    Inferring the parameters :
    ⇒ EM algorithm, or variational EM for complex models
    Finding the clustering
    ⇒ A by-product of EM
    Fixing the number of clusters
    ⇒ Model selection criteria : BIC, AIC, ICL, perplexity.
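A minimal sketch of three of these criteria, written in the "−2 log-likelihood plus penalty" convention, where the number of clusters with the smallest value is preferred. The function names are hypothetical. The ICL shown is the common approximation that adds twice the entropy of the posterior cluster probabilities to BIC:

```python
import math


def aic(loglik, n_params):
    """AIC (lower is better)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """BIC (lower is better): the penalty grows with log(number of observations)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def icl(loglik, n_params, n_obs, posteriors):
    """ICL approximation: BIC plus twice the entropy of the posterior
    probabilities t_ik, which penalises poorly separated clusters."""
    entropy = -sum(t * math.log(t) for row in posteriors for t in row if t > 0)
    return bic(loglik, n_params, n_obs) + 2.0 * entropy
```

In use, one model is fitted for each candidate K, and the K with the lowest criterion value is kept. With crisp posteriors (all t_ik equal to 0 or 1), ICL and BIC coincide.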








      Stations clustering using
      temporal usage profiles






Stations clustering using temporal usage profiles


Objectives :

    Find groups of stations with similar temporal usage profiles
    Temporal usage profiles = incoming and outgoing activity per hour
    Taking into account the difference between week days and week-ends
    With a model for count data
    Cross the results with possible explanatory variables :
    population, employments, amenities, ...







Data representation : count time series
Observed data :
     $X^{out}_{sdt}$ : # of bikes taken at station s during day d at hour t
     $X^{in}_{sdt}$ : # of bikes returned at station s during day d at hour t

$$X_{sd} = (X^{in}_{sd1}, \dots, X^{in}_{sd24}, X^{out}_{sd1}, \dots, X^{out}_{sd24})$$
⇒ X tensor of size N × D × T.
⇒ temporal behavior of each station.

Variables
     $X_{sd}$ (observed) : # of bikes leaving / arriving
     $Z_s$ (latent) : cluster of station s
     $W_d$ (observed) : cluster of days (week / week-end)
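A minimal sketch of how the count vectors $X_{sd}$ and the day clusters $W_d$ could be built from trip records. It assumes trips are given as (departure station, departure time, arrival station, arrival time) tuples. The function names are hypothetical, and hours are indexed from 0 rather than 1:

```python
from collections import defaultdict
from datetime import date, datetime


def count_profiles(trips):
    """Build X_sd = (X^in_sd1..X^in_sd24, X^out_sd1..X^out_sd24) for each
    (station, day) pair from (dep_station, dep_time, arr_station, arr_time)."""
    profiles = defaultdict(lambda: [0] * 48)
    for dep_station, dep_time, arr_station, arr_time in trips:
        # Returned bikes (incoming) fill slots 0..23, indexed by arrival hour.
        profiles[(arr_station, arr_time.date())][arr_time.hour] += 1
        # Taken bikes (outgoing) fill slots 24..47, indexed by departure hour.
        profiles[(dep_station, dep_time.date())][24 + dep_time.hour] += 1
    return dict(profiles)


def day_cluster(day):
    """W_d as an index: 0 for week days, 1 for week-end days."""
    return 1 if day.weekday() >= 5 else 0


trips = [
    (1, datetime(2012, 9, 1, 8, 5), 2, datetime(2012, 9, 1, 8, 25)),
    (2, datetime(2012, 9, 1, 18, 0), 1, datetime(2012, 9, 1, 18, 20)),
]
X = count_profiles(trips)
```

A trip that crosses midnight is counted on its departure day for the outgoing flow and on its arrival day for the incoming flow, since each end is keyed by its own date.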




Generative model : naive Poisson mixture




FIG. 7: Graphical model representation


Parameters, Θ

    $\alpha_s$ : station attractivity effects
    $\pi = (\pi_1, \dots, \pi_K)$ : cluster proportions
    $\lambda = (\lambda_{klt})$ : temporal profiles of the clusters



Generative model
Naive Poisson mixture


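$$Z_s \sim \mathcal{M}(1, \pi)$$
$$X_{sd1} \perp\!\!\!\perp \dots \perp\!\!\!\perp X_{sdT} \mid \{Z_{sk} = 1, W_{dl} = 1\}$$
$$X_{sdt} \mid \{Z_{sk} = 1, W_{dl} = 1\} \sim \mathcal{P}(\alpha_s \lambda_{klt})$$


Constraints

$$\sum_{l,t} D_l \lambda_{klt} = DT, \quad \forall k \in \{1, \dots, K\},$$

with $D_l$ the number of days in cluster l.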
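A minimal sketch of this generative model using only the Python standard library. The standard library has no Poisson sampler, so a small one based on Knuth's multiplication method is included. All names are hypothetical, and the nested-list layout `lam[k][l][t]` for $\lambda_{klt}$ is an assumption:

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_counts(alpha, pi, lam, day_clusters, seed=0):
    """Generate X[s][d][t] under the naive Poisson mixture.

    alpha[s]      : attractivity of station s
    pi[k]         : cluster proportions
    lam[k][l][t]  : temporal profile of cluster k for day cluster l
    day_clusters  : W_d given as the index l of each day d
    """
    rng = random.Random(seed)
    z = rng.choices(range(len(pi)), weights=pi, k=len(alpha))  # Z_s ~ M(1, pi)
    # Given Z_s and W_d, each hourly count is an independent Poisson draw.
    x = [
        [[poisson(rng, alpha[s] * rate) for rate in lam[z[s]][l]] for l in day_clusters]
        for s in range(len(alpha))
    ]
    return z, x
```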




Parameters estimation, likelihood

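Marginal likelihood

$$L(\Theta; X) = \sum_{s} \log\Big(\sum_{k} \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \qquad (1)$$


Completed likelihood

$$L_c(\Theta; X, Z) = \sum_{s,k} Z_{sk} \log\Big(\pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \qquad (2)$$

where Z is unknown.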
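A minimal sketch of how equation (1) can be evaluated. It assumes X, α, π and λ are stored as nested Python lists and that $W_d$ is given as the index of the cluster of each day (the names are hypothetical). The product over (d, t, l) is computed in log space, and the sum over clusters uses the log-sum-exp trick to avoid numerical underflow:

```python
import math


def log_poisson(x, rate):
    """log p(x; rate) for a Poisson distribution (rate > 0)."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def marginal_loglik(X, alpha, pi, lam, day_clusters):
    """Equation (1): sum_s log sum_k pi_k prod_{d,t} p(X_sdt; alpha_s lambda_klt),
    where l is the cluster of day d (W_dl = 1). X[s][d][t] are counts."""
    total = 0.0
    for s, Xs in enumerate(X):
        # log(pi_k) + log of the product over days and time frames, for each k
        terms = [
            math.log(pi[k]) + sum(
                log_poisson(x, alpha[s] * lam[k][day_clusters[d]][t])
                for d, Xsd in enumerate(Xs)
                for t, x in enumerate(Xsd)
            )
            for k in range(len(pi))
        ]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(v - m) for v in terms))
    return total
```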





EM algorithm
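⇒ EM gives a straightforward solution for parameter estimation :

E step
Conditional expectation of $L_c$ given the current parameters

$$E[L_c(\Theta; x, Z) \mid x, \Theta^{(q)}] = \sum_{s,k} t_{sk} \log\Big(\pi_k \prod_{d,t,l} p(x_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \qquad (3)$$

with $t_{sk}$ the posterior probabilities :

$$t_{sk} = \frac{\pi_k^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{klt}^{(q)})^{W_{dl}}}{\sum_{k'} \pi_{k'}^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{k'lt}^{(q)})^{W_{dl}}} \qquad (4)$$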
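A minimal sketch of the E step with the same assumed nested-list layout (the function names are hypothetical). The posterior probabilities $t_{sk}$ are computed in log space and normalised after subtracting the maximum, which gives the same values as equation (4) without underflow:

```python
import math


def log_poisson(x, rate):
    """log p(x; rate) for a Poisson distribution (rate > 0)."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def e_step(X, alpha, pi, lam, day_clusters):
    """Equation (4): posterior probabilities t[s][k] of each station's cluster."""
    t = []
    for s, Xs in enumerate(X):
        log_num = [
            math.log(pi[k]) + sum(
                log_poisson(x, alpha[s] * lam[k][day_clusters[d]][h])
                for d, Xsd in enumerate(Xs)
                for h, x in enumerate(Xsd)
            )
            for k in range(len(pi))
        ]
        m = max(log_num)  # subtract the max before exponentiating
        w = [math.exp(v - m) for v in log_num]
        norm = sum(w)
        t.append([v / norm for v in w])
    return t
```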







EM algorithm
⇒ EM gives a straightforward solution for parameter estimation :


M step
Maximization of the conditional expectation (3) with respect to the parameters
     $\alpha_s$ : mean station activity, $\hat{\alpha}_s = \frac{1}{DT} \sum_{d,t} X_{sdt}$,
     $\pi_k$ : proportion of cluster k, $\hat{\pi}_k = \frac{1}{N} \sum_{s} t_{sk}$,
     $\lambda_{klt}$ : activity of time frame t for cluster k, on week days or during the week-end (day cluster l)

$$\hat{\lambda}_{klt} = \frac{1}{\sum_{s} t_{sk} \alpha_s \sum_{d} W_{dl}} \sum_{s,d} t_{sk} W_{dl} X_{sdt} \qquad (5)$$
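A minimal sketch of the M step with the same assumed layout, where `day_clusters[d]` gives the index l with $W_{dl} = 1$ (all names are hypothetical). It assumes every day cluster contains at least one day and every station has some activity, so that no denominator is zero. When $\hat{\alpha}_s$ is set to the mean station activity, the updates of equation (5) satisfy the constraint $\sum_{l,t} D_l \lambda_{klt} = DT$ automatically, which makes a convenient sanity check:

```python
def m_step(X, t, day_clusters, n_day_clusters):
    """Closed-form M-step updates.

    X[s][d][h]: counts, t[s][k]: posterior probabilities from the E step,
    day_clusters[d]: index l of the cluster of day d.
    Returns alpha[s], pi[k] and lam[k][l][h] (equation (5)).
    """
    N, D, T = len(X), len(X[0]), len(X[0][0])
    K = len(t[0])
    alpha = [sum(map(sum, Xs)) / (D * T) for Xs in X]  # mean station activity
    pi = [sum(t[s][k] for s in range(N)) / N for k in range(K)]
    days_in = [day_clusters.count(l) for l in range(n_day_clusters)]  # sum_d W_dl
    lam = []
    for k in range(K):
        weight = sum(t[s][k] * alpha[s] for s in range(N))  # sum_s t_sk alpha_s
        lam.append([
            [
                sum(t[s][k] * X[s][d][h] for s in range(N)
                    for d in range(D) if day_clusters[d] == l)
                / (weight * days_in[l])
                for h in range(T)
            ]
            for l in range(n_day_clusters)
        ])
    return alpha, pi, lam
```

Alternating this step with the E step until the log-likelihood (1) stops increasing gives the EM algorithm.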







Results

Setting

    One month of data (September)
    Number of clusters (K = 8) set manually
    ⇒ good trade-off between interpretability and fit of the clustering


Outputs

    $Z_s$ : cluster of station s
    $\lambda_k$ : temporal profile of cluster k
    $\alpha_s$ : attractivity of station s






Railway stations
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]




Parks
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]





Spare time, night
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]




Spare time, night and week-end
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]




Housing
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]

Housing
[Figure: map, with population density in inhabitants / ha (scale 0 to 1 200).]



Employment (1)
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]




Employment (2)
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]

Employment (1 and 2)
[Figure: map, with employment density in jobs / ha (scale 0 to 2 000).]



Mixed usage
[Figure: hourly activity profile of the cluster (0 to 24 h), departures and arrivals, on week days and during the week-end.]




Crossing with population / employment / services densities

                        Inhabitants/ha   Jobs/ha   Services/ha   Shops/ha
                             162           237         4.2          3.7
Spare time (1)               367           189         6.3          4.4
Spare time (2)               261           322         7.7          6.9
Parks                        172            90         2            1.7
Railway stations             209           206         2.4          1.8
Housing                      375           108         3.8          2.7
Employment (1)               138           409         4.5          2.8
Employment (2)               157           456         5.7          5.6
Mixed usage                  301           163         3.8          2.8

TAB. 1: Mean of each cluster with respect to population, employment, services and shops densities. Sources : "Recensement 2008" (2008 census), "Base permanente des équipements" (permanent facilities database), Insee.





Conclusion on stations clustering
Discussion on the model
    Model adapted to count data
    Station scaling factors are important
    Stations described by the dynamics of their incoming and outgoing flows
    Week-day / week-end differences taken into account


Discussion on the results
    Clusters are interpretable
    Population, employment and amenity densities are highly explanatory of the clusters
    Temporal profiles are also interpretable and informative





       Latent Dirichlet Allocation (LDA)
       for trip activity recognition







Objectives

      Decompose the trips into interpretable clusters
      ⇒ look for stationarities and change points in the OD dynamics
      LDA with documents = small bags of successive trips


Analyse the clusters found with respect to their :

      Temporal positions, cycles
      Spatial distribution of flows
      Spatial distribution of incoming / outgoing flows per station







Data representation : dynamical O/D matrices


Observed data :
      $X_{ijt}$ : # of bikes that were
          1    taken at station i
          2    returned at station j
          3    at time t
      $t \in \{1, \dots, N_t\}$ : time index (bags of successive trips)
      $i, j \in \{1, \dots, N_s\}$ : station indices
⇒ $X_{ijt}$ tensor of dimension $N_s \times N_s \times N_t$.
⇒ takes into account both the spatial and the temporal behavior of the BSS
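A minimal sketch of how the tensor $X_{ijt}$ could be built when t indexes bags of successive trips. Trips are sorted by departure time and cut into consecutive bags of fixed size. The tensor is stored sparsely because most O/D couples are empty in any one bag. The tuple layout and function name are assumptions, and the bag size is left as a parameter:

```python
from collections import Counter


def od_tensor(trips, bag_size):
    """Sparse X[(i, j, t)]: number of trips from station i to station j in bag t.

    trips: (departure_time, origin, destination) tuples; they are sorted by
    departure time and cut into consecutive bags of bag_size trips.
    """
    ordered = sorted(trips, key=lambda trip: trip[0])
    X = Counter()
    for rank, (_, i, j) in enumerate(ordered):
        X[(i, j, rank // bag_size)] += 1
    return X
```

Each bag t then plays the role of one document: its word counts are the $X_{ijt}$ for that t.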







LDA, background
LDA = Latent Dirichlet Allocation
      Bayesian mixture model for discrete data
      ⇒ originally designed to find topics in text corpora
      Each document (bag of words) is a mixture of topics
      Each topic has its own word probability vector


FIG. 8: Graphical model representation of LDA.




LDA for dynamical O/D matrices analysis

Hypotheses:

      Local stationarity of the BSS behaviour / OD
      Cyclostationarity: weekly, daily


Small bags of successive trips ≈ locally stationary OD
⇒ Documents (bags of words) = bags of 5000 successive trips, with:

      Words = Origin/Destination couples
      Topics = Latent activities
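The construction of documents and words can be sketched as follows, assuming trips arrive as hypothetical `(origin, destination, timestamp)` tuples. The slides use bags of 5000 successive trips; a bag size of 2 is used below only to keep the toy output readable. How the final, incomplete bag is handled is not stated in the slides; this sketch simply keeps it.

```python
def make_documents(trips, bag_size):
    """Group time-ordered trips into documents of `bag_size` successive trips.

    Each trip is a hypothetical (origin, destination, timestamp) tuple. The
    "word" of a trip is its O/D couple, mapped to an integer id via `vocab`.
    """
    ordered = sorted(trips, key=lambda trip: trip[2])  # sort by timestamp
    vocab = {}  # (origin, destination) -> word id, assigned on first sight
    docs = []
    for start in range(0, len(ordered), bag_size):
        bag = ordered[start:start + bag_size]
        docs.append([vocab.setdefault((o, d), len(vocab)) for o, d, _ in bag])
    return docs, vocab


trips = [(0, 1, 10.0), (1, 2, 5.0), (0, 1, 20.0), (2, 0, 1.0), (1, 2, 30.0)]
docs, vocab = make_documents(trips, bag_size=2)
print(docs)  # prints: [[0, 1], [2, 2], [1]]
```

Assigning word ids on first sight keeps the vocabulary limited to O/D couples actually observed, rather than all Ns × Ns possible couples.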





LDA for dynamical O/D matrices analysis

For each activity a, draw an O/D matrix generator:

                                                         Λa ∼ D(β)


For each "bag of trips" t ∈ {1, ..., Nt}:

  1     Draw the activity proportions: πt ∼ D(α)
  2     For each trip of bag t:
               Draw its activity A: A ∼ M(1, πt)
               Draw an O/D couple D using the generator of activity A:
                                                     D ∼ M(1, ΛA)

(D(·): Dirichlet distribution; M(1, ·): multinomial draw of a single item.)
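A minimal forward simulation of this generative process, using only Python's standard library. Assumptions not in the slides: symmetric Dirichlet priors with scalar hyperparameters `alpha` and `beta`, O/D couples already encoded as word ids `0 .. n_words-1` (n_words = Ns × Ns), and Dirichlet draws obtained by normalising independent Gamma variates, a standard construction. The sketch only samples synthetic data; fitting the model to real trips would require an inference procedure such as variational EM.

```python
import random


def dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sample_corpus(n_activities, n_words, n_bags, bag_size, alpha, beta, seed=0):
    rng = random.Random(seed)
    # One O/D generator per activity: Lambda_a ~ D(beta) over all O/D couples
    lambdas = [dirichlet([beta] * n_words, rng) for _ in range(n_activities)]
    bags = []
    for _ in range(n_bags):
        pi_t = dirichlet([alpha] * n_activities, rng)  # activity proportions of bag t
        bag = []
        for _ in range(bag_size):
            a = rng.choices(range(n_activities), weights=pi_t)[0]   # A ~ M(1, pi_t)
            d = rng.choices(range(n_words), weights=lambdas[a])[0]  # D ~ M(1, Lambda_A)
            bag.append((a, d))  # keep the latent activity for inspection
        bags.append(bag)
    return lambdas, bags


# Toy setting: 3 stations -> 9 O/D couples, 3 activities, 4 bags of 50 trips
lambdas, bags = sample_corpus(n_activities=3, n_words=9, n_bags=4,
                              bag_size=50, alpha=0.5, beta=0.1)
```

Small values of `alpha` concentrate each πt on few activities, which is one way to obtain the "low mixture" behaviour reported later; the values above are illustrative only.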


Latent Dirichlet Allocation (LDA) for trips activity recognition                               Analysis of the results on the Velib’ dataset



Fixing the number of activities
perplexity analysis

      Perplexity: a decreasing function of the likelihood of held-out test data (lower is better)
      Clear drop at K = 5

[Plot: perplexity (y-axis, 155 000 to 165 000) versus the number of latent activities K (x-axis)]

FIG. 9: Perplexity on the September dataset with respect to the number of latent activities.
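A sketch of the perplexity computation, assuming the common LDA definition: the exponential of minus the average per-trip log-likelihood on held-out bags. The slides only say that perplexity is a function of the test likelihood, so this exact formula is an assumption. For simplicity the activity proportions `pis` of the held-out bags are taken as given; in practice they would have to be estimated or integrated out.

```python
import math


def perplexity(docs, pis, lambdas):
    """exp(-(total log-likelihood) / (total number of trips)) over held-out bags.

    docs[d]       : list of O/D word ids of bag d
    pis[d]        : activity proportions of bag d (assumed already estimated)
    lambdas[a][w] : probability of O/D couple w under activity a
    """
    log_lik, n_tokens = 0.0, 0
    for words, pi in zip(docs, pis):
        for w in words:
            # Mixture probability of this O/D couple: sum_a pi_a * Lambda_a[w]
            p_w = sum(pi_a * lam[w] for pi_a, lam in zip(pi, lambdas))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Two activities that are both uniform over 4 O/D couples
lambdas = [[0.25] * 4, [0.25] * 4]
print(round(perplexity([[0, 1, 2, 3]], [[0.5, 0.5]], lambdas), 6))  # prints: 4.0
```

As a sanity check, a model that spreads its mass uniformly over V O/D couples has perplexity V; better models of the trips yield lower values.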




Temporal results : πt


[Plot: trips / hour (y-axis, 0 to 9 000) versus date (x-axis, April 11 to April 25)]

FIG. 10: Temporal evolution of πt


Remarks:
                  Cyclostationarity clearly visible (even during holidays)
                  Little mixing between the latent activities
                  Interpretable temporal clusters: Home ↔ Work, Lunch, ...




Spatial results : Λa as flows




FIG. 11: Latent activity "House→Work commute", flows (blue for f = 10/10 000)





Spatial results : Λa as flows




FIG. 12: Latent activity "Lunch", flows (blue for f = 10/10 000)





Spatial results : Λa as flows




FIG. 13: Latent activity "Work→House commute", flows (blue for f = 10/10 000)





Spatial results : Λa as flows




FIG. 14: Latent activity "Evening", flows (blue for f = 10/10 000)





Spatial results : Λa as flows




FIG. 15: Latent activity "Spare time", flows (blue for f = 10/10 000)





Incoming / outgoing specificities, question:
Which stations have an increased in/out-degree for a latent activity a?

      Introduce the station incoming specificities $IS_s^a$ and outgoing
      specificities $OS_s^a$:

$$IS_s^a = \log\left(\frac{p^a_{in,s}}{p^g_{in,s}}\right), \qquad OS_s^a = \log\left(\frac{p^a_{out,s}}{p^g_{out,s}}\right), \tag{6}$$

      with $p^a_{in,s}$, $p^a_{out,s}$ the probabilities that a trip ends/starts at station s
      for activity a:

$$p^a_{in,s} = \sum_j \Lambda^a_{js}, \qquad p^a_{out,s} = \sum_j \Lambda^a_{sj},$$

      and $p^g_{in,s}$, $p^g_{out,s}$ the global probabilities that a trip ends/starts at
      station s:

$$p^g_{in,s} = \frac{\sum_{j,t} X_{jst}}{\sum_{i,j,t} X_{ijt}}, \qquad p^g_{out,s} = \frac{\sum_{j,t} X_{sjt}}{\sum_{i,j,t} X_{ijt}}.$$
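Equation (6) can be sketched in plain Python, assuming `lam` is the fitted Λa stored as an Ns × Ns nested list and `X` is the count tensor stored as nested lists `X[i][j][t]`. The sketch does no smoothing: a station with zero probability under Λa, or with no observed trips, would make the logarithm undefined, so a real implementation would need to handle those stations (for example by excluding them).

```python
import math


def specificities(lam, X):
    """Incoming/outgoing specificities IS_s^a and OS_s^a of equation (6).

    lam[i][j]  : Lambda^a, probability of O/D couple (i, j) under activity a
    X[i][j][t] : observed trip counts (dense nested lists, for illustration)
    """
    n = len(lam)
    p_in_a = [sum(lam[j][s] for j in range(n)) for s in range(n)]   # column sums
    p_out_a = [sum(lam[s][j] for j in range(n)) for s in range(n)]  # row sums
    total = sum(sum(sum(x_ij) for x_ij in row) for row in X)        # sum over i, j, t
    p_in_g = [sum(sum(X[j][s]) for j in range(n)) / total for s in range(n)]
    p_out_g = [sum(sum(X[s][j]) for j in range(n)) / total for s in range(n)]
    IS = [math.log(p_in_a[s] / p_in_g[s]) for s in range(n)]
    OS = [math.log(p_out_a[s] / p_out_g[s]) for s in range(n)]
    return IS, OS


# Toy example: 2 stations, 2 time slots, uniform observed counts
lam = [[0.1, 0.4], [0.3, 0.2]]
X = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
IS, OS = specificities(lam, X)
print([round(v, 3) for v in IS])  # prints: [-0.223, 0.182]
```

A positive value means the station attracts (or emits) a larger share of the trips of activity a than of all trips; here station 1 is over-represented as a destination.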




Spatial results : incoming specificities




FIG. 16: Latent activity "House→Work commute", stations' incoming specificity



Spatial results : outgoing specificities




FIG. 17: Latent activity "House→Work commute", stations' outgoing specificity



Expected bike balance, question:
Positive/negative bike balance of stations for a latent activity a?

      The O/D matrix D follows a multinomial law with parameters $N_{dep}$
      (number of trips) and $\Lambda^a$:

$$D \sim \mathcal{M}(N_{dep}, \Lambda^a),$$

      The bike balance $B_s$ of a station s is thus given by:

$$B_s = \underbrace{\sum_j D_{js}}_{\text{incoming bikes}} - \underbrace{\sum_j D_{sj}}_{\text{outgoing bikes}}.$$

      Since $E[D] = N_{dep}\,\Lambda^a$, the expectation of the balance vector B is:

$$E[B] = N_{dep}\left((\Lambda^a)^t - \Lambda^a\right) v, \tag{7}$$

      with $v = (1, \dots, 1)^t$.
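Equation (7) reduces to expected in-flow minus expected out-flow at each station, scaled by Ndep. A minimal sketch, again assuming `lam` is an Ns × Ns nested-list representation of Λa:

```python
def expected_balance(lam, n_dep):
    """Expected bike balance per station, eq. (7): N_dep * ((Lambda^a)^t - Lambda^a) v.

    Entry s equals n_dep * (sum_j lam[j][s] - sum_j lam[s][j]),
    i.e. expected incoming minus expected outgoing bikes at station s.
    """
    n = len(lam)
    return [
        n_dep * (sum(lam[j][s] for j in range(n)) - sum(lam[s][j] for j in range(n)))
        for s in range(n)
    ]


lam = [[0.1, 0.4], [0.3, 0.2]]
balance = expected_balance(lam, n_dep=10_000)
print([round(b) for b in balance])  # prints: [-1000, 1000]
```

Because every trip leaves one station and arrives at another, the entries always sum to zero (up to floating-point rounding); Ndep = 10 000 matches the value used for the balance maps.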



Spatial results : expected bike balance
(Map colour scale: expected balance, from -30 to 30)

FIG. 18: Latent activity "House→Work commute", stations' expected balances with Ndep = 10 000



"Lunch", incoming specificity




FIG. 19: Stations' incoming specificity




"Lunch", outgoing specificity




FIG. 20: Stations' outgoing specificity




"Lunch", balance
(Map colour scale: expected balance, from -30 to 30)

FIG. 21: Stations' expected balances with Ndep = 10 000




"Work→House commute", incoming specificity




FIG. 22: Stations' incoming specificity




"Work→House commute", outgoing specificity




FIG. 23: Stations' outgoing specificity




"Work→House commute", balance
(Map colour scale: expected balance, from -30 to 30)

FIG. 24: Stations' expected balances with Ndep = 10 000




"Evening", incoming specificity




FIG. 25: Stations' incoming specificity




"Evening", outgoing specificity




FIG. 26: Stations' outgoing specificity




"Evening", balance
(Map colour scale: expected balance, from -30 to 30)

FIG. 27: Stations' expected balances with Ndep = 10 000




"Spare time", incoming specificity




FIG. 28: Stations' incoming specificity




"Spare time", outgoing specificity




FIG. 29: Stations' outgoing specificity




"Spare time", balance
(Map colour scale: expected balance, from -30 to 30)

FIG. 30: Stations' expected balances with Ndep = 10 000




Conclusion on LDA for activities recognition




      Interpretable latent activities
      A good picture of the city's "pulse" and geography
      Better understanding of the system's behaviour
      Strong evidence of cyclostationarity
      Weekday / weekend patterns




Thanks for your attention




                               @comeetie, etienne.come@ifsttar.fr


Ifsttar
Centre de Marne-la-Vallée
Batiment le “Descartes 2”
2, rue de la Butte Verte F-93166 Noisy le Grand cedex

Email: etienne.come@ifsttar.fr
Tel.: +33 (0)1 45 92 56 57


Website: www.ifsttar.fr



      Etienne Côme (IFSTTAR)            Model-based clustering for BSS usage mining   15/10/2012   75 / 75

Mais conteúdo relacionado

Destaque

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Destaque (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

animatics

  • 1. Model-based clustering for BSS usage mining, a case study with the velib’ system of Paris Etienne Côme 15/10/2012
  • 2. Outline Bike Sharing Systems (BSS) What is fun with BSS ? Relatively new systems Rapidly diffusing (EU and US nowadays, Hangzhou, ...) Important sucesses Abundant usage data In interesting and original forms : Origins / Destinations + timestamp Real-time stations balances Interesting and new problematics Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 2 / 75
  • 3. Outline Outline 1 Introduction Problematics Usage data : trips records Velib’ in few numbers and pictures Tools and approach 2 Stations clustering using temporal usage profiles Data representation : count time series Generative model : naive Poisson mixture Analysis of the results on the Velib’ dataset 3 Latent Dirichlet Allocation (LDA) for trips activity recognition Data representation : dynamical O/D matrices Generative model under LDA Analysis of the results on the Velib’ dataset Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 3 / 75
  • 4. Introduction Problematics Problematics Operational objectives Planning new systems : position, size of the stations Quality of service : bikes re-dispatch,... ... Mining objectives Building predictive model of usage Finding spatio-temporal patterns Better understanding of the usages ... Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 4 / 75
  • 5. Introduction Usage data : trips records Raw data Trips data departure time stamp departure station arrival time stamp arrival station type of subscription ! Will be converted in contingency tables (i.e. tensors of counts) Data sources ! Velib’, 2 month Open data : Barclays (Londre), Boston, ... Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 5 / 75
  • 6. Introduction Velib’ in few numbers and pictures in few numbers BSS size : 1200 stations ≈ 40000 slots ≈ 16000 bikes ≈ 100 000 trips/day 27% trips = day subscription 73% trips = year subscription Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 6 / 75
  • 7. Introduction Velib’ in few numbers and pictures Global behavior 100 000 140 000 Day subscription free use limit 120 000 Year subscription 80 000 free use limit 100 000 60 000 80 000 Trips 60 000 40 000 40 000 20 000 20 000 0 0 0 5 10 0 20 40 60 80 100 Distances (Km) Duration (min) F IG . 1: Histograms of trips lengths and durations Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 7 / 75
  • 8. Introduction Velib’ in few numbers and pictures Temporal effects 35 000 Subscription : 30 000 Short Long 25 000 Trips 20 000 15 000 10 000 5 000 Monday Tuesday Wednesday Thursday Friday Saturday Sunday Time F IG . 2: Number of Trips / hour (short / long subscriptions) Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 8 / 75
  • 9. Introduction Velib’ in few numbers and pictures Temporal effects 7 500 Average number of trips 5 000 2 500 0 0 2 4 6 8 10 12 14 16 18 20 22 Hours F IG . 3: Number of trips in week day / en week-end Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 9 / 75
  • 10. Introduction Velib’ in few numbers and pictures Spatial effects F IG . 4: Incoming trips map [6h,7h] for week days Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 10 / 75
  • 11. Introduction Velib’ in few numbers and pictures Spatial effects 24 20 Mean activity / hour 16 12 8 4 2 4 6 8 10 Distance from the center ("Les Halles") in Km F IG . 5: Stations activities / distance to "Les Halles" Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 11 / 75
  • 12. Introduction Tools and approach Approach, exploratory data analysis General methodologie Use clustering algorithms to find interesting patterns in the data Confront the found clusters to the city geography and sociology ⇒ Extract important factors that influence BSS system behavior. 2 developments : 1 Find clusters of stations with similar temporal usage pattern 2 Find latent activities that govern the BSS system dynamics Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 12 / 75
  • 13. Introduction Tools and approach Tools, model based clustering General methodologie Imagine a data generation process ⇒ which include non-observed or latent variables Latent variables can be discrete or continuous Examples of latent variables Species for flowers Topics for texts Communities for graph vertices ... Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 13 / 75
  • 14. Tools and approach: generative approach to clustering
Model-based clustering:
    1. Draw the cluster of sample (i).
    2. Depending on the cluster, draw the observed values of (i).
[Fig. 6: Example of a 1D Gaussian mixture model; axes: x, density f(x).]
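As an illustration of this two-step generative process, the Python sketch below samples from a two-component 1D Gaussian mixture. The weights, means and standard deviations are arbitrary illustrative values, not the ones behind Fig. 6, and `sample_gaussian_mixture` is a hypothetical helper name.

```python
import random

def sample_gaussian_mixture(n, weights, means, stds, seed=0):
    """Two-step sampling: draw the cluster of each sample, then its value given the cluster."""
    rng = random.Random(seed)
    labels, xs = [], []
    for _ in range(n):
        # Step 1: draw the cluster of sample i from the prior proportions.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw the observed value of sample i from the Gaussian of cluster k.
        xs.append(rng.gauss(means[k], stds[k]))
        labels.append(k)
    return labels, xs

# Illustrative parameters only.
labels, xs = sample_gaussian_mixture(1000, [0.3, 0.7], [-40.0, 10.0], [10.0, 8.0])
```

In clustering, only `xs` is observed; the `labels` play the role of the latent variables that the model has to recover.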
  • 15. Tools and approach: data generation process (graphical model representation)
1. Draw the cluster of sample (i): $Z_i \sim \mathcal{M}(1, \pi)$ ⇒ $\pi$ are the prior proportions of the clusters.
  • 16. Tools and approach: data generation process (graphical model representation)
2. Depending on the cluster, draw the observed values of (i):
$$p(x \mid Z_{ik} = 1) = f(x; \theta_k), \quad \forall k \in \{1, \dots, K\}.$$
⇒ $f$ can be tuned to exploit the specificities of the problem.
  • 19. Tools and approach: model-based clustering framework, tasks and tools
    Inferring the parameters ⇒ EM algorithm, or variational EM for complex models.
    Finding the clustering ⇒ byproduct of EM.
    Fixing the number of clusters ⇒ model selection criterion: BIC, AIC, ICL, perplexity.
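For the model-selection step, BIC and AIC only need the maximised log-likelihood, the number of free parameters and, for BIC, the number of observations. The sketch below assumes the -2 log L convention (lower is better); the helper names and the candidate fits are hypothetical, and what counts as an observation (for instance the N stations in the station clustering) is an assumption to settle for each model. ICL and perplexity are not covered here.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2p (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n (lower is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def select_k_by_bic(fits, n_obs):
    """fits maps a candidate K to (maximised log-likelihood, number of free parameters)."""
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))

# Hypothetical fits, for illustration only (not results on the Velib' data).
fits = {2: (-100.0, 5), 3: (-90.0, 8), 4: (-89.0, 11)}
best_k = select_k_by_bic(fits, n_obs=100)
```

With these toy numbers, going from K = 3 to K = 4 gains one unit of log-likelihood but costs three parameters, so BIC keeps K = 3.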
  • 20. Stations clustering using temporal usage profiles
  • 21. Stations clustering using temporal usage profiles
Objectives:
    Find groups of stations with similar temporal usage profiles (temporal usage profile = incoming and outgoing activity per hour)
    Take into account the week-day / week-end discrepancy
    Use a model for count data
    Cross the results with possible explanatory variables: population, employment, amenities, ...
  • 22. Data representation: count time series
Observed data:
    $X^{out}_{sdt}$: number of bikes taken at station $s$ during day $d$ at hour $t$
    $X^{in}_{sdt}$: number of bikes returned at station $s$ during day $d$ at hour $t$
    $X_{sd} = (X^{in}_{sd1}, \dots, X^{in}_{sd24}, X^{out}_{sd1}, \dots, X^{out}_{sd24})$
⇒ $X$ is a tensor of size $N \times D \times T$.
⇒ It describes the temporal behavior of each station.
Variables:
    $X_{sd}$ (observed): number of bikes leaving/arriving
    $Z_s$ (latent): cluster of station $s$
    $W_d$ (observed): cluster of day $d$ (week day / week-end)
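A minimal sketch of how the count tensor could be built, assuming the trip records have already been reduced to hypothetical integer tuples `(dep_station, dep_day, dep_hour, arr_station, arr_day, arr_hour)`. The last axis follows the ordering of $X_{sd}$: slots 0 to 23 hold $X^{in}$ and slots 24 to 47 hold $X^{out}$, so it has 48 entries here. `count_tensor` is a hypothetical name.

```python
def count_tensor(trips, n_stations, n_days):
    """Build X[s][d][t] from trip tuples
    (dep_station, dep_day, dep_hour, arr_station, arr_day, arr_hour).
    Slots 0..23 hold X^in (bikes returned), slots 24..47 hold X^out (bikes taken)."""
    X = [[[0] * 48 for _ in range(n_days)] for _ in range(n_stations)]
    for dep_s, dep_d, dep_h, arr_s, arr_d, arr_h in trips:
        X[arr_s][arr_d][arr_h] += 1        # incoming activity at the arrival station
        X[dep_s][dep_d][24 + dep_h] += 1   # outgoing activity at the departure station
    return X

# Toy records: two trips leaving station 0 at 8h on day 0, one trip on day 1.
trips = [(0, 0, 8, 1, 0, 8), (0, 0, 8, 2, 0, 9), (1, 1, 18, 0, 1, 18)]
X = count_tensor(trips, n_stations=3, n_days=2)
```

Each trip contributes exactly two counts, one outgoing and one incoming, so the tensor sums to twice the number of trips.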
  • 23. Generative model: naive Poisson mixture
[Fig. 7: Graphical model representation.]
Parameters, $\Theta$:
    $\alpha_s$: station attractivity effects
    $\pi = (\pi_1, \dots, \pi_K)$: cluster proportions
    $\lambda = (\lambda_{klt})$: temporal profiles of the clusters
  • 24. Generative model: naive Poisson mixture
$$Z_s \sim \mathcal{M}(1, \pi)$$
$$X_{sd1} \perp \dots \perp X_{sdT} \mid \{Z_{sk} = 1, W_{dl} = 1\}$$
$$X_{sdt} \mid \{Z_{sk} = 1, W_{dl} = 1\} \sim \mathcal{P}(\alpha_s \lambda_{klt})$$
Constraints:
$$\sum_{l,t} D_l \lambda_{klt} = DT, \quad \forall k \in \{1, \dots, K\},$$
with $D_l$ the number of days in cluster $l$.
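To make the generative model concrete, here is a sketch that samples stations from the naive Poisson mixture with the Python standard library. Assumptions: `day_cluster[d]` stores the index $l$ with $W_{dl} = 1$ instead of a one-hot vector, Poisson draws use Knuth's multiplication method (adequate for small rates only), and the toy $\lambda$ does not enforce the constraint, which only fixes the scale between $\alpha_s$ and $\lambda_{klt}$ during estimation. All names are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw from P(lam) with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_naive_poisson_mixture(pi, lam, alpha, day_cluster, seed=0):
    """pi[k]: cluster proportions; lam[k][l][t]: temporal profiles;
    alpha[s]: station attractivity; day_cluster[d]: index l of the cluster of day d.
    Returns Z (station clusters) and X[s][d][t] (counts)."""
    rng = random.Random(seed)
    K, T = len(pi), len(lam[0][0])
    Z, X = [], []
    for a_s in alpha:
        k = rng.choices(range(K), weights=pi)[0]     # Z_s ~ M(1, pi)
        Z.append(k)
        # Given Z_s = k and W_dl = 1, the X_sdt are independent P(alpha_s * lam_klt)
        X.append([[poisson(rng, a_s * lam[k][l][t]) for t in range(T)]
                  for l in day_cluster])
    return Z, X

# Toy setting: K = 2, five week days then two week-end days, T = 3 time frames.
lam = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],    # cluster 0: inactive (toy)
       [[2.0, 4.0, 1.0], [1.0, 1.0, 3.0]]]    # cluster 1: week / week-end profiles
Z, X = sample_naive_poisson_mixture([0.5, 0.5], lam, [1.0] * 20, [0, 0, 0, 0, 0, 1, 1])
```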
  • 25. Generative model: naive Poisson mixture, parameter estimation and likelihood
Marginal likelihood:
$$\mathcal{L}(\Theta; X) = \sum_s \log\Big(\sum_k \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \quad (1)$$
Completed likelihood:
$$\mathcal{L}_c(\Theta; X, Z) = \sum_{s,k} Z_{sk} \log\Big(\pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \quad (2)$$
where $Z$ is unknown.
  • 26. Generative model: naive Poisson mixture, EM algorithm
⇒ Straightforward solution for parameter estimation.
EM: E step. Conditional expectation of $\mathcal{L}_c$ given the current parameters:
$$\mathbb{E}\big[\mathcal{L}_c(\Theta; x, Z) \mid x, \Theta^{(q)}\big] = \sum_{s,k} t_{sk} \log\Big(\pi_k \prod_{d,t,l} p(x_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \quad (3)$$
with $t_{sk}$ the a posteriori probabilities:
$$t_{sk} = \frac{\pi_k^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{klt}^{(q)})^{W_{dl}}}{\sum_{k'} \pi_{k'}^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{k'lt}^{(q)})^{W_{dl}}} \quad (4)$$
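Eq. (4) multiplies many Poisson probabilities, which underflows in floating point for a month of hourly counts. A common remedy, and an implementation choice not stated on the slide, is to work in log space with the log-sum-exp trick; the normaliser for station $s$ is then also its contribution to the marginal log-likelihood of Eq. (1). A minimal sketch, assuming `day_cluster[d]` holds the index $l$ with $W_{dl} = 1$; the function names are hypothetical.

```python
import math

def poisson_logpmf(x, mu):
    """log P(X = x) for X ~ P(mu), with the mu = 0 edge case handled."""
    if mu == 0.0:
        return 0.0 if x == 0 else -math.inf
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def e_step(X, day_cluster, pi, alpha, lam):
    """Return (tau, loglik): tau[s][k] = t_sk from Eq. (4), loglik from Eq. (1).
    The W_dl exponent just selects the profile lam[k][day_cluster[d]]."""
    tau, loglik = [], 0.0
    for s, X_s in enumerate(X):
        log_terms = []
        for k, pi_k in enumerate(pi):
            acc = math.log(pi_k) if pi_k > 0 else -math.inf
            for d, X_sd in enumerate(X_s):
                profile = lam[k][day_cluster[d]]
                acc += sum(poisson_logpmf(x, alpha[s] * profile[t]) for t, x in enumerate(X_sd))
            log_terms.append(acc)
        # log-sum-exp; assumes at least one cluster gives station s a non-zero likelihood
        m = max(log_terms)
        log_norm = m + math.log(sum(math.exp(v - m) for v in log_terms))
        tau.append([math.exp(v - log_norm) for v in log_terms])
        loglik += log_norm
    return tau, loglik

# One station, one day, two time frames, two candidate profiles.
tau, ll = e_step([[[5, 5]]], [0], [0.5, 0.5], [1.0], [[[1.0, 1.0]], [[5.0, 5.0]]])
```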
  • 27. Generative model: naive Poisson mixture, EM algorithm
⇒ Straightforward solution for parameter estimation.
EM: M step. Maximization of the lower bound with respect to the parameters:
    $\alpha_s$: mean station activity, $\hat{\alpha}_s = \frac{1}{DT} \sum_{d,t} X_{sdt}$
    $\pi_k$: proportion of cluster $k$, $\hat{\pi}_k = \frac{1}{N} \sum_s t_{sk}$
    $\lambda_{klt}$: activity of time frame $t$ for cluster $k$, on week days or during the week-end (day cluster $l$):
$$\hat{\lambda}_{klt} = \frac{1}{\sum_s t_{sk} \alpha_s \sum_d W_{dl}} \sum_{s,d} t_{sk} W_{dl} X_{sdt} \quad (5)$$
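The closed-form M step translates directly into code. The sketch below is a plain-Python transcription of the updates for $\hat{\alpha}_s$, $\hat{\pi}_k$ and Eq. (5), assuming day clusters are given as indices (`day_cluster[d] = l` when $W_{dl} = 1$) and posteriors `tau[s][k]` $= t_{sk}$; `m_step` is a hypothetical name. Summing Eq. (5) shows that $\sum_{l,t} D_l \hat{\lambda}_{klt} = DT$ holds automatically, which the usage example lets you check.

```python
def m_step(X, day_cluster, tau, n_day_clusters):
    """Closed-form M step of the naive Poisson mixture.
    X[s][d][t]: counts; day_cluster[d]: index l of the cluster of day d;
    tau[s][k]: posterior probabilities t_sk from the E step."""
    N, D, T = len(X), len(X[0]), len(X[0][0])
    K = len(tau[0])
    # alpha_s: mean activity of station s over all days and time frames
    alpha = [sum(map(sum, X_s)) / (D * T) for X_s in X]
    # pi_k: average posterior probability of cluster k
    pi = [sum(tau[s][k] for s in range(N)) / N for k in range(K)]
    # D_l: number of days in day cluster l (sum_d W_dl)
    D_l = [day_cluster.count(l) for l in range(n_day_clusters)]
    lam = [[[0.0] * T for _ in range(n_day_clusters)] for _ in range(K)]
    for k in range(K):
        scale = sum(tau[s][k] * alpha[s] for s in range(N))   # sum_s t_sk alpha_s
        for l in range(n_day_clusters):
            denom = scale * D_l[l]
            if denom == 0:
                continue  # empty day cluster or inactive stations: leave lam at 0
            for t in range(T):
                num = sum(tau[s][k] * X[s][d][t]
                          for s in range(N) for d in range(D) if day_cluster[d] == l)
                lam[k][l][t] = num / denom
    return pi, alpha, lam

# Toy data: 2 stations, 3 days (two week days, one week-end day), T = 2.
X = [[[1, 2], [3, 0], [4, 1]], [[0, 5], [2, 2], [1, 3]]]
pi, alpha, lam = m_step(X, [0, 0, 1], [[0.8, 0.2], [0.3, 0.7]], n_day_clusters=2)
```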
  • 28. Analysis of the results on the Velib’ dataset
Setting:
    One month of data (September)
    Number of clusters ($K = 8$) set manually ⇒ good trade-off between interpretability and fit of the clustering
Outputs:
    $Z_s$: cluster of station $s$
    $\lambda_k$: temporal profile of cluster $k$
    $\alpha_s$: attractivity of station $s$
  • 29. Analysis of the results on the Velib’ dataset: Railway stations
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 30. Analysis of the results on the Velib’ dataset: Railway stations
  • 31. Analysis of the results on the Velib’ dataset: Parks
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 32. Analysis of the results on the Velib’ dataset: Parks
  • 33. Analysis of the results on the Velib’ dataset: Spare time, night
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 34. Analysis of the results on the Velib’ dataset: Spare time, night
  • 35. Analysis of the results on the Velib’ dataset: Spare time, night and week-end
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 36. Analysis of the results on the Velib’ dataset: Spare time, night and week-end
  • 37. Analysis of the results on the Velib’ dataset: Housing
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 38. Housing
[Figure legend: inhabitants per hectare, from 0 to 1 200.]
  • 39. Analysis of the results on the Velib’ dataset: Employment (1)
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 40. Analysis of the results on the Velib’ dataset: Employment (2)
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 41. Employment (1 and 2)
[Figure legend: jobs per hectare, from 0 to 2 000.]
  • 42. Analysis of the results on the Velib’ dataset: Mixed usage
[Figure: temporal profile of the cluster; panels: Week and Week-end, each with Departures and Arrivals; axes: hours, activity.]
  • 43. Analysis of the results on the Velib’ dataset: Mixed usage
  • 44. Analysis of the results on the Velib’ dataset: crossing with population / employment / services densities

    Cluster               Inhabitants/ha   Jobs/ha   Services/ha   Shops/ha
    (unlabeled in source)            162       237           4.2        3.7
    Spare time (1)                   367       189           6.3        4.4
    Spare time (2)                   261       322           7.7        6.9
    Parks                            172        90           2          1.7
    Railway stations                 209       206           2.4        1.8
    Housing                          375       108           3.8        2.7
    Employment (1)                   138       409           4.5        2.8
    Employment (2)                   157       456           5.7        5.6
    Mixed usage                      301       163           3.8        2.8

Tab. 1: Mean of each cluster with respect to population, employment, services and shops densities (per hectare). Sources: "Recensement 2008", "Base permanente des équipements", Insee.
  • 45. Analysis of the results on the Velib’ dataset: conclusion on stations clustering
Discussion on the model:
    Model adapted to count data
    Station scaling factors are important
    Stations described by the dynamics of their incoming and outgoing flows
    Week-day / week-end differences taken into account
Discussion on the results:
    Clusters are interpretable
    Population, employment and amenity densities are highly explanatory for the clusters
    Temporal profiles are also interpretable and informative
  • 46. Latent Dirichlet Allocation (LDA) for trips activity recognition
  • 47. Latent Dirichlet Allocation (LDA) for trips activity recognition
Objectives:
    Decompose the trips into interpretable clusters ⇒ look for stationarities and change points in the O/D dynamics
    LDA with documents = small bags of successive trips
    Analyse the clusters found with respect to their:
        Temporal positions, cycles
        Spatial distribution of flows
        Spatial distribution of incoming / outgoing flows per station
  • 48. Data representation: dynamical O/D matrices
Observed data: $X_{ijt}$, the number of bikes that were
    1. taken at station $i$,
    2. returned at station $j$,
    3. at time $t$,
with $t \in \{1, \dots, N_t\}$ the time index and $i, j \in \{1, \dots, N_s\}$ the set of stations.
⇒ $X_{ijt}$ is a tensor of dimension $N_s \times N_s \times N_t$.
⇒ It takes into account both the spatial and the temporal behavior of the BSS.
  • 49. Generative model under LDA: background
LDA = Latent Dirichlet Allocation
    Bayesian mixture model for discrete data ⇒ originally designed to find topics in a text corpus
    Each document (bag of words) is a mixture of topics
    Each topic has its own vector of word probabilities
[Fig. 8: Graphical model representation of LDA.]
  • 50. Generative model under LDA: LDA for dynamical O/D matrix analysis
Hypotheses:
    Local stationarity of the BSS behaviour with respect to O/D
    Cyclostationarity: week, day
    Small bags of successive trips ≈ stationary O/D
⇒ Documents (bags of words) = bags of successive trips (5000 trips per bag), with:
    Words = origin/destination couples
    Topics = latent activities
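A minimal sketch of how the documents could be formed, assuming the trips are available as `(origin, destination)` pairs already sorted by departure time. Each bag becomes a `Counter` of O/D words, so that $X_{ijt}$ corresponds to `bags[t][(i, j)]`. The helper name is hypothetical, and how the original work handles the last, possibly shorter, bag is not stated; here it is simply kept.

```python
from collections import Counter

def bags_of_trips(trips, bag_size=5000):
    """trips: (origin, destination) pairs sorted by departure time.
    Returns one Counter of O/D 'words' per bag of bag_size successive trips."""
    return [Counter(trips[i:i + bag_size]) for i in range(0, len(trips), bag_size)]

# Toy example with bags of 2 trips instead of 5000.
trips = [(0, 1), (0, 1), (1, 2), (2, 0), (0, 1)]
bags = bags_of_trips(trips, bag_size=2)
```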
  • 51. Generative model under LDA: LDA for dynamical O/D matrix analysis
For each activity $a$, draw an O/D matrix generator: $\Lambda^a \sim \mathcal{D}(\beta)$.
For each "bag of trips" $t \in \{1, \dots, N_t\}$:
    1. Draw the activity proportions: $\pi_t \sim \mathcal{D}(\alpha)$.
    2. For each trip of bag $t$:
        Draw its activity $A$: $A \sim \mathcal{M}(1, \pi_t)$.
        Draw an O/D couple $D$ using the generator of activity $A$: $D \sim \mathcal{M}(1, \Lambda^A)$.
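The generative process above can be simulated with the Python standard library. In this sketch, Dirichlet draws are obtained by normalising independent Gamma variables, the priors are symmetric with illustrative concentrations, and each $\Lambda^a$ is stored as a probability vector over the $N_s^2$ O/D couples. Function names and parameter values are hypothetical; this illustrates the model, not the estimation procedure used in the study.

```python
import random

def dirichlet(rng, conc):
    """Dirichlet draw obtained by normalising independent Gamma(conc_i, 1) variables."""
    g = [rng.gammavariate(c, 1.0) for c in conc]
    total = sum(g)
    return [x / total for x in g]

def sample_lda_od(n_stations, n_activities, n_bags, trips_per_bag, alpha, beta, seed=0):
    rng = random.Random(seed)
    couples = [(i, j) for i in range(n_stations) for j in range(n_stations)]
    # One O/D generator Lambda^a (distribution over O/D couples) per activity
    Lam = [dirichlet(rng, [beta] * len(couples)) for _ in range(n_activities)]
    bags = []
    for _ in range(n_bags):
        pi_t = dirichlet(rng, [alpha] * n_activities)   # activity proportions of bag t
        bag = []
        for _ in range(trips_per_bag):
            a = rng.choices(range(n_activities), weights=pi_t)[0]  # A ~ M(1, pi_t)
            bag.append(rng.choices(couples, weights=Lam[a])[0])     # D ~ M(1, Lambda^A)
        bags.append(bag)
    return Lam, bags

Lam, bags = sample_lda_od(n_stations=3, n_activities=2, n_bags=4,
                          trips_per_bag=50, alpha=0.5, beta=0.1)
```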
  • 52. Analysis of the results on the Velib’ dataset: fixing the number of activities, perplexity analysis
    Perplexity = f(likelihood of test data)
    Clear drop-off at K = 5
[Fig. 9: Perplexity on the September dataset with respect to the number of latent activities K.]
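The slide only says that perplexity is a function of the test-data likelihood. A common definition, assumed here, is $\exp\big(-\sum \log p(w) / N_w\big)$ over the $N_w$ held-out trips, where $p(w \mid t) = \sum_a \pi_{ta} \Lambda^a_w$. The sketch assumes the proportions $\pi_t$ of the held-out bags are already available (how they are estimated is left out) and uses hypothetical names.

```python
import math

def perplexity(bags, pi, Lam):
    """exp(-(held-out log-likelihood) / (number of held-out trips)).
    bags[t]: {word index: count}; pi[t][a]: activity proportions of bag t;
    Lam[a][w]: probability of O/D word w under activity a."""
    log_lik, n_words = 0.0, 0
    for t, bag in enumerate(bags):
        for w, count in bag.items():
            p_w = sum(pi[t][a] * Lam[a][w] for a in range(len(Lam)))  # p(w | bag t)
            log_lik += count * math.log(p_w)
            n_words += count
    return math.exp(-log_lik / n_words)
```

As a sanity check, a single activity that spreads its mass uniformly over four O/D couples gives a perplexity of exactly 4, whatever the held-out counts.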
  • 53. Analysis of the results on the Velib’ dataset: temporal results, $\pi_t$
[Fig. 10: Temporal evolution of $\pi_t$; axes: date (April 11 to April 25), trips per hour.]
Remarks:
    Cyclostationarity is clearly visible (holidays are visible too)
    Little mixing between the latent activities
    Interpretable temporal clusters: Home ↔ Work, Lunch, ...
  • 54. Analysis of the results on the Velib’ dataset: spatial results, $\Lambda^a$ as flows
[Fig. 11: Latent activity "House→Work commute", flows (blue for f = 10/10 000).]
  • 55. Analysis of the results on the Velib’ dataset: spatial results, $\Lambda^a$ as flows
[Fig. 12: Latent activity "Lunch", flows (blue for f = 10/10 000).]
  • 56. Analysis of the results on the Velib’ dataset: spatial results, $\Lambda^a$ as flows
[Fig. 13: Latent activity "Work→House commute", flows (blue for f = 10/10 000).]
  • 57. Analysis of the results on the Velib’ dataset: spatial results, $\Lambda^a$ as flows
[Fig. 14: Latent activity "Evening", flows (blue for f = 10/10 000).]
  • 58. Analysis of the results on the Velib’ dataset: spatial results, $\Lambda^a$ as flows
[Fig. 15: Latent activity "Spare time", flows (blue for f = 10/10 000).]
  • 59. Analysis of the results on the Velib’ dataset: incoming / outgoing specificities
Question: which stations have an increased in/out-degree for a latent activity $a$?
Introduce the incoming specificities $IS^a_s$ and outgoing specificities $OS^a_s$ of the stations:
$$IS^a_s = \log\big(p^{in,a}_s / p^{in,g}_s\big), \qquad OS^a_s = \log\big(p^{out,a}_s / p^{out,g}_s\big), \quad (6)$$
with $p^{in,a}_s$, $p^{out,a}_s$ the probabilities that a trip ends/starts at station $s$ for activity $a$:
$$p^{in,a}_s = \sum_j \Lambda^a_{js}, \qquad p^{out,a}_s = \sum_j \Lambda^a_{sj},$$
and $p^{in,g}_s$, $p^{out,g}_s$ the global probabilities that a trip ends/starts at station $s$:
$$p^{in,g}_s = \frac{\sum_{j,t} X_{jst}}{\sum_{i,j,t} X_{ijt}}, \qquad p^{out,g}_s = \frac{\sum_{j,t} X_{sjt}}{\sum_{i,j,t} X_{ijt}}.$$
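A direct transcription of Eq. (6), assuming $\Lambda^a$ is given as an $N_s \times N_s$ matrix of O/D probabilities and the observed counts have already been summed over $t$ into `C[i][j]` $= \sum_t X_{ijt}$. All probabilities are assumed strictly positive (a station never visited would make the log undefined). The function name is hypothetical.

```python
import math

def specificities(Lam_a, C):
    """Lam_a[i][j]: O/D probabilities of activity a; C[i][j]: trip counts summed over t.
    Returns (IS, OS), the log-ratios of Eq. (6) for every station s."""
    n = len(C)
    total = sum(map(sum, C))
    p_in_a = [sum(Lam_a[j][s] for j in range(n)) for s in range(n)]    # trip ends at s
    p_out_a = [sum(Lam_a[s][j] for j in range(n)) for s in range(n)]   # trip starts at s
    p_in_g = [sum(C[j][s] for j in range(n)) / total for s in range(n)]
    p_out_g = [sum(C[s][j] for j in range(n)) / total for s in range(n)]
    IS = [math.log(a / g) for a, g in zip(p_in_a, p_in_g)]
    OS = [math.log(a / g) for a, g in zip(p_out_a, p_out_g)]
    return IS, OS

# Toy example: uniform global usage, an activity that mostly ends at station 1.
IS, OS = specificities([[0.1, 0.6], [0.1, 0.2]], [[1, 1], [1, 1]])
```

A positive $IS^a_s$ means station $s$ receives a larger share of the trips under activity $a$ than it does overall.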
  • 60. Analysis of the results on the Velib’ dataset: spatial results, incoming specificities
[Fig. 16: Latent activity "House→Work commute", stations' incoming specificity.]
  • 61. Analysis of the results on the Velib’ dataset: spatial results, outgoing specificities
[Fig. 17: Latent activity "House→Work commute", stations' outgoing specificity.]
  • 62. Analysis of the results on the Velib’ dataset: expected bike balance
Question: which stations have a positive/negative bike balance for a latent activity $a$?
The O/D matrix $D$ follows a multinomial law with parameters $N_{dep}$ (number of trips) and $\Lambda^a$:
$$D \sim \mathcal{M}(N_{dep}, \Lambda^a).$$
The bike balance $B_s$ of station $s$ is incoming bikes minus outgoing bikes:
$$B_s = \sum_j D_{js} - \sum_j D_{sj},$$
and, since $\mathbb{E}[D_{ij}] = N_{dep} \Lambda^a_{ij}$, the expectation of the balance vector $B$ is:
$$\mathbb{E}[B] = N_{dep} \big((\Lambda^a)^t - \Lambda^a\big) v, \quad (7)$$
with $v = (1, \dots, 1)^t$.
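Eq. (7) only needs the column sums and row sums of $\Lambda^a$. The sketch below, with a hypothetical function name and a toy two-station matrix, computes the expected balances; because every trip leaves one station and enters one, the expected balances always sum to zero.

```python
def expected_balance(Lam_a, n_dep):
    """E[B] = n_dep * ((Lam_a)^t - Lam_a) v: expected incoming minus outgoing bikes."""
    n = len(Lam_a)
    incoming = [sum(Lam_a[j][s] for j in range(n)) for s in range(n)]   # ((Lam_a)^t v)_s
    outgoing = [sum(Lam_a[s][j] for j in range(n)) for s in range(n)]   # (Lam_a v)_s
    return [n_dep * (i - o) for i, o in zip(incoming, outgoing)]

# Toy activity where station 1 attracts more trips than it emits, with N_dep = 10 000.
E_B = expected_balance([[0.0, 0.5], [0.25, 0.25]], 10000)
```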
  • 63. Analysis of the results on the Velib’ dataset: spatial results, expected bike balance
[Fig. 18: Latent activity "House→Work commute", stations' expected balances with $N_{dep}$ = 10 000; colour scale: balance from -30 to 30.]
  • 64. Analysis of the results on the Velib’ dataset: "Lunch", incoming specificity
[Fig. 19: Stations' incoming specificity.]
  • 65. Analysis of the results on the Velib’ dataset: "Lunch", outgoing specificity
[Fig. 20: Stations' outgoing specificity.]
  • 66. Analysis of the results on the Velib’ dataset: "Lunch", balance
[Fig. 21: Stations' expected balances with $N_{dep}$ = 10 000; colour scale: balance from -30 to 30.]
  • 67. Analysis of the results on the Velib’ dataset: "Work→House commute", incoming specificity
[Fig. 22: Stations' incoming specificity.]
  • 68. Analysis of the results on the Velib’ dataset: "Work→House commute", outgoing specificity
[Fig. 23: Stations' outgoing specificity.]
  • 69. Analysis of the results on the Velib’ dataset: "Work→House commute", balance
[Fig. 24: Stations' expected balances with $N_{dep}$ = 10 000; colour scale: balance from -30 to 30.]
  • 70. Analysis of the results on the Velib’ dataset: "Evening", incoming specificity
[Fig. 25: Stations' incoming specificity.]
  • 71. Analysis of the results on the Velib’ dataset: "Evening", outgoing specificity
[Fig. 26: Stations' outgoing specificity.]
  • 72. Analysis of the results on the Velib’ dataset: "Evening", balance
[Fig. 27: Stations' expected balances with $N_{dep}$ = 10 000; colour scale: balance from -30 to 30.]
  • 73. Analysis of the results on the Velib’ dataset: "Spare time", incoming specificity
[Fig. 28: Stations' incoming specificity.]
  • 74. Analysis of the results on the Velib’ dataset: "Spare time", outgoing specificity
[Fig. 29: Stations' outgoing specificity.]
  • 75. Analysis of the results on the Velib’ dataset: "Spare time", balance
[Fig. 30: Stations' expected balances with $N_{dep}$ = 10 000; colour scale: balance from -30 to 30.]
  • 76. Analysis of the results on the Velib’ dataset: conclusion on LDA for activity recognition
    Interpretable latent activities
    They give a good picture of the city's "pulse" and geography
    Better understanding of the system's behaviour
    Strong evidence of cyclostationarity
    Week-day / week-end pattern
  • 77. Thanks for your attention
@comeetie
Ifsttar, Centre de Marne-la-Vallée
Batiment le “Descartes 2”
2, rue de la Butte Verte
F-93166 Noisy le Grand cedex
Email: etienne.come@ifsttar.fr
Tel.: +33 (0)1 45 92 56 57
Website: www.ifsttar.fr