1. Model-based clustering for BSS usage mining,
a case study with the Velib’ system of Paris
Etienne Côme
15/10/2012
2. Outline
Bike Sharing Systems (BSS)
What is fun with BSS ?
Relatively new systems
Rapidly diffusing (EU and US nowadays, Hangzhou, ...)
Important successes
Abundant usage data
In interesting and original forms :
Origins / Destinations + timestamp
Real-time stations balances
Interesting and new problematics
Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 2 / 75
3. Outline
Outline
1 Introduction
Problematics
Usage data : trips records
Velib’ in few numbers and pictures
Tools and approach
2 Stations clustering using temporal usage profiles
Data representation : count time series
Generative model : naive Poisson mixture
Analysis of the results on the Velib’ dataset
3 Latent Dirichlet Allocation (LDA) for trips activity recognition
Data representation : dynamical O/D matrices
Generative model under LDA
Analysis of the results on the Velib’ dataset
4. Introduction Problematics
Problematics
Operational objectives
Planning new systems : position, size of the stations
Quality of service : bikes re-dispatch,...
...
Mining objectives
Building predictive model of usage
Finding spatio-temporal patterns
Better understanding of the usages
...
5. Introduction Usage data : trips records
Raw data
Trips data
departure time stamp
departure station
arrival time stamp
arrival station
type of subscription
⇒ Will be converted into contingency tables (i.e. tensors of counts)
Data sources
⇒ Velib’, 2 months
Open data : Barclays (London), Boston, ...
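The conversion of trip records into tensors of counts can be sketched as follows. This is only an illustration: the record layout (station, day and hour indices already extracted from the time stamps) and the names `count_tensors` and `toy_trips` are assumptions, not part of the original data description.

```python
import numpy as np

def count_tensors(trips, n_stations, n_days, n_hours=24):
    """Build departure and arrival count tensors of shape (stations, days, hours)."""
    x_out = np.zeros((n_stations, n_days, n_hours), dtype=int)
    x_in = np.zeros((n_stations, n_days, n_hours), dtype=int)
    for dep_s, dep_d, dep_h, arr_s, arr_d, arr_h in trips:
        x_out[dep_s, dep_d, dep_h] += 1   # one bike taken at the departure station
        x_in[arr_s, arr_d, arr_h] += 1    # one bike returned at the arrival station
    return x_out, x_in

# Hypothetical toy records:
# (dep_station, dep_day, dep_hour, arr_station, arr_day, arr_hour)
toy_trips = [(0, 0, 8, 1, 0, 8), (0, 0, 8, 2, 0, 9), (1, 1, 18, 0, 1, 18)]
x_out, x_in = count_tensors(toy_trips, n_stations=3, n_days=2)
```

The subscription type could be added as a further tensor dimension in the same way.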
6. Introduction Velib’ in few numbers and pictures
in few numbers
BSS size :
1200 stations
≈ 40000 slots
≈ 16000 bikes
≈ 100 000 trips/day
27% trips = day subscription
73% trips = year subscription
7. Introduction Velib’ in few numbers and pictures
Global behavior
Fig. 1: Histograms of trips lengths and durations (x-axes: distance in km, duration in min; y-axis: number of trips; free use limits marked for day and year subscriptions)
8. Introduction Velib’ in few numbers and pictures
Temporal effects
Fig. 2: Number of trips / hour, Monday to Sunday (short / long subscriptions)
9. Introduction Velib’ in few numbers and pictures
Temporal effects
Fig. 3: Average number of trips per hour of the day, week days / week-end
10. Introduction Velib’ in few numbers and pictures
Spatial effects
Fig. 4: Incoming trips map [6h,7h] for week days
11. Introduction Velib’ in few numbers and pictures
Spatial effects
Fig. 5: Stations activities / distance to "Les Halles" (x-axis: distance from the center in km; y-axis: mean activity / hour)
12. Introduction Tools and approach
Approach, exploratory data analysis
General methodology
Use clustering algorithms to find interesting patterns in the data
Compare the clusters found with the city geography and sociology
⇒ Extract important factors that influence BSS system behavior.
2 developments :
1 Find clusters of stations with similar temporal usage pattern
2 Find latent activities that govern the BSS system dynamics
13. Introduction Tools and approach
Tools, model based clustering
General methodology
Imagine a data generation process
⇒ which includes non-observed or latent variables
Latent variables can be discrete or continuous
Examples of latent variables
Species for flowers
Topics for texts
Communities for graph vertices
...
14. Introduction Tools and approach
Generative approach
Clustering
Model-based clustering :
1 Draw the cluster of sample (i)
2 Depending on the cluster draw the observed values of (i)
Fig. 6: Example of 1D Gaussian mixture model (density f(x) against x)
15. Introduction Tools and approach
Data generation process
Graphical model representation
1. Draw the cluster of sample (i)
Zi ∼ M(1, π)
⇒ π prior proportions of the clusters.
16. Introduction Tools and approach
Data generation process
Graphical model representation
2. Depending on the cluster draw the observed values of (i)
$p(x \mid Z_{ik} = 1) = f(x; \theta_k), \quad \forall k \in \{1, \ldots, K\}.$
⇒ f can be tuned to exploit specificities of the problem.
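The two-step data generation process can be sketched for the 1D Gaussian mixture case. The choice of Gaussian components for f and all parameter values below are assumptions made for illustration only.

```python
import numpy as np

def sample_gaussian_mixture(n, pi, means, stds, rng):
    """Draw n samples from a 1D Gaussian mixture."""
    z = rng.choice(len(pi), size=n, p=pi)   # step 1: draw the cluster of each sample
    x = rng.normal(means[z], stds[z])       # step 2: draw the value given the cluster
    return z, x

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])            # hypothetical cluster proportions
means = np.array([-40.0, 10.0])      # hypothetical component means
stds = np.array([10.0, 8.0])         # hypothetical component standard deviations
z, x = sample_gaussian_mixture(1000, pi, means, stds, rng)
```

Clustering is the reverse problem: only x is observed, and z has to be inferred.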
19. Introduction Tools and approach
Model based clustering framework
Task and tools
Inferring the parameters :
⇒ EM algorithm or Variational EM for complex models
Finding the clustering
⇒ Byproducts of EM
Fixing the number of clusters
⇒ Model selection criterion : BIC, AIC, ICL, perplexity.
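As a rough sketch of how such a criterion fixes the number of clusters, the usual AIC and BIC penalised likelihoods can be compared across candidate values of K. The sign convention (lower is better) and the toy log-likelihoods below are assumptions; they are not results from the talk.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def select_k(candidates, n_samples):
    """candidates maps K to (log_likelihood, n_params); returns the K with lowest BIC."""
    return min(candidates, key=lambda k: bic(*candidates[k], n_samples))

# Hypothetical fitted models for K = 2, 3, 4
candidates = {2: (-1200.0, 5), 3: (-1100.0, 8), 4: (-1095.0, 11)}
best_k = select_k(candidates, n_samples=500)
```

With these toy numbers the small likelihood gain from K=3 to K=4 does not offset the extra parameters, so K=3 is retained.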
20. Stations clustering using temporal usage profiles
Stations clustering using
temporal usage profiles
21. Stations clustering using temporal usage profiles
Stations clustering using temporal usage profiles
Objectives :
Find groups of stations with similar temporal usage profiles
Temporal usage profiles = incoming, outgoing activity / hour
Taking into account the week-days /week-end discrepancy
With a model for counts data
Cross the results with possible explanatory variables :
population, employments, amenities, ...
22. Stations clustering using temporal usage profiles Data representation : count time series
Data representation : count time series
Observed data :
$X^{out}_{sdt}$ : # of bikes taken at station s during day d at hour t
$X^{in}_{sdt}$ : # of bikes returned at station s during day d at hour t
$X_{sd} = (X^{in}_{sd1}, \ldots, X^{in}_{sd24}, X^{out}_{sd1}, \ldots, X^{out}_{sd24})$
⇒ X tensor of size N × D × T .
⇒ temporal behavior / stations.
Variables
Xsd (observed) : # of bikes leaving/arriving
Zs (latent) : cluster of station s
Wd (observed) : cluster of days (week / week-end)
23. Stations clustering using temporal usage profiles Generative model : naive Poisson mixture
Generative model : naive Poisson mixture
Fig. 7: Graphical model representation
Parameters, Θ
αs = stations attractivity effects
π = (π1 , . . . , πK ) cluster proportions
λ = (λklt ) temporal profiles of the clusters
24. Stations clustering using temporal usage profiles Generative model : naive Poisson mixture
Generative model
Naive Poisson mixture
Zs ∼ M(1, π)
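$X_{sd1} \perp\!\!\!\perp \ldots \perp\!\!\!\perp X_{sdT} \mid \{Z_{sk} = 1, W_{dl} = 1\}$
$X_{sdt} \mid \{Z_{sk} = 1, W_{dl} = 1\} \sim \mathcal{P}(\alpha_s \lambda_{klt})$
Constraints
$\sum_{l,t} D_l \lambda_{klt} = DT, \quad \forall k \in \{1, \ldots, K\},$
with $D_l$ the number of days in day cluster l.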
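A minimal simulation of this generative model may help to read the notation. Array shapes, the helper names `normalize_profiles` and `simulate_counts`, and all numeric values are assumptions for illustration.

```python
import numpy as np

def normalize_profiles(lam, w):
    """Rescale lam (K, L, T) so that sum_{l,t} D_l * lam[k, l, t] = D * T for every k."""
    d_l = np.bincount(w, minlength=lam.shape[1])   # D_l: number of days per day cluster
    total = np.einsum("l,klt->k", d_l, lam)
    return lam * (len(w) * lam.shape[2]) / total[:, None, None]

def simulate_counts(alpha, pi, lam, w, rng):
    """alpha: (N,), pi: (K,), lam: (K, L, T), w: (D,) day cluster index of each day."""
    z = rng.choice(len(pi), size=len(alpha), p=pi)   # Z_s ~ M(1, pi)
    rate = alpha[:, None, None] * lam[z][:, w, :]    # alpha_s * lambda_{klt}, shape (N, D, T)
    return z, rng.poisson(rate)                      # independent Poisson counts

rng = np.random.default_rng(0)
w = np.array([0, 0, 0, 0, 0, 1, 1])                  # 5 week days, 2 week-end days
lam = normalize_profiles(rng.uniform(0.1, 2.0, size=(3, 2, 24)), w)
alpha = rng.uniform(1.0, 10.0, size=50)              # hypothetical station scaling factors
pi = np.array([0.5, 0.3, 0.2])
z, x = simulate_counts(alpha, pi, lam, w, rng)
```

The constraint makes the parameters identifiable: the overall volume of a station is carried by its scaling factor, and the cluster profile only describes how this volume is spread over day types and hours.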
25. Stations clustering using temporal usage profiles Generative model : naive Poisson mixture
Parameters estimation, likelihood
Marginal likelihood
$L(\Theta; X) = \sum_s \log \Big( \sum_k \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \Big) \qquad (1)$
Completed likelihood
$L_c(\Theta; X, Z) = \sum_{s,k} Z_{sk} \log \Big( \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \Big) \qquad (2)$
where Z is unknown.
26. Stations clustering using temporal usage profiles Generative model : naive Poisson mixture
EM algorithm
⇒ Straightforward solution for parameter estimation : EM
E step
Conditional expectation of Lc given the current parameters
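$E[L_c(\Theta; x, Z) \mid x, \Theta^{(q)}] = \sum_{s,k} t_{sk} \log \Big( \pi_k \prod_{d,t,l} p(x_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \Big) \qquad (3)$
with $t_{sk}$ the a posteriori probabilities :
$t_{sk} = \frac{\pi_k^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{klt}^{(q)})^{W_{dl}}}{\sum_{k'} \pi_{k'}^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{k'lt}^{(q)})^{W_{dl}}} \qquad (4)$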
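The posterior probabilities of the E step can be computed in log space to avoid underflow in the product over days and hours. The sketch below assumes strictly positive rates, uses array shapes chosen for illustration, and drops the log-factorial term of the Poisson density since it does not depend on the cluster and cancels in the ratio.

```python
import numpy as np

def e_step(x, w, alpha, pi, lam):
    """x: (N, D, T) counts, w: (D,) day cluster index, alpha: (N,), pi: (K,), lam: (K, L, T).
    Returns t of shape (N, K), the posterior cluster probabilities of each station."""
    rate = alpha[:, None, None, None] * lam[None][:, :, w, :]       # (N, K, D, T)
    # Poisson log-density up to the term -log(x!), which is constant in k
    log_p = (x[:, None] * np.log(rate) - rate).sum(axis=(2, 3))    # (N, K)
    log_t = np.log(pi)[None, :] + log_p
    log_t -= log_t.max(axis=1, keepdims=True)                       # stabilise before exp
    t = np.exp(log_t)
    return t / t.sum(axis=1, keepdims=True)

# Hypothetical toy problem: 2 stations, 2 days (one per day cluster), 2 time frames
w = np.array([0, 1])
lam = np.array([[[1.8, 0.2], [1.0, 1.0]],    # cluster 0: active in the first time frame
                [[0.2, 1.8], [1.0, 1.0]]])   # cluster 1: active in the second time frame
alpha = np.array([5.0, 5.0])
pi = np.array([0.5, 0.5])
x = np.array([[[9, 1], [5, 5]],
              [[1, 9], [5, 5]]])
t = e_step(x, w, alpha, pi, lam)
```

In this toy case each station is assigned almost entirely to the cluster whose profile matches its counts.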
27. Stations clustering using temporal usage profiles Generative model : naive Poisson mixture
EM algorithm
⇒ Straightforward solution for parameter estimation : EM
M step
Maximization of the lower bound with respect to the parameters
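$\alpha_s$ : mean station activity, $\hat{\alpha}_s = \frac{1}{DT} \sum_{d,t} X_{sdt}$
$\pi_k$ : proportion of cluster k, $\hat{\pi}_k = \frac{1}{N} \sum_s t_{sk}$
$\lambda_{klt}$ : activity of time frame t for cluster k, during week days or during the week-end (day cluster l)
$\hat{\lambda}_{klt} = \frac{1}{\sum_s t_{sk} \alpha_s \sum_d W_{dl}} \sum_{s,d} t_{sk} W_{dl} X_{sdt} \qquad (5)$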
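The closed-form updates of the M step translate directly into array operations. This is a sketch under assumed array shapes; it also assumes that every station has at least one trip and that no cluster is empty, so that no denominator is zero.

```python
import numpy as np

def m_step(x, w, t, n_day_clusters):
    """x: (N, D, T) counts, w: (D,) day cluster index, t: (N, K) posterior probabilities."""
    n, d, n_t = x.shape
    alpha = x.sum(axis=(1, 2)) / (d * n_t)               # mean station activity
    pi = t.mean(axis=0)                                   # cluster proportions
    w_onehot = np.eye(n_day_clusters)[w]                  # W_dl, shape (D, L)
    num = np.einsum("sk,dl,sdt->klt", t, w_onehot, x)     # sum_{s,d} t_sk W_dl X_sdt
    d_l = w_onehot.sum(axis=0)                            # sum_d W_dl
    den = (t * alpha[:, None]).sum(axis=0)[:, None, None] * d_l[None, :, None]
    return alpha, pi, num / den

# Hypothetical toy problem with hard assignments: station 0 in cluster 0, station 1 in cluster 1
w = np.array([0, 1])
x = np.array([[[9, 1], [5, 5]],
              [[1, 9], [5, 5]]])
t = np.array([[1.0, 0.0], [0.0, 1.0]])
alpha, pi, lam = m_step(x, w, t, n_day_clusters=2)
```

With the scaling factors estimated this way, the updated profiles satisfy the normalisation constraint of the model without any extra projection step.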
28. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Results
Setting
One month of data (September)
Number of clusters (K=8) set manually
⇒ good trade off between interpretability and fit of the clustering
Outputs
Zs : station s clusters
λk : temporal profile of cluster k
αs : stations s attractivity
29. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Railway stations
[Figure: temporal profile of the cluster, departures and arrivals activity per hour, week / week-end panels]
30. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Railway stations
31. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Parks
[Figure: temporal profile of the cluster, departures and arrivals activity per hour, week / week-end panels]
32. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Parks
33. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Spare time, night
[Figure: temporal profile of the cluster, departures and arrivals activity per hour, week / week-end panels]
34. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Spare time, night
35. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Spare time, night and week-end
[Figure: temporal profile of the cluster, departures and arrivals activity per hour, week / week-end panels]
36. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Spare time, night and week-end
37. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Housing
[Figure: temporal profile of the cluster, departures and arrivals activity per hour, week / week-end panels]
42. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Mixed usage
[Figure: temporal profile of the cluster, departures and arrivals activity per hour, week / week-end panels]
43. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Mixed usage
44. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Crossing with population/employments/services rates
Cluster             inhabitants/ha   jobs/ha   services/ha   shops/ha
(unlabeled)              162           237         4.2          3.7
Spare time (1)           367           189         6.3          4.4
Spare time (2)           261           322         7.7          6.9
Parks                    172            90         2            1.7
Railway stations         209           206         2.4          1.8
Housing                  375           108         3.8          2.7
Employment (1)           138           409         4.5          2.8
Employment (2)           157           456         5.7          5.6
Mixed usage              301           163         3.8          2.8
Tab. 1: Mean of each cluster with respect to population, employment, services and shops densities. Sources : "Recensement 2008", "Base permanente des équipements", Insee.
45. Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset
Conclusion on stations clustering
Discussion on the model
Model adapted to counts
Scaling factors for stations important
Stations described by incoming and outgoing flow dynamics
Taking into account week-day week-end differences
Discussion on the results
Clusters are interpretable
Population, employment and amenities densities are highly
explanatory for the clusters
Temporal profiles are also interpretable and informative
46. Latent Dirichlet Allocation (LDA) for trips activity recognition
Latent Dirichlet Allocation
(LDA),
for trips activity recognition
47. Latent Dirichlet Allocation (LDA) for trips activity recognition
Objectives
Decompose the trips into interpretable clusters
⇒ look for stationarities and change points in the OD dynamics
LDA with documents = small bags of successive trips
Analyse the found clusters with respect to their :
Temporal positions, cycles
Spatial distribution of flows
Spatial distribution of incoming / outgoing flows per stations
48. Latent Dirichlet Allocation (LDA) for trips activity recognition Data representation : dynamical O/D matrices
Data representation : dynamical O/D matrices
Observed data :
Xijt : # of bikes that were
1 taken at station i
2 returned at station j
3 at time t
t ∈ {1, . . . , Nt } :
i, j ∈ {1, . . . , Ns } : set of stations
⇒ Xijt tensor of dimension Ns × Ns × Nt .
⇒ taking into account spatial and temporal BSS behavior
49. Latent Dirichlet Allocation (LDA) for trips activity recognition Generative model under LDA
LDA, background
LDA = Latent Dirichlet Allocation
Bayesian mixture for discrete data
⇒ originally to find topics in text corpus
Each document (bag of words) is a mixture of topics
Each topic has its own words probabilities vector
Fig. 8: Graphical model representation of LDA.
50. Latent Dirichlet Allocation (LDA) for trips activity recognition Generative model under LDA
LDA for dynamical O/D matrices analysis
Hypothesis :
Local stationarity of BSS behaviour / OD
Cyclostationarity : week, day
Small bags of successive trips ≈ stationarity of OD
⇒ Documents (bags of words) = bags of successive trips (5000), with :
Words = Origin/Destination couples
Topics = Latent activities
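The construction of the documents can be sketched as a simple chunking of the time-ordered trips. The function name and the toy data are assumptions; the last bag may be shorter than the others, a detail the slides do not discuss.

```python
from collections import Counter

def bags_of_trips(trips, bag_size=5000):
    """trips: list of (origin, destination) couples sorted by departure time.
    Returns one Counter of O/D couples (the "words") per bag (the "document")."""
    bags = []
    for start in range(0, len(trips), bag_size):
        bags.append(Counter(trips[start:start + bag_size]))
    return bags

# Hypothetical toy trips, with a tiny bag size for readability
toy_trips = [(0, 1), (0, 1), (1, 2), (2, 0), (0, 1)]
bags = bags_of_trips(toy_trips, bag_size=2)
```

Because bags contain a fixed number of trips rather than a fixed duration, busy periods produce more documents than quiet ones.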
51. Latent Dirichlet Allocation (LDA) for trips activity recognition Generative model under LDA
LDA, for dynamical O/D matrices analysis
For each activity a, draw an O/D matrix generator :
$\Lambda_a \sim \mathcal{D}(\beta)$
For each "bag of trips" t ∈ {1, . . . , Nt } :
1 Draw the activity proportions : $\pi_t \sim \mathcal{D}(\alpha)$
2 For each trip of bag t :
Draw its activity A : $A \sim \mathcal{M}(1, \pi_t)$
Draw an O/D couple D using the generator of activity A : $D \sim \mathcal{M}(1, \Lambda_A)$
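The generative process can be simulated in a few lines. The sketch assumes symmetric Dirichlet priors with scalar concentrations `alpha` and `beta`, and flattens each O/D matrix into a vector of Ns × Ns couples; both choices are illustrative assumptions.

```python
import numpy as np

def sample_bags(n_bags, bag_size, n_stations, n_activities, alpha, beta, rng):
    """Simulate bags of trips; returns the O/D generators (A, Ns, Ns) and the bags."""
    n_od = n_stations * n_stations
    # One O/D generator per activity, drawn from a symmetric Dirichlet
    lam = rng.dirichlet(np.full(n_od, beta), size=n_activities)
    bags = []
    for _ in range(n_bags):
        pi_t = rng.dirichlet(np.full(n_activities, alpha))       # activity proportions of the bag
        acts = rng.choice(n_activities, size=bag_size, p=pi_t)   # activity of each trip
        od = np.array([rng.choice(n_od, p=lam[a]) for a in acts])
        # Decode the flat index into (origin, destination) columns
        bags.append(np.column_stack(np.divmod(od, n_stations)))
    return lam.reshape(n_activities, n_stations, n_stations), bags

rng = np.random.default_rng(0)
lam, bags = sample_bags(n_bags=4, bag_size=20, n_stations=5,
                        n_activities=3, alpha=0.5, beta=0.5, rng=rng)
```

Inference runs this process backwards: given only the bags, it recovers the generators and the activity proportions of each bag.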
52. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Fixing the number of activities
perplexity analysis
Perplexity = f( likelihood of test data )
Clear drop off at K=5
Fig. 9: Perplexity on the September dataset with respect to the number of latent activities (x-axis: K; y-axis: perplexity).
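The slides do not spell out the function f. A common definition, assumed here, is the exponential of the negative mean log-likelihood per held-out word (here, per trip), so that lower values mean a better fit.

```python
import math

def perplexity(log_likelihood, n_tokens):
    """Perplexity of held-out data: exp(-log-likelihood / number of tokens)."""
    return math.exp(-log_likelihood / n_tokens)

# Sanity check on a hypothetical model that is uniform over 100 O/D couples:
# every one of the 50 test trips has probability 1/100.
uniform_perplexity = perplexity(50 * math.log(1 / 100), 50)
```

Under this definition a uniform model over V couples has perplexity V, which gives a scale for reading the curve.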
53. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Temporal results : πt
Fig. 10: Temporal evolution of πt (y-axis: trips / hour; x-axis: dates from April 11 to April 25)
Remarks :
Cyclostationarity clearly visible (even holidays)
Low mixture between the latent activities
Interpretable temporal clusters : Home ↔ Work, Lunch,...
54. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : Λa as flows
Fig. 11: Latent activity "House→Work commute", flows (blue for f=10/10 000)
55. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : Λa as flows
Fig. 12: Latent activity "Lunch", flows (blue for f=10/10 000)
56. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : Λa as flows
Fig. 13: Latent activity "Work→House commute", flows (blue for f=10/10 000)
57. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : Λa as flows
Fig. 14: Latent activity "Evening", flows (blue for f=10/10 000)
58. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : Λa as flows
Fig. 15: Latent activity "Spare time", flows (blue for f=10/10 000)
59. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Incoming / Outgoing specificities, question :
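Which stations have an increased in/out-degree for a latent activity a ?
Introduce stations incoming specificities $IS_s^a$ and outgoing specificities $OS_s^a$ :
$IS_s^a = \log(\mathrm{pin}_s^a / \mathrm{pin}_s^g), \quad OS_s^a = \log(\mathrm{pout}_s^a / \mathrm{pout}_s^g), \qquad (6)$
with $\mathrm{pin}_s^a$, $\mathrm{pout}_s^a$ the probabilities that a trip ends/starts in station s for activity a :
$\mathrm{pin}_s^a = \sum_j \Lambda^a_{js}, \quad \mathrm{pout}_s^a = \sum_j \Lambda^a_{sj},$
and $\mathrm{pin}_s^g$, $\mathrm{pout}_s^g$ the global probabilities that a trip ends/starts in station s :
$\mathrm{pin}_s^g = \frac{\sum_{j,t} X_{jst}}{\sum_{i,j,t} X_{ijt}}, \quad \mathrm{pout}_s^g = \frac{\sum_{j,t} X_{sjt}}{\sum_{i,j,t} X_{ijt}}.$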
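These log-ratios are marginal sums of the O/D arrays. The sketch below assumes that every station has at least one incoming and one outgoing trip, globally and under the activity, so that all logarithms are finite; names and toy values are illustrative.

```python
import numpy as np

def specificities(lam_a, x):
    """lam_a: (Ns, Ns) O/D probabilities of activity a, rows = origins, columns = destinations.
    x: (Ns, Ns, Nt) observed counts X_ijt. Returns (IS, OS), each of shape (Ns,)."""
    p_in_a = lam_a.sum(axis=0)              # sum_j Lambda^a_{js}: trips ending in s
    p_out_a = lam_a.sum(axis=1)             # sum_j Lambda^a_{sj}: trips starting in s
    total = x.sum()
    p_in_g = x.sum(axis=(0, 2)) / total     # global probability of ending in s
    p_out_g = x.sum(axis=(1, 2)) / total    # global probability of starting in s
    return np.log(p_in_a / p_in_g), np.log(p_out_a / p_out_g)

# Hypothetical toy case: 2 stations, 1 time frame, uniform global flows
lam_a = np.array([[0.1, 0.6],
                  [0.2, 0.1]])
x = np.ones((2, 2, 1))
is_a, os_a = specificities(lam_a, x)
```

A positive value means the station is over-represented as a destination (or origin) for this activity compared with its overall usage.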
60. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : incoming specificities
Fig. 16: Latent activity "House→Work commute", stations incoming specificity
61. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : outgoing specificities
Fig. 17: Latent activity "House→Work commute", stations outgoing specificity
62. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Expected bike balance, question :
Positive/negative bike balance of stations for a latent activity a ?
The O/D matrix D follows a multinomial law with parameters $N_{dep}$ (number of trips) and $\Lambda^a$ :
$D \sim \mathcal{M}(N_{dep}, \Lambda^a),$
The bike balance $B_s$ of a station s is thus given by :
$B_s = \underbrace{\sum_j D_{js}}_{\text{Incoming bikes}} - \underbrace{\sum_j D_{sj}}_{\text{Outgoing bikes}}$
And the expectation of the balance vector B is thus equal to :
$E[B] = N_{dep} \big( (\Lambda^a)^t - \Lambda^a \big) v, \qquad (7)$
with $v = (1, \ldots, 1)^t$.
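The expected balance is a single matrix expression. The toy O/D matrix below is a hypothetical example, not an estimate from the Velib’ data.

```python
import numpy as np

def expected_balance(lam_a, n_dep):
    """E[B] = N_dep * (Lambda^T - Lambda) v, with v a vector of ones.
    lam_a: (Ns, Ns) O/D probabilities, rows = origins, columns = destinations."""
    v = np.ones(lam_a.shape[0])
    return n_dep * (lam_a.T - lam_a) @ v

lam_a = np.array([[0.1, 0.6],
                  [0.2, 0.1]])
balance = expected_balance(lam_a, n_dep=10_000)
```

Here station 0 is expected to lose 4000 bikes and station 1 to gain 4000 over 10 000 trips; the balances always sum to zero since every trip leaves one station and reaches another.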
63. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Spatial results : expected bike balance
Fig. 18: Latent activity "House→Work commute", stations expected balances with Ndep = 10 000 (color scale: balance from -30 to 30)
64. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Lunch", incoming specificity
Fig. 19: Stations incoming specificity
65. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Lunch", outgoing specificity
Fig. 20: Stations outgoing specificity
66. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Lunch", balance
Fig. 21: Stations expected balances with Ndep = 10 000 (color scale: balance from -30 to 30)
67. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Work→House commute", incoming specificity
Fig. 22: Stations incoming specificity
68. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Work→House commute", outgoing specificity
Fig. 23: Stations outgoing specificity
69. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Work→House commute", balance
Fig. 24: Stations expected balances with Ndep = 10 000 (color scale: balance from -30 to 30)
70. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Evening" incoming specificity
Fig. 25: Stations incoming specificity
71. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Evening", outgoing specificity
Fig. 26: Stations outgoing specificity
72. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Evening", balance
Fig. 27: Stations expected balances with Ndep = 10 000 (color scale: balance from -30 to 30)
73. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Spare time", incoming specificity
Fig. 28: Stations incoming specificity
74. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Spare time", outgoing specificity
Fig. 29: Stations outgoing specificity
75. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
"Spare time", balance
Fig. 30: Stations expected balances with Ndep = 10 000 (color scale: balance from -30 to 30)
76. Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset
Conclusion on LDA for activities recognition
Interpretable latent activities
Give good picture of city "pulse" and geography
Better understanding of the system behaviour
Strong evidence of cyclostationarity
Week-day / Week-end pattern
77. Thanks for your attention
@comeetie, etienne.come@ifsttar.fr
Ifsttar
Centre de Marne-la-Vallée
Batiment le “Descartes 2”
2, rue de la Butte Verte F-93166 Noisy le Grand cedex
Email : etienne.come@ifsttar.fr
Tel. +33 (0)1 45 92 56 57
Website : www.ifsttar.fr