In urban mobility, map-matching aims to project GPS points generated by moving objects onto the road segments representing the actual object positions. Up to now, map-matching has found interesting applications in traffic analysis, frequent path extraction, and location prediction. However, state-of-art implementations of map-matching algorithms are either private, sequential or inefficient. In this paper, we propose an extension of an existing serial algorithm of known efficiency by reformulating it in a distributed way, in order to achieve great scalability on real big data scenarios. Furthermore, we enhance the robustness of the algorithm, which is based on a first-order Hidden Markov Model, by introducing a smart strategy to avoid gaps in the matched road segments; indeed, this problem may occur under sparse GPS sampling or in urban areas with highly fragmented road segments. Our implementation is based on Apache Spark and is publicly available on Github. The implementation is tested against a dataset with 7.8 million GPS points in Milan.
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm with a Hidden Markov Model
1. Map-Matching on Big Data:
a Distributed and Efficient Algorithm
with a Hidden Markov Model
Matteo Francia, Enrico Gallinucci, Federico Vitali
{m.francia,enrico.gallinucci} @ unibo.it
{federico.vitali} @ studio.unibo.it
2. Map-Matching: problem definition
Trajectory T = (p0, …, pt): sequence of GPS
points ordered by time [1]
Map-Matching: attach points to the route along
which an object is moving
• Inputs: trajectory dataset 𝒯, road network G(V, S)
• Output: matched trajectories M
Issues
• Millions of GPS points
• … with systematic measurement errors
• … from different road networks
Matteo Francia – University of Bologna miproBIS 2019 2
G(V,S)𝒯
Map-Matching
M
3. Up to now
• Sequential implementations achieve best accuracy
• Naïve: generate all possible routes to identify the most-likely [2]
• HMMs model road network topology to achieve higher accuracy [3, 4]
• Highest accuracy (up to now)
• Distributed implementations
• Scalable approximations of exact algorithms [5]
• Indexing structures facilitating the matching process [6, 7]
• There is room to improve accuracy of distributed implementation
Matteo Francia – University of Bologna miproBIS 2019 3
4. Hidden Markov Model (HMM)
HMM describes probability distribution over sequences
• Find most likely sequence of hidden states producing observed symbols
• Markov chain: state transition depends only on current state
From HMM to Map-Matching of trajectory T
Matteo Francia – University of Bologna miproBIS 2019 4
Sk
Sj
Si
pt pt-1 p0
GPS points
Road
segments
Transition
EmissionHMM Map-Matching
Hidden states (not observable) Road segments (S)
Initial state probability Equally probable segments
Observable symbols GPS points (p0, …, pt)
Emission probability Point/Segment geometrical relationship
Transition probability Segment/Segment topological relationships
5. Our contribution
• Map-Matching on big data
• Based on HMM (inspired by [4])
• Extend transition probability to fragmented networks
• [4] considers only 3 segments
• Scale up to millions of GPS points
• Implemented in Spark & GeoSpark
• Given 𝒯 and G(V,S), three-step approach
(A) Emission probability estimation
(B) Transition probability estimation
(C) Viterbi algorithm
Matteo Francia – University of Bologna miproBIS 2019 5
M
𝒯
(C)
(B)
(A)
Spatial Join
Group By
Trajectory
Join
Map
routes
G(V,S)
6. Map-Matching: Step A
Emission probability estimation
Matteo Francia – University of Bologna miproBIS 2019 6
dH
pt t
dH = Haversine distance
𝒯
(A)
Spatial Join
G(V,S)
G(V,S)
Si
Sj
7. Map-Matching: Step A
Emission probability estimation
Compute the neighborhood N for each point pt in 𝒯
• Join 𝒯 and G(V,S) on element-wise distance
Efficiency: road segments far from pt have a null emission probability
• t bounds the neighborhood range
• a bounds the neighbors cardinality
Matteo Francia – University of Bologna miproBIS 2019 7
𝒯
(A)
Spatial Join
G(V,S)
8. Map-Matching: Step B
Transition probability estimation
where
Matteo Francia – University of Bologna miproBIS 2019 8
dH
dR
pt
pt-1
dH = Haversine distance
dR = shorthest path in G(V,S)
𝒯
(B)
(A)
Spatial Join
Group By
Trajectory
Joinroutes
G(V,S)
G(V,S)Si
Sj
9. Map-Matching: Step B
Transition probability estimation
where
Compute routes (Segment/Segment shortest path)
Group 𝒯 by trajectory T
Join consequent points in T to routes to estimate dR
Efficiency: far road segments have a null transition probability
• J bounds the routes depth
• g bounds the routes length
Matteo Francia – University of Bologna miproBIS 2019 9
𝒯
(B)
(A)
Spatial Join
Group By
Trajectory
Joinroutes
G(V,S)
10. Map-Matching: Step C
Map-matching and aiding
• Viterbi algorithm returns matched routes
• Bag-of-task (for each trajectory)
• Aiding fragmented routes
• Consequent points can map to non-adjacent segments
• Match all segments along the shortest path
Efficiency: tackle Viterbi algorithm complexity O(|S|2|T|)
• |T|: trajectory simplification (out of scope)
• |S|2: far road segments have a null probability
• Tune neighborhood range t and cardinality a in Step A
Matteo Francia – University of Bologna miproBIS 2019 10
M
𝒯
(C)
(B)
(A)
Spatial Join
Group By
Trajectory
Join
Map
routes
G(V,S)
11. Evaluation: dataset
Case study in Milan
• 120K trajectories
• 8M GPS points
• Ground truth: 50 trajectories
Tests
• Scaling up
• Step complexity
miproBIS 2019 Matteo Francia – University of Bologna 11
12. Evaluation: scaling up
#Executors: [1 , 10]
• Execution time from 2h20m to 20m
• Speedup (of 7x) bounded by Viterbi and
parallelization overhead
#Trajectories: [12, 120K]
• Execution time increases linearly with |𝒯|
• For small |𝒯| the Spark’s overhead overcomes
actual workload times
Matteo Francia – University of Bologna miproBIS 2019 12
0
50
100
150
1 2 3 4 5 6 7 8 9 10
Exec.time(m)
routes time A time B time C
0
5
10
15
20
25
12 120 1.200 12.000 120.000
Exec.time(m)
routes time A time B time C
13. Evaluation: step complexity
Matteo Francia – University of Bologna miproBIS 2019 13
(C)
(B)
(A)
Spatial Join
Group By
Trajectory
Join
Map
0
20
40
60
α = 4 α = 6 α = 8 α = 10
Exec.time(m)
C
B
A
0
10
20
30
40
τ = 100 τ = 150 τ = 200
Exec.time(m)
C
B
A
0
5
10
15
20
25
ϑ = 4 ϑ = 6 ϑ = 8 ϑ = 10Exec.time(m)
C
B
A
routes
t affects A (higher shuffle)
J affects routes (more joins)
a affects C (bigger Viterbi’s search space )
0
5
10
15
20
25
ϒ = 300 ϒ = 500 ϒ = 800
Exec.time(m)
C
B
A
g affects B (higher shuffle)
routes
14. Conclusion
Contribution:
• Distributed Map-Matching on Spark
• Accuracy equivalent to sequential HMM [4]
• Overcome limitations for highly fragmented urban networks
• Map aiding embedded in the Viterbi computation
Future directions:
• Matching should adapt to high variance in:
• Means of transportation (e.g., car, walk, bicycle)
• Sampling rates of GPS points (e.g., from seconds to minutes)
• And scale up to huge road networks
• From Milan to the entire region Lombardia
Matteo Francia – University of Bologna miproBIS 2019 14
https://github.com/big-unibo/map-matching
15. References
[1] Xiaofang Zhou and Lei Li. Spatiotemporal data: Trajectories. In Encyclopedia of Big Data Technologies. Springer, 2019.
[2] Yin Lou, Chengyang Zhang, Yu Zheng, Xing Xie, Wei Wang, and Yan Huang. Map-matching for low-sampling-rate GPS
trajectories. In 17th ACM SIGSPATIAL Int. Symp. on Advances in Geographic Inform. Syst., ACM-GIS 2009, Nov. 4-6, 2009,
Seattle, Washington, USA, Proc., pages 352–361, 2009.
[3] Paul Newson and John Krumm. Hidden markov Map-Matching through noise and sparseness. In 17th ACM SIGSPATIAL Int.
Symp. on Advances in Geographic Inform. Syst., ACM-GIS 2009, Nov. 4-6, 2009, Seattle, Washington, USA, Proc., pages 336–
343, 2009.
[4] An Luo, Shenghua Chen, and Bin Xv. Enhanced map-matching algorithm with a hidden markov model for mobile phone
positioning. ISPRS Int. J. Geo-Inform., 6(11):327, 2017.
[5] Antonio M. R. Almeida, Maria I. V. Lima, José Antonio Fernandes de Macedo, and Javam C. Machado. DMM: A distributed
map-matching algorithm using the mapreduce paradigm. In 19th IEEE Int. Conf. on Intelligent Transportation Syst., ITSC 2016.
[6] Ayman Zeidan, Eemil Lagerspetz, Kai Zhao, Petteri Nurmi, Sasu Tarkoma, and Huy T. Vo. Geomatch: Efficient large-scale
Map-Matching on apache spark. In IEEE Int. Conf. on Big Data, Big Data 2018, Seattle, WA, USA, Dec. 10-13, 2018, pages 384–
391, 2018.
[7] Douglas Alves Peixoto, Hung Quoc Viet Nguyen, Bolong Zheng, and Xiaofang Zhou. A framework for parallel map-matching
at scale using spark. Distributed and Parallel Databases, pages 1– 24, 2018.
Matteo Francia – University of Bologna miproBIS 2019 15