Presentation of PhD thesis on Location Data Fusion

Information fusion for location data analysis
Candidate: Alket Cecaj
Supervisor: Prof. Marco Mamei
Doctorate School in Industrial Innovation Engineering

Thesis outline
• Introduction to Data Fusion Methods
• Location Data and Application Scenarios
• Data Fusion for Event Detection and Event Description
• Re-identification of Anonymized CDR Records Using Information Fusion
• Privacy issues
• Conclusions

Location data and application scenarios
Data
• Location data such as CDR (Call Detail Records)
• Geo-tagged social network data or data from location-based services (LBS)
• Open data with a location dimension, such as census data
Applications
• Socio-economic development (D4D)
• Smart mobility applications, land use and city management
• Ground truth information for validation analysis

Introduction to data fusion methods
• Stage-based methods
• Feature-level-based methods
• Semantic meaning-based methods

Location data fusion : side effect
• Data fusion enables a huge number of applications
• Privacy risks for individual data

Data fusion for event detection / description by using aggregated CDR data and geo-tagged social network data
Detecting and describing events happening in urban areas by analysing spatio-temporal data
Reference to the article

The dataset: spatio-temporal aggregation
Spatial Aggregation
Temporal aggregation
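The slide names the two aggregation steps without detailing them. As a minimal sketch, assuming raw events are (lat, lon, timestamp) tuples, a square grid and hourly bins (the cell size and the function name `aggregate` are hypothetical, not taken from the thesis), the aggregation could look like this:

```python
from collections import Counter
from datetime import datetime
from math import floor

CELL_DEG = 0.01  # assumed grid cell size in degrees (illustrative only)

def aggregate(events, cell_deg=CELL_DEG):
    """Count events per (grid cell, hour) pair."""
    counts = Counter()
    for lat, lon, ts in events:
        # Spatial aggregation: map coordinates to a grid cell index.
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        # Temporal aggregation: truncate the timestamp to the hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(cell, hour)] += 1
    return counts

# Made-up sample events, for illustration only.
events = [
    (45.4651, 9.1862, datetime(2013, 11, 1, 20, 5)),
    (45.4658, 9.1869, datetime(2013, 11, 1, 20, 40)),
    (45.4651, 9.1862, datetime(2013, 11, 1, 21, 10)),
]
counts = aggregate(events)
```

The resulting per-cell, per-hour counts are the kind of time series on which outlier detection can then be run.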

Outlier detection method
Median method :
[LB,UB] = [Q50 – k*Q50, Q50 + k*Q50]
IQR method :
[LB,UB] = [Q25 – k*IQR, Q75 + k*IQR]
Q75 method :
[LB,UB] = [Q25 – k*Q25, Q25 + k*Q75]
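A minimal sketch of the median and IQR bounds as written on the slide, assuming the input is a list of activity counts for one area and that quartiles are computed with linear interpolation. The function names, the sample counts and the values of k are illustrative assumptions. The Q75 variant is left out of the sketch.

```python
from statistics import median, quantiles

def median_bounds(values, k):
    """[LB, UB] = [Q50 - k*Q50, Q50 + k*Q50]"""
    q50 = median(values)
    return q50 - k * q50, q50 + k * q50

def iqr_bounds(values, k):
    """[LB, UB] = [Q25 - k*IQR, Q75 + k*IQR]"""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr

def outliers(values, bounds):
    """Indices of values falling outside [LB, UB]."""
    lb, ub = bounds
    return [i for i, v in enumerate(values) if v < lb or v > ub]

# Made-up hourly counts for one cell; the spike stands for a candidate event.
calls = [100, 110, 90, 105, 95, 400, 100, 98]
flagged_median = outliers(calls, median_bounds(calls, k=0.5))
flagged_iqr = outliers(calls, iqr_bounds(calls, k=1.5))
```

In both cases only the spike at index 5 falls outside the bounds. Raising k widens the interval, which trades recall for precision.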

Ground truth dataset
Events happening in the period of time the data covers:
• Football matches
• Fairs
• Protests
• Other events

Measuring precision and recall of the system
True positives (tp)
False positives (fp)
False negatives (fn)
Precision = tp / (tp + fp)
Recall = tp / (tp + fn)
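The two formulas can be sketched as follows, assuming detected events and ground truth events are both represented as sets of comparable keys. The (area, day) keys below are made up for illustration, and how a detection is matched to a real event is an assumption of this sketch.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for two sets of event keys."""
    tp = len(detected & ground_truth)   # detected events that really happened
    fp = len(detected - ground_truth)   # detections with no matching real event
    fn = len(ground_truth - detected)   # real events the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical (area, day) keys, for illustration only.
detected = {("stadium", "d1"), ("fair", "d2"), ("centre", "d3"), ("park", "d4")}
ground_truth = {("stadium", "d1"), ("fair", "d2"), ("centre", "d3"),
                ("stadium", "d5"), ("square", "d6")}

precision, recall = precision_recall(detected, ground_truth)
```

Here tp = 3, fp = 1 and fn = 2, so precision is 0.75 and recall is 0.6.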

Precision – Recall of event detection system

Precision – Recall Milano vs Trentino SMS-Call

Improving event detection results by data fusion
By combining the results from the two datasets:
• The precision-recall performance of the method improves.
• The improvement is limited in the long run by the main dataset.
• The same improvement can also be observed by joining the results of the other datasets.

Data fusion for event description
By using the CDR data the events can be detected but not described:
• By joining the results, the datasets can complement and enrich each other.
• In this case the social dataset can be used to describe the events semantically.

Comparing the results with other works on event detection
• Two other similar works
• Using much more sophisticated algorithms
• Comparable results

Re-identification of CDR data by using social
network geo-tagged data
• Fine grained social and CDR user data
• Mobility paths
• Uniqueness of mobility prints
• Matching of users' mobility paths
• Re-identification probability evaluation
• The ground truth problem

Location data : CDR and social
CDR data
1. Massive dataset about millions of users
2. Released in an anonymized format
3. Regularly sampled
4. Tower granularity (400 m to several km)
Geo-tagged social data
1. Sparse data following an exponential distribution (many users, few events per user)
2. Not anonymized
3. Irregular sampling
4. Precise (GPS or triangulated location)

Re-identification of CDR data by using social network geo-tagged data
• Anonymization and re-identification
- Movie ratings from the Netflix Prize dataset
- Medical records of Massachusetts Hospital using a voters list
- Re-identification of anonymous volunteers in a DNA study for the Personal Genome Project
• In line with our domain
- Unique in the Crowd: the privacy bounds of Human Mobility
- Markov chain models for de-anonymization of geo-located data

Mobility measures : radius of gyration
Knowledge extraction : radius of gyration
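Radius of gyration is commonly defined as the root-mean-square distance of a user's recorded locations from their centre of mass. A minimal sketch, assuming points are (lat, lon) pairs in degrees and approximating the centre of mass by averaging the coordinates (reasonable at city scale). The function names are illustrative and not taken from the thesis.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def radius_of_gyration_km(points):
    """Root-mean-square distance of the points from their centre of mass."""
    n = len(points)
    # Assumption: averaging lat/lon is an acceptable centre for small areas.
    centre = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return sqrt(sum(haversine_km(p, centre) ** 2 for p in points) / n)

# A user seen only at one place has radius 0; two places give half their distance.
stay_at_home = radius_of_gyration_km([(45.0, 9.0)] * 3)
commuter = radius_of_gyration_km([(45.0, 9.0), (45.0, 9.2)])
```

A small radius indicates a user whose recorded activity stays close to one place, while a large radius indicates a wide-ranging user.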

Radius of gyration : Social Network Data

Mobility measures and uniqueness of users' mobility
Knowledge extraction : uniqueness of traces
Sample of 1000 users from each CDR dataset

Knowledge extraction : uniqueness of traces statistics

Knowledge extraction : matching users across the CDR and social datasets

Data fusion : matching algorithm
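The slides do not reproduce the algorithm itself. As a hedged sketch of the general idea only, one can count how many of a social user's events have a CDR event nearby in space and time, then rank CDR users by that count. The thresholds (1 km, 30 minutes), the event layout (lat, lon, timestamp) and all function names are assumptions of this example.

```python
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def common_points(cdr_events, social_events,
                  max_km=1.0, max_dt=timedelta(minutes=30)):
    """Count social events with at least one CDR event close in space and time."""
    return sum(
        any(haversine_km(s, c) <= max_km and abs(s[2] - c[2]) <= max_dt
            for c in cdr_events)
        for s in social_events
    )

def rank_cdr_users(cdr_users, social_events):
    """Score every CDR user against one social user; return (best id, scores)."""
    scores = {uid: common_points(ev, social_events)
              for uid, ev in cdr_users.items()}
    return max(scores, key=scores.get), scores

# Made-up traces, for illustration only.
t0 = datetime(2013, 11, 1, 9, 0)
cdr_users = {
    "c1": [(45.4642, 9.1900, t0), (45.4780, 9.2250, t0 + timedelta(hours=3))],
    "c2": [(45.5000, 9.3000, t0)],
}
social = [(45.4650, 9.1905, t0 + timedelta(minutes=10)),
          (45.4785, 9.2255, t0 + timedelta(hours=3, minutes=5))]
best, scores = rank_cdr_users(cdr_users, social)
```

With these traces, "c1" shares two points with the social user and "c2" none. A high count alone is not proof of identity, since matches can also occur by chance.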

Knowledge extraction : matching statistics

Data fusion : considerations
• Matching by chance : Bonferroni principle
• False social-user events were created:
a) in a random way
b) by cloning events (+1 km, +30 min)
• As a result, the number of matches drops by 60% in the first case and by 40% in the second case

Data fusion : ground truth validation
As the real identity of CDR users is missing, validating these results is difficult.
The Flickr user is the same as the Twitter user (overlapping mobility traces and similar usernames) and matches only one CDR user.
The MCC field of the CDR record matches the language used in picture descriptions and tweet content.

Re-identification : probabilistic approach
Given that CDR user Ci has Ni events (points) in common with social user FTi, how likely is it that the two users are the same?
• A question that is both novel (no other works address it in this domain) and fundamental
• Conditional probability
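The question is a conditional probability. As a hedged sketch only (the slide does not spell out the model), Bayes' rule gives one standard way to write it. Here S_i is an assumed symbol for the event "Ci and FTi are the same person" and N_i is the observed number of common points.

```latex
\[
P(S_i \mid N_i) =
  \frac{P(N_i \mid S_i)\, P(S_i)}
       {P(N_i \mid S_i)\, P(S_i) + P(N_i \mid \neg S_i)\, P(\neg S_i)}
\]
```

The term P(N_i | not S_i) is the probability of seeing that many common points by chance, which is why matching by chance has to be estimated before any re-identification claim is made.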

Re-identification : probabilistic modeling

Privacy risks for personal data
The revelatory power of location data
• Location of a person's home. What kind of city area do they live in?
• Locations of the stores a person frequents. From this information shopping patterns, preferences and in some cases religious beliefs can be inferred.
• Other very sensitive data, such as health information, can be deduced from the locations of the doctors and hospitals the person visits.
• By linking two or more locations in time and space, mobility paths may be inferred.

Privacy risks : privacy preserving techniques
• Data anonymization
a) k-anonymity in different improved versions
b) Re-identification of location data is still possible, as already shown
• Data suppression
a) Suppression and aggregation
b) Utility of the dataset is dramatically reduced after suppression
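A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. A minimal sketch of the check and of suppression, assuming records are dictionaries and that (cell, hour) acts as the quasi-identifier. The field names and sample records are hypothetical.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination covers at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())

def suppress(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier group is smaller than k."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] >= k]

# Made-up records, for illustration only.
records = [
    {"user": "a", "cell": "A1", "hour": 9},
    {"user": "b", "cell": "A1", "hour": 9},
    {"user": "c", "cell": "B7", "hour": 9},
]
qi = ("cell", "hour")
kept = suppress(records, qi, k=2)
```

The original records are not 2-anonymous because user "c" is alone in cell B7. Suppression fixes that by dropping the record, which illustrates the loss of utility.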

Challenges
• One of the main challenges is the lack of common engineering standards for data
fusion systems. It has been one of the main impediments to integration and data
fusion.
• As different methods of data fusion behave differently in different applications, it
is not trivial to choose the best method for a specific task.
• Challenges during the data fusion design phase: at which level of abstraction,
reduction and simplification should the data be fused?
• The lack of a unified framework that could orient the process of data fusion
towards a “structured data fusion” vision.

Conclusions and future work
• Information fusion as an enabling process for novel applications
- Future work oriented towards the “structured data fusion” idea
• Privacy
- Assessment of variations of existing privacy preserving techniques (differential privacy, D.P.)

Publications
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli: “Collective Awareness for Human ICT Collaboration in Smart Cities”. IEEE WETICE International conference on state-of-the-art research in enabling technologies for collaboration, 17-20 2013.
• Alket Cecaj, Marco Mamei, Nicola Bicocchi: “Re-identification of Anonymized CDR datasets Using Social Network Data”. IEEE Percom International conference on Pervasive Computing and Communications. Budapest, Hungary 24-28, 2014.
• Alket Cecaj, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–15.
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli (2014): “Social Collective Awareness in Socio-Technical Urban Superorganisms”. Social Collective Intelligence: Combining the Powers of Humans and Machines to Build a Smarter Society, Part III, Applications and Case Studies, page 227.
• Alket Cecaj, Marco Mamei, Franco Zambonelli (2015): “Re-identification and Information Fusion Between Anonymized CDR and Social Network Data”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.
