Information fusion for location
data analysis
Candidate: Alket Cecaj Supervisor: Prof. Marco Mamei
Doctorate School in Industrial Innovation Engineering
Thesis outline
• Introduction to Data Fusion Methods
• Location Data and Application Scenarios
• Data Fusion for Event Detection and Event Description
• Re-identification of Anonymized CDR Records Using Information Fusion
• Privacy issues
• Conclusions
Location data and application scenarios
Data
• Location data such as CDR (Call
Detail Records)
• Geo-tagged social network data or
data from LBS
• Open data with a location
dimension such as census data
Applications
• Socio-economic development
(D4D)
• Smart mobility applications, land use
and city management
• Ground truth information for
validation analysis
Introduction to data fusion
Introduction to data fusion methods
• Stage based methods.
• Feature level-based.
• Semantic meaning-based data fusion methods
Location data fusion : side effect
• Data fusion enables a huge number of applications
• Privacy risks for individual data
Data fusion for event detection / description by
using aggregated CDR data and geo-tagged social
network data
Detecting and describing events happening in urban
areas by analysing spatio-temporal data
Reference to the article
The dataset
The dataset: spatio-temporal aggregation
Spatial Aggregation
Temporal aggregation
Statistical modelling
Outlier detection
method
Median method :
[LB,UB] = [Q50 – k*Q50, Q50 + k*Q50]
IQR method :
[LB,UB] = [Q25 – k*IQR, Q75 + k*IQR]
Q75 method :
[LB,UB] = [Q25 – k*Q25, Q25 + k*Q75]
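The median and IQR bounds above can be sketched in a few lines. This is only an illustration, not the thesis code: the function names, the default value of k and the rule "a value outside [LB, UB] is an outlier" are assumptions, and the third (Q75) variant is left out.

```python
import statistics

def outlier_bounds(values, k=1.5, method="iqr"):
    """Return (LB, UB) for the activity values of one spatio-temporal cell.

    `k` and the method names are illustrative assumptions.
    """
    # Quartiles with linear interpolation between data points.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    if method == "median":
        return q50 - k * q50, q50 + k * q50
    if method == "iqr":
        iqr = q75 - q25
        return q25 - k * iqr, q75 + k * iqr
    raise ValueError(f"unknown method: {method}")

def is_outlier(value, bounds):
    """Assumed decision rule: a value outside [LB, UB] is a candidate event."""
    lb, ub = bounds
    return value < lb or value > ub

history = [1, 2, 3, 4, 5]            # made-up activity counts for one cell
bounds = outlier_bounds(history, k=1.5, method="iqr")
print(bounds, is_outlier(9, bounds))
```

A larger k widens the interval, so fewer cells are flagged. That generally trades recall for precision.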
Ground-truth
dataset
• Football matches
• Fairs
• Protests
• Other events
Events happening in the period
the data covers
Measuring precision and
recall of the system
True positives (tp)
False positives (fp)
False negatives (fn)
Precision = tp / (tp + fp)
Recall = tp / (tp + fn)
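A minimal sketch of how these two measures could be computed against a ground-truth list. It assumes that each event is identified by a hashable key such as a (place, day) pair; that key format and the function name are hypothetical.

```python
def precision_recall(detected, ground_truth):
    """Compute precision and recall from two collections of event keys."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)   # detected and really happened
    fp = len(detected - ground_truth)   # detected but not in the ground truth
    fn = len(ground_truth - detected)   # happened but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up example keys: (place, day index)
detected = {("stadium", 1), ("fair", 2), ("square", 3)}
truth = {("stadium", 1), ("fair", 2), ("station", 4), ("park", 5)}
print(precision_recall(detected, truth))
```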
Precision – Recall of event detection system
Precision – Recall Milano vs Trentino SMS-Call
By combining the results from
the two datasets
• Improvement of precision – recall
performance of the method
• The improvement is limited in the
long run by the main dataset.
• The same improvement can be
observed also by joining the
results of the other datasets.
Improving event detection results by data fusion
Using CDR data alone, events
can be detected but not
described:
• By joining the results, the datasets
can complement and enrich
each other.
• In this case the social dataset
can be used to describe
the events semantically
Data fusion for Event description
Comparing the results with other work on event
detection
• Two other similar works
• Using much more sophisticated algorithms
• Comparable results
Re-identification of CDR data by using social
network geo-tagged data
• Fine grained social and CDR user data
• Mobility paths
• Uniqueness of mobility prints
• Matching of users’ mobility paths
• Re-identification probability evaluation
• The groundtruth problem.
Location data : CDR and social
CDR data
1. Massive dataset about millions of
users
2. Released in an anonymized format
3. Regularly sampled
4. Tower granularity (400 m – several km)
Geo-tagged social data
1. Sparse data following an exponential distribution (many
users, few events per user)
2. Not anonymized
3. Irregular sampling
4. Precise (GPS or triangulated location)
Re-identification of CDR data by using social
network geo-tagged data
• Anonymization and re-identification
• Movie ratings from NetFlix Prize dataset
• Medical records of Massachusetts Hospital using a voters list
• Re-identification of anonymous volunteers in a DNA study for Personal Genome
Project
• In line with our domain
• Unique in the Crowd: the privacy bounds of Human Mobility
• Markov chain models for de-anonymization of geo-located data
Data fusion process
Mobility measures : radius of gyration
Knowledge extraction : radius of gyration
Radius of gyration : CDR
Radius of gyration : Social Network Data
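As a reminder of the measure itself, the radius of gyration of a user is commonly defined as the root mean square distance of the user's recorded positions from their centre of mass. The sketch below follows that common definition. It is not taken from the thesis, and the function names and the use of a simple mean of latitudes and longitudes as centre of mass (reasonable at city or regional scale) are assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def radius_of_gyration_km(points):
    """points: non-empty list of (lat, lon) positions of one user."""
    n = len(points)
    # Centre of mass approximated by the mean latitude and longitude.
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n
    sq = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2 for lat, lon in points)
    return math.sqrt(sq / n)

# Made-up trace: two positions about 16 km apart on the same parallel
print(radius_of_gyration_km([(45.0, 9.0), (45.0, 9.2)]))
```

With CDR data the positions would be tower coordinates, while with geo-tagged social data they would be the GPS coordinates of posts. The two datasets are therefore measured at different spatial granularity.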
Mobility measures and uniqueness of users mobility
Knowledge extraction : uniqueness of traces
Sample of 1000 users from each CDR dataset
Knowledge extraction : uniqueness of traces statistics
Knowledge extraction : matching users from different datasets CDR and
social dataset
Data fusion : matching algorithm
Knowledge extraction : matching statistics
• Matching by chance : Bonferroni principle
• False social users’ events created :
a) in a random way
b) by cloning events (+1km, +30min)
• As a result, the number of matches drops by 60% in the first
case and by 40% in the second case
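To make the notion of a match concrete, the sketch below counts how many events of a social user fall close in space and time to at least one event of a CDR user. It is a hedged illustration rather than the matching algorithm of the thesis: the event format (lat, lon, timestamp), the function names and the 1 km / 30 min tolerances are all assumptions chosen for the example.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def events_in_common(cdr_events, social_events,
                     max_km=1.0, max_dt=timedelta(minutes=30)):
    """Count social events with at least one CDR event within both tolerances.

    Each event is a (lat, lon, datetime) tuple. The tolerances are assumed values.
    """
    count = 0
    for s_lat, s_lon, s_t in social_events:
        if any(abs(s_t - c_t) <= max_dt
               and haversine_km(s_lat, s_lon, c_lat, c_lon) <= max_km
               for c_lat, c_lon, c_t in cdr_events):
            count += 1
    return count

cdr = [(45.4642, 9.1900, datetime(2013, 11, 1, 10, 0))]
social = [
    (45.4650, 9.1910, datetime(2013, 11, 1, 10, 10)),  # close in space and time
    (45.4642, 9.1900, datetime(2013, 11, 1, 12, 0)),   # same place, two hours later
    (46.0700, 11.1200, datetime(2013, 11, 1, 10, 5)),  # same time, far away
]
print(events_in_common(cdr, social))
```

Running the same count against randomly generated or cloned social events gives a baseline for how many matches arise by chance.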
Data fusion : considerations
Because the real identities of CDR users are unavailable, validating these
results is difficult.
A Flickr user is matched to a Twitter user (overlapping mobility traces and
similar usernames) and to one (and only one) CDR user.
The MCC field of the CDR record matches the language used in the
picture descriptions and in the tweets.
Data fusion : groundtruth validation
Data fusion : considerations
Reidentifying CDR users : probabilistic approach
Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two
users are the same?
• A question that is both novel (no other work addresses it in this
domain) and fundamental
• Conditional probability
Re-identification : probabilistic approach
Re-identification : probabilistic modeling
Privacy risks for personal data
The revelatory power of location data
• Location of a person’s home. What kind of city area do they live in?
• Locations of the stores a person frequents. From this information,
shopping patterns, preferences and in some cases religious beliefs can be inferred.
• Other very sensitive data, such as health information, can be
deduced from the locations of the doctors and hospitals the person visits
• By linking two or more locations in time and space, mobility
paths may be inferred.
Privacy risks : privacy preserving techniques
• Data Anonymization
a) K-anonymity in different improved versions
b) Re-identification of location data remains possible, as already shown
• Data Suppression
a) Suppression and aggregation
b) Utility of the dataset is dramatically reduced after suppression
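The trade-off between anonymity and utility can be illustrated with a basic k-anonymity check followed by suppression. This is a simplified sketch: each record is reduced to a tuple of quasi-identifiers, here a hypothetical (cell, hour) pair, and the function names are not taken from any library.

```python
from collections import Counter

def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(records)
    return all(c >= k for c in counts.values())

def suppress_rare(records, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""
    counts = Counter(records)
    return [r for r in records if counts[r] >= k]

# Made-up records: (cell id, hour of day)
records = [("c1", 9), ("c1", 9), ("c2", 9), ("c1", 9)]
print(is_k_anonymous(records, 2))   # the single ("c2", 9) record violates k = 2
kept = suppress_rare(records, 2)
print(len(records) - len(kept), "record(s) suppressed")
```

Every suppressed record is information lost to later analysis, which is the utility cost noted in the list above.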
Challenges
• One of the main challenges is the lack of common engineering standards for data
fusion systems. It has been one of the main impediments to integration and data
fusion.
• As different methods of data fusion behave differently in different applications, it
is not trivial to choose the best method for a specific task.
• Challenges during the data fusion design phase: at which level of abstraction,
reduction and simplification should the data be fused?
• The lack of a unified framework that could orient the process of data fusion
towards a “structured data fusion” vision.
Conclusions and future work
• Information fusion as an enabling process for novel applications
- Future work oriented towards the “structured data fusion” idea
• Privacy
- Assessment of variations of existing privacy preserving techniques (D.P.)
Publications
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli: “Collective Awareness
for Human ICT Collaboration in Smart Cities”. IEEE WETICE International conference on state-of-the-art research in
enabling technologies for collaboration 17-20 2013.
• Alket Cecaj, Marco Mamei, Nicola Bicocchi: “Re-identification of Anonymized CDR datasets Using Social Network
Data”. IEEE Percom International conference on Pervasive Computing and Communications. Budapest, Hungary 24-28, 2014.
• Cecaj Alket, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of Ambient Intelligence and
Humanized Computing, pp. 1–15.
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli (2014): “Social
Collective Awareness in Socio-Technical Urban Superorganisms”. Social Collective Intelligence: Combining the Powers
of Humans and Machines to Build a Smarter Society, Part III, Applications and Case Studies, page 227.
• Cecaj, Alket, Marco Mamei, and Franco Zambonelli (2015): “Re-identification and Information Fusion Between
Anonymized CDR and Social Network Data”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.

Hostel management system project report..pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 

Presentation of PhD thesis on Location Data Fusion

  • 1. Information fusion for location data analysis Candidate: Alket Cecaj Supervisor: Prof. Marco Mamei Doctorate School in Industrial Innovation Engineering
  • 2. Thesis outline • Introduction to Data Fusion Methods • Location Data and Application Scenarios • Data Fusion for Event Detection and Event Description • Re-identification of Anonymized CDR Records Using Information Fusion • Privacy issues • Conclusions
  • 3. Location data and application scenarios Data • Location data such as CDR (Call Detail Records) • Geo-tagged social network data or data from LBS • Open data with a location dimension such as census data Applications • Socio-economic development (D4D) • Smart mobility applications, land use and city management • Ground truth information for validation analysis
  • 5. Introduction to data fusion methods • Stage based methods. • Feature level-based. • Semantic meaning-based data fusion methods
  • 6. Location data fusion : side effect • Data fusion enables a huge number of applications • Privacy risks for individual data
  • 7. Data fusion for event detection / description by using aggregated CDR data and geo-tagged social network data Detecting and describing events happening in urban areas by analysing spatio-temporal data Reference to the article
  • 8.
  • 10. The dataset: spatio-temporal aggregation Spatial Aggregation Temporal aggregation
  • 12. Outlier detection method Median method : [LB,UB] = [Q50 – k*Q50, Q50 + k*Q50] IQR method : [LB,UB] = [Q25 – k*IQR, Q75 + k*IQR] Q75 method : [LB,UB] = [Q25 – k*Q25, Q25 + k*Q75]
  • 13. Ground truth dataset • Football matches • Fairs • Protests • Other events Events happening in the period of time the data covers
  • 14. Measuring precision and recall of the system True positives (tp) False positives (fp) False negatives (fn) Precision = tp / (tp + fp) Recall = tp / (tp + fn)
  • 15. Precision – Recall of event detection system
  • 16. Precision – Recall Milano vs Trentino SMS-Call
  • 17. Precision – Recall Milano vs Trentino SMS-Call
  • 18. Precision – Recall Milano vs Trentino SMS-Call
  • 19. Improving event detection results by data fusion By combining the results from the two datasets: • The precision-recall performance of the method improves. • The improvement is limited in the long run by the main dataset. • The same improvement can also be observed by joining the results of the other datasets.
  • 20. Data fusion for event description Using CDR data alone, events can be detected but not described: • By joining the results, the datasets can complement and enrich each other. • In this case the social dataset can be used to describe the events semantically.
  • 21. Comparing the results with other works on event detection • Two other similar works • Using much more sophisticated algorithms • Comparable results
  • 22. Re-identification of CDR data by using social network geo-tagged data • Fine-grained social and CDR user data • Mobility paths • Uniqueness of mobility prints • Matching of users' mobility paths • Re-identification probability evaluation • The ground truth problem
  • 23. Location data : CDR and social CDR data 1. Massive dataset about millions of users 2. Released in an anonymized format 3. Regularly sampled 4. Tower granularity (400 m to several km) Geo-tagged social data 1. Sparse data following an exponential distribution (many users, few events per user) 2. Not anonymized 3. Irregular sampling 4. Precise (GPS or triangulated location)
  • 24. Re-identification of CDR data by using social network geo-tagged data • Anonymization and re-identification • Movie ratings from the Netflix Prize dataset • Medical records of Massachusetts Hospital using a voters list • Re-identification of anonymous volunteers in a DNA study for the Personal Genome Project • In line with our domain • Unique in the Crowd: the privacy bounds of Human Mobility • Markov chain models for de-anonymization of geo-located data
  • 26. Mobility measures : radius of gyration Knowledge extraction : radius of gyration
  • 28. Radius of gyration : Social Network Data
  • 29. Mobility measures and uniqueness of users' mobility Knowledge extraction : uniqueness of traces
  • 30. Mobility measures and uniqueness of users' mobility Sample of 1000 users from each CDR dataset Knowledge extraction : uniqueness of traces
  • 31. Knowledge extraction : uniqueness of traces statistics
  • 32. Knowledge extraction : matching users from different datasets CDR and social dataset
  • 33. Data fusion : matching algorithm
  • 34. Knowledge extraction : matching statistics
  • 35. Data fusion : considerations • Matching by chance: Bonferroni principle • False social users' events created: a) in a random way, b) by cloning events (+1 km, +30 min) • As a result, the number of matches drops by 60% in the first case and by 40% in the second case
  • 36. Data fusion : ground truth validation As the real identity of CDR users is missing, validating these results is difficult. A Flickr user is a Twitter user (overlapping mobility traces and similar usernames) and (the only) CDR user. The MCC field of the CDR record matches the language used in the picture descriptions and tweet content.
  • 37. Data fusion : considerations
  • 38. Reidentifying CDR users : probabilistic approach Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two users are the same?
  • 39. Re-identification : probabilistic approach Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two users are the same? • A question which is both novel (no other works address it in this domain) and fundamental • Conditional probability
  • 41. Privacy risks for personal data The revelatory power of location data • Location of a person's home: what kind of city area do they live in? • Locations of the stores a person frequents: from this information shopping patterns, preferences and in some cases religious beliefs can be inferred. • Other very sensitive data, such as health records, can be deduced from the locations of the doctors and hospitals the person visits. • By linking two or more locations in time and space, mobility paths may be inferred.
  • 42. Privacy risks : privacy preserving techniques • Data anonymization a) K-anonymity in different improved versions b) Possible re-identification of location data, as already shown • Data suppression a) Suppression and aggregation b) Utility of the dataset dramatically reduced after suppression
  • 43. Challenges • One of the main challenges is the lack of common engineering standards for data fusion systems, which has been one of the main impediments to integration and data fusion. • As different data fusion methods behave differently in different applications, choosing the best method for a specific task is not trivial. • Challenges during the data fusion design phase: at which level of abstraction, reduction and simplification should the data be fused? • The lack of a unified framework that could orient the process of data fusion towards a “structured data fusion” vision.
  • 44. Conclusions and future work • Information fusion as an enabling process for novel applications - Future work oriented towards the “structured data fusion” idea • Privacy - Assessment of variations of existing privacy preserving techniques (D.P.)
  • 45. Publications • Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli: “Collective Awareness for Human ICT Collaboration in Smart Cities”. IEEE WETICE International conference on state-of-the-art research in enabling technologies for collaboration, 17-20 2013. • Alket Cecaj, Marco Mamei, Nicola Bicocchi: “Re-identification of Anonymized CDR datasets Using Social Network Data”. IEEE PerCom International conference on Pervasive Computing and Communications, Budapest, Hungary, 24-28, 2014. • Alket Cecaj, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–15. • Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli (2014): “Social Collective Awareness in Socio-Technical Urban Superorganisms”. In: Social Collective Intelligence: Combining the Powers of Humans and Machines to Build a Smarter Society, Part III, Applications and Case Studies, p. 227. • Alket Cecaj, Marco Mamei, Franco Zambonelli (2015): “Re-identification and Information Fusion Between Anonymized CDR and Social Network Data”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.
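
The event detection steps in slides 12 and 14 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the activity series, the ground truth set and the value of k are made-up toy values, the function names are hypothetical, and only values above the upper bound UB are treated as events. The median and IQR bounds follow the formulas on slide 12; the Q75 method is left out.

```python
from statistics import median, quantiles


def median_bounds(values, k):
    """Median method from slide 12: [Q50 - k*Q50, Q50 + k*Q50]."""
    q50 = median(values)
    return q50 - k * q50, q50 + k * q50


def iqr_bounds(values, k):
    """IQR method from slide 12: [Q25 - k*IQR, Q75 + k*IQR]."""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr


def detect_events(series, bounds_fn, k):
    """Indices of time slots whose activity exceeds the upper bound.

    Assumption: only unusually HIGH activity counts as an event.
    """
    _, upper = bounds_fn(series, k)
    return {i for i, value in enumerate(series) if value > upper}


def precision_recall(detected, ground_truth):
    """Precision = tp / (tp + fp), Recall = tp / (tp + fn), as on slide 14."""
    tp = len(detected & ground_truth)
    fp = len(detected - ground_truth)
    fn = len(ground_truth - detected)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy activity levels for one grid cell (hypothetical values).
activity = [10, 12, 11, 9, 10, 48, 11, 10, 52, 12]
# Toy ground truth: time slots in which a known event took place.
truth = {5, 7}

detected = detect_events(activity, iqr_bounds, k=1.5)
precision, recall = precision_recall(detected, truth)
print(sorted(detected), precision, recall)
```

On this toy series both bound methods flag slots 5 and 8, so one of the two ground truth events is found and one detection is spurious.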

Editor's Notes

  1. Introduction to data/information fusion methods. In particular, one speaks of data fusion or information fusion depending on whether the integration is low-level or high-level. The various types of geo-referenced data and the different applications these data can have. A study on the automatic detection of large events in urban areas using aggregated mobile phone data and geo-referenced social data. From aggregated data we move to anonymized CDR data, which show individual mobility traces. This work studies several characteristics, such as the uniqueness of these traces and how it can affect privacy. Finally, together with the conclusions, several open challenges are presented, concerning both the field of data fusion and that of privacy preservation.
  2. The large volume of data generated during daily routines includes geo-referenced data such as CDR (Call Detail Records), geo-referenced data obtainable from social networks or LBS (such as Foursquare), and open data such as census data. On the other side, the resulting applications are many. From the point of view of social development, one can mention works that study geo-referenced data to understand how diseases spread or what the poverty levels are in different urban areas, all studies that help guide possible interventions. In a smart city context, such data make it possible to understand the dynamics of large cities, such as commute patterns and land use, all useful information for understanding and managing a city better. Although these data are very useful on their own for the applications mentioned above, they can be much more powerful if combined or integrated into a single representation. For example, while CDR data indicate a large gathering of people in a certain area, once combined with social data they can also reveal the reason for such an event.
  3. This process of combining and integrating data, or data fusion, aims to analyse the data so that each data set can interact with, inform and complete the other data sets. Record matching vs knowledge fusion.
  4. Stage-based methods use different data sets at different stages of the data mining process. In this category the data sets are loosely coupled, without any requirements on their consistency. Feature level-based methods take features extracted from different data sets and create an array by concatenating them. This array can then be used in clustering and classification methods. Semantic meaning-based methods take into consideration the relations between features in different data sets. This implies that the data miner knows what each data set represents, and why the data sets can be fused or why they reinforce each other in terms of enrichment of information.
  5. Data such as anonymized CDR or social network datasets
  6. Following the diagram in the first chapter, we present the steps for applying the data fusion methods.
  7. Milano grid and time series of the activity levels of one of the cells during the two-month period
  8. Big Data Challenge 2014: aggregated CDR data and geo-tagged social network data tables.
  9. Faster computation, as there are fewer entries
  10. The data used in the previous study were aggregated. This means that no personal data were provided; the data only give the level of mobile phone activity in a certain geographic area, identified by a square cell inside a grid. However, there are many cases in which CDR data are released at a fine-grained temporal and spatial scale, where personal anonymized data are provided. That means that individual mobility traces can be spotted and analyzed. In the same way, geo-tagged social data from location-based services such as Foursquare, or from social networking services such as Twitter or Flickr, can reveal the location traces of their users.
  11. The data used in the previous study were aggregated. This means that no personal data were provided; the data only give the level of mobile phone activity in a certain geographic area, identified by a square cell inside a grid. However, there are many cases in which CDR data are released at a fine-grained temporal and spatial scale, where personal anonymized data are provided. That means that individual mobility traces can be spotted and analyzed. In the same way, geo-tagged social data from location-based services such as Foursquare, or from social networking services such as Twitter or Flickr, can reveal the location traces of their users.
  12. Conclusions for this part: the uniqueness test shows the number of points needed to single out the mobility traces of 80-95% of all users. At most 7 points are needed to do this. This number is not affected by the time interval of the matching process. The same can be said for the percentage of user paths singled out.
  13. Having discovered the number of points sufficient to single out a CDR user, we proceed to match the CDR users with the social ones. The figure shows a simple matching process between CDR and social data. While C4 and C3 can be excluded because they produce data in different locations at the same moment, nothing can be said about C2 and C1. That is why we use a probabilistic approach that can tell us, within reasonable limits, whether a CDR user is the same as the social user whose events match.
  14. Conclusions for this part: the matching test shows the number of CDR users with which the social users match for a given number of points. That means that every social user has at least one point in common with (on average) 1000 CDR users, two points in common with (on average) 100 CDR users, and 13 points in common with (on average) 50 CDR users. Analogously, the percentage of CDR users with which the social users have 1, 2, 3…15 points in common decreases.
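
The matching step described in notes 13 and 14 can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: the event format (minutes, latitude, longitude), the 1 km and 30 minute thresholds, the function names and the sample coordinates are all assumptions chosen for illustration. A CDR user is excluded when one of its events is close in time to a social event but far away in space; otherwise the number of social events with a nearby CDR event is counted.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def count_common_points(social_events, cdr_events, max_km=1.0, max_minutes=30):
    """Count social events matched by a CDR event close in space and time.

    Events are (minutes, lat, lon) tuples. Returns None when the CDR user is
    excluded, i.e. it is seen far away from the social user at about the
    same time. The thresholds are illustrative assumptions.
    """
    common = 0
    for s_time, s_lat, s_lon in social_events:
        matched = False
        for c_time, c_lat, c_lon in cdr_events:
            if abs(s_time - c_time) > max_minutes:
                continue  # not comparable: too far apart in time
            if haversine_km(s_lat, s_lon, c_lat, c_lon) <= max_km:
                matched = True
            else:
                return None  # same moment, different place: exclude
        if matched:
            common += 1
    return common


# Hypothetical social user with two geo-tagged events.
social = [(0, 45.4642, 9.1900), (600, 45.4780, 9.2250)]
# Hypothetical anonymized CDR users.
cdr_users = {
    "C1": [(10, 45.4650, 9.1910), (610, 45.4785, 9.2255)],
    "C2": [(5, 45.4645, 9.1905)],
    "C3": [(10, 45.5500, 9.3000)],
}

results = {user: count_common_points(social, events)
           for user, events in cdr_users.items()}
print(results)
```

In this toy example C3 is excluded, while C1 and C2 remain candidates with two and one points in common; deciding between such candidates is what the probabilistic approach is meant to address.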