Analysis, modelling and protection of online private data.

Advisors:
Dr. Jordi FORNÉ MUÑOZ
Dr. David REBOLLO MONEDERO
In partial fulfilment of the requirements for the degree of Doctor of Philosophy.
Silvia Puglisi
silvia.puglisi@upc.edu

Agenda
Background. Introduction and scope of the investigation.
Objectives. Objectives of the investigation.
Ongoing and future work. Publications and current research efforts.

Online privacy
Is Privacy the right to be forgotten?
In 2011, the amount of digital information created and replicated globally exceeded
1.8 zettabytes (1.8 trillion gigabytes).
75% of this information is created by individuals through new media fora such as
blogs and via social networks.
By the end of 2011, Facebook had 845 million monthly active users, sharing over 30
billion pieces of content.
Library Briefing - Library of the European Parliament - 01/03/2012

What is online privacy anyway?
In an online context, the right to privacy has commonly been interpreted as a right to
“information self-determination”.
Acts typically claimed to breach online privacy concern the collection of personal
information without consent, the selling of personal information and the further processing
of that information.

Do we have online privacy?
Irani, Danesh et al. [1] describe how personal information leaks on social networks can
be used for concrete attacks.
Acquisti, Alessandro, and Ralph Gross [2] also presented a method to infer people's Social
Security numbers using only publicly available information.
Goga, Oana et al. [3] describe how user activity on one site can implicitly reveal their
identity on another site.
Chen, Terence et al. [4] showed a correlation between the amount and type of
information revealed in social network profiles.

The age of “metadata”
Metadata is collected and stored by public and private organisations about where,
when and by whom a particular piece of online content was created and accessed.
In the private sphere it has been said that “literally, Google knows more about us than we
can remember ourselves”.
This situation has led to growing concerns regarding online privacy.
In China, one estimate suggests there are over 30 000 government censors monitoring
online information.
Library Briefing - Library of the European Parliament - 01/03/2012

Ex: Google Conversion Tracking
<html>
<body>

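<!-- goog_report_conversion() is assumed to be defined by the conversion
     tracking snippet loaded elsewhere on the page. -->
<a onclick="goog_report_conversion('tel:949-555-1234')" href="#">CALL NOW</a>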
</body>
</html>
Some websites use a Google forwarding number to measure the calls made by
potential customers.

Why does metadata matter?
Metadata can be more revealing than the content itself. Ex:
● They know you called the suicide prevention hotline. But the actual conversation
remains secret.
● They know you checked HIV-related websites, talked to an HIV testing service, then
spoke to your doctor. But they don’t know what was discussed during the calls.
Furthermore, Bizer, Christian et al. [5] have shown how websites already embed
structured data describing products, services and events, and make user information
available directly in their HTML pages using markup standards such as Microformats,
Microdata and RDFa.
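To illustrate how easily such embedded markup can be harvested, the sketch below reads Microdata `itemprop` attributes with Python's standard `html.parser`. The page fragment, the class name `MicrodataParser` and the property values are assumptions made up for this example; they are not taken from [5].

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collect (itemprop, text) pairs from Microdata-annotated HTML."""

    def __init__(self):
        super().__init__()
        self.properties = []   # extracted (name, value) pairs
        self._current = None   # itemprop whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        # Attach the first non-blank text node to the pending itemprop.
        if self._current and data.strip():
            self.properties.append((self._current, data.strip()))
            self._current = None


# Hypothetical page fragment, written for this example only.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Engineer</span>
  <span itemprop="addressLocality">Barcelona</span>
</div>
"""

parser = MicrodataParser()
parser.feed(SAMPLE)
for name, value in parser.properties:
    print(name, "=", value)
```

The sketch only handles properties whose value is plain element text; attributes such as `content` or `href` are ignored to keep it short.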

Hyperdata && Hypermedia
Hyperdata indicates data objects linked to other data objects in other places, as
hypertext indicates text linked to other text in other places.
Hyperdata enables formation of a web of data, evolving from the "data on the Web" that
is not inter-related (or at least, not linked).
Hypermedia, an extension of the term hypertext, is a nonlinear medium of information
which includes graphics, audio, video, plain text and hyperlinks.
Source: Wikipedia

What is REST?
REST is an architectural style, introduced by Roy Thomas Fielding in 2000, that has been
at the core of web design and development.
REST represents an abstraction over the actual architecture of the web.
In REST identification, representation and format are independent concepts.
Specifically:
A URI can identify a resource without knowing what formats the resource uses to
exchange representations.
Likewise the protocols and representations used by the resource to communicate
can be modified independently from the URI identifying the resource.
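The independence of identification and representation can be sketched with a small in-memory example: one URI, two representations. The store `RESOURCES`, the function `get` and the sample user are hypothetical names chosen for illustration, and no real HTTP server is involved.

```python
import json

# Hypothetical in-memory resource store keyed by URI path.
RESOURCES = {"/users/42": {"id": 42, "name": "Alice"}}


def as_json(resource):
    return json.dumps(resource, sort_keys=True)


def as_text(resource):
    return ", ".join(f"{key}={value}" for key, value in sorted(resource.items()))


# Representations can be added or removed here without touching any URI.
RENDERERS = {"application/json": as_json, "text/plain": as_text}


def get(uri, accept="application/json"):
    """Return (status, body) for a URI rendered in the requested media type."""
    if uri not in RESOURCES:
        return 404, ""
    renderer = RENDERERS.get(accept)
    if renderer is None:
        return 406, ""   # resource exists, but not in that format
    return 200, renderer(RESOURCES[uri])


print(get("/users/42", "application/json"))
print(get("/users/42", "text/plain"))
```

The same URI answers both requests; only the renderer selected by the media type changes.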

REST Interfaces
The uniformity of REST interfaces is built upon four guiding principles:
● The identification of resources through the URI mechanism.
● The manipulation of resources through their representations.
● The use of self-descriptive messages.
● The use of hypermedia as the engine of application state (HATEOAS).
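The last principle can be sketched as a client that knows only an entry URI and discovers everything else from links carried in each representation. The `API` dictionary, the `_links` key (borrowed from the HAL convention) and the link relation names are assumptions for this example.

```python
# Hypothetical hypermedia API held in memory: each representation carries
# the links ("_links") that tell the client what it can do next.
API = {
    "/": {"_links": {"profile": "/users/42"}},
    "/users/42": {"name": "Alice", "_links": {"posts": "/users/42/posts"}},
    "/users/42/posts": {"items": ["hello world"], "_links": {}},
}


def follow(start, relations):
    """Start at a URI and follow the named link relations in order."""
    uri = start
    for rel in relations:
        uri = API[uri]["_links"][rel]   # the next URI comes from the response
    return uri, API[uri]


uri, document = follow("/", ["profile", "posts"])
print(uri, document["items"])
```

The client never builds a URI itself, so the server remains free to change its URI layout as long as the link relations stay stable.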

Hypermedia and privacy protection
Information self-determination is not even possible if users have no control over their online
footprint.
Hypermedia provides context over unstructured footprint information.
Users and applications use REST interfaces to interact with one another exchanging
resource representations.
The web follows REST principles and so do users’ online traces.

Hypermedia and privacy protection
Genc, Yegin et al. [6] introduce a method to map text messages into a wider context and,
by computing the distance between them, classify their content.
Ducheneaut, Nicolas et al. [7] explain how recommender systems need to incorporate
contextual information from the physical world, as users move continuously and
frequently engage in a variety of activities.
Sakaki, Takeshi et al. [8] discuss how real-time interaction between online users and the
offline world can be used to detect target events, turning the actual users into sensors
themselves.

Objective 1
Development of a hypermedia model of the
user online footprint

Objective 1
This hypermedia model of the user online footprint is constructed by analysing the
different interactions that the user has online with various services and platforms.
Hyperme is the proposed hyperdata model of a user's online footprint.
The hyperme model links the user footprints created across different services and the
features associated with them in a hypergraph.
The user footprint is therefore transformed into an object that can be explored based on
some desired features.
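As a minimal sketch of the idea, and not the actual hyperme implementation, a hypergraph can be stored as named hyperedges that each link any number of nodes. The `Hypergraph` class, the node naming scheme (`user:`, `service:`, `topic:`, `place:`) and the sample footprints are all assumptions made for this example.

```python
from collections import defaultdict


class Hypergraph:
    """Minimal hypergraph: each hyperedge links any number of nodes."""

    def __init__(self):
        self.edges = {}                     # edge name -> set of nodes
        self.incidence = defaultdict(set)   # node -> names of edges containing it

    def add_edge(self, name, nodes):
        self.edges[name] = set(nodes)
        for node in nodes:
            self.incidence[node].add(name)

    def edges_with(self, node):
        """Footprints (hyperedges) that carry the given feature."""
        return sorted(self.incidence.get(node, set()))

    def neighbours(self, node):
        """Nodes sharing at least one hyperedge with `node`."""
        linked = set()
        for name in self.incidence.get(node, set()):
            linked |= self.edges[name]
        linked.discard(node)
        return sorted(linked)


# Hypothetical footprints: one hyperedge per online action.
g = Hypergraph()
g.add_edge("tweet:1", ["user:alice", "service:twitter", "topic:privacy", "place:barcelona"])
g.add_edge("checkin:7", ["user:alice", "service:foursquare", "place:barcelona"])
g.add_edge("post:3", ["user:alice", "service:blog", "topic:privacy"])

print(g.edges_with("place:barcelona"))
print(g.neighbours("place:barcelona"))
```

Exploring the footprint by a desired feature then reduces to a lookup on a node, here every action and every linked feature associated with one place.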

Objective 1
Users stream private information towards devices, applications and platforms.
This information is shared with groups of different people with distinct access rights.
Private (?) information is only shared with service providers.

Objective 1
The hyperme model captures different aspects of user activities online:
● Everything in the hyperme model is a signal.
● Signals can be easily profiled.
● Signals can be linked to each other.
● Footprints become objects that can be explored.

Objective 1
The last two weeks of activity on Stephen Fry's Twitter account (@stephenfry)
have been analysed.

Objective 2
Analysis of data flows from social networks to
third party advertisers

Objective 2
The aim of this objective is to understand what data is leaked by third party advertising
networks and how third party advertising networks and social platforms track users as
they surf the web.
The exchange of identity information is followed from the client to third party advertising
platforms.
Methods implemented by third party advertising networks are discovered and classified
by analysing network requests (HTTP) and actual data flow (JavaScript calls).
The mathematical distance between the user profile and the observed advertising profile is
taken as a measure of how accurately third party platforms are tracking the user.
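The slides do not name the distance used. As an assumption, the sketch below models both profiles as distributions over interest categories and compares them with the total variation distance; the category names and counts are invented for illustration.

```python
def normalise(counts):
    """Turn raw category counts into a probability distribution.

    Assumes at least one non-zero count.
    """
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical profiles, 1 for disjoint ones."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical data: what the user actually browsed vs. what the ads targeted.
user_profile = normalise({"technology": 6, "travel": 3, "sport": 1})
ad_profile = normalise({"technology": 5, "travel": 2, "finance": 3})

distance = total_variation(user_profile, ad_profile)
print(round(distance, 3))
```

Under this choice, a smaller distance means the advertising profile follows the user profile more closely, that is, more accurate tracking. Other measures, such as a divergence between distributions, would fit the same interface.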

Objective 3
Evaluation of different PETs in Content
Recommendation Systems

Objective 3
The goal of this objective is the evaluation of different PETs in Content Recommendation
Systems.
Our aim is to show how a recommendation system is affected by the application of
certain PETs by a part of the user population.
Users may, in fact, wish to protect their privacy while also maintaining a satisfactory level
of utility of the information received by the recommendation platform.
Different levels of privacy protection are evaluated.
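The kind of evaluation described here can be sketched with a toy experiment. Everything in it is an assumption for illustration: the PET (suppressing tags in a sensitive category), the utility measure (overlap between the top-k interests of the true and the shared profile) and the sample data. It is not the method evaluated in the thesis.

```python
from collections import Counter


def suppress(tags, sensitive):
    """Hypothetical PET: drop every tag in the sensitive set before sharing."""
    return [tag for tag in tags if tag not in sensitive]


def top_k(tags, k):
    """The k most frequent tags, standing in for a recommender's view."""
    return {tag for tag, _ in Counter(tags).most_common(k)}


def utility(true_tags, shared_tags, k=2):
    """Share of the user's true top-k interests still visible to the system."""
    return len(top_k(true_tags, k) & top_k(shared_tags, k)) / k


def mean_utility(users, sensitive, fraction, k=2):
    """Average utility when the first `fraction` of users apply the PET."""
    n_protected = round(fraction * len(users))
    scores = []
    for index, tags in enumerate(users):
        shared = suppress(tags, sensitive) if index < n_protected else tags
        scores.append(utility(tags, shared, k))
    return sum(scores) / len(scores)


# Hypothetical population: four users with the same tagging history.
history = ["health", "health", "health", "music", "music", "travel"]
users = [history] * 4

for fraction in (0.0, 0.5, 1.0):
    print(fraction, mean_utility(users, {"health"}, fraction))
```

Varying `fraction` and the sensitive set gives one simple way to compare levels of protection against the utility that remains.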

Objective 4
Evaluation of different PETs to prevent
information leaks on third party advertising
networks

Objective 4
The goal of this objective is the evaluation of different PETs to prevent third party
advertising networks from pervasively tracking users through their browsing patterns and
social platform profiles.
In particular we are concerned with understanding how third-party advertising networks
can be prevented from accessing certain private data regarding the user.

Objective 5
Extension of the hyperme model to cover
aspects of location identity

Objective 5
This objective aims at:
● Analysing the amount and extent of geographically tagged information shared through
online activities.
● Establishing links between location information and spatial context.
● Evaluating different PETs to protect users’ location privacy.

Ongoing and future work
At the moment we are applying the hyperme hypermedia model to profile user activities
online.
We are especially concerned with answering the following questions:
● How is advertising influenced by online activities?
● To what extent does social network activity influence third party advertising?
● To what extent can mobile phone activity influence third party advertising?
● What PETs can be implemented to protect users’ privacy?

Ongoing and future work
We are collaborating with Dr. Markus Huber @ SBA Research (Vienna, Austria) on the
following topics:
● Analysing the Alexa Top Million websites to compile statistics on the tracking services
currently deployed.
● Testing current anti-tracking technologies to find out how effective they are.
We are aiming at submitting a paper to the 36th IEEE Symposium on Security and
Privacy.

Publications
The following article was submitted to the journal Computer Standards &
Interfaces, on the topic of content-based recommendation systems and privacy
enhancing techniques:
S. Puglisi, J. Parra-Arnau, D. Rebollo-Monedero and J. Forné,
On Content-Based Recommendation and Users Privacy in Social Tagging Systems,
Preprint submitted to Computer Standards & Interfaces, April 2014. Submitted for
publication.

I grew up with the understanding that the world I lived in was one where people enjoyed a
sort of freedom to communicate with each other in privacy, without it being monitored,
without it being measured or analyzed or sort of judged by these shadowy figures or
systems, any time they mention anything that travels across public lines.
- Edward Snowden
Thank you.

References
[1] D. Irani, S. Webb, and C. Pu, “Modeling unintended personal-information
leakage from multiple online social networks,” IEEE
Internet Computing, 2011.
[2] A. Acquisti and R. Gross, “Predicting social security numbers from
public data,” in Proceedings of the National Academy of Sciences,
2009.
[3] O. Goga, H. Lei, S. H. K. Parthasarathi, G. Friedland, R.
Sommer, and R. Teixeira, “Exploiting innocuous activity for
correlating users across sites,” in Proceedings of the 22nd
international conference on World Wide Web, 2013.
[4] T. Chen, M. A. Kaafar, A. Friedman, and R. Boreli, “Is more
always merrier? a deep dive into online social footprints,” in
Proceedings of the 2012 ACM workshop on Workshop on online
social networks, 2012.
[5] C. Bizer, K. Eckert, R. Meusel, H. Mühleisen, M. Schuhmacher, and
J. Völker, “Deployment of rdfa, microdata, and microformats on the
web – a quantitative analysis,” in 12th International Semantic Web
Conference, 21-25 October 2013, Sydney, Australia, In-Use track,
2013.
[6] Y. Genc, Y. Sakamoto, and J. Nickerson, “Discovering context:
Classifying tweets through a semantic transform based on
wikipedia,” in Foundations of Augmented Cognition. Directing the
Future of Adaptive Systems, ser. Lecture Notes in Computer
Science, D. Schmorrow and C. Fidopiastis, Eds. Springer Berlin
Heidelberg, 2011, vol. 6780, pp. 484–492.
[7] N. Ducheneaut, K. Partridge, Q. Huang, B. Price, M. Roberts, E.
Chi, V. Bellotti, and B. Begole, “Collaborative filtering is not
enough? experiments with a mixed-model recommender for leisure
activities,” in User Modeling, Adaptation, and Personalization, ser.
Lecture Notes in Computer Science, G.-J. Houben, G. McCalla, F.
Pianesi, and M. Zancanaro, Eds. Springer Berlin Heidelberg, 2009,
vol. 5535, pp. 295–306.
[8] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes
twitter users: real-time event detection by social sensors,” in
Proceedings of the 19th international conference on World wide
web. ACM, 2010, pp. 851–860.
