1. The document discusses privacy challenges in the era of big data. It defines big data as extremely large data sets that are difficult to store, manage, and process using traditional methods due to the volume of data and processing speed/costs.
2. While big data provides benefits from insights discovered through analysis, it also challenges core privacy principles. Data collected and analyzed at large scale may not be truly anonymous, and re-identification is possible using additional data sources. Existing privacy laws may not cover analysis of non-personal data.
3. To address privacy risks, the document recommends expanding definitions of personal data and consent under privacy laws. Organizations collecting and processing big data should also implement privacy impact assessments, be
Privacy challenges in the age of big data
1. Privacy in the age of
’BIG DATA’
56th UIA Dresden Congress - November 1st, 2012
‘Rights of the Digital Person’
Marc Gallardo
email:
marc.gallardo@alliantabogados.com
2. # Summary
1.- What is ‘Big Data’
2.- Big Benefits
3.- Big Privacy Challenges
4.- Final Remarks
3. # 1 Definition
‘Big data usually refers to data sets
whose size is beyond the ability of
commonly-used technology tools to
capture, store, manage, and process
the data within a tolerable elapsed
time and cost’
Not a new concept: « data mining »
6. 5 exabytes of information were created between the
dawn of civilization and 2003
Now 3 exabytes are created every day
1 terabyte (TB) = 1,000 gigabytes (GB)
1 petabyte (PB) = 1,000,000 gigabytes (GB)
1 exabyte (EB) = 1,000,000,000 gigabytes (GB)
1 zettabyte (ZB) = 1,000,000,000,000 gigabytes (GB)
90 % of the data that now exists has been
created in the last 2 years
… and the pace is growing
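The unit table and the two volume figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the names `GB_PER_UNIT` and `to_gigabytes` are made up for this example, and the 5 EB and 3 EB figures are taken at face value from the slide.

```python
# Decimal storage units expressed in gigabytes, as listed on the slide.
GB_PER_UNIT = {
    "TB": 10**3,
    "PB": 10**6,
    "EB": 10**9,
    "ZB": 10**12,
}


def to_gigabytes(amount, unit):
    """Convert an amount given in TB, PB, EB or ZB to gigabytes."""
    return amount * GB_PER_UNIT[unit]


# Figures quoted on the slide (not independently verified here).
historical_gb = to_gigabytes(5, "EB")  # created up to 2003
daily_gb = to_gigabytes(3, "EB")       # created every day "now"

# Days needed today to match everything created up to 2003.
days_to_match = historical_gb / daily_gb
print(f"{days_to_match:.2f} days")
```

On the slide's own numbers, the historical total is matched in under two days.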
9. BIG DATA
Vast amount of data: Collection, Storage, Processing, Sense-making
Tech Innovation:
Software (Hadoop, NoSQL)
Hardware (faster processors, cheaper, bigger storage)
12. # 3 Privacy Risks
Big Data challenges
some of the core
privacy principles
15. Is the information amassed for such
analysis TRULY ANONYMOUS?
We cannot be sure!!!
It can be relatively easy to take some
types of de-identified data and
reassociate it with specific individuals
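A minimal sketch of how such re-association can work is a linkage on shared attributes. Everything here is hypothetical and only illustrative: the records, the field names, and the choice of ZIP code, birth date and sex as linking attributes are assumptions, not data from the talk.

```python
# Attributes that survive "de-identification" but also appear in other sources.
QUASI_IDENTIFIERS = ("zip", "birth", "sex")

# Hypothetical de-identified records: names removed, quasi-identifiers kept.
deidentified = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth": "1980-01-15", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public register that still carries names.
public_register = [
    {"name": "Alice Example", "zip": "02138", "birth": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth": "1980-01-15", "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth": "1980-01-15", "sex": "M"},
]


def link_key(record):
    """Build the tuple of quasi-identifier values used to join the two sources."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(records, register):
    """Return {name: diagnosis} for records that match exactly one named person."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(link_key(person), []).append(person["name"])

    matches = {}
    for record in records:
        names = names_by_key.get(link_key(record), [])
        if len(names) == 1:  # a unique match singles out one individual
            matches[names[0]] = record["diagnosis"]
    return matches


print(reidentify(deidentified, public_register))
```

In this toy data the first record is re-identified because its combination of attributes is unique in the register, while the second is not, because two people share the same combination.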
17. Re-identification of data subjects
using Non Personal Data (NPD)
Whether or not NPD that forms
the basis for data extractions of
new knowledge is covered by our
data protection laws
19. Personal data is any
information about an
identified or identifiable
person
21. # De Lege Ferenda
Definition of PD and data subject
might be expanded to cover
technologies (e.g. data mining) that
make reverse engineering of forms of
« anonymisation » more feasible.
> crux point for the Regulation not to
become quickly obsolete.
22. Consent of Data Subject:
Freely given, specific, informed & explicit:
statement or affirmative action.
The problem in a BD scenario is that the DC
does not know in advance what he may discover
after mining the data, so the data subject cannot
knowingly consent to the use of his data.
23. Automated individual decisions (AID) art. 15 DPD
Grants the right not to be subject to a decision
that produces legal effects which is based solely on
automated processing of data intended to evaluate
certain personal aspects.
Art. 12(a) grants the right to discover « the
knowledge of the logic ».
Limited scope: human intervention / knowledge
and remedy.
24. Automated individual decisions (AID) art. 20 DPR
Grants same right to oppose more broadly: not
only « evaluate » but analyse or predict the
person’s performance at work, economic situation,
location, health, personal preferences, reliability or
behaviour.
Right to « know the logic » is eliminated.
Right to know the existence and envisaged effect
of profiling.
25. To BD collectors & processors:
1.- Carry out a PIA (privacy impact assessment) to identify and address risks relating
to BD analysis
2.- Be clear about what you collect and process
3.- Use de-identification techniques
4.- Secure the data to avoid data breaches
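As one possible reading of point 3, the sketch below combines two common de-identification steps: replacing a direct identifier with a keyed hash (pseudonymization) and coarsening quasi-identifiers. The record layout, the field names and the functions `pseudonymize` and `deidentify` are assumptions made for this example; the talk does not prescribe a specific technique, and steps like these reduce re-identification risk without removing it.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop the e-mail address and coarsen birth date and ZIP code."""
    return {
        "user": pseudonymize(record["email"], secret_key),
        "birth_year": record["birth"][:4],  # keep only the year
        "zip_prefix": record["zip"][:3],    # keep only the first three digits
        "purchase": record["purchase"],
    }


# Hypothetical key and record; a real key must be stored apart from the data.
SECRET_KEY = b"replace-with-a-real-secret"
record = {
    "email": "jane@example.com",
    "birth": "1975-04-12",
    "zip": "08001",
    "purchase": "book",
}

safe = deidentify(record, SECRET_KEY)
print(safe["birth_year"], safe["zip_prefix"])
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the pseudonym from a guessed e-mail address. Anyone holding the key still can, which is why the result is pseudonymous rather than anonymous data.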
26. Good trend and the real challenge
for regulators
Preserve BD rewards
whilst seeking to
minimize privacy risks
Put simply, big data is not a new concept: it is a more powerful version of knowledge discovery in databases, or data mining, which has been defined as « the non-trivial extraction of implicit, previously unknown and potentially useful information from data » and which also enables firms to discover or infer previously unknown facts and patterns in a database. The term big data describes a new generation of technologies and architectures designed to economically extract value from large volumes of a wide variety of data. As technology changes and improves, the size of a dataset that would qualify as big data will also change.
1.- Volume: the main attraction of BD analytics, and the most immediate challenge to conventional IT structures, because you need scalable storage and a distributed approach to querying.
2.- Velocity: it is important to take data quickly from input to decision (so-called streaming data); this applies to both input and output data. The quicker, the greater the competitive advantage. The results might go directly into a product, such as a recommendation feature, or into a dashboard used to drive decision-making.
3.- Variety: rarely does the data present itself in a form perfectly ordered and ready for processing. It can be data fed directly from a sensor source or social network data. None of these things come ready for integration into an application. There is a risk of losing information when moving from source data to processed application data. The choice of software depends on how structured the data are (this is where variety comes into play).
The term was coined by big technology companies eager to sell their software. Some of the big players are IBM, HP, Oracle, …
ANALYTICAL USE to gain competitive advantage. Extracting value: mathematicians are now suddenly sexy. As a lawyer I have always found those with a facility with numbers to be appealing. I’m happy to see I’m not the only one and that others agree with me. Successfully exploiting the value in BD requires experimentation, and even access to the best data-deciphering tool is no guarantee of great wisdom. Very few companies have people on staff with the training not only to evaluate mountains of data but also to do something with it. Capturing data is one thing; making it useful is a whole other.
What this means is that the amount of data that companies, governments and people are creating is growing exponentially, and even that does not begin to get the point across. 1 yottabyte = 1,000 zettabytes. Generally speaking, experts consider petabytes of data volume as the starting point for BD. Market research firm IDC estimates that 1,200 exabytes of data will be generated this year alone. 3 exabytes every ten minutes. Projected 2012 sales: 367.2 million PCs, 107 million tablets, 650 million smartphones.
Not only people feed data to the Internet; things can do it too. Low-cost sensors (RFID: your car key, packages in the logistics sector). A digital thermostat combining sensors, machine learning and web technology senses not just air temperature but also the movements of people in the house, their comings and goings, and adjusts room temperature to save energy. Far more data is generated by these sources, and they are entirely new sources of data (sensors), not just more streams of data. There are now countless digital sensors worldwide in industrial equipment, automobiles … that communicate data to computing intelligence, creating the IoT or the Industrial Internet.
New context: the BD trend is MORE DATA, FASTER COMPUTERS and NEW ANALYTIC TECHNIQUES. Hardware: falling computing costs, scalable distributed data processing models and open source software such as Hadoop bring BD processing within the reach of the less well resourced. Hadoop is open source software for working with BD. It was derived from Google technology and put into practice by Yahoo and others. But BD is too varied and complex for a one-size-fits-all solution. While Hadoop has surely captured the greatest name recognition, it is just one of the 3 classes of technology well suited to storing and managing BD. The other 2 are NoSQL and Massively Parallel Processing data stores. Sense-making over data: this is why we have the data to begin with. Big players also provide BD solutions: IBM, Oracle, SAP, Microsoft, HP, Google (BigQuery, software that can scan terabytes of information in seconds).
Uses of big data can be transformative; the potential benefits are vast and still largely unrealised. Smart grid: bidirectional data flow. The user receives electricity as usual but sends back information about how much it consumes to be analysed, so companies supplying electricity can manage this good more efficiently and make more rational decisions about energy production (once produced, electricity cannot be stored and must be consumed immediately). Companies: analysts at Forrester Research estimate that enterprises use only 5% of their available data, leaving the field open to those who want to exploit the remaining 95% and obtain the hidden value their data holds: illuminating trends, unlocking new sources of economic value, improving business processes and more. Google Flu Trends: a tool that uses aggregate search queries to identify flu outbreaks by region.
I wouldn’t claim to have all the answers.
INCREASE IN DATA SUBJECTS WHOSE DATA WILL BE PROCESSED
INCREASE IN DATABASES CONTAINING THESE TYPES OF DATA
INCREASE IN THE ‘INTELLIGENCE’ OF PROCESSING: AGGREGATED DATA
Privacy and data protection mean the same thing in the age of big data as they always have, but the capacity of machines to capture, store, process, synthesise and analyse details about everyone has forced new boundaries. Given the digital data now available to organizations and the novel ways in which BD combines these diverse data sets, BD not surprisingly intensifies existing privacy concerns over tracking and profiling.
Data is not de-identified simply because you strip off a name or an address. Much of our personal information is now linked to specific devices such as smartphones or laptops through UDIDs, IP addresses, fingerprinting and other means, which are personally identifiable.
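One way to put a number on this point is k-anonymity, a measure not named in the talk and brought in here only for illustration: the size of the smallest group of records sharing the same identifying attributes. The records, field names and the function `k_anonymity` below are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    values for the given attributes (0 for an empty data set)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values(), default=0)


# Hypothetical records with names and addresses already stripped.
records = [
    {"device_id": "A1", "zip": "08001", "age_band": "30-39"},
    {"device_id": "B2", "zip": "08001", "age_band": "30-39"},
    {"device_id": "C3", "zip": "08002", "age_band": "40-49"},
    {"device_id": "D4", "zip": "08002", "age_band": "40-49"},
]

# With the device identifier kept, every record is unique.
k_with_device = k_anonymity(records, ["device_id", "zip", "age_band"])

# Without it, each record hides among at least one other.
k_without_device = k_anonymity(records, ["zip", "age_band"])

print(k_with_device, k_without_device)
```

A value of 1 means at least one record is unique on those attributes, which is the situation described above when a device identifier stays in the data after the name has been removed.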
And once created, would it be regulated as personal data? A regulatory dilemma.
An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more specific factors.
Neither silence nor inactivity can constitute valid consent.
AID gains importance as BD intensifies the use of automated decision-making by substantially improving its accuracy and scope. Right to knowledge of the logic involved in any automatic processing of data concerning him. Limited remedies: it requires that the data controller bring some human judgement to bear by reviewing the factors forming the basis of the automated decision.
Should also include the controller's obligation to inform data subjects about the techniques and procedures used for profiling (algorithms), as well as to document the results of profiling in case of complaints.
BD’s impact on privacy requires some new and hard thinking from all of us. Be clear about what you collect: Compete case (FTC). De-identify, but do not ignore the fact that big data can increase the risk of re-identification. We need to pay attention to these issues so that the promise of BD IS REALIZED and the risks are kept to a minimum. Industry has a strong and justifiable need to continue to innovate, but we need to discuss collection and use in this ecosystem further in order to instill consumer trust in the online and mobile marketplace.