SlideShare uma empresa Scribd logo
1 de 12
Baixar para ler offline
Security and Privacy in a Big Data World
                         Dr. Flavio Villanustre, CISSP, LexisNexis Risk Solutions
VP of Information Security & Lead for the HPCC Systems open source initiative
                                                                28 January 2013
But what is Big Data?
•   Gartner told us that it’s defined by its four dimensions:
       •   Volume
       •   Velocity
       •   Variety
       •   Complexity
•   Driven by the proliferation of social media, sensors, the
    Internet of Things and the such (a lot of the latter)
•   Became accessible thanks to open source distributed data
    intensive platforms (for example, the open source
    LexisNexis HPCC Systems platform and Hadoop)
•   Made popular by consumer oriented services such as
    recommendation systems and search engines


                                                                2
Big Data platforms: key design principles

•   Distributed local store
    • Move algorithm to the data – exploit locality
•   Many data problems are embarrassingly parallel
    • Leverage massive resource aggregation:
       •   Thousands of execution cores
       •   Hundreds of disk controllers
       •   Hundreds of network interfaces
       •   Terabytes of memory and massive memory bandwidth
•   Storage is cheap
•   Moving data into the system takes time (hence keep the
    data around, if possible)
•   It’s fine (and encouraged) to perform iterative exploration
•   In the end: It’s just and all about the data
                                                                  3
A timeline of the main open source Big Data platforms

 LexisNexis designs
 the HPCC Systems                             Google’s
platform to meet its                     MapReduce Paper        First Hadoop                 These platforms gain
own Big Data Needs.                        is Published.           Summit                    mainstream adoption




      Late
    90s/Early          2001               2004         2007            2008           2011               2012
     2000s



             The first few systems are              First Release of           The LN HPCC Systems
              sold to companies and               Hadoop (designed              platform is officially
                   organizations                  after Google’s Map            released as an open
                                                     Reduce ideas)                 source project




                                                                                                                4
Just when we thought that we knew data security…
Big Data is not your dad’s data
   a.   More data sources (beware! data + data > 2 * information)
   b.   Boiling the ocean is at the reach of your hand
   c.   Public clouds offer scalability but introduce risks
   d.   Distributed data stores can blur boundaries
   e.   Leveraging diverse skills implies more people accessing the data

Old tricks may not work
   a.   Tokenization only goes so far
   b.   Encryption at rest just protects against misplaced hardware
   c.   Tracking data provenance is hard
   d.   Enforcing data access controls can quickly get unwieldy
   e.   Conveying policy information across multiple systems is hard



                                                                           5
The ever present challenges

•   Security
    •   Keeping the bad guys out
    •   Making sure the good guys are good and stay good
    •   Preventing mistakes
    •   Disposing of unnecessary/expired data
    •   Enforcing “least privilege” and “need to know” basis
•   Privacy
    • Statistics safer than aggregates
    • Aggregates safer than tokenized samples
    • Tokenized samples safer than individuals
    • Don’t underestimate the power of de-anonymization
    • Mistakes in privacy are irreversible
•   Security <> Privacy

                                                               6
Be wary of “tokenization in a box”

•   Tokenized dataset * Tokenized dataset ~= identifiable data

•   The problem of eliminating inference is NP-complete

•   Several examples in the last decade:

    •   The “Netflix case”

    •   The “Hospital discharge data case”

    •   The “AOL case”


                                                                 7
Common sense to the rescue
•   Track data provenance, permissible use and expiration
    through metadata (data labels and RCM)
•   Enforce fine granular access controls through code/data
    encapsulation (data access wrappers)
•   Utilize homogeneous platforms that allow for end-to-end
    policy preservation and enforcement
•   Deploy (and properly configure) Network and host based
    Data Loss Prevention
•   A comprehensive data governance process is king
•   Use proven controls (Homomorphic encryption and PIR
    are, so far, only theoretical concepts)


                                                              8
And always keep in mind that…


• Access to the hardware ~= Access to the data
   •   Encryption at rest only mitigates the risk for the isolated
       hard drive, but NOT if the decryption key goes with it
   •   Compromise of a running system is NOT mitigated by
       encryption of data at rest

• Virtualization may increase efficiency, but…
   •   In virtual environments, s/he who has access to your
       VMM/Hypervisor, also has access to your data
   •   Cross-VM side-channel attacks are not just theoretical



                                                                     9
Remember


•   Data exhibits its own form of quantum entanglement:
    once a copy is compromised, all other copies instantly are

•   The closer to the source the [filtering, grouping,
    tokenization] is applied, the lower the risk

•   YCLWYDH! You can’t lose what you don’t have (data
    destruction)




                                                                 10
Useful Links
 LexisNexis Risk Solutions: http://lexisnexis.com/risk
 LexisNexis Open Source HPCC Systems Platform: http://hpccsystems.com
 Cross-VM side-channel attacks:
  http://blog.cryptographyengineering.com/2012/10/attack-of-week-cross-
  vm-timing-attacks.html
 K-Anonymity: a model for protecting privacy:
  http://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity.p
  df
 Robust de-anonymization of Large Sparse Datasets (the Netflix case):
  http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
 Broken Promises of Privacy: Responding to the surprising failure of
  anonymization:
  http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006
 Tamper detection and Relationship Context Metadata:
  http://blogs.gartner.com/ian-glazer/2011/08/19/follow-up-from-catalyst-
  2011-tamper-detection-and-relationship-context-metadata/

                                                          The HPCC Systems Platform   11
Questions?




             Email: info@hpccsystems.com


                                           The HPCC Systems Platform   12

Mais conteúdo relacionado

Mais procurados

a hybrid cloud approach for secure authorized reduplications
a hybrid cloud approach for secure authorized reduplicationsa hybrid cloud approach for secure authorized reduplications
a hybrid cloud approach for secure authorized reduplicationsswathi78
 
A Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud ComputingA Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud Computingvivatechijri
 
Hybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized DeduplicationHybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized DeduplicationPrem Rao
 
How One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonomHow One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonomLaurent Henocque
 
11.cyber forensics in cloud computing
11.cyber forensics in cloud computing11.cyber forensics in cloud computing
11.cyber forensics in cloud computingAlexander Decker
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Sarah Anna Stewart
 
Oruta privacy preserving public auditing
Oruta privacy preserving public auditingOruta privacy preserving public auditing
Oruta privacy preserving public auditingshakas007
 
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]Mahmuda Rahman
 
DDS-to-JSON and DDS Real-time Data Storage with MongoDB
DDS-to-JSON and DDS Real-time Data Storage with MongoDBDDS-to-JSON and DDS Real-time Data Storage with MongoDB
DDS-to-JSON and DDS Real-time Data Storage with MongoDBAbdullah Ozturk
 
The “obsession” with checksums by Helen Hockx-Yu
The “obsession” with checksums by Helen Hockx-YuThe “obsession” with checksums by Helen Hockx-Yu
The “obsession” with checksums by Helen Hockx-YuCLOCKSS
 
Reactive Systems with Data Distribution Service (DDS)
Reactive Systems with Data Distribution Service (DDS)Reactive Systems with Data Distribution Service (DDS)
Reactive Systems with Data Distribution Service (DDS)Abdullah Ozturk
 
A Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-DuplicationA Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-DuplicationEditor IJMTER
 
Improved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersImproved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersIRJET Journal
 
Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingShantanu Sharma
 
Paul Stokes (Jisc) - A provocation about preservation
Paul Stokes (Jisc) - A provocation about preservationPaul Stokes (Jisc) - A provocation about preservation
Paul Stokes (Jisc) - A provocation about preservationCLOCKSS
 
Encryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloudEncryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloudprjpublications
 
Cloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion DetectionCloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion Detectionijsrd.com
 

Mais procurados (20)

a hybrid cloud approach for secure authorized reduplications
a hybrid cloud approach for secure authorized reduplicationsa hybrid cloud approach for secure authorized reduplications
a hybrid cloud approach for secure authorized reduplications
 
A Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud ComputingA Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud Computing
 
Hybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized DeduplicationHybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized Deduplication
 
How One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonomHow One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonom
 
11.cyber forensics in cloud computing
11.cyber forensics in cloud computing11.cyber forensics in cloud computing
11.cyber forensics in cloud computing
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...
 
Oruta privacy preserving public auditing
Oruta privacy preserving public auditingOruta privacy preserving public auditing
Oruta privacy preserving public auditing
 
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
 
DDS-to-JSON and DDS Real-time Data Storage with MongoDB
DDS-to-JSON and DDS Real-time Data Storage with MongoDBDDS-to-JSON and DDS Real-time Data Storage with MongoDB
DDS-to-JSON and DDS Real-time Data Storage with MongoDB
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Thoughts on Cybersecurity
Thoughts on CybersecurityThoughts on Cybersecurity
Thoughts on Cybersecurity
 
The “obsession” with checksums by Helen Hockx-Yu
The “obsession” with checksums by Helen Hockx-YuThe “obsession” with checksums by Helen Hockx-Yu
The “obsession” with checksums by Helen Hockx-Yu
 
Reactive Systems with Data Distribution Service (DDS)
Reactive Systems with Data Distribution Service (DDS)Reactive Systems with Data Distribution Service (DDS)
Reactive Systems with Data Distribution Service (DDS)
 
A Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-DuplicationA Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-Duplication
 
Improved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersImproved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providers
 
C017421624
C017421624C017421624
C017421624
 
Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data Processing
 
Paul Stokes (Jisc) - A provocation about preservation
Paul Stokes (Jisc) - A provocation about preservationPaul Stokes (Jisc) - A provocation about preservation
Paul Stokes (Jisc) - A provocation about preservation
 
Encryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloudEncryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloud
 
Cloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion DetectionCloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion Detection
 

Destaque

IBM's four key steps to security and privacy for big data
IBM's four key steps to security and privacy for big dataIBM's four key steps to security and privacy for big data
IBM's four key steps to security and privacy for big dataIBM Analytics
 
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014kevintsmith
 
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...Trivadis
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data miningharithavijay94
 
Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and ReuseCyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and ReuseCybera Inc.
 
走出IT人才荒 研討會
走出IT人才荒 研討會走出IT人才荒 研討會
走出IT人才荒 研討會Charles Mok
 
Data Privacy &amp; Security Update 2012
Data Privacy &amp; Security Update 2012Data Privacy &amp; Security Update 2012
Data Privacy &amp; Security Update 2012Jason Haislmaier
 
Privacy and Big Data Overload!
Privacy and Big Data Overload!Privacy and Big Data Overload!
Privacy and Big Data Overload!SparkPost
 
Privacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePrivacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposureredpel dot com
 
The Impact of Cloud: Cloud Computing Security and Privacy
The Impact of Cloud: Cloud Computing Security and PrivacyThe Impact of Cloud: Cloud Computing Security and Privacy
The Impact of Cloud: Cloud Computing Security and PrivacyCharles Mok
 
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...Data Con LA
 
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Kato Mivule
 
Information Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data MiningInformation Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data Miningwanani181
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Peter Wood
 
Paper presentation held at national seminar
Paper presentation held at national seminarPaper presentation held at national seminar
Paper presentation held at national seminarKrishna Kumar
 
Conference Powerpoint Presentations
Conference Powerpoint PresentationsConference Powerpoint Presentations
Conference Powerpoint Presentationsapdh1312
 
The Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud ComputingThe Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud ComputingAnkit Singh
 

Destaque (20)

IBM's four key steps to security and privacy for big data
IBM's four key steps to security and privacy for big dataIBM's four key steps to security and privacy for big data
IBM's four key steps to security and privacy for big data
 
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
 
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Time Of Courage
Time Of CourageTime Of Courage
Time Of Courage
 
Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and ReuseCyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
 
走出IT人才荒 研討會
走出IT人才荒 研討會走出IT人才荒 研討會
走出IT人才荒 研討會
 
Data Privacy &amp; Security Update 2012
Data Privacy &amp; Security Update 2012Data Privacy &amp; Security Update 2012
Data Privacy &amp; Security Update 2012
 
Privacy and Big Data Overload!
Privacy and Big Data Overload!Privacy and Big Data Overload!
Privacy and Big Data Overload!
 
Privacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePrivacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposure
 
The Impact of Cloud: Cloud Computing Security and Privacy
The Impact of Cloud: Cloud Computing Security and PrivacyThe Impact of Cloud: Cloud Computing Security and Privacy
The Impact of Cloud: Cloud Computing Security and Privacy
 
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
 
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
 
Information Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data MiningInformation Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data Mining
 
Big data
Big dataBig data
Big data
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
Paper presentation held at national seminar
Paper presentation held at national seminarPaper presentation held at national seminar
Paper presentation held at national seminar
 
Big data security
Big data securityBig data security
Big data security
 
Conference Powerpoint Presentations
Conference Powerpoint PresentationsConference Powerpoint Presentations
Conference Powerpoint Presentations
 
The Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud ComputingThe Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud Computing
 

Semelhante a Global bigdata conf_01282013

Data Analytics Governance and Ethics
Data Analytics Governance and EthicsData Analytics Governance and Ethics
Data Analytics Governance and EthicsHPCC Systems
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it worldChris Dwan
 
The Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of dataThe Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of databcantrill
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeDenodo
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big DataRaffael Marty
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...i_scienceEU
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Toward a Mobile Data Commons
Toward a Mobile Data CommonsToward a Mobile Data Commons
Toward a Mobile Data CommonskingsBSD
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Dr. Anita Goel
 
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceLiberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceIsaac Christoffersen
 
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End UsersFrom Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End UsersDenodo
 
The How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementThe How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementTim Mackey
 
The How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementThe How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementBlack Duck by Synopsys
 
110307 cloud security requirements gourley
110307 cloud security requirements gourley110307 cloud security requirements gourley
110307 cloud security requirements gourleyGovCloud Network
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Denodo
 

Semelhante a Global bigdata conf_01282013 (20)

Data Analytics Governance and Ethics
Data Analytics Governance and EthicsData Analytics Governance and Ethics
Data Analytics Governance and Ethics
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
The Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of dataThe Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of data
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big Data
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Toward a Mobile Data Commons
Toward a Mobile Data CommonsToward a Mobile Data Commons
Toward a Mobile Data Commons
 
Peer-to-peer Systems.ppt
Peer-to-peer Systems.pptPeer-to-peer Systems.ppt
Peer-to-peer Systems.ppt
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceLiberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
 
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End UsersFrom Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users
 
The How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementThe How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability Management
 
The How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementThe How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability Management
 
110307 cloud security requirements gourley
110307 cloud security requirements gourley110307 cloud security requirements gourley
110307 cloud security requirements gourley
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Big Data
Big Data Big Data
Big Data
 

Mais de HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...HPCC Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsHPCC Systems
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn HPCC Systems
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingHPCC Systems
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle ChangesHPCC Systems
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index HPCC Systems
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningHPCC Systems
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesHPCC Systems
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsHPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch HPCC Systems
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem HPCC Systems
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis ToolHPCC Systems
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony HPCC Systems
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterHPCC Systems
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...HPCC Systems
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...HPCC Systems
 

Mais de HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Último

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 

Último (20)

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 

Global bigdata conf_01282013

  • 1. Security and Privacy in a Big Data World Dr. Flavio Villanustre, CISSP, LexisNexis Risk Solutions VP of Information Security & Lead for the HPCC Systems open source initiative 28 January 2013
  • 2. But what is Big Data? • Gartner told us that it’s defined by its four dimensions: • Volume • Velocity • Variety • Complexity • Driven by the proliferation of social media, sensors, the Internet of Things and the such (a lot of the latter) • Became accessible thanks to open source distributed data intensive platforms (for example, the open source LexisNexis HPCC Systems platform and Hadoop) • Made popular by consumer oriented services such as recommendation systems and search engines 2
  • 3. Big Data platforms: key design principles • Distributed local store • Move algorithm to the data – exploit locality • Many data problems are embarrassingly parallel • Leverage massive resource aggregation: • Thousands of execution cores • Hundreds of disk controllers • Hundreds of network interfaces • Terabytes of memory and massive memory bandwidth • Storage is cheap • Moving data into the system takes time (hence keep the data around, if possible) • It’s fine (and encouraged) to perform iterative exploration • In the end: It’s just and all about the data 3
  • 4. A timeline of the main open source Big Data platforms LexisNexis designs the HPCC Systems Google’s platform to meet its MapReduce Paper First Hadoop These platforms gain own Big Data Needs. is Published. Summit mainstream adoption Late 90s/Early 2001 2004 2007 2008 2011 2012 2000s The first few systems are First Release of The LN HPCC Systems sold to companies and Hadoop (designed platform is officially organizations after Google’s Map released as an open Reduce ideas) source project 4
  • 5. Just when we thought that we knew data security… Big Data is not your dad’s data a. More data sources (beware! data + data > 2 * information) b. Boiling the ocean is at the reach of your hand c. Public clouds offer scalability but introduce risks d. Distributed data stores can blur boundaries e. Leveraging diverse skills implies more people accessing the data Old tricks may not work a. Tokenization only goes so far b. Encryption at rest just protects against misplaced hardware c. Tracking data provenance is hard d. Enforcing data access controls can quickly get unwieldy e. Conveying policy information across multiple systems is hard 5
  • 6. The ever present challenges • Security • Keeping the bad guys out • Making sure the good guys are good and stay good • Preventing mistakes • Disposing of unnecessary/expired data • Enforcing “least privilege” and “need to know” basis • Privacy • Statistics safer than aggregates • Aggregates safer than tokenized samples • Tokenized samples safer than individuals • Don’t underestimate the power of de-anonymization • Mistakes in privacy are irreversible • Security <> Privacy 6
  • 7. Be wary of “tokenization in a box” • Tokenized dataset * Tokenized dataset ~= identifiable data • The problem of eliminating inference is NP-complete • Several examples in the last decade: • The “Netflix case” • The “Hospital discharge data case” • The “AOL case” 7
  • 8. Common sense to the rescue • Track data provenance, permissible use and expiration through metadata (data labels and RCM) • Enforce fine granular access controls through code/data encapsulation (data access wrappers) • Utilize homogeneous platforms that allow for end-to-end policy preservation and enforcement • Deploy (and properly configure) Network and host based Data Loss Prevention • A comprehensive data governance process is king • Use proven controls (Homomorphic encryption and PIR are, so far, only theoretical concepts) 8
  • 9. And always keep in mind that… • Access to the hardware ~= Access to the data • Encryption at rest only mitigates the risk for the isolated hard drive, but NOT if the decryption key goes with it • Compromise of a running system is NOT mitigated by encryption of data at rest • Virtualization may increase efficiency, but… • In virtual environments, s/he who has access to your VMM/Hypervisor, also has access to your data • Cross-VM side-channel attacks are not just theoretical 9
  • 10. Remember • Data exhibits its own form of quantum entanglement: once a copy is compromised, all other copies instantly are • The closer to the source the [filtering, grouping, tokenization] is applied, the lower the risk • YCLWYDH! You can’t lose what you don’t have (data destruction) 10
  • 11. Useful Links  LexisNexis Risk Solutions: http://lexisnexis.com/risk  LexisNexis Open Source HPCC Systems Platform: http://hpccsystems.com  Cross-VM side-channel attacks: http://blog.cryptographyengineering.com/2012/10/attack-of-week-cross- vm-timing-attacks.html  K-Anonymity: a model for protecting privacy: http://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity.p df  Robust de-anonymization of Large Sparse Datasets (the Netflix case): http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf  Broken Promises of Privacy: Responding to the surprising failure of anonymization: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006  Tamper detection and Relationship Context Metadata: http://blogs.gartner.com/ian-glazer/2011/08/19/follow-up-from-catalyst- 2011-tamper-detection-and-relationship-context-metadata/ The HPCC Systems Platform 11
  • 12. Questions? Email: info@hpccsystems.com The HPCC Systems Platform 12