2. +
About This Talk…
• Some practical tips & design patterns for building secure applications using HBase and Accumulo
• A quick demo (fingers crossed!)
• Audience: technical
3. +
Who Invited This Guy?
• Hi, I am Sujee Maniyam
• Founder / Principal @ Elephant Scale
  Consulting & training in Big Data, NoSQL
• Co-author of an open source Hadoop book: http://hadoopilluminated.com
• Founder / organizer of the ‘Big Data Guru’ meetup: http://www.meetup.com/BigDataGurus/
• Open source: http://github.com/sujee
• http://sujee.net | http://www.linkedin.com/in/sujeemaniyam
5. +
HBase : Quick Intro
• Modeled after Google Bigtable
• Distributed NoSQL store built on Hadoop / HDFS
• Apache project
• http://hbase.apache.org/
[Diagram: HBase running on top of HDFS]
6. +
Accumulo : Quick Intro
• Developed by the National Security Agency (NSA)!
• Google Bigtable implementation
• NoSQL store on top of HDFS
• Security is a first-class concept
[Diagram: Accumulo running on top of HDFS]
7. +
HBase & Accumulo
• Both are Bigtable implementations
• Both are built on HDFS
• Both are written in Java
• Both are Apache open source projects
[Diagram: HBase and Accumulo side by side on top of HDFS]
9. +
But the Security Picture Has Improved Rapidly…
• A lot of work is going on in the ecosystem
• Hadoop vendors (Cloudera, Hortonworks, ...) have been very actively working on security features
• ‘The core’ features are in
• Ease of use is improving as well
11. +
What Does It Mean to be ‘Secure’?
• 1) Control who can get in
• 2) Verify the person’s identity
• 3) Safeguard communications with the user
• 4) Decide what this user is allowed to do
• 5) And finally…
  • Protect data at rest
12. +
1) Who can get in
• Control which machines can connect to the NoSQL cluster
• Don’t expose the cluster to the public
  • Too many open ports
  • Too vulnerable
• Solutions:
  • Run the cluster behind a firewall
  • Restrict which machines can connect to the cluster
  • Linux / network-level security
    • Configured outside the NoSQL store itself
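The "restrict which machines can connect" rule can be sketched as a simple allowlist check. This is only an illustration of the idea: in practice the rule lives in a firewall or network configuration, not in application code, and the network ranges and the function name `may_connect` below are hypothetical.

```python
import ipaddress

# Hypothetical allowlist: an application subnet and a single admin host.
# Real deployments would express this as firewall / network rules.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("10.0.9.15/32"),
]

def may_connect(client_ip: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Anything outside the listed networks, such as a public address, is rejected before it ever reaches the cluster.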
14. +
2) User Authentication
• Wolf: Knock… knock…
• Pig: Who is there?
• Wolf: It is me… little pig
• How can we verify the user?
  • Username / password (Gmail)
  • Or use a trusted third party (referee)
    • Kerberos
Source : http://1.bp.blogspot.com/
15. +
Kerberos : Quick Primer
• Kerberos is an authentication protocol for networked machines
• Authenticates the client to the server and vice versa
• Strong crypto algorithms (AES, 3DES…)
17. +
Kerberos Protocol Explained: Getting Beer @ Fair / Party
• Prove your age (identity) to the wristband issuer
  • Ticket Granting Ticket
• Get a wristband → qualifies you to get beer
  • Service Ticket
• Go to the bartender and ask for beer using your wristband
  • Service Request
• Get beer! :)
• For a technically correct explanation see: http://www.roguelynn.com/words/explain-like-im-5-kerberos/
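The three steps of the wristband analogy can be sketched as a toy model. This is not real Kerberos: it uses HMAC-signed tokens in place of encrypted tickets, sends the password directly, and leaves out session keys and timestamps. All names (`USERS`, `TGS_KEY`, `SERVICE_KEY`, `issue_tgt`, `issue_service_ticket`, `serve`) are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy secrets. TGS_KEY is known only to the ticket issuer;
# SERVICE_KEY is shared by the issuer and the service (the "bartender").
TGS_KEY = secrets.token_bytes(32)
SERVICE_KEY = secrets.token_bytes(32)
USERS = {"alice": "correct horse"}  # hypothetical user database

def _sign(key: bytes, message: str) -> str:
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()

def issue_tgt(user: str, password: str):
    """Step 1: prove your identity, receive a ticket-granting ticket."""
    if USERS.get(user) != password:
        raise PermissionError("unknown user or wrong password")
    return (user, _sign(TGS_KEY, "tgt:" + user))

def issue_service_ticket(tgt, service: str):
    """Step 2: trade a valid TGT for a ticket to one service (the wristband)."""
    user, sig = tgt
    if not hmac.compare_digest(sig, _sign(TGS_KEY, "tgt:" + user)):
        raise PermissionError("invalid TGT")
    return (user, service, _sign(SERVICE_KEY, f"st:{user}:{service}"))

def serve(ticket, service: str = "hbase") -> str:
    """Step 3: the service checks the ticket itself, without asking for a password."""
    user, svc, sig = ticket
    expected = _sign(SERVICE_KEY, f"st:{user}:{svc}")
    if svc != service or not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid service ticket")
    return f"hello {user}"
```

The point of the design is that the service never sees the user's password; it only trusts tickets signed by the issuer.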
19. +
3) Secure Client Communication
• Guard client / server communication (‘on the wire’)
• Done using SASL or SSL (certificates)
• Prevents snooping by third parties
                               HBase   Accumulo
Secure client communications   Yes     Yes
20. +
4) What Is Allowed For This User?
• In an unsecured environment, users can read / write any table
  • → not very secure!
• Control which data users can see
21. +
Quick Primer on HBase Storage
• Tables have many rows
• Each row has multiple columns (or qualifiers)
• Columns are grouped into column families
• Each cell also has a timestamp (not shown here)
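[Diagram: a customer table with columns Customer_id, name, email, phone, Last 4 social and Full SSN, grouped into two column families, ‘info’ (Family1) and ‘secure’ (Family2); one cell is highlighted]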
22. +
HBase Allows Access Control At Family Level
[Diagram: the same customer table with column families ‘info’ and ‘secure’. First-level CSRs can only access the ‘info’ family; only supervisors can access the ‘secure’ family]
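Family-level access control can be pictured as a per-family reader list that is checked on every read. The sketch below is a plain-Python illustration, not HBase API code. Which column sits in which family, the role names, the sample values, and the assumption that supervisors can also read ‘info’ are all hypothetical.

```python
# Hypothetical row layout: family -> {column: value}.
ROW = {
    "info": {"name": "Joe", "email": "joe@example.com", "phone": "555-0100"},
    "secure": {"last4_ssn": "6789", "full_ssn": "123-45-6789"},
}

# Hypothetical permissions: which roles may read each family.
FAMILY_READERS = {
    "info": {"csr", "supervisor"},
    "secure": {"supervisor"},
}

def read_row(row, role):
    """Return only the families the role is allowed to read."""
    return {
        family: dict(columns)
        for family, columns in row.items()
        if role in FAMILY_READERS.get(family, set())
    }
```

Note that the decision is all-or-nothing per family: a role either sees every column in a family or none of them.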
23. +
Need More Fine Grained Access
• We would like to provide ‘cell-level’ access controls
  • Greater flexibility in application development
  • More fine-grained access controls
• Meet Accumulo’s data model
24. +
Accumulo Data Model
Family: info
Columns →             name      email     Last 4 SSN   Full SSN                   Gmail password
Visibility tokens →   Level 1   Level 1   Level 1      Level 2 OR Top clearance   Top clearance
• Everything in the HBase data model
• Plus each cell has a ‘visibility token’
25. +
Users Are Assigned ‘Visibility Tokens’
User             Visibility levels
User 1           Level 1
User 2           Level 1 + Level 2
Edward Snowden   Level 1 + Level 2 + Top Clearance
26. +
Accumulo only returns cells visible to user
Columns →             name      email           Last 4 SSN   Full SSN                   Gmail password
person1               Joe       joe@gmail.com   6789         123-45-6789                JoeSuperMan!
Visibility tokens →   Level 1   Level 1         Level 1      Level 2 OR Top clearance   Top clearance
27. +
What Users Can See…
User             Visibility privilege                Visible cells
User 1           Level 1                             Name, Email, Last 4 SSN
User 2           Level 1 + Level 2                   Name, Email, Last 4 SSN, Full SSN
Edward Snowden   Level 1 + Level 2 + Top Clearance   Name, Email, Last 4 SSN, Full SSN, Gmail Password
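The filtering in the table above can be reproduced with a small plain-Python sketch. It does not use the Accumulo client API. Visibility is simplified to a list of alternative tokens (the "OR" case); real Accumulo visibility expressions are richer. The token spellings (`level1`, `level2`, `top`), column names and the function `visible_cells` are hypothetical.

```python
# One row: column -> (value, visibility), where visibility is a list of
# tokens and holding ANY one of them makes the cell visible.
ROW = {
    "name": ("Joe", ["level1"]),
    "email": ("joe@gmail.com", ["level1"]),
    "last4_ssn": ("6789", ["level1"]),
    "full_ssn": ("123-45-6789", ["level2", "top"]),  # Level 2 OR Top clearance
    "gmail_password": ("JoeSuperMan!", ["top"]),
}

# Tokens assigned to each user, as in the table above.
AUTHS = {
    "user1": {"level1"},
    "user2": {"level1", "level2"},
    "snowden": {"level1", "level2", "top"},
}

def visible_cells(row, auths):
    """Return only the cells whose visibility matches one of the user's tokens."""
    return {
        column: value
        for column, (value, labels) in row.items()
        if auths & set(labels)
    }
```

Calling `visible_cells(ROW, AUTHS["user2"])` returns name, email, last 4 SSN and full SSN, but not the Gmail password, which matches the "User 2" row of the table.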
28. +
Good News For HBase
• With release 0.98, HBase also allows cell-level access controls
• Called ‘tags’
• Requires upgrading to the HFile v3 (version 3) format
29. +
Visibility / Access Controls
• Both HBase and Accumulo allow access control for the data
                        HBase                       Accumulo
Cell-level visibility   Yes (starting with v0.98)   Yes
30. +
5) Final Step : Encrypt Data At Rest
• Eventually data ends up on disk
• We need to protect the ‘raw data’ on disk
• To prevent:
  • Users going to the disk directly
  • Theft of hardware
31. +
Solution: Encrypt Data Transparently
• Encryption is done via keys
• Uses Java Cryptography Extension (JCE)
• Data is encrypted before writing to HDFS
• Does not rely on HDFS or Linux-level encryption
• Per-family encryption is supported
                     HBase   Accumulo
Encryption at rest   Yes     Yes
33. +
Encryption : Key Management
• The keys have to be managed carefully…
  • Don’t lose them!
  • Don’t compromise them!!
• Possible storage mechanisms
  • Database
  • Remote file server
  • Key management server
  • Local file system
34. +
Summary
                                         HBase                         Accumulo
Runs in a trusted environment            Yes (outside configuration)   Yes (outside configuration)
User authentication                      Kerberos                      Kerberos + built-in
Secure client communications (via SSL)   Yes                           Yes
Visibility at cell level                 Yes (starting from v0.98)     Yes
Encrypt data at rest                     Yes                           Yes
37. +
Demo Explained
Demonstrates the cell-level visibility feature of Accumulo.
Here is what the data looks like:
                   Name        Email           SSN           Gmail_password
Person1            Joe Smith   joe@gmail.com   123-45-6789   ‘JoeDaMan!’
Visibility level   Level 1     Level 1         Level 2       Top