Securing Hadoop
Keys Botzum, Senior Principal Technologist, MapR
© 2014 MapR Technologies
Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core
MapR Distribution for Hadoop
• Management
• Apache Hadoop and OSS ecosystem
– Hue, Drill, Impala, Shark, Hive/Stinger/Tez, Sqoop, Storm, Sentry, Spark, Solr, Cascading, Mahout, Flume, Oozie, HBase, MapReduce, YARN, Pig, Whirr, Zookeeper, ...
• MapR Data Platform (patent pending)
– MapR-FS (files)
– MapR-DB (tables)
• Enterprise-grade
– High availability
– Data protection
– Disaster recovery
• Inter-operability
– Standard file access
– Standard database access
– Pluggable services
– Broad developer support
• Security
– Enterprise security authorization
– Wire-level authentication
– Data governance
• Operational
– Ability to support predictive analytics, real-time database operations, and high arrival rate data
• Multi-tenancy
– Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
• Performance
– Ability to deliver 2X to 7X performance
– Consistent low latency
The Cloud Leaders Pick MapR
• Google chose MapR to provide Hadoop on Google Compute Engine
• Amazon EMR is the largest Hadoop provider in revenue and # of clusters
Why Secure Hadoop Now?
• Historically security wasn’t a high priority
– Reflection of the type of data and the type of organizations using Hadoop
• Hadoop is now being used by more traditional firms as well as organizations with high security requirements
– Highly regulated
– Sensitive data sets
– People with experience with security in existing enterprise technologies (e.g., databases) are asking for the same in Hadoop
• Consider the value of the data in a Hadoop cluster used as a data lake
– A great deal of valuable operational data about your customers, systems, sales, etc.
Typical Hadoop Deployment Weaknesses
• Client operating system is trusted to identify user (weak
authentication)
– If I can compromise client, I can run jobs or access HDFS as anyone
– Think about virtual machines with root access
• Hadoop servers trust anyone that can reach them on the network
– Could I falsify a data node, job tracker, etc.?
• Hive Server runs as ‘system’ user
– All Hive Server submitted jobs run as that ‘system’ user
• Intruders can see and modify all network traffic
Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core
MapR 3.1: Securing MapR Hadoop
• Core goals
– Authenticate network traffic
• Users authenticate
• Servers authenticate to each other
– Encrypt network traffic
– Authorization
• Integrate with existing authorization functionality
• Enhance MapR Tables authorization with fine-grained controls
– Low barrier to entry
• Low performance overhead
• Simple and easy to administer
• Support, but do not require, Kerberos
– Leverage Apache Hadoop functionality
MapR Native Security
• Hadoop security without Kerberos
– But borrows heavily from Kerberos design
• Kerberos integration if desired
Architecture
• Shared secrets like Kerberos
– Managed at cluster level
– Two shared keys: cldb key and server key
• Identity is represented using a ticket, which is issued by the MapR CLDB (Container Location DataBase) servers
Tickets
• A ticket represents a valid authenticated identity
• Contains
– An expiration time, renewal lifetime, and creation time
– A randomly generated secret key
– Information about the identity – userid, group ids
• Signed and encrypted when issued by CLDB
– CLDB key used for ‘permanent’ server tickets
– Server key used for ephemeral tickets issued for users
• A client authenticates to trusted servers using the ticket
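The ticket contents listed above can be pictured as a small record. The following is a minimal sketch, not MapR's actual ticket format: the class name, field names, and the Kerberos-style reading of "renewal lifetime" as a hard upper bound on renewals are all assumptions made for illustration.

```python
import dataclasses
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ticket:
    """Hypothetical in-memory view of a ticket (field names are illustrative)."""
    uid: int
    gids: tuple
    created: float       # creation time, seconds since the epoch
    expires: float       # expiration time
    renew_until: float   # assumed meaning of "renewal lifetime": latest allowed expiry
    # Randomly generated secret key (32 bytes = 256 bits), hidden from repr().
    secret_key: bytes = field(
        default_factory=lambda: secrets.token_bytes(32), repr=False
    )

    def is_valid(self, now: float) -> bool:
        # A ticket is usable from its creation time until it expires.
        return self.created <= now < self.expires

    def renew(self, now: float, lifetime: float) -> "Ticket":
        # Assumption: only a still-valid ticket can be renewed, and never
        # beyond renew_until.
        if not self.is_valid(now):
            raise ValueError("ticket is not valid and cannot be renewed")
        new_expiry = min(now + lifetime, self.renew_until)
        return dataclasses.replace(self, expires=new_expiry)
```

In the real system the ticket is additionally signed and encrypted by the CLDB; this sketch only models the fields and their lifetime checks.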
User Experience
• User invokes maprlogin
– maprlogin connects to CLDB (over https)
• Provide userid & password (or Kerberos ticket) for validation by CLDB
– Ticket is returned and saved in a file under /tmp that is accessible only by the owning user; the file name is /tmp/maprticket_<uid>
• MapR PAM module
– Optional MapR provided PAM module creates MapR tickets automatically
during Unix login
• All processes automatically pick up ticket (nothing to do)
– Java and C/C++ clients implicitly look for valid ticket and use it
– Clients optionally use existing Kerberos identity to get MapR ticket
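How a client might locate its ticket can be sketched as follows. This is an illustration only: the function names are hypothetical, the default path /tmp/maprticket_<uid> and the MAPR_TICKET_LOCATION variable come from these slides, and the rule that the environment variable takes precedence over the default is an assumption.

```python
import os
import stat


def ticket_path(uid=None, environ=None):
    """Return the path where a client would look for its ticket (sketch)."""
    env = os.environ if environ is None else environ
    # Assumption: an explicit MAPR_TICKET_LOCATION overrides the default path.
    override = env.get("MAPR_TICKET_LOCATION")
    if override:
        return override
    if uid is None:
        uid = os.getuid()
    return f"/tmp/maprticket_{uid}"


def ticket_is_private(path):
    """True if path is a regular file with no group/other permission bits."""
    st = os.stat(path)
    return stat.S_ISREG(st.st_mode) and (st.st_mode & 0o077) == 0
```

The permission check mirrors the statement that the ticket file is accessible only by the owning user; a client could refuse to use a ticket file that fails it.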
Maprlogin
• Primary user visible security tool
• Actions are
– password - authenticate to a MapR cluster using a valid password
– kerberos - authenticate to a MapR cluster using Kerberos
– print - print information on your existing credentials
– authtest - test authentication as a generic client
– end / logout - logout of cluster
– renew - renew existing ticket
• User information is obtained using PAM and Linux pwent APIs
– Fully pluggable
– MapR can authenticate using any registry that is PAM enabled and gets user
information via Unix APIs which are NSSwitch controlled
• Basically, if it works with Linux authentication, it should work with MapR
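The user-information lookup described above can be illustrated with Python's standard wrappers around the same Unix APIs. This is a sketch of the general mechanism, not MapR code: pwd.getpwnam and os.getgrouplist call the C library's passwd and group lookups, which are resolved through whatever sources nsswitch.conf configures (local files, LDAP, NIS, and so on). The function name lookup_identity is hypothetical.

```python
import os
import pwd


def lookup_identity(username):
    """Resolve a user name to uid and group ids via the NSS-backed Unix APIs."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user is unknown
    # getgrouplist returns the primary group plus all supplementary groups.
    gids = os.getgrouplist(username, entry.pw_gid)
    return {
        "user": entry.pw_name,
        "uid": entry.pw_uid,
        "gids": sorted(set(gids)),
    }
```

Because the lookup goes through the operating system, adding a new user registry is a matter of system configuration rather than application changes, which is the point the slide makes.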
CLI Example
$ hadoop fs -ls /
Bad connection to FS. command aborted. exception: failure to login: Unable to obtain MapR credentials
$ maprlogin password
[Password for user 'fred' at cluster 'my.cluster.com': ]
MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'
$ hadoop fs -ls /
Found 3 items
-rwxr-xr-x 3 mapr mapr 0 2013-12-10 13:25 /hbase
drwxr-xr-x - mapr mapr 1 2013-12-10 13:25 /user
drwxr-xr-x - mapr mapr 1 2013-12-10 13:25 /var
Maprlogin – Under the Covers
1. maprlogin sends username/passwd to the MapR CLDB on https
2. CLDB uses PAM to authenticate (LDAP/Kerberos/NIS)
3. Ticket + user key returned to maprlogin
4. Ticket + key saved in file in /tmp
5. cmd (hadoop fs -ls /) picks up ticket + key from file
6. Client sends the ticket plus the RPC encrypted with the user key to FileServer/CLDB
7. Server decrypts ticket to authenticate user and checks permissions on ACL
Client First Contact
• Client sends the ticket and data encrypted using secret key
• Receiving server
– Extracts and decrypts ticket to obtain secret key
– Checks expiration
– Uses the secret key to decrypt the data
• This proves that the client possesses the key that corresponds to the ticket
– Extracts identity information from ticket and uses that for authorization
– Returns encrypted response to client
• MapR user identity is independent of host or operating system
identity
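The first-contact exchange above can be sketched end to end with the Python standard library. This is a toy model under loud assumptions: MapR uses AES-256 in GCM mode, which the standard library does not provide, so the seal/unseal helpers below substitute an encrypt-then-MAC construction built from HMAC-SHA256 purely to stay self-contained. The function names, the JSON ticket layout, and the error types are all hypothetical, and none of this should be used for real security.

```python
import hashlib
import hmac
import json
import secrets


def _subkey(key, label):
    # Derive separate encryption and MAC keys from one shared key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key, nonce, length):
    # HMAC-SHA256 in counter mode: a stdlib-only stand-in for a real cipher.
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def seal(key, plaintext):
    """Encrypt-then-MAC; returns nonce || ciphertext || tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(key, blob):
    """Verify the tag, then decrypt; raises ValueError on any mismatch."""
    if len(blob) < 48:
        raise ValueError("message too short")
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def issue_ticket(server_key, uid, gids, expires):
    """Issuer side: seal identity plus a fresh secret key under the server key."""
    session_key = secrets.token_bytes(32)
    body = json.dumps(
        {"uid": uid, "gids": gids, "expires": expires, "key": session_key.hex()}
    ).encode()
    return seal(server_key, body), session_key


def server_receive(server_key, ticket, sealed_request, now):
    """Receiving server: decrypt ticket, check expiry, decrypt data with its key."""
    fields = json.loads(unseal(server_key, ticket))
    if now >= fields["expires"]:
        raise PermissionError("ticket expired")
    # Successful decryption proves the client holds the key inside the ticket.
    request = unseal(bytes.fromhex(fields["key"]), sealed_request)
    identity = {"uid": fields["uid"], "gids": fields["gids"]}
    return identity, request
```

A client in this model would call seal(session_key, request_bytes) and send the result together with the ticket. The server never needs a per-user database lookup: everything it needs to authenticate and authorize the request travels inside the ticket, which only holders of the server key can open.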
Server First Contact
• When a trusted server starts it uses a local server ticket to
authenticate to the CLDB
– CLDB verifies the ticket’s authenticity using secret key
– CLDB returns the server key that is used to create and validate user
tickets
– The server is now a trusted member of the cluster
Component Security
• Security between MapR unique components (CLDB, file server, etc.)
is handled via changes to the MapR RPC layer
• Apache components support pluggable security mechanisms –
typically SASL
– We are providing a new mechanism called ‘maprsasl’
– maprsasl secures communication following the same techniques as the MapR
RPC layer
• Existing authorization code simply leverages the securely
authenticated identity
– File access
– Job submission
– Queue ACLs
– And so on …
Example: Job Tracker Integration
• Components: JobClient (JC), JobTracker (JT), TaskTracker (TT), file system
• JobClient submits job to JobTracker (maprsasl); JobTracker schedules job on TaskTracker (maprsasl)
1. JC copies job conf securely to FS
2. JT creates user ticket
3. TT fetches ticket
4. TT launches job using ticket identity
• JT can create user tickets. TT copies ticket to private job directory on local disk. taskcontroller copies it to user private local disk dir and tasks set MAPR_TICKET_LOCATION to that place.
Out of the Box Defaults
• User experience
– Users authenticate using maprlogin and passwords
– User ‘mapr’ is admin as always
• User must authenticate; OS identity is irrelevant
– Operating system identity (on or off cluster) is no longer relevant to MapR security
• Obviously root user and ‘mapr’ user can read/write /opt/mapr
• We’ve also tightened permissions for many directories under /opt/mapr
– Web UIs require authentication
– MapR CLIs require authentication
• hadoop fs/mfs/jar/job/etc
• maprcli
– Any user can submit jobs, but can only admin their own jobs
Out of the Box Defaults
• Cluster operations
– All MapR servers authenticate to each other
• Most communication paths encrypted
– All nodes share common maprserverticket
• Nodes can only join cluster if they have maprserverticket
– Self-signed wildcard certificates created for HTTPS traffic
• ssl_keystore contains certificate and private key, ssl_truststore contains certificate
– We set JVM system property: javax.net.ssl.trustStore
• Used by Web UIs, MCS, and maprlogin to CLDB
• Uses the hostname command to get the DNS domain for the cluster and puts that into the certificate
Cryptography
• Encrypted using current NIST standards
– AES-256 in GCM mode for encryption and signing
• http://en.wikipedia.org/wiki/Galois/Counter_Mode
• NIST standard - http://csrc.nist.gov/publications/fips/fips140-2/fips1402annexa.pdf
– Leverage Intel hardware encryption where available, software otherwise
• Use the open source crypto++ library for our C++ cryptography –
http://cryptopp.com
• Random number generation
– Use secure random number generation as documented here: http://www.cryptopp.com/docs/ref/class_auto_seeded_random_pool.html#_details
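As a small illustration of the random number generation point, the sketch below draws a 256-bit key from the operating system's cryptographically secure generator. It uses Python's secrets module as a rough analogue of the crypto++ AutoSeededRandomPool mentioned above; it is not MapR's code, and the function name new_key is hypothetical.

```python
import secrets


def new_key(bits=256):
    """Return a fresh random key of the given size from the OS CSPRNG."""
    if bits <= 0 or bits % 8 != 0:
        raise ValueError("key size must be a positive multiple of 8 bits")
    return secrets.token_bytes(bits // 8)
```

A key for AES-256 is 32 bytes, so new_key() with the default argument returns 32 bytes. General-purpose generators such as Python's random module are not suitable for this purpose.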
Let’s Build a Secure Cluster!
Node 1
apt-get install mapr….
configure.sh -C … -Z … -secure -genkeys
– Generates all needed keys for MapR-RPC as well as for HTTPS
Node N
apt-get install mapr….
scp rootORmapr@node1:/opt/mapr/conf/{cldb.key,maprserverticket,ssl_keystore,ssl_truststore} /opt/mapr/conf
configure.sh -C … -Z … -secure
Clients
apt-get install mapr…
scp anyuser@nodeN:/opt/mapr/conf/ssl_truststore /opt/mapr/conf
configure.sh … -secure
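The file distribution in the steps above (four files for every additional cluster node, only the truststore for clients) can be turned into a simple pre-flight check. This is a hedged sketch: the file names and /opt/mapr/conf come from the slide, while the function name, the role labels "server" and "client", and the idea of checking before running configure.sh are illustrative assumptions.

```python
import os

# Files each role needs in the conf directory before running configure.sh -secure,
# as listed on the slide. The role names are labels chosen for this sketch.
REQUIRED_FILES = {
    "server": ("cldb.key", "maprserverticket", "ssl_keystore", "ssl_truststore"),
    "client": ("ssl_truststore",),
}


def missing_security_files(role, conf_dir="/opt/mapr/conf"):
    """Return the required files for `role` that are absent from conf_dir."""
    return [
        name
        for name in REQUIRED_FILES[role]
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]
```

An empty result means the node has everything it was supposed to receive via scp. Note that clients never receive cldb.key or maprserverticket, which is what keeps them unprivileged.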
Kerberos
• Not required, but can be used
• Kerberos SSO
– Explicitly using ‘maprlogin kerberos’
– Implicitly
• If no MapR ticket is available, the client automatically detects an existing Kerberos ticket and uses it to obtain a MapR ticket
• Kerberos SSO requires only
– Kerberos client on CLDB and client machines
– Kerberos identity only for CLDB – typically 3-5 CLDBs
• No need to manage identities for every node
Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core
Hadoop Map Reduce Clients
• Many components simply generate Map Reduce jobs. As such
they implicitly leverage the security we’ve defined for Map
Reduce previously. They are:
– Hive (except Hive Server)
– Pig
– Mahout
– Sqoop
Ecosystem Security
• All ecosystem components run securely as well in a secure
MapR cluster
– Some by default
– Some with minor configuration
• Most Web UIs enhanced to use userid & password
authentication and HTTPS
– Can configure Kerberos SPNEGO, same as from Apache
MapR Ecosystem Security – by Default
• By default, out of the box when security enabled
– Hive Server 2 supports password authentication
• Can configure Kerberos and SSL function, same as from Apache, including
secure impersonation
– Oozie supports MapR ticket authentication
• Can configure Kerberos and SSL function, same as from Apache, including
secure impersonation
• HBase and Hive MetaServer require Kerberos to be secured
• MapR Tables (HBase APIs) use native MapR security, no
configuration needed
MapR Tables Authorization
• Boolean logic constraints on access to M7 tables
– Uses user & group information
– Very powerful
• ( u:bob | g:admins)
• ( g:managers & ! g:restricted)
• ( g:managers & g:businessunity) | g:executives
– Settable at table, column, and column family level for various actions
– Queries silently hide data you are not authorized to see
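To show how such access expressions could be evaluated, here is a minimal recursive-descent sketch. It is not MapR's parser: the grammar is inferred from the three examples above, and the operator precedence (! binds tighter than &, which binds tighter than |) and the function names are assumptions.

```python
import re

# A token is either a principal (u:name or g:name) or one of ( ) & | !
_TOKEN = re.compile(r"\s*(?:([ug]):\s*([\w.-]+)|([()&|!]))")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        if match.group(3):
            tokens.append(match.group(3))
        else:
            tokens.append((match.group(1), match.group(2)))
        pos = match.end()
    return tokens


def is_allowed(expr, user, groups):
    """Evaluate an access expression for one user and that user's groups."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() == "|":
            advance()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        advance()
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            advance()
            return value
        if isinstance(tok, tuple):
            kind, name = tok
            return name == user if kind == "u" else name in groups
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

For example, is_allowed("( g:managers & ! g:restricted)", "carol", {"managers", "restricted"}) evaluates to False, because membership in the restricted group vetoes the access granted by the managers group.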
MapR Hadoop Advantage
• Vastly simpler
– Core secured by default in one step
– No requirement for Kerberos in core and associated complexity
• Easier integration
– Leverage existing Linux authentication (PAM and NSSwitch)
• Faster
– Leverage Intel AES hardware cryptography
Further Reading
• MapR
– http://mapr.com
• MapR Native Security
– http://www.mapr.com/blog/getting-started-mapr-security-0
– http://www.mapr.com/press-release/mapr-technologies-integrates-security-into-hadoop
– http://www.mapr.com/products/only-with-mapr/mapr-integrates-security-into-hadoop
• Adding Security to Apache Hadoop
– http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf
• The Evolution of Hadoop’s Security Model
– http://www.infoq.com/articles/HadoopSecurityModel/
Q&A
Engage with us!
@mapr · maprtech · mapr-technologies
kbotzum@mapr.com
Appendix
Encrypted Shuffle (?)
• No need to special case encrypting shuffle
• MapR-FS is store for Map output
– Shuffle inherits the same encryption, authentication, and authorization
functionality of the rest of MapR-FS
Persistent Keys and Tickets
[Diagram: persistent keys (K) and tickets on Node 1, Node 2, …, Node N, including the CLDB/Z nodes]
Apache Hadoop Security
• Kerberos as core authentication technology
– Kerberos to access HDFS, JT, Oozie, etc.
– Kerberos for server to server traffic
• But Kerberos doesn’t fit perfectly with Hadoop model
– Introduce delegation tokens for carrying identity in many scenarios
• Kerberos is complicated
– Need Kerberos identity for every server in the cluster
• Lots to manage!
– Every user needs a Kerberos identity to access cluster, Web UIs, etc.
– Lots of steps
• http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.3.0/CDH4-Security-Guide/cdh4sg_topic_3.html
Key Design Elements
• User authentication and authorization information obtained using
standard operating system information – PAM and nsswitch
• MapR specific shared secret keys
– Easier to manage
– No dependencies on complex external security systems
– Better performance
• MapR servers (running as ‘mapr’) have access to maprserverticket
and are therefore privileged processes
• MapR-RPC altered to encrypt and authenticate traffic
• Maprsasl created for Apache Java code to leverage similar security
– Leverages same keys, authentication model, etc.
– Reuses the C/C++ code via JNI

Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Último

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Securing Hadoop by MapR's Senior Principal Technologist Keys Botzum

  • 1. © 2014 MapR Technologies 1
  • 2. © 2014 MapR Technologies 2 Agenda • What’s MapR • Why Secure Hadoop • Securing MapR Hadoop • Security beyond the core
  • 3. © 2014 MapR Technologies 3 MapR Distribution for Hadoop
        Apache Hadoop and OSS ecosystem: Hue, Shark, Impala, Drill, Hive/Stinger/Tez, Sqoop, Storm, Sentry, Spark, Solr, Cascading, Mahout, Flume, Oozie, HBase, MapReduce, YARN, Pig, Whirr, ZooKeeper, ...
        MapR Data Platform (patent pending): MapR-FS (files), MapR-DB (tables), Management
        Enterprise-grade: • High availability • Data protection • Disaster recovery
        Inter-operability: • Standard file access • Standard database access • Pluggable services • Broad developer support
        Security: • Enterprise security authorization • Wire-level authentication • Data governance
        Operational: • Ability to support predictive analytics, real-time database operations, and support high arrival rate data
        Multi-tenancy: • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
        Performance: • Ability to deliver 2X to 7X performance • Consistent low latency
  • 4. © 2014 MapR Technologies 4 The Cloud Leaders Pick MapR Google chose MapR to provide Hadoop on Google Compute Engine Amazon EMR is the largest Hadoop provider in revenue and # of clusters
  • 5. © 2014 MapR Technologies 5 Why Secure Hadoop Now? • Historically security wasn’t a high priority – Reflection of the type of data and the type of organizations using Hadoop • Hadoop is now being used by more traditional firms as well as organizations with high security requirements – Highly regulated – Sensitive data sets – People with experience with security in existing enterprise technologies (e.g., databases) are asking for the same in Hadoop • Think for a moment and imagine the value of the data in a Hadoop cluster used as a data lake – Much valuable operational data about your customers, systems, sales, etc.
  • 6. © 2014 MapR Technologies 6 Typical Hadoop Deployment Weaknesses • Client operating system is trusted to identify user (weak authentication) – If I can compromise client, I can run jobs or access HDFS as anyone – Think about virtual machines with root access • Hadoop servers trust anyone that can reach them on the network – Could I falsify a data node, job tracker, etc.? • Hive Server runs as ‘system’ user – All Hive Server submitted jobs run as that ‘system’ user • Intruders can see and modify all network traffic
  • 7. © 2014 MapR Technologies 7 Agenda • What’s MapR • Why Secure Hadoop • Securing MapR Hadoop • Security beyond the core
  • 8. © 2014 MapR Technologies 8 MapR 3.1: Securing MapR Hadoop • Core goals – Authenticate network traffic • Users authenticate • Servers authenticate to each other – Encrypt network traffic – Authorization • Integrate with existing authorization functionality • Enhance MapR Tables authorization with fine grained controls – Low barrier to entry • Low performance overhead • Simple and easy to administer • Support, but do not require Kerberos – Leverage Apache Hadoop functionality
  • 9. © 2014 MapR Technologies 9 MapR Native Security • Hadoop security without Kerberos – But borrows heavily from Kerberos design • Kerberos integration if desired
  • 10. © 2014 MapR Technologies 10 Architecture • Shared secrets like Kerberos – Managed at cluster level – Two shared keys: cldb key and server key • Identity represented using a ticket which is issued by MapR CLDB servers (Container Location DataBase)
  • 11. © 2014 MapR Technologies 11 Tickets • A ticket represents a valid authenticated identity • Contains – An expiration time, renewal lifetime, and creation time – A randomly generated secret key – Information about the identity – userid, group ids • Signed and encrypted when issued by CLDB – CLDB key used for ‘permanent’ server tickets – Server key used for ephemeral tickets issued for users • A client authenticates to trusted servers using the ticket
  • 12. © 2014 MapR Technologies 12 User Experience • User invokes maprlogin – maprlogin connects to CLDB (over https) • Provide userid & password (or Kerberos ticket) for validation by CLDB – Ticket is returned and saved in a file in /tmp that is accessible only by the owning user – file name is /tmp/maprticket_<uid> • MapR PAM module – Optional MapR-provided PAM module creates MapR tickets automatically during Unix login • All processes automatically pick up the ticket (nothing to do) – Java and C/C++ clients implicitly look for a valid ticket and use it – Clients optionally use an existing Kerberos identity to get a MapR ticket
  • 13. © 2014 MapR Technologies 13 Maprlogin • Primary user visible security tool • Actions are – password - authenticate to a MapR cluster using a valid password – kerberos - authenticate to a MapR cluster using Kerberos – print - print information on your existing credentials – authtest - test authentication as a generic client – end / logout - logout of cluster – renew - renew existing ticket • User information is obtained using PAM and Linux pwent APIs – Fully pluggable – MapR can authenticate using any registry that is PAM enabled and gets user information via Unix APIs which are NSSwitch controlled • Basically, if it works with Linux authentication, it should work with MapR
  • 14. © 2014 MapR Technologies 14 CLI Example
        $ hadoop fs -ls /
        Bad connection to FS. command aborted. exception: failure to login: Unable to obtain MapR credentials
        $ maprlogin password
        [Password for user 'fred' at cluster 'my.cluster.com': ]
        MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'
        $ hadoop fs -ls /
        Found 3 items
        -rwxr-xr-x 3 mapr mapr 0 2013-12-10 13:25 /hbase
        drwxr-xr-x - mapr mapr 1 2013-12-10 13:25 /user
        drwxr-xr-x - mapr mapr 1 2013-12-10 13:25 /var
  • 15. © 2014 MapR Technologies 15 Maprlogin – Under the Covers (flow between maprlogin, MapR CLDB, LDAP/Kerberos/NIS, and FileServer/CLDB) 1. username/passwd sent on https from maprlogin to MapR CLDB 2. CLDB uses PAM to authenticate against LDAP/Kerberos/NIS 3. ticket + user key returned 4. ticket + key saved in file in /tmp 5. cmd (hadoop fs -ls /) picks up ticket + key from file 6. client sends RPC encrypted with user-key + ticket to FileServer/CLDB 7. server decrypts ticket to authenticate user and checks permissions on ACL
  • 16. © 2014 MapR Technologies 16 Client First Contact • Client sends the ticket and data encrypted using secret key • Receiving server – Extracts and decrypts ticket to obtain secret key – Checks expiration – Uses the secret key to decrypt the data • This proves that the client possesses the key that corresponds to the ticket – Extracts identity information from ticket and uses that for authorization – Returns encrypted response to client • MapR user identity is independent of host or operating system identity
  • 17. © 2014 MapR Technologies 17 Server First Contact • When a trusted server starts it uses a local server ticket to authenticate to the CLDB – CLDB verifies the ticket’s authenticity using secret key – CLDB returns the server key that is used to create and validate user tickets – The server is now a trusted member of the cluster
  • 18. © 2014 MapR Technologies 18 Component Security • Security between MapR unique components (CLDB, file server, etc.) is handled via changes to the MapR RPC layer • Apache components support pluggable security mechanisms – typically SASL – We are providing a new mechanism called ‘maprsasl’ – maprsasl secures communication following the same techniques as the MapR RPC layer • Existing authorization code simply leverages the securely authenticated identity – File access – Job submission – Queue ACLs – And so on …
  • 19. © 2014 MapR Technologies 19 Example: Job Tracker Integration (flow between JobClient, JobTracker, TaskTracker, and file system; JobClient submits the job and JobTracker schedules it, both over maprsasl) 1. JC copies job conf securely to FS 2. JT creates user ticket 3. TT fetches ticket 4. TT launches job using ticket identity. JT can create user tickets. TT copies ticket to private job directory on local disk. taskcontroller copies it to user private local disk dir and tasks set MAPR_TICKET_LOCATION to that place.
  • 20. © 2014 MapR Technologies 20 Out of the Box Defaults • User experience – Users authenticate using maprlogin and passwords – User ‘mapr’ is admin as always • User must authenticate however, OS identity irrelevant – Operating system identity (on or off cluster) no longer relevant to MapR security • Obviously root user and ‘mapr’ user can read/write /opt/mapr • We’ve also tightened permissions for many directories under /opt/mapr – Web UIs require authentication – MapR CLIs require authentication • hadoop fs/mfs/jar/job/etc • maprcli – Any user can submit jobs, but can only admin their own jobs
  • 21. © 2014 MapR Technologies 21 Out of the Box Defaults • Cluster operations – All MapR servers authenticate to each other • Most communication paths encrypted – All nodes share common maprserverticket • Nodes can only join cluster if they have maprserverticket – Self-signed wildcard certificates created for HTTPS traffic • ssl_keystore contains certificate and private key, ssl_truststore contains certificate – We set JVM system property: javax.net.ssl.trustStore • Used by Web UIs, MCS, and maprlogin to CLDB • Uses hostname command to get DNS domain for cluster and put that into certificate
  • 22. © 2014 MapR Technologies 22 Cryptography • Encrypted using current NIST standards – AES-256 in GCM mode for encryption and signing • http://en.wikipedia.org/wiki/Galois/Counter_Mode • NIST standard - http://csrc.nist.gov/publications/fips/fips140-2/fips1402annexa.pdf – Leverage Intel hardware encryption where available, software otherwise • Use the open source crypto++ library for our C++ cryptography – http://cryptopp.com • Random number generation – Use secure random number generation as documented here http://www.cryptopp.com/docs/ref/class_auto_seeded_random_pool.html#_details
  • 23. © 2014 MapR Technologies 23 Let’s Build a Secure Cluster!
        Node 1
          apt-get install mapr….
          configure.sh -C … -Z … -secure -genkeys
          – Generates all needed keys for MapR-RPC as well as for HTTPS
        Node N
          apt-get install mapr….
          scp rootORmapr@node1:/opt/mapr/conf/{cldb.key,maprserverticket,ssl_keystore,ssl_truststore} /opt/mapr/conf
          configure.sh -C … -Z … -secure
        Clients
          apt-get install mapr…
          scp anyuser@nodeN:/opt/mapr/conf/ssl_truststore /opt/mapr/conf
          configure.sh … -secure
  • 24. © 2014 MapR Technologies 24 Kerberos • Not required but can use • Kerberos SSO – Explicitly using ‘maprlogin kerberos’ – Implicitly • If no MapR ticket available, client automatically detects and uses Kerberos ticket and uses it to obtain MapR ticket • Kerberos SSO requires only – Kerberos client on CLDB and client machines – Kerberos identity only for CLDB – typically 3-5 CLDBs • No need to manage identities for every node
  • 25. © 2014 MapR Technologies 25 Agenda • What’s MapR • Why Secure Hadoop • Securing MapR Hadoop • Security beyond the core
  • 26. © 2014 MapR Technologies 26 Hadoop Map Reduce Clients • Many components simply generate Map Reduce jobs. As such they implicitly leverage the security we’ve defined for Map Reduce previously. They are: – Hive (except Hive Server) – Pig – Mahout – Sqoop
  • 27. © 2014 MapR Technologies 27 Ecosystem Security • All ecosystem components run securely as well in a secure MapR cluster – Some by default – Some with minor configuration • Most Web UIs enhanced to use userid & password authentication and HTTPS – Can configure Kerberos SPNEGO, same as from Apache
  • 28. © 2014 MapR Technologies 28 MapR Ecosystem Security – by Default • By default, out of the box when security enabled – Hive Server 2 supports password authentication • Can configure Kerberos and SSL function, same as from Apache, including secure impersonation – Oozie supports MapR ticket authentication • Can configure Kerberos and SSL function, same as from Apache, including secure impersonation • HBase and Hive MetaServer require Kerberos to be secured • MapR Tables (HBase APIs) use native MapR security, no configuration needed
  • 29. © 2014 MapR Technologies 29 MapR Tables Authorization • Boolean logic constraints on access to M7 tables – Uses user & group information – Very powerful • ( u:bob | g:admins) • ( g:managers & ! g:restricted) • ( g:managers & g:businessunity) | g:executives – Settable at table, column, and column family level for various actions – Queries silently hide data you are not authorized to see
  • 30. © 2014 MapR Technologies 30 MapR Hadoop Advantage • Vastly simpler – Core secured by default in one step – No requirement for Kerberos in core and associated complexity • Easier integration – Leverage existing Linux authentication (PAM and NSSwitch) • Faster – Leverage Intel AES hardware cryptography
  • 31. © 2014 MapR Technologies 31 Further Reading • MapR – http://mapr.com • MapR Native Security – http://www.mapr.com/blog/getting-started-mapr-security-0 – http://www.mapr.com/press-release/mapr-technologies-integrates-security-into-hadoop – http://www.mapr.com/products/only-with-mapr/mapr-integrates-security-into-hadoop • Adding Security to Apache Hadoop – http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf • The Evolution of Hadoop’s Security Model – http://www.infoq.com/articles/HadoopSecurityModel/
  • 32. © 2014 MapR Technologies 32 Q&A @mapr maprtech kbotzum@mapr.com Engage with us! MapR maprtech mapr-technologies
  • 33. © 2014 MapR Technologies 33 Appendix
  • 34. © 2014 MapR Technologies 34 Encrypted Shuffle (?) • No need to special case encrypting shuffle • MapR-FS is store for Map output – Shuffle inherits the same encryption, authentication, and authorization functionality of the rest of MapR-FS
  • 35. © 2014 MapR Technologies 35 Persistent Keys and Tickets [Diagram: CLDB/ZK 1 … CLDB/ZK N and Node 1, Node 2, … Node N]
  • 36. © 2014 MapR Technologies 36 Apache Hadoop Security • Kerberos as core authentication technology – Kerberos to access HDFS, JT, Oozie, etc. – Kerberos for server to server traffic • But Kerberos doesn’t fit perfectly with Hadoop model – Introduce delegation tokens for carrying identity in many scenarios • Kerberos is complicated – Need Kerberos identity for every server in the cluster • Lots to manage! – Every user needs a Kerberos identity to access cluster, Web UIs, etc. – Lots of steps • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.3.0/CDH4-Security-Guide/cdh4sg_topic_3.html
  • 37. © 2014 MapR Technologies 37 Key Design Elements • User authentication and authorization information obtained using standard operating system information – PAM and nsswitch • MapR specific shared secret keys – Easier to manage – No dependencies on complex external security systems – Better performance • MapR servers (running as ‘mapr’) have access to maprserverticket and are therefore privileged processes • MapR-RPC altered to encrypt and authenticate traffic • Maprsasl created for Apache Java code to leverage similar security – Leverages same keys, authentication model, etc. – Reuses the C/C++ code via JNI
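
Slide 29 describes Boolean access expressions over users (u:) and groups (g:) combined with |, & and !. The sketch below shows one way such an expression could be evaluated against a caller's identity. It is an illustration only: the grammar, the operator precedence (! binds tighter than &, which binds tighter than |), and the function names tokenize and is_allowed are assumptions made for this example, not MapR's implementation.

```python
import re

# One token is either u:<name> / g:<name> or a single operator character.
# Whitespace after the colon is tolerated.
_TOKEN = re.compile(r"\s*(?:([ug]):\s*([A-Za-z0-9_.-]+)|([!&|()]))")


def tokenize(expr):
    """Split an access expression into (kind, name) tuples."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        if m.group(1):
            tokens.append((m.group(1), m.group(2)))
        else:
            tokens.append((m.group(3), None))
        pos = m.end()
    return tokens


def is_allowed(expr, user, groups):
    """Evaluate expr for a user name and a set of group names (hypothetical helper)."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        kind, name = take()
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        if kind == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        raise ValueError(f"unexpected token {kind!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result
```

With this sketch, is_allowed("( g:managers & ! g:restricted)", "amy", {"managers"}) evaluates to True, and adding "restricted" to the group set turns it to False.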

Editor's Notes

  1. That foundation is what we built with MapR! We provide a complete distribution for Apache Hadoop which gives you all the latest Apache and open source projects, but delivers it on a patent-pending data platform which provides the truly best enterprise-grade foundation for Hadoop and big data applications. We are also contributors to the community through Drill, Mahout, and other projects. (Read off the list of capabilities and features which resonate best for your audience.) End-to-end HA/DR with mirroring. Data protection, consistent snapshots. MapR provides a complete distribution for Apache Hadoop. MapR has integrated, tested, and hardened a broad array of packages as part of this distribution: Hive, Pig, Oozie, Sqoop, plus additional packages such as Cascading. We have spent over two years in a well-funded effort to provide deep architectural improvements to create the next-generation distribution for Hadoop. MapR has made significant updates combined with a dozen open source packages. All of the innovations MapR has delivered include 100% compatibility with the Apache Hadoop APIs. This is in stark contrast with the alternative distributions from Cloudera, Hortonworks, and Apache, which are all equivalent.
  2. MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR’s distribution. Amazon, through its Elastic MapReduce service (EMR), hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold, and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features such as mirroring between sites and multi-tenancy support that further enhance cloud deployments.
  3. In the initial release, the server key and cldb key never change. The server ticket is also shared by all servers and does not expire.
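
Slide 11 lists what a ticket contains (creation time, expiration time, renewal lifetime, a random secret key, userid and group ids), slide 16 has the receiving server check expiration, and note 3 says the shared server ticket does not expire. A minimal sketch of how those fields and the expiry check might be modeled follows. The class name, field names, and the 32-byte key length (chosen to match the AES-256 mentioned on slide 22) are assumptions for illustration; signing and encryption of the ticket by the CLDB are not modeled here.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ticket:
    """Hypothetical in-memory view of the ticket fields named on slide 11."""
    uid: int
    gids: Tuple[int, ...]
    created: float                 # seconds since the epoch
    expires: Optional[float]       # None models a ticket that never expires
    renew_until: Optional[float]   # end of the renewal lifetime, if any
    # Randomly generated secret key; 32 bytes is an assumption (256 bits).
    secret_key: bytes = field(default_factory=lambda: os.urandom(32), repr=False)

    def is_valid(self, now=None):
        """Return True if the ticket has not expired at time `now`."""
        now = time.time() if now is None else now
        return self.expires is None or now < self.expires


def ticket_path(uid):
    """Default ticket file location described on slide 12."""
    return f"/tmp/maprticket_{uid}"


# A user ticket valid for one hour, and a server ticket with no expiry.
now = time.time()
user_ticket = Ticket(uid=1001, gids=(1001,), created=now,
                     expires=now + 3600, renew_until=now + 7 * 24 * 3600)
server_ticket = Ticket(uid=0, gids=(0,), created=now,
                       expires=None, renew_until=None)
```

Passing an explicit `now` to is_valid keeps the check deterministic in tests; a real server would also have to verify the ticket's signature before trusting any of these fields.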