Secure Solr With Apache Sentry
Gregory Chanan, Engineer @ Cloudera
gchanan AT cloudera.com
Who Am I?
•  Software Engineer at Cloudera
•  Apache Solr Committer
•  Apache Sentry Committer (incubating)
•  Apache HBase Committer
Overview
•  Motivation
•  Why security for Solr / SolrCloud?
•  Why Apache Sentry?
•  Authentication
•  Authorization
•  Collection-level
•  Document-level
•  Secure Impersonation
•  Performance
•  Future Work
Why Security?
•  Apache Solr only provides minimal security features
“Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), including access to the Solr configuration and schema files and the administrative user interface.” [1]
•  In the past, deployed as a single server
“It is strongly recommended that the application server containing Solr be firewalled such [that] the only clients with access to Solr are your own.” [1]
Why Security?
•  SolrCloud driving adoption in Big Data space
•  Now, a component of a multi-tenant Hadoop cluster
•  Non-Solr users on the cluster
•  Solr communicates across machines and services
Why Apache Sentry?
•  Sentry already established in Hadoop ecosystem
•  Has a well-understood authentication model (Kerberos)
•  Has a well-understood privilege/action model
•  Security-focused project
•  Solr focuses on the search engine
•  Sentry focuses on security
Authentication
•  Authentication: Verifying identity of a user or service
•  Solr supports authenticating with dependent services (i.e. HDFS
and ZooKeeper*)
•  Sentry goal: support other services / users authenticating with
Solr
•  Consistent with other HTTP-level Hadoop services (e.g. Oozie
and HttpFS), Apache Sentry uses:
•  Kerberos: a mutual authentication protocol that works on the
basis of “tickets”
•  SPNego: a negotiation mechanism for selecting an underlying
authentication protocol
SPNego advantages
•  HTTP tools have built-in support for SPNego/Kerberos
•  Web browsers
•  curl (with --negotiate)
•  HTTP libraries, including Apache HttpClient (used by SolrJ)
•  Although SPNego/Kerberos is an authentication (not authorization) mechanism, it can be
used for cluster-level access control
•  Only grant Kerberos credentials to users who should have access to the cluster
Authentication Setup
•  Server side: use the Sentry-provided web.xml, which has a Kerberos/SPNego-aware filter
•  Have to set up keytabs/principals/JAAS configurations
•  Client side: Sentry provides HttpClient / HttpSolrServer
configuration for communicating with Kerberos/SPNego-aware
Solr servers
•  Have to set up keytabs/principals/JAAS configurations
•  Cloudera Manager can do setup for you
Authorization
•  Authorization: Controlling access to resources
•  Solr does not provide collection/document authorization support
•  Solr does support “hooks” via solr.xml and solrconfig.xml to override
request handler implementations
•  Sentry uses these “hooks” to implement collection and document level
authorization
Collection-level Authorization
•  Sentry supports role-based granting of privileges
•  Each role can be granted QUERY, UPDATE, and/or administrative privileges
on a collection
•  Privileges stored in a “policy file” on HDFS:
    [groups]
    # Assigns each Hadoop group to its set of roles
    dev_ops = engineer_role, ops_role

    [roles]
    # Assigns each role to its set of privileges
    engineer_role = collection = source_code->action=Query, collection = source_code->action=Update
    ops_role = collection = hbase_logs->action=Query
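To make the group → role → privilege chain concrete, here is a minimal Java sketch that parses a policy file in the format above and answers whether a user's groups grant an action on a collection. It is a hypothetical illustration, not Sentry's implementation. The class and method names are invented. The sketch assumes each role's privileges sit on one comma-separated line, compares privileges case-insensitively, and ignores wildcards and any other policy features.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Sentry's implementation: parses a [groups]/[roles]
// policy file and checks whether a set of groups grants an action on a collection.
// Assumes one line per entry; wildcards and other policy features are not handled.
final class PolicyCheck {
    // Hadoop group -> roles assigned to it
    private final Map<String, List<String>> groupRoles = new HashMap<>();
    // role -> privileges, normalized to "collection=<name>->action=<action>"
    private final Map<String, List<String>> rolePrivileges = new HashMap<>();

    PolicyCheck(String policyText) {
        String section = "";
        for (String raw : policyText.split("\n")) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line.substring(1, line.length() - 1); // "groups" or "roles"
                continue;
            }
            int eq = line.indexOf('='); // first '=' separates key from value list
            if (eq < 0) {
                continue;
            }
            String key = line.substring(0, eq).strip();
            List<String> values = new ArrayList<>();
            for (String value : line.substring(eq + 1).split(",")) {
                values.add(value.replaceAll("\\s+", "")); // drop all whitespace
            }
            if (section.equals("groups")) {
                groupRoles.put(key, values);
            } else if (section.equals("roles")) {
                rolePrivileges.put(key, values);
            }
        }
    }

    /** True if any of the user's groups maps to a role holding the privilege. */
    boolean isAllowed(Collection<String> userGroups, String collection, String action) {
        String wanted = "collection=" + collection + "->action=" + action;
        for (String group : userGroups) {
            for (String role : groupRoles.getOrDefault(group, List.of())) {
                for (String privilege : rolePrivileges.getOrDefault(role, List.of())) {
                    if (privilege.equalsIgnoreCase(wanted)) { // case-insensitive by assumption
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String policy = String.join("\n",
            "[groups]",
            "dev_ops = engineer_role, ops_role",
            "[roles]",
            "engineer_role = collection = source_code->action=Query, collection = source_code->action=Update",
            "ops_role = collection = hbase_logs->action=Query");
        PolicyCheck policyCheck = new PolicyCheck(policy);
        System.out.println(policyCheck.isAllowed(List.of("dev_ops"), "hbase_logs", "Query"));  // true
        System.out.println(policyCheck.isAllowed(List.of("dev_ops"), "hbase_logs", "Update")); // false
    }
}
```

A user in `dev_ops` inherits both roles. That user can therefore query and update `source_code`, but can only query `hbase_logs`.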
  
Integrating Sentry and Solr
•  Sentry integrated via “hooks” in request handlers:
•  Specified per collection in solrconfig.xml:
•  Sentry ships with its own version of solrconfig.xml with secure handlers,
called solrconfig.xml.secure
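The hook idea can be sketched as a decorator: the secure handler checks the caller's privilege and only then delegates to the normal handler. In this hypothetical Java sketch, `Handler` is a stand-in interface and not Solr's request handler API. Grants are modeled as plain "collection:PRIVILEGE" strings purely for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "hook" idea: a secure handler wraps the normal one
// and checks the caller's privilege first. Handler is a stand-in interface,
// not Solr's request handler API.
final class SecureHandlerSketch {
    interface Handler {
        String handle(String user, String collection);
    }

    /** Returns a handler that enforces `privilege` before calling `inner`. */
    static Handler secure(Handler inner, String privilege, Map<String, Set<String>> grants) {
        return (user, collection) -> {
            Set<String> userGrants = grants.getOrDefault(user, Set.of());
            if (!userGrants.contains(collection + ":" + privilege)) {
                throw new SecurityException(user + " lacks " + privilege + " on " + collection);
            }
            return inner.handle(user, collection); // authorized: delegate unchanged
        };
    }

    public static void main(String[] args) {
        Handler select = (user, collection) -> "results from " + collection;
        Handler secureSelect = secure(select, "QUERY", Map.of("alice", Set.of("collection1:QUERY")));
        System.out.println(secureSelect.handle("alice", "collection1")); // results from collection1
        try {
            secureSelect.handle("bob", "collection1");
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

Because the wrapped handler is untouched, swapping in the secure configuration changes only which class is registered, not how queries themselves are executed.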
Administrative requests
•  That covers queries/updates of collections, but what about administrative
actions such as getting the status of the cores?
•  In SolrCloud, admin looks like a collection:
http://localhost:8983/solr/admin/cores?action=STATUS
•  Can just follow this structure in Sentry:
    sample_role = collection = admin->action=Query
•  Secure Admin Handlers controlled via cluster-wide “solr.xml” in
ZooKeeper. By default, you get Secure Admin Handlers if Sentry is
enabled
Administrative requests
•  Full privilege model documented here
•  Examples (collection1 = arbitrary collection name):

    Action              Required Privilege   Collection
    select              QUERY                collection1
    update/json         UPDATE               collection1
    ThreadDumpHandler   QUERY                admin
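A minimal sketch of how the table above might be applied: administrative actions are checked against the pseudo-collection admin, and everything else against the collection named in the request. The lookup covers only the three example rows. The class name and the "collection:PRIVILEGE" encoding of grants are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the example privilege table; the action names come
// from the table, everything else is illustrative.
final class RequestAuthorizer {
    // Privilege each example action requires.
    private static final Map<String, String> REQUIRED_PRIVILEGE = Map.of(
        "select", "QUERY",
        "update/json", "UPDATE",
        "ThreadDumpHandler", "QUERY");

    // Administrative actions are checked against the pseudo-collection "admin"
    // rather than the collection named in the request.
    private static final Set<String> ADMIN_ACTIONS = Set.of("ThreadDumpHandler");

    /** granted holds entries such as "collection1:QUERY" or "admin:QUERY". */
    static boolean isAuthorized(Set<String> granted, String action, String collection) {
        String privilege = REQUIRED_PRIVILEGE.get(action);
        if (privilege == null) {
            return false; // unknown action: deny by default
        }
        String target = ADMIN_ACTIONS.contains(action) ? "admin" : collection;
        return granted.contains(target + ":" + privilege);
    }

    public static void main(String[] args) {
        Set<String> granted = Set.of("collection1:QUERY");
        System.out.println(isAuthorized(granted, "select", "collection1"));            // true
        System.out.println(isAuthorized(granted, "update/json", "collection1"));       // false
        System.out.println(isAuthorized(granted, "ThreadDumpHandler", "collection1")); // false
    }
}
```

Treating admin as just another collection means no separate permission mechanism is needed. A role like `sample_role` above simply holds Query on admin.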
  
Document-level authorization motivation
•  Collection-level authorization useful when access control requirements
for documents are homogeneous
•  Security requirements may require restricting access to a subset of
documents
•  Consider “Confidential” and “Secret” documents. How to store with only
collection-level authorization?
•  Pushes complexity to application
Document-level authorization model
•  Instead of listing privileges in the Policy File in HDFS (the [groups]/[roles] file used for collection-level authorization):
•  Store authorization tokens in each document
•  Many more documents than collections; it doesn't scale to store document-level info in the Policy File
•  Can use Solr's built-in filtering capabilities to restrict access
Document-level authorization model
•  A configurable field stores the authorization tokens
•  The authorization tokens are Sentry roles, e.g. “ops_role”:

    [roles]
    ops_role = collection = hbase_logs->action=Query
•  Represents the roles that are allowed to view the document. To
view a document, the querying user must belong to at least one
role whose token is stored in the token field
•  Can modify document permissions without restarting Solr
•  Can modify role memberships without reindexing
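The visibility rule amounts to a set-intersection test between the user's roles and the document's tokens. Below is a minimal, hypothetical Java sketch of that rule; the class and method names are invented for illustration.

```java
import java.util.Set;

// Hypothetical sketch of the visibility rule: a user may see a document if at
// least one of the user's Sentry roles appears in the document's token field.
final class DocVisibility {
    static boolean canView(Set<String> userRoles, Set<String> documentTokens) {
        for (String token : documentTokens) {
            if (userRoles.contains(token)) {
                return true;
            }
        }
        return false; // no overlap (including a document with no tokens): no access
    }

    public static void main(String[] args) {
        Set<String> roles = Set.of("ops_role");
        System.out.println(canView(roles, Set.of("ops_role", "engineer_role"))); // true
        System.out.println(canView(roles, Set.of("engineer_role")));             // false
        System.out.println(canView(roles, Set.of()));                            // false
    }
}
```

Roles are resolved at query time, while tokens live in the document. Changing a user's role membership therefore changes the outcome without reindexing, and editing a document's tokens changes its permissions without restarting Solr.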
Document-level authorization impl
•  Intercepts the request via a SearchComponent
•  SearchComponent adds an “fq” or FilterQuery
•  Filter out all documents that don't have “role1” or “role2” in authField
•  Filters are cached, so the construction cost is paid only once
•  Note: does not supersede collection-level authorization
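Below is a hypothetical sketch of the kind of filter query such a component might add, written as a standard Lucene/Solr boolean query over the token field. The class and method names are invented. The `sentry_auth` field name and the optional all-roles token mirror the sentryAuthField and allRolesToken settings in the configuration that follows. Each token is quoted so that a token like `*` is intended to be matched literally rather than as a wildcard.

```java
import java.util.Collection;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch: builds an fq string restricting results to documents whose
// auth field holds at least one of the user's roles (or the all-roles token).
final class AuthFilterBuilder {
    static String buildFilterQuery(String authField, Collection<String> userRoles, String allRolesToken) {
        // TreeSet de-duplicates and sorts, so the same role set always yields
        // the same fq string.
        TreeSet<String> tokens = new TreeSet<>(userRoles);
        if (allRolesToken != null) {
            tokens.add(allRolesToken);
        }
        if (tokens.isEmpty()) {
            // A real component would deny the request or match nothing;
            // this sketch just signals the case.
            throw new IllegalArgumentException("user has no roles");
        }
        String clauses = tokens.stream()
            // quote each token, escaping backslashes and double quotes
            .map(t -> "\"" + t.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(" OR "));
        return authField + ":(" + clauses + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildFilterQuery("sentry_auth", java.util.List.of("role2", "role1"), null));
        // sentry_auth:("role1" OR "role2")
    }
}
```

Sorting the tokens keeps equivalent role sets producing identical filter strings. That is a cheap way to help the cached filter be reused across a user's requests.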
Document-level authorization config
•  Configuration via solrconfig.xml.secure (per collection):
	
  	
  	
    <!-- Set to true to enable document-level authorization -->
    <bool name="enabled">false</bool>

    <!-- Field where the auth tokens are stored in the document -->
    <str name="sentryAuthField">sentry_auth</str>

    <!-- Auth token defined to allow any role to access the document.
         Uncomment to enable. -->
    <!--<str name="allRolesToken">*</str>-->
•  No tokens = no access. To allow all users to access a document,
use the allRolesToken. Useful for getting started
Secure Impersonation
•  But wait! My users don’t interact with Solr directly
•  Custom web UI, load balancer, etc.
•  Authorization won’t work!
•  The original “user” is lost: Solr sees the request as coming from the “UI”
Secure Impersonation
•  Secure impersonation: the ability of a “super-user” to submit
requests on behalf of another user
•  Conceptually similar to “sudo” on Unix
•  Limited to only groups/hosts that are explicitly configured to support it
•  Identical to the functionality provided by HDFS and Oozie
	
  
Hue Search App UI
•  Uses Secure Impersonation to integrate with its own security mechanisms
•  Users can log in to Hue via LDAP or another auth mechanism
•  Hue makes requests on behalf of the logged-in user
•  Only the Hue user requires a Kerberos keytab
•  Seamlessly integrates with the collection and document-level access control
mechanisms
Performance Testing
•  Goal is to measure overhead of:
•  Kerberos Authentication
•  Sentry Collection-Level Authorization
•  Measure index, query overhead separately
Index Test Setup
•  20-node cluster: 12 cores, 96 GB RAM, 12x 2TB disks, 10G Ethernet
•  Cloudera Search 1.2.0, CDH 4.6, MR1, CentOS 6.4
•  260M tweets/docs, indexed across 17 fields
•  116 GB, ~800 JSON .gz files, ~130MB per file, 3-fold HDFS
replication
•  1 Solr server and 1 shard per node (44M docs per shard), no Solr
replication
•  Uses the MapReduceIndexerTool contrib; mapper/reducer slots = 2x/1x the
number of cores
•  Solr heap size = 20GB
•  Record end-to-end indexing time, i.e., indexing + mtree merge + go
live
•  Record average from 3 repeats
Index Performance Testing
•  Left column is the unsecured baseline.
•  Center column is ~20% lower → HDFS security introduces ~20% performance overhead.
•  Right column is ~same as center column → Solr security introduces no additional overhead.
Query Test Setup
•  Same setup as MapReduce batch indexing
•  Uses the output of MapReduce batch indexing
•  1 client, 30 threads per client
•  Uses internal tool - QueryRunner
•  Similar to SolrMeter and JMeter
•  Queries randomly sampled from a fixed set of 10,000 strings
•  Record per-thread query throughput for 5 runs of 30 min each
Query Performance Testing
•  Left column is the unsecured baseline.
•  Center column is ~13% lower → HDFS security introduces ~13% performance overhead.
•  Right column is the same as the center column → Solr security introduces no additional overhead.
Future Work
•  Support for Sentry service with improved APIs / performance /
integration
•  Already supported for Hive/Impala
•  Currently in development upstream
•  “Lineage” security: data flows from one system to another and
retains security criteria
•  Example: Index HBase data for full-text queries in Solr. HBase Table
and Cell-level security tags automatically applied to Solr Collections,
Documents, and Fields
Questions?
•  Thanks for listening!
•  More information / Want to contribute?
http://sentry.incubator.apache.org/
•  Questions?

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Secure Search - Using Apache Sentry to Add Authentication and Authorization Support to Solr: Presented by Gregory Chanan, Cloudera

  • 2. Secure Solr With Apache Sentry Gregory Chanan, Engineer @ Cloudera gchanan AT cloudera.com
  • 3. Who Am I? •  Software Engineer at Cloudera •  Apache Solr Committer •  Apache Sentry Committer (incubating) •  Apache HBase Committer
  • 4. Overview •  Motivation •  Why security for Solr / SolrCloud? •  Why Apache Sentry? •  Authentication •  Authorization •  Collection-level •  Document-level •  Secure Impersonation •  Performance •  Future Work
  • 6. Why Security? • Apache Solr only provides minimal security features: “Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), including access to the Solr configuration and schema files and the administrative user interface.” [1] • In the past, Solr was deployed as a single server: “It is strongly recommended that the application server containing Solr be firewalled such the only clients with access to Solr are your own.” [1]
  • 7. Why Security? • SolrCloud is driving adoption in the Big Data space • Solr is now a component of a multi-tenant Hadoop cluster • Non-Solr users share the cluster • Solr communicates across machines and services
  • 9. Why Apache Sentry? • Sentry is already established in the Hadoop ecosystem • Has a well-understood authentication model (Kerberos) • Has a well-understood privilege/action model • Security-focused project • Solr focuses on being a search engine • Sentry focuses on security
  • 11. Authentication • Authentication: verifying the identity of a user or service • Solr supports authenticating with dependent services (i.e. HDFS and ZooKeeper*) • Sentry goal: support other services / users authenticating with Solr • Consistent with other HTTP-level Hadoop services (e.g. Oozie and HttpFS), Apache Sentry uses: • Kerberos: a mutual authentication protocol that works on the basis of “tickets” • SPNego: a negotiation mechanism for selecting an underlying authentication protocol
  • 12. SPNego advantages • HTTP tools have built-in support for SPNego/Kerberos • Web browsers • curl (with --negotiate) • HTTP libraries, including Apache HttpClient (used by SolrJ) • Although SPNego/Kerberos is an authentication (not authorization) protocol, it can be used for cluster-level access control • Only grant Kerberos credentials to users who should have access to the cluster
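To make the SPNego exchange concrete, here is a minimal sketch of what a client such as `curl --negotiate` does, following the HTTP Negotiate scheme of RFC 4559: send the request, and if the server answers 401 with `WWW-Authenticate: Negotiate`, retry with a base64-encoded GSSAPI token in the `Authorization` header. The `send` transport and the `get_gss_token` callback are hypothetical stand-ins (a real client obtains the token from a GSSAPI/Kerberos library using the user's ticket), header names are matched case-sensitively for simplicity, and the server's mutual-authentication reply token is not verified.

```python
import base64


def negotiate_request(send, get_gss_token, url):
    """Client side of the HTTP Negotiate (SPNEGO) handshake, simplified.

    send(url, headers) -> (status, response_headers, body) is a stand-in transport.
    get_gss_token(url) -> bytes stands in for a GSSAPI/Kerberos library call.
    """
    # First attempt carries no credentials.
    status, headers, body = send(url, {})
    scheme = headers.get("WWW-Authenticate", "").split(" ", 1)[0]
    if status == 401 and scheme == "Negotiate":
        # The server asked for SPNEGO: obtain a token for the service and
        # retry with it base64-encoded in the Authorization header.
        token = base64.b64encode(get_gss_token(url)).decode("ascii")
        status, headers, body = send(url, {"Authorization": "Negotiate " + token})
    return status, body
```

Because the Kerberos work sits behind one callback, the HTTP side of the flow is the same for a browser, curl, or an HttpClient-based library; what differs is where each obtains its ticket.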
  • 13. Authentication Setup • Server side: use the Sentry-provided web.xml, which has a Kerberos/SPNego-aware filter • Have to set up keytabs/principals/JAAS configurations • Client side: Sentry provides HttpClient / HttpSolrServer configuration for communicating with Kerberos/SPNego-aware Solr servers • Have to set up keytabs/principals/JAAS configurations • Cloudera Manager can do the setup for you
  • 15. Authorization •  Authorization: Controlling access to resources •  Solr does not provide collection/document authorization support •  Does support “hooks” via solr.xml and solrconfig.xml to override request handler implementation •  Sentry uses these “hooks” to implement collection and document level authorization
  • 17. Collection-level Authorization • Sentry supports role-based granting of privileges • Each role can be granted QUERY, UPDATE, and/or administrative privileges on a collection • Privileges are stored in a “policy file” on HDFS:
      [groups]
      # Assigns each Hadoop group to its set of roles
      dev_ops = engineer_role, ops_role
      [roles]
      # Assigns each role to its set of privileges
      engineer_role = collection = source_code->action=Query, collection = source_code->action=Update
      ops_role = collection = hbase_logs->action=Query
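A toy model of the collection-level check may help: resolve the user's groups to roles, then ask whether any of those roles grants the requested action on the collection. The sketch below parses the example policy with Python's `configparser` and only understands the `collection = NAME->action=ACTION` form shown on the slide. It is an illustration, not Sentry's policy engine; in Sentry the user's groups come from Hadoop's group mapping rather than being passed in, and the function names here are hypothetical.

```python
import configparser

# Example policy copied from the slide. This toy parser only understands the
# "collection = NAME->action=ACTION" form shown there; it is not Sentry's parser.
POLICY = """\
[groups]
# Assigns each Hadoop group to its set of roles
dev_ops = engineer_role, ops_role

[roles]
# Assigns each role to its set of privileges
engineer_role = collection = source_code->action=Query, collection = source_code->action=Update
ops_role = collection = hbase_logs->action=Query
"""


def parse_privileges(value):
    """Turn 'collection = x->action=Query, ...' into {("x", "query"), ...}."""
    privileges = set()
    for clause in value.split(","):
        clause = "".join(clause.split())  # drop all whitespace
        if not clause:
            continue  # tolerate trailing commas
        fields = dict(part.split("=", 1) for part in clause.split("->"))
        privileges.add((fields["collection"], fields["action"].lower()))
    return privileges


def load_policy(text):
    """Return (group -> [roles], role -> {(collection, action)})."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)  # configparser splits each line at the first "="
    group_roles = {
        group: [r.strip() for r in roles.split(",") if r.strip()]
        for group, roles in parser["groups"].items()
    }
    role_privileges = {
        role: parse_privileges(privs) for role, privs in parser["roles"].items()
    }
    return group_roles, role_privileges


def is_authorized(user_groups, collection, action, group_roles, role_privileges):
    """True if any role held by any of the user's groups grants action on collection."""
    wanted = (collection, action.lower())
    return any(
        wanted in role_privileges.get(role, set())
        for group in user_groups
        for role in group_roles.get(group, [])
    )
```

With this policy, a member of `dev_ops` may update `source_code` and query `hbase_logs`, but may not update `hbase_logs`; a user in no listed group is denied everything.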
  • 18. Integrating Sentry and Solr • Sentry is integrated via “hooks” in request handlers • Specified per collection in solrconfig.xml • Sentry ships with its own version of solrconfig.xml with secure handlers, called solrconfig.xml.secure
  • 19. Administrative requests • That covers queries/updates of collections, but what about administrative actions such as getting the status of the cores? • In SolrCloud, admin looks like a collection: http://localhost:8983/solr/admin/cores?action=STATUS • Can just follow this structure in Sentry: sample_role = collection = admin->action=Query • Secure Admin Handlers are controlled via the cluster-wide “solr.xml” in ZooKeeper. By default, you get Secure Admin Handlers if Sentry is enabled
  • 20. Administrative requests • Full privilege model documented here • Examples (collection1 = arbitrary collection name):
      Action              Required Privilege   Collection
      select              QUERY                collection1
      update/json         UPDATE               collection1
      ThreadDumpHandler   QUERY                admin
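The table can be read as a lookup from request to required privilege. The sketch below is a hypothetical encoding of just the three example rows, plus a helper that pulls the collection name out of a request URL so that `/solr/admin/...` naturally yields the pseudo-collection `admin`. The function names and the assumption that Solr is served under `/solr/` are illustrative only; the full Sentry privilege model covers many more handlers.

```python
from urllib.parse import urlparse

# Only the three example rows from the slide; keys are the "Action" column.
COLLECTION_HANDLERS = {"select": "query", "update/json": "update"}
ADMIN_HANDLERS = {"ThreadDumpHandler": "query"}


def parse_request(url):
    """Split a Solr URL into (collection, handler path), assuming the /solr/ context."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 3 or segments[0] != "solr":
        raise ValueError(f"unexpected Solr URL: {url}")
    return segments[1], "/".join(segments[2:])


def required_privilege(collection, action):
    """Return the (collection, privilege) pair the slide's table requires."""
    if action in ADMIN_HANDLERS:
        # Admin handlers are checked against the pseudo-collection "admin",
        # whichever collection the request itself names.
        return "admin", ADMIN_HANDLERS[action]
    if action in COLLECTION_HANDLERS:
        return collection, COLLECTION_HANDLERS[action]
    raise KeyError(f"no mapping for {action!r} in this sketch")
```

The pair returned is what a collection-level check would then look up in the policy file, exactly as for an ordinary collection.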
  • 22. Document-level authorization motivation • Collection-level authorization is useful when access control requirements for documents are homogeneous • Security requirements may require restricting access to a subset of documents • Consider “Confidential” and “Secret” documents. How do you store them with only collection-level authorization? • Pushes complexity to the application
  • 23. Document-level authorization model • Instead of the policy file in HDFS:
      [groups]
      # Assigns each Hadoop group to its set of roles
      dev_ops = engineer_role, ops_role
      [roles]
      # Assigns each role to its set of privileges
      engineer_role = collection = source_code->action=Query, collection = source_code->action=Update
      ops_role = collection = hbase_logs->action=Query
    • Store authorization tokens in each document • There are many more documents than collections, so storing document-level information in the policy file doesn’t scale • Solr’s built-in filtering capabilities can be used to restrict access
  • 24. Document-level authorization model • A configurable field stores the authorization tokens • The authorization tokens are Sentry roles, e.g. “ops_role”:
      [roles]
      ops_role = collection = hbase_logs->action=Query
    • The tokens represent the roles that are allowed to view the document. To view a document, the querying user must belong to at least one role whose token is stored in the token field • Document permissions can be modified without restarting Solr • Role memberships can be modified without reindexing
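The visibility rule above reduces to a set intersection between a document's tokens and the user's roles. This sketch applies it to a few in-memory documents, using `sentry_auth` (the field name in the solrconfig.xml.secure excerpt) as the token field. It shows the rule only; Solr enforces it with a filter query rather than by scanning documents in application code, and the helper names are hypothetical. Because tokens name roles rather than users, changing who belongs to a role changes the result without touching any document.

```python
def can_view(doc, user_roles, auth_field="sentry_auth"):
    """Visible iff the user holds at least one role listed in the token field.
    A document with no tokens is visible to nobody."""
    tokens = doc.get(auth_field, [])
    if isinstance(tokens, str):  # tolerate a single-valued field
        tokens = [tokens]
    return not set(tokens).isdisjoint(user_roles)


def visible_ids(docs, user_roles, auth_field="sentry_auth"):
    """IDs of the documents the given roles may see, in index order."""
    return [d["id"] for d in docs if can_view(d, user_roles, auth_field)]


# Toy in-memory "index" for illustration.
DOCS = [
    {"id": "1", "sentry_auth": ["ops_role"]},
    {"id": "2", "sentry_auth": ["engineer_role", "ops_role"]},
    {"id": "3"},  # no tokens: no access
]
```

A user holding only `engineer_role` sees document 2; a user holding `ops_role` sees documents 1 and 2; nobody sees document 3.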
  • 25. Document-level authorization implementation • Intercepts the request via a SearchComponent • The SearchComponent adds an “fq” (FilterQuery) • Filters out all documents that don’t have “role1” or “role2” in authField • Filters are cached, so the construction cost is paid only once • Note: does not supersede collection-level authorization
  • 26. Document-level authorization config • Configuration via solrconfig.xml.secure (per collection):
      <!-- Set to true to enable document-level authorization -->
      <bool name="enabled">false</bool>
      <!-- Field where the auth tokens are stored in the document -->
      <str name="sentryAuthField">sentry_auth</str>
      <!-- Auth token defined to allow any role to access the document.
           Uncomment to enable. -->
      <!--<str name="allRolesToken">*</str>-->
    • No tokens = no access. To allow all users to access a document, use the allRolesToken. Useful for getting started
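Slides 25 and 26 together describe a filter query built from the user's roles, the configured token field, and the optional allRolesToken. The sketch below produces such an `fq` in standard Lucene query syntax and appends it to the request's existing `fq` values, which Solr intersects. The exact query string Sentry's SearchComponent generates may differ; the helper names are hypothetical, and the `-*:*` fallback (match nothing) relies on Solr accepting a purely negative filter query.

```python
def _quote(token):
    # Quote the token so characters such as "*" match literally instead of
    # being parsed as wildcard syntax by the standard query parser.
    return '"' + token.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_auth_filter(user_roles, auth_field="sentry_auth", all_roles_token=None):
    """Filter query matching documents whose auth field holds one of the user's
    roles, or the allRolesToken when one is configured."""
    tokens = sorted(set(user_roles))  # sorted only to make the output deterministic
    if all_roles_token:
        tokens.append(all_roles_token)
    if not tokens:
        return "-*:*"  # matches no documents: no roles means no access
    return " OR ".join(f"{auth_field}:{_quote(t)}" for t in tokens)


def add_auth_filter(params, user_roles, **kwargs):
    """Copy params (name -> list of values) and append the auth fq, leaving any
    fq the client sent in place; Solr ANDs multiple fq values together."""
    new_params = {name: list(values) for name, values in params.items()}
    new_params.setdefault("fq", []).append(build_auth_filter(user_roles, **kwargs))
    return new_params
```

Keeping the role filter as its own `fq`, rather than rewriting `q`, lets Solr cache it independently of the user's query, which is the caching benefit slide 25 refers to.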
  • 28. Secure Impersonation •  But wait! My users don’t interact with Solr directly •  Custom web UI, load balancer, etc. •  Authorization won’t work! •  “user” is forgotten, request to Solr from “UI”  
  • 29. Secure Impersonation • Secure impersonation: the ability of a “super-user” to submit requests on behalf of another user • Conceptually similar to “sudo” on Unix • Limited to only groups/hosts that are explicitly configured to support it • Identical to the functionality provided by HDFS and Oozie
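Secure impersonation can be pictured as one extra step before authorization: decide whose identity the request should run as. The sketch below models it on Hadoop's proxy-user rules (per super-user lists of allowed hosts and impersonable groups, configured in Hadoop as `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups`), since the slide says the functionality matches HDFS and Oozie. The `ProxyRule` type, the `doAs`-style parameter, and how Solr actually stores these rules are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProxyRule:
    hosts: set   # hosts the super-user may connect from; "*" means any host
    groups: set  # groups whose members may be impersonated; "*" means any group


def _permitted(allowed, candidates):
    return "*" in allowed or any(c in allowed for c in candidates)


def effective_user(authenticated_user, remote_host, do_as, groups_of, rules):
    """Return the identity that authorization should use for this request.

    authenticated_user: who Kerberos/SPNego says sent the request (e.g. "hue").
    do_as: the user named in a hypothetical doAs request parameter, or None.
    groups_of: function mapping a user name to its group names.
    rules: super-user name -> ProxyRule.
    """
    if do_as is None or do_as == authenticated_user:
        return authenticated_user
    rule = rules.get(authenticated_user)
    if rule is None:
        raise PermissionError(f"{authenticated_user} may not impersonate anyone")
    if not _permitted(rule.hosts, [remote_host]):
        raise PermissionError(f"{authenticated_user} may not impersonate from {remote_host}")
    if not _permitted(rule.groups, groups_of(do_as)):
        raise PermissionError(f"{authenticated_user} may not impersonate {do_as}")
    return do_as
```

With a rule allowing `hue` to impersonate members of `analysts` from its own host, Hue's requests on behalf of an analyst are authorized as that analyst, while requests naming a user outside that group, or arriving from another host, are rejected.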
  • 30. Hue Search App UI • Uses secure impersonation to integrate with its own security mechanisms • Users can log in to Hue via LDAP or another auth mechanism • Hue makes requests on behalf of the logged-in user • Only the Hue user requires a Kerberos keytab • Seamlessly integrates with the collection- and document-level access control mechanisms
  • 32. Performance Testing •  Goal is to measure overhead of: •  Kerberos Authentication •  Sentry Collection-Level Authorization •  Measure index, query overhead separately
  • 33. Index Test Setup •  20-node cluster: 12 cores, 96 GB RAM, 12x 2TB disks, 10G Ethernet •  Cloudera Search-1.2.0, CDH 4.6, MR1, CentOS 6.4 •  260M tweets/docs, indexed across 17 fields •  116 GB, ~800 JSON .gz files, ~130MB per file, 3-fold HDFS replication •  1 Solr server and 1 shard per node (44M docs per shard), no Solr replication •  Uses MapReduceIndexerTool contrib. mapper/reducer slots = 2x/1x number of cores •  Solr heap size = 20GB •  Record end-to-end indexing time, i.e., indexing + mtree merge + go live •  Record average from 3 repeats
  • 34. Index Performance Testing • Left column is the unsecured baseline. • Center column is ~20% lower → HDFS security introduces ~20% performance overhead. • Right column is ~same as center column → Solr security introduces no additional overhead.
  • 35. Query Test Setup • Same setup as MapReduce batch indexing • Uses the output of MapReduce batch indexing • 1 client, 30 threads per client • Uses internal tool: QueryRunner • Similar to SolrMeter and JMeter • Queries randomly sampled from a fixed set of 10,000 strings • Record per-thread query throughput for 5 runs of 30 min each
  • 36. Query Performance Testing • Left column is the unsecured baseline. • Center column is ~13% lower → HDFS security introduces ~13% performance overhead. • Right column is same as center column → Solr security introduces no additional overhead.
  • 38. Future Work •  Support for Sentry service with improved APIs / performance / integration •  Already supported for Hive/Impala •  Currently in development upstream •  “Lineage” security: data flows from one system to another and retains security criteria •  Example: Index HBase data for full-text queries in Solr. HBase Table and Cell-level security tags automatically applied to Solr Collections, Documents, and Fields
  • 39. Questions? •  Thanks for listening! •  More information / Want to contribute? http://sentry.incubator.apache.org/ •  Questions?