How to create a multi-tenancy for interactive data analysis
with JupyterHub & LDAP
Spark Cluster + Jupyter + LDAP
Introduction
This presentation shows how to create an architecture for a framework for interactive data analysis, using a Cloudera Spark cluster with Kerberos, a Jupyter machine with JupyterHub, and authentication via LDAP.
Architecture
This architecture enables the following:
● Transparent data-science development
● User impersonation
● Authentication via LDAP
● Upgrades on the cluster do not affect existing developments
● Controlled access to data and resources via Kerberos/Sentry
● Several coding APIs (Scala, R, Python, PySpark, etc.)
● Two layers of security with Kerberos & LDAP
Pre-Assumptions
1. Cluster hostname: cm1.localdomain; Jupyter hostname: cm3.localdomain
2. Cluster Python version: 3.7.1
3. Cluster Manager: Cloudera Manager 5.12.2
4. Service Yarn & PIP Installed
5. Cluster Authentication Pre-Installed: Kerberos
a. Kerberos Realm DOMAIN.COM
6. Chosen IDE: Jupyter
7. JupyterHub Machine Authentication Not-Installed: Kerberos
8. AD Machine Installed with hostname: ad.localdomain
9. Java 1.8 installed in Both Machines
10. Cluster Spark version 2.2.0
Anaconda
Download and installation
su - root
wget https://repo.continuum.io/archive/Anaconda3-2018.12-Linux-x86_64.sh
chmod +x Anaconda3-2018.12-Linux-x86_64.sh
./Anaconda3-2018.12-Linux-x86_64.sh
Note 1: Replace the example values with your own hostname and domain.
Note 2: Because of the SudoSpawner package, Anaconda must be installed as the root user.
Note 3: JupyterHub requires Python 3.x, which is why Anaconda 3 is installed.
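Since every later path in this guide assumes Anaconda lives under /opt/anaconda3, here is a minimal non-interactive variant of the install (an assumption; the interactive installer asks for the same prefix):
# Batch install into the prefix assumed by the rest of this guide
./Anaconda3-2018.12-Linux-x86_64.sh -b -p /opt/anaconda3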
Anaconda
Path environment variables
export PATH=/opt/anaconda3/bin:$PATH
Java environment variables
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64/;
Spark environment variables
export SPARK_HOME=/opt/spark;
export SPARK_MASTER_IP=10.191.38.83;
Yarn environment variables
export YARN_CONF_DIR=/etc/hadoop/conf
Python environment variables
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip;
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py;
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python;
Note: Replace the example values with your own.
Hadoop environment variables
export HADOOP_HOME=/etc/hadoop/conf;
export HADOOP_CONF_DIR=/etc/hadoop/conf;
Hive environment variables
export HIVE_HOME=/etc/hadoop/conf;
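These exports only last for the current shell session. One way to make them permanent for every user (a sketch; the /etc/profile.d/jupyterhub-env.sh file name is illustrative, not from the original deck):
# Persist the environment variables for future shells (file name is illustrative)
cat >> /etc/profile.d/jupyterhub-env.sh <<'EOF'
export PATH=/opt/anaconda3/bin:$PATH
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64/
export SPARK_HOME=/opt/spark
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
EOF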
Anaconda
Validate installation
anaconda-navigator
Update Conda (Only if needed)
conda update -n base -c defaults conda
Start Jupyter Notebook (If non root)
jupyter-notebook --ip='10.111.22.333' --port 9001 --debug > /opt/anaconda3/log.txt 2>&1
Start Jupyter Notebook (if root)
jupyter-notebook --ip='10.111.22.333' --port 9001 --debug --allow-root > /opt/anaconda3/log.txt 2>&1
Note: only the example values (e.g. the IP address) need to be changed for your environment.
Jupyter or JupyterHub?
JupyterHub is a multi-user notebook server that:
● Manages authentication.
● Spawns single-user notebook servers on demand.
● Gives each user a complete notebook server.
How to choose?
JupyterHub
Install JupyterHub Package (with Http-Proxy)
conda install -c conda-forge jupyterhub
Validate Installation
jupyterhub -h
Start JupyterHub Server
jupyterhub --ip='10.111.22.333' --port 9001 --debug > /opt/anaconda3/log.txt 2>&1
Note: only the example values (e.g. the IP address) need to be changed for your environment.
JupyterHub With LDAP
Install Simple LDAP Authenticator Plugin for JupyterHub
conda install -c conda-forge jupyterhub-ldapauthenticator
Install SudoSpawner
conda install -c conda-forge sudospawner
Install the LDAP package that is able to create users locally
pip install jupyterhub-ldapcreateusers
Generate JupyterHub Config File
jupyterhub --generate-config
Note 1: only the example values need to be changed for your environment.
Note 2: SudoSpawner enables JupyterHub to spawn single-user servers without being root.
JupyterHub With LDAP
Configure JupyterHub Config File
nano /opt/anaconda3/jupyterhub_config.py
import os
import pwd
import subprocess
# Function to Create User Home
def create_dir_hook(spawner):
    if not os.path.exists(os.path.join('/home/', spawner.user.name)):
        subprocess.call(["sudo", "/sbin/mkhomedir_helper", spawner.user.name])
c.Spawner.pre_spawn_hook = create_dir_hook
c.JupyterHub.authenticator_class = 'ldapcreateusers.LocalLDAPCreateUsers'
c.LocalLDAPCreateUsers.server_address = 'ad.localdomain'
c.LocalLDAPCreateUsers.server_port = 3268
c.LocalLDAPCreateUsers.use_ssl = False
c.LocalLDAPCreateUsers.lookup_dn = True
# Instructions to define the LDAP search - does not take possible group users into account
c.LocalLDAPCreateUsers.bind_dn_template = ['CN={username},DC=ad,DC=localdomain']
c.LocalLDAPCreateUsers.user_search_base = 'DC=ad,DC=localdomain'
JupyterHub With LDAP
c.LocalLDAPCreateUsers.lookup_dn_search_user = 'admin'
c.LocalLDAPCreateUsers.lookup_dn_search_password = 'passWord'
c.LocalLDAPCreateUsers.lookup_dn_user_dn_attribute = 'CN'
c.LocalLDAPCreateUsers.user_attribute = 'sAMAccountName'
c.LocalLDAPCreateUsers.escape_userdn = False
c.JupyterHub.hub_ip = '10.111.22.333'
c.JupyterHub.port = 9001
# Instructions Required to Add User Home
c.LocalAuthenticator.add_user_cmd = ['useradd', '-m']
c.LocalLDAPCreateUsers.create_system_users = True
c.Spawner.debug = True
c.Spawner.default_url = 'tree/home/{username}'
c.Spawner.notebook_dir = '/'
c.PAMAuthenticator.open_sessions = True
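Note that the file above keeps JupyterHub's default spawner; it does not yet use the SudoSpawner installed earlier. A minimal sketch of the extra wiring to run the Hub as a non-root account (the jupyterhub account and jupyterhub-users group names are illustrative):
# In jupyterhub_config.py - switch to SudoSpawner
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
# In /etc/sudoers (edit with visudo) - allow the Hub account to launch sudospawner for its users
# jupyterhub ALL=(%jupyterhub-users) NOPASSWD:/opt/anaconda3/bin/sudospawner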
Start JupyterHub Server With Config File
jupyterhub -f /opt/anaconda3/jupyterhub_config.py --debug
Note: only the example values (e.g. the IP address) need to be changed for your environment.
JupyterHub with LDAP + ProxyUser
As a reminder, for ProxyUser to work you need Java 1.8 and the same Spark version on both machines (Cluster and JupyterHub); this example uses Spark 2.2.0.
[Cluster] Confirm Cluster Spark & Hadoop Version
spark-shell
hadoop version
[JupyterHub] Download Spark & Create Symbolic link
cd /tmp/
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.6.tgz
tar zxvf spark-2.2.0-bin-hadoop2.6.tgz
mv spark-2.2.0-bin-hadoop2.6 /opt/spark-2.2.0
ln -s /opt/spark-2.2.0 /opt/spark
Note: replace the Spark and Hadoop versions with your own.
JupyterHub with LDAP + ProxyUser
[Cluster] Copy Hadoop/Hive/Spark Config files
cd /etc/spark2/conf.cloudera.spark2_on_yarn/
scp * root@10.111.22.333:/etc/hadoop/conf/
[Cluster] HDFS ProxyUser
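The original slide shows this step as a Cloudera Manager screenshot. In essence, HDFS needs proxyuser entries for the principal used by JupyterHub (assumed here to be jupyter, matching the keytab used later), typically set through the core-site.xml safety valve; a sketch of the two properties:
hadoop.proxyuser.jupyter.hosts = *
hadoop.proxyuser.jupyter.groups = *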
Note: replace the IP address and directories with your own.
[JupyterHub] Create hadoop config files directory
mkdir -p /etc/hadoop/conf/
ln -s /etc/hadoop/conf/ conf.cloudera.yarn
[JupyterHub] Create spark-events directory
mkdir /tmp/spark-events
chown spark:spark /tmp/spark-events
chmod 777 /tmp/spark-events
[JupyterHub] Test Spark 2
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--num-executors 1 --driver-memory 512m --executor-memory 512m \
--executor-cores 1 --deploy-mode cluster \
--proxy-user tpsimoes --keytab /root/jupyter.keytab \
--conf spark.eventLog.enabled=true \
/opt/spark-2.2.0/examples/jars/spark-examples_2.11-2.2.0.jar 10;
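If the submission works, YARN reports the application as FINISHED/SUCCEEDED. Two optional commands to verify this from the JupyterHub machine, assuming the YARN client picks up the copied configuration (use the application id printed by spark-submit):
yarn application -list -appStates FINISHED
yarn logs -applicationId <application_id>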
JupyterHub with LDAP + ProxyUser
Check available kernel specs
jupyter kernelspec list
Install PySpark Kernel
conda install -c conda-forge pyspark
Confirm kernel installation
jupyter kernelspec list
Edit PySpark kernel
nano /opt/anaconda3/share/jupyter/kernels/pyspark/kernel.json
{"argv":
["/opt/anaconda3/share/jupyter/kernels/pyspark/python.sh", "-f", "{connection_file}"],
"display_name": "PySpark (Spark 2.2.0)", "language":"python" }
Create PySpark Script
cd /opt/anaconda3/share/jupyter/kernels/pyspark;
touch python.sh;
chmod a+x python.sh;
JupyterHub with LDAP + ProxyUser
The python.sh script was created because of limitations in the JupyterHub kernel configuration, which cannot obtain the Kerberos credentials on its own, and because the LDAP package does not allow proxyUser the way Zeppelin does. With this architecture you are therefore able to:
● Add an extra layer of security, since the IDE keytab is required
● Enable the use of proxyUser via Spark's --proxy-user ${KERNEL_USERNAME} flag
Edit PySpark Script
touch /opt/anaconda3/share/jupyter/kernels/pyspark/python.sh;
nano /opt/anaconda3/share/jupyter/kernels/pyspark/python.sh;
#!/usr/bin/env bash
# setup environment variable, etc.
PROXY_USER="$(whoami)"
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export SPARK_HOME=/opt/spark
export SPARK_MASTER_IP=10.111.22.333
export HADOOP_HOME=/etc/hadoop/conf
JupyterHub with LDAP + ProxyUser
Edit PySpark Script
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python
export PYSPARK_SUBMIT_ARGS="-v --master yarn --deploy-mode client --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --num-executors 2 --driver-memory 1024m --executor-memory 1024m --executor-cores 2 --proxy-user "${PROXY_USER}" --keytab /tmp/jupyter.keytab pyspark-shell"
# Kinit the user/keytab defined for the ProxyUser on the Cluster/HDFS
kinit -kt /tmp/jupyter.keytab jupyter/cm1.localdomain@DOMAIN.COM
# run the ipykernel
exec /opt/anaconda3/bin/python -m ipykernel $@
Note: replace the IP address and directories with your own.
Interact with JupyterHub
Login
http://10.111.22.333:9001/hub/login
Notebook Kernel
To use JupyterLab without making it the default interface, just replace "tree" with "lab" in the browser URL:
http://10.111.22.333:9001/user/tpsimoes/lab
JupyterLab
JupyterLab is the next-generation web-based interface for Jupyter.
Install JupyterLab
conda install -c conda-forge jupyterlab
Install JupyterLab Launcher
conda install -c conda-forge jupyterlab_launcher
JupyterLab
Using the JupyterLab interface as the default on Jupyter requires additional changes:
● Change the JupyterHub Config File
● Additional extensions (for the Hub Menu)
● Create config file for JupyterLab
Edit JupyterHub Config File
nano /opt/anaconda3/jupyterhub_config.py
...
# Change the values of these flags
c.Spawner.default_url = '/lab'
c.Spawner.notebook_dir = '/home/{username}'
# Add this Flag
c.Spawner.cmd = ['jupyter-labhub']
JupyterLab
Install jupyterlab-hub extension
jupyter labextension install @jupyterlab/hub-extension
Create JupyterLab Config File
cd /opt/anaconda3/share/jupyter/lab/settings/
nano page_config.json
{
"hub_prefix": "/jupyter"
}
JupyterLab
The final architecture:
R, Hive and Impala on JupyterHub
This section focuses on R, Hive, Impala and a Kerberized kernel.
The R kernel requires libraries on both machines (Cluster and Jupyter).
[Cluster & Jupyter] Install R Libs
yum install -y openssl-devel openssl libcurl-devel libssh2-devel
[Jupyter] Create SymLinks for R libs
ln -s /opt/anaconda3/lib/libssl.so.1.0.0 /usr/lib64/libssl.so.1.0.0;
ln -s /opt/anaconda3/lib/libcrypto.so.1.0.0 /usr/lib64/libcrypto.so.1.0.0;
[Cluster & Jupyter] To use SparkR
devtools::install_github('apache/spark@v2.2.0', subdir='R/pkg')
Note: replace the example values (e.g. the Spark tag) with your own.
[Cluster & Jupyter] Start R & Install Packages
R
install.packages('git2r')
install.packages('devtools')
install.packages('repr')
install.packages('IRdisplay')
install.packages('crayon')
install.packages('pbdZMQ')
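The packages above are IRkernel's dependencies, but the deck does not show registering the R kernel itself with Jupyter. A minimal sketch of that step on the Jupyter machine, assuming IRkernel is installed from GitHub:
R -e "devtools::install_github('IRkernel/IRkernel'); IRkernel::installspec(user = FALSE)"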
R, Hive and Impala on JupyterHub
To interact with Hive metadata and use its syntax directly, my recommendation is the HiveQL kernel.
Install Developer Toolset Libs
yum install cyrus-sasl-devel.x86_64 cyrus-sasl-gssapi.x86_64 cyrus-sasl-sql.x86_64 cyrus-sasl-plain.x86_64 gcc-c++
Install Python + Hive interface (SQLAlchemy interface for Hive)
pip install pyhive
Install HiveQL Kernel
pip install --upgrade hiveqlKernel
jupyter hiveql install
Confirm HiveQL Kernel installation
jupyter kernelspec list
R, Hive and Impala on JupyterHub
Edit HiveQL Kernel
cd /usr/local/share/jupyter/kernels/hiveql
nano kernel.json
{"argv":
["/usr/local/share/jupyter/kernels/hiveql/hiveql.sh", "-f", "{connection_file}"],
"display_name": "HiveQL", "language": "hiveql", "name": "hiveql"}
Create and Edit HiveQL script
touch /usr/local/share/jupyter/kernels/hiveql/hiveql.sh;
nano /usr/local/share/jupyter/kernels/hiveql/hiveql.sh;
#!/usr/bin/env bash
# setup environment variable, etc.
PROXY_USER="$(whoami)"
R, Hive and Impala on JupyterHub
Edit HiveQL script
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export SPARK_HOME=/opt/spark
export HADOOP_HOME=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python
export HIVE_AUX_JARS_PATH=/etc/hadoop/postgresql-9.0-801.jdbc4.jar
export HADOOP_CLIENT_OPTS="-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true"
# Kinit the user/keytab defined for the ProxyUser on the Cluster/HDFS
kinit -kt /tmp/jupyter.keytab jupyter/cm1.localdomain@DOMAIN.COM
# run the ipykernel
exec /opt/anaconda3/bin/python -m ipykernel $@
Note 1: replace the IP address, directories and versions with your own.
Note 2: add your users' keytabs to a chosen directory so that it is possible to run with proxyUser.
R, Hive and Impala on JupyterHub
To interact with Impala metadata, my recommendation is Impyla, but there is a catch: due to a specific version of a lib (thrift_sasl), the HiveQL kernel will stop working, because hiveqlkernel 1.0.13 requires thrift-sasl==0.3.*.
Install Developer Toolset Libs
yum install cyrus-sasl-devel.x86_64 cyrus-sasl-gssapi.x86_64 cyrus-sasl-sql.x86_64 cyrus-sasl-plain.x86_64 gcc-c++
Install additional Libs for Impyla
pip install thrift_sasl==0.2.1; pip install sasl;
Install ipython-sql
conda install -c conda-forge ipython-sql
Install impyla
pip install impyla==0.15a1
Note: an alpha version of impyla was installed due to an incompatibility with Python 3.7 and above.
R, Hive and Impala on JupyterHub
If you need access to Hive & Impala metadata, you can use Python + Hive with a Kerberized custom kernel.
Install Jaydebeapi package
conda install -c conda-forge jaydebeapi
Create Python Kerberized Kernel
mkdir -p /usr/share/jupyter/kernels/pythonKerb
cd /usr/share/jupyter/kernels/pythonKerb
touch kernel.json
touch pythonKerb.sh
chmod a+x /usr/share/jupyter/kernels/pythonKerb/pythonKerb.sh
Note: replace the example values with your own.
Edit Kerberized Kernel
nano /usr/share/jupyter/kernels/pythonKerb/kernel.json
{"argv":
["/usr/share/jupyter/kernels/pythonKerb/pythonKerb.sh", "-f", "{connection_file}"],
"display_name": "PythonKerberized", "language": "python",
"name": "pythonKerb"}
Edit Kerberized Kernel script
nano /usr/share/jupyter/kernels/pythonKerb/pythonKerb.sh
R, Hive and Impala on JupyterHub
Edit Kerberized Kernel script
#!/usr/bin/env bash
PROXY_USER="$(whoami)"
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export SPARK_HOME=/opt/spark
export HADOOP_HOME=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python
export HIVE_AUX_JARS_PATH=/etc/hadoop/postgresql-9.0-801.jdbc4.jar
export HADOOP_CLIENT_OPTS="-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true"
export CLASSPATH=$CLASSPATH:`hadoop classpath`:/etc/hadoop/*:/tmp/*
export PYTHONPATH=$PYTHONPATH:/opt/anaconda3/lib/python3.7/site-packages/jaydebeapi
# Kinit the user/keytab defined for the ProxyUser on the Cluster/HDFS
kinit -kt /tmp/${PROXY_USER}.keytab ${PROXY_USER}@DOMAIN.COM
# run the ipykernel
exec /opt/anaconda3/bin/python -m ipykernel_launcher $@
R, Hive and Impala on JupyterHub
Assuming that you don't have Impyla installed (or, if you do, that you have created a separate environment for it), HiveQL is the best kernel for accessing Hive metadata, and it is supported.
Install Developer Toolset Libs
yum install cyrus-sasl-devel.x86_64 cyrus-sasl-gssapi.x86_64 cyrus-sasl-sql.x86_64 cyrus-sasl-plain.x86_64 gcc-c++
Install Hive interface & HiveQL Kernel
pip install pyhive; pip install --upgrade hiveqlKernel;
Jupyter Install Kernel
jupyter hiveql install
Check kernel installation
jupyter kernelspec list
R, Hive and Impala on JupyterHub
To access a Kerberized cluster you need a Kerberos ticket in the cache, so the solution is the following:
Edit Kerberized Kernel
nano /usr/local/share/jupyter/kernels/hiveql/kernel.json
{"argv":
["/usr/local/share/jupyter/kernels/hiveql/hiveql.sh", "-f", "{connection_file}"],
"display_name": "HiveQL", "language": "hiveql", "name": "hiveql"}
Edit Kerberized Kernel script
touch /usr/local/share/jupyter/kernels/hiveql/hiveql.sh
nano /usr/local/share/jupyter/kernels/hiveql/hiveql.sh
Note: replace the example values with your own.
R, Hive and Impala on JupyterHub
Edit Kerberized Kernel script
#!/usr/bin/env bash
PROXY_USER="$(whoami)"
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export SPARK_HOME=/opt/spark
export HADOOP_HOME=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python
export HIVE_AUX_JARS_PATH=/etc/hadoop/postgresql-9.0-801.jdbc4.jar
export HADOOP_CLIENT_OPTS="-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true"
# Kinit the user/keytab defined for the ProxyUser on the Cluster/HDFS
kinit -kt /tmp/${PROXY_USER}.keytab ${PROXY_USER}@DOMAIN.COM
# run the ipykernel
exec /opt/anaconda3/bin/python -m hiveql $@
Note: replace the example values with your own.
Interact with JupyterHub Kernels
The following information serves as a knowledge base on how to interact with the previously configured kernels on a Kerberized cluster.
[HiveQL] Create Connection
$$ url=hive://hive@cm1.localdomain:10000/
$$ connect_args={"auth": "KERBEROS","kerberos_service_name": "hive"}
$$ pool_size=5
$$ max_overflow=10
[Impyla] Create Connection
from impala.dbapi import connect
conn = connect(host='cm1.localdomain', port=21050, kerberos_service_name='impala', auth_mechanism='GSSAPI')
Note: replace the example values with your own.
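As a quick usage check (not in the original deck), the Impyla connection follows the Python DB-API, so a cursor can run a query directly:
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
print(cursor.fetchall())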
Interact with JupyterHub Kernels
[Impyla] Create Connection via SQLMagic
%load_ext sql
%config SqlMagic.autocommit=False
%sql impala://tpsimoes:welcome1@cm1.localdomain:21050/db?kerberos_service_name=impala&auth_mechanism=GSSAPI
[Python] Create Connection
import jaydebeapi
import pandas as pd
conn_hive = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver",
"jdbc:hive2://cm1.localdomain:10000/db;AuthMech=1;KrbRealm=DOMAIN.COM;KrbHostFQDN=cm1.localdomain;KrbServiceName=hive;KrbAuthType=2")
[Python] Kinit Keytab
import subprocess
result = subprocess.run(['kinit', '-kt', '/tmp/tpsimoes.keytab', 'tpsimoes/cm1.localdomain@DOMAIN.COM'], stdout=subprocess.PIPE)
result.stdout
Note: replace the example values with your own.
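To round off the example (an addition, assuming the jaydebeapi connection above succeeds and that a hypothetical table db.my_table exists), the connection can be fed straight into pandas:
df = pd.read_sql("SELECT * FROM db.my_table LIMIT 10", conn_hive)
df.head()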
Thanks
Big Data Engineer
Tiago Simões

Mais conteúdo relacionado

Mais procurados

Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Databricks
 
Boost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined ProceduresBoost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined ProceduresNeo4j
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Databricks
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Riccardo Zamana
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...HostedbyConfluent
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Getting Started with FIDO2
Getting Started with FIDO2Getting Started with FIDO2
Getting Started with FIDO2FIDO Alliance
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
Web Authentication API
Web Authentication APIWeb Authentication API
Web Authentication APIFIDO Alliance
 
Easy Cloud Native Transformation with Nomad
Easy Cloud Native Transformation with NomadEasy Cloud Native Transformation with Nomad
Easy Cloud Native Transformation with NomadBram Vogelaar
 
Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링JANGWONSEO4
 
What makes a successful SSI strategy?
What makes a successful SSI strategy?What makes a successful SSI strategy?
What makes a successful SSI strategy?Evernym
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j
 
OpenID Connect: An Overview
OpenID Connect: An OverviewOpenID Connect: An Overview
OpenID Connect: An OverviewPat Patterson
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
Fido Technical Overview
Fido Technical OverviewFido Technical Overview
Fido Technical OverviewFIDO Alliance
 
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...HostedbyConfluent
 

Mais procurados (20)

Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
 
Boost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined ProceduresBoost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined Procedures
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
 
OrientDB
OrientDBOrientDB
OrientDB
 
Greenplum User Case
Greenplum User Case Greenplum User Case
Greenplum User Case
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Getting Started with FIDO2
Getting Started with FIDO2Getting Started with FIDO2
Getting Started with FIDO2
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
Web Authentication API
Web Authentication APIWeb Authentication API
Web Authentication API
 
Easy Cloud Native Transformation with Nomad
Easy Cloud Native Transformation with NomadEasy Cloud Native Transformation with Nomad
Easy Cloud Native Transformation with Nomad
 
Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링Apache kafka 관리와 모니터링
Apache kafka 관리와 모니터링
 
What makes a successful SSI strategy?
What makes a successful SSI strategy?What makes a successful SSI strategy?
What makes a successful SSI strategy?
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
 
OpenID Connect: An Overview
OpenID Connect: An OverviewOpenID Connect: An Overview
OpenID Connect: An Overview
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Fido Technical Overview
Fido Technical OverviewFido Technical Overview
Fido Technical Overview
 
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
 

Semelhante a How to create a multi tenancy for an interactive data analysis with jupyter hub and ldap

How to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHubHow to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHubTiago Simões
 
Provisioning with Puppet
Provisioning with PuppetProvisioning with Puppet
Provisioning with PuppetJoe Ray
 
Build Your Own CaaS (Container as a Service)
Build Your Own CaaS (Container as a Service)Build Your Own CaaS (Container as a Service)
Build Your Own CaaS (Container as a Service)HungWei Chiu
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Nagios
 
Puppet for Developers
Puppet for DevelopersPuppet for Developers
Puppet for Developerssagarhere4u
 
Improving Operations Efficiency with Puppet
Improving Operations Efficiency with PuppetImproving Operations Efficiency with Puppet
Improving Operations Efficiency with PuppetNicolas Brousse
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetAchieve Internet
 
Automating Complex Setups with Puppet
Automating Complex Setups with PuppetAutomating Complex Setups with Puppet
Automating Complex Setups with PuppetKris Buytaert
 
Automating complex infrastructures with Puppet
Automating complex infrastructures with PuppetAutomating complex infrastructures with Puppet
Automating complex infrastructures with PuppetKris Buytaert
 
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context ConstraintsAlessandro Arrichiello
 
Pyramid Deployment and Maintenance
Pyramid Deployment and MaintenancePyramid Deployment and Maintenance
Pyramid Deployment and MaintenanceJazkarta, Inc.
 
k8s practice 2023.pptx
k8s practice 2023.pptxk8s practice 2023.pptx
k8s practice 2023.pptxwonyong hwang
 
Advanced Eclipse Workshop (held at IPC2010 -spring edition-)
Advanced Eclipse Workshop (held at IPC2010 -spring edition-)Advanced Eclipse Workshop (held at IPC2010 -spring edition-)
Advanced Eclipse Workshop (held at IPC2010 -spring edition-)Bastian Feder
 
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...Nicolas Brousse
 
Chef - industrialize and automate your infrastructure
Chef - industrialize and automate your infrastructureChef - industrialize and automate your infrastructure
Chef - industrialize and automate your infrastructureMichaël Lopez
 
Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013grim_radical
 
How to create a secured cloudera cluster
How to create a secured cloudera clusterHow to create a secured cloudera cluster
How to create a secured cloudera clusterTiago Simões
 

Semelhante a How to create a multi tenancy for an interactive data analysis with jupyter hub and ldap (20)

How to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHubHow to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHub
 
Provisioning with Puppet
Provisioning with PuppetProvisioning with Puppet
Provisioning with Puppet
 
Build Your Own CaaS (Container as a Service)
Build Your Own CaaS (Container as a Service)Build Your Own CaaS (Container as a Service)
Build Your Own CaaS (Container as a Service)
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
 
One-Man Ops
One-Man OpsOne-Man Ops
One-Man Ops
 
Puppet for Developers
Puppet for DevelopersPuppet for Developers
Puppet for Developers
 
Improving Operations Efficiency with Puppet
Improving Operations Efficiency with PuppetImproving Operations Efficiency with Puppet
Improving Operations Efficiency with Puppet
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and Puppet
 
Automating Complex Setups with Puppet
Automating Complex Setups with PuppetAutomating Complex Setups with Puppet
Automating Complex Setups with Puppet
 
Automating complex infrastructures with Puppet
Automating complex infrastructures with PuppetAutomating complex infrastructures with Puppet
Automating complex infrastructures with Puppet
 
EC CUBE 3.0.x installation guide
EC CUBE 3.0.x installation guideEC CUBE 3.0.x installation guide
EC CUBE 3.0.x installation guide
 
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
 
Cooking with Chef
Cooking with ChefCooking with Chef
Cooking with Chef
 
Pyramid Deployment and Maintenance
Pyramid Deployment and MaintenancePyramid Deployment and Maintenance
Pyramid Deployment and Maintenance
 
k8s practice 2023.pptx
k8s practice 2023.pptxk8s practice 2023.pptx
k8s practice 2023.pptx
 
Advanced Eclipse Workshop (held at IPC2010 -spring edition-)
Advanced Eclipse Workshop (held at IPC2010 -spring edition-)Advanced Eclipse Workshop (held at IPC2010 -spring edition-)
Advanced Eclipse Workshop (held at IPC2010 -spring edition-)
 
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
 
Chef - industrialize and automate your infrastructure
Chef - industrialize and automate your infrastructureChef - industrialize and automate your infrastructure
Chef - industrialize and automate your infrastructure
 
Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013
 
How to create a secured cloudera cluster
How to create a secured cloudera clusterHow to create a secured cloudera cluster
How to create a secured cloudera cluster
 

Mais de Tiago Simões

How to go the extra mile on monitoring
How to go the extra mile on monitoringHow to go the extra mile on monitoring
How to go the extra mile on monitoringTiago Simões
 
How to scheduled jobs in a cloudera cluster without oozie
How to scheduled jobs in a cloudera cluster without oozieHow to scheduled jobs in a cloudera cluster without oozie
How to scheduled jobs in a cloudera cluster without oozieTiago Simões
 
How to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureHow to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureTiago Simões
 
How to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinHow to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinTiago Simões
 
How to install and use multiple versions of applications in run-time
How to install and use multiple versions of applications in run-timeHow to install and use multiple versions of applications in run-time
How to install and use multiple versions of applications in run-timeTiago Simões
 
Hive vs impala vs spark - tuning
Hive vs impala vs spark - tuningHive vs impala vs spark - tuning
Hive vs impala vs spark - tuningTiago Simões
 
How to create a multi tenancy for an interactive data analysis
How to create a multi tenancy for an interactive data analysisHow to create a multi tenancy for an interactive data analysis
How to create a multi tenancy for an interactive data analysisTiago Simões
 

Mais de Tiago Simões (7)

How to go the extra mile on monitoring
How to go the extra mile on monitoringHow to go the extra mile on monitoring
How to go the extra mile on monitoring
 
How to scheduled jobs in a cloudera cluster without oozie
How to scheduled jobs in a cloudera cluster without oozieHow to scheduled jobs in a cloudera cluster without oozie
How to scheduled jobs in a cloudera cluster without oozie
 
How to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureHow to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architecture
 
How to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinHow to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelin
 
How to install and use multiple versions of applications in run-time
How to install and use multiple versions of applications in run-timeHow to install and use multiple versions of applications in run-time
How to install and use multiple versions of applications in run-time
 
Hive vs impala vs spark - tuning
Hive vs impala vs spark - tuningHive vs impala vs spark - tuning
Hive vs impala vs spark - tuning
 
How to create a multi tenancy for an interactive data analysis
How to create a multi tenancy for an interactive data analysisHow to create a multi tenancy for an interactive data analysis
How to create a multi tenancy for an interactive data analysis
 

Último

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Último (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

How to create a multi tenancy for an interactive data analysis with jupyter hub and ldap

  • 1. How-to create a multi tenancy for an interactive data analysis with JupyterHub & LDAP Spark Cluster + Jupyter + LDAP
  • 2. Introduction With this presentation you should be able to create an architecture for a framework of an interactive data analysis by using a Cloudera Spark Cluster with Kerberos, a Jupyter machine with JupyterHub and authentication via LDAP.
  • 3. Architecture This architecture enables the following: ● Transparent data-science development ● User Impersonation ● Authentication via LDAP ● Upgrades on Cluster won’t affect the developments. ● Controlled access to the data and resources by Kerberos/Sentry. ● Several coding API’s (Scala, R, Python, PySpark, etc…). ● Two layers of security with Kerberos & LDAP
  • 5. Pre-Assumptions 1. Cluster hostname: cm1.localdomain Jupyter hostname: cm3.localdomain 2. Cluster Python version: 3.7.1 3. Cluster Manager: Cloudera Manager 5.12.2 4. Service Yarn & PIP Installed 5. Cluster Authentication Pre-Installed: Kerberos a. Kerberos Realm DOMAIN.COM 6. Chosen IDE: Jupyter 7. JupyterHub Machine Authentication Not-Installed: Kerberos 8. AD Machine Installed with hostname: ad.localdomain 9. Java 1.8 installed in Both Machines 10. Cluster Spark version 2.2.0
  • 6. Anaconda Download and installation su - root wget https://repo.continuum.io/archive/Anaconda3-2018.12-Linux-x86_64.sh chmod +x Anaconda3-2018.12-Linux-x86_64.sh ./Anaconda3-2018.12-Linux-x86_64.sh Note 1: Change with your hostname and domain in the highlighted field. Note 2: Due to the package SudoSpawner - that requires Anaconda be installed with the root user! Note 3: JupyterHub requires Python 3.X, therefore it will be installed Anaconda 3
  • 7. Anaconda Path environment variables export PATH=/opt/anaconda3/bin:$PATH Java environment variables export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64/; Spark environment variables export SPARK_HOME=/opt/spark; export SPARK_MASTER_IP=10.191.38.83; Yarn environment variables export YARN_CONF_DIR=/etc/hadoop/conf Yarn environment variables export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip; export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py; export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python; Note: Change with your values in the highlighted field. Hadoop environment variables export HADOOP_HOME=/etc/hadoop/conf; export HADOOP_CONF_DIR=/etc/hadoop/conf; Hive environment variables export HIVE_HOME=/etc/hadoop/conf;
  • 8. Anaconda Validate installation anaconda-navigator Update Conda (Only if needed) conda update -n base -c defaults conda Start Jupyter Notebook (If non root) jupyter-notebook --ip='10.111.22.333' --port 9001 --debug > /opt/anaconda3/log.txt 2>&1 Start Jupyter Notebook (if root) jupyter-notebook --ip='10.111.22.333' --port 9001 --debug --allow-root > /opt/anaconda3/log.txt 2>&1 Note: it’s only necessary to change the highlighted, ex: for your ip.
  • 9. Jupyter or JupyterHub? JupyterHub it’s a multi-purpose notebook that: ● Manages authentication. ● Spawns single-user notebook on-demand. ● Gives each user a complete notebook server. How to choose?
  • 10. JupyterHub Install JupyterHub Package (with Http-Proxy) conda install -c conda-forge jupyterhub Validate Installation jupyterhub -h Start JupyterHub Server jupyterhub --ip='10.111.22.333' --port 9001 --debug > /opt/anaconda3/log.txt 2>&1 Note: it’s only necessary to change the highlighted, ex: for your ip.
  • 11. JupyterHub With LDAP Install Simple LDAP Authenticator Plugin for JupyterHub conda install -c conda-forge jupyterhub-ldapauthenticator Install SudoSpawner conda install -c conda-forge sudospawner Install Package LDAP to be able to Create Users Locally pip install jupyterhub-ldapcreateusers Generate JupyterHub Config File jupyterhub --generate-config Note 1: it’s only necessary to change the highlighted, ex: for your ip. Note 2: Sudospawner enables JupyterHub to spawn single-user servers without being root
  • 12. JupyterHub With LDAP Configure JupyterHub Config File nano /opt/anaconda3/jupyterhub_config.py import os import pwd import subprocess # Function to Create User Home def create_dir_hook(spawner): if not os.path.exists(os.path.join('/home/', spawner.user.name)): subprocess.call(["sudo", "/sbin/mkhomedir_helper", spawner.user.name]) c.Spawner.pre_spawn_hook = create_dir_hook c.JupyterHub.authenticator_class = 'ldapcreateusers.LocalLDAPCreateUsers' c.LocalLDAPCreateUsers.server_address = 'ad.localdomain' c.LocalLDAPCreateUsers.server_port = 3268 c.LocalLDAPCreateUsers.use_ssl = False c.LocalLDAPCreateUsers.lookup_dn = True # Instructions to Define LDAP Search - Doesn't have in consideration possible group users c.LocalLDAPCreateUsers.bind_dn_template = ['CN={username},DC=ad,DC=localdomain'] c.LocalLDAPCreateUsers.user_search_base = 'DC=ad,DC=localdomain'
  • 13. JupyterHub With LDAP c.LocalLDAPCreateUsers.lookup_dn_search_user = 'admin' c.LocalLDAPCreateUsers.lookup_dn_search_password = 'passWord' c.LocalLDAPCreateUsers.lookup_dn_user_dn_attribute = 'CN' c.LocalLDAPCreateUsers.user_attribute = 'sAMAccountName' c.LocalLDAPCreateUsers.escape_userdn = False c.JupyterHub.hub_ip = '10.111.22.333’ c.JupyterHub.port = 9001 # Instructions Required to Add User Home c.LocalAuthenticator.add_user_cmd = ['useradd', '-m'] c.LocalLDAPCreateUsers.create_system_users = True c.Spawner.debug = True c.Spawner.default_url = 'tree/home/{username}' c.Spawner.notebook_dir = '/' c.PAMAuthenticator.open_sessions = True Start JupyterHub Server With Config File jupyterhub -f /opt/anaconda3/jupyterhub_config.py --debug Note: it’s only necessary to change the highlighted, ex: for your ip.
  • 14. JupyterHub with LDAP + ProxyUser Has a reminder, to have ProxyUser working, you will require on both Machines (Cluster and JupyterHub): Java 1.8 and same Spark version, for this example it will be used the 2.2.0. [Cluster] Confirm Cluster Spark & Hadoop Version spark-shell hadoop version [JupyterHub] Download Spark & Create Symbolic link cd /tmp/ wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.6.tgz tar zxvf spark-2.2.0-bin-hadoop2.6.tgz mv spark-2.2.0-bin-hadoop2.6 /opt/spark-2.2.0 ln -s /opt/spark-2.2.0 /opt/spark Note: change with your Spark and Hadoop version in the highlighted field.
  • 15. Jupyter Hub with LDAP + ProxyUser [Cluster] Copy Hadoop/Hive/Spark Config files cd /etc/spark2/conf.cloudera.spark2_on_yarn/ scp * root@10.111.22.333:/etc/hadoop/conf/ [Cluster] HDFS ProxyUser Note: change with your IP and directory’s in the highlighted field. [JupyterHub] Create hadoop config files directory mkdir -p /etc/hadoop/conf/ ln -s /etc/hadoop/conf/ conf.cloudera.yarn [JupyterHub] Create spark-events directory mkdir /tmp/spark-events chown spark:spark spark-events chmod 777 /tmp/spark-events [JupyterHub] Test Spark 2 spark-submit --class org.apache.spark.examples.SparkPi --master yarn --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 --deploy-mode cluster --proxy-user tpsimoes --keytab /root/jupyter.keytab --conf spark.eventLog.enabled=true /opt/spark-2.2.0/examples/jars/spark-examples_2.11-2.2.0.jar 10;
  • 16. Check available kernel specs jupyter kernelspec list Install PySpark Kernel conda install -c conda-forge pyspark Confirm kernel installation jupyter kernelspec list Edit PySpark kernel nano /opt/anaconda3/share/jupyter/kernels/pyspark/kernel.json {"argv": ["/opt/anaconda3/share/jupyter/kernels/pyspark/python.sh", "-f", "{connection_file}"], "display_name": "PySpark (Spark 2.2.0)", "language":"python" } Create PySpark Script cd /opt/anaconda3/share/jupyter/kernels/pyspark; touch python.sh; chmod a+x python.sh; Jupyter Hub with LDAP + ProxyUser
  • 17. Jupyter Hub with LDAP + ProxyUser The python.sh script was elaborated due to the limitations on JupyterHub Kernel configurations that isn't able to get the Kerberos Credentials and also due to LDAP package that doesn't allow the proxyUser has is possible with Zeppelin. Therefore with this architecture solution you are able to: ● Add a new step of security, that requires the IDE keytab ● Enable the usage of proxyUser by using the flag from spark --proxy-user ${KERNEL_USERNAME} Edit PySpark Script touch /opt/anaconda3/share/jupyter/kernels/pyspark/python.sh; nano /opt/anaconda3/share/jupyter/kernels/pyspark/python.sh; # !/usr/bin/env bash # setup environment variable, etc. PROXY_USER="$(whoami)" export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64 export SPARK_HOME=/opt/spark export SPARK_MASTER_IP=10.111.22.333 export HADOOP_HOME=/etc/hadoop/conf
  • 18. Jupyter Hub with LDAP + ProxyUser Edit PySpark Script export YARN_CONF_DIR=/etc/hadoop/conf export HADOOP_CONF_DIR=/etc/hadoop/conf export HIVE_HOME=/etc/hadoop/conf export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python export PYSPARK_SUBMIT_ARGS="-v --master yarn --deploy-mode client --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --num-executors 2 --driver-memory 1024m --executor-memory 1024m --executor-cores 2 --proxy-user "${PROXY_USER}" --keytab /tmp/jupyter.keytab pyspark-shell" # Kinit User/Keytab defined por the ProxyUser on the Cluster/HDFS kinit -kt /tmp/jupyter.keytab jupyter/cm1.localdomain@DOMAIN.COM # run the ipykernel exec /opt/anaconda3/bin/python -m ipykernel $@ Note: change with your IP and directories in the highlighted field.
  • 20. To use JupyterLab without it being the default interface, you just have to swap on your browser url the “tree” with Lab! http://10.111.22.333:9001/user/tpsimoes/lab JupyterLab JupyterLab it’s the next-generation web-based interface for Jupyter. Install JupyterLab conda install -c conda-forge jupyterlab Install JupyterLab Launcher conda install -c conda-forge jupyterlab_launcher
• 21. JupyterLab
To make JupyterLab the default interface on Jupyter, additional changes are required:
● Change the JupyterHub config file
● Install additional extensions (for the Hub menu)
● Create a config file for JupyterLab
Edit the JupyterHub config file
nano /opt/anaconda3/jupyterhub_config.py
...
# Change the values of these flags
c.Spawner.default_url = '/lab'
c.Spawner.notebook_dir = '/home/{username}'
# Add this flag
c.Spawner.cmd = ['jupyter-labhub']
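After saving the changes, restart JupyterHub against the edited config file; a sketch reusing the IP, port and log path from earlier (-f tells JupyterHub which config file to load):
jupyterhub -f /opt/anaconda3/jupyterhub_config.py --ip='10.111.22.333' --port 9001 --debug > /opt/anaconda3/log.txt 2>&1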
• 22. JupyterLab
Install jupyterlab-hub extension
jupyter labextension install @jupyterlab/hub-extension
Create JupyterLab Config File
cd /opt/anaconda3/share/jupyter/lab/settings/
nano page_config.json
{ "hub_prefix": "/jupyter" }
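A quick way to confirm the extension is in place; a sketch (jupyter lab build is only needed if the install did not already trigger a rebuild):
# List installed lab extensions and rebuild the JupyterLab assets if necessary
jupyter labextension list
jupyter lab build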
• 24. R, Hive and Impala on JupyterHub
This section focuses on R, Hive, Impala and a kerberized kernel.
The R kernel requires libraries on both machines (Cluster and Jupyter).
[Cluster & Jupyter] Install R libs
yum install -y openssl-devel openssl libcurl-devel libssh2-devel
[Jupyter] Create symlinks for R libs
ln -s /opt/anaconda3/lib/libssl.so.1.0.0 /usr/lib64/libssl.so.1.0.0;
ln -s /opt/anaconda3/lib/libcrypto.so.1.0.0 /usr/lib64/libcrypto.so.1.0.0;
[Cluster & Jupyter] Start R & install packages
R
install.packages('git2r')
install.packages('devtools')
install.packages('repr')
install.packages('IRdisplay')
install.packages('crayon')
install.packages('pbdZMQ')
[Cluster & Jupyter] To use SparkR
devtools::install_github('apache/spark@v2.2.0', subdir='R/pkg')
Note: Change with your values in the highlighted field.
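The packages above are the usual prerequisites of the IRkernel; registering the R kernel itself is not shown here, so the following is only a sketch under the assumption that IRkernel is the intended kernel (the CRAN mirror URL is illustrative):
# Assumption: IRkernel exposes R as a Jupyter kernel; installspec(user = FALSE) registers it system-wide
R -e "install.packages('IRkernel', repos='https://cran.r-project.org'); IRkernel::installspec(user = FALSE)"
# An "ir" kernel should now appear
jupyter kernelspec list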
• 25. R, Hive and Impala on JupyterHub
To interact with the Hive metadata and use the HiveQL syntax directly, my recommendation is the HiveQL kernel.
Install Developer Toolset Libs
yum install cyrus-sasl-devel.x86_64 cyrus-sasl-gssapi.x86_64 cyrus-sasl-sql.x86_64 cyrus-sasl-plain.x86_64 gcc-c++
Install Python + Hive interface (SQLAlchemy interface for Hive)
pip install pyhive
Install HiveQL Kernel
pip install --upgrade hiveqlKernel
jupyter hiveql install
Confirm HiveQL Kernel installation
jupyter kernelspec list
• 26. R, Hive and Impala on JupyterHub
Edit HiveQL Kernel
cd /usr/local/share/jupyter/kernels/hiveql
nano kernel.json
{"argv": ["/usr/local/share/jupyter/kernels/hiveql/hiveql.sh", "-f", "{connection_file}"], "display_name": "HiveQL", "language": "hiveql", "name": "hiveql"}
Create and Edit HiveQL script
touch /usr/local/share/jupyter/kernels/hiveql/hiveql.sh;
nano /usr/local/share/jupyter/kernels/hiveql/hiveql.sh;
#!/usr/bin/env bash
# setup environment variables, etc.
PROXY_USER="$(whoami)"
• 27. R, Hive and Impala on JupyterHub
Edit HiveQL script
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export SPARK_HOME=/opt/spark
export HADOOP_HOME=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python
export HIVE_AUX_JARS_PATH=/etc/hadoop/postgresql-9.0-801.jdbc4.jar
export HADOOP_CLIENT_OPTS="-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true"
# Kinit with the user/keytab defined for the ProxyUser on the Cluster/HDFS
kinit -kt /tmp/jupyter.keytab jupyter/cm1.localdomain@DOMAIN.COM
# run the HiveQL kernel
exec /opt/anaconda3/bin/python -m hiveql $@
Note 1: change with your IP, directories and versions in the highlighted field.
Note 2: add each user's keytab to a chosen directory so that it is possible to run with a proxy user (see the staging sketch after this slide).
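A minimal sketch of Note 2, assuming the keytabs are delivered out of band and that the kernel scripts look for them under /tmp (tpsimoes is just the example user from this guide):
# Stage the user's keytab where the kernel script expects it and restrict it to that user
cp tpsimoes.keytab /tmp/tpsimoes.keytab
chown tpsimoes:tpsimoes /tmp/tpsimoes.keytab
chmod 600 /tmp/tpsimoes.keytab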
• 28. R, Hive and Impala on JupyterHub
To interact with the Impala metadata, my recommendation is Impyla. There is a catch, though: Impyla needs an older version of the thrift_sasl lib, which breaks the HiveQL kernel, because hiveqlkernel 1.0.13 requires thrift-sasl==0.3.* (see the environment sketch after this slide).
Install Developer Toolset Libs
yum install cyrus-sasl-devel.x86_64 cyrus-sasl-gssapi.x86_64 cyrus-sasl-sql.x86_64 cyrus-sasl-plain.x86_64 gcc-c++
Install additional Libs for Impyla
pip install thrift_sasl==0.2.1;
pip install sasl;
Install ipython-sql
conda install -c conda-forge ipython-sql
Install impyla
pip install impyla==0.15a1
Note: an alpha version of impyla was installed due to an incompatibility of the stable release with Python versions above 3.7.
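One way to keep both kernels working is to isolate Impyla and its thrift_sasl pin in a dedicated conda environment and expose it as its own kernel. A sketch, assuming conda is on the PATH (the environment and kernel names are illustrative):
# Create an isolated environment so the thrift_sasl==0.2.1 pin does not break the HiveQL kernel
conda create -y -n impyla-env python=3.7
conda activate impyla-env          # on older conda setups use: source activate impyla-env
pip install thrift_sasl==0.2.1 sasl impyla==0.15a1
conda install -y -c conda-forge ipykernel ipython-sql
# Register the environment as a separate Jupyter kernel
python -m ipykernel install --name impyla-env --display-name "Python (Impyla)"
conda deactivate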
• 29. R, Hive and Impala on JupyterHub
If you need access to both Hive & Impala metadata, you can use Python + Hive with a kerberized custom kernel.
Install Jaydebeapi package
conda install -c conda-forge jaydebeapi
Create Python Kerberized Kernel
mkdir -p /usr/share/jupyter/kernels/pythonKerb
cd /usr/share/jupyter/kernels/pythonKerb
touch kernel.json
touch pythonKerb.sh
chmod a+x /usr/share/jupyter/kernels/pythonKerb/pythonKerb.sh
Note: Change with your values in the highlighted field.
Edit Kerberized Kernel
nano /usr/share/jupyter/kernels/pythonKerb/kernel.json
{"argv": ["/usr/share/jupyter/kernels/pythonKerb/pythonKerb.sh", "-f", "{connection_file}"], "display_name": "PythonKerberized", "language": "python", "name": "pythonKerb"}
Edit Kerberized Kernel script
nano /usr/share/jupyter/kernels/pythonKerb/pythonKerb.sh
• 30. R, Hive and Impala on JupyterHub
Edit Kerberized Kernel script
PROXY_USER="$(whoami)"
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export SPARK_HOME=/opt/spark
export HADOOP_HOME=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python
export HIVE_AUX_JARS_PATH=/etc/hadoop/postgresql-9.0-801.jdbc4.jar
export HADOOP_CLIENT_OPTS="-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true"
export CLASSPATH=$CLASSPATH:`hadoop classpath`:/etc/hadoop/*:/tmp/*
export PYTHONPATH=$PYTHONPATH:/opt/anaconda3/lib/python3.7/site-packages/jaydebeapi
# Kinit with the user/keytab defined for the ProxyUser on the Cluster/HDFS
kinit -kt /tmp/${PROXY_USER}.keytab ${PROXY_USER}@DOMAIN.COM
# run the ipykernel
exec /opt/anaconda3/bin/python -m ipykernel_launcher $@
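The CLASSPATH above pulls jars from /tmp, so the Hive JDBC driver that jaydebeapi loads has to be staged there. A sketch, assuming a CDH 5 parcel layout on the cluster (the exact jar name and path vary by version):
# Copy the standalone Hive JDBC driver from the cluster to the JupyterHub host
scp root@cm1.localdomain:/opt/cloudera/parcels/CDH/jars/hive-jdbc-*-standalone.jar /tmp/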
• 31. R, Hive and Impala on JupyterHub
Assuming you don't have Impyla installed (or, if you do, that you have created a separate environment for it): HiveQL is the best kernel for accessing the Hive metadata, and it is actively supported.
Install Developer Toolset Libs
yum install cyrus-sasl-devel.x86_64 cyrus-sasl-gssapi.x86_64 cyrus-sasl-sql.x86_64 cyrus-sasl-plain.x86_64 gcc-c++
Install Hive interface & HiveQL Kernel
pip install pyhive;
pip install --upgrade hiveqlKernel;
Jupyter Install Kernel
jupyter hiveql install
Check kernel installation
jupyter kernelspec list
• 32. R, Hive and Impala on JupyterHub
To access a kerberized cluster you need a Kerberos ticket in the credentials cache, so the solution is the following:
Edit Kerberized Kernel
nano /usr/local/share/jupyter/kernels/hiveql/kernel.json
{"argv": ["/usr/local/share/jupyter/kernels/hiveql/hiveql.sh", "-f", "{connection_file}"], "display_name": "HiveQL", "language": "hiveql", "name": "hiveql"}
Edit Kerberized Kernel script
touch /usr/local/share/jupyter/kernels/hiveql/hiveql.sh
nano /usr/local/share/jupyter/kernels/hiveql/hiveql.sh
Note: Change with your values in the highlighted field.
• 33. R, Hive and Impala on JupyterHub
Edit Kerberized Kernel script
PROXY_USER="$(whoami)"
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export SPARK_HOME=/opt/spark
export HADOOP_HOME=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/etc/hadoop/conf
export PYTHONPATH=/opt/spark-2.2.0/python:/opt/spark-2.2.0/python/lib/py4j-0.10.4-src.zip
export PYTHONSTARTUP=/opt/spark-2.2.0/python/pyspark/shell.py
export PYSPARK_PYTHON=/usr/src/Python-3.7.1/python
export HIVE_AUX_JARS_PATH=/etc/hadoop/postgresql-9.0-801.jdbc4.jar
export HADOOP_CLIENT_OPTS="-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true"
# Kinit with the user/keytab defined for the ProxyUser on the Cluster/HDFS
kinit -kt /tmp/${PROXY_USER}.keytab ${PROXY_USER}@DOMAIN.COM
# run the HiveQL kernel
exec /opt/anaconda3/bin/python -m hiveql $@
Note: Change with your values in the highlighted field.
• 34. Interact with JupyterHub Kernels
The following information serves as a knowledge base on how to interact with the previously configured kernels against a kerberized cluster.
[HiveQL] Create Connection
$$ url=hive://hive@cm1.localdomain:10000/
$$ connect_args={"auth": "KERBEROS","kerberos_service_name": "hive"}
$$ pool_size=5
$$ max_overflow=10
[Impyla] Create Connection
from impala.dbapi import connect
conn = connect(host='cm1.localdomain', port=21050, kerberos_service_name='impala', auth_mechanism='GSSAPI')
Note: Change with your values in the highlighted field.
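If a kernel connection fails, it helps to rule out Kerberos and network issues from the shell first. A sketch, assuming beeline and impala-shell are available (for example on a cluster gateway host) and a valid ticket is already in the cache; the database name "default" is illustrative:
# HiveServer2 over Kerberos
beeline -u "jdbc:hive2://cm1.localdomain:10000/default;principal=hive/cm1.localdomain@DOMAIN.COM" -e "show databases;"
# Impala daemon with Kerberos (-k)
impala-shell -k -i cm1.localdomain:21050 -q "show databases;"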
• 35. Interact with JupyterHub Kernels
[Impyla] Create Connection via SQLMagic
%load_ext sql
%config SqlMagic.autocommit=False
%sql impala://tpsimoes:welcome1@cm1.localdomain:21050/db?kerberos_service_name=impala&auth_mechanism=GSSAPI
[Python] Create Connection
import jaydebeapi
import pandas as pd
conn_hive = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver","jdbc:hive2://cm1.localdomain:10000/db;AuthMech=1;KrbRealm=DOMAIN.COM;KrbHostFQDN=cm1.localdomain;KrbServiceName=hive;KrbAuthType=2")
[Python] Kinit Keytab
import subprocess
result = subprocess.run(['kinit', '-kt', '/tmp/tpsimoes.keytab', 'tpsimoes/cm1.localdomain@DOMAIN.COM'], stdout=subprocess.PIPE)
result.stdout
Note: Change with your values in the highlighted field.