LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION
AND BIG DATA STACK PROVISIONING
Evans Ye, Sr. Software Engineer
DataWorks Summit San Jose 2017
Who am I
• Tech Lead @ APAC Data Team, Y! Taiwan
• Building data products for E-Commerce business
• PMC chair of Apache Bigtop, ASF member
2
Outline
• Quick Intro to Apache Bigtop
• Docker for Bigtop Packaging
• Docker for Bigtop Provisioner
• Docker for Bigtop Sandbox
• Releases
3
QUICK INTRO TO APACHE BIGTOP
4
Linux Distributions
5
Hadoop Distributions
6
But there are some other great
Hadoop ecosystem components...
7
How do I add patches?
8
9
From source code to packages
Bigtop Packaging
10
Supported components
11
Bigtop feature set
Packaging, Testing, Deployment, Virtualization
for you to easily build your own Big Data Stack
12
Community stats
• 94 total contributors
• Spark: 1093, Hadoop: 99, HBase: 126, Hive: 115
• 5 years since 2012
• 30 Hadoop ecosystem components packaged
• 5 Linux distros and 2 architectures supported
13
DOCKER FOR BIGTOP PACKAGING
14
Preparing build environment
15
Preparing build environment
…

Seriously ?
16
Bigtop Toolchain
• Puppet recipes to install required libraries and build tools
• To prepare a build environment:
▪ Prerequisite: Java
git clone https://github.com/apache/bigtop.git
cd bigtop
./bigtop_toolchain/bin/puppetize.sh
./gradlew toolchain
17
CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
18
CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
Bigtop Toolchain (one per slave)
19
Dockerized CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
• Immutable env
• Fault tolerance
21
How to build packages
• Execute shell
• Bigtop CI Setup Guide
# OS=debian-8
# COMPONENT=hadoop
docker run -u jenkins --rm \
  -v `pwd`:/bigtop --workdir /bigtop \
  bigtop/slaves:trunk-$OS \
  bash -l -c "./gradlew allclean $COMPONENT-pkg"
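The build command takes the OS and the component as its only inputs, so it can be repeated across a CI matrix. The following is a minimal dry-run sketch, not Bigtop's actual CI job: it only prints one command per OS so the result can be inspected without Docker installed. The `build_cmd` helper and the OS tags other than debian-8 are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the package-build command for each OS in a matrix.
# OS tags other than debian-8 are assumed here, not taken from the slides.
OS_LIST="debian-8 centos-7 ubuntu-16.04"
COMPONENT="hadoop"

# build_cmd (hypothetical helper): compose the docker run command for one
# OS/component pair. "$(pwd)" stays literal so it expands when the printed
# command is eventually executed.
build_cmd() {
  os="$1"
  component="$2"
  printf 'docker run -u jenkins --rm -v "$(pwd)":/bigtop --workdir /bigtop bigtop/slaves:trunk-%s bash -l -c "./gradlew allclean %s-pkg"\n' "$os" "$component"
}

for os in $OS_LIST; do
  build_cmd "$os" "$COMPONENT"
done
```

Piping the output to `sh` from the root of a Bigtop checkout would run the builds for real, provided the corresponding images exist.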
23
Bigtop packages on master
https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
24
Extremely friendly for porting
• Example: How to port Bigtop Distribution to PPC64LE?
▪ Prepare PPC64LE docker base image
▪ Apply Bigtop Toolchain on PPC64LE docker image
▪ Build Bigtop packages on PPC64LE slaves image
• 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches
• Credit: Amir Sanjar, IBM
25
Bigtop early mission accomplished
Leveraged by app providers…
26
Get out from the Apache dome
27
New focus and target user
• Data engineers vs Distro. builders
• Solution diversity:
▪ Streaming: Flink, Apex
▪ In-memory cache: Alluxio, Ignite
▪ User/developer tools:
▪ Bigtop Provisioner
▪ Bigtop Sandbox
• Big data stack references
• Machine learning, deep learning components
28
DOCKER FOR BIGTOP PROVISIONER
29
Bigtop Provisioner
• A tool to demonstrate the full life cycle of Bigtop
Packaging, Testing, Deployment, Virtualization
Bigtop Provisioner: Create resources, Run Bigtop Puppet, Run Bigtop Tests
30
One click Hadoop provisioning (Bigtop 1.0.0)
• We use Vagrant as an abstraction layer to support
different kinds of resource providers
Vagrant
Providers
bigtop/deploy image on Docker hub
./docker-hadoop.sh -c 3
puppet apply
puppet apply
puppet apply
32
https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
Problems with Vagrant’s Docker Provider
• Need to add the Vagrant public key to Docker images
• Too many issues with the auto-created boot2docker VM
• A provisioning bug in the Docker provider has stayed open for 2 years
▪ 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers anyway
• Not all Docker options are supported in the Vagrantfile
• ^#?& slow
33
Replaced by docker-compose (Bigtop 1.2.0)
./docker-hadoop.sh -c 3
puppet apply
puppet apply
puppet apply
34
bigtop/deploy image on Docker hub
Advantages
• No need to create a customized image beforehand
• Better compatibility with Docker’s native solutions
• Clear, simple YAML file for orchestration settings
• Supports new features such as overlay networks
• Leverages Swarm for multi-node cluster deployment
• Fast -> better user experience
35
How to run Docker Provisioner
• Execute shell
• Bigtop CI Setup Guide
# See bigtop/provisioner/docker/*.yaml
CONFIG=YOUR_CUSTOM_CONF.yaml
# provision
./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 docker-provisioner
# destroy provisioned cluster
./gradlew docker-provisioner-destroy
36
YOUR_CUSTOM_CONF.yaml example
37
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:centos-7"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
distro: centos
components: [hdfs, yarn, mapreduce]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, mapreduce]
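To tie the config file to the provisioner commands, here is a small sketch that writes the example configuration to a file and prints the Gradle invocations that would consume it. It is a dry run under stated assumptions: the temporary directory and the file name `my_conf.yaml` are made up for illustration, and whether `-Pconfig` accepts an absolute path has not been checked against the Bigtop build script.

```shell
#!/bin/sh
# Sketch: write a custom provisioner config, then print the commands that use it.
# The temp dir and file name are assumptions; the keys mirror the slide's example.
CONF_DIR=$(mktemp -d)
CONFIG="$CONF_DIR/my_conf.yaml"

cat > "$CONFIG" <<'EOF'
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:centos-7"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
distro: centos
components: [hdfs, yarn, mapreduce]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, mapreduce]
EOF

# Dry run: echo instead of executing, so no Docker or Bigtop checkout is needed.
echo "./gradlew -Pconfig=$CONFIG -Pnum_instances=1 docker-provisioner"
echo "./gradlew docker-provisioner-destroy"
```

Changing `components` and `smoke_test_components` together keeps the smoke tests aligned with what actually gets deployed.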
38
Visibility for deployments
38
Use cases
• For application developers, cluster admins, users
▪ Run a Hadoop cluster to test your code on
▪ Try & test configurations before applying to Production
▪ Play around with Bigtop Big Data Stacks
• For contributors
▪ Easy to test your packaging, deployment, testing code
• For Distro. builders
▪ CI matrix -> patching upstream code made easier
39
DOCKER FOR BIGTOP SANDBOX
40
Introducing Bigtop Sandbox
• Easy way to get started
• Docker images that have Bigtop stacks installed and
configured
• Pseudo cluster up & running w/o installation
• Command-line tool for you to build your own stack
41
Docker image layer
Interface
Customized big data stack
Deployment & management tool
Base image (OS)
42
Docker image layer
Concrete implementation
HDFS + YARN + Spark
Bigtop Puppet
bigtop/puppet:ubuntu-16.04
43
Building images
Ubuntu 16.04
Bigtop Puppet
HDFS + YARN + Spark
+ site.yaml
$ puppet apply
44
site.yaml example
45
bigtop::hadoop_head_node: bigtop.example.com
bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
hadoop::hadoop_storage_dirs: [/data/1, /data/2]
hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
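Because site.yaml is only a handful of keys, it can be generated from variables when several stack variants are needed. The sketch below shows one way to do that; `render_site_yaml` is a hypothetical helper and not part of Bigtop, and the storage directories are simply copied from the example above.

```shell
#!/bin/sh
# render_site_yaml (hypothetical helper, not part of Bigtop):
# print a site.yaml for the given head node, repo URI and component list.
render_site_yaml() {
  head_node="$1"
  repo_uri="$2"
  components="$3"   # comma-separated, e.g. "hdfs, yarn, spark"
  cat <<EOF
bigtop::hadoop_head_node: $head_node
bigtop::bigtop_repo_uri: $repo_uri
hadoop::hadoop_storage_dirs: [/data/1, /data/2]
hadoop_cluster_node::cluster_components: [$components]
EOF
}

# Reproduce the example from the slide on standard output.
render_site_yaml "bigtop.example.com" \
  "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64" \
  "hdfs, yarn, spark"
```

Redirecting the output to a file such as `custom_site.yaml` gives something that could be passed to a build with a custom configuration.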
How to build
git clone https://github.com/apache/bigtop.git
cd bigtop/docker/sandbox
./build.sh -a bigtop -o ubuntu-16.04 \
  -c "hdfs, yarn, spark"
• Or specify your custom conf:
./build.sh -a bigtop -o ubuntu-16.04 \
  -f custom_site.yaml -t dws2017
46
Running images
HDFS + YARN + Spark
$ puppet apply
47
How to run
docker run --name sandbox -d \
  -p 50070:50070 -p 8088:8088 \
  evansye/sandbox:dws2017
docker logs -f sandbox
docker exec sandbox spark-example SparkPi
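Since the container starts detached, the services inside need some time before they answer on the forwarded ports. A script that follows `docker run` with `docker exec` may therefore want to poll first. The `retry` function below is a hypothetical helper written for this illustration (it is not part of Bigtop or Docker), and the commented usage line assumes `curl` is available on the host.

```shell
#!/bin/sh
# retry (hypothetical helper): run a command up to $1 times,
# sleeping $2 seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example (needs the sandbox container above to be running):
# retry 30 2 curl -sf -o /dev/null http://localhost:50070/ && echo "NameNode UI is up"
```

Polling the forwarded port 50070 is just one possible readiness signal; watching `docker logs -f sandbox` by hand works equally well for interactive use.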
48
49
                   Bigtop Provisioner   Bigtop Sandbox
Scalable           Yes                  No
Portable           No                   Yes
Flexibility        High                 Medium
Speed              > 2 mins             > 15 secs
Requires network   Yes                  No
Port forwarding    No                   Yes
50
                   Bigtop Provisioner                          Bigtop Sandbox
Data engineers     Multi-node cluster testing                  Build/use sandboxes for dev & test
Ops                Multi-node cluster testing                  Single node testing
Contributors       Test packages, puppet recipes, test cases   Test packages, puppet recipes, test cases
Distro. Builders   Test packages, puppet recipes, test cases   Provide Sandboxes
51
Integration test in CI/CD pipeline
[Diagram: CD pipeline with Bigtop Sandbox. Labels: Source code, Compile, Unit Test, Build Image, Integration test with Sandbox, Sandbox Service, Data, Push Image, Docker Registry, Deploy, FINISHED]
52
Future
• Production deployment using Sandbox images
▪ --net host or overlay network (SDN)?
▪ External volumes for edit logs, fsimages, etc
▪ Cluster orchestration
▪ Swarm, Kubernetes?
53
RELEASES
54
Bigtop 1.2.0, released April 2017
▪ New components:
▪ Ambari 2.5.0
▪ GPDB 5.0.0-alpha.0 (Greenplum)
▪ Featured upgrade:
▪ Hadoop 2.7.3
▪ Spark 2.1.0
▪ Kafka 0.10.1.1
▪ HBase 1.1.3
▪ and more
55
New features in Bigtop 1.2.0
• New features:
▪ Juju bigtop charms
▪ Bigtop Sandbox (alpha, recommended to try master)
• Improvement:
▪ Bigtop Docker Provisioner made faster
56
Juju Cloud Weather Report
http://bigtop.charm.qa/
57
Bigtop 1.2.1 upcoming
• Expected to be out late June
• Hadoop 2.7.4 (interested in having Docker container support backported, but I'm not sure yet)
• Mainly bug fixes:
▪ Packages
▪ Deployments
▪ Sandbox
58
Road ahead towards 1.3.0
• Machine Learning and Deep Learning integration
• Support AArch64
• Enhance the supported set in Bigtop Puppet (not all components are covered)
• Extend the CI matrix coverage to Bigtop Tests
• Ambari Bigtop stack integration
• Provide Big Data stack references
59
60
ODPi Apache Bigtop Test Drive Program
• Submit your proposal and contribute to Bigtop with funding!
• Improvements, new features, build, test, CI, etc.
• CFP opened June 13, 2017
• CFP closed July 14, 2017
• https://www.odpi.org/community/bigtopgrantfund
61
Reference
• Join the mailing list, ask questions, suggest features, etc.
• Contribute (components, tutorials, docs)
• Report bugs
▪ Home page: http://bigtop.apache.org/
▪ Mailing list: http://bigtop.apache.org/mail-lists.html
▪ Documentation: https://cwiki.apache.org/confluence/display/BIGTOP/Index
▪ Source code: https://github.com/apache/bigtop
▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP
62
63
Questions?

Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Último (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Leveraging Docker for Hadoop build automation and Big Data stack provisioning

  • 1. LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION AND BIG DATA STACK PROVISIONING. Evans Ye, Sr. Software Engineer. DataWorks Summit San Jose 2017
  • 2. Who am I
      • Tech Lead @ APAC Data Team, Y! Taiwan
      • Building data products for E-Commerce business
      • PMC chair of Apache Bigtop, ASF member
  • 3. Outline
      • Quick Intro to Apache Bigtop
      • Docker for Bigtop Packaging
      • Docker for Bigtop Provisioner
      • Docker for Bigtop Sandbox
      • Releases
  • 4. QUICK INTRO TO APACHE BIGTOP
  • 7. But there are some other great Hadoop ecosystem components...
  • 8. How do I add patches?
  • 10. From source code to packages: Bigtop Packaging
  • 12. Bigtop feature set: Packaging, Testing, Deployment, Virtualization, for you to easily build your own Big Data Stack
  • 13. Community stats
      • 94 total contributors
      • Spark: 1093, Hadoop: 99, HBase: 126, Hive: 115
      • 5 years since 2012
      • 30 Hadoop ecosystem components packaged
      • 5 Linux distros and 2 architectures supported
  • 14. DOCKER FOR BIGTOP PACKAGING
  • 17. Bigtop Toolchain
      • Puppet recipes to install required libraries and build tools
      • To prepare a build environment:
          ▪ Prerequisite: Java
          git clone https://github.com/apache/bigtop.git
          cd bigtop
          ./bigtop_toolchain/bin/puppetize.sh
          ./gradlew toolchain
  • 18. CI infrastructure: CentOS slave, Fedora slave, Ubuntu slave, Debian slave, OpenSuSE slave
  • 19. CI infrastructure: Bigtop Toolchain applied on each of the CentOS, Fedora, Ubuntu, Debian, and OpenSuSE slaves
  • 21. Dockerized CI infrastructure: CentOS slave, Fedora slave, Ubuntu slave, Debian slave, OpenSuSE slave
      • Immutable env
      • Fault tolerance
  • 23. How to build packages
      • Execute shell
      • Bigtop CI Setup Guide
          # OS=debian-8
          # COMPONENT=hadoop
          docker run -u jenkins --rm -v `pwd`:/bigtop --workdir /bigtop bigtop/slaves:trunk-$OS bash -l -c "./gradlew allclean $COMPONENT-pkg"
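The build command on this slide takes two parameters, the OS tag and the component, so it lends itself to a small loop. The following is a minimal sketch, not part of the talk: `build_cmd` is a hypothetical helper, and the component names other than `hadoop` are assumptions about which `-pkg` targets exist. It only prints the commands (a dry run) so that they can be reviewed before being executed.

```shell
# Sketch: print the slide's docker command for one OS/component pair.
# build_cmd is a hypothetical helper name, not something Bigtop provides.
build_cmd() {
  local os="$1" component="$2"
  printf 'docker run -u jenkins --rm -v "%s":/bigtop --workdir /bigtop bigtop/slaves:trunk-%s bash -l -c "./gradlew allclean %s-pkg"\n' \
    "$PWD" "$os" "$component"
}

# debian-8 and hadoop come from the slide; spark and hbase are assumed
# to follow the same <component>-pkg naming.
for component in hadoop spark hbase; do
  build_cmd debian-8 "$component"
done
```

Run from the root of a Bigtop checkout, the printed lines can be piped to `sh` once they look right; printing first avoids starting long builds by accident.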
  • 24. Bigtop packages on master: https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
  • 25. Extremely friendly for porting
      • Example: How to port Bigtop Distribution to PPC64LE?
          ▪ Prepare PPC64LE docker base image
          ▪ Apply Bigtop Toolchain on PPC64LE docker image
          ▪ Build Bigtop packages on PPC64LE slaves image
      • 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches
      • Credit: Amir Sanjar, IBM
  • 26. Bigtop early mission accomplished: leveraged by app providers…
  • 27. Get out from the Apache dome
  • 28. New focus and target user
      • Data engineers vs Distro. builders
      • Solution diversity:
          ▪ Streaming: Flink, Apex
          ▪ In-memory cache: Alluxio, Ignite
          ▪ User/developer tools:
              ▪ Bigtop Provisioner
              ▪ Bigtop Sandbox
      • Big data stack references
      • Machine learning, deep learning components
  • 29. DOCKER FOR BIGTOP PROVISIONER
  • 30. Bigtop Provisioner
      • A tool to demonstrate the full life cycle of Bigtop: Packaging, Testing, Deployment, Virtualization
      • Bigtop Provisioner: create resources, run Bigtop Puppet, run Bigtop Tests
  • 31. Vagrant Providers
      • We use Vagrant as an abstraction layer to support different kinds of resource providers
  • 32. One click Hadoop provisioning (Bigtop 1.0.0)
      • bigtop/deploy image on Docker Hub
      • ./docker-hadoop.sh -c 3 (puppet apply runs in each of the three containers)
      • https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
  • 33. Problems with Vagrant's Docker Provider
      • Need to add the vagrant public key into docker images
      • Too many issues with the auto-created boot2docker VM
      • A provisioning bug in the docker provider has stayed open for 2 years
          ▪ 'Waiting for machine to boot' hangs indefinitely
      • Cannot share the same code across different providers anyway
      • Not all docker options are supported in Vagrantfile
      • ^#?& slow
  • 34. Replaced by docker-compose (Bigtop 1.2.0)
      • bigtop/deploy image on Docker Hub
      • ./docker-hadoop.sh -c 3 (puppet apply runs in each of the three containers)
  • 35. Advantages
      • No need to create customized image beforehand
      • Better compatibility with Docker's native solutions
      • Clear, simple yaml file for orchestration settings
      • Supports new features such as overlay network
      • Leverage Swarm for multi-node cluster deployment
      • Fast -> better user experience
  • 36. How to run Docker Provisioner
      • Execute shell
      • Bigtop CI Setup Guide
          # See bigtop/provisioner/docker/*.yaml
          CONFIG=YOUR_CUSTOM_CONF.yaml
          # provision
          ./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 docker-provisioner
          # destroy provisioned cluster
          ./gradlew docker-provisioner-destroy
  • 37. YOUR_CUSTOM_CONF.yaml example
          docker:
            memory_limit: "4g"
            image: "bigtop/puppet:centos-7"
          repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
          distro: centos
          components: [hdfs, yarn, mapreduce]
          enable_local_repo: false
          smoke_test_components: [hdfs, yarn, mapreduce]
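Slides 36 and 37 together describe a two-step flow: write a config file, then point the Gradle task at it. A minimal sketch of that flow follows. The helper name `write_provisioner_conf` and the file name `my_conf.yaml` are placeholders, and the nesting of `memory_limit` and `image` under `docker:` is inferred from the flattened slide text, so check it against the files in bigtop/provisioner/docker/ before relying on it.

```shell
# Sketch: write the slide's example settings to a file.
# write_provisioner_conf is a hypothetical helper, not a Bigtop command.
write_provisioner_conf() {
  cat > "$1" <<'EOF'
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:centos-7"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
distro: centos
components: [hdfs, yarn, mapreduce]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, mapreduce]
EOF
}

# my_conf.yaml is a placeholder file name.
CONFIG="${CONFIG:-my_conf.yaml}"
write_provisioner_conf "$CONFIG"

# Print, rather than run, the provisioning command from slide 36.
echo "./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 docker-provisioner"
```

Keeping the config in its own file means the same cluster definition can be reused for provisioning and for the matching `./gradlew docker-provisioner-destroy` call.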
  • 39. Use cases
      • For application developers, cluster admins, users
          ▪ Run a Hadoop cluster to test your code on
          ▪ Try & test configurations before applying to Production
          ▪ Play around with Bigtop Big Data Stacks
      • For contributors
          ▪ Easy to test your packaging, deployment, testing code
      • For Distro. builders
          ▪ CI matrix -> patching upstream code made easier
  • 41. Introducing Bigtop Sandbox
      • Easy way to get started
      • Docker images that have Bigtop stacks installed and configured
      • Pseudo cluster up & running w/o installation
      • Command-line tool for you to build your own stack
  • 43. Docker image layers (concrete implementation): HDFS + YARN + Spark; Bigtop Puppet; bigtop/puppet:ubuntu-16.04
  • 45. site.yaml example
          bigtop::hadoop_head_node: bigtop.example.com
          bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
          hadoop::hadoop_storage_dirs: [/data/1, /data/2]
          hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
  • 46. How to build
          git clone https://github.com/apache/bigtop.git
          cd bigtop/docker/sandbox
          ./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, yarn, spark"
      • Or specify your custom conf:
          ./build.sh -a bigtop -o ubuntu-16.04 -f custom_site.yaml -t dws2017
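The custom-conf variant of the build needs a site file to exist first. The sketch below, offered as an illustration rather than the talk's own tooling, writes the four settings from the site.yaml slide into `custom_site.yaml` and then prints the matching `build.sh` call. `write_site_yaml` is a hypothetical helper, and the values are copied unchanged from the slides (including the Debian repository URI), so they may need adjusting for the target OS.

```shell
# Sketch: generate the site file used by the "-f custom_site.yaml" build.
# write_site_yaml is a hypothetical helper, not part of Bigtop.
write_site_yaml() {
  cat > "$1" <<'EOF'
bigtop::hadoop_head_node: bigtop.example.com
bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
hadoop::hadoop_storage_dirs: [/data/1, /data/2]
hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
EOF
}

SITE_FILE="${SITE_FILE:-custom_site.yaml}"
write_site_yaml "$SITE_FILE"

# Print, rather than run, the build command from the slide.
echo "./build.sh -a bigtop -o ubuntu-16.04 -f ${SITE_FILE} -t dws2017"
```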
  • 48. How to run
          docker run --name sandbox -d -p 50070:50070 -p 8088:8088 evansye/sandbox:dws2017
          docker logs -f sandbox
          docker exec sandbox spark-example SparkPi
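Because the container starts in the background (`-d`), the services inside may not be ready when the next command runs. One way to handle that, sketched here as an assumption rather than something shown in the talk, is a small retry helper that polls until a probe command succeeds. `wait_until` is a hypothetical name, and the probe is whatever check suits the environment.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper, not part of Docker or Bigtop.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    # Run the probe quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, assuming `curl` is installed and port 50070 is published as on the slide, `wait_until 30 2 curl -fs http://localhost:50070/ && docker exec sandbox spark-example SparkPi` would run the Spark example only after the web UI on that port answers.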
  • 50. Bigtop Provisioner vs. Bigtop Sandbox
                          Bigtop Provisioner   Bigtop Sandbox
      Scalable            Yes                  No
      Portable            No                   Yes
      Flexibility         High                 Medium
      Speed               > 2 mins             > 15 secs
      Requires network    Yes                  No
      Port forwarding     No                   Yes
  • 51. Who uses what
                          Bigtop Provisioner                           Bigtop Sandbox
      Data engineers      Multi-node cluster testing                   Build/use sandboxes for dev & test
      Ops                 Multi-node cluster testing                   Single node testing
      Contributors        Test packages, puppet recipes, test cases    Test packages, puppet recipes, test cases
      Distro. builders    Test packages, puppet recipes, test cases    Provide sandboxes
  • 52. Integration test in CI/CD pipeline: CD pipeline with Bigtop Sandbox
      • Diagram labels: Source code, Compile, Unit Test, Build Image, Integration test with Sandbox (Sandbox, Service, Data), Push Image, Docker Registry, Deploy
  • 53. Future
      • Production deployment using Sandbox images
          ▪ --net host or overlay network (SDN)?
          ▪ External volumes for edit logs, fsimages, etc.
          ▪ Cluster orchestration
              ▪ Swarm, Kubernetes?
  • 55. Bigtop 1.2.0, released April 2017
      • New components:
          ▪ Ambari 2.5.0
          ▪ GPDB 5.0.0-alpha.0 (Greenplum)
      • Featured upgrades:
          ▪ Hadoop 2.7.3
          ▪ Spark 2.1.0
          ▪ Kafka 0.10.1.1
          ▪ HBase 1.1.3
          ▪ and more
  • 56. New features in Bigtop 1.2.0
      • New features:
          ▪ Juju bigtop charms
          ▪ Bigtop Sandbox (alpha, recommended to try master)
      • Improvement:
          ▪ Bigtop Docker Provisioner made faster
  • 57. Juju Cloud Weather Report: http://bigtop.charm.qa/
  • 58. Bigtop 1.2.1 upcoming
      • Expected to be out late June
      • Hadoop 2.7.4 (interested in having docker container support backported, but I'm not sure yet)
      • Mainly bug fixes:
          ▪ Packages
          ▪ Deployments
          ▪ Sandbox
  • 59. Road ahead towards 1.3.0
      • Machine Learning and Deep Learning integration
      • Support aarch64
      • Enhance support set in Bigtop Puppet (not all components covered)
      • Extend the CI matrix coverage to Bigtop Tests
      • Ambari Bigtop stack integration
      • Provide Big data stack references
  • 61. ODPi Apache Bigtop Test Drive Program
      • Submit your proposal, contribute to Bigtop w/ funding!
      • Improvements, new features, build, test, CI, etc.
      • CFP opened June 13, 2017; CFP closed July 14, 2017
      • https://www.odpi.org/community/bigtopgrantfund
  • 62. Reference
      • Join mailing list, ask questions, suggest features, etc.
      • Contribute (components, tutorials, docs)
      • Report bugs
          ▪ Home page: http://bigtop.apache.org/
          ▪ Mailing list: http://bigtop.apache.org/mail-lists.html
          ▪ Document: https://cwiki.apache.org/confluence/display/BIGTOP/Index
          ▪ Source code: https://github.com/apache/bigtop
          ▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
          ▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP