Building Hadoop Based
Big Data Environment
Evans Ye @ TWHUG

2013/12/14
Who am I
• Evans Ye @

• Dumbo Team
• http://dumbointaiwan.blogspot.tw/
12/14/2013

Copyright 2013 Trend Micro Inc.
Agenda
• Building your own Hadoop version
• Hadoop Deployment

• Hadoop release engineering
• The development environment

• Bigtop puppet

Why Build our own version
• Add your own patches at any time
– The community has to take care of backward compatibility,
which takes much more time and effort.

• Pull official patches into the version you currently run
– You may not upgrade your Hadoop version frequently,
but you may still need a specific patch.

• Flexibility, business-needed features
As a Beginner

Build Hadoop Infrastructure


What’s your work?
….


I thought you just needed to yum install Hadoop.
Brute force
• git clone
• Make some changes

• Build binary tarball

How to do version control?

core-site.xml
hdfs-site.xml
mapred-site.xml
…

Bigtop

How Bigtop helps you
• Apache Hadoop app developers:
– Run a pseudo-distributed Hadoop cluster to test your code on.

• Vendors:
– Build your own Apache Hadoop distribution, customized from
Apache Bigtop bits.

• Packaging, Deployment, Integration Testing

Supported Linux Distro
• Ubuntu 10.10
• CentOS 5/6

• Fedora 18
• Mageia 1

• openSUSE 12.2

Build
• Build hadoop-common (see BUILDING.txt)
– hadoop-common$ mvn package -Pdist,docs,src,native -Dtar

• Prepare your src tar in bigtop
• Bigtop$ make hadoop-rpm
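
The build steps above could be scripted roughly as follows. This is only a sketch, not the speaker's tooling: the directory names `hadoop-common` and `bigtop` are assumed checkout locations, and the "prepare your src tar" step is left manual because the slide does not say where the tarball goes. The function only assembles the two command lines shown on the slide and prints them unless `dry_run=False` is passed.

```python
import subprocess


def build_commands(hadoop_dir="hadoop-common", bigtop_dir="bigtop"):
    """Return (working_dir, argv) pairs for the two build commands on the slide."""
    return [
        # Build the Hadoop distribution tarball (see BUILDING.txt).
        (hadoop_dir, ["mvn", "package", "-Pdist,docs,src,native", "-Dtar"]),
        # Package the prepared source tarball as RPMs with Bigtop.
        (bigtop_dir, ["make", "hadoop-rpm"]),
    ]


def run_build(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in build_commands():
        print(f"({cwd})$ {' '.join(argv)}")
        if not dry_run:
            # check=True stops the build at the first failing command.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_build()
```

Keeping the commands in one list makes it easy to hand the same steps to a CI job later.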

Hadoop Deployment

Configuration files
• Hadoop-related config
– core-site.xml
– hdfs-site.xml
– mapred-site.xml
– log4j.properties
– hadoop-env.sh
– fair-scheduler.xml
– rack-topology
– hadoop-metrics.properties
– taskcontroller.cfg

Local Directories
• Hadoop-related files and directories
– Namenode metadata
• /name/1, /name/2

– Datanode
• /data/1, /data/2, /data/3, /data/4

– Tasktracker
• /mapred/1/local, /mapred/2/local

–…
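
As an illustration of how these directories end up in the config files, the sketch below renders them as comma-separated values of the Hadoop 1.x properties `dfs.name.dir`, `dfs.data.dir` and `mapred.local.dir`. The slide does not name the properties, so the mapping and the helper names here are assumptions made for the example.

```python
from xml.sax.saxutils import escape

# Directory layout from the slide, grouped by the site file it belongs to.
# The property names are the Hadoop 1.x ones (an assumption, see above).
SITE_FILES = {
    "hdfs-site.xml": {
        "dfs.name.dir": ["/name/1", "/name/2"],
        "dfs.data.dir": ["/data/1", "/data/2", "/data/3", "/data/4"],
    },
    "mapred-site.xml": {
        "mapred.local.dir": ["/mapred/1/local", "/mapred/2/local"],
    },
}


def render_property(name, dirs):
    """Render one <property> element with a comma-separated directory list."""
    value = ",".join(dirs)
    return (
        "  <property>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <value>{escape(value)}</value>\n"
        "  </property>"
    )


def render_configuration(props):
    """Render a whole <configuration> document for one site file."""
    body = "\n".join(render_property(n, d) for n, d in props.items())
    return f"<configuration>\n{body}\n</configuration>"


if __name__ == "__main__":
    for filename, props in SITE_FILES.items():
        print(f"# {filename}")
        print(render_configuration(props))
```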

More hadoop ecosystem

Problems to solve
• Lots of nodes need to be configured
• Less human involvement, fewer mistakes

• Configuration changes quite often
– adjust the fair scheduler
– enable/disable short-circuit reads
– try more performance-tuning configurations

Hadooppet

What is puppet ?
• An IT automation tool that helps system administrators
automate repetitive tasks
• You only need to define the desired state
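
The "desired state" idea can be sketched in a few lines. This is not Puppet's API or implementation, just a toy model with hypothetical names: you declare what each resource should look like, the tool compares that with the current state, and it changes only what differs. Running it a second time finds nothing left to do.

```python
def plan_changes(desired, current):
    """Return the changes needed to move `current` to `desired`.

    Both arguments map a resource name to its state. Each change is a
    (name, old_state, new_state) tuple; old_state is None if unknown.
    """
    changes = []
    for name, state in sorted(desired.items()):
        if current.get(name) != state:
            changes.append((name, current.get(name), state))
    return changes


def apply_changes(desired, current):
    """Apply the plan to a copy of `current` and return the new state."""
    new_state = dict(current)
    for name, _old, new in plan_changes(desired, current):
        new_state[name] = new
    return new_state


if __name__ == "__main__":
    desired = {"hadoop": "installed", "ntp": "running"}
    current = {"hadoop": "absent"}
    print(plan_changes(desired, current))
    converged = apply_changes(desired, current)
    # A second run has nothing left to change.
    print(plan_changes(desired, converged))
```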

What is Hadooppet ?
• A general Hadoop cluster deployment tool based on Puppet
• Kerberos / LDAP configured automatically
• A set of Hadoop / Kerberos management tools
• A set of sanity check scripts for Trend Hadoop-related
services
• Configuration managed on the puppetmaster

Design
• Abstract environment-specific configurations in a single
configuration file
• setup.sh
– namenode_fqdns=("dev1.example.com" "dev2.example.com")
– namenode_dirs=("/name/1" "/name/2")
– namenode_heap=32g
– map_slots=5
– reduce_slots=3
– …

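
To show what "a single configuration file" buys you, here is a minimal sketch that reads settings in the setup.sh style into a dictionary a deployment tool could then feed into templates. How Hadooppet actually consumes setup.sh is not described in the slides; `parse_settings` is a hypothetical helper and handles only plain `key=value` lines and simple bash arrays, not general shell syntax.

```python
import shlex

# The example values from the slide.
SETUP_SH = """
namenode_fqdns=("dev1.example.com" "dev2.example.com")
namenode_dirs=("/name/1" "/name/2")
namenode_heap=32g
map_slots=5
reduce_slots=3
"""


def parse_settings(text):
    """Parse `key=value` and `key=(a b c)` lines into a dict.

    Array values become lists of strings; scalars stay strings.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.startswith("(") and value.endswith(")"):
            # Bash array: split the inner words using shell quoting rules.
            settings[key] = shlex.split(value[1:-1])
        else:
            words = shlex.split(value)
            settings[key] = words[0] if words else ""
    return settings


if __name__ == "__main__":
    for key, value in parse_settings(SETUP_SH).items():
        print(key, "=", value)
```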
Benefits
• Can be used to set up any kind of Hadoop cluster
• Minimizes downtime during a major version upgrade
– hadoop1 → hadoop2
• hadoop1: Namenode, Secondarynamenode
• hadoop2: Active/Standby Namenode, Journalnodes, ZKFC

Release Engineering

Manually
• Build the src tarball in hadoop-common
• Build RPMs in Bigtop

• Submit the build to the release yum repo
• yum update on the Hadoop cluster…

Continuous Integration
• Set up a hadoop-common daily build
• Set up a Bigtop release build
– should be manually triggered

• Set up a Hadooppet daily build
– Run sanity checks on a REAL CLUSTER

Virtualization
• Build a Xen Server Cluster

give-me-vm
• PyCon 2012
– Small Python Tools for Software Release Engineering

• An automation tool to manage the VM lifecycle
• Uses the Python XenAPI

• Creates temporary VMs for testing, self-service
• Destroys them when testing is finished
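
The create-then-always-destroy lifecycle maps naturally onto a Python context manager. The sketch below is not the give-me-vm code: `FakeVMBackend` is a hypothetical stand-in that only tracks VM names, where the real tool would talk to XenServer through XenAPI.

```python
from contextlib import contextmanager


class FakeVMBackend:
    """Stand-in for a real hypervisor client; it only tracks VM names."""

    def __init__(self):
        self.running = set()

    def create(self, name):
        self.running.add(name)
        return name

    def destroy(self, name):
        self.running.discard(name)


@contextmanager
def temporary_vm(backend, name):
    """Create a VM for the duration of a `with` block, then destroy it."""
    vm = backend.create(name)
    try:
        yield vm
    finally:
        # Runs even if the tests inside the block raise.
        backend.destroy(vm)


if __name__ == "__main__":
    backend = FakeVMBackend()
    with temporary_vm(backend, "test-vm") as vm:
        print("testing on", vm, "running:", sorted(backend.running))
    print("after:", sorted(backend.running))
```

Tying destruction to the end of the block means a failed test run cannot leak VMs on the shared cluster.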
Build auto deployment on Hadooppet
• ./give_me_vm.py
• Set up passphraseless SSH between the VMs

• Set hostnames
• Install Hadooppet on the master

• Run deployment
• Run sanity checks
• ./destroy_vm.py
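
The sequence above amounts to an ordered pipeline with a guaranteed cleanup step. A minimal sketch, assuming each step is wrapped in a Python callable (the step bodies below are placeholders, not the real scripts):

```python
def run_pipeline(steps, teardown):
    """Run (name, callable) steps in order and stop at the first failure.

    `teardown` always runs, so the test VMs are released either way.
    Returns the names of the steps that completed successfully.
    """
    completed = []
    try:
        for name, step in steps:
            try:
                step()
            except Exception as exc:
                print(f"step '{name}' failed: {exc}")
                break
            completed.append(name)
    finally:
        teardown()
    return completed


if __name__ == "__main__":
    log = []
    # Placeholder callables; a real job would invoke the actual scripts.
    steps = [
        ("give_me_vm", lambda: log.append("VMs created")),
        ("setup ssh", lambda: log.append("ssh configured")),
        ("set hostname", lambda: log.append("hostnames set")),
        ("install Hadooppet", lambda: log.append("Hadooppet installed")),
        ("run deployment", lambda: log.append("cluster deployed")),
        ("run sanity checks", lambda: log.append("checks passed")),
    ]
    done = run_pipeline(steps, teardown=lambda: log.append("VMs destroyed"))
    print(done)
    print(log)
```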
Development Environment

For hadoop service developers…
• Not enough Hadoop clients for every developer
• Developers cannot reach the server side while developing
Hadoop-related services
• Cannot experiment with new technologies like Impala, Spark,
Flume

• CI on Hadoop-related services

give-me-vm + Hadoop all-in-one VM
• Use Hadooppet to set up a pseudo-distributed Hadoop
VM as a XenServer template

• Get a Hadoop all-in-one VM via give-me-vm
• Services integrate their CI tests with the Hadoop all-in-one VM

Bigtop puppet

Bigtop puppet
• Bigtop also has a set of puppet scripts to deploy
Hadoop ecosystem

Bigtop puppet
• Preparation:
– A VM with jdk, puppet installed
– mkdir -p /data/{1,2}
– git clone https://github.com/apache/bigtop.git

Conclusion
• Many great deployment tools exist
– Ambari, CM, ETU appliance
– Choose a distribution that suits your business needs

• If you want to do it yourself
– Bigtop makes packaging easy
– Leverage the Bigtop puppet modules for your deployment

Questions?
Thank you !

Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Último (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Building hadoop based big data environment

  • 1. Building Hadoop Based Big Data Environment Evans Ye @ TWHUG 2013/12/14
  • 2. Who am I • Evans Ye @ • Dumbo Team • http://dumbointaiwan.blogspot.tw/ 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 3. Agenda • Building your own Hadoop version • Hadoop Deployment • Hadoop release engineering • The development environment • Bigtop puppet
  • 4. Why build our own version • Add your own patches at any time – The community has to take care of backward compatibility, which takes much more time and effort. • Fetch official patches into the version currently in use – You may not upgrade your Hadoop version frequently, but there is a specific need for that patch. • Flexibility: features the business needs
  • 5. As a Beginner
  • 6. Build Hadoop Infrastructure • What’s your work?
  • 7. …. • I thought you just need to yum install Hadoop.
  • 8. Brute force • git clone • Make some changes • Build binary tarball • How to do version control? core-site.xml hdfs-site.xml mapred-site.xml …
  • 10. How Bigtop helps you • Apache Hadoop App developers: – Run pseudo-distributed Hadoop cluster to test your code on. • Vendors: – Build your own Apache Hadoop distribution, customized from Apache Bigtop bits. • Packaging, Deployment, Integration Testing
  • 11. Supported Linux Distros • Ubuntu 10.10 • CentOS 5/6 • Fedora 18 • Mageia 1 • openSUSE 12.2
  • 12. Build • Build hadoop-common (see BUILDING.txt) – hadoop-common$ mvn package -Pdist,docs,src,native -Dtar • Prepare your src tar in Bigtop • Bigtop$ make hadoop-rpm
  • 14. Configuration files • Hadoop related config – core-site.xml – hdfs-site.xml – mapred-site.xml – log4j.properties – hadoop-env.sh – fair-scheduler.xml – rack-topology – hadoop-metrics.properties – taskcontroller.cfg
  • 15. Local Directories • Hadoop related files and directories – Namenode metadata • /name/1, /name/2 – Datanode • /data/1, /data/2, /data/3, /data/4 – Tasktracker • /mapred/1/local, /mapred/2/local – …
  • 17. Problems to solve • Many nodes need to be configured • The less human involvement, the fewer mistakes • Configuration changes quite often – adjust the fair scheduler – enable/disable short-circuit reads – try more performance-tuning configurations
  • 19. What is Puppet? • An IT automation tool that helps system administrators automate many repetitive tasks • You only need to define the desired state
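The "define the desired state" idea from slide 19 can be sketched in a few lines of Python. This is not Puppet code and uses no Puppet API; `ensure_directory` is a hypothetical helper that only illustrates the idempotent check-then-converge pattern a resource declaration stands for.

```python
import os
import tempfile


def ensure_directory(path, mode=0o755):
    """Converge `path` to the desired state: an existing directory with `mode`.

    Returns True if something had to change, False if it already complied.
    (Hypothetical helper for illustration; not part of Puppet.)
    """
    changed = False
    if not os.path.isdir(path):
        os.makedirs(path)
        changed = True
    # Compare only the permission bits against the desired mode.
    current = os.stat(path).st_mode & 0o777
    if current != mode:
        os.chmod(path, mode)
        changed = True
    return changed


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    target = os.path.join(root, "data", "1")
    print(ensure_directory(target))  # first run has to create the directory
    print(ensure_directory(target))  # second run finds nothing left to do
```

Running the same declaration repeatedly is safe, which is what makes this style suitable for configuring many nodes without manual steps.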
  • 20. What is Hadooppet? • A general Hadoop cluster deployment tool based on Puppet • Kerberos / LDAP configured automatically • A set of Hadoop / Kerberos management tools • A set of sanity check scripts for Trend Hadoop-related services • Manages configuration on the puppetmaster
  • 21. Design • Abstract environment-specific configurations in a single configuration file • setup.sh – namenode_fqdns=("dev1.example.com" "dev2.example.com") – namenode_dirs=("/name/1" "/name/2") – namenode_heap=32g – map_slots=5 – reduce_slots=3 – …
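As a rough illustration of the single-configuration-file design on slide 21, the sketch below derives one Hadoop config file from one shared settings dictionary. It is an assumption-laden example, not how Hadooppet is implemented: `SETTINGS`, `render_site_xml`, and `hdfs_site` are hypothetical names, and only the `namenode_dirs` value from the slide is mapped to a property.

```python
import xml.etree.ElementTree as ET

# Environment-specific settings, mirroring variables listed for setup.sh.
SETTINGS = {
    "namenode_fqdns": ["dev1.example.com", "dev2.example.com"],
    "namenode_dirs": ["/name/1", "/name/2"],
}


def render_site_xml(properties):
    """Render a dict of name/value pairs as a Hadoop *-site.xml document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def hdfs_site(settings):
    """Derive hdfs-site.xml content from the shared settings (hypothetical)."""
    return render_site_xml({
        # Hadoop 2 property name; Hadoop 1 used dfs.name.dir instead.
        "dfs.namenode.name.dir": ",".join(settings["namenode_dirs"]),
    })


if __name__ == "__main__":
    print(hdfs_site(SETTINGS))
```

Keeping every environment-specific value in one place means a new cluster only needs a new settings file, while the rendering logic stays unchanged.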
  • 22. Benefits • Can be used to set up any kind of Hadoop cluster • Minimizes downtime during a major version upgrade – hadoop1 → hadoop2 (Namenode, Secondarynamenode → Active/Standby Namenode, Journalnodes, ZKFC)
  • 24. Manually • Build the src tarball in hadoop-common • Build RPMs in Bigtop • Submit the build to the release yum repo • yum update on the Hadoop cluster…
  • 25. Continuous Integration • Set up a hadoop-common daily build • Set up a Bigtop release build – should be manually triggered • Set up a Hadooppet daily build – Run sanity checks on a REAL CLUSTER
  • 26. Virtualization • Build a XenServer cluster
  • 28. give-me-vm • PyCon 2012 – Small Python Tools for Software Release Engineering • An automation tool to manage the VM lifecycle • Uses the Python XenAPI • Create a temporary VM for testing via self-service • Destroy it when the testing is finished
  • 29. Build auto deployment on Hadooppet • ./give_me_vm.py • Set up passphraseless ssh between the VMs • Set hostnames • Install Hadooppet on the master • Run the deployment • Run sanity checks • ./destroy_vm.py
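The create, deploy, check, destroy sequence on slide 29 can be sketched as a small Python driver. This is a minimal sketch under assumptions: `run_pipeline` and the callables passed to it are hypothetical stand-ins, and nothing here calls give-me-vm, Hadooppet, or XenAPI. The point it illustrates is that the temporary VMs are destroyed even when a step fails.

```python
def run_pipeline(create_vms, steps, destroy_vms):
    """Create VMs, run each step in order, and always destroy the VMs.

    `create_vms` returns a list of VM handles, each step is a callable
    taking that list, and `destroy_vms` releases them. All three are
    supplied by the caller (hypothetical interface for illustration).
    """
    vms = create_vms()
    try:
        for step in steps:
            step(vms)
    finally:
        # Runs on success and on failure, so test VMs never leak.
        destroy_vms(vms)


if __name__ == "__main__":
    log = []
    run_pipeline(
        create_vms=lambda: ["vm1", "vm2"],
        steps=[
            lambda vms: log.append("set up ssh"),
            lambda vms: log.append("set hostnames"),
            lambda vms: log.append("install Hadooppet on master"),
            lambda vms: log.append("run deployment"),
            lambda vms: log.append("run sanity checks"),
        ],
        destroy_vms=lambda vms: log.append("destroy"),
    )
    print(log)
```

A failing sanity check still propagates as an exception, so a CI job wrapping this driver would be marked as failed after cleanup.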
  • 32. For Hadoop service developers… • Not enough Hadoop clients for every developer • Developers cannot reach the server side while developing Hadoop-related services • Cannot experiment with new technologies like Impala, Spark, Flume • CI on Hadoop-related services
  • 33. give-me-vm + Hadoop all-in-one VM • Use Hadooppet to set up a pseudo-distributed Hadoop VM as a XenServer template • Get a Hadoop all-in-one VM via give-me-vm • Services integrate their CI tests with the Hadoop all-in-one VM
  • 35. Bigtop puppet • Bigtop also has a set of puppet scripts to deploy the Hadoop ecosystem
  • 36. Bigtop puppet • Preparation: – A VM with JDK and Puppet installed – mkdir -p /data/{1,2} – git clone https://github.com/apache/bigtop.git
  • 37. Conclusion • Many great deployment tools exist – Ambari, CM, ETU appliance – Choose a suitable distribution based on your business needs • If you want to do it yourself – Bigtop can easily do the packaging for you – Leverage the Bigtop puppet module for your deployment