SlideShare a Scribd company logo
1 of 35
Download to read offline
Continuous Integration
on top of Hadoop
Wisely Chen and Neal Lee
Saturday, August 3, 13
Agenda
• Who I am
• Problem
• Solution
• Demo
• Q&A
Saturday, August 3, 13
Who I am
• Wisely Chen ( thegiive@gmail.com )
• Release manager of Yahoo![Taiwan] shopping and data team
• Loves to promote open source tech in Taiwan
• Hadoop Summit 2013 San Jose
• Ruby and Rails : Coscup 2006, Ubisunrise 2007, OSDC 2007
• Puppet : PHPConf 2012 , RubyConf 2012
• Release Practice :Webconf 2013, Coscup 2012
Saturday, August 3, 13
Who I am
• Neal Lee (@neal_lee)
• Data Engineer at Yahoo![Taiwan] data team
• Aims to build a easy to use self-service BI platform
connecting to Hadoop.
Saturday, August 3, 13
EC Data Team
拍賣/商城/購物中心
站台
流量/點擊/使用者行為 追蹤
Transactional
data
Tracking data
Data
Highway
Data Warehouse/
Data Mart
Data
Infra BI
Platform
Report
Recommendation
API
Machine
Learning
Serve
Saturday, August 3, 13
Problem : Debug
Saturday, August 3, 13
Problem : Performance
Saturday, August 3, 13
Solution
Saturday, August 3, 13
Continuous Integration
Saturday, August 3, 13
Continuous Integration
• A software engineering practice
• Maintain code repos
• Automate the build
• Make the build self-testing
• Everyone commit to the baseline everyday
• Every commit should be a build
• Test in a clone of production environment
• Make it easy to get the latest deliverables
• Everyone can see the result of latest build
• Automate deployment
Saturday, August 3, 13
We focus on
• A software engineering practice
• Maintain code repos
• Automate the build
• Make the build self-testing
• Everyone commits to the baseline everyday
• Every commit should be a build
• Test in a clone of production environment
• Make it easy to get the latest deliverables
• Everyone can see the result of latest build
• Automate deployment
Saturday, August 3, 13
CI on Hadoop Flow
Code
Unit
Test
Performance
Test
Deploy Doc Execution
Saturday, August 3, 13
One Click Deploy
Commit
Unit
Test
Performance
Test
Deploy Doc Execution
Saturday, August 3, 13
Toolset
Commit
Unit
Test
Performance
Test
Deploy Doc Execution
Vaidya
BASH
Saturday, August 3, 13
System diagram
CI Master
GitHub
Alpha
CI Slave
Beta Cluster
Hadoop
JobTracker
CI Slave Hadoop
node
Hadoop
node
Hadoop
node
Hadoop
node
Slave
Node
Prod ClusterGateway
Saturday, August 3, 13
Unit Test
Commit
Unit
Test
Performance
Test
Deploy Doc Execution
Saturday, August 3, 13
PigUnit
• A simple xUnit framework
• No cluster set up is required in local mode
• Unit testing, regression testing, and rapid
prototyping on the fly
Saturday, August 3, 13
Using PigUnit
• After
• Coding
• Write PigUnit test case
• Run local PigUnit test
• Push to cluster
• Run Pig on cluster
• Get right result !
• Before
• Coding
• Manual local test
• Push to cluster
• Run Pig on cluster
• Get right result !
Saturday, August 3, 13
Unit test is live doc
• Unit test is runnable live doc
• Pass test case and meet previous
requirement
Saturday, August 3, 13
Flexible
• Pig can use PigUnit
• MapReduce can use MapUnit
• Hive can use hive_test
Saturday, August 3, 13
Performance Test
Commit
Unit
Test
Performance
Test
Deploy Doc Execution
Saturday, August 3, 13
Vaidya
• Rule based performance diagnosis of M/R jobs
• Extensible framework
• You can add your own rules
• Write complex rules using existing rules
Saturday, August 3, 13
Performance Test
Pig Job
Pig Job
History
Vaidya
Vaidya
Rule
4
Pig Job
Conf
Notify
User
3
Performance
result
Next CI
Stage
1
1
2
2
2
5
1. Exec pig job with sampling data on beta server
2. Vaidya read job history,conf,rule
to check performance problem
3. If ok, create performance result
4. If job has performance issue,
notify user
5. Go to next CI stage
Sampling
data
1
Saturday, August 3, 13
Vaidya Rule<Diagnos)cTest>
<Title><![CDATA[Balanaced Reduce Partitioning]]></Title>
	
  <ClassName>
	
  	
  	
  	
  <![CDATA[
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  org.apache.hadoop.vaidya.postexdiagnosis.tests.BalancedReducePar77oning
	
  	
  	
  	
  ]]>
</ClassName>
<Descrip)on>
	
  	
  	
  	
  	
  <![CDATA[This	
  rule	
  tests	
  as	
  to	
  how	
  well	
  the	
  input	
  to	
  reduce	
  tasks	
  is	
  balanced]]>
</Descrip)on>
<Importance><![CDATA[High]]></Importance>
<SuccessThreshold><![CDATA[0.20]]></SuccessThreshold>
<Prescrip)on><![CDATA[advice]]></Prescrip)on>
</Diagnos)cTest>
See	
  if	
  the	
  reduce	
  job	
  is	
  
balance	
  or	
  not	
  
Rule	
  importance
Diagnose	
  success	
  
threshold
Test	
  Java	
  Class
Saturday, August 3, 13
Deploy
Commit
Unit
Test
Performance
Test
Deploy Doc Execution
Saturday, August 3, 13
Deploy
• Deploy to production cluster
• Easy to rollback
• Create a git tag
• Auto doc generating
• Each release should map to a ticket
• Auto comment in Bugzilla
Saturday, August 3, 13
Auto comment in bugzilla
Repo url
Release
Note
Issue status
change
Saturday, August 3, 13
Auto create git tag
Release Note
[Bug xxx] log....
Git Tag
Saturday, August 3, 13
DEMO
Saturday, August 3, 13
Demo
• Demo1 : Unit test fail
• Demo2 : Unit test success
• Demo3 : Check performance test
• Demo4 :Auto generate Doc
• Demo5 : Notify user
Saturday, August 3, 13
Demo
Saturday, August 3, 13
Conclusion
• CI will revolutionize your workflow
• CI will boost your productivity
Saturday, August 3, 13
Saturday, August 3, 13
Logic Debug
• Map/Reduce	
  job	
  oJen	
  takes	
  a	
  lot	
  of	
  )me	
  for	
  execu)on
• Repeated	
  Map/Reduce	
  execu)on	
  cost	
  	
  a	
  lot	
  of	
  )me	
  
during	
  logic	
  debugging	
  phase
• Need	
  a	
  way	
  to	
  find	
  out	
  logic	
  problem	
  before	
  
execu)on	
  produc)on	
  job
• Coding
Manual
Test
Exec
Get Bug
Saturday, August 3, 13
Performance
• Map/Reduce	
  performance	
  is	
  hard	
  to	
  es)mate	
  before	
  execu)on	
  
• Production Grid computing resource is shared by allYahoos
• Bad performance will affect otherYahoos Grid jobs
• Putting bad performance code on production grid is guilty
• We manually investigate the job performance before we actually execute it
on production Grid
Coding
Manual
Test
Manual
investgate
Get Bug
Saturday, August 3, 13

More Related Content

What's hot

LCE13: LAVA Multi-Node Testing
LCE13: LAVA Multi-Node TestingLCE13: LAVA Multi-Node Testing
LCE13: LAVA Multi-Node TestingLinaro
 
Java 9 Functionality and Tooling
Java 9 Functionality and ToolingJava 9 Functionality and Tooling
Java 9 Functionality and ToolingTrisha Gee
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadAll Things Open
 
LCA13: LAVA Workshop Day 1: Introduction
LCA13: LAVA Workshop Day 1: IntroductionLCA13: LAVA Workshop Day 1: Introduction
LCA13: LAVA Workshop Day 1: IntroductionLinaro
 
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)Linaro
 
Perl Continous Integration
Perl Continous IntegrationPerl Continous Integration
Perl Continous IntegrationMichael Peters
 
Intro to Ratpack (CDJDN 2015-01-22)
Intro to Ratpack (CDJDN 2015-01-22)Intro to Ratpack (CDJDN 2015-01-22)
Intro to Ratpack (CDJDN 2015-01-22)David Carr
 
Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...All Things Open
 
Puppet Camp Austin 2015: Getting Started with Puppet
Puppet Camp Austin 2015: Getting Started with PuppetPuppet Camp Austin 2015: Getting Started with Puppet
Puppet Camp Austin 2015: Getting Started with PuppetPuppet
 
Refactoring to Java 8 (QCon New York)
Refactoring to Java 8 (QCon New York)Refactoring to Java 8 (QCon New York)
Refactoring to Java 8 (QCon New York)Trisha Gee
 
Adding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipelineAdding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipelineEduardo Piairo
 
Ratpack Web Framework
Ratpack Web FrameworkRatpack Web Framework
Ratpack Web FrameworkDaniel Woods
 
Asynchronous job queues with python-rq
Asynchronous job queues with python-rqAsynchronous job queues with python-rq
Asynchronous job queues with python-rqAshish Acharya
 
Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneSylvain Zimmer
 
Extreme CI Savings with Bamboo 3.1: The JIRA Story
Extreme CI Savings with Bamboo 3.1: The JIRA StoryExtreme CI Savings with Bamboo 3.1: The JIRA Story
Extreme CI Savings with Bamboo 3.1: The JIRA StoryAtlassian
 
Adding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipelineAdding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipelineEduardo Piairo
 

What's hot (20)

LCE13: LAVA Multi-Node Testing
LCE13: LAVA Multi-Node TestingLCE13: LAVA Multi-Node Testing
LCE13: LAVA Multi-Node Testing
 
Java 9 Functionality and Tooling
Java 9 Functionality and ToolingJava 9 Functionality and Tooling
Java 9 Functionality and Tooling
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language Instead
 
LCA13: LAVA Workshop Day 1: Introduction
LCA13: LAVA Workshop Day 1: IntroductionLCA13: LAVA Workshop Day 1: Introduction
LCA13: LAVA Workshop Day 1: Introduction
 
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
 
Perl Continous Integration
Perl Continous IntegrationPerl Continous Integration
Perl Continous Integration
 
Logstash and friends
Logstash and friendsLogstash and friends
Logstash and friends
 
Intro to Ratpack (CDJDN 2015-01-22)
Intro to Ratpack (CDJDN 2015-01-22)Intro to Ratpack (CDJDN 2015-01-22)
Intro to Ratpack (CDJDN 2015-01-22)
 
Nautilus
NautilusNautilus
Nautilus
 
Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...
 
Puppet Camp Austin 2015: Getting Started with Puppet
Puppet Camp Austin 2015: Getting Started with PuppetPuppet Camp Austin 2015: Getting Started with Puppet
Puppet Camp Austin 2015: Getting Started with Puppet
 
Refactoring to Java 8 (QCon New York)
Refactoring to Java 8 (QCon New York)Refactoring to Java 8 (QCon New York)
Refactoring to Java 8 (QCon New York)
 
Adding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipelineAdding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipeline
 
Test driving-qml
Test driving-qmlTest driving-qml
Test driving-qml
 
Ratpack Web Framework
Ratpack Web FrameworkRatpack Web Framework
Ratpack Web Framework
 
Geb Best Practices
Geb Best PracticesGeb Best Practices
Geb Best Practices
 
Asynchronous job queues with python-rq
Asynchronous job queues with python-rqAsynchronous job queues with python-rq
Asynchronous job queues with python-rq
 
Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing one
 
Extreme CI Savings with Bamboo 3.1: The JIRA Story
Extreme CI Savings with Bamboo 3.1: The JIRA StoryExtreme CI Savings with Bamboo 3.1: The JIRA Story
Extreme CI Savings with Bamboo 3.1: The JIRA Story
 
Adding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipelineAdding unit tests to the database deployment pipeline
Adding unit tests to the database deployment pipeline
 

Viewers also liked

Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applicationsKnoldus Inc.
 
Practical Pig and PigUnit (Michael Noll, Verisign)
Practical Pig and PigUnit (Michael Noll, Verisign)Practical Pig and PigUnit (Michael Noll, Verisign)
Practical Pig and PigUnit (Michael Noll, Verisign)Swiss Big Data User Group
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
Introduction to Big data tdd and pig unit
Introduction to Big data tdd and pig unitIntroduction to Big data tdd and pig unit
Introduction to Big data tdd and pig unitEdureka!
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pigSudar Muthu
 
Spark summit2014 techtalk - testing spark
Spark summit2014 techtalk - testing sparkSpark summit2014 techtalk - testing spark
Spark summit2014 techtalk - testing sparkAnu Shetty
 
How to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupHow to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupQualitest
 

Viewers also liked (7)

Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
 
Practical Pig and PigUnit (Michael Noll, Verisign)
Practical Pig and PigUnit (Michael Noll, Verisign)Practical Pig and PigUnit (Michael Noll, Verisign)
Practical Pig and PigUnit (Michael Noll, Verisign)
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
Introduction to Big data tdd and pig unit
Introduction to Big data tdd and pig unitIntroduction to Big data tdd and pig unit
Introduction to Big data tdd and pig unit
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pig
 
Spark summit2014 techtalk - testing spark
Spark summit2014 techtalk - testing sparkSpark summit2014 techtalk - testing spark
Spark summit2014 techtalk - testing spark
 
How to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupHow to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest Group
 

Similar to Coscup 2013 : Continuous Integration on top of hadoop

Continuous Delivery - Automate & Build Better Software with Travis CI
Continuous Delivery - Automate & Build Better Software with Travis CIContinuous Delivery - Automate & Build Better Software with Travis CI
Continuous Delivery - Automate & Build Better Software with Travis CIwajrcs
 
Intro to PHP Testing
Intro to PHP TestingIntro to PHP Testing
Intro to PHP TestingRan Mizrahi
 
Accelerating Your Test Execution Pipeline
Accelerating Your Test Execution PipelineAccelerating Your Test Execution Pipeline
Accelerating Your Test Execution PipelineSmartBear
 
Automated Visual Testing in NSW.Gov.AU
Automated Visual Testing in NSW.Gov.AUAutomated Visual Testing in NSW.Gov.AU
Automated Visual Testing in NSW.Gov.AUApplitools
 
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?
Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?Dmitri Shiryaev
 
Continuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin StachniukContinuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin StachniukMarcinStachniuk
 
Heavenly hell – automated tests at scale wojciech seliga
Heavenly hell – automated tests at scale   wojciech seligaHeavenly hell – automated tests at scale   wojciech seliga
Heavenly hell – automated tests at scale wojciech seligaAtlassian
 
So you-want-to-go-faster
So you-want-to-go-fasterSo you-want-to-go-faster
So you-want-to-go-fasterOoblioob
 
Test Automation using UiPath Test Suite - Developer Circle Part-2.pdf
Test Automation using UiPath Test Suite - Developer Circle Part-2.pdfTest Automation using UiPath Test Suite - Developer Circle Part-2.pdf
Test Automation using UiPath Test Suite - Developer Circle Part-2.pdfDiana Gray, MBA
 
How we realized SOA by Python at PyCon JP 2015
How we realized SOA by Python at PyCon JP 2015How we realized SOA by Python at PyCon JP 2015
How we realized SOA by Python at PyCon JP 2015hirokiky
 
Comprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live ProductionComprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live ProductionTechWell
 
OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...
OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...
OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...NETWAYS
 
Automate Database Deployment - SQL In The City Workshop
Automate Database Deployment - SQL In The City WorkshopAutomate Database Deployment - SQL In The City Workshop
Automate Database Deployment - SQL In The City WorkshopRed Gate Software
 
Automate across Platform, OS, Technologies with TaaS
Automate across Platform, OS, Technologies with TaaSAutomate across Platform, OS, Technologies with TaaS
Automate across Platform, OS, Technologies with TaaSAnand Bagmar
 
AppEngine Performance Tuning
AppEngine Performance TuningAppEngine Performance Tuning
AppEngine Performance TuningDavid Chen
 
Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)CIVEL Benoit
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1CIVEL Benoit
 
Scalamen and OT
Scalamen and OTScalamen and OT
Scalamen and OTgetch123
 
DockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopDockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopBrian Christner
 
CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton Araf Karsh Hamid
 

Similar to Coscup 2013 : Continuous Integration on top of hadoop (20)

Continuous Delivery - Automate & Build Better Software with Travis CI
Continuous Delivery - Automate & Build Better Software with Travis CIContinuous Delivery - Automate & Build Better Software with Travis CI
Continuous Delivery - Automate & Build Better Software with Travis CI
 
Intro to PHP Testing
Intro to PHP TestingIntro to PHP Testing
Intro to PHP Testing
 
Accelerating Your Test Execution Pipeline
Accelerating Your Test Execution PipelineAccelerating Your Test Execution Pipeline
Accelerating Your Test Execution Pipeline
 
Automated Visual Testing in NSW.Gov.AU
Automated Visual Testing in NSW.Gov.AUAutomated Visual Testing in NSW.Gov.AU
Automated Visual Testing in NSW.Gov.AU
 
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?
Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?
 
Continuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin StachniukContinuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin Stachniuk
 
Heavenly hell – automated tests at scale wojciech seliga
Heavenly hell – automated tests at scale   wojciech seligaHeavenly hell – automated tests at scale   wojciech seliga
Heavenly hell – automated tests at scale wojciech seliga
 
So you-want-to-go-faster
So you-want-to-go-fasterSo you-want-to-go-faster
So you-want-to-go-faster
 
Test Automation using UiPath Test Suite - Developer Circle Part-2.pdf
Test Automation using UiPath Test Suite - Developer Circle Part-2.pdfTest Automation using UiPath Test Suite - Developer Circle Part-2.pdf
Test Automation using UiPath Test Suite - Developer Circle Part-2.pdf
 
How we realized SOA by Python at PyCon JP 2015
How we realized SOA by Python at PyCon JP 2015How we realized SOA by Python at PyCon JP 2015
How we realized SOA by Python at PyCon JP 2015
 
Comprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live ProductionComprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live Production
 
OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...
OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...
OSDC 2016 - Continous Integration in Data Centers - Further 3 Years later by ...
 
Automate Database Deployment - SQL In The City Workshop
Automate Database Deployment - SQL In The City WorkshopAutomate Database Deployment - SQL In The City Workshop
Automate Database Deployment - SQL In The City Workshop
 
Automate across Platform, OS, Technologies with TaaS
Automate across Platform, OS, Technologies with TaaSAutomate across Platform, OS, Technologies with TaaS
Automate across Platform, OS, Technologies with TaaS
 
AppEngine Performance Tuning
AppEngine Performance TuningAppEngine Performance Tuning
AppEngine Performance Tuning
 
Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1
 
Scalamen and OT
Scalamen and OTScalamen and OT
Scalamen and OT
 
DockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging WorkshopDockerCon Europe 2018 Monitoring & Logging Workshop
DockerCon Europe 2018 Monitoring & Logging Workshop
 
CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Coscup 2013 : Continuous Integration on top of hadoop

  • 1. Continuous Integration on top of Hadoop Wisely Chen and Neal Lee Saturday, August 3, 13
  • 2. Agenda • Who I am • Problem • Solution • Demo • Q&A Saturday, August 3, 13
  • 3. Who I am • Wisely Chen ( thegiive@gmail.com ) • Release manager of Yahoo![Taiwan] shopping and data team • Loves to promote open source tech in Taiwan • Hadoop Summit 2013 San Jose • Ruby and Rails : Coscup 2006, Ubisunrise 2007, OSDC 2007 • Puppet : PHPConf 2012 , RubyConf 2012 • Release Practice :Webconf 2013, Coscup 2012 Saturday, August 3, 13
  • 4. Who I am • Neal Lee (@neal_lee) • Data Engineer at Yahoo![Taiwan] data team • Aims to build a easy to use self-service BI platform connecting to Hadoop. Saturday, August 3, 13
  • 5. EC Data Team 拍賣/商城/購物中心 站台 流量/點擊/使用者行為 追蹤 Transactional data Tracking data Data Highway Data Warehouse/ Data Mart Data Infra BI Platform Report Recommendation API Machine Learning Serve Saturday, August 3, 13
  • 10. Continuous Integration • A software engineering practice • Maintain code repos • Automate the build • Make the build self-testing • Everyone commit to the baseline everyday • Every commit should be a build • Test in a clone of production environment • Make it easy to get the latest deliverables • Everyone can see the result of latest build • Automate deployment Saturday, August 3, 13
  • 11. We focus on • A software engineering practice • Maintain code repos • Automate the build • Make the build self-testing • Everyone commits to the baseline everyday • Every commit should be a build • Test in a clone of production environment • Make it easy to get the latest deliverables • Everyone can see the result of latest build • Automate deployment Saturday, August 3, 13
  • 12. CI on Hadoop Flow Code Unit Test Performance Test Deploy Doc Execution Saturday, August 3, 13
  • 13. One Click Deploy Commit Unit Test Performance Test Deploy Doc Execution Saturday, August 3, 13
  • 15. System diagram CI Master GitHub Alpha CI Slave Beta Cluster Hadoop JobTracker CI Slave Hadoop node Hadoop node Hadoop node Hadoop node Slave Node Prod ClusterGateway Saturday, August 3, 13
  • 16. Unit Test Commit Unit Test Performance Test Deploy Doc Execution Saturday, August 3, 13
  • 17. PigUnit • A simple xUnit framework • No cluster set up is required in local mode • Unit testing, regression testing, and rapid prototyping on the fly Saturday, August 3, 13
  • 18. Using PigUnit • After • Coding • Write PigUnit test case • Run local PigUnit test • Push to cluster • Run Pig on cluster • Get right result ! • Before • Coding • Manual local test • Push to cluster • Run Pig on cluster • Get right result ! Saturday, August 3, 13
  • 19. Unit test is live doc • Unit test is runnable live doc • Pass test case and meet previous requirement Saturday, August 3, 13
  • 20. Flexible • Pig can use PigUnit • MapReduce can use MapUnit • Hive can use hive_test Saturday, August 3, 13
  • 22. Vaidya • Rule based performance diagnosis of M/R jobs • Extensible framework • You can add your own rules • Write complex rules using existing rules Saturday, August 3, 13
  • 23. Performance Test Pig Job Pig Job History Vaidya Vaidya Rule 4 Pig Job Conf Notify User 3 Performance result Next CI Stage 1 1 2 2 2 5 1. Exec pig job with sampling data on beta server 2. Vaidya read job history,conf,rule to check performance problem 3. If ok, create performance result 4. If job has performance issue, notify user 5. Go to next CI stage Sampling data 1 Saturday, August 3, 13
  • 24. Vaidya Rule<Diagnos)cTest> <Title><![CDATA[Balanaced Reduce Partitioning]]></Title>  <ClassName>        <![CDATA[                      org.apache.hadoop.vaidya.postexdiagnosis.tests.BalancedReducePar77oning        ]]> </ClassName> <Descrip)on>          <![CDATA[This  rule  tests  as  to  how  well  the  input  to  reduce  tasks  is  balanced]]> </Descrip)on> <Importance><![CDATA[High]]></Importance> <SuccessThreshold><![CDATA[0.20]]></SuccessThreshold> <Prescrip)on><![CDATA[advice]]></Prescrip)on> </Diagnos)cTest> See  if  the  reduce  job  is   balance  or  not   Rule  importance Diagnose  success   threshold Test  Java  Class Saturday, August 3, 13
  • 26. Deploy • Deploy to production cluster • Easy to rollback • Create a git tag • Auto doc generating • Each release should map to a ticket • Auto comment in Bugzilla Saturday, August 3, 13
  • 27. Auto comment in bugzilla Repo url Release Note Issue status change Saturday, August 3, 13
  • 28. Auto create git tag Release Note [Bug xxx] log.... Git Tag Saturday, August 3, 13
  • 30. Demo • Demo1 : Unit test fail • Demo2 : Unit test success • Demo3 : Check performance test • Demo4 :Auto generate Doc • Demo5 : Notify user Saturday, August 3, 13
  • 32. Conclusion • CI will revolutionize your workflow • CI will boost your productivity Saturday, August 3, 13
  • 34. Logic Debug • Map/Reduce  job  oJen  takes  a  lot  of  )me  for  execu)on • Repeated  Map/Reduce  execu)on  cost    a  lot  of  )me   during  logic  debugging  phase • Need  a  way  to  find  out  logic  problem  before   execu)on  produc)on  job • Coding Manual Test Exec Get Bug Saturday, August 3, 13
  • 35. Performance • Map/Reduce  performance  is  hard  to  es)mate  before  execu)on   • Production Grid computing resource is shared by allYahoos • Bad performance will affect otherYahoos Grid jobs • Putting bad performance code on production grid is guilty • We manually investigate the job performance before we actually execute it on production Grid Coding Manual Test Manual investgate Get Bug Saturday, August 3, 13