SlideShare uma empresa Scribd logo
1 de 40
Developing unit-testable
software with Hadoop
HUG UK - Jan 13th 2015
8
108 Keywords
1010 Web hits
9
Big data +
High stakes =
Exciting
12
Big data +
High stakes +
Complexity +
Change +
Unknowns =
Fear
13
Benefits of unit testing
• Build confidence
• Enable change
• Describe behaviour
• Accelerate development
14
Tests need to be:
• Simple and quick to write
• Simple and quick to run
15
Hadoop testing challenges
• Framework modularization issues
• Heavyweight execution engine
• Availability of testing utilities
16
Hadoop testing challenges
• Framework modularization issues
• Heavyweight execution engine
• Availability of testing utilities
17
Sequential modularization
18
Modularization via encapsulation
19
Hive modularity
20
Cascading modularity
21
Pig modularity
22
Crunch modularity
23
Mapreduce modularity
25
Hadoop testing challenges
• Framework modularization issues
• Heavyweight execution engine
• Availability of testing utilities
26
Local execution
Framework Local engine
Hive Local-mode
Cascading LocalFlowConnector
Pig Local mode
Crunch MemPipeline
27
Hadoop testing challenges
• Framework modularization issues
• Heavyweight execution engine
• Availability of testing utilities
28
Helper libraries
Framework Library
Hive HiveRunner
Cascading cascading-test
Plunger
Pig PigUnit
Crunch MemPipeline
29
Example
Topic | Subtopic
30
Hive example with HiveRunner
[https://github.com/klarna/HiveRunner]
31
Hive + HiveRunner: pros
• Write/test Hive apps in the same environment
• Seamless UDF development
32
Hive + Runner: cons
• Slow execution
• CSV data – hard to maintain
• Assertions on CSV strings is brittle
• Hadoop compatibility issues
33
Cascading example with Plunger
[https://github.com/HotelsDotCom/plunger]
34
Cascading + Plunger: pros
• Compact tests, well defined scope
• Use standard Java tools
• Fast
35
Cascading + Plunger: cons
• Some tools only appear to work
• False sense of security
36
Measuring coverage
• Identify activated branches
• No tools do this
37
Conclusions
• Testing possible with most frameworks
• Efficacy largely influenced by framework
• Tooling is immature
38
We’re hiring
Java developers
Hadoop developers
39
Questions?
40
Attribution
https://flic.kr/p/7XQdXm - Chris Campbell - CC BY-NC 2.0
https://flic.kr/p/4hpX7j - Andrew_Writer - CC BY-NC-ND 2.0
http://bit.ly/1BVe8xH - Prokofiev - CC BY-SA 3.0
Resources
HiveRunner
https://github.com/klarna/HiveRunner
Plunger
https://github.com/HotelsDotCom/plunger

Mais conteúdo relacionado

Mais procurados

Spca2014 7 tenets of highly scalable applications kapic
Spca2014 7 tenets of highly scalable applications kapicSpca2014 7 tenets of highly scalable applications kapic
Spca2014 7 tenets of highly scalable applications kapic
NCCOMMS
 

Mais procurados (20)

GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
Spca2014 7 tenets of highly scalable applications kapic
Spca2014 7 tenets of highly scalable applications kapicSpca2014 7 tenets of highly scalable applications kapic
Spca2014 7 tenets of highly scalable applications kapic
 
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with EaseBenchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
 
Reliable, Scalable Kubernetes on AWS
Reliable, Scalable Kubernetes on AWSReliable, Scalable Kubernetes on AWS
Reliable, Scalable Kubernetes on AWS
 
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
 
Instaclustr: When and how to migrate from a relational database to Cassandra
Instaclustr: When and how to migrate from a relational database to CassandraInstaclustr: When and how to migrate from a relational database to Cassandra
Instaclustr: When and how to migrate from a relational database to Cassandra
 
Automation for Anyone at Nutanix NEXT 2017 US
Automation for Anyone at Nutanix NEXT 2017 USAutomation for Anyone at Nutanix NEXT 2017 US
Automation for Anyone at Nutanix NEXT 2017 US
 
Campus days Azure HDInsight automation
Campus days Azure HDInsight automationCampus days Azure HDInsight automation
Campus days Azure HDInsight automation
 
Creating a Kubernetes Operator in Java
Creating a Kubernetes Operator in JavaCreating a Kubernetes Operator in Java
Creating a Kubernetes Operator in Java
 
Developing the Stratoscale System at Scale - Muli Ben-Yehuda, Stratoscale - D...
Developing the Stratoscale System at Scale - Muli Ben-Yehuda, Stratoscale - D...Developing the Stratoscale System at Scale - Muli Ben-Yehuda, Stratoscale - D...
Developing the Stratoscale System at Scale - Muli Ben-Yehuda, Stratoscale - D...
 
MVC 6 - the new unified Web programming model
MVC 6 - the new unified Web programming modelMVC 6 - the new unified Web programming model
MVC 6 - the new unified Web programming model
 
Greetings from AWS User Group Taiwan
Greetings from AWS User Group TaiwanGreetings from AWS User Group Taiwan
Greetings from AWS User Group Taiwan
 
Jolt: Distributed, fault-tolerant test running at scale using Mesos
Jolt: Distributed, fault-tolerant test running at scale using MesosJolt: Distributed, fault-tolerant test running at scale using Mesos
Jolt: Distributed, fault-tolerant test running at scale using Mesos
 
ECS19 - Erwin Van Hunen - PnP Provisioning on Steroids
ECS19 - Erwin Van Hunen - PnP Provisioning on SteroidsECS19 - Erwin Van Hunen - PnP Provisioning on Steroids
ECS19 - Erwin Van Hunen - PnP Provisioning on Steroids
 
CCCNA17 Dynamic Roles in CloudStack
CCCNA17 Dynamic Roles in CloudStackCCCNA17 Dynamic Roles in CloudStack
CCCNA17 Dynamic Roles in CloudStack
 
Breaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloudBreaking the Monolith - Microservice Extraction at SoundCloud
Breaking the Monolith - Microservice Extraction at SoundCloud
 
DevOps, Cloud, and the Death of Backup Tape Changers
DevOps, Cloud, and the Death of Backup Tape ChangersDevOps, Cloud, and the Death of Backup Tape Changers
DevOps, Cloud, and the Death of Backup Tape Changers
 
The Cloud Native Stack
The Cloud Native StackThe Cloud Native Stack
The Cloud Native Stack
 
AWS reInvent 2016 recap Taiwan
AWS reInvent 2016 recap TaiwanAWS reInvent 2016 recap Taiwan
AWS reInvent 2016 recap Taiwan
 

Semelhante a Developing Unit Testable Software with Hadoop at Expedia

Continuous delivery with Jenkins Enterprise and Deployit
Continuous delivery with Jenkins Enterprise and DeployitContinuous delivery with Jenkins Enterprise and Deployit
Continuous delivery with Jenkins Enterprise and Deployit
XebiaLabs
 
Making BD Work~TIAS_20150622
Making BD Work~TIAS_20150622Making BD Work~TIAS_20150622
Making BD Work~TIAS_20150622
Anthony Potappel
 
Calculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the CloudCalculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the Cloud
Acquia
 
Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10
Chris Bunch
 

Semelhante a Developing Unit Testable Software with Hadoop at Expedia (20)

DevOps for Big Data - Data 360 2014 Conference
DevOps for Big Data - Data 360 2014 ConferenceDevOps for Big Data - Data 360 2014 Conference
DevOps for Big Data - Data 360 2014 Conference
 
Building the Web with Gradle
Building the Web with GradleBuilding the Web with Gradle
Building the Web with Gradle
 
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformApache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
 
12-factor-jruby
12-factor-jruby12-factor-jruby
12-factor-jruby
 
Neotys PAC - Ian Molyneaux
Neotys PAC - Ian MolyneauxNeotys PAC - Ian Molyneaux
Neotys PAC - Ian Molyneaux
 
Legacy On Premise Apps Got You Down? No Problem - DevOps for All
Legacy On Premise Apps Got You Down? No Problem - DevOps for AllLegacy On Premise Apps Got You Down? No Problem - DevOps for All
Legacy On Premise Apps Got You Down? No Problem - DevOps for All
 
Continuous delivery with Jenkins Enterprise and Deployit
Continuous delivery with Jenkins Enterprise and DeployitContinuous delivery with Jenkins Enterprise and Deployit
Continuous delivery with Jenkins Enterprise and Deployit
 
Introduction to Google Cloud Services / Platforms
Introduction to Google Cloud Services / PlatformsIntroduction to Google Cloud Services / Platforms
Introduction to Google Cloud Services / Platforms
 
Self Healing blue/green Deployments with Dynatrace and Keptn
Self Healing blue/green Deployments with Dynatrace and KeptnSelf Healing blue/green Deployments with Dynatrace and Keptn
Self Healing blue/green Deployments with Dynatrace and Keptn
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Drupal In The Cloud
Drupal In The CloudDrupal In The Cloud
Drupal In The Cloud
 
Making BD Work~TIAS_20150622
Making BD Work~TIAS_20150622Making BD Work~TIAS_20150622
Making BD Work~TIAS_20150622
 
DrupalSouth 2015 - Performance: Not an Afterthought
DrupalSouth 2015 - Performance: Not an AfterthoughtDrupalSouth 2015 - Performance: Not an Afterthought
DrupalSouth 2015 - Performance: Not an Afterthought
 
Operating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud MicroservicesOperating a High Velocity Large Organization with Spring Cloud Microservices
Operating a High Velocity Large Organization with Spring Cloud Microservices
 
What Is New In TestMaker 6.5
What Is New In TestMaker 6.5What Is New In TestMaker 6.5
What Is New In TestMaker 6.5
 
Calculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the CloudCalculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the Cloud
 
DrupalCon 2011 Highlight
DrupalCon 2011 HighlightDrupalCon 2011 Highlight
DrupalCon 2011 Highlight
 
Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10
 
VMworld 2013: Three Advantages of Running Cloud Foundry in a VMware Private C...
VMworld 2013: Three Advantages of Running Cloud Foundry in a VMware Private C...VMworld 2013: Three Advantages of Running Cloud Foundry in a VMware Private C...
VMworld 2013: Three Advantages of Running Cloud Foundry in a VMware Private C...
 
MyHeritage - End 2 End testing Infra
MyHeritage - End 2 End testing InfraMyHeritage - End 2 End testing Infra
MyHeritage - End 2 End testing Infra
 

Mais de huguk

Mais de huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Developing Unit Testable Software with Hadoop at Expedia