Genie – Hadoop Platform as a Service at Netflix
Sriram Krishnan
Hadoop Summit, June 26, 2013
Netflix does Hadoop
Netflix does Hadoop at scale
Netflix does Hadoop at scale*
Netflix does Hadoop at scale in the cloud
S3 as the Cloud Data Warehouse
Cloud Data Warehouse
Multiple Hadoop Clusters
Cloud Data Warehouse
Hadoop (EMR) Clusters
Data Platform as a Service
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Large Ecosystem of Clients & Tools
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Why Genie?
• Simple API for job submission and management
• Accessible from the data center and the cloud
• Abstraction of physical details of back-end Hadoop clusters
What Genie is Not
• A workflow scheduler, such as Oozie
• A task scheduler, such as fair share or capacity schedulers
• An end-to-end resource management tool
Genie: Job Execution
• API to run Hadoop, Hive and Pig jobs
• Auto-magic submission of jobs to the right Hadoop cluster
• Abstracting away cluster details from clients
Genie: Resource Configuration
• API for management of cluster metadata
• Status: up, out of service, or terminated
• Site-specific Hadoop, Hive and Pig configurations
• Cluster naming/tagging for job submissions
[Figure: Genie architecture diagram. Components: Eureka Service; Genie (Karyon, Eureka Client, Archaius, Ribbon, Servo, plus Hadoop, Hive and Pig); clients (Eureka Client, Ribbon Client, Python API). Arrow labels: registers service, discovers service, invokes (submits job), launches cluster(s), launches job, registers cluster. Actors: end-users and admins. Built on Netflix OSS, http://netflix.github.com]
Genie: Job Execution
• Job Type: {hadoop, hive, pig}
• File dependencies (script, udfs, etc)
• Command-line arguments
• Schedule: {adhoc, sla}
• Configuration: {prod, test, unittest}
REST call
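As an illustration of the REST call on this slide, the sketch below assembles a job request carrying the five listed parameters and serializes it to JSON. The field names (`job_type`, `file_dependencies`, `cmd_args`, `schedule`, `configuration`), the `JobRequest` class and the example S3 path are assumptions made for this sketch, not Genie's actual request schema; only the allowed values come from the slide.

```python
import json
from dataclasses import dataclass, field, asdict

# Allowed values are taken from the slide; everything else is hypothetical.
JOB_TYPES = {"hadoop", "hive", "pig"}
SCHEDULES = {"adhoc", "sla"}
CONFIGURATIONS = {"prod", "test", "unittest"}


@dataclass
class JobRequest:
    """Hypothetical job submission payload (not Genie's real schema)."""
    job_type: str
    file_dependencies: list = field(default_factory=list)
    cmd_args: str = ""
    schedule: str = "adhoc"
    configuration: str = "test"

    def validate(self) -> None:
        # Reject values that the slide does not list.
        if self.job_type not in JOB_TYPES:
            raise ValueError(f"unknown job type: {self.job_type}")
        if self.schedule not in SCHEDULES:
            raise ValueError(f"unknown schedule: {self.schedule}")
        if self.configuration not in CONFIGURATIONS:
            raise ValueError(f"unknown configuration: {self.configuration}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


request = JobRequest(
    job_type="pig",
    file_dependencies=["s3://example-bucket/scripts/report.pig"],  # made-up path
    cmd_args="-param day=2013-06-26",
    schedule="sla",
    configuration="prod",
)
body = request.to_json()
print(body)
```

Note that the request names a schedule and a configuration rather than a cluster, which is how the client stays unaware of which cluster will run the job.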
Genie: Job Execution
* Used to query status, get outputs, kill job
Response: job ID*
Genie Job Details
Job ID
Script to execute
Standard output and error
Pig logs
Job conf directory
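The speaker notes say that every job runs as a separate process in its own sandbox (working directory), with standard output and error kept available. A minimal sketch of that pattern follows. The function name `run_job_in_sandbox`, the log file names and the directory layout are assumptions for illustration, and a trivial Python command stands in for the Hadoop/Hive/Pig CLI. For brevity the sketch waits for the process to finish, whereas the talk describes asynchronous execution.

```python
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path


def run_job_in_sandbox(command, base_dir):
    """Run `command` as a child process inside its own working directory."""
    job_id = uuid.uuid4().hex            # hypothetical job ID format
    sandbox = Path(base_dir) / job_id    # one directory per job
    sandbox.mkdir(parents=True)
    stdout_path = sandbox / "stdout.log"
    stderr_path = sandbox / "stderr.log"
    # Redirect the job's output streams to files inside the sandbox.
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        result = subprocess.run(command, cwd=sandbox, stdout=out, stderr=err)
    return {
        "job_id": job_id,
        "exit_code": result.returncode,
        "stdout": stdout_path,
        "stderr": stderr_path,
    }


# Stand-in for a Hadoop/Hive/Pig command line.
base = tempfile.mkdtemp(prefix="jobs-")
details = run_job_in_sandbox([sys.executable, "-c", "print('job output')"], base)
print(details["job_id"], details["exit_code"])
print(details["stdout"].read_text())
```

Running each job through a separate process and directory is what gives the isolation between jobs, and between the service and the jobs, that the notes describe.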
Genie – Use Cases Enabled at Netflix
• Running nightly short-lived “bonus” clusters to augment ETL processing
• Re-routing traffic between clusters
• “Red/black” pushes for clusters
• Attaching stand-alone gateways to clusters
• Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
Nightly Short-lived Bonus Clusters
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Execution Service Configuration Service
{Schedule=bonus,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
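The bonus-cluster slides can be read as a matching step with a fallback: a job first asks for {Schedule=bonus, Configuration=prod} and, when no bonus cluster is UP, is matched against {Schedule=sla, Configuration=prod} instead. The sketch below models that under stated assumptions. The `Cluster` class, `match_cluster` and `submit_with_fallback` are hypothetical names, and whether the fallback happens in the client or in the service is not something the slides settle.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    schedules: set
    configurations: set
    status: str = "UP"  # UP, OUT_OF_SERVICE or TERMINATED, as on the slides


def match_cluster(clusters, schedule, configuration):
    """Return the first UP cluster tagged with both criteria, or None."""
    for cluster in clusters:
        if (cluster.status == "UP"
                and schedule in cluster.schedules
                and configuration in cluster.configurations):
            return cluster
    return None


def submit_with_fallback(clusters, preferred, fallback, configuration):
    """Try the preferred schedule first, then fall back (assumed behaviour)."""
    return (match_cluster(clusters, preferred, configuration)
            or match_cluster(clusters, fallback, configuration))


prod = Cluster("prod-sla", {"sla"}, {"prod"})
bonus = Cluster("bonus", {"bonus"}, {"prod"})
clusters = [prod, bonus]

# While the bonus cluster is UP, the job lands there.
first = submit_with_fallback(clusters, "bonus", "sla", "prod").name

# Once the bonus cluster is taken out of service, the same request
# falls back to the prod SLA cluster.
bonus.status = "OUT_OF_SERVICE"
second = submit_with_fallback(clusters, "bonus", "sla", "prod").name

print(first, second)  # bonus prod-sla
```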
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc, sla
Configurations: prod, test
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
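The three rerouting slides change only cluster metadata while the job request stays {Schedule=sla, Configuration=prod}. A small sketch of that idea, using plain dictionaries: the cluster names, the dictionary layout and the `route` helper are assumptions for illustration rather than Genie's data model.

```python
def route(clusters, schedule, configuration):
    """Names of clusters that are UP and tagged with both criteria."""
    return [
        name
        for name, meta in clusters.items()
        if meta["status"] == "UP"
        and schedule in meta["schedules"]
        and configuration in meta["configurations"]
    ]


clusters = {
    "adhoc": {"schedules": {"adhoc"}, "configurations": {"prod", "test"}, "status": "UP"},
    "prod-sla": {"schedules": {"sla"}, "configurations": {"prod"}, "status": "UP"},
}

# Normal operation: SLA jobs go to the prod SLA cluster.
before = route(clusters, "sla", "prod")

# Maintenance: tag the ad-hoc cluster with "sla" and take prod out of service.
clusters["adhoc"]["schedules"].add("sla")
clusters["prod-sla"]["status"] = "OUT_OF_SERVICE"
during = route(clusters, "sla", "prod")

# Maintenance done: restore the original metadata.
clusters["adhoc"]["schedules"].discard("sla")
clusters["prod-sla"]["status"] = "UP"
after = route(clusters, "sla", "prod")

print(before, during, after)  # ['prod-sla'] ['adhoc'] ['prod-sla']
```

The client code never changes across the three states; only the metadata held by the configuration service does.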
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
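The red/black sequence on these slides is a status lifecycle: the new cluster is registered as UP, the old one is marked OUT_OF_SERVICE and later TERMINATED. The sketch below encodes that lifecycle. The transition table, the cluster names, the `running_jobs` counter and the rule of terminating only after running jobs have drained are assumptions for illustration; the slides show only the three status values.

```python
# Assumed legal status transitions; OUT_OF_SERVICE -> UP mirrors the
# rerouting slides, where the prod cluster comes back after maintenance.
ALLOWED_TRANSITIONS = {
    "UP": {"OUT_OF_SERVICE"},
    "OUT_OF_SERVICE": {"UP", "TERMINATED"},
    "TERMINATED": set(),
}


def set_status(cluster, new_status):
    """Change a cluster's status, rejecting transitions not in the table."""
    if new_status not in ALLOWED_TRANSITIONS[cluster["status"]]:
        raise ValueError(
            f"{cluster['name']}: {cluster['status']} -> {new_status} not allowed"
        )
    cluster["status"] = new_status


def terminate_when_drained(cluster):
    """Terminate only once no jobs are left running (assumed policy)."""
    if cluster["running_jobs"] > 0:
        return False
    set_status(cluster, "TERMINATED")
    return True


old = {"name": "prod-sla-v1", "status": "UP", "running_jobs": 2}
new = {"name": "prod-sla-v2", "status": "UP", "running_jobs": 0}

# Step 1: new cluster is UP, old cluster stops accepting jobs.
set_status(old, "OUT_OF_SERVICE")

# Step 2: wait for in-flight jobs, then terminate the old cluster.
still_busy = terminate_when_drained(old)
old["running_jobs"] = 0
drained = terminate_when_drained(old)

print(old["status"], new["status"])  # TERMINATED UP
```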
Genie Usage at Netflix
• Usage statistics brought to you by “Sherlock”
• Pig job to gather Hadoop job statistics
• Tableau-based visualization
Cloud Deployment
• Asgard is also part of Netflix OSS
• https://github.com/Netflix/asgard
Auto Scaling in the Cloud
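The speaker notes for this slide mention an auto-scaling policy that expands when the number of running jobs exceeds roughly 80%. The notes do not say 80% of what, so the sketch below assumes it means 80% of a configured maximum number of concurrent jobs. The function name `should_scale_out` and both parameters are hypothetical.

```python
def should_scale_out(running_jobs, max_jobs, threshold_percent=80):
    """True when running jobs exceed the threshold share of assumed capacity."""
    if max_jobs <= 0:
        raise ValueError("max_jobs must be positive")
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return running_jobs * 100 > threshold_percent * max_jobs


print(should_scale_out(17, 20))  # True: 85% of assumed capacity
print(should_scale_out(16, 20))  # False: exactly 80%, not above it
```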
Genie is now part of Netflix OSS!
• http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html
• Clone it on GitHub at: https://github.com/Netflix/genie
• Still “version 0” – work in progress!
• All contributions and feedback welcome!
• Come talk to us and check out live demos at the Netflix Booth
Watching Pigs Fly with the Netflix Hadoop Toolkit
Sriram Krishnan
We’re hiring!
Thank you!
Home: http://www.netflix.com
Jobs: http://jobs.netflix.com
Tech Blog: http://techblog.netflix.com/


Editor's Notes

  1. Use cases: reporting, analytics, insights, algorithms (e.g. recommendations). But big deal: so does everyone in the room. Reference: http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
  2. What is scale? It means different things to different people.
  3. A few petabytes of data: billions of log events captured each day, with retention of a few months. Many clusters, with 1000s of nodes. Again, big deal: there are many others in the room who do Hadoop at this scale (petabyte is the new terabyte).
  4. Our Hadoop processing is 100% in the (public) cloud. In our case, the public cloud is AWS. This is what differentiates our infrastructure from the rest. Hadoop in the cloud is different from Hadoop in the datacenter; in this talk, we will discuss our cloud-based Hadoop platform.
  5. S3 is the source of truth. Decoupling of storage from the computational infrastructure. S3 benefits: highly durable and available (11 9’s); bucket versioning; highly elastic (we grew our data warehouse organically from a few hundred terabytes to petabytes without having to provision any storage resources in advance). HDFS? Only for transient data and intermediate results for multi-stage jobs. S3 cons: performance, eventual consistency.
  6. Another benefit of S3: multiple clusters can read/process the same data. (Semi-)persistent SLA and ad-hoc clusters, ~800-1300 nodes. Multiple ad-hoc clusters to A/B test new releases/features. Nightly "bonus" clusters to supplement the SLA cluster. Operating assumption: clusters may go down at any time.
  7. Traditional gateways/CLIs: ad-hoc querying. Genie: REST API for job execution/monitoring; repository/abstraction for clusters and metastores. Franklin (MDS, the metadata service): uses HCAT/HiveServer to talk to the Hive metastore.
  8. Next, we will focus on Genie for the rest of the talk. Other tools will be covered in the other Netflix talk.
  9. EMR: Hadoop IaaS, and an API to run jobs on transient clusters. Our clusters are semi-persistent, and job submissions don’t result in new clusters. Oozie: a workflow tool that only supports the Hadoop ecosystem. We have hybrid jobs (Teradata+Hadoop) being orchestrated by UC4, so we just needed a job submission API. It also had no support for Hive when we started. Templeton: no multi-cluster, multi-user support; not quite ready for prime time.
  10. Genie is a resource “match-maker”.
  11. The unit of execution is a Hadoop/Hive/Pig job. Users provide scripts, dependencies and other metadata. Genie does no scheduling per se; it only does “meta-scheduling”, or resource matching.
  12. Status defines whether the cluster is accepting jobs. Configurations are *-site.xml files and properties. Cluster name, schedule, etc.
  13. Two classes of users: admins and end-users. Admins spin up clusters and set cluster metadata. Users use the clusters once they have been registered. Genie is built on top of Netflix OSS.
  14. Genie figures out the resources to run jobs on; back-end resources are abstracted out. Execution is asynchronous, since jobs may be long-running.
  15. Every job runs as a separate process using the Hadoop/Hive/Pig CLI. This avoids “jar hell”, since it needs Hadoop jars. Jobs run in their own sandbox (working directory), which provides isolation between jobs, and between Genie and the jobs. Standard output/error of jobs is easily available. Able to support multiple versions of Hadoop/Hive/Pig, and connect to multiple clusters.
  16. The configuration service helps us do crazy (cool) things. Will describe each of these in greater detail.
  17. New bonus clusters are launched each night, but clients are oblivious of actual host names/IPs. One way to do this: higher-SLA jobs first ask for the cluster by name.
  18. If it doesn’t exist, revert to the existing cluster. Why not just expand? Better isolation. Mixing and matching instance types is not ideal for Hadoop; the prod cluster uses m1.xlarges for slave nodes. Shrinking has proven to be a problem. We want to do a hard shutdown when those instances are needed on awsprod.
  19. We had to bounce the prod job tracker to enable priorities for “long-pole” jobs. We wanted to do it with minimal impact to SLA jobs.
  20. Must wait for all existing jobs to finish for minimal impact. Hadoop jobs are long-running; we don’t want to kill a 5-hour job nearing its finish.
  21. The prod cluster is back up after maintenance. Jobs that were scheduled on the query cluster will continue to run there until they finish. This is done from time to time. Although not too often, we do red-black pushes…
  22. This is the initial state. We need to spin up a new cluster, e.g. to push a new feature.
  23. Spin up the new cluster, mark it as UP, and mark the old cluster as OUT_OF_SERVICE (OOS).
  24. The old cluster goes from OUT_OF_SERVICE to TERMINATED.
  25. Mention that we will be writing a tech blog post about this soon, with more details. Two query clusters: A/B testing the new fair share scheduler.
  26. Set up desired instance counts across multiple AZs. Do “red-black” pushes using “sequential ASGs”. Loss of individual nodes will cause jobs running on those nodes to be lost.
  27. The auto-scaling policy is set up to expand if the number of running jobs is > ~80%.
  28. Still biased towards running in the cloud and at Netflix, but we will generalize/improve it based on community feedback.
  29. Come listen to how we enable “Data Platform as a Service”; it is truly Lipstick on a Pig.