Genie – Hadoop Platform as a Service at Netflix
Sriram Krishnan
Hadoop Summit, June 26, 2013
Netflix does Hadoop
Netflix does Hadoop at scale
Netflix does Hadoop at scale*
Netflix does Hadoop at scale in the cloud
S3 as the Cloud Data Warehouse
Cloud Data Warehouse
Multiple Hadoop Clusters
Cloud Data Warehouse
Hadoop (EMR) Clusters
Data Platform as a Service
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Large Ecosystem of Clients & Tools
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Why Genie?
• Simple API for job submission and management
• Accessible from the data center and the cloud
• Abstraction of physical details of back-end Hadoop clusters
What Genie is Not
• A workflow scheduler, such as Oozie
• A task scheduler, such as fair share or capacity schedulers
• An end-to-end resource management tool
Genie: Job Execution
• API to run Hadoop, Hive and Pig jobs
• Auto-magic submission of jobs to the right Hadoop cluster
• Abstracting away cluster details from clients
Genie: Resource Configuration
• API for management of cluster metadata
• Status: up, out of service, or terminated
• Site-specific Hadoop, Hive and Pig configurations
• Cluster naming/tagging for job submissions
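The cluster metadata described above can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the class names, field names, and the `find` method are hypothetical and are not taken from Genie's actual API. The three status values and the schedule/configuration tags come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # The three statuses listed on the slide.
    UP = "UP"
    OUT_OF_SERVICE = "OUT_OF_SERVICE"
    TERMINATED = "TERMINATED"


@dataclass
class ClusterRecord:
    # Hypothetical record shape; field names are illustrative only.
    name: str
    status: Status
    schedules: set = field(default_factory=set)       # tags such as "sla", "adhoc"
    configurations: set = field(default_factory=set)  # tags such as "prod", "test"


class ClusterRegistry:
    """Toy stand-in for a resource configuration service."""

    def __init__(self):
        self._clusters = {}

    def register(self, record):
        self._clusters[record.name] = record

    def set_status(self, name, status):
        self._clusters[name].status = status

    def find(self, schedule, configuration):
        # Only clusters that are UP and carry both tags are candidates.
        return sorted(
            c.name
            for c in self._clusters.values()
            if c.status is Status.UP
            and schedule in c.schedules
            and configuration in c.configurations
        )
```

Keeping status separate from the tags means a cluster can be taken out of rotation without touching the tags that clients use in their job submissions.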
[Architecture diagram. Labels recovered from the figure: Genie "registers service" with the Eureka Service. End-users: a client (Eureka Client, Ribbon) "discovers service" and "invokes (submits job)"; Genie "launches job". Admins: a client (Eureka Client, Python API) "discovers service", "launches cluster(s)", and "registers cluster". Components shown, from Netflix OSS (http://netflix.github.com): Karyon, Archaius, Eureka Client, Ribbon, Servo, alongside Hadoop, Hive, and Pig.]
Genie: Job Execution
• Job Type: {hadoop, hive, pig}
• File dependencies (script, udfs, etc)
• Command-line arguments
• Schedule: {adhoc, sla}
• Configuration: {prod, test, unittest}
REST call
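As a rough sketch of what the body of such a REST call might carry, the snippet below assembles the parameters listed on this slide into JSON. The JSON key names, the function name, and the example S3 path are assumptions made for illustration; the slides do not give the actual request schema or endpoint, so no HTTP call is shown.

```python
import json

# Job types listed on the slide.
JOB_TYPES = {"hadoop", "hive", "pig"}


def build_job_request(job_type, file_dependencies, cmd_args, schedule, configuration):
    """Assemble a job submission body (hypothetical key names)."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    if not schedule or not configuration:
        raise ValueError("schedule and configuration tags are required")
    payload = {
        "jobType": job_type,
        "fileDependencies": list(file_dependencies),  # scripts, UDFs, etc.
        "cmdArgs": cmd_args,
        "schedule": schedule,            # e.g. "adhoc" or "sla"
        "configuration": configuration,  # e.g. "prod", "test", "unittest"
    }
    return json.dumps(payload, sort_keys=True)


# Example: an SLA Hive job against the prod configuration.
body = build_job_request(
    "hive",
    ["s3://example-bucket/scripts/report.q"],  # placeholder path
    "-f report.q",
    "sla",
    "prod",
)
```

Note that the request names no cluster. It carries only the schedule and configuration tags, which is what lets the service pick a matching cluster on the client's behalf.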
Genie: Job Execution
Response: job ID*
* Used to query status, get outputs, kill job
Genie Job Details
Job ID
Script to execute
Standard output and error
Pig logs
Job conf directory
Genie – Use Cases Enabled at Netflix
• Running nightly short-lived “bonus” clusters to augment ETL processing
• Re-routing traffic between clusters
• “Red/black” pushes for clusters
• Attaching stand-alone gateways to clusters
• Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
Nightly Short-lived Bonus Clusters
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Execution Service Configuration Service
{Schedule=bonus,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
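The bonus-cluster sequence above can be mimicked with a toy routing function. This is a sketch under assumptions: the `route` function, the dictionary layout, and the cluster names are hypothetical, and the rule that only UP clusters receive new jobs is inferred from the slides rather than stated in them.

```python
def route(clusters, schedule, configuration):
    """Return names of UP clusters whose tags match the request."""
    return [
        c["name"]
        for c in clusters
        if c["status"] == "UP"
        and schedule in c["schedules"]
        and configuration in c["configurations"]
    ]


prod = {"name": "prod_sla", "status": "UP",
        "schedules": {"sla"}, "configurations": {"prod"}}
bonus = {"name": "bonus", "status": "UP",
         "schedules": {"bonus"}, "configurations": {"prod"}}
clusters = [prod, bonus]

# While the bonus cluster is up, bonus-tagged ETL jobs land on it.
during = route(clusters, "bonus", "prod")

# Marked OUT_OF_SERVICE: no new bonus jobs are routed to it.
bonus["status"] = "OUT_OF_SERVICE"
draining = route(clusters, "bonus", "prod")

# Once TERMINATED, SLA traffic still goes to the prod cluster as before.
bonus["status"] = "TERMINATED"
after = route(clusters, "sla", "prod")
```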
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc, sla
Configurations: prod, test
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
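The rerouting steps above amount to editing cluster metadata while the client's request stays the same. The sketch below replays the three slides; the `route` function, the dictionary layout, and the cluster names are hypothetical and only illustrate the idea.

```python
def route(clusters, schedule, configuration):
    """Return names of UP clusters whose tags match the request."""
    return [
        c["name"]
        for c in clusters
        if c["status"] == "UP"
        and schedule in c["schedules"]
        and configuration in c["configurations"]
    ]


adhoc = {"name": "adhoc", "status": "UP",
         "schedules": {"adhoc"}, "configurations": {"prod", "test"}}
prod = {"name": "prod_sla", "status": "UP",
        "schedules": {"sla"}, "configurations": {"prod"}}
clusters = [adhoc, prod]
request = ("sla", "prod")  # the client's request never changes

before = route(clusters, *request)

# Reroute: tag the ad-hoc cluster with "sla" and take prod out of service.
adhoc["schedules"].add("sla")
prod["status"] = "OUT_OF_SERVICE"
rerouted = route(clusters, *request)

# Restore: remove the extra tag and bring prod back up.
adhoc["schedules"].discard("sla")
prod["status"] = "UP"
restored = route(clusters, *request)
```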
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
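A red/black push can be sketched the same way: a replacement cluster with identical tags comes up, the old one is marked OUT_OF_SERVICE, and it is later TERMINATED. The names `prod_sla_v1` and `prod_sla_v2`, the `route` function, and the dictionary layout are hypothetical; the slides label both clusters simply "Prod SLA Cluster".

```python
def route(clusters, schedule, configuration):
    """Return names of UP clusters whose tags match the request."""
    return [
        c["name"]
        for c in clusters
        if c["status"] == "UP"
        and schedule in c["schedules"]
        and configuration in c["configurations"]
    ]


old = {"name": "prod_sla_v1", "status": "UP",
       "schedules": {"sla"}, "configurations": {"prod"}}
new = {"name": "prod_sla_v2", "status": "UP",
       "schedules": {"sla"}, "configurations": {"prod"}}

# Step 1: only the old cluster exists and serves SLA jobs.
clusters = [old]
step1 = route(clusters, "sla", "prod")

# Step 2: register the new cluster and take the old one out of service.
clusters.append(new)
old["status"] = "OUT_OF_SERVICE"
step2 = route(clusters, "sla", "prod")

# Step 3: terminate the old cluster; routing is unchanged.
old["status"] = "TERMINATED"
step3 = route(clusters, "sla", "prod")
```

Because both clusters carry the same tags, clients keep submitting `{Schedule=sla, Configuration=prod}` throughout and never need to know that the cluster behind it was replaced.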
Genie Usage at Netflix
• Usage statistics brought to you by “Sherlock”
• Pig job to gather Hadoop job statistics
• Tableau-based visualization
Genie Deployment in the Cloud
• Asgard is also part of Netflix OSS
• https://github.com/Netflix/asgard
Auto Scaling in the Cloud
Genie is now part of Netflix OSS!
• http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html
• Clone it on GitHub at:
• https://github.com/Netflix/genie
• Still “version 0” – work in progress!
• All contributions and feedback welcome!
• Come talk to us and check out live demos at the Netflix Booth
Watching Pigs Fly with the Netflix Hadoop Toolkit
Sriram Krishnan
We’re hiring!
Thank you!
Home: http://www.netflix.com
Jobs: http://jobs.netflix.com
Tech Blog: http://techblog.netflix.com/

Genie - Hadoop Platform as a Service at Netflix

  • 1. 1 Genie – Hadoop Platform as a Service at Netflix Sriram Krishnan Hadoop Summit, June 26, 2013
  • 5. Netflix does Hadoop at scale in the cloud
  • 6. S3 as the Cloud Data Warehouse Cloud Data Warehouse
  • 7. Multiple Hadoop Clusters Cloud Data Warehouse Hadoop (EMR) Clusters
  • 8. Data Platform as a Service Cloud Data Warehouse Hadoop (EMR) Clusters Hadoop Platform as a Service Job Execution Resource Configuration & Management Metadata Service (Franklin)
  • 9. Large Ecosystem of Clients & Tools Cloud Data Warehouse Hadoop (EMR) Clusters Hadoop Platform as a Service Job Execution Resource Configuration & Management Metadata Service (Franklin)
  • 10. Why Genie?  Simple API for job submission and management  Accessible from the data center and the cloud  Abstraction of physical details of back-end Hadoop clusters
  • 11. What Genie is Not  A workflow scheduler, such as Oozie  A task scheduler, such as fair share or capacity schedulers  An end-to-end resource management tool
  • 12. Genie: Job Execution  API to run Hadoop, Hive and Pig jobs  Auto-magic submission of jobs to the right Hadoop cluster  Abstracting away cluster details from clients
  • 13. Genie: Resource Configuration  API for management of cluster metadata  Status: up, out of service, or terminated  Site-specific Hadoop, Hive and Pig configurations  Cluster naming/tagging for job submissions
  • 14. Eureka ServiceEureka Service Registers service ClientEureka Client Ribbon Discovers service Invokes (submits job) Launches job Discovers service Client Eureka Client Python API Launches cluster(s) Registers cluster End-users Admins Netflix OSS http://netflix.github.com Karyon Eureka Client Ribbon Servo Hadoop Hive Pig Karyon Archaius Ribbon Servo Hadoop Hive Pig Eureka Client
  • 15. Genie: Job Execution • Job Type: {hadoop, hive, pig} • File dependencies (script, udfs, etc) • Command-line arguments • Schedule: {adhoc, sla} • Configuration: {prod, test, unittest} REST call
  • 16. Genie: Job Execution * Used to query status, get outputs, kill job Response: job ID*
  • 17. Genie Job Details Job ID Script to execute Standard output and error Pig logs Job conf directory
  • 18. Genie – Use Cases Enabled at Netflix  Running nightly short-lived “bonus” clusters to augment ETL processing  Re-routing traffic between clusters  “Red/black” pushes for clusters  Attaching stand-alone gateways to clusters  Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
  • 19. Nightly Short-lived Bonus Clusters Execution Service Configuration Service Prod SLA Cluster: Schedule: sla Configurations: prod
  • 20. Nightly Short-lived Bonus Clusters Bonus Cluster: Schedule: bonus Configurations: prod Execution Service Configuration Service {Schedule=bonus, Configuration=prod} Prod SLA Cluster: Schedule: sla Configurations: prod
  • 21. Nightly Short-lived Bonus Clusters Bonus Cluster: Schedule: bonus Configurations: prod Status: OUT_OF_SERVICE Execution Service Configuration Service Prod SLA Cluster: Schedule: sla Configurations: prod {Schedule=sla, Configuration=prod}
  • 22. Nightly Short-lived Bonus Clusters Bonus Cluster: Schedule: bonus Configurations: prod Status: TERMINATED Execution Service Configuration Service Prod SLA Cluster: Schedule: sla Configurations: prod {Schedule=sla, Configuration=prod}
  • 23. Rerouting Traffic Between Clusters • Job request: {Schedule=sla, Configuration=prod} • Execution Service • Configuration Service • Ad-hoc Cluster: Schedule: adhoc; Configurations: prod, test • Prod SLA Cluster: Schedule: sla; Configurations: prod
  • 24. Rerouting Traffic Between Clusters • Job request: {Schedule=sla, Configuration=prod} • Execution Service • Configuration Service • Ad-hoc Cluster: Schedule: adhoc, sla; Configurations: prod, test • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: OUT_OF_SERVICE
  • 25. Rerouting Traffic Between Clusters • Job request: {Schedule=sla, Configuration=prod} • Execution Service • Configuration Service • Ad-hoc Cluster: Schedule: adhoc; Configurations: prod, test • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
  • 26. “Red/Black” Pushes for Clusters • Job request: {Schedule=sla, Configuration=prod} • Execution Service • Configuration Service • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
  • 27. “Red/Black” Pushes for Clusters • Job request: {Schedule=sla, Configuration=prod} • Execution Service • Configuration Service • Prod SLA Cluster (old): Schedule: sla; Configurations: prod; Status: OUT_OF_SERVICE • Prod SLA Cluster (new): Schedule: sla; Configurations: prod; Status: UP
  • 28. “Red/Black” Pushes for Clusters • Job request: {Schedule=sla, Configuration=prod} • Execution Service • Configuration Service • Prod SLA Cluster (old): Schedule: sla; Configurations: prod; Status: TERMINATED • Prod SLA Cluster (new): Schedule: sla; Configurations: prod; Status: UP
  • 29. Genie Usage at Netflix • Usage statistics brought to you by “Sherlock” • Pig job to gather Hadoop job statistics • Tableau-based visualization
  • 30. Genie Deployment in the Cloud • Asgard is also part of Netflix OSS • https://github.com/Netflix/asgard
  • 31. Auto Scaling in the Cloud
  • 32. Genie is now part of Netflix OSS! • http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html • Clone it on GitHub at: https://github.com/Netflix/genie • Still “version 0”: work in progress! • All contributions and feedback welcome! • Come talk to us and check out live demos at the Netflix Booth
  • 33. Watching Pigs Fly with the Netflix Hadoop Toolkit
  • 34. Sriram Krishnan • We’re hiring! • Thank you! • Home: http://www.netflix.com • Jobs: http://jobs.netflix.com • Tech Blog: http://techblog.netflix.com/
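
The cluster-selection behaviour illustrated on slides 19 to 28 (bonus clusters, rerouting, red/black pushes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Genie's actual implementation or API: the `Cluster` class, the `pick_cluster` function and the cluster names are hypothetical, and the rule "first cluster that is UP and supports the requested schedule and configuration" is an assumed reading of the slides.

```python
from dataclasses import dataclass

# Hypothetical model: the fields mirror the slide labels (Schedule,
# Configurations, Status); they are not taken from Genie's real data model.
@dataclass
class Cluster:
    name: str
    schedules: set
    configurations: set
    status: str = "UP"  # UP, OUT_OF_SERVICE or TERMINATED


def pick_cluster(clusters, schedule, configuration):
    """Return the first UP cluster supporting both criteria, or None."""
    for cluster in clusters:
        if (cluster.status == "UP"
                and schedule in cluster.schedules
                and configuration in cluster.configurations):
            return cluster
    return None


prod = Cluster("prod_sla", {"sla"}, {"prod"})
adhoc = Cluster("adhoc", {"adhoc"}, {"prod", "test"})
bonus = Cluster("bonus", {"bonus"}, {"prod"})
clusters = [prod, adhoc, bonus]

# Normal operation: SLA jobs land on the prod SLA cluster.
normal = pick_cluster(clusters, "sla", "prod").name

# Rerouting: prod goes out of service and the ad-hoc cluster also accepts sla.
prod.status = "OUT_OF_SERVICE"
adhoc.schedules.add("sla")
rerouted = pick_cluster(clusters, "sla", "prod").name

# Maintenance is over: restore the original metadata.
prod.status = "UP"
adhoc.schedules.discard("sla")
restored = pick_cluster(clusters, "sla", "prod").name

# Bonus cluster: ask for it first, fall back to the SLA cluster when it is gone.
with_bonus = pick_cluster(clusters, "bonus", "prod").name
bonus.status = "TERMINATED"
fallback = (pick_cluster(clusters, "bonus", "prod")
            or pick_cluster(clusters, "sla", "prod")).name
```

Clients only ever state the schedule and configuration they need; moving traffic is a metadata change on the clusters, which is the point the slides make.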

Editor's Notes

  1. Reference tech blogs: http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html. Use cases: reporting, analytics, insights, algorithms (e.g. recommendations). But big deal: so does everyone in the room.
  2. What is scale? It means different things to different people
  3. 80-100 billion events per day, 10s of TB of data (compressed). Totals ~2PB (retention is a few months). Many clusters: 2000-2500 nodes at different times during the day. Again, big deal: there are many others in the room who do Hadoop at this scale (petabyte is the new terabyte).
  4. Our Hadoop processing is 100% in the (public) cloud. In our case, the public cloud is AWS. This is what differentiates our infrastructure from the rest. Hadoop in the cloud is different from Hadoop in the datacenter; in this talk, we will discuss our cloud-based Hadoop platform. We made certain architectural choices to make it easy for our end-users to run Hadoop jobs, and for us to manage Hadoop resources.
  5. S3 is the source of truth. S3 benefits: highly durable and available (11 9’s); bucket versioning; highly elastic (we grew our data warehouse organically from a few hundred terabytes to petabytes without having to provision any storage resources in advance). HDFS? Only for transient data and intermediate results for multi-stage jobs. S3 cons: performance, eventual consistency.
  6. Another benefit of S3: multiple clusters can read/process the same data. (Semi-)persistent SLA and ad-hoc clusters: ~800-1300 nodes. Multiple ad-hoc clusters to A/B test new releases/features. Nightly "bonus" clusters to supplement the SLA cluster. Operational assumption: clusters may go down at any time. If we lose a cluster, we just respin it. Clusters are interchangeable: storage is decoupled from the computational infrastructure.
  7. All end-users want to do is run jobs and access their data. As the platform team, our goal is to shield them from the back-end complexity. Genie: REST API for job execution/monitoring; repository/abstraction for clusters and metastores. Franklin (MDS): uses HiveServer to talk to the Hive metastore. In all honesty, very few people use this API directly.
  8. Next, we will focus on Genie for the rest of the talk. Other tools will be covered in the other Netflix talk, “Watching Pigs Fly with the Netflix Hadoop Toolkit” (Thu, 1:40PM).
  9. EMR: Hadoop IaaS, and an API to run jobs on transient clusters; our clusters are semi-persistent, and job submissions don’t result in new clusters. Oozie: workflow tool, which only supports the Hadoop ecosystem; we have hybrid jobs (Teradata+Hadoop) being orchestrated by UC4, so we just needed a job submission API. Also no support for Hive when we started. Templeton: no multi-cluster, multi-user support; not quite ready for prime time.
  10. Genie is a resource “match-maker”. Next, we look at two key services that Genie provides.
  11. Unit of execution is a Hadoop/Hive/Pig job. Users provide scripts, dependencies and other metadata. Does no scheduling per se; only does “meta-scheduling” or resource matching.
  12. Status defines whether a cluster is accepting jobs. Configurations are *-site.xmls and properties. Cluster name, schedule, etc. Next we look at the two classes of users supported by Genie, and the overall lifecycle.
  13. Two classes of users: admins and end-users. Admins spin up clusters and set cluster metadata. Users use the clusters once they have been registered. Genie is built on top of Netflix OSS.
  14. Genie figures out the resources to run jobs on; back-end resources are abstracted out. Asynchronous execution, since jobs may be long-running.
  15. Every job is run as a separate process using the Hadoop/Hive/Pig CLI. Avoids “jar hell” since it needs Hadoop jars. Jobs run in their own sandbox (working directory). Provides isolation between jobs, and between Genie and the jobs. Standard output/error of jobs easily available. Able to support multiple versions of Hadoop/Hive/Pig, and connect to multiple clusters.
  16. Configuration service helps us do crazy (cool) things. Will describe each of these in greater detail.
  17. New bonus clusters are launched each night, but clients are oblivious of actual host names/IPs. One way to do this: higher-SLA jobs first ask for the cluster by name.
  18. New bonus clusters are launched each night, but clients are oblivious of actual host names/IPs. One way to do this: higher-SLA jobs first ask for the cluster by name.
  19. If it doesn’t exist, revert back to the existing cluster. Why not just expand? Better isolation. Mixing and matching instance types is not ideal for Hadoop. Prod cluster uses m1.xlarges for slave nodes. Shrink has proven to be a problem. We want to do a hard shutdown when those instances are needed on awsprod.
  20. If it doesn’t exist, revert back to the existing cluster. Why not just expand? Better isolation. Mixing and matching instance types is not ideal for Hadoop. Prod cluster uses m1.xlarges for slave nodes. Shrink has proven to be a problem. We want to do a hard shutdown when those instances are needed on awsprod.
  21. We had to bounce the prod job tracker to enable priorities for “long-pole” jobs. Wanted to do it with minimal impact to SLA jobs.
  22. Must wait for all existing jobs to finish for minimal impact. Hadoop jobs are long running; we don’t want to kill a 5-hour job nearing its finish.
  23. Prod cluster is back up after maintenance. Jobs that were scheduled on the query cluster will continue to run there until they finish. This is done from time to time; although not too often, we do red-black pushes…
  24. This is the initial state; we need to spin up a new cluster, e.g. to push a new feature.
  25. Spin up a new cluster, mark it as UP, and mark the old cluster as OOS (OUT_OF_SERVICE).
  26. The old cluster moves from OUT_OF_SERVICE to TERMINATED.
  27. Our techblog shows the number of Hadoop jobs; this shows Genie jobs. Two query clusters: A/B testing the new fair share scheduler. Mention that we will be writing a techblog post about this soon, with more details.
  28. Set up desired instance counts across multiple AZs. Do “red-black” pushes using “sequential ASGs”. Loss of individual nodes will cause jobs running on those nodes to be lost.
  29. Auto-scaling policy set up to expand if number of running jobs > ~80%
  30. Still biased towards running in the cloud and at Netflix, but will generalize/improve it based on community feedback
  31. Come listen to how we enable “Data Platform as a Service”; it is truly Lipstick on a Pig.
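
Note 15 describes each job running as a separate process in its own sandbox (working directory), with standard output and error kept available. A minimal Python sketch of that pattern follows. It is an assumption-laden illustration, not Genie's code: `run_in_sandbox`, the log file names and the job ID are hypothetical, and a Python one-liner stands in for the Hadoop/Hive/Pig command line. The sketch also waits for the job to finish for simplicity, whereas the notes describe asynchronous execution.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_sandbox(job_id, command, base_dir):
    """Run `command` as a child process in a per-job working directory.

    stdout and stderr go to files inside the sandbox so they can be
    fetched later. Returns the sandbox path and the process exit code.
    """
    sandbox = Path(base_dir) / job_id
    sandbox.mkdir(parents=True)
    with open(sandbox / "stdout.log", "w") as out, \
            open(sandbox / "stderr.log", "w") as err:
        # A separate process keeps the job's dependencies and failures
        # isolated from the service and from other jobs.
        proc = subprocess.Popen(command, cwd=sandbox, stdout=out, stderr=err)
        exit_code = proc.wait()  # a real service would poll instead of blocking
    return sandbox, exit_code


# Stand-in for a Hadoop/Hive/Pig CLI invocation (hypothetical job).
base = tempfile.mkdtemp()
command = [sys.executable, "-c", "print('job output')"]
sandbox, exit_code = run_in_sandbox("job-0001", command, base)
stdout_text = (sandbox / "stdout.log").read_text()
```

Because each job only sees its own directory and its own process, different jobs can use different client versions without interfering with one another.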