SlideShare uma empresa Scribd logo
1 de 49
Hadoop  Voldemort  @ LinkedIn Bhupesh Bansal 20 January , 2010 01/21/10
The plan ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Motivation I : Big Data  Proprietary & Confidential 01/21/10 Reference :  algo2.iti.kit.edu/.../fopraext/index.html
Motivation II: Data Driven Features
Motivation III  Proprietary & Confidential 01/21/10
Motivation IV Proprietary & Confidential 01/21/10
Why Is This Hard? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Some Problems we worked on lately ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Server side views ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Failure Detection ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
EC2 based testing  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Coming this Jan (finally): Rebalancing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Administration ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Present day ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Performance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop @ Linkedin
Batch Computing at Linkedin  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What do we use Hadoop for ? Proprietary & Confidential 01/21/10
How do we store Data ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How do we manage workflows ?  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introducing Azkaban
How do we do ETL ? : Getting data in  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ETL II: Getting data out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ETL II : Getting Data Out : Existing Solutions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
ETL II : Getting Data Out : Our solution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Voldemort Read only store: version I ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
Voldemort Read only store: version II ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
Performance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Batch Computing at LinkedIn
Infrastructure At LinkedIn ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
References ,[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
The End
Core Concepts
Core Concepts - I ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
Core Concept - II ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
Core Concept - III ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
Core Concepts - IV ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
Implementation
Voldemort Design
Client API ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Versioning & Conflict Resolution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Serialization ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Routing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Voldemort Physical Deployment
Routing With Failures ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Repair Mechanism ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 01/21/10
Network Layer ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Persistence ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Mais conteúdo relacionado

Mais procurados

HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_finalRamya Sunil
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Chris Nauroth
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierKellyn Pot'Vin-Gorman
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSDataWorks Summit
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Uri Laserson
 
Kscope 14 Presentation : Virtual Data Platform
Kscope 14 Presentation : Virtual Data PlatformKscope 14 Presentation : Virtual Data Platform
Kscope 14 Presentation : Virtual Data PlatformKyle Hailey
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Chris Nauroth
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learnedtcurdt
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonDataWorks Summit/Hadoop Summit
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseDataWorks Summit
 
Pass Summit Linux Scripting for the Microsoft Professional
Pass Summit Linux Scripting for the Microsoft ProfessionalPass Summit Linux Scripting for the Microsoft Professional
Pass Summit Linux Scripting for the Microsoft ProfessionalKellyn Pot'Vin-Gorman
 

Mais procurados (20)

HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Mumak
MumakMumak
Mumak
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_final
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next Frontier
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
 
Kscope 14 Presentation : Virtual Data Platform
Kscope 14 Presentation : Virtual Data PlatformKscope 14 Presentation : Virtual Data Platform
Kscope 14 Presentation : Virtual Data Platform
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive Warehouse
 
Pass Summit Linux Scripting for the Microsoft Professional
Pass Summit Linux Scripting for the Microsoft ProfessionalPass Summit Linux Scripting for the Microsoft Professional
Pass Summit Linux Scripting for the Microsoft Professional
 

Semelhante a Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010

Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsDirecti Group
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Yahoo Developer Network
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Nati Shalom
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Building Low Cost Scalable Web Applications Tools & Techniques
Building Low Cost Scalable Web Applications   Tools & TechniquesBuilding Low Cost Scalable Web Applications   Tools & Techniques
Building Low Cost Scalable Web Applications Tools & Techniquesrramesh
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJANicolas Poggi
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceAshok Modi
 

Semelhante a Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010 (20)

Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Building Low Cost Scalable Web Applications Tools & Techniques
Building Low Cost Scalable Web Applications   Tools & TechniquesBuilding Low Cost Scalable Web Applications   Tools & Techniques
Building Low Cost Scalable Web Applications Tools & Techniques
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performance
 

Último

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010

Notas do Editor

  1. Thanks all, excited to talk to you
  2. Core concepts (optional) Implementation (optional)
  3. Key and value can be arbitrarily complex, they are serialized and can
  4. Statistical learning as the ultimate agile development tool (Peter Norvig), “business logic” through data rather than code
  5. Reference to previous presentation and amazon dynamo model
  6. Simple configuration to use a compressed store
  7. EC2 testing should be ready in next few weeks
  8. Main project for Bhupesh and I Minimal and tunable performance for the cluster.
  9. Will be in next release, happens November 15th
  10. Client bound
  11. Give Azkaban demo here Show HelloWorldJob Show hello1.properties Show dependencies with graph job Started out as a week project, was much more complex than we realized
  12. Example: member data--does not make sense to repeatedly join positions, emails, groups, etc. Explain about joins How to better model in java? Json like data model
  13. Give Azkaban demo here Show HelloWorldJob Show hello1.properties Show dependencies with graph job Started out as a week project, was much more complex than we realized
  14. Give Azkaban demo here Show HelloWorldJob Show hello1.properties Show dependencies with graph job Started out as a week project, was much more complex than we realized
  15. Started out as a week project, was much more complex than we realized Period can be daily, hourly depending on need and database availability.
  16. Utilizes Hadoop power for computation intensive index build Provides Voldemort online serving advantages.
  17. Voldemort storage engine is very fast Key lookup using binary search Value lookup using single seek in value file. Operation system cache optimizes .
  18. Client bound
  19. Questions, comments, etc
  20. Switch to Bhup
  21. - Strong Consistency: all clients see the same view, even in presence of updates - High Availability: all clients can find some replica of the data, even in the presence of failures Partition-tolerance: the system properties hold even when the system is partitioned high availability : Mantra for websites Better to deal with inconsistencies, because their primary need is to scale well to allow for a smooth user experience.
  22. Hashing .. Why do we need it ?? Basic problem : Clients need to know which data is where ?? Many ways of solving it Central configuration Hashing Linear hashing works : issue is when cluster is dynamic ?? KeyHash –node IDmapping change for a lot of entries When you add new slots Consistent hashing : preserves key –Node mapping for most of the keys and only change the minimal amount needed How to do it ?? Number of partitions ---------------------------- Arbitrary , each node is allocated many partitions (better load balancing and fault tolerance) Few hundreds to few thousands .. Key  partition mapping is fixed and only ownership of partitions can change
  23. Fancy way of doing Optimistic locking
  24. Will discuss each layer in more detail Layered design One interface for all layers: put/get/delete Each layer decorates the next Very flexible Easy to test Client API : very basic API just provides the raw interface to user Conflct reslution layer : handles all the versioning issues and provides hooks to supply custom conflict resolution strategy Serialization : Object <=> Data Network Layer : Depending on configuration can fit either here or below .. Main job is to handle the network interface, socket pooling other performance related optimizatons Routing layer : Handles and hide many details from the client/user hashing schema failed nodes replication required reads/required writes Storage engine Handle disk persistenct
  25. Very simple APIS NO Range Scans .. . No iterator on KeySet / Entry SET : Very hard to fix performance Have plans to provide such an iterator
  26. Give example of read and writes with vector clocks Pros and cons vs paxos and 2pc User can supply strategy for handling cases where v1 and v2 are not comparable.
  27. Avro is good, but new and unreleased Storing data is really different from an RPC IDL Data is forever (some data) Inverted index Threaded comment tree Don’t hardcode serialization Don’t just do byte[] -- checks are good, many features depend on knowing what the data is Xml profile
  28. Explain about partitions Make things fast by removing slow things, not by tuning HTTP client not performant Separate caching layer
  29. Client v. server - client is better - but harder
  30. You can write an implementation very easily We support plugins so you can run this backend without being in the main code base