SlideShare uma empresa Scribd logo
1 de 43
Big Data and Me Bhupesh Bansal Feb 3, 2012
Relational Model Architecture Reference :  http:// www.slideshare.net / adorepump / voldemort-nosql
Linkedin 2006 Reference :  http://www.slideshare.net/linkedin/linked-in-javaone-2008-tech-session-comm
Relational model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Specialized Systems Architecture Reference :  http:// www.slideshare.net / adorepump / voldemort-nosql
Linkedin 2007 Reference :  http://www.slideshare.net/linkedin/linked-in-javaone-2008-tech-session-comm
Specialized systems ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Batch Driven Architecture Reference :   http:// www.slideshare.net / bhupeshbansal /hadoop-user-group-jan2010
Motivation I : Big Data  02/06/12 Reference :  algo2.iti.kit.edu/.../fopraext/index.html
Motivation II: Data Driven Features
Motivation III: Makes Money  02/06/12 Proprietary & Confidential
Motivation IV: Big Data is cool 02/06/12
Reference : http:// www.slideshare.net / BenSiscovick /the-business-of-big-data-ia-ventures-8577588
Big Data Challenges ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why is this hard ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Some good news !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What works !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What doesn’t works !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Storage ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Publish/Subscribe ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Real time analysis ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Search ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Batch computation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Other ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
My personal picks !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Jeff Dean’s Thoughts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Reference  : http:// www.slideshare.net / xlight /google-designs-lessons-and-advice-from-building-large-distributed-systems
How Voldemort was born ? Reference : 1)  http://www.slideshare.net/bhupeshbansal/hadoop-user-group-jan2010 2)  http://www.slideshare.net/adorepump/voldemort-nosql
Why NoSQL ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Inspired By Amazon Dynamo & Memcached ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ACID Vs CAP ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
Consistent Hashing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
R+W > N  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
Versioning & Conflict Resolution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Vector Clock ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
Client API ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Voldemort Physical Deployment
 
Read-only storage engine ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Reference :   http:// www.slideshare.net / bhupeshbansal /hadoop-user-group-jan2010
What do we use Hadoop/Voldemort for ? Proprietary & Confidential 02/06/12
Batch Driven Architecture Reference :   http:// www.slideshare.net / bhupeshbansal /hadoop-user-group-jan2010
Data Flow Driven Architecture Reference : http:// sna-projects.com /blog/2011/08/ kafka /
Questions

Mais conteúdo relacionado

Mais procurados

Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
Cloudera, Inc.
 
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
Daniel Abadi
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
Grisha Weintraub
 

Mais procurados (20)

File Context
File ContextFile Context
File Context
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
Hadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesHadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and Opportunities
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Why Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) ModelWhy Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) Model
 
Comparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBaseComparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBase
 
Daniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 PanelDaniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 Panel
 
Meetup ml spark_ppt
Meetup ml spark_pptMeetup ml spark_ppt
Meetup ml spark_ppt
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
 
Allyourbase
AllyourbaseAllyourbase
Allyourbase
 
Presentation on Hadoop Technology
Presentation on Hadoop TechnologyPresentation on Hadoop Technology
Presentation on Hadoop Technology
 
Speed it up and Spark it up at Intel
Speed it up and Spark it up at IntelSpeed it up and Spark it up at Intel
Speed it up and Spark it up at Intel
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 

Semelhante a Bhupeshbansal bigdata

Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Cal Henderson
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
Renato Lucindo
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
guest18a0f1
 

Semelhante a Bhupeshbansal bigdata (20)

Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
Agile data warehousing
Agile data warehousingAgile data warehousing
Agile data warehousing
 
No sql
No sqlNo sql
No sql
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Bhupeshbansal bigdata

Notas do Editor

  1. Example: member data--does not make sense to repeatedly join positions, emails, groups, etc. Explain about joins How to better model in java? Json like data model
  2. Example: member data--does not make sense to repeatedly join positions, emails, groups, etc. Explain about joins How to better model in java? Json like data model
  3. Statistical learning as the ultimate agile development tool (Peter Norvig), “business logic” through data rather than code
  4. No Joins Across data domains due to APIs Within data domains due to performance Natural operation: getAll(id…) Latency: if you want to call 30 services on your main pages, they better be quick (30 * 20ms = 600ms)
  5. - Strong Consistency: all clients see the same view, even in presence of updates - High Availability: all clients can find some replica of the data, even in the presence of failures Partition-tolerance: the system properties hold even when the system is partitioned high availability : Mantra for websites Better to deal with inconsistencies, because their primary need is to scale well to allow for a smooth user experience.
  6. Hashing .. Why do we need it ?? Basic problem : Clients need to know which data is where ?? Many ways of solving it Central configuration Hashing Linear hashing works : issue is when cluster is dynamic ?? KeyHash –node IDmapping change for a lot of entries When you add new slots Consistent hashing : preserves key –Node mapping for most of the keys and only change the minimal amount needed How to do it ?? Number of partitions ---------------------------- Arbitrary , each node is allocated many partitions (better load balancing and fault tolerance) Few hundreds to few thousands .. Key  partition mapping is fixed and only ownership of partitions can change
  7. Give example of read and writes with vector clocks Pros and cons vs paxos and 2pc User can supply strategy for handling cases where v1 and v2 are not comparable.
  8. Fancy way of doing Optimistic locking
  9. Very simple APIS NO Range Scans .. . No iterator on KeySet / Entry SET : Very hard to fix performance Have plans to provide such an iterator
  10. Explain about partitions Make things fast by removing slow things, not by tuning HTTP client not performant Separate caching layer
  11. Transfer time: 30 minutes Can max out a gb network, so be careful
  12. Example: member data--does not make sense to repeatedly join positions, emails, groups, etc. Explain about joins How to better model in java? Json like data model
  13. Questions, comments, etc