Real Time Analytics for Big Data: Lessons from Facebook
The Real Time Boom: Google Real Time Web Analytics; Google Real Time Search; Facebook Real Time Social Analytics; Twitter paid tweet analytics; SaaS Real Time User Tracking; new Real Time Analytics startups. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter
Note the Time dimension
The data resolution & processing models
Traditional analytics applications
CEP – Complex Event Processing
In Memory Data Grid
NoSQL
Hadoop MapReduce
Hadoop Map/Reduce – Reality check
So what’s the bottom line?
Facebook Real-time Analytics System
Goals
The actual analytics
Technology Evaluation
The solution (diagram labels: Facebook logs, PTail, Scribe, Puma, HBase, HDFS; Real Time; Long Term Batch; 1.5 sec; 10,000 writes/sec per server)
Checking the assumptions
Facebook Analytics.Next
Step 1: Use memory (diagram labels: Events, Memory Grid, Data Grid, Facebook event sources)
Step 1: Use memory (diagram labels: Events, Any API, Data Grid, Facebook event sources)
Step 2 – Collocate (diagram labels: Events, Processing Grid, Data Grid, Facebook event sources)
Step 2 – Collocate (diagram labels: Events, Processing Grid, Data Grid, Facebook event sources)
    @EventDriven @Polling
    public class SimpleListener {

        // Template that selects the entries this listener should receive
        @EventTemplate
        Data unprocessedData() {
            Data template = new Data();
            template.setProcessed(false);
            return template;
        }

        // Called for each matching entry
        @SpaceDataEvent
        public Data eventListener(Data event) {
            // process Data here
            event.setProcessed(true); // so the entry no longer matches the template
            return event;
        }
    }
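The listener on this slide follows a take, process, write-back pattern: entries matching a template are removed from the grid, handed to a listener, and the listener's result is written back. A minimal plain-Java sketch of that pattern follows. It does not use the GigaSpaces API; `PollingSketch`, its nested `Data` class, and `pollOnce` are hypothetical names introduced only for illustration, and a simple list stands in for the data grid.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

class PollingSketch {
    // Hypothetical stand-in for the Data class shown on the slide.
    static class Data {
        final String payload;
        boolean processed;
        Data(String payload) { this.payload = payload; }
    }

    // Hypothetical in-process stand-in for one data grid partition.
    private final List<Data> space = new ArrayList<>();

    void write(Data d) { space.add(d); }

    // One polling pass: take each entry that matches the template,
    // run the listener on it, and write the listener's result back.
    int pollOnce(Predicate<Data> template, UnaryOperator<Data> listener) {
        List<Data> results = new ArrayList<>();
        Iterator<Data> it = space.iterator();
        while (it.hasNext()) {
            Data d = it.next();
            if (template.test(d)) {
                it.remove();
                results.add(listener.apply(d));
            }
        }
        space.addAll(results);
        return results.size();
    }

    public static void main(String[] args) {
        PollingSketch grid = new PollingSketch();
        grid.write(new Data("page_view"));
        grid.write(new Data("like"));

        Predicate<Data> unprocessed = d -> !d.processed;
        UnaryOperator<Data> listener = d -> { d.processed = true; return d; };

        // The first pass handles both entries; the second finds nothing to match.
        System.out.println(grid.pollOnce(unprocessed, listener));
        System.out.println(grid.pollOnce(unprocessed, listener));
    }
}
```

Because the listener marks each entry as processed before returning it, the written-back entry no longer matches the template and is not picked up again.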
Step 3 – Write behind to SQL/NoSQL (diagram labels: Events, Processing Grid, Data Grid, Write Behind, Open Long Term persistency, Facebook event sources)
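Write-behind means the hot path updates memory only, and changes are pushed to the long-term store later, off the request path. A minimal plain-Java sketch of the idea follows. It is not the GigaSpaces mechanism; `WriteBehindSketch`, `increment`, and `flush` are hypothetical names, and a `HashMap` stands in for the SQL/NoSQL store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

class WriteBehindSketch {
    // In-memory state that serves reads and absorbs writes.
    private final Map<String, Long> memory = new ConcurrentHashMap<>();
    // Keys changed since the last flush.
    private final Set<String> dirty = ConcurrentHashMap.newKeySet();
    // Hypothetical writer for the long-term store (SQL or NoSQL).
    private final BiConsumer<String, Long> store;

    WriteBehindSketch(BiConsumer<String, Long> store) { this.store = store; }

    // Hot path: update memory only and remember that the key changed.
    void increment(String key) {
        memory.merge(key, 1L, Long::sum);
        dirty.add(key);
    }

    Long read(String key) { return memory.get(key); }

    // Off the hot path (for example on a timer): persist the changed keys.
    int flush() {
        int written = 0;
        for (String key : dirty) {
            if (dirty.remove(key)) {
                store.accept(key, memory.get(key));
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, Long> database = new HashMap<>(); // stand-in for the store
        WriteBehindSketch grid = new WriteBehindSketch(database::put);

        grid.increment("likes:page42");
        grid.increment("likes:page42");
        grid.increment("likes:page42");
        System.out.println(database); // the store has not been touched yet

        grid.flush();
        System.out.println(database); // the store now holds the latest counter
    }
}
```

Three increments to the same key cost a single store write here, which is the main reason to batch behind memory.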
Economic Data Scaling (diagram labels: Memory, Disk)
Economic Scaling
Putting it all together (diagram labels: Event Sources, Analytic Application, Write behind, Generate Patterns)
Putting it all together (diagram labels: Event Sources, Analytic Application, Write behind, Generate Patterns, Real Time Map/Reduce, R)
    Script script = new StaticScript("groovy", "println 'hi'; return 0");
    Query q = em.createNativeQuery("execute ?");
    q.setParameter(1, script);
    Integer result = (Integer) q.getSingleResult();
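The "Real Time Map/Reduce" label suggests running a computation next to the data on each partition and combining the partial results. A minimal plain-Java sketch of that scatter/gather shape follows, assuming each partition is simply an in-memory list of event-type strings. `MapReduceSketch` and `countMatching` are hypothetical names, and nothing here uses the GigaSpaces API.

```java
import java.util.List;

class MapReduceSketch {
    // Map: count matching events locally on each partition.
    // Reduce: sum the per-partition counts.
    static long countMatching(List<List<String>> partitions, String eventType) {
        return partitions.parallelStream()
                .mapToLong(p -> p.stream().filter(eventType::equals).count())
                .sum();
    }

    public static void main(String[] args) {
        // Three hypothetical partitions holding event types.
        List<List<String>> partitions = List.of(
                List.of("like", "view", "like"),
                List.of("view"),
                List.of("like", "comment"));

        System.out.println(countMatching(partitions, "like"));
    }
}
```

Only one number per partition crosses the partition boundary, so the cost of the reduce step grows with the number of partitions rather than the number of events.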
5x better performance per server! (diagram labels: Event injector, up to 128 threads; GigaSpaces / other message server; App Services, up to 128 threads; chart: Other vs. Giga, 50,000 writes/sec per server)
Live demo: Inter Day Activity (Real Time); Monthly Trend Analysis
5 Big Data Predictions
Summary. Big Data development made simple: focus on your business logic and use a Big Data platform to deal with scalability, performance, and continuous availability. It's open: use any stack and avoid lock-in; any database (RDBMS or NoSQL), any cloud, common APIs and frameworks. All while minimizing cost: use memory and disk for optimum cost/performance; built-in automation and management reduce operational costs; elasticity reduces over-provisioning cost.
Further reading
Thank YOU! @natishalom http://blog.gigaspaces.com

Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Último (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Big Data Real Time Analytics - A Facebook Case Study

  • 1. Real Time Analytics for Big Data Lessons from Facebook..
  • 2. The Real Time Boom: Google Real Time Web Analytics; Google Real Time Search; Facebook Real Time Social Analytics; Twitter paid tweet analytics; SaaS Real Time User Tracking; New Real Time Analytics Startups. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 4. Note the Time dimension
  • 5. The data resolution & processing models
  • 11. Hadoop Map/Reduce – Reality check..
  • 12. So what’s the bottom line?
  • 13. Facebook Real-time Analytics System
  • 17. The solution (diagram). Components: Facebook logs, Scribe, PTail, Puma, HBase, HDFS. Labels: Real Time; Long Term; Batch; 1.5 sec; 10,000 writes/sec per server.
  • 18. Checking the assumptions..
  • 24. Step 3 – Write behind to SQL/NoSQL (diagram). Components: Facebook events, events processing grid, data grid, write behind, open long-term persistency.
  • 30. Live demo: Inter Day Activity (Real Time); Monthly Trend Analysis
  • 31. 5 Big Data Predictions
  • 32. Summary. Big Data development made simple: focus on your business logic, and use a Big Data platform to handle scalability, performance, and continuous availability. It's open: use any stack and avoid lock-in; any database (RDBMS or NoSQL); any cloud; common APIs and frameworks. All while minimizing cost: use memory and disk for optimum cost/performance; built-in automation and management reduce operational costs; elasticity reduces over-provisioning cost.
  • 33. Further reading..
  • 34. Thank YOU! @natishalom http://blog.gigaspaces.com

Editor's Notes

  1. http://developers.facebook.com/blog/post/476/
  2. http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html
     MySQL DB Counters: Have a row with a key and a counter. This results in lots of database activity. Stats are kept at a day-bucket granularity, and every day at midnight the stats roll over. Reaching the rollover period resulted in a lot of writes to the database, which caused a lot of lock contention. They tried to spread the work by taking time zones into account, and tried to shard things differently. The high write rate led to lock contention, it was easy to overload the databases, they had to constantly monitor the databases, and they had to rethink their sharding strategy. The solution was not well tailored to the problem.
     In-Memory Counters: If you are worried about bottlenecks in IO, then throw it all in memory. No scale issues. Counters are stored in memory, so writes are fast and the counters are easy to shard. They felt in-memory counters, for reasons not explained, weren't as accurate as other approaches. Even a 1% failure rate would be unacceptable. Analytics drive money, so the counters have to be highly accurate. They didn't implement this system. It was a thought experiment, and the accuracy issue caused them to move on.
     MapReduce: Hadoop/Hive was used for the previous solution. Flexible. Easy to get running. Can handle IO, both massive writes and reads. You don't have to know ahead of time how the data will be queried; it can be stored and then queried. Not realtime. Many dependencies. Lots of points of failure. Complicated system. Not dependable enough to hit realtime goals.
     Cassandra: HBase seemed a better solution based on availability and the write rate. The write rate was the huge bottleneck being solved.
  3. http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html
     The Winner: HBase + Scribe + Ptail + Puma. At a high level: HBase stores data across distributed machines. A tailing architecture is used: new events are stored in log files, and the logs are tailed. A system rolls the events up and writes them into storage. A UI pulls the data out and displays it to users.
     Data Flow: A user clicks Like on a web page, which fires an AJAX request to Facebook. The request is written to a log file using Scribe. Scribe handles issues like file rollover, and it is built on the same HDFS file store Hadoop is built on. Write extremely lean log lines: the more compact the log lines, the more can be stored in memory.
     Ptail: Data is read from the log files using Ptail, an internal tool built to aggregate data from multiple Scribe stores. It tails the log files and pulls data out. Ptail data is separated into three streams so they can eventually be sent to their own clusters in different datacenters: plugin impressions, news feed impressions, and actions (plugin + news feed).
     Puma: Batch data to lessen the impact of hot keys. Even though HBase can handle a lot of writes per second, they still want to batch data. A hot article will generate a lot of impressions and news feed impressions, which cause huge data skews, which in turn cause IO issues. The more batching the better. They batch for 1.5 seconds on average. They would like to batch longer, but they have so many URLs that they run out of memory when creating a hashtable. They wait for the last flush to complete before starting a new batch, to avoid lock contention issues.
     UI (renders data): Frontends are all written in PHP. The backend is written in Java, and Thrift is used as the messaging format so PHP programs can query Java services. Caching solutions are used to make the web pages display more quickly. Performance varies by the statistic: a counter can come back quickly, while finding the top URL in a domain can take longer. Response times range from .5 to a few seconds. The more and the longer data is cached, the less realtime it is. Different caching TTLs are set in memcache.
     MapReduce: The data is then sent to MapReduce servers so it can be queried via Hive. This also serves as a backup plan, as the data can be recovered from Hive. Raw logs are removed after a period of time.
     HBase: HBase is a distributed column store and a database interface to Hadoop. Facebook has people working internally on HBase. Unlike a relational database, you don't create mappings between tables and you don't create indexes. The only index you have is a primary row key. From the row key you can have millions of sparse columns of storage. It's very flexible: you don't have to specify the schema. You define column families, to which you can add keys at any time. A key feature for scalability and reliability is the WAL (write-ahead log), which is a log of the operations that are supposed to occur. Based on the key, data is sharded to a region server. Data is written to the WAL first and then put into memory. At some point in time, or once enough data has accumulated, the data is flushed to disk. If the machine goes down, you can recreate the data from the WAL, so there's no permanent data loss. Using a combination of the log and in-memory storage, they can handle an extremely high rate of IO reliably. HBase handles failure detection and automatically routes across failures. Currently HBase resharding is done manually. Automatic hot spot detection and resharding is on the roadmap for HBase, but it's not there yet. Every Tuesday someone looks at the keys and decides what changes to make in the sharding plan.
     Schema: A bunch of counters is stored on a per-URL basis. The row key, which is the only lookup key, is the MD5 hash of the reverse domain. Selecting the proper key structure helps with scanning and sharding. A problem they have is sharding data properly onto different machines. Using an MD5 hash makes it easier to say this range goes here and that range goes there. For URLs they do something similar, plus they add an ID on top of that. Every URL in Facebook is represented by a unique ID, which is used to help with sharding. A reverse domain, com.facebook/ for example, is used so that the data is clustered together. HBase is really good at scanning clustered data, so if they store the data so that it's clustered together, they can efficiently calculate stats across domains. Think of every row as a URL and every cell as a counter; you are able to set different TTLs (time to live) for each cell. If keeping an hourly count, there's no reason to keep that around for every URL forever, so they set a TTL of two weeks. TTLs are typically set on a per-column-family basis. Per server they can handle 10,000 writes per second. Checkpointing is used to prevent data loss when reading data from log files. Tailers save log stream checkpoints in HBase; these are replayed on startup, so data won't be lost. Useful for detecting click fraud, but it doesn't have fraud detection built in.
     Tailer Hot Spots: In a distributed system there's a chance one part of the system can be hotter than another. One example is region servers that can be hot because more keys are being directed their way. One tailer can lag behind another too. If one tailer is an hour behind and the others are up to date, what numbers do you display in the UI? For example, impressions have a way higher volume than actions, so CTR rates were way higher in the last hour. The solution is to figure out the least up-to-date tailer and use that when querying metrics.
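Three ideas from the note above (the MD5 row key over a reversed domain, Puma-style batching in an in-memory hashtable, and querying as of the least up-to-date tailer) can be sketched in a few lines of Python. This is an illustrative sketch only, not Facebook's code: `row_key`, `BatchingCounter`, `safe_query_time`, the `flush_fn` sink, and the exact key encoding are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def row_key(domain: str) -> str:
    """MD5 hex digest of the reversed domain (assumed encoding for this sketch)."""
    reversed_domain = ".".join(reversed(domain.lower().split(".")))
    return hashlib.md5(reversed_domain.encode("utf-8")).hexdigest()


class BatchingCounter:
    """Aggregates increments in memory and emits one write per key per batch."""

    def __init__(self, flush_fn, interval_s: float = 1.5):
        self._flush_fn = flush_fn        # hypothetical sink, e.g. a store increment call
        self._interval_s = interval_s    # the notes mention ~1.5 s batches
        self._pending = Counter()        # the in-memory hashtable of counters
        self._batch_started = None

    def record(self, domain: str, metric: str, now: float) -> None:
        if self._batch_started is None:
            self._batch_started = now
        self._pending[(row_key(domain), metric)] += 1
        if now - self._batch_started >= self._interval_s:
            self.flush()

    def flush(self) -> None:
        # Synchronous flush: a new batch only starts once this one has completed.
        if self._pending:
            self._flush_fn(dict(self._pending))
        self._pending.clear()
        self._batch_started = None


def safe_query_time(checkpoints: dict) -> float:
    """Timestamp of the least up-to-date tailer; query metrics only up to here."""
    return min(checkpoints.values())


writes = []
counter = BatchingCounter(writes.append)
for t in (0.0, 0.4, 0.9, 1.6):
    counter.record("www.facebook.com", "like", now=t)
```

Four events for the same hot key collapse into a single write, which is the point of batching against hot keys. The batch window here is driven by the timestamps passed in, which keeps the sketch deterministic; a real system would also flush on a timer.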
  4. A Potential for Improvement. There are lots of areas in which you can see potential improvements if the assumptions are changed. As a contrast to Facebook's working system:
     We can simplify the design. If memory can be seen as transactional - and it can - we can use the events without transforming them as they proceed along our analytics workflow. This makes our design and implementation much simpler to build and test, and performance improves as well.
     We can strengthen the design. With a polling semantic, such systems are brittle, relying on systems that pull data in order to generate realtime analytics data. We should be able to reduce the fragility of the system, even while making it faster.
     We can strengthen the implementation. With batching subsystems, there are limits that shouldn’t exist. For example, one concern in Facebook's implementation is the use of an in-memory hash table that stores intermediate data; the in-memory aspect isn’t a concern until you realize that the batch sizes are chosen partially to make sure that this hash table doesn’t overflow available space.
     We can allow deployments to change databases based on their requirements. There's nothing wrong with HBase, but it has specific characteristics that aren't appropriate for all enterprises. We can design a system that you’d be able to deploy on various and flexible platforms, and we can migrate the underlying long-term data store to a different database if needed.
     We can consolidate the analytics system so that management is easier and unified. While there are system management standards like SNMP that allow management events to be presented in the same way no matter the source, having so many different pieces means that managing the system requires an encompassing understanding, which makes maintenance and scaling more difficult.
     What we want to do, then, is create a general model for an application that can accomplish the same goals as Facebook’s realtime analytics system, while leveraging the capabilities that in-memory data grids offer where available, potentially offering improvement in the areas of scalability, manageability, latency, platform neutrality, and simplicity, all while increasing ease of data access. That sounds like quite a tall order, but it’s doable.
     The key is to remember that at heart, realtime analytics represent an events system. Facebook’s entire architecture is designed to funnel events through various channels, such that they can safely and sequentially manage event updates. Therefore, they receive a massive set of events that “look like” marbles, which they line up in single file; they then sort the marbles by color, you might say, and for each color they create a bundle of sticks; the sticks are lit on fire, and when the heat goes up past a certain temperature, steam is generated, which turns a turbine. It’s a real-life Rube Goldberg machine, which is admirable in that it works, but much of it is still unnecessary if the assumptions about memory ("unreliable") and database ("HBase is the only target that counts") are changed. Looking at the analogy above, there’s no need to change a marble into anything. The marble is enough.
  5. Value: write/read scaling through partitioning; performance through memory speed; reliability through replication and redundancy.
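As a rough illustration of the three properties in note 5, the sketch below routes each key to one of several in-memory partitions and mirrors every write to a backup copy. It is a toy model under stated assumptions, not the GigaSpaces implementation: `PartitionedStore`, the CRC32 routing, and the synchronous backup are all choices made for the example.

```python
import zlib


class PartitionedStore:
    """In-memory counters, hash-partitioned, with one backup per partition."""

    def __init__(self, partitions: int = 4):
        self.primaries = [dict() for _ in range(partitions)]
        self.backups = [dict() for _ in range(partitions)]

    def partition_of(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.primaries)

    def increment(self, key: str, delta: int = 1) -> int:
        p = self.partition_of(key)
        value = self.primaries[p].get(key, 0) + delta
        self.primaries[p][key] = value   # memory-speed write
        self.backups[p][key] = value     # synchronous copy to the backup
        return value

    def read(self, key: str) -> int:
        return self.primaries[self.partition_of(key)].get(key, 0)

    def fail_over(self, p: int) -> None:
        """Promote the backup of partition p after its primary is lost."""
        self.primaries[p] = self.backups[p]
        self.backups[p] = dict(self.primaries[p])


store = PartitionedStore()
store.increment("com.facebook/page")
store.increment("com.facebook/page", 2)
```

Because every key maps to exactly one partition, writes for different keys can proceed on different machines, and the backup copy lets a partition survive the loss of its primary.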
  6. Value: A data grid like GigaSpaces comes with a rich set of APIs that provide not only the means to store data fast and reliably, but also to access and query the data just as you would with a database. GigaSpaces specifically supports both JPA and a Document API, and a way to mix and match between those APIs. Unlike Scribe and a log system, we can now look at the data as it comes in, and not only once it is stored in the database. The latter makes it possible to partition data based on time: the first day in memory and the rest in the database, for example.
  7. Collocating the processing with the data can provide the biggest gain in terms of scalability and performance, as we reduce the number of network hops as well as the serialization overhead. We also reduce the number of moving parts, which in itself simplifies our runtime architecture and our ability to scale. The other benefit is that we decentralize the Puma services from the Facebook example and thus make the entire architecture significantly more scalable.
  8. The snippet of code shows the part of the code that generates the statistical information as the events come in. The template defines the filter for the events. In the above example we filter any event of type Data that has a false value in its “processed” attribute. For every event that matches this filter, the method eventListener will be called with the appropriate data object.
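The slide's own snippet is not included in this transcript, so the following is a stand-in sketch of the pattern note 8 describes: a template object acts as the filter, and a listener is invoked for every matching event. It is written in plain Python and does not reproduce the GigaSpaces API; `Data`, `matches`, `EventContainer`, and `event_listener` are hypothetical names, and treating `None` as a wildcard is an assumption of the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Data:
    url: Optional[str] = None
    processed: Optional[bool] = None   # None acts as a wildcard in a template


def matches(template: Data, event: Data) -> bool:
    """True when every non-None template field equals the event's field."""
    return all(
        value is None or getattr(event, name) == value
        for name, value in vars(template).items()
    )


class EventContainer:
    """Calls the listener for each incoming event that matches the template."""

    def __init__(self, template: Data, listener: Callable[[Data], None]):
        self.template = template
        self.listener = listener

    def on_event(self, event: object) -> None:
        if isinstance(event, type(self.template)) and matches(self.template, event):
            self.listener(event)


counts = Counter()


def event_listener(event: Data) -> None:
    counts[event.url] += 1     # update the statistic as the event arrives
    event.processed = True     # mark it so it will not match the template again


# Template: any Data event whose "processed" attribute is False.
container = EventContainer(Data(processed=False), event_listener)
first = Data(url="com.example/a", processed=False)
container.on_event(first)
container.on_event(first)                                    # now processed: ignored
container.on_event(Data(url="com.example/b", processed=True))  # does not match
```

Setting `processed` inside the listener is what keeps an event from being counted twice, which mirrors the role of the "processed" attribute in the note.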
  9. Value gained: Avoid lock-in to a specific NoSQL API. Performance: reduced network hops and serialization overhead. Simplicity: fewer moving parts. Scalability without compromising on consistency (strict consistency at the front, eventual consistency for the long-term data). JPA/standard API.
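One way to read "strict consistency at the front, eventual consistency for the long-term data" is the write-behind pattern named on slide 24: updates are applied to memory immediately and queued for later persistence. The sketch below assumes that reading; `WriteBehindCounterStore`, `persist_fn`, and `drain` are hypothetical names, and the long-term store is modeled as a plain dictionary.

```python
from collections import deque


class WriteBehindCounterStore:
    """Counters served from memory; changes are persisted later in batches."""

    def __init__(self, persist_fn):
        self._persist_fn = persist_fn   # hypothetical sink: RDBMS, NoSQL store, ...
        self._memory = {}
        self._dirty = deque()           # queued (key, delta) pairs awaiting persistence

    def increment(self, key: str, delta: int = 1) -> int:
        self._memory[key] = self._memory.get(key, 0) + delta
        self._dirty.append((key, delta))
        return self._memory[key]

    def read(self, key: str) -> int:
        # Reads always see the latest value, even before it is persisted.
        return self._memory.get(key, 0)

    def drain(self, max_items=None) -> int:
        """Send queued deltas to the long-term store; returns how many were sent."""
        batch = []
        while self._dirty and (max_items is None or len(batch) < max_items):
            batch.append(self._dirty.popleft())
        if batch:
            self._persist_fn(batch)
        return len(batch)


long_term = {}


def persist(batch):
    for key, delta in batch:
        long_term[key] = long_term.get(key, 0) + delta


store = WriteBehindCounterStore(persist)
store.increment("com.example/a")
store.increment("com.example/a", 2)
```

Until `drain()` runs, the long-term store lags behind memory; afterwards the two agree. Because the sink is just a callable, the target database can be swapped without touching the front end, which is the lock-in point made in the note.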
  10. content based routing, workflow