SlideShare uma empresa Scribd logo
1 de 34
Real Time Analytics for Big Data Lessons from Facebook..
The Real Time Boom.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved  Google Real Time  Web Analytics Google Real Time Search Facebook Real Time  Social Analytics  Twitter paid tweet analytics SaaS Real Time User Tracking New Real Time Analytics  Startups..
Analytics @ Twitter
Note the Time dimension
The data resolution & processing models
Traditional analytics applications ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
CEP – Complex Event Processing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
In Memory Data Grid ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
NoSQL ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Hadoop MapReudce ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Hadoop Map/Reduce – Reality check.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
So what’s the bottom line? ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Facebook Real-time Analytics System ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Goals ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
The actual analytics.. ,[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Technology Evaluation ,[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
The solution.. PTail Scribe Puma Hbase HDFS Real Time Long Term Batch 1.5 Sec 10,000 write/sec per server FACEBOOK Log FACEBOOK Log FACEBOOK Log
Checking the assumptions.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Facebook Analytics.Next.. ,[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved  ,[object Object],[object Object],[object Object]
Step 1: Use memory.. ,[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved  Events Memory Grid Data Grid Data Grid Data Grid FACEBOOK FACEBOOK FACEBOOK
Step 1: Use memory.. ,[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved  Events Any API Data Grid FACEBOOK FACEBOOK FACEBOOK
Step 2 – Collocate ,[object Object],Events Processing Grid Data Grid Data Grid Data Grid FACEBOOK FACEBOOK FACEBOOK
Step 2 – Collocate ,[object Object],Events Processing Grid Data Grid Data Grid Data Grid FACEBOOK FACEBOOK FACEBOOK @EventDriven   @Polling public   class   SimpleListener   { @EventTemplate Data   unprocessedData ()   { Data   template   =   new   Data (); template . setProcessed ( false ); return   template ; } @SpaceDataEvent public   Data   eventListener ( Data   event )   { //process Data here } }
Step 3 – Write behind to SQL/NoSQL Events Processing Grid Open Long Term  persistency Write  Behind FACEBOOK FACEBOOK FACEBOOK Data Grid Data Grid Data Grid
Economic Data Scaling ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved  Memory Disk
Economic Scaling ,[object Object],[object Object],[object Object],[object Object],® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Putting it all together Analytic Application Event Sources Write  behind ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Generate Patterns
Putting it all together Analytic Application Event Sources Write  behind ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Generate Patterns Real Time Map/Reduce R Script  script = new StaticScritpt( “groovy”,”println hi; return 0”) Query  q = em.createNativeQuery( “execute ?”); q.setParamter(1, script); Integer  result = query.getSingleResult();
5x better performance per server! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Event injector Up to 128 threads GigaSpaces/  (Other Msg Server)  App Services Up to 128 threads  Other Giga 50,000 write/sec per server
Live demo Inter Day Activity (Real Time) Monthly Trend Analysis
5 Big Data Predictions  ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Summary Big Data  Development Made Simple:   Focus on your business logic, Use Big Data platform for dealing scalability, performance, continues availability ,.. Its Open: Use Any Stack : Avoid Lockin Any  database (RDBMS or NoSQL); Any Cloud, Use common API’s & Frameworks .  All While Minimizing Cost Use Memory & Disk  for optimum cost/performance  .  Built-in Automation and management - Reduces operational costs Elasticity – reduce over provisioning cost
Further reading.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Thank YOU! @natishalom http://blog.gigaspaces.com

Mais conteúdo relacionado

Mais procurados

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035Neelam Rawat
 
Youtube big data analysis using hadoop,pig,hive
Youtube big data analysis using hadoop,pig,hiveYoutube big data analysis using hadoop,pig,hive
Youtube big data analysis using hadoop,pig,hivepankaj chhipa
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 

Mais procurados (20)

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035
 
Youtube big data analysis using hadoop,pig,hive
Youtube big data analysis using hadoop,pig,hiveYoutube big data analysis using hadoop,pig,hive
Youtube big data analysis using hadoop,pig,hive
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 

Destaque

Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17Eugene O'Loughlin
 
Principles of Data Visualization
Principles of Data VisualizationPrinciples of Data Visualization
Principles of Data VisualizationEamonn Maguire
 
Apache Zeppelin으로 데이터 분석하기
Apache Zeppelin으로 데이터 분석하기Apache Zeppelin으로 데이터 분석하기
Apache Zeppelin으로 데이터 분석하기SangWoo Kim
 
Brief introduction to data visualization
Brief introduction to data visualizationBrief introduction to data visualization
Brief introduction to data visualizationZach Gemignani
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Sparkfelixcss
 

Destaque (6)

Data Visualization Tools
Data Visualization ToolsData Visualization Tools
Data Visualization Tools
 
Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17
 
Principles of Data Visualization
Principles of Data VisualizationPrinciples of Data Visualization
Principles of Data Visualization
 
Apache Zeppelin으로 데이터 분석하기
Apache Zeppelin으로 데이터 분석하기Apache Zeppelin으로 데이터 분석하기
Apache Zeppelin으로 데이터 분석하기
 
Brief introduction to data visualization
Brief introduction to data visualizationBrief introduction to data visualization
Brief introduction to data visualization
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
 

Semelhante a Big Data Real Time Analytics - A Facebook Case Study

Real Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case StudyReal Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case StudyNati Shalom
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopHazelcast
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...Paul Hofmann
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big DataNetApp
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias StorageFran Navarro
 
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Amazon Web Services
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
Giga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching OverviewGiga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching Overviewjimliddle
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockJeffrey T. Pollock
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisNetAppUK
 
Big data tim
Big data timBig data tim
Big data timT Weir
 
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...exponential-inc
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Big and Fast Data - Building Infinitely Scalable Systems
Big and Fast Data - Building Infinitely Scalable SystemsBig and Fast Data - Building Infinitely Scalable Systems
Big and Fast Data - Building Infinitely Scalable SystemsFred Melo
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 

Semelhante a Big Data Real Time Analytics - A Facebook Case Study (20)

Real Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case StudyReal Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case Study
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of HadoopBig Data, Simple and Fast: Addressing the Shortcomings of Hadoop
Big Data, Simple and Fast: Addressing the Shortcomings of Hadoop
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
 
Big Data
Big DataBig Data
Big Data
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias Storage
 
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Giga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching OverviewGiga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching Overview
 
Final deck
Final deckFinal deck
Final deck
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis Kapsalis
 
Big data tim
Big data timBig data tim
Big data tim
 
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Big and Fast Data - Building Infinitely Scalable Systems
Big and Fast Data - Building Infinitely Scalable SystemsBig and Fast Data - Building Infinitely Scalable Systems
Big and Fast Data - Building Infinitely Scalable Systems
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 

Mais de Nati Shalom

Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integrationNati Shalom
 
Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail! Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail! Nati Shalom
 
Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integrationNati Shalom
 
1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...Nati Shalom
 
Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017Nati Shalom
 
What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like Nati Shalom
 
Running OpenStack in Production
Running OpenStack in Production Running OpenStack in Production
Running OpenStack in Production Nati Shalom
 
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...Nati Shalom
 
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackReal World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackNati Shalom
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Nati Shalom
 
OpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitOpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitNati Shalom
 
Application and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaApplication and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaNati Shalom
 
Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Nati Shalom
 
Software Defined Operator
Software Defined OperatorSoftware Defined Operator
Software Defined OperatorNati Shalom
 
Complex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeComplex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeNati Shalom
 
Is Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsIs Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsNati Shalom
 
When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)Nati Shalom
 
Application Centric Approach to Devops
Application Centric Approach to DevopsApplication Centric Approach to Devops
Application Centric Approach to DevopsNati Shalom
 
Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013Nati Shalom
 
Application Centric DevOps
Application Centric DevOpsApplication Centric DevOps
Application Centric DevOpsNati Shalom
 

Mais de Nati Shalom (20)

Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integration
 
Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail! Why NFV and Digital Transformation Projects Fail!
Why NFV and Digital Transformation Projects Fail!
 
Cloudify and terraform integration
Cloudify and terraform integrationCloudify and terraform integration
Cloudify and terraform integration
 
1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...1 cloud, 2 clouds, 3 clouds, tons...
1 cloud, 2 clouds, 3 clouds, tons...
 
Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017Open Stack Days israel Keynote 2017
Open Stack Days israel Keynote 2017
 
What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like What A No Compromises Hybrid Cloud Looks Like
What A No Compromises Hybrid Cloud Looks Like
 
Running OpenStack in Production
Running OpenStack in Production Running OpenStack in Production
Running OpenStack in Production
 
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
 
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackReal World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
 
OpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitOpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the Summit
 
Application and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaApplication and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & Tosca
 
Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users
 
Software Defined Operator
Software Defined OperatorSoftware Defined Operator
Software Defined Operator
 
Complex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeComplex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real Time
 
Is Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsIs Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOps
 
When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)
 
Application Centric Approach to Devops
Application Centric Approach to DevopsApplication Centric Approach to Devops
Application Centric Approach to Devops
 
Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013
 
Application Centric DevOps
Application Centric DevOpsApplication Centric DevOps
Application Centric DevOps
 

Último

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Último (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Big Data Real Time Analytics - A Facebook Case Study

  • 1. Real Time Analytics for Big Data Lessons from Facebook..
  • 2. The Real Time Boom.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved Google Real Time Web Analytics Google Real Time Search Facebook Real Time Social Analytics Twitter paid tweet analytics SaaS Real Time User Tracking New Real Time Analytics Startups..
  • 4. Note the Time dimension
  • 5. The data resolution & processing models
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. Hadoop Map/Reduce – Reality check.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 12. So what’s the bottom line? ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 13. Facebook Real-time Analytics System ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 14.
  • 15.
  • 16.
  • 17. The solution.. PTail Scribe Puma Hbase HDFS Real Time Long Term Batch 1.5 Sec 10,000 write/sec per server FACEBOOK Log FACEBOOK Log FACEBOOK Log
  • 18. Checking the assumptions.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Step 3 – Write behind to SQL/NoSQL Events Processing Grid Open Long Term persistency Write Behind FACEBOOK FACEBOOK FACEBOOK Data Grid Data Grid Data Grid
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30. Live demo Inter Day Activity (Real Time) Monthly Trend Analysis
  • 31. 5 Big Data Predictions ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 32. Summary Big Data Development Made Simple: Focus on your business logic, Use Big Data platform for dealing scalability, performance, continues availability ,.. Its Open: Use Any Stack : Avoid Lockin Any database (RDBMS or NoSQL); Any Cloud, Use common API’s & Frameworks . All While Minimizing Cost Use Memory & Disk for optimum cost/performance . Built-in Automation and management - Reduces operational costs Elasticity – reduce over provisioning cost
  • 33. Further reading.. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 34. Thank YOU! @natishalom http://blog.gigaspaces.com

Notas do Editor

  1. http://developers.facebook.com/blog/post/476/
  2. http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html MySQL DB Counters Have a row with a key and a counter. Results in lots of database activity. Stats are kept at a day bucket granularity. Every day at midnight the stats would roll over.  When the roll over period is reached this resulted in a lot of writes to the database, which caused a lot of lock contention. Tried to spread the work by taking into account time zones.  Tried to shard things differently. The high write rate led to lock contention, it was easy to overload the databases, had to constantly monitor the databases, and had to rethink their sharding strategy. Solution not well tailored to the problem. In-Memory Counters If you are worried about bottlenecks in IO then throw it all in-memory. No scale issues. Counters are stored in memory so writes are fast and the counters are easy to shard. Felt in-memory counters, for reasons not explained, weren't as accurate as other approaches. Even a 1% failure rate would be unacceptable. Analytics drive money so the counters have to be highly accurate.  They didn't implement this system. It was a thought experiment and the accuracy issue caused them to move on. MapReduce Used Hadoop/Hive for previous solution.  Flexible. Easy to get running. Can handle IO, both massive writes and reads. Don't have to know how they will query ahead of time. The data can be stored and then queried. Not realtime. Many dependencies. Lots of points of failure. Complicated system. Not dependable enough to hit realtime goals. Cassandra HBase seemed a better solution based on availability and the write rate. Write rate was the huge bottleneck being solved.
  3. http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html The Winner: HBase + Scribe + Ptail + Puma At a high level: HBase stores data across distributed machines. Use a tailing architecture, new events are stored in log files, and the logs are tailed. A system rolls the events up and writes them into storage. A UI pulls the data out and displays it to users. Data Flow User clicks Like on a web page. Fires AJAX request to Facebook. Request is written to a log file using Scribe.  Scribe handles issues like file roll over. Scribe is built on the same HTFS file store Hadoop is built on. Write extremely lean log lines. The more compact the log lines the more can be stored in memory. Ptail Data is read from the log files using Ptail. Ptail is an internal tool built to aggregate data from multiple Scribe stores. It tails the log files and pulls data out. Ptail data is separated out into three streams so they can eventually be sent to their own clusters in different datacenters. Plugin impression News feed impressions Actions (plugin + news feed) Puma Batch data to lessen the impact of hot keys. Even though HBase can handle a lot of writes per second they still want to batch data. A hot article will generate a lot of impressions and news feed impressions which will cause huge data skews which will cause IO issues. The more batching the better. Batch for 1.5 seconds on average. Would like to batch longer but they have so many URLs that they run out of memory when creating a hashtable. Wait for last flush to complete for starting new batch to avoid lock contention issues. UI  Renders Data Frontends are all written in PHP. The backend is written in Java and Thrift is used as the messaging format so PHP programs can query Java services. Caching solutions are used to make the web pages display more quickly. Performance varies by the statistic. A counter can come back quickly. Find the top URL in a domain can take longer. Range from .5 to a few seconds.  The more and longer data is cached the less realtime it is. Set different caching TTLs in memcache. MapReduce The data is then sent to MapReduce servers so it can be queried via Hive. This also serves as a backup plan as the data can be recovered from Hive. Raw logs are removed after a period of time. HBase is a distribute column store.  Database interface to Hadoop. Facebook has people working internally on HBase.  Unlike a relational database you don't create mappings between tables. You don't create indexes. The only index you have a primary row key. From the row key you can have millions of sparse columns of storage. It's very flexible. You don't have to specify the schema. You define column families to which you can add keys at anytime. Key feature to scalability and reliability is the WAL, write ahead log, which is a log of the operations that are supposed to occur.  Based on the key, data is sharded to a region server.  Written to WAL first. Data is put into memory. At some point in time or if enough data has been accumulated the data is flushed to disk. If the machine goes down you can recreate the data from the WAL. So there's no permanent data loss. Use a combination of the log and in-memory storage they can handle an extremely high rate of IO reliably.  HBase handles failure detection and automatically routes across failures. Currently HBase resharding is done manually. Automatic hot spot detection and resharding is on the roadmap for HBase, but it's not there yet. Every Tuesday someone looks at the keys and decides what changes to make in the sharding plan. Schema  Store on a per URL basis a bunch of counters. A row key, which is the only lookup key, is the MD5 hash of the reverse domain Selecting the proper key structure helps with scanning and sharding. A problem they have is sharding data properly onto different machines. Using a MD5 hash makes it easier to say this range goes here and that range goes there.  For URLs they do something similar, plus they add an ID on top of that. Every URL in Facebook is represented by a unique ID, which is used to help with sharding. A reverse domain,  com.facebook/  for example, is used so that the data is clustered together. HBase is really good at scanning clustered data, so if they store the data so it's clustered together they can efficiently calculate stats across domains.  Think of every row a URL and every cell as a counter, you are able to set different TTLs (time to live) for each cell. So if keeping an hourly count there's no reason to keep that around for every URL forever, so they set a TTL of two weeks. Typically set TTLs on a per column family basis.  Per server they can handle 10,000 writes per second.  Checkpointing is used to prevent data loss when reading data from log files.  Tailers save log stream check points  in HBase. Replayed on startup so won't lose data. Useful for detecting click fraud, but it doesn't have fraud detection built in. Tailer Hot Spots In a distributed system there's a chance one part of the system can be hotter than another. One example are region servers that can be hot because more keys are being directed that way. One tailer can be lag behind another too. If one tailer is an hour behind and the others are up to date, what numbers do you display in the UI? For example, impressions have a way higher volume than actions, so CTR rates were way higher in the last hour. Solution is to figure out the least up to date tailer and use that when querying metrics.
  4. A Potential for Improvement There are lots of areas in which you can see potential improvements, if the assumptions are changed. As a contrast to Facebook's working system: We can simplify the design. If memory can be seen as transactional - and it can - we can use them without transforming them as they proceed along our analytics workflow. This makes our design and implementation much simpler to implement and test, and performance improves as well. We can strengthen the design. With a polling semantic, such systems are brittle, relying on systems that pull data in order to generate realtime analytics data. We should be able to reduce the fragility of the system, even while making it faster. We can strengthen the implementation. With batching subsystems, there are limits shouldn’t exist. For example, one concern in Facebook's implementation is the use of an in-memory hash table that stores intermediate data; the in-memory aspect isn’t a concern until you realize that the batch sizes are chosen partially to make sure that this hash table doesn’t overflow available space. We can allow deployments to change databases based on their requirements. There's nothing wrong with HBase, but it's got specific characteristics that aren't appropriate for all enterprises. We can design a system which you’d be able to deploy on various and flexible platforms, and we can migrate the underlying long-term data store to a different database if needed. We can consolidate the analytics system so that management is easier and unified. While there are system management standards like SNMP that allow management events to be presented  in the same way no matter the source, having so many different pieces means that managing the system requires an encompassing understanding, which makes maintenance and scaling more difficult. What we want to do, then, is create a general model for an application that can accomplish the same goals as Facebook’s realtime analytics system, while leveraging the capabilities that in-memory data grids offer where available, potentially offering improvement in the areas of scalability, manageability, latency, platform neutrality, and simplicity, all while increasing ease of data access. That sounds like quite a tall order, but it’s doable. The key is to remember that at heart, realtime analytics represent an events system. Facebook’s entire architecture is designed to funnel events through various channels, such that they can safely and sequentially manage event updates. Therefore, they receive a massive set of events that “look like” marbles, which they line up in single file; they then sort the marbles by color, you might say, and for each color they create a bundle of sticks; the sticks are lit on fire, and when the heat goes up past a certain temperature, steam is generated, which turns a turbine. It’s a real-life Rube Goldberg machine, which is admirable in that it works, but much of it is still unnecessary if the assumptions about memory ("unreliable") and database ("HBase is the only target that counts") are changed. Looking at the analogy from the previous paragraph, there’s no need to change a marble into anything. The marble is enough.
  5. Value Write/Read scaling through partitioning Performance through Memory speed Reliability through replication and redundancy
  6. Value Data Grid like GigaSpaces comes with rich set of API that provides not only the mean to store data fast and reliably but also access the data, query it just as you would do with a database. Specifically for GigaSpaces we support both JPA and Document API and the way to mix and match between those API’s Unlike Scribe and log system we can now look at the data as it comes in and not only once it is stored into the database The later makes it possible to partition data based on time – First day in memory and the rest through the database etc.
  7. Collocating the processing with the data can provides the biggest gain in terms of scalbility and performance as we reduce the amount of network hops as well as serialization overhead. We also reduce the number of moving parts which in itself simplifies our runtime architecture and our ability to scale. The other benefit is that we decentralize the Puma services from the facebook example and thus make the entire architecture significantly more scalable.
  8. He snippet of code shows the part of the code that generate the statistical information as the events comes in The template defines the fliter for the events. In the above example we will filter any event that is of type Data that has a false value in its “processed” attribute. For every event that match this filter the method eventListener will be called with the appropriate data object.
  9. Value gained: Avoid lockin to specific NoSQL API Performance – reduced network hops, serialization overhead Simplicity – less moving parts Scalability without compromising on consistency (Strict consistency at the front, eventual consistency for the long term data) JPA/Stanard API
  10. content based routing, workflow