Building a Lambda Architecture with Elasticsearch at Yieldbot
May 06, 2014
Richard Shea, CTO (@shearic)
David White, Platform Architect (@dtabwhite)
Slide 2
Lambda Architecture Summary
Batch computation layer
(canonical e.g. Hadoop -> HBase)
Real-time computation layer
(canonical e.g. Storm -> Cassandra)
Serving layer
(query HBase, query Cassandra, mix and return)
Slide 3
Our Use Case
Clickstreams of Events
(pageviews, impressions, clicks, etc.)
Events contain attributes
Aggregating Counts and Performance
Breakdowns by Several Dimensions
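
As a rough illustration of the use case on this slide, the sketch below counts events by type, broken down by a chosen set of dimensions. The event fields (`type`, `site`, `device`) and the sample values are assumptions made for the example, not Yieldbot's actual schema.

```python
from collections import Counter

# Hypothetical events; field names are illustrative only.
events = [
    {"type": "pageview", "site": "a.example", "device": "mobile"},
    {"type": "impression", "site": "a.example", "device": "mobile"},
    {"type": "click", "site": "a.example", "device": "mobile"},
    {"type": "impression", "site": "b.example", "device": "desktop"},
]


def breakdown(events, dimensions):
    """Count events per (dimension values..., event type)."""
    counts = Counter()
    for event in events:
        key = tuple(event[d] for d in dimensions) + (event["type"],)
        counts[key] += 1
    return counts


by_site = breakdown(events, ["site"])
by_site_and_device = breakdown(events, ["site", "device"])
```

Each extra dimension multiplies the number of keys, which is one reason a store with flexible query-time aggregation is attractive here.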
Slide 4
Our Prior Approach
Batch (HBase)
Real-time (Redis)
Two different types of systems
Two different access patterns
Limited query ability
Slide 5
Kafka
Persisted event queue
Consumers keep track of offset
Horizontally scalable, topics can be partitioned, etc.
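
The point that consumers keep track of their own offset can be sketched with an in-memory stand-in for one topic partition. This is not the Kafka client API; `PartitionLog` and `Consumer` are hypothetical names used only to show the idea that the log is append-only and each reader owns its position.

```python
class PartitionLog:
    """In-memory stand-in for one partition of a persisted topic."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the appended message

    def read(self, offset, max_messages=100):
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Each consumer owns its position; the log never tracks its readers."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=100):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = PartitionLog()
for i in range(5):
    log.append({"event_id": i})

indexer = Consumer(log)   # e.g. a reader feeding the real-time index
archiver = Consumer(log)  # e.g. a reader writing the archive
first = indexer.poll(max_messages=3)
```

Because the two consumers advance independently, one can fall behind or replay from an earlier offset without affecting the other.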
Slide 6
Real-time Layer of Lambda with ES
Daily index of “raw” events: each event is a document
Elasticsearch Kafka River indexes events straight from Kafka
Real-time processing is trivial: just index the events
Aggregation of real-time data is pushed to query time
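
A minimal sketch of routing each raw event to a daily index follows. The index prefix `events`, the `YYYY.MM.DD` suffix, and the event fields `id` and `ts` (epoch milliseconds) are assumptions for illustration; the deck does not state the naming scheme. The output is newline-delimited JSON in the shape the Elasticsearch bulk API accepts (an action line followed by the document).

```python
import json
from datetime import datetime, timezone


def daily_index(timestamp_ms, prefix="events"):
    """Name of the daily index an event belongs to, using its UTC day."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return f"{prefix}-{day:%Y.%m.%d}"


def bulk_lines(events):
    """Build a bulk request body: one action line plus one source line per event."""
    lines = []
    for event in events:
        action = {"index": {"_index": daily_index(event["ts"]), "_id": event["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


body = bulk_lines([{"id": "e1", "ts": 1399334400000, "type": "click"}])
```

Using the event's own ID as the document ID means that indexing the same event twice overwrites it rather than duplicating it.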
Slide 7
Batch Layer of Lambda with ES
Monthly index of aggregated data documents
Hourly: re-index events from the archive, which covers any real-time issues
Aggregate the desired breakdowns into documents
When done, note the most recent hour completed
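
The hourly batch step might look roughly like the sketch below: aggregate one hour of archived events into one document per dimension value, target a monthly index, and record the hour as completed. The index name `aggregates-YYYY.MM`, the document layout, the `site` dimension, and the `state` dictionary are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

HOUR_MS = 3600 * 1000


def aggregate_hour(events, hour_start_ms, dimensions=("site",)):
    """Build one aggregated document per dimension combination for one hour."""
    counts = {}
    for event in events:
        if not hour_start_ms <= event["ts"] < hour_start_ms + HOUR_MS:
            continue  # event belongs to a different hour
        key = tuple(event[d] for d in dimensions)
        counts.setdefault(key, Counter())[event["type"]] += 1

    hour = datetime.fromtimestamp(hour_start_ms / 1000, tz=timezone.utc)
    docs = []
    for key in sorted(counts):
        source = dict(zip(dimensions, key))
        source.update({"hour": hour_start_ms, "counts": dict(counts[key])})
        # A deterministic ID makes re-running an hour overwrite, not duplicate.
        doc_id = f"{hour:%Y%m%d%H}-" + "-".join(key)
        docs.append({"_index": f"aggregates-{hour:%Y.%m}", "_id": doc_id, "_source": source})
    return docs


def run_batch(archived_events, hour_start_ms, state):
    docs = aggregate_hour(archived_events, hour_start_ms)
    # In a real job the documents would be written before this line.
    state["last_completed_hour"] = hour_start_ms
    return docs
```

The recorded hour is what lets a reader know where aggregated data ends and raw events must take over.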
Slide 8
Serving Layer of Lambda with ES
Query aggregated data documents as much as possible
Query raw events from the last aggregated hour available up to the present
Combine the aggregated and raw query results and return them
We use Node.js, a natural fit
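
The combine step can be sketched as follows, with two in-memory lists standing in for the results of the aggregated-document query and the raw-event query. The deck says the real serving layer runs on Node.js; Python is used here only for illustration, and the field names (`hour`, `counts`, `ts`, `type`) are assumptions.

```python
from collections import Counter

HOUR_MS = 3600 * 1000


def serve_counts(aggregated_docs, raw_events, last_completed_hour):
    """Merge pre-aggregated hours with raw events newer than the last completed hour."""
    cutoff = last_completed_hour + HOUR_MS  # first millisecond not covered by batch
    totals = Counter()
    for doc in aggregated_docs:
        if doc["hour"] <= last_completed_hour:
            totals.update(doc["counts"])
    for event in raw_events:
        if event["ts"] >= cutoff:  # skip events the batch layer already counted
            totals[event["type"]] += 1
    return dict(totals)


aggregated = [{"hour": 0, "counts": {"click": 2, "impression": 10}}]
raw = [
    {"ts": 100, "type": "click"},              # inside hour 0, already aggregated
    {"ts": HOUR_MS + 5, "type": "click"},      # newer than the batch watermark
    {"ts": HOUR_MS + 6, "type": "pageview"},
]
result = serve_counts(aggregated, raw, last_completed_hour=0)
```

The cutoff check is what prevents double counting when the raw index still holds events from hours that have already been aggregated.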
Slide 9
Why Elasticsearch?
Real-time
- calculations are done at query time and are flexible
- real-time is simple
Batch
- some pre-calculation
- query-time ties it together
Serving
- queries are flexible
- batch and real-time query access patterns are similar
Slide 10
More Elasticsearch Goodies
Kibana
- Mostly real-time events
- Aggregated documents useful too
Snapshotting for backups
Real-time data daily indexes are optimized
Slide 11
Future
ES Aggregations
Split cluster with Tribe Nodes
Aggregation via Spark
Slide 12
Good Lessons
Use index aliases
Build in an operational plan to re-index
Use doc_values for raw events and high-cardinality query results
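
One way index aliases support a re-index plan is an atomic swap: build a new index, then repoint the alias in a single request. The sketch below only constructs the request body for the Elasticsearch aliases endpoint (`POST /_aliases`); the alias and index names are hypothetical, and sending the request is left out.

```python
import json


def alias_swap(alias, old_index, new_index):
    """Body for POST /_aliases that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: queries target "aggregates", never a versioned index directly.
swap = alias_swap("aggregates", "aggregates-2014.05-v1", "aggregates-2014.05-v2")
swap_json = json.dumps(swap, indent=2)
```

Because both actions travel in one request, readers that query the alias do not see a moment where it points at neither index.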
Thank You
