How to Architect Big Data Apps with the
Lambda Architecture
OCTOBER 2014
Altan Khendup – Big Data Architect
Ron Bodkin – Founder Think Big, a Teradata company
Real-Time
• Low latency
– Query response
– Data refresh
– End-to-end response
• … nanoseconds, milliseconds, seconds, or minutes
depending on your problem
• Two basic patterns
– Strategic insight: decision support
– Process execution: system of engagement/operational analytics
Copyright 2013-2014 Think Big, a Teradata
Company
Real-time Demand Growing
• Many users looking to gain valuable insights from both batch and real-time systems
• User Characteristics
– Do not always understand the complexities of tackling this challenge
– Also want to use familiar/easy-to-use interfaces wherever possible
– Want best practices about ways to integrate real-time (current) and batch (historical)
– Often not aware of all the options and trade-offs among them
© 2014 Teradata
Enter Lambda Architecture
• Lambda Architecture…
– Provides a common architectural pattern for discussion
– Provides a clearer picture of the complexities typically found in most organizations
• Some challenges in tackling Lambda architecture
– Complete Lambda requires more than just a single system
- Typically requires multiple components
- For example: batch/cold storage via Hadoop, real-time/current data via Storm, and query via business analysis using a database
– Also some challenges in delivering results to the business
- Coordination across the stack is very difficult
- Delivering quality results back to the organization is very important
– Takes a lot of knowledge/expertise/technology to tackle
– Not typically a first step in a Big Data implementation
Background of Lambda Architecture
Background
– Reference architecture for Big Data systems
– Designed by Nathan Marz (Twitter)
– Defined as a system that runs arbitrary functions on arbitrary
data
– “query = function(all data)”
Design Principles
– Human fault-tolerant, Immutability, Computable
Lambda Layers
– Batch - Contains the immutable, constantly growing master
dataset.
– Speed - Deals only with new data and compensates for the
high latency updates of the serving layer.
– Serving - Loads and exposes the combined view of data so
that they can be queried.
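The "query = function(all data)" idea and the three layers can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the page-view events, the function names, and the use of in-memory counters in place of Hadoop, Storm, and a serving database are all assumptions made for the example.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute a view from the whole immutable master dataset."""
    return Counter(event["page"] for event in master_dataset)


def speed_view(recent_events):
    """Speed layer: cover only events the batch view has not absorbed yet."""
    return Counter(event["page"] for event in recent_events)


def query(page, batch, speed):
    """Serving: answer a query by combining the batch and speed views."""
    return batch[page] + speed[page]


# Hypothetical data: three events already in the master dataset, one new event.
master_dataset = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
recent_events = [{"page": "/home"}]

batch = batch_view(master_dataset)
speed = speed_view(recent_events)
print(query("/home", batch, speed))  # 3
```

Once the batch layer has recomputed over the new events, the matching speed-layer entries can be discarded, which is what keeps the speed layer small.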
Overview of Lambda Architecture
USE CASE - MEDICAL
Every year, more than a million people from all 50 states and nearly 150 countries come for care
Challenges in Medical Data
• Health data tends to be “wide”, not “deep”
• New data types are becoming more important
– Unstructured
– Real-time streaming
• Moving from retrospective “BI” viewing to event-based and predictive analytics usage is a general challenge
• Optimize an existing Natural Language Processing pipeline in support of critical Colorectal Surgery (move to tens of thousands of documents processed)
• Replace an existing free-text search facility used by Clinical Web Service for colorectal cancer (move search to milliseconds)
Overall Architecture
Operational Statistics
• Current Storm throughput up to 1.5 million documents per hour
• Average of 140,000 HL7 messages actually processed per day with average latency of 60 milliseconds from ingest to persistence
• Average of 50,000 documents passed through annotators per day versus 5,000 historically
• Actual annotations of documents up to 6 times faster than previously accomplished
• Free-text search use cases that took over 30 minutes on old infrastructure completing in milliseconds in ElasticSearch
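To put the operational statistics on a common per-second footing, the short calculation below converts the figures quoted on the slide. It uses only those quoted numbers; the variable names are illustrative, and since documents and HL7 messages are different units, the comparison gives only a rough sense of headroom.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# Figures quoted on the slide.
peak_docs_per_hour = 1_500_000
hl7_messages_per_day = 140_000
annotated_docs_per_day_now = 50_000
annotated_docs_per_day_before = 5_000

peak_docs_per_second = peak_docs_per_hour / SECONDS_PER_HOUR
hl7_per_second = hl7_messages_per_day / SECONDS_PER_DAY
annotator_gain = annotated_docs_per_day_now / annotated_docs_per_day_before

print(f"{peak_docs_per_second:.1f} documents/s at peak Storm throughput")  # 416.7
print(f"{hl7_per_second:.2f} HL7 messages/s on average")                   # 1.62
print(f"{annotator_gain:.0f}x more documents annotated per day")           # 10x
```

The average HL7 load works out to under two messages per second, far below the quoted peak throughput, which suggests the pipeline is sized for bursts rather than for the daily average.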
Implementing Lambda
• Challenges
– Multiple layers
- Lots of events, data
– Complex
- Lots of different languages and data structures
– Difficult to maintain
- Lots of moving pieces/components/technologies
- Lots of changes for the business
• Need for Practical Lambda approach
– Based on real-world implementations
– Metadata model (events and data)
– Discrete data (query focused datasets)
– Data convergence (holistic query focused dataset)
Active Executor Lambda Framework
Real Time and Lambda
KISS
• Real-Time isn’t free!
– 1 hour vs. 5 min vs. seconds
– And may not be meaningful anyhow
– Is there a robot or a human in the loop?
• Simpler Instantiations of Lambda
– Micro-Batch Feeds & Real-Time Queries
– Embarrassingly Parallel Speed Layer
– Transient Speed Layer
– … One database for Speed & Serving (RDBMS or NoSQL)
Use Case: Cross-Channel Behavior Analytics
• Understanding consumer purchase behavior across more than one touch point to drive holistic results
• Each channel for consumer marketing and engagement has siloed applications and analytic tools
• Correlating behavior across channels (e.g., web, mobile, call center, in store, email, social) to understand customer journeys allows better engagement
• Common goals: increased response rates, increased share of wallet, reduced churn, focus on high-value customers, increased customer satisfaction
• Challenges: data volumes, correlation/sessionization, feature discovery
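Sessionization, one of the challenges named for this use case, can be illustrated with a small sketch. Everything specific here is an assumption for the example: the event tuple layout, the 30-minute inactivity gap, and the sample data. A production job would run the same grouping in the batch layer over far larger volumes.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the slides


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (customer_id, timestamp, channel) events into per-customer sessions.

    A new session starts whenever a customer is idle for longer than `gap`.
    """
    sessions = []
    ordered = sorted(events, key=itemgetter(0, 1))
    for _customer, customer_events in groupby(ordered, key=itemgetter(0)):
        current = []
        for event in customer_events:
            if current and event[1] - current[-1][1] > gap:
                sessions.append(current)
                current = []
            current.append(event)
        sessions.append(current)
    return sessions


# Hypothetical cross-channel events: timestamps are seconds since some origin.
events = [
    ("c1", 0, "web"),
    ("c1", 600, "mobile"),
    ("c1", 10_000, "call center"),
    ("c2", 50, "email"),
]

for session in sessionize(events):
    print(session[0][0], [channel for _, _, channel in session])
```

Customer c1 ends up with two sessions (web then mobile, and a later call-center contact), which is the kind of cross-channel journey the slide describes correlating.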
Real-Time Queries Pattern
• Many analytics use cases can be handled with update latencies of a few minutes
• Micro-batching allows for dramatic efficiency improvements
– … can extend to updates per event with additional infrastructure
• Pre-aggregation (HBase, MPP, etc.) can serve many users
• Hadoop query (Hive 0.13+ / Tez, Impala etc.) emerging
[Diagram labels: Events, Web server…, Queue (Kafka etc.), Micro-batch, Hadoop, HBase/Teradata/Hive…, Query/Serving]
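The efficiency argument for micro-batching can be sketched as follows: events are drained from a queue in small batches, pre-aggregated, and written to the serving table once per key rather than once per event. The in-memory deque and dict are stand-ins for a queue such as Kafka and a store such as HBase; the function names, batch size, and product events are assumptions made for illustration.

```python
from collections import Counter, deque


def drain(queue, max_batch):
    """Pull up to max_batch events off the queue (stand-in for a consumer poll)."""
    batch = []
    while queue and len(batch) < max_batch:
        batch.append(queue.popleft())
    return batch


def apply_micro_batch(serving_table, batch):
    """Pre-aggregate the batch, then issue one write per key instead of one per event."""
    deltas = Counter(event["product"] for event in batch)
    for product, count in deltas.items():
        serving_table[product] = serving_table.get(product, 0) + count
    return len(deltas)  # number of writes issued


# Hypothetical stream of six product-view events.
queue = deque({"product": p} for p in ["a", "b", "a", "a", "c", "b"])
serving_table = {}

while queue:
    batch = drain(queue, max_batch=4)
    writes = apply_micro_batch(serving_table, batch)
    print(len(batch), "events ->", writes, "writes")

print(serving_table)
```

The larger the batch and the more skewed the keys, the fewer writes reach the serving store, at the cost of update latency that grows with the batch interval.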
Use Case: Recommendations
• Recommendations rely on
– recent activity (purchases, content viewed, product interest, support issues)
– trends/fashion
– long-term propensity (relationship history, micro-segments, social…)
• The opportunity is to integrate deep insight into
– Behavior
– Social graph
• Building product recommendations/person/next best offer that’s maximally effective
• All A/B tested
Embarrassingly Parallel Speed Layer Pattern
• Many operational use cases can be distributed across an app server farm
• Batch computed views pushed to NoSQL
• Read NoSQL, update, respond & write to NoSQL can be done quickly
• No need for streaming analytics/computation
[Diagram labels: Events, Web server…, Queue (Kafka etc.), Micro-batch, Hadoop, HBase/Mongo…, NoSQL/Speed]
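The read, update, respond, and write cycle of this pattern can be sketched with a plain dictionary standing in for the NoSQL store. The profile fields, the three-item recency window, and the function names are hypothetical; the point is only that each request touches a single key, so any app server in the farm can handle it without a streaming engine.

```python
def load_batch_view(store, batch_view):
    """Push a batch-computed view into the store (a dict stands in for NoSQL)."""
    for customer_id, profile in batch_view.items():
        store[customer_id] = dict(profile)


def handle_event(store, customer_id, viewed_product):
    """Read the profile, update it, build the response, and write it back."""
    profile = store.get(customer_id, {"segment": "unknown", "recent": []})
    recent = (profile["recent"] + [viewed_product])[-3:]  # keep the last 3 views
    response = {"segment": profile["segment"], "recent": recent}
    store[customer_id] = {**profile, "recent": recent}
    return response


store = {}
# Hypothetical batch output: a long-term segment computed offline.
load_batch_view(store, {"c1": {"segment": "high-value", "recent": []}})

print(handle_event(store, "c1", "shoes"))
print(handle_event(store, "c1", "socks"))
```

Because state is partitioned by customer key, requests for different customers never contend, which is what makes the speed layer embarrassingly parallel.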
Conclusions
• There are many kinds of real-time problems
• No one Big Data technology solves all the problems
• Lambda architecture provides a powerful way to solve the more sophisticated ones
• There are simpler approaches for simpler problems…
• …which may be a step towards Lambda
We’re Hiring!
thinkbig.teradata.com
Booth #324
Altan Khendup (@madmongol)
Ron Bodkin (@ronbodkin)
Thank you!


Editor's Notes

  1. Lambda is an architectural pattern for talking about the complexity of dealing with real-time and historical datasets.
     Overall use: prescriptive/predictive uses rely on some dimension of real-time.
     Use cases:
     – CPG: consumer goods companies looking at what customers are doing in real time and making adjustments
     – Medical: real-time medical sensors, treatment, and labs for critical patient care
     – Financial: credit risk and transaction fraud
     – Manufacturers: IoT/telematics getting information from their plants and logistics, cross-referencing it to inventory, and making adjustments to the supply chain
  2. General architecture that covers how Lambda works overall; able to address real-time and historical data.
     Layers:
     – Speed: real-time/current data streams (Spark, Storm, etc.)
     – Batch: historical data layer
     – Serving: takes the current and historical data, merges the results, and provides them to the organization
     Real-world experience/strategy:
     – Do not tackle all of the data, but rather the necessary segments of business functionality, called queries
     – Data can be tackled per query, hence the idea of “query focused datasets” (qfds)
     – Allows for more focused results/faster speed gains
  3. End goal: an architecturally-driven, internally-owned technology stack that blends:
     – An event-based processing fabric
     – A real-time processing framework
     – A multi-destination distillation hub
     – “Classic” BI delivery techniques
     – “Services-based” delivery techniques
     – A “serendipitous” discovery environment
     These are mutually supportive components that combine in delivering novel clinical solutions.
  4. The serving layer coordinates bringing this data together and creating a holistic view of the data.
     – Teradata understands some form of event and the corresponding coordination of events to bring the data across the layers to the serving layer
     – A general metadata model for data lineage and transformations
     – Merge the data together into a holistic data set so that it can be served to consumers
     – A context component that allows events, data, and requests to be held together
     – Rules engine that allows for determinations based on sensing patterns
     – Workflow/dataflow for execution of necessary processing on data
     Save on the constant re-computation:
     – Snapshotted/versioned data
     – Calculations done on these versions
     – Can work with various data structures and Hadoop components
     – Full re-computation can be deferred and used to verify/replace specific snapshots
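The snapshot idea in the last note (compute on versioned snapshots, and defer full re-computation to verify or replace them) can be sketched as below. The running total, the snapshot dictionary layout, and the function names are assumptions chosen to keep the example small; they are not taken from the framework described in the deck.

```python
def incremental_snapshot(previous, new_events):
    """Derive the next versioned snapshot from the previous one plus new events."""
    return {
        "version": previous["version"] + 1,
        "total": previous["total"] + sum(new_events),
    }


def full_recompute(all_events):
    """Deferred full re-computation over the complete event history."""
    return sum(all_events)


# Hypothetical history of numeric events and an initial snapshot.
history = [5, 7]
snapshot = {"version": 1, "total": full_recompute(history)}

# New events arrive: update incrementally instead of recomputing everything.
new_events = [3, 4]
history += new_events
snapshot = incremental_snapshot(snapshot, new_events)

# Later, off the critical path: verify the snapshot and replace it if it drifted.
expected = full_recompute(history)
if snapshot["total"] != expected:
    snapshot = {"version": snapshot["version"], "total": expected}

print(snapshot)  # {'version': 2, 'total': 19}
```

The incremental path keeps queries cheap, while the deferred full pass retains the Lambda property that any view can be rebuilt from the raw data.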