SlideShare a Scribd company logo
1 of 19
Why your company needs a
Unified Log
Unified Log London, 20th May 2015
Introducing myself
• Alex Dean
• Co-founder and technical lead at Snowplow,
the open-source event analytics platform
based here in London [1]
• Weekend writer of Unified Log Processing,
available on the Manning Early Access Program
[2]
[1] https://github.com/snowplow/snowplow
[2] http://manning.com/dean
So what’s a Unified Log?
A quick history lesson: the three eras of business data
processing [1]
1. The classic era, 1996+
2. The hybrid era, 2005+
3. The unified era, 2013+
[1] http://snowplowanalytics.com/blog/
2014/01/20/the-three-eras-of-business-data-processing/
The classic era of business data processing, 1996+
OWN DATA CENTER
Data warehouse
HIGH LATENCY
Point-to-point
connections
WIDE DATA
COVERAGE
CMS
Silo
CRM
Local loop Local loop
NARROW DATA SILOES LOW LATENCY LOCAL LOOPS
E-comm
Silo
Local loop
Management
reporting
ERP
Silo
Local loop
Silo
Nightly batch ETL process
FULL DATA
HISTORY
The hybrid era, 2005+
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
Local loop
LOW LATENCY LOCAL LOOPS
E-comm
Silo
Local loop
CRM
Local loop
SAAS VENDOR #2
Email
marketing
Local loop
ERP
Silo
Local loop
CMS
Silo
Local loop
SAAS VENDOR #1
NARROW DATA SILOES
Stream
processing
Product
rec’s
Micro-batch
processing
Systems
monitoring
Batch
processing
Data
warehouse
Management
reporting
Batch
processing
Ad hoc
analytics
Hadoop
SAAS VENDOR #3
Web
analytics
Local loop
Local loop Local loop
LOW LATENCY LOW LATENCY
HIGH LATENCY HIGH LATENCY
APIs
Bulk exports
The hybrid era: a surfeit of software vendors
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
Local loop
LOW LATENCY LOCAL LOOPS
E-comm
Silo
Local loop
CRM
Local loop
SAAS VENDOR #2
Email
marketing
Local loop
ERP
Silo
Local loop
CMS
Silo
Local loop
SAAS VENDOR #1
NARROW DATA SILOES
Stream
processing
Product
rec’s
Micro-batch
processing
Systems
monitoring
Batch
processing
Data
warehouse
Management
reporting
Batch
processing
Ad hoc
analytics
Hadoop
SAAS VENDOR #3
Web
analytics
Local loop
Local loop Local loop
LOW LATENCY LOW LATENCY
HIGH LATENCY HIGH LATENCY
APIs
Bulk exports
The hybrid era: company-wide reporting and
analytics ends up like Rashomon
The bandit’s story
vs.
The wife’s story
vs.
The samurai’s story
vs.
The woodcutter’s story
The hybrid era: the number of data integrations
is unsustainable
So how do we unravel the
hairball?
The unified era, 2013+
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
SOME LOW LATENCY LOCAL LOOPS
E-comm
Silo
CRM
SAAS VENDOR #2
Email
marketing
ERP
Silo
CMS
Silo
SAAS VENDOR #1
NARROW DATA SILOES
Streaming APIs /
web hooks
Unified log
LOW LATENCY WIDE DATA
COVERAGE
Archiving
Hadoop
< WIDE DATA
COVERAGE >
< FULL DATA
HISTORY >
FEW DAYS’
DATA HISTORY
Systems
monitoring
Eventstream
HIGH LATENCY LOW LATENCY
Product rec’s
Ad hoc
analytics
Management
reporting
Fraud
detection
Churn
prevention
APIs
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
SOME LOW LATENCY LOCAL LOOPS
E-comm
Silo
CRM
SAAS VENDOR #2
Email
marketing
ERP
Silo
CMS
Silo
SAAS VENDOR #1
NARROW DATA SILOES
Streaming APIs /
web hooks
Unified log
Archiving
Hadoop
< WIDE DATA
COVERAGE >
< FULL DATA
HISTORY >
Systems
monitoring
Eventstream
HIGH LATENCY LOW LATENCY
Product rec’s
Ad hoc
analytics
Management
reporting
Fraud
detection
Churn
prevention
APIs
The unified log is Amazon Kinesis, or Apache Kafka
• Amazon Kinesis, a
hosted AWS service
• Extremely similar
semantics to Kafka
• Apache Kafka, an append-
only, distributed, ordered
commit log
• Developed at LinkedIn to
serve as their
organization’s unified log
“Kafka is designed to allow a
single cluster to serve as the
central data backbone for a
large organization” [1]
[1] http://kafka.apache.org/
So what does a unified log give us?
A single version of the truth
Our truth is now upstream from the data warehouse
The hairball of point-to-point connections has been
unravelled
Local loops have been unbundled
1
2
3
4
What does a unified log let us do that we couldn’t do before?
Populating a unified log with
your company’s event streams
Real-time
management
reporting
To enable…
Holistic
systems
monitoring
Re-running
models from
Day 0
A/B testing
end-to-end
pipelines
Shipping
offline
models to RT
… anything requiring low
latency response /
holistic view of our
company’s data!
But garbage in, garbage out: it’s crucial to properly model the
event streams feeding into the unified log
Subject
Direct
Object
Indirect
Object
Verb
Event Context
Prep.
Object~
• We are working on a semantic model for events – an “event
grammar” at Snowplow [1]
• The event grammar borrows concepts from human language:
• A semantic model prevents business and technology assumptions
leaking in to the event stream – making it less brittle over time
[1] http://snowplowanalytics.com/blog/2013/08/12/
towards-universal-event-analytics-building-an-event-grammar/
We also need to store and version the schemas used to describe
our events, as these will change over time
Unified
log
Questions?
Questions?
http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
To meet up or chat, @alexcrdean on Twitter or
alex@snowplowanalytics.com
Manning Deal of the Day today!
Discount code: dotd052015au (50% off
just today)

More Related Content

What's hot

What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesAlexander Dean
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
How Disney+ uses fast data ubiquity to improve the customer experience
 How Disney+ uses fast data ubiquity to improve the customer experience  How Disney+ uses fast data ubiquity to improve the customer experience
How Disney+ uses fast data ubiquity to improve the customer experience Martin Zapletal
 
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...confluent
 
Big data meetup budapest adding data schemas to snowplow
Big data meetup budapest   adding data schemas to snowplowBig data meetup budapest   adding data schemas to snowplow
Big data meetup budapest adding data schemas to snowplowyalisassoon
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...HostedbyConfluent
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Laurent Bernaille
 
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...Lucas Jellema
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019confluent
 
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Matt Stubbs
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafkajexp
 
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream RegistryKafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registryconfluent
 
Blueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data PlatformBlueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data PlatformMatt Stubbs
 
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...HostedbyConfluent
 
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Thoughtworks
 
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020confluent
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Amazon Web Services
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Kai Wähner
 
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...Lucas Jellema
 

What's hot (20)

What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registries
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
How Disney+ uses fast data ubiquity to improve the customer experience
 How Disney+ uses fast data ubiquity to improve the customer experience  How Disney+ uses fast data ubiquity to improve the customer experience
How Disney+ uses fast data ubiquity to improve the customer experience
 
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...
 
Big data meetup budapest adding data schemas to snowplow
Big data meetup budapest   adding data schemas to snowplowBig data meetup budapest   adding data schemas to snowplow
Big data meetup budapest adding data schemas to snowplow
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
 
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafka
 
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream RegistryKafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
 
Blueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data PlatformBlueprint Series: Expedia Partner Solutions, Data Platform
Blueprint Series: Expedia Partner Solutions, Data Platform
 
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
 
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
 
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud | Kafka Summit 2020
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...
 

Similar to Unified Log London (May 2015) - Why your company needs a unified log

Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSAmazon Web Services
 
(DAT309) Scaling Massive Content Stores with Amazon Aurora
(DAT309) Scaling Massive Content Stores with Amazon Aurora(DAT309) Scaling Massive Content Stores with Amazon Aurora
(DAT309) Scaling Massive Content Stores with Amazon AuroraAmazon Web Services
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Zalando Technology
 
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Amazon Web Services
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
 
Bootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and SparkBootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and SparkAlex Silva
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Amazon Web Services
 
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...Lucas Jellema
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices Zalando Technology
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices ZalandoHayley
 
Apache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesApache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesKai Wähner
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
Building a Modern Data Platform in the Cloud. AWS Initiate Portugal
Building a Modern Data Platform in the Cloud. AWS Initiate PortugalBuilding a Modern Data Platform in the Cloud. AWS Initiate Portugal
Building a Modern Data Platform in the Cloud. AWS Initiate Portugaljavier ramirez
 
LeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration ServicesLeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration ServicesMichael Stephenson
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Amazon Web Services
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftAmazon Web Services
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksDataWorks Summit/Hadoop Summit
 

Similar to Unified Log London (May 2015) - Why your company needs a unified log (20)

Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWS
 
(DAT309) Scaling Massive Content Stores with Amazon Aurora
(DAT309) Scaling Massive Content Stores with Amazon Aurora(DAT309) Scaling Massive Content Stores with Amazon Aurora
(DAT309) Scaling Massive Content Stores with Amazon Aurora
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Bootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and SparkBootstrapping Microservices with Kafka, Akka and Spark
Bootstrapping Microservices with Kafka, Akka and Spark
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
 
AWS for Startups
AWS for StartupsAWS for Startups
AWS for Startups
 
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Apache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesApache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice Architectures
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Building a Modern Data Platform in the Cloud. AWS Initiate Portugal
Building a Modern Data Platform in the Cloud. AWS Initiate PortugalBuilding a Modern Data Platform in the Cloud. AWS Initiate Portugal
Building a Modern Data Platform in the Cloud. AWS Initiate Portugal
 
LeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration ServicesLeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration Services
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF LoftData Warehouses & Data Lakes: Data Analytics Week at the SF Loft
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 

Recently uploaded

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 

Recently uploaded (20)

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 

Unified Log London (May 2015) - Why your company needs a unified log

  • 1. Why your company needs a Unified Log Unified Log London, 20th May 2015
  • 2. Introducing myself • Alex Dean • Co-founder and technical lead at Snowplow, the open-source event analytics platform based here in London [1] • Weekend writer of Unified Log Processing, available on the Manning Early Access Program [2] [1] https://github.com/snowplow/snowplow [2] http://manning.com/dean
  • 3. So what’s a Unified Log?
  • 4. A quick history lesson: the three eras of business data processing [1] 1. The classic era, 1996+ 2. The hybrid era, 2005+ 3. The unified era, 2013+ [1] http://snowplowanalytics.com/blog/ 2014/01/20/the-three-eras-of-business-data-processing/
  • 5. The classic era of business data processing, 1996+ OWN DATA CENTER Data warehouse HIGH LATENCY Point-to-point connections WIDE DATA COVERAGE CMS Silo CRM Local loop Local loop NARROW DATA SILOES LOW LATENCY LOCAL LOOPS E-comm Silo Local loop Management reporting ERP Silo Local loop Silo Nightly batch ETL process FULL DATA HISTORY
  • 6. The hybrid era, 2005+ CLOUD VENDOR / OWN DATA CENTER Search Silo Local loop LOW LATENCY LOCAL LOOPS E-comm Silo Local loop CRM Local loop SAAS VENDOR #2 Email marketing Local loop ERP Silo Local loop CMS Silo Local loop SAAS VENDOR #1 NARROW DATA SILOES Stream processing Product rec’s Micro-batch processing Systems monitoring Batch processing Data warehouse Management reporting Batch processing Ad hoc analytics Hadoop SAAS VENDOR #3 Web analytics Local loop Local loop Local loop LOW LATENCY LOW LATENCY HIGH LATENCY HIGH LATENCY APIs Bulk exports
  • 7. The hybrid era: a surfeit of software vendors CLOUD VENDOR / OWN DATA CENTER Search Silo Local loop LOW LATENCY LOCAL LOOPS E-comm Silo Local loop CRM Local loop SAAS VENDOR #2 Email marketing Local loop ERP Silo Local loop CMS Silo Local loop SAAS VENDOR #1 NARROW DATA SILOES Stream processing Product rec’s Micro-batch processing Systems monitoring Batch processing Data warehouse Management reporting Batch processing Ad hoc analytics Hadoop SAAS VENDOR #3 Web analytics Local loop Local loop Local loop LOW LATENCY LOW LATENCY HIGH LATENCY HIGH LATENCY APIs Bulk exports
  • 8. The hybrid era: company-wide reporting and analytics ends up like Rashomon The bandit’s story vs. The wife’s story vs. The samurai’s story vs. The woodcutter’s story
  • 9. The hybrid era: the number of data integrations is unsustainable
  • 10. So how do we unravel the hairball?
  • 11. The unified era, 2013+ CLOUD VENDOR / OWN DATA CENTER Search Silo SOME LOW LATENCY LOCAL LOOPS E-comm Silo CRM SAAS VENDOR #2 Email marketing ERP Silo CMS Silo SAAS VENDOR #1 NARROW DATA SILOES Streaming APIs / web hooks Unified log LOW LATENCY WIDE DATA COVERAGE Archiving Hadoop < WIDE DATA COVERAGE > < FULL DATA HISTORY > FEW DAYS’ DATA HISTORY Systems monitoring Eventstream HIGH LATENCY LOW LATENCY Product rec’s Ad hoc analytics Management reporting Fraud detection Churn prevention APIs
  • 12. CLOUD VENDOR / OWN DATA CENTER Search Silo SOME LOW LATENCY LOCAL LOOPS E-comm Silo CRM SAAS VENDOR #2 Email marketing ERP Silo CMS Silo SAAS VENDOR #1 NARROW DATA SILOES Streaming APIs / web hooks Unified log Archiving Hadoop < WIDE DATA COVERAGE > < FULL DATA HISTORY > Systems monitoring Eventstream HIGH LATENCY LOW LATENCY Product rec’s Ad hoc analytics Management reporting Fraud detection Churn prevention APIs The unified log is Amazon Kinesis, or Apache Kafka • Amazon Kinesis, a hosted AWS service • Extremely similar semantics to Kafka • Apache Kafka, an append- only, distributed, ordered commit log • Developed at LinkedIn to serve as their organization’s unified log
  • 13. “Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization” [1] [1] http://kafka.apache.org/
  • 14. So what does a unified log give us? A single version of the truth Our truth is now upstream from the data warehouse The hairball of point-to-point connections has been unravelled Local loops have been unbundled 1 2 3 4
  • 15. What does a unified log let us do that we couldn’t do before? Populating a unified log with your company’s event streams Real-time management reporting To enable… Holistic systems monitoring Re-running models from Day 0 A/B testing end-to-end pipelines Shipping offline models to RT … anything requiring low latency response / holistic view of our company’s data!
  • 16. But garbage in, garbage out: it’s crucial to properly model the event streams feeding into the unified log Subject Direct Object Indirect Object Verb Event Context Prep. Object~ • We are working on a semantic model for events – an “event grammar” at Snowplow [1] • The event grammar borrows concepts from human language: • A semantic model prevents business and technology assumptions leaking in to the event stream – making it less brittle over time [1] http://snowplowanalytics.com/blog/2013/08/12/ towards-universal-event-analytics-building-an-event-grammar/
  • 17. We also need to store and version the schemas used to describe our events, as these will change over time Unified log
  • 19. Questions? http://snowplowanalytics.com https://github.com/snowplow/snowplow @snowplowdata To meet up or chat, @alexcrdean on Twitter or alex@snowplowanalytics.com Manning Deal of the Day today! Discount code: dotd052015au (50% off just today)

Editor's Notes

  1. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log