Why your company needs a Unified Log
Span Conference, London, 28th October 2014

Introducing myself
• Alex Dean
• Co-founder and technical lead at Snowplow, the open-source event analytics platform based here in London [1]
• Weekend writer of Unified Log Processing, available on the Manning Early Access Program [2]
[1] https://github.com/snowplow/snowplow
[2] http://manning.com/dean

So what’s a Unified Log?

A quick history lesson: the three eras of business data processing [1]
1. The classic era, 1996+
2. The hybrid era, 2005+
3. The unified era, 2013+
[1] http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/

The classic era of business data processing, 1996+
[Diagram: inside the company's own data center, narrow data silos (CMS, CRM, e-comm, ERP) each run a low-latency local loop and are joined by point-to-point connections. A nightly batch ETL process loads a high-latency data warehouse with wide data coverage and full data history, which is used for management reporting.]

The hybrid era, 2005+
[Diagram: narrow data silos, each with its own low-latency local loop (search, e-comm, CRM, email marketing, ERP, CMS, web analytics), are spread across the company's own data center or cloud vendor and SaaS vendors #1, #2 and #3. APIs and bulk exports feed several processing paths: stream processing (product rec's) and micro-batch processing (systems monitoring) at low latency, and batch processing into a data warehouse (management reporting) and into Hadoop (ad hoc analytics) at high latency.]

The hybrid era: a surfeit of software vendors
[Diagram: the hybrid-era architecture from the previous slide, repeated.]

The hybrid era: company-wide reporting and analytics ends up like Rashomon
The bandit’s story vs. The wife’s story vs. The samurai’s story vs. The woodcutter’s story

The hybrid era: the number of data integrations is unsustainable

So how do we unravel the hairball?

The unified era, 2013+
[Diagram: narrow data silos (search, e-comm, CRM, email marketing, ERP, CMS) in the company's own data center, at a cloud vendor, or at SaaS vendors #1 and #2 keep only some low-latency local loops. Streaming APIs / web hooks feed a unified log with low latency, wide data coverage and a few days' data history. The log is archived to Hadoop (wide data coverage, full data history) for high-latency ad hoc analytics and management reporting, and it serves low-latency consumers: systems monitoring, product rec's, fraud detection, churn prevention and eventstream APIs.]

The unified log is Amazon Kinesis, or Apache Kafka
[Diagram: the unified-era architecture from the previous slide, repeated.]
• Amazon Kinesis, a hosted AWS service
• Extremely similar semantics to Kafka
• Apache Kafka, an append-only, distributed, ordered commit log
• Developed at LinkedIn to serve as their organization’s unified log

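To make "append-only, ordered commit log" concrete, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not Kafka or Kinesis code: it models a single partition, ignores distribution and persistence, and the names `AppendOnlyLog` and `Consumer` are invented for illustration.

```python
from typing import Any, List


class AppendOnlyLog:
    """Toy single-partition log: records are only ever appended, never changed."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records starting at offset, in append order."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer keeps its own offset, so readers never affect each other."""

    def __init__(self, log: AppendOnlyLog) -> None:
        self._log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> List[Any]:
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = AppendOnlyLog()
for name in ("page_view", "add_to_basket", "purchase"):
    log.append({"event": name})

monitoring = Consumer(log)   # hypothetical low-latency consumer
reporting = Consumer(log)    # hypothetical second consumer of the same log
first_two = monitoring.poll(max_records=2)
everything = reporting.poll()
```

Because reading does not remove anything, a new consumer can be added later and start from offset 0, which is what makes it possible to replay history for as long as the log retains it.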
“Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization” [1]
[1] http://kafka.apache.org/

So what does a unified log give us?
1. A single version of the truth
2. Our truth is now upstream from the data warehouse
3. The hairball of point-to-point connections has been unravelled
4. Local loops have been unbundled

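A rough way to put numbers on the unravelled hairball is to count connections. The sketch below assumes the worst case in which each of n systems exchanges data directly with every other system, and compares it with a hub in which each system only writes to and reads from one central log. The function names are illustrative, and real estates will sit somewhere below the worst case.

```python
def point_to_point_links(n: int) -> int:
    """Directed connections if every system feeds every other system directly."""
    return n * (n - 1)


def unified_log_links(n: int) -> int:
    """Connections if each system has one write path to and one read path from the log."""
    return 2 * n


for n in (4, 8, 16):
    print(n, point_to_point_links(n), unified_log_links(n))
```

The point-to-point count grows quadratically with the number of systems, while the hub count grows linearly.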
What does a unified log let us do that we couldn’t do before?
Populating a unified log with your company’s event streams
To enable…
• Real-time management reporting
• Holistic systems monitoring
• Re-running models from Day 0
• A/B testing end-to-end pipelines
• Shipping offline models to RT
… anything requiring low latency response / holistic view of our company’s data!

But garbage in, garbage out: it’s crucial to properly model the event streams feeding into the unified log
• We are working on a semantic model for events, an “event grammar”, at Snowplow [1]
• The event grammar borrows concepts from human language:
[Diagram: an Event made up of Subject, Verb, Direct Object, Indirect Object, Prepositional Object and Context]
• A semantic model prevents business and technology assumptions leaking in to the event stream, making it less brittle over time
[1] http://snowplowanalytics.com/blog/2013/08/12/towards-universal-event-analytics-building-an-event-grammar/

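As an illustration only, the grammatical slots named on the slide could be carried by a small record type. This is a sketch, not Snowplow's actual event model: the class name, field names, `to_record` helper and example values are all assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Event:
    """One event expressed with the grammatical slots from the slide."""
    subject: str                                # who or what performed the action
    verb: str                                   # the action itself
    direct_object: str                          # what the action was performed on
    indirect_object: Optional[str] = None       # optional recipient of the action
    prepositional_object: Optional[str] = None  # optional "in / on / with" target
    context: Dict[str, Any] = field(default_factory=dict)  # when, where, how

    def to_record(self) -> Dict[str, Any]:
        """Serialise to a plain dict, dropping grammatical slots left unused."""
        return {k: v for k, v in asdict(self).items() if v is not None}


event = Event(
    subject="shopper-123",
    verb="add",
    direct_object="sku-456",
    prepositional_object="basket-789",
    context={"timestamp": "2014-10-28T10:00:00Z", "platform": "web"},
)
record = event.to_record()
```

Keeping the slots generic means a new business concept becomes a new verb or object value rather than a new table or column, which is one reading of why such a model is less brittle.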
We also need to store and version the schemas used to describe our events, as these will change over time
[Diagram: Unified log]

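A minimal sketch of what storing and versioning schemas can mean in practice: each event declares the schema name and version it was written against, and is validated against exactly that version. The registry layout, the field names and the `validate` function are assumptions made for illustration, not a description of any particular schema store.

```python
from typing import Any, Dict, FrozenSet, Tuple

# Hypothetical in-memory registry: (schema name, version) -> required field names.
SCHEMAS: Dict[Tuple[str, int], FrozenSet[str]] = {
    ("add_to_basket", 1): frozenset({"sku", "quantity"}),
    ("add_to_basket", 2): frozenset({"sku", "quantity", "currency"}),
}


def validate(event: Dict[str, Any]) -> bool:
    """Check an event against the exact schema version it declares."""
    required = SCHEMAS.get((event["schema"], event["version"]))
    if required is None:
        return False  # unknown schema name or version
    return required.issubset(event["data"].keys())


old_event = {"schema": "add_to_basket", "version": 1,
             "data": {"sku": "sku-456", "quantity": 2}}
new_event = {"schema": "add_to_basket", "version": 2,
             "data": {"sku": "sku-456", "quantity": 2, "currency": "GBP"}}
```

Because old versions stay in the registry, events written months ago remain readable after the schema has moved on.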
How are we embracing the unified log at Snowplow?

Some background: early on, we decided that Snowplow should be composed of a set of loosely coupled subsystems
1. Trackers: Generate event data from any environment
2. Collectors: Log raw events from trackers
3. Enrich: Validate and enrich raw events
4. Storage: Store enriched events ready for analysis
5. Analytics: Analyze enriched events
Legend: [symbol] = Standardised data protocols
These turned out to be critical to allowing us to evolve the above stack

Today almost all users/customers are running a batch-based Snowplow configuration
[Diagram components: Snowplow event tracking SDK, HTTP-based event collector, Amazon S3, Hadoop-based enrichment, Amazon Redshift]
• Batch-based
• Normally run overnight; sometimes every 4-6 hours
The Snowplow batch-based flow uses Amazon S3 as a “poor man’s” unified log

Can we implement Snowplow on top of Kinesis/Kafka?
[Diagram: the unified-era architecture, repeated.]

We are working on Amazon Kinesis support first; Apache Kafka will come later (using Apache Samza for stream processing)
[Diagram: Snowplow Trackers → Scala Stream Collector → raw event stream → Enrich Kinesis app → enriched event stream and bad raw events stream. Kinesis apps shown downstream: S3 sink (to S3), Redshift sink (to Redshift), Elasticsearch sink (to Elasticsearch), event aggregator (to DynamoDB). A legend marks some components as not yet released.]
Analytics on Read (for agile exploration of event stream, ML, auditing, applying alternate models, reprocessing etc)
Analytics on Write (for dashboarding, audience segmentation, RTB, etc)

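The split into an enriched event stream and a bad raw events stream can be sketched as follows. This is an illustration of the routing idea only: plain Python lists stand in for the streams, no AWS calls are made, and the validation rule, the `event_normalised` field and the function names are assumptions rather than Snowplow's real enrichment logic.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def enrich(raw: str) -> Dict[str, Any]:
    """Parse and validate one raw event, then add a derived field.

    Raises ValueError when the raw event cannot be processed.
    """
    event = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(event, dict) or not isinstance(event.get("event"), str):
        raise ValueError("missing or non-string 'event' field")
    enriched = dict(event)
    enriched["event_normalised"] = event["event"].strip().lower()
    return enriched


def route(raw_events: Iterable[str]) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Send each raw event to the enriched stream or to the bad stream."""
    enriched_stream: List[Dict[str, Any]] = []
    bad_stream: List[Dict[str, str]] = []
    for raw in raw_events:
        try:
            enriched_stream.append(enrich(raw))
        except ValueError as exc:
            # Keep the original payload so it can be inspected or reprocessed later.
            bad_stream.append({"raw": raw, "error": str(exc)})
    return enriched_stream, bad_stream


good, bad = route(['{"event": " Page_View "}', "not json", '{"foo": 1}'])
```

Writing failures to their own stream, rather than dropping them, keeps the raw payload available for auditing and reprocessing.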
Live demo!

Questions?
http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
To meet up or chat, @alexcrdean on Twitter or alex@snowplowanalytics.com
Discount code: spancNw, 43% off all Manning eBooks for Span :)

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 

Span Conference: Why your company needs a unified log

  • 1. Why your company needs a Unified Log. Span Conference, London, 28th October 2014
  • 2. Introducing myself • Alex Dean • Co-founder and technical lead at Snowplow, the open-source event analytics platform based here in London [1] • Weekend writer of Unified Log Processing, available on the Manning Early Access Program [2] [1] https://github.com/snowplow/snowplow [2] http://manning.com/dean
  • 3. So what’s a Unified Log?
  • 4. A quick history lesson: the three eras of business data processing [1] 1. The classic era, 1996+ 2. The hybrid era, 2005+ 3. The unified era, 2013+ [1] http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/
  • 5. The classic era of business data processing, 1996+ [Diagram: in the company's own data center, narrow data silos (CMS, CRM, e-comm, ERP) each have a low-latency local loop and are joined by point-to-point connections; a nightly batch ETL process loads a high-latency data warehouse with wide data coverage and full data history, which serves management reporting.]
  • 6. The hybrid era, 2005+ [Diagram: narrow data silos (search, e-comm, CRM, ERP, CMS, email marketing, web analytics) spread across a cloud vendor / own data center and SaaS vendors #1 to #3, each with a low-latency local loop; APIs and bulk exports feed stream processing (product rec's) and micro-batch processing (systems monitoring) at low latency, and batch processing into a data warehouse (management reporting) and Hadoop (ad hoc analytics) at high latency.]
  • 7. The hybrid era: a surfeit of software vendors [Diagram: the hybrid-era architecture from slide 6.]
  • 8. The hybrid era: company-wide reporting and analytics ends up like Rashomon. The bandit's story vs. the wife's story vs. the samurai's story vs. the woodcutter's story
  • 9. The hybrid era: the number of data integrations is unsustainable
  • 10. So how do we unravel the hairball?
  • 11. The unified era, 2013+ [Diagram: narrow data silos (search, e-comm, CRM, ERP, CMS, email marketing) in a cloud vendor / own data center and at SaaS vendors #1 and #2 keep some low-latency local loops and publish via streaming APIs / web hooks into a unified log (low latency, wide data coverage, few days' data history). The unified log is archived to Hadoop (high latency, wide data coverage, full data history) for ad hoc analytics and management reporting, and its eventstream feeds low-latency systems monitoring, product rec's, fraud detection and churn prevention; APIs also appear in the diagram.]
  • 12. The unified log is Amazon Kinesis, or Apache Kafka [Diagram: the unified-era architecture from slide 11.] • Amazon Kinesis, a hosted AWS service • Extremely similar semantics to Kafka • Apache Kafka, an append-only, distributed, ordered commit log • Developed at LinkedIn to serve as their organization's unified log
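
The "append-only, ordered commit log" idea on slide 12 can be illustrated with a toy in-memory version. This is a minimal sketch, not the Kafka or Kinesis API: the `UnifiedLog` and `Consumer` classes and their methods are hypothetical names invented for illustration, and a real log is distributed and partitioned, which this single list is not.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class UnifiedLog:
    """Toy single-partition log: records are only ever appended, never changed."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> List[Any]:
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer tracks its own offset, so consumers never block each other."""
    log: UnifiedLog
    offset: int = 0

    def poll(self, max_records: int = 100) -> List[Any]:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = UnifiedLog()
for name in ["page_view", "add_to_basket", "checkout"]:
    log.append({"event": name})

monitoring = Consumer(log)
reporting = Consumer(log)
first_batch = monitoring.poll(max_records=2)  # monitoring reads two events
all_events = reporting.poll()                 # reporting independently reads all three
replayed = Consumer(log, offset=0).poll()     # a new consumer can replay from the start
```

Because readers only hold an offset, adding a new consumer does not require a new point-to-point integration with each source system.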
  • 13. “Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization” [1] [1] http://kafka.apache.org/
  • 14. So what does a unified log give us? 1. A single version of the truth 2. Our truth is now upstream from the data warehouse 3. The hairball of point-to-point connections has been unravelled 4. Local loops have been unbundled
  • 15. What does a unified log let us do that we couldn’t do before? Populating a unified log with your company’s event streams, to enable… • Real-time management reporting • Holistic systems monitoring • Re-running models from Day 0 • A/B testing end-to-end pipelines • Shipping offline models to RT … anything requiring low latency response / holistic view of our company’s data!
  • 16. But garbage in, garbage out: it’s crucial to properly model the event streams feeding into the unified log [Diagram: an Event composed of Subject, Verb, Direct Object, Indirect Object, Prepositional Object and Context.] • We are working on a semantic model for events, an “event grammar”, at Snowplow [1] • The event grammar borrows concepts from human language • A semantic model prevents business and technology assumptions leaking into the event stream, making it less brittle over time [1] http://snowplowanalytics.com/blog/2013/08/12/towards-universal-event-analytics-building-an-event-grammar/
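
To make the grammar on slide 16 concrete, here is a rough sketch of an event built from those parts of speech. It is an illustration only, not Snowplow's actual event model: the `Entity` and `Event` classes, their field names, and the example values are all assumptions.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Entity:
    """A thing taking part in an event, e.g. a user, a product or a basket."""
    type: str
    id: str


@dataclass(frozen=True)
class Event:
    """One event, described with the grammatical roles named on the slide."""
    subject: Entity                                 # who or what performed the action
    verb: str                                       # the action itself
    direct_object: Entity                           # what the action was performed on
    indirect_object: Optional[Entity] = None        # optional recipient of the action
    prepositional_object: Optional[Entity] = None   # optional "in/on/to what"
    context: Dict[str, Any] = field(default_factory=dict)  # when, where, how


# "A user adds a product to a basket", with hypothetical identifiers.
event = Event(
    subject=Entity("user", "u-123"),
    verb="add",
    direct_object=Entity("product", "sku-42"),
    prepositional_object=Entity("basket", "b-9"),
    context={"timestamp": "2014-10-28T10:15:00Z", "platform": "web"},
)
record = asdict(event)  # plain nested dict, ready to serialise onto the log
```

Nothing in this shape names a particular page, table or vendor, which is the sense in which such a model keeps business and technology assumptions out of the stream.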
  • 17. We also need to store and version the schemas used to describe our events, as these will change over time [Diagram: unified log.]
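
One way to picture storing and versioning schemas is a registry keyed by schema name and version, with each event declaring which version describes it. The sketch below is an assumption-laden illustration: the `REGISTRY` layout, the `validate` function and the `add_to_basket` schema are invented here and do not represent any Snowplow component.

```python
from typing import Any, Dict, Tuple

# (schema name, version) -> required field name -> expected Python type.
# Both the schema and its fields are hypothetical.
SchemaKey = Tuple[str, int]
REGISTRY: Dict[SchemaKey, Dict[str, type]] = {
    ("add_to_basket", 1): {"sku": str, "quantity": int},
    ("add_to_basket", 2): {"sku": str, "quantity": int, "currency": str},
}


def validate(event: Dict[str, Any]) -> bool:
    """Check an event's data against the schema version it declares."""
    schema = REGISTRY.get((event["schema"], event["version"]))
    if schema is None:
        return False  # unknown schema name or version
    data = event["data"]
    return all(isinstance(data.get(name), typ) for name, typ in schema.items())


old_event = {"schema": "add_to_basket", "version": 1,
             "data": {"sku": "sku-42", "quantity": 2}}
new_event = {"schema": "add_to_basket", "version": 2,
             "data": {"sku": "sku-42", "quantity": 2, "currency": "GBP"}}
mislabelled = {"schema": "add_to_basket", "version": 2,
               "data": {"sku": "sku-42", "quantity": 2}}  # claims v2, lacks currency
```

Keeping old versions in the registry means events written months ago can still be validated and reprocessed after the schema has moved on.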
  • 18. How are we embracing the unified log at Snowplow?
  • 19. Some background: early on, we decided that Snowplow should be composed of a set of loosely coupled subsystems: 1. Trackers: generate event data from any environment 2. Collectors: log raw events from trackers 3. Enrich: validate and enrich raw events 4. Storage: store enriched events ready for analysis 5. Analytics: analyze enriched events. The subsystems are joined by standardised data protocols; these turned out to be critical to allowing us to evolve the above stack
  • 20. Today almost all users/customers are running a batch-based Snowplow configuration [Diagram components: Snowplow event tracking SDK, HTTP-based event collector, Amazon S3, Hadoop-based enrichment, Amazon Redshift.] • Batch-based • Normally run overnight; sometimes every 4-6 hours • The Snowplow batch-based flow uses Amazon S3 as a “poor man’s” unified log
  • 21. Can we implement Snowplow on top of Kinesis/Kafka? [Diagram: the unified-era architecture from slide 11.]
  • 22. We are working on Amazon Kinesis support first; Apache Kafka will come later (using Apache Samza for stream processing) [Diagram components: Snowplow Trackers, Scala Stream Collector, raw event stream, Enrich Kinesis app, bad raw events stream, enriched event stream, S3 sink Kinesis app (S3), Redshift sink Kinesis app (Redshift), Elasticsearch sink Kinesis app (Elasticsearch), Event aggregator Kinesis app (DynamoDB); some components are marked as not yet released.] • Analytics on Read (for agile exploration of event stream, ML, auditing, applying alternate models, reprocessing etc) • Analytics on Write (for dashboarding, audience segmentation, RTB, etc)
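
The shape of the pipeline on slide 22 (an enrich step that splits a raw stream into enriched and bad streams, followed by an aggregator doing analytics on write) can be sketched with plain Python lists standing in for streams. This is not the Snowplow Kinesis code: the `REQUIRED_FIELDS` rule, the `enrich` and `aggregate` functions, and the per-minute bucketing are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

REQUIRED_FIELDS = ("event", "user_id", "timestamp")  # hypothetical validation rule


def enrich(raw_events: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Validate raw events; route failures to a bad stream instead of dropping them."""
    enriched, bad = [], []
    for raw in raw_events:
        missing = [f for f in REQUIRED_FIELDS if f not in raw]
        if missing:
            bad.append({"raw": raw, "errors": [f"missing field: {f}" for f in missing]})
        else:
            # Toy enrichment: derive a per-minute bucket from the ISO timestamp.
            enriched.append({**raw, "minute": raw["timestamp"][:16]})
    return enriched, bad


def aggregate(enriched: List[Dict[str, Any]]) -> Counter:
    """Analytics on write: maintain a count per (minute, event type)."""
    return Counter((e["minute"], e["event"]) for e in enriched)


raw_stream = [
    {"event": "page_view", "user_id": "u-1", "timestamp": "2014-10-28T10:15:03Z"},
    {"event": "page_view", "user_id": "u-2", "timestamp": "2014-10-28T10:15:40Z"},
    {"event": "checkout", "timestamp": "2014-10-28T10:16:01Z"},  # no user_id
]
enriched_stream, bad_stream = enrich(raw_stream)
counts = aggregate(enriched_stream)
```

The pre-computed counts suit a dashboard, while the enriched stream itself stays available for analytics on read, such as reprocessing with a different model.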
  • 24. Questions? Discount code: spancNw (43% off all Manning eBooks for Span :-)) http://snowplowanalytics.com https://github.com/snowplow/snowplow @snowplowdata To meet up or chat, @alexcrdean on Twitter or alex@snowplowanalytics.com