19. HISTORY OF HADOOP
Timeline, 1997 to now:
1997, START: Doug Cutting started to develop the first version of Lucene.
2001, OPEN SOURCED: Cutting open sourced Lucene and it was moved under the Apache Software Foundation. Mike Cafarella then joined Cutting to start Apache Nutch, a project to index the whole web.
2003, GFS: Google published a whitepaper (the Google File System paper) about solving the storage problems of web indexing. Cafarella and Cutting implemented the ideas of the paper as part of the Nutch project, as NDFS (the Nutch Distributed File System).
2006, HADOOP: Cutting, by then at Yahoo!, moved the NDFS and MapReduce related codebase under a new project called Hadoop.
28. CASE 1: EVENT SOURCING SQL-DATABASES
Working legacy systems that used a MySQL database as real-time data storage.
No historical data is ever saved.
Delete means delete.
Update means update.
We could touch the legacy code to save the changes, but we don't have to.
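One way to read this slide: every insert, update and delete that the database applies is also kept as an immutable event, so history survives even though the table itself only holds the latest state. The sketch below is a minimal illustration in plain Python. The `ChangeEvent` shape, the `replay` helper and the sample rows are hypothetical and not taken from the talk, and how the changes get captured from MySQL is outside the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured row change (hypothetical shape, not from the talk)."""
    ts: int                      # capture time, in seconds
    op: str                      # "insert", "update" or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row values; None for a delete


def replay(events: List[ChangeEvent], until: int) -> Dict[int, dict]:
    """Rebuild the table as it looked at time `until` by replaying events."""
    table: Dict[int, dict] = {}
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.ts > until:
            break
        if ev.op == "delete":
            table.pop(ev.key, None)
        else:
            # Insert and update both set the latest values for the row.
            table[ev.key] = dict(ev.row or {})
    return table


# Sample history for one row: created, changed, then deleted.
log = [
    ChangeEvent(ts=1, op="insert", key=7, row={"name": "Ann", "plan": "free"}),
    ChangeEvent(ts=2, op="update", key=7, row={"name": "Ann", "plan": "pro"}),
    ChangeEvent(ts=3, op="delete", key=7),
]

state_at_2 = replay(log, until=2)   # row 7 still exists, on the "pro" plan
state_at_3 = replay(log, until=3)   # row 7 has been deleted
```

The table at any past moment can be recovered from the event list alone, which is exactly what the legacy database cannot do once "delete means delete".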
31. KAFKA - DISTRIBUTED APPEND-ONLY LOG
Kafka was originally developed at LinkedIn and open sourced in 2011
Distributed, append-only log
Great tool for reliably delivering millions of arbitrarily formatted messages
Scales by partitioning and adding new nodes
(c) Ch.ko123 / CC BY 4.0
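The two ideas on this slide, append-only storage and scaling by partitioning, can be sketched with a toy in-memory log. This is not the Kafka client API: the `PartitionedLog` class is hypothetical, it keeps everything in one process, and it picks partitions with CRC32, whereas Kafka's own partitioner uses a different hash and spreads partitions across brokers.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, bytes]


class PartitionedLog:
    """Toy in-memory stand-in for a partitioned, append-only log."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Assumption for this sketch: CRC32 of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: bytes) -> Tuple[int, int]:
        """Append a record and return (partition, offset); nothing is rewritten."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> List[Record]:
        """Return all records in `partition` from `offset` onwards."""
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("user-1", b'{"event": "login"}')
p2, o2 = log.append("user-1", b"any bytes are fine")   # payload format is arbitrary
```

Records with the same key land in the same partition, so their relative order is preserved, and a reader can resume from any offset it remembers.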
32. KAPPA ARCHITECTURE
+ Fast writes (queue/log)
+ Fast reads (in-memory)
- Latency
- Reliable event delivery is essential
(c) Apache Spark
35. APACHE SPARK
Originally developed at the University of California, Berkeley's AMPLab
General large-scale data processing framework
Based on the MapReduce model but keeps intermediate results in memory instead of writing them to slow disks as Hadoop MapReduce does
Supports lots of different data sources
Programming APIs for Scala, Java and Python
(c) Ch.ko123 / CC BY 4.0
36. EKS-STACK
Elasticsearch is based on Lucene, but it is more than just a search engine: it can provide real-time analytics even for end users. It is usually used to store the aggregated data.
Kibana is a great tool for developers and for internal use to discover and analyze the data inside ES.
Spark is used to process the events, produce the needed aggregates and ingest the data into Elasticsearch so it can be queried.
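The last step, getting aggregates into Elasticsearch, can be sketched without Spark. The snippet below counts events per page and hour and renders the result in the newline-delimited JSON format that Elasticsearch's `_bulk` endpoint accepts: one action line followed by one document line per aggregate. The index name `pageviews-hourly`, the field names and the sample events are hypothetical, the plain-Python aggregation only stands in for the Spark job, and sending the body over HTTP is left out.

```python
import json
from collections import Counter
from typing import Iterable, List


def hourly_counts(events: Iterable[dict]) -> Counter:
    """Count events per (page, hour) pair; a stand-in for the Spark aggregation."""
    # ts[:13] keeps "YYYY-MM-DDTHH" from an ISO-style timestamp string.
    return Counter((e["page"], e["ts"][:13]) for e in events)


def bulk_body(counts: Counter, index: str) -> str:
    """Render aggregates as an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines: List[str] = []
    for (page, hour), views in sorted(counts.items()):
        # Action line: a deterministic _id means a re-run overwrites the same document.
        lines.append(json.dumps({"index": {"_index": index, "_id": f"{page}|{hour}"}}))
        # Source line: the aggregate document itself.
        lines.append(json.dumps({"page": page, "hour": hour, "views": views}))
    return "\n".join(lines) + "\n"   # the bulk format requires a trailing newline


# Hypothetical sample events.
events = [
    {"page": "/home", "ts": "2016-01-01T07:17:09"},
    {"page": "/home", "ts": "2016-01-01T07:45:00"},
    {"page": "/docs", "ts": "2016-01-01T08:01:30"},
]
body = bulk_body(hourly_counts(events), index="pageviews-hourly")
```

Storing one small document per page and hour, rather than every raw event, is what keeps the data cheap to query from Kibana or an end-user dashboard.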
43. New session:
started 07:17:09, duration 0s, OPEN
Existing session:
started 07:17:09, duration 5s, OPEN
Existing session:
started 07:17:09, duration 10s, OPEN
Existing session:
started 07:17:09, duration 14s,
paused 07:17:23, CLOSED
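The slide shows only the output of session tracking, not the logic behind it. As a rough sketch of how such state could be maintained, the code below folds a stream of events into a session record: the first event opens a session, later events extend its duration, and a pause event closes it. The `Session` class, the `update` function and the `play`/`pause` event names are assumptions made for illustration; the timestamps are the ones from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Session:
    started: datetime
    last_seen: datetime
    paused: Optional[datetime] = None

    @property
    def duration_s(self) -> int:
        return int((self.last_seen - self.started).total_seconds())

    @property
    def status(self) -> str:
        return "CLOSED" if self.paused is not None else "OPEN"


def update(session: Optional[Session], event: Tuple[str, datetime]) -> Session:
    """Fold one (kind, timestamp) event into the session state."""
    kind, ts = event
    if session is None or session.paused is not None:
        session = Session(started=ts, last_seen=ts)   # no open session: start one
    session.last_seen = ts
    if kind == "pause":
        session.paused = ts                            # a pause closes the session
    return session


def at(clock: str) -> datetime:
    """Parse an HH:MM:SS clock time like the ones printed on the slide."""
    return datetime.strptime(clock, "%H:%M:%S")


events = [("play", at("07:17:09")), ("play", at("07:17:14")),
          ("play", at("07:17:19")), ("pause", at("07:17:23"))]

state: Optional[Session] = None
history: List[Tuple[int, str]] = []
for ev in events:
    state = update(state, ev)
    history.append((state.duration_s, state.status))
```

After the loop, `history` holds one (duration, status) pair per event, matching the four states listed above.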
45. You can find me at:
@theikkilap
teemu@emblica.fi
https://emblica.fi
Any questions?
Thanks!
Icons from Font Awesome project