BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Real-Time Analytics with Apache Cassandra
Cassandra Day Munich, 9.2.2016
Guido Schmutz
Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-author of several books
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://de.slideshare.net/gschmutz
Twitter: gschmutz
Our company.
© Trivadis – The Company, 2/11/16
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services, focusing on Open Source and other technologies (vendor logos not recoverable from the source) in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
OPERATION
COPENHAGEN
MUNICH
LAUSANNE
BERN
ZURICH
BRUGG
GENEVA
HAMBURG
DÜSSELDORF
FRANKFURT
STUTTGART
FREIBURG
BASEL
VIENNA
With over 600 specialists and IT experts in your region.
• 14 Trivadis branches and more than 600 employees
• 200 Service Level Agreements
• Over 4,000 training participants
• Research and development budget: CHF 5.0 million
• Financially self-supporting and sustainably profitable
• Experience from more than 1,900 projects per year at over 800 customers
Agenda
1. Customer Use Case and Architecture
2. Cassandra Data Modeling
3. Cassandra for Timeseries Data
4. Titan:db for Graph Data
Customer Use Case and Architecture
Data Science Lab @ Armasuisse W+T
W+T flagship project, standing for innovation & tech transfer
Building capabilities in the areas of:
• Social Media Intelligence (SOCMINT)
• Big Data Technologies & Architectures
Invest in new, innovative and not yet widely proven technology
• Batch / Real-time analysis
• NoSQL databases
• Text analysis (NLP)
• Graph Data
• …
3 Phases: June 2013 – June 2015
SOCMINT Demonstrator – Time Dimension
• Major data model: Time series (TS)
• TS reflect user behaviors over time
• Activities correlate with events
• Anomaly detection
• Event detection & prediction
SOCMINT Demonstrator – Social Dimension
• User-user networks (social graphs)
• Twitter: follower, retweet and mention graphs
• Who is central in a social network?
• Who has retweeted a given tweet to whom?
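The question "who is central" can be illustrated on a tiny in-memory follower graph. This is only a minimal sketch: the user names are hypothetical, and plain in-degree is assumed as the centrality measure because the slides do not say which measure the demonstrator used.

```python
from collections import Counter

# Hypothetical follower edges as (follower, followed); names are illustrative only.
FOLLOWS = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("bob", "alice"),
    ("dave", "carol"),
]


def in_degree_centrality(edges):
    """Count incoming edges per user, a simple proxy for centrality."""
    return Counter(target for _source, target in edges)


def most_central(edges):
    """Return the user with the highest in-degree."""
    user, _count = in_degree_centrality(edges).most_common(1)[0]
    return user


if __name__ == "__main__":
    print(in_degree_centrality(FOLLOWS))
    print(most_central(FOLLOWS))
```

The same counting works for retweet or mention graphs by swapping the edge list.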
SOCMINT Demonstrator - “Lambda Architecture” for Big Data
[Diagram: Lambda Architecture. Components shown:
• Data Sources: Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine
• Data Collection: Channel
• (Analytical) Batch Data Processing: Raw Data (Reservoir), Batch compute, Batch Result Store
• (Analytical) Real-Time Data Processing: Messaging, Stream/Event Processing, Batch compute, Real-Time Result Store
• Result Stores, Query Engine, Computed Information
• Data Access: Reports, Service, Analytic Tools, Alerting Tools
• Legend: Data in Motion, Data at Rest]
SOCMINT Demonstrator – Frameworks & Components in Use
[Diagram: the Lambda Architecture layers (Data Sources, Data Collection, Batch Data Processing, Real-Time Data Processing, Result Stores, Query Engine, Data Access) annotated with the frameworks and components in use; the product logos are not recoverable from the source. The only data source shown is Social. Legend: Data in Motion, Data at Rest]
Streaming Analytics Processing Pipeline
• Queuing: Kafka provides reliable and efficient queuing
• Processing: Storm processes (rollups, counts)
• Storing: Cassandra stores results at same speed
[Diagram: Twitter Sensor 1, Twitter Sensor 2 and Twitter Sensor 3 feed the Queuing, Processing and Storing stages, which serve two Visualization Applications]
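The queue, process, store flow of this pipeline can be sketched without any of the real systems. In this minimal sketch a `deque` stands in for Kafka, a plain function for the Storm rollup and a `Counter` for Cassandra; all three stand-ins, the function names and the hourly rollup are assumptions made for illustration only.

```python
from collections import Counter, deque
from datetime import datetime


def rollup_key(sensor_id, timestamp):
    """Truncate an event time to the full hour, the unit counted here."""
    return (sensor_id, timestamp.replace(minute=0, second=0, microsecond=0))


def process(queue, store):
    """Drain the queue and increment one counter per (sensor, hour)."""
    while queue:
        sensor_id, timestamp = queue.popleft()
        store[rollup_key(sensor_id, timestamp)] += 1
    return store


# Queued events: (sensor id, tweet time). Values are illustrative.
queue = deque([
    ("ABC-001", datetime(2015, 10, 14, 10, 0, 2)),
    ("ABC-001", datetime(2015, 10, 14, 10, 9, 2)),
    ("ABC-001", datetime(2015, 10, 14, 9, 12, 9)),
    ("DEF-931", datetime(2015, 10, 14, 9, 10, 2)),
])
store = process(queue, Counter())

if __name__ == "__main__":
    for key, count in sorted(store.items()):
        print(key, count)
```

Each increment corresponds to one counter write, so the store has to absorb writes at the rate the processing stage emits them.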
Cassandra Data Modeling
• Don’t think relational!
• Denormalize, denormalize, denormalize …
• Rows are gigantic and sorted; one row is stored on one node
• Know your application/use cases => from query to model
• Index is not an afterthought anymore => “index” upfront
• Control physical storage structure
“Static” Tables – “Skinny Row”
CREATE TABLE skinny (rowkey text,
  c1 text,
  c2 text,
  c3 text,
  PRIMARY KEY (rowkey));
Partition Key: rowkey
Grows up to billions of rows
rowkey   | c1       | c2       | c3
---------+----------+----------+---------
rowkey-1 | value-c1 | value-c2 | value-c3
rowkey-2 | value-c1 |          | value-c3
rowkey-3 | value-c1 | value-c2 | value-c3
“Dynamic” Tables – “Wide Row”
CREATE TABLE wide (rowkey text,
  ckey text,
  c1 text,
  c2 text,
  PRIMARY KEY (rowkey, ckey)) WITH CLUSTERING ORDER BY (ckey ASC);
Partition Key: rowkey; Clustering Key: ckey
Billions of rows, each holding from 1 to 2 billion cells
rowkey   | cells, sorted by ckey (each cell holds value-c1 or value-c2)
---------+--------------------------------------------------------------------
rowkey-1 | ckey-1:c1, ckey-1:c2, ckey-2:c1, ckey-2:c2, ckey-3:c1, ckey-3:c2
rowkey-2 | ckey-1:c1, ckey-1:c2, ckey-2:c1, ckey-2:c2
rowkey-3 | ckey-1:c1, ckey-1:c2, ckey-2:c1, ckey-2:c2, ckey-3:c1, ckey-3:c2
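The wide-row layout can be pictured as a map from partition key to rows kept sorted by clustering key. The following is a toy in-memory model rather than how Cassandra stores data on disk; the class `WidePartitionStore` and its methods are hypothetical names introduced for this sketch.

```python
import bisect


class WidePartitionStore:
    """Toy model: partition key -> list of (clustering key, columns), kept sorted."""

    def __init__(self):
        self.partitions = {}

    def upsert(self, rowkey, ckey, columns):
        """Insert a row, or merge columns into the row with the same primary key."""
        rows = self.partitions.setdefault(rowkey, [])
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, ckey)
        if i < len(rows) and rows[i][0] == ckey:
            rows[i][1].update(columns)
        else:
            rows.insert(i, (ckey, dict(columns)))

    def slice(self, rowkey, start, end):
        """Return the rows of one partition with start <= ckey <= end, in order."""
        rows = self.partitions.get(rowkey, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return rows[lo:hi]


if __name__ == "__main__":
    store = WidePartitionStore()
    store.upsert("rowkey-1", "ckey-2", {"c1": "value-c1"})
    store.upsert("rowkey-1", "ckey-1", {"c1": "value-c1", "c2": "value-c2"})
    print(store.slice("rowkey-1", "ckey-1", "ckey-2"))
```

Because rows stay sorted inside a partition, a range over the clustering key is a contiguous slice, which is what makes this layout attractive for time-ordered data.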
Cassandra for Timeseries Data
Know your application => From query to model
• Show Timeline of Tweets
• Show Timeseries on different levels of aggregation (resolution)
  • Seconds
  • Minutes
  • Hours
Show Timeline: Provide Raw Data (Tweets)
CREATE TABLE tweet (tweet_id bigint,
  username text,
  message text,
  hashtags list<text>,
  latitude double,
  longitude double,
  …
  PRIMARY KEY(tweet_id));
• Skinny Row Table
• Holds the sensor raw data => Tweets
• Similar to a relational table
• Primary Key is the partition key
Partition Key: tweet_id
tweet_id | username | message              | hashtags           | latitude | longitude
---------+----------+----------------------+--------------------+----------+----------
10000121 | gschmutz | Getting ready for .. | [cassandra, nosql] | 0        | 0
20121223 | DataStax | The Speed Factor ..  | [BigData]          | 0        | 0
Show Timeline: Provide Raw Data (Tweets)
INSERT INTO tweet (tweet_id, username, message, hashtags, latitude,
  longitude) VALUES (10000121, 'gschmutz', 'Getting ready for my talk about
  using Cassandra for Timeseries and Graph Data', ['cassandra', 'nosql'],
  0, 0);
SELECT tweet_id, username, hashtags, message FROM tweet
  WHERE tweet_id = 10000121;
tweet_id | username | hashtags               | message
---------+----------+------------------------+----------------------------
10000121 | gschmutz | ['cassandra', 'nosql'] | Getting ready for ...
Show Timeline: Provide Sequence of Events
CREATE TABLE tweet_timeline (
  sensor_id text,
  bucket_id text,
  time_id timestamp,
  tweet_id bigint,
  PRIMARY KEY((sensor_id, bucket_id), time_id))
  WITH CLUSTERING ORDER BY (time_id DESC);
• Wide Row Table
• bucket_id creates buckets for columns, e.g. SECOND-2015-10-14
Partition Key: (sensor_id, bucket_id); Clustering Key: time_id
partition                 | time_id:tweet-id cells
--------------------------+------------------------------------------------------------------
ABC-001:SECOND-2015-10-14 | 10:00:02 = 10000121
DEF-931:SECOND-2015-10-14 | 10:09:02 = 1003121343, 09:12:09 = 1002111343, 09:10:02 = 1001121343
Show Timeline: Provide Sequence of Events
INSERT INTO tweet_timeline (sensor_id, bucket_id, time_id, tweet_id)
  VALUES ('ABC-001', 'SECOND-2015-10-14', '2015-10-14 10:50:00', 10000121);
SELECT * FROM tweet_timeline
  WHERE sensor_id = 'ABC-001' AND bucket_id = 'SECOND-2015-10-14'
  AND time_id <= '2015-10-14 12:00:00';
sensor_id | bucket_id         | time_id                  | tweet_id
----------+-------------------+--------------------------+----------
ABC-001   | SECOND-2015-10-14 | 2015-10-14 11:53:00+0000 | 10020334
ABC-001   | SECOND-2015-10-14 | 2015-10-14 10:52:00+0000 | 10000334
ABC-001   | SECOND-2015-10-14 | 2015-10-14 10:51:00+0000 | 10000127
ABC-001   | SECOND-2015-10-14 | 2015-10-14 10:50:00+0000 | 10000121
Sorted by time_id
Show Timeseries: Provide list of metrics
CREATE TABLE tweet_count (
  sensor_id text,
  bucket_id text,
  key text,
  time_id timestamp,
  count counter,
  PRIMARY KEY((sensor_id, bucket_id), key, time_id))
  WITH CLUSTERING ORDER BY (key ASC, time_id DESC);
• Wide Row Table
• bucket_id creates buckets for columns
  • SECOND-2015-10-14
  • HOUR-2015-10
  • DAY-2015-10
Partition Key: (sensor_id, bucket_id); Clustering Key: key, time_id
partition            | key:time_id:count cells
---------------------+-----------------------------------------------------------------
ABC-001:HOUR-2015-10 | ALL:10:00 = 1’550, ALL:09:00 = 2’299, nosql:08:00 = 25
ABC-001:DAY-2015-10  | ALL:14-OCT = 105’999, ALL:13-OCT = 120’344, nosql:14-OCT = 2’532
30d * 24h * n keys = n * 720 cols
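The bucket ids listed above follow a simple pattern: the resolution name plus a truncated date. A minimal sketch of that derivation, together with the column arithmetic from the slide, could look as follows; the function names and the `BUCKET_PATTERNS` table are assumptions, and only the three example formats come from the slides.

```python
from datetime import datetime

# Resolution -> strftime pattern for the date part of the bucket id,
# following the examples SECOND-2015-10-14, HOUR-2015-10 and DAY-2015-10.
BUCKET_PATTERNS = {
    "SECOND": "%Y-%m-%d",  # one partition per sensor and day
    "HOUR": "%Y-%m",       # one partition per sensor and month
    "DAY": "%Y-%m",        # one partition per sensor and month
}


def bucket_id(resolution, timestamp):
    """Build the bucket part of the partition key for a given resolution."""
    return f"{resolution}-{timestamp.strftime(BUCKET_PATTERNS[resolution])}"


def max_hour_columns(days, keys):
    """Upper bound on cells in one HOUR bucket: days * 24 hours * keys."""
    return days * 24 * keys


if __name__ == "__main__":
    ts = datetime(2015, 10, 14, 10, 0, 2)
    print(bucket_id("SECOND", ts))
    print(bucket_id("HOUR", ts))
    print(max_hour_columns(30, 1))
```

Bucketing by date keeps each partition bounded: with one key, an hourly bucket for a 30-day month holds at most 720 cells.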
Show Timeseries: Provide list of metrics
UPDATE tweet_count SET count = count + 1
  WHERE sensor_id = 'ABC-001' AND bucket_id = 'HOUR-2015-10'
  AND key = 'ALL' AND time_id = '2015-10-14 10:00:00';
SELECT * FROM tweet_count
  WHERE sensor_id = 'ABC-001' AND bucket_id = 'HOUR-2015-10'
  AND key = 'ALL' AND time_id >= '2015-10-14 08:00:00';
sensor_id | bucket_id    | key | time_id                  | count
----------+--------------+-----+--------------------------+-------
ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 12:00:00+0000 | 100230
ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 11:00:00+0000 | 102230
ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 10:00:00+0000 | 105430
ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 09:00:00+0000 | 203240
ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 08:00:00+0000 | 132230
Titan:db & Cassandra for Graph Data
Introduction to the Graph Model – Property Graph
Vertex (Node)
• Represent Entities
• Always have an ID
• Can contain properties (key-value pairs)
Edge (Relationship)
• Lines between nodes
• May be directed or undirected
• Have IDs and properties
Properties
• Values about node or relationship
• Allow adding semantics to relationships
[Diagram: example property graph with the vertices User 1, User 2, Tweet 1 and Tweet 2 and edges labeled author, follow and retweet. Legible properties (key: value):
• Id: 16134540, name: cloudera, location: Palo Alto
• Id: 18898576, name: gschmutz, location: Berne
• Id: 18898999, text: CDH5 has been.., time: July 11 2015
• Tweet 1: Id: 18898576, text: Join BigData.., time: June 11 2015
• Edge properties: since: May 2012; time: June 11 2015]
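A property graph like the one on this slide can be modeled with plain dictionaries. The sketch below uses ids and properties that are legible on the slide, but the wiring of the edges (who follows whom, who authored and who retweeted the tweet) is an assumption, since the diagram itself is not recoverable; the helper names are hypothetical as well.

```python
# Vertices keyed by id; properties are key-value pairs.
vertices = {
    16134540: {"label": "user", "name": "cloudera", "location": "Palo Alto"},
    18898576: {"label": "user", "name": "gschmutz", "location": "Berne"},
    18898999: {"label": "tweet", "text": "CDH5 has been..", "time": "July 11 2015"},
}

# Directed edges with a label and optional properties (assumed wiring).
edges = [
    {"out": 18898576, "label": "follow", "in": 16134540, "since": "May 2012"},
    {"out": 16134540, "label": "author", "in": 18898999},
    {"out": 18898576, "label": "retweet", "in": 18898999},
]


def out_neighbors(vertex_id, label):
    """Ids reached by following outgoing edges with the given label."""
    return [e["in"] for e in edges if e["out"] == vertex_id and e["label"] == label]


def in_neighbors(vertex_id, label):
    """Ids that point at this vertex through edges with the given label."""
    return [e["out"] for e in edges if e["in"] == vertex_id and e["label"] == label]


def retweeters(tweet_id):
    """Names of the users who retweeted a given tweet."""
    return [vertices[v]["name"] for v in in_neighbors(tweet_id, "retweet")]


if __name__ == "__main__":
    print(retweeters(18898999))
    print(out_neighbors(18898576, "follow"))
```

Scanning the edge list is fine for a handful of edges; a graph database exists precisely to answer such neighbor lookups efficiently at scale.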
Titan:db Architecture
http://thinkaurelius.github.io/titan/
TinkerPop 3 Stack
• TinkerPop is a framework composed of various interoperable components
• Vendor independent (similar to JDBC for RDBMS)
• Core API defines Graph, Vertex, Edge, …
• The Gremlin traversal language is a vendor-independent way to query (traverse) a graph
• Gremlin Server can be leveraged to allow over-the-wire communication with a TinkerPop-enabled graph system
http://tinkerpop.incubator.apache.org/
Gremlin Graph Traversal Engine
• Language / system agnostic: many graph languages for many graph systems
• Provided Traversal Engine: SPARQL or any other graph query language on the Gremlin Traversal Machine
• Native distributed execution: a Gremlin traversal over an OLAP Graph Processor (Hadoop / Spark)
Gremlin in Action – Creating the Graph
Gremlin in Action – Graph Traversal
Gremlin in Action – Graph Traversal (II)
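Summary - Know your domain
[Diagram: data store types placed along an axis “Connectedness of Data”, from low to high. Store types shown: Key-Value Stores, Wide-Column Store, Document Data Store, Relational Databases, Graph Databases; their exact positions on the axis are not recoverable from the source]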
Guido Schmutz
Email: guido.schmutz@trivadis.com
+41 79 412 05 39
34

Mais conteúdo relacionado

Mais procurados

Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionGuido Schmutz
 
IoT Architecture - are traditional architectures good enough or do we need n...
 IoT Architecture - are traditional architectures good enough or do we need n... IoT Architecture - are traditional architectures good enough or do we need n...
IoT Architecture - are traditional architectures good enough or do we need n...Guido Schmutz
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data LösungenGuido Schmutz
 
Big Data and Fast Data - big and fast combined, is it possible?
Big Data and Fast Data - big and fast combined, is it possible?Big Data and Fast Data - big and fast combined, is it possible?
Big Data and Fast Data - big and fast combined, is it possible?Guido Schmutz
 
Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Guido Schmutz
 
Twitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitTwitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitGuido Schmutz
 
Data Democratization at Nubank
 Data Democratization at Nubank Data Democratization at Nubank
Data Democratization at NubankDatabricks
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Altan Khendup
 
Manage the Digital Transformation with Machine Learning in a Reactive Microse...
Manage the Digital Transformation with Machine Learning in a Reactive Microse...Manage the Digital Transformation with Machine Learning in a Reactive Microse...
Manage the Digital Transformation with Machine Learning in a Reactive Microse...DataWorks Summit
 
Blockchain and Apache NiFi
Blockchain and Apache NiFiBlockchain and Apache NiFi
Blockchain and Apache NiFiTimothy Spann
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...Kai Wähner
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...Kai Wähner
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningKai Wähner
 
VP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraVP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraBig Data Spain
 
DNA - Einstein - Data science ja bigdata
DNA - Einstein - Data science ja bigdataDNA - Einstein - Data science ja bigdata
DNA - Einstein - Data science ja bigdataRolf Koski
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsKai Wähner
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsData Driven Innovation
 

Mais procurados (20)

Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
 
IoT Architecture - are traditional architectures good enough or do we need n...
 IoT Architecture - are traditional architectures good enough or do we need n... IoT Architecture - are traditional architectures good enough or do we need n...
IoT Architecture - are traditional architectures good enough or do we need n...
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
 
Big Data and Fast Data - big and fast combined, is it possible?
Big Data and Fast Data - big and fast combined, is it possible?Big Data and Fast Data - big and fast combined, is it possible?
Big Data and Fast Data - big and fast combined, is it possible?
 
Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?
 
SQL vs. NoSQL
SQL vs. NoSQLSQL vs. NoSQL
SQL vs. NoSQL
 
Twitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in EchtzeitTwitter Storm: Ereignisverarbeitung in Echtzeit
Twitter Storm: Ereignisverarbeitung in Echtzeit
 
Data Democratization at Nubank
 Data Democratization at Nubank Data Democratization at Nubank
Data Democratization at Nubank
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
 
Manage the Digital Transformation with Machine Learning in a Reactive Microse...
Manage the Digital Transformation with Machine Learning in a Reactive Microse...Manage the Digital Transformation with Machine Learning in a Reactive Microse...
Manage the Digital Transformation with Machine Learning in a Reactive Microse...
 
Blockchain and Apache NiFi
Blockchain and Apache NiFiBlockchain and Apache NiFi
Blockchain and Apache NiFi
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Make your data talk
Make your data talkMake your data talk
Make your data talk
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
VP of WW Partners by Alan Chhabra
VP of WW Partners by Alan ChhabraVP of WW Partners by Alan Chhabra
VP of WW Partners by Alan Chhabra
 
DNA - Einstein - Data science ja bigdata
DNA - Einstein - Data science ja bigdataDNA - Einstein - Data science ja bigdata
DNA - Einstein - Data science ja bigdata
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and Analytics
 

Destaque

Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
KDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics TutorialKDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics TutorialNeera Agarwal
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitGyula Fóra
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingGyula Fóra
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King Gyula Fóra
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsVincenzo Gulisano
 
Stream Analytics in the Enterprise
Stream Analytics in the EnterpriseStream Analytics in the Enterprise
Stream Analytics in the EnterpriseJesus Rodriguez
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTGuido Schmutz
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?MapR Technologies
 
The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...Audrey Neveu
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingGuido Schmutz
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...Amazon Web Services
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0Petr Zapletal
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Gyula Fóra
 
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike SpicerKafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicerconfluent
 

Destaque (20)

Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
KDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics TutorialKDD 2016 Streaming Analytics Tutorial
KDD 2016 Streaming Analytics Tutorial
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at King
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Real-time analytics as a service at King
Real-time analytics as a service at King Real-time analytics as a service at King
Real-time analytics as a service at King
 
Streaming Analytics
Streaming AnalyticsStreaming Analytics
Streaming Analytics
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
Stream Analytics in the Enterprise
Stream Analytics in the EnterpriseStream Analytics in the Enterprise
Stream Analytics in the Enterprise
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoT
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
 
The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...The end of polling : why and how to transform a REST API into a Data Streamin...
The end of polling : why and how to transform a REST API into a Data Streamin...
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream Processing
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike SpicerKafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
 

Semelhante a Real Time Analytics with Apache Cassandra - Cassandra Day Munich

Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataGuido Schmutz
 
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream AnalyticsImplementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream AnalyticsMarco Parenzan
 
Teradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_DataTeradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_Datapepeborja
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleSriram Krishnan
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure TechnologiesKoray Kocabas
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)Codit
 
Real Time Analytics with Apache Cassandra - Cassandra Day Munich

  • 1. Real-Time Analytics with Apache Cassandra
      Cassandra Day Munich, 9.2.2016
      Guido Schmutz
      BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
  • 2. Guido Schmutz
      Working for Trivadis for more than 19 years
      Oracle ACE Director for Fusion Middleware and SOA
      Co-author of several books
      Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
      Member of the Trivadis Architecture Board
      Technology Manager @ Trivadis
      More than 25 years of software development experience
      Contact: guido.schmutz@trivadis.com
      Blog: http://guidoschmutz.wordpress.com
      Slideshare: http://de.slideshare.net/gschmutz
      Twitter: gschmutz
  • 3. Our company.
      Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on [vendor logos not captured in the transcript] and Open Source technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
      OPERATION: Trivadis Services takes over the interacting operation of your IT systems.
      © Trivadis – The Company, 2/11/16
  • 4. With over 600 specialists and IT experts in your region.
      Locations: COPENHAGEN, MUNICH, LAUSANNE, BERN, ZURICH, BRUGG, GENEVA, HAMBURG, DÜSSELDORF, FRANKFURT, STUTTGART, FREIBURG, BASEL, VIENNA
      14 Trivadis branches and more than 600 employees
      200 Service Level Agreements
      Over 4,000 training participants
      Research and development budget: CHF 5.0 million
      Financially self-supporting and sustainably profitable
      Experience from more than 1,900 projects per year at over 800 customers
  • 5. Agenda
      1. Customer Use Case and Architecture
      2. Cassandra Data Modeling
      3. Cassandra for Timeseries Data
      4. Titan:db for Graph Data
  • 6. Customer Use Case and Architecture
  • 7. Data Science Lab @ Armasuisse W&T
      W+T flagship project, standing for innovation & tech transfer
      Building capabilities in the areas of:
        • Social Media Intelligence (SOCMINT)
        • Big Data Technologies & Architectures
      Invest in new, innovative and not yet widely proven technology:
        • Batch / Real-time analysis
        • NoSQL databases
        • Text analysis (NLP)
        • Graph Data
        • …
      3 Phases: June 2013 – June 2015
  • 8. SOCMINT Demonstrator – Time Dimension
      Major data model: Time series (TS)
      TS reflect user behaviors over time
      Activities correlate with events
      Anomaly detection
      Event detection & prediction
  • 9. SOCMINT Demonstrator – Social Dimension
      User-user networks (social graphs); Twitter: follower, retweet and mention graphs
      Who is central in a social network?
      Who has retweeted a given tweet to whom?
  • 10. SOCMINT Demonstrator - “Lambda Architecture” for Big Data
      [Architecture diagram; labels shown:] Data Sources (Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine); Data Collection; Channel; Messaging; Raw Data (Reservoir); (Analytical) Batch Data Processing; Batch compute; Batch Result Store; (Analytical) Real-Time Data Processing; Stream/Event Processing; Batch compute; Real-Time Result Store; Data Access; Query Engine; Result Store; Computed Information; Reports; Service; Analytic Tools; Alerting Tools
      Legend: Data in Motion, Data at Rest
  • 11. SOCMINT Demonstrator – Frameworks & Components in Use
      [Same architecture diagram as slide 10, restricted to the Social data source; the framework and component logos were not captured in the transcript]
      Legend: Data in Motion, Data at Rest
  • 12. Streaming Analytics Processing Pipeline
      Kafka provides reliable and efficient queuing
      Storm processes (rollups, counts)
      Cassandra stores results at the same speed
      [Diagram] Stages: Queuing, Processing, Storing. Inputs: Twitter Sensor 1, Twitter Sensor 2, Twitter Sensor 3. Outputs: two Visualization Applications.
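The three stages on this slide can be sketched in memory, without any of the real systems. In this minimal sketch a `deque` stands in for the Kafka queue, a plain function for the Storm rollup, and a `Counter` for the Cassandra result store; the sensor names, timestamps and the per-minute rollup are hypothetical and not taken from the talk.

```python
from collections import Counter, deque
from datetime import datetime


def rollup_counts(events):
    """Count events per (sensor_id, minute), mimicking a stream rollup."""
    counts = Counter()
    for sensor_id, ts in events:
        minute = ts.replace(second=0, microsecond=0)
        counts[(sensor_id, minute)] += 1
    return counts


# Queuing stage: a deque stands in for a message queue (hypothetical data).
queue = deque([
    ("sensor-1", datetime(2015, 10, 14, 10, 0, 2)),
    ("sensor-1", datetime(2015, 10, 14, 10, 0, 40)),
    ("sensor-2", datetime(2015, 10, 14, 10, 1, 5)),
])

# Processing stage: drain the queue, then roll the events up.
drained = []
while queue:
    drained.append(queue.popleft())

# Storing stage: the Counter stands in for the result store.
result_store = rollup_counts(drained)
```

The point of the sketch is only the division of labour: the queue decouples producers from the processing step, and the store receives already aggregated values rather than raw events.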
  • 14. Cassandra Data Modelling
        • Don’t think relational!
        • Denormalize, denormalize, denormalize …
        • Rows are gigantic and sorted = one row is stored on one node
        • Know your application/use cases => from query to model
        • Index is not an afterthought anymore => “index” upfront
        • Control physical storage structure
  • 15. “Static” Tables – “Skinny Row”
        CREATE TABLE skinny (
          rowkey text,
          c1 text,
          c2 text,
          c3 text,
          PRIMARY KEY (rowkey));
      rowkey is the partition key. The table grows up to billions of rows.
        rowkey   | c1       | c2       | c3
        ---------+----------+----------+----------
        rowkey-1 | value-c1 | value-c2 | value-c3
        rowkey-2 | value-c1 |          | value-c3
        rowkey-3 | value-c1 | value-c2 | value-c3
  • 16. “Dynamic” Tables – “Wide Row”
        CREATE TABLE wide (
          rowkey text,
          ckey text,
          c1 text,
          c2 text,
          PRIMARY KEY (rowkey, ckey))
          WITH CLUSTERING ORDER BY (ckey ASC);
      rowkey is the partition key, ckey the clustering key. Billions of rows, each holding from 1 up to 2 billion columns.
        rowkey-1: ckey-1:c1 = value-c1, ckey-1:c2 = value-c2, ckey-2:c1 = value-c1, ckey-2:c2 = value-c2, ckey-3:c1 = value-c1, ckey-3:c2 = value-c2
        rowkey-2: ckey-1:c1 = value-c1, ckey-1:c2 = value-c2, ckey-2:c1 = value-c1, ckey-2:c2 = value-c2
        rowkey-3: ckey-1:c1 = value-c1, ckey-1:c2 = value-c2, ckey-2:c1 = value-c1, ckey-2:c2 = value-c2, ckey-3:c1 = value-c1, ckey-3:c2 = value-c2
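To make the wide-row idea concrete, here is a small in-memory sketch of a table keyed by (rowkey, ckey): each partition keeps its clustering keys sorted, so a range of clustering keys can be read as one contiguous slice. The class `WideTable` and its methods are illustrative names only, not part of Cassandra or any driver.

```python
import bisect
from collections import defaultdict


class WideTable:
    """In-memory stand-in for a table with PRIMARY KEY (rowkey, ckey)."""

    def __init__(self):
        # partition key -> sorted list of clustering keys
        self._ckeys = defaultdict(list)
        # partition key -> {clustering key -> (c1, c2)}
        self._cells = defaultdict(dict)

    def upsert(self, rowkey, ckey, c1, c2):
        """Insert or overwrite the columns stored under (rowkey, ckey)."""
        if ckey not in self._cells[rowkey]:
            # keep clustering keys in ascending order within the partition
            bisect.insort(self._ckeys[rowkey], ckey)
        self._cells[rowkey][ckey] = (c1, c2)

    def slice(self, rowkey, start, end):
        """Return (ckey, c1, c2) for start <= ckey <= end in one partition."""
        keys = self._ckeys.get(rowkey, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [(k, *self._cells[rowkey][k]) for k in keys[lo:hi]]


table = WideTable()
table.upsert("rowkey-1", "ckey-3", "v31", "v32")
table.upsert("rowkey-1", "ckey-1", "v11", "v12")
table.upsert("rowkey-1", "ckey-2", "v21", "v22")
rows = table.slice("rowkey-1", "ckey-1", "ckey-2")
```

Regardless of insertion order, the slice comes back ordered by clustering key, which is the property the timeline and time-series tables later in the deck rely on.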
  • 18. Know your application => From query to model
      Show Timeline of Tweets
      Show Timeseries on different levels of aggregation (resolution)
        • Seconds
        • Minutes
        • Hours
  • 19. Show Timeline: Provide Raw Data (Tweets)
        CREATE TABLE tweet (
          tweet_id bigint,
          username text,
          message text,
          hashtags list<text>,
          latitude double,
          longitude double,
          …
          PRIMARY KEY (tweet_id));
        • Skinny Row Table
        • Holds the sensor raw data => Tweets
        • Similar to a relational table
        • Primary Key is the partition key (tweet_id)
        tweet_id | username | message              | hashtags           | latitude | longitude
        ---------+----------+----------------------+--------------------+----------+----------
        10000121 | gschmutz | Getting ready for .. | [cassandra, nosql] | 0        | 0
        20121223 | DataStax | The Speed Factor ..  | [BigData]          | 0        | 0
  • 20. Show Timeline: Provide Raw Data (Tweets)
        INSERT INTO tweet (tweet_id, username, message, hashtags, latitude, longitude)
        VALUES (10000121, 'gschmutz', 'Getting ready for my talk about using Cassandra for Timeseries and Graph Data', ['cassandra', 'nosql'], 0, 0);
        SELECT tweet_id, username, hashtags, message FROM tweet WHERE tweet_id = 10000121;
        tweet_id | username | hashtags               | message
        ---------+----------+------------------------+-----------------------
        10000121 | gschmutz | ['cassandra', 'nosql'] | Getting ready for ...
        20121223 | DataStax | ['BigData']            | The Speed Factor ...
  • 21. Show Timeline: Provide Sequence of Events
        CREATE TABLE tweet_timeline (
          sensor_id text,
          bucket_id text,
          time_id timestamp,
          tweet_id bigint,
          PRIMARY KEY ((sensor_id, bucket_id), time_id))
          WITH CLUSTERING ORDER BY (time_id DESC);
      Wide Row Table. (sensor_id, bucket_id) is the partition key, time_id the clustering key.
      bucket_id creates buckets for columns
        • SECOND-2015-10-14
        ABC-001:SECOND-2015-10-14: 10:00:02:tweet-id = 10000121
        DEF-931:SECOND-2015-10-14: 10:09:02:tweet-id = 1003121343, 09:12:09:tweet-id = 1002111343, 09:10:02:tweet-id = 1001121343
  • 22. Show Timeline: Provide Sequence of Events
        INSERT INTO tweet_timeline (sensor_id, bucket_id, time_id, tweet_id)
        VALUES ('ABC-001', 'SECOND-2015-10-14', '2015-10-14 10:50:00', 10000121);
        SELECT * FROM tweet_timeline
        WHERE sensor_id = 'ABC-001'
          AND bucket_id = 'SECOND-2015-10-14'
          AND time_id <= '2015-10-14 12:00:00';
        sensor_id | bucket_id         | time_id                  | tweet_id
        ----------+-------------------+--------------------------+----------
        ABC-001   | SECOND-2015-10-14 | 2015-10-14 11:53:00+0000 | 10020334
        ABC-001   | SECOND-2015-10-14 | 2015-10-14 10:52:00+0000 | 10000334
        ABC-001   | SECOND-2015-10-14 | 2015-10-14 10:51:00+0000 | 10000127
        ABC-001   | SECOND-2015-10-14 | 2015-10-14 10:50:00+0000 | 10000121
      Rows are sorted by time_id (descending), the clustering key.
  • 23. Show Timeseries: Provide list of metrics
      CREATE TABLE tweet_count (sensor_id text, bucket_id text, key text, time_id timestamp, count counter, PRIMARY KEY((sensor_id, bucket_id), key, time_id)) WITH CLUSTERING ORDER BY (key ASC, time_id DESC);
    • Wide-row table • bucket_id creates buckets for the columns: SECOND-2015-10-14, HOUR-2015-10, DAY-2015-10 • Partition key: (sensor_id, bucket_id); clustering key: (key, time_id)
    Example partitions:
      ABC-001:HOUR-2015-10: ALL 10:00 count 1'550, ALL 09:00 count 2'299, nosql 08:00 count 25
      ABC-001:DAY-2015-10: ALL 14-OCT count 105'999, ALL 13-OCT count 120'344, nosql 14-OCT count 2'532
    Size of an hourly partition: 30d * 24h * n keys = n * 720 cols
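The sizing rule on the slide (30d * 24h * n keys = n * 720 cols) is simple arithmetic and can be checked with a small Python sketch. The function name `max_columns` is hypothetical, and the 30-day month is the slide's own simplification; the daily figure is derived here by the same reasoning and is not stated on the slide.

```python
def max_columns(days_in_bucket: int, slots_per_day: int, n_keys: int) -> int:
    """Upper bound on counter cells in one (sensor_id, bucket_id) partition."""
    return days_in_bucket * slots_per_day * n_keys


# HOUR-2015-10 bucket: one cell per key and hour of the month (from the slide).
print(max_columns(30, 24, n_keys=1))   # 720 cells for a single key
# DAY-2015-10 bucket: one cell per key and day (derived, assuming 30 days).
print(max_columns(30, 1, n_keys=1))    # 30 cells for a single key
```

The point of the bucket is that this bound stays fixed per partition: a new month starts a new partition instead of growing the old one.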
  • 24. Show Timeseries: Provide list of metrics
      UPDATE tweet_count SET count = count + 1 WHERE sensor_id = 'ABC-001' AND bucket_id = 'HOUR-2015-10' AND key = 'ALL' AND time_id = '2015-10-14 10:00:00';
      SELECT * FROM tweet_count WHERE sensor_id = 'ABC-001' AND bucket_id = 'HOUR-2015-10' AND key = 'ALL' AND time_id >= '2015-10-14 08:00:00';
       sensor_id | bucket_id    | key | time_id                  | count
      -----------+--------------+-----+--------------------------+--------
       ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 12:00:00+0000 | 100230
       ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 11:00:00+0000 | 102230
       ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 10:00:00+0000 | 105430
       ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 09:00:00+0000 | 203240
       ABC-001   | HOUR-2015-10 | ALL | 2015-10-14 08:00:00+0000 | 132230
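How a single incoming tweet turns into counter updates might look roughly like the Python sketch below. It is an in-memory stand-in for the UPDATE statement, not driver code. The fan-out rule (one increment for 'ALL' plus one per hashtag) is an assumption suggested by the 'ALL' and 'nosql' keys on the slides, and `record_tweet` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime

# Stand-in for the tweet_count table: (sensor_id, bucket_id, key, time_id) -> count
counts = Counter()


def record_tweet(sensor_id: str, ts: datetime, hashtags: list) -> None:
    """Increment the hourly counters touched by one tweet."""
    hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour slot
    bucket = "HOUR-" + ts.strftime("%Y-%m")               # e.g. HOUR-2015-10
    for key in ["ALL", *hashtags]:                        # assumed fan-out rule
        counts[(sensor_id, bucket, key, hour)] += 1


record_tweet("ABC-001", datetime(2015, 10, 14, 10, 50), ["cassandra", "nosql"])
record_tweet("ABC-001", datetime(2015, 10, 14, 10, 59), ["nosql"])

slot = datetime(2015, 10, 14, 10)
print(counts[("ABC-001", "HOUR-2015-10", "ALL", slot)])
print(counts[("ABC-001", "HOUR-2015-10", "nosql", slot)])
```

Each loop iteration corresponds to one `UPDATE tweet_count SET count = count + 1 ...` with the same four key values in the WHERE clause.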
  • 25. Titan:db & Cassandra for Graph Data
  • 26. Introduction to the Graph Model – Property Graph
    Vertex (Node) • Represents an entity • Always has an ID • Can contain properties (key-value pairs)
    Edge (Relationship) • A line between two nodes • May be directed or undirected • Has an ID and properties
    Properties • Values (key: value) attached to a node or relationship • Allow semantics to be added to relationships
    Example graph: vertices User 1, User 2, Tweet 1, Tweet 2; edges author, follow, retweet.
      User properties shown: Id: 16134540, name: cloudera, location: Palo Alto; Id: 18898576, name: gschmutz, location: Berne
      Tweet 1: Id: 18898576, text: Join BigData.., time: June 11 2015
      Tweet 2: Id: 18898999, text: CDH5 has been.., time: July 11 2015
      Edge properties shown: time: June 11 2015; since: May 2012
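The property graph model described above can be sketched as a tiny in-memory data structure. This is only an illustration in Python, not the Titan or TinkerPop API: the class and method names are hypothetical, the small integer IDs are made up, and the way the example users and tweet are wired together is an assumption, since the slide diagram does not survive in the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    out_v: Vertex          # tail of a directed edge
    in_v: Vertex           # head of a directed edge
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, label, **props):
        vertex = Vertex(vertex_id, label, props)
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, label, out_v, in_v, **props):
        edge = Edge(len(self.edges) + 1, label, out_v, in_v, props)
        self.edges.append(edge)
        return edge

    def out(self, vertex, label):
        """Vertices reached by following outgoing edges with the given label."""
        return [e.in_v for e in self.edges if e.out_v is vertex and e.label == label]


g = PropertyGraph()
# Illustrative wiring only; IDs 1..3 are invented for this sketch.
user_a = g.add_vertex(1, "user", name="gschmutz", location="Berne")
user_b = g.add_vertex(2, "user", name="cloudera", location="Palo Alto")
tweet = g.add_vertex(3, "tweet", text="Join BigData..", time="June 11 2015")
g.add_edge("follow", user_a, user_b, since="May 2012")
g.add_edge("author", user_a, tweet)

print([v.properties["name"] for v in g.out(user_a, "follow")])
```

Both vertices and edges carry an ID and a property map, which is what distinguishes a property graph from a plain adjacency list; the `out` method is a minimal analogue of a one-step traversal.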
  • 28. TinkerPop 3 Stack • TinkerPop is a framework composed of various interoperable components • Vendor-independent (similar to JDBC for RDBMS) • The core API defines Graph, Vertex, Edge, … • The Gremlin traversal language is a vendor-independent way to query (traverse) a graph • Gremlin Server can be used for over-the-wire communication with a TinkerPop-enabled graph system • http://tinkerpop.incubator.apache.org/
  • 29. Gremlin Graph Traversal Engine • Language/system agnostic: many graph languages for many graph systems • Provided traversal engine: SPARQL or any other graph query language can run on the Gremlin Traversal Machine • Native distributed execution: a Gremlin traversal over an OLAP graph processor (Hadoop / Spark)
  • 30. Gremlin in Action – Creating the Graph
  • 31. Gremlin in Action – Graph Traversal
  • 32. Gremlin in Action – Graph Traversal (II)
  • 33. Summary - Know your domain [Figure: data store types placed along an axis "Connectedness of Data" running from low to high. Types shown: Document Data Store, Key-Value Stores, Wide-Column Store, Graph Databases, Relational Databases]