Norikra: 
Stream Processing With SQL 
2014/09/13 
HadoopCon 2014 Taiwan 
Satoshi Tagomori (@tagomoris)
Satoshi Tagomori (@tagomoris) 
LINE Corporation 
Analytics Platform Team
THE ONE THING 
WHAT YOU MUST LEARN
TODAY IS
Norikra
Norikra 
IS NOT 
Norika
Topics 
Basics of stream processing 
Stream processing with SQL 
Norikra overview 
Norikra queries 
Use cases in production
Stream Processing 
Less latency 
Less computing power 
No query schedule management
Data Flow And Latency 
[Figure: Batch: data is first collected into a data window, then the query is executed over it. Stream: the query is executed incrementally as data arrives.]
Query For Stored Data 
[Figure: a table holding many stored rows, each with fields v1,v2,v3,v4,v5,v6]
At first, all data 
MUST be stored.
Query For Stored Data 
[Figure: the same stored table (rows of v1,v2,v3,v4,v5,v6), scanned by two queries]
SELECT v1,v2,COUNT(*)
FROM table
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table
WHERE v1 AND v2 GROUP BY v4
“All data” includes
“data that will not be used”: neither query ever reads v5 or v6.
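A minimal Python sketch of the batch style above, assuming a tiny in-memory table with made-up values: every row keeps all six fields, and each query scans the whole stored table after the fact. The helper names `query_a` and `query_b` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical stored table: every row keeps v1..v6, although
# neither query below ever reads v5 or v6.
table: List[Row] = [
    {"v1": True, "v2": True, "v3": "x", "v4": "a", "v5": 1, "v6": 2},
    {"v1": True, "v2": False, "v3": "x", "v4": "b", "v5": 3, "v6": 4},
    {"v1": True, "v2": True, "v3": "y", "v4": "a", "v5": 5, "v6": 6},
    {"v1": False, "v2": True, "v3": "x", "v4": "c", "v5": 7, "v6": 8},
]


def query_a(rows: List[Row]) -> Counter:
    """SELECT v1,v2,COUNT(*) FROM table WHERE v3='x' GROUP BY v1,v2"""
    return Counter((r["v1"], r["v2"]) for r in rows if r["v3"] == "x")


def query_b(rows: List[Row]) -> Counter:
    """SELECT v4,COUNT(*) FROM table WHERE v1 AND v2 GROUP BY v4"""
    return Counter(r["v4"] for r in rows if r["v1"] and r["v2"])


# Batch style: both queries re-scan the full stored table.
print(query_a(table))
print(query_b(table))
```

The cost to notice is not the counting itself but the storage: every field of every row must be kept until the queries run.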
Query For Stream Data 
[Figure: events (v1,v2,v3,v4,v5,v6) flow in from the stream; the first query receives only v1,v2,v3 and the second only v1,v2,v4]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
SELECT v4,COUNT(*)
FROM table.win:xxx
WHERE v1 AND v2 GROUP BY v4
All data will be discarded
right after insertion.
(Bye-bye storage system maintenance!)
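The per-query field subsets (v1,v2,v3 for one query, v1,v2,v4 for the other) can be sketched in Python as below, assuming each query declares up front which fields it reads. The `QUERY_FIELDS` mapping and the `route` function are hypothetical. Each event is split into per-query projections as it arrives and is not kept; a generator makes that explicit.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Dict[str, object]

# Hypothetical mapping from each query to the fields it references.
QUERY_FIELDS: Dict[str, Tuple[str, ...]] = {
    "query_a": ("v1", "v2", "v3"),  # WHERE v3='x' GROUP BY v1,v2
    "query_b": ("v1", "v2", "v4"),  # WHERE v1 AND v2 GROUP BY v4
}


def route(events: Iterable[Event]) -> Iterator[Tuple[str, Event]]:
    """Yield (query name, projected event) pairs as events arrive."""
    for event in events:
        for name, fields in QUERY_FIELDS.items():
            yield name, {f: event[f] for f in fields}
        # route keeps no reference to `event`: v5 and v6 never reach either query.


incoming = [{"v1": True, "v2": False, "v3": "x", "v4": "b", "v5": 1, "v6": 2}]
for name, projected in route(incoming):
    print(name, projected)
```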
Incremental Calculation 
[Figure: a new event (v1,v2,v3,v4,v5,v6) enters the query; more events wait in the stream]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
internal data (memory)
v1     v2     COUNT
TRUE   TRUE   0
TRUE   FALSE  1
FALSE  TRUE   33
FALSE  FALSE  2
Incremental Calculation 
[Figure: a new event (v1,v2,v3,v4,v5,v6) enters the query; more events wait in the stream]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
internal data (memory)
v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  1
FALSE  TRUE   33
FALSE  FALSE  2
Incremental Calculation 
[Figure: a new event (v1,v2,v3,v4,v5,v6) enters the query; more events wait in the stream]
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
internal data (memory)
v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  1
FALSE  TRUE   34
FALSE  FALSE  2
Incremental Calculation 
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
[Figure: the remaining events in the stream]
internal data (memory)
v1     v2     COUNT
TRUE   TRUE   1
TRUE   FALSE  2
FALSE  TRUE   37
FALSE  FALSE  3
Memory is enough
to hold the internal data
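A minimal Python sketch of the incremental calculation in these slides: only the four (v1, v2) counters live in memory, and each arriving event either bumps one counter or is ignored. Window expiry (`win:xxx`) is deliberately left out, and `IncrementalCount` is a hypothetical name.

```python
from collections import Counter
from typing import Dict


class IncrementalCount:
    """SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3='x' GROUP BY v1,v2
    (window expiry omitted for brevity)."""

    def __init__(self) -> None:
        # Internal data: one counter per (v1, v2) group; events are not stored.
        self.counts: Counter = Counter()

    def insert(self, event: Dict[str, object]) -> None:
        if event.get("v3") != "x":
            return  # filtered out by WHERE; state unchanged
        self.counts[(event["v1"], event["v2"])] += 1


q = IncrementalCount()
# Start from the state shown on the first slide (TRUE/TRUE is implicitly 0).
q.counts.update({(True, False): 1, (False, True): 33, (False, False): 2})
q.insert({"v1": True, "v2": True, "v3": "x"})   # TRUE/TRUE: 0 -> 1
q.insert({"v1": False, "v2": True, "v3": "x"})  # FALSE/TRUE: 33 -> 34
print(dict(q.counts))
```

The memory cost is proportional to the number of groups, not the number of events, which is why the internal data fits in memory.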
Data Window 
The time (or size) range a query targets
Batch: a fixed FROM-TO range
WHERE dt >= '2014-09-13 13:30:00'
AND dt < '2014-09-13 14:20:00'
Stream: “Calculate this query every 50 minutes”
Extended SQL required:
SELECT v1,v2,COUNT(*)
FROM table.win:xxx
WHERE v3='x' GROUP BY v1,v2
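The difference between the two kinds of range can be sketched in Python. The batch query uses one fixed half-open range (`>=` start, `<` end); a stream query instead cuts time into consecutive 50-minute windows. Here the windows are counted from a hypothetical `origin`, an assumption for illustration only: in a real engine the batch start is decided by the engine, not by a function like this.

```python
from datetime import datetime, timedelta


def in_batch_range(dt: datetime, start: datetime, end: datetime) -> bool:
    """Batch: WHERE dt >= start AND dt < end (start inclusive, end exclusive)."""
    return start <= dt < end


def window_index(dt: datetime, origin: datetime, size: timedelta) -> int:
    """Stream: index of the consecutive, non-overlapping window of length
    `size` (counted from the hypothetical `origin`) that contains dt."""
    return (dt - origin) // size


origin = datetime(2014, 9, 13, 13, 30)
size = timedelta(minutes=50)
print(in_batch_range(datetime(2014, 9, 13, 14, 20), origin, origin + size))  # False
print(window_index(datetime(2014, 9, 13, 14, 19, 59), origin, size))         # 0
print(window_index(datetime(2014, 9, 13, 14, 20), origin, size))             # 1
```

The half-open range is what lets consecutive windows tile time without overlap: 14:20:00 belongs to the next window, not to both.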
Stream Processing With SQL 
Esper: a Java library for stream processing
you have to write the daemon code yourself, in Java
requires schemas for data and queries
OSS under GPLv2
http://esper.codehaus.org/
Esper EPL 
Select values of height and weight 
for all events with age larger than 30 
SELECT height, weight 
FROM tbl 
WHERE age > 30
Esper EPL 
Count records grouped by height value
for events with age larger than 30 
SELECT height, COUNT(*) AS c 
FROM tbl 
WHERE age > 30 
GROUP BY height 
This query never produces a final result:
the stream never ends, so the counts are never complete
Esper EPL 
Count records grouped by height value
for events with age larger than 30
every hour
SELECT height, COUNT(*) AS c 
FROM tbl.win:time_batch(1 hour) 
WHERE age > 30 
GROUP BY height
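To make `win:time_batch(1 hour)` concrete, here is a rough Python simulation of the last query. It is not how Esper works internally: it assumes events arrive in timestamp order (in seconds), that the first batch starts at the first event, and that a batch is emitted when the first event past its end arrives, whereas a real engine releases batches on a timer. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, Dict[str, int]]  # (timestamp in seconds, fields)


def time_batch_count(events: Iterable[Event],
                     batch_seconds: float) -> Iterator[Tuple[int, Dict[int, int]]]:
    """Approximates:
        SELECT height, COUNT(*) AS c FROM tbl.win:time_batch(1 hour)
        WHERE age > 30 GROUP BY height
    Yields (batch number, {height: count}) for each closed, non-empty batch."""
    batch_start = None
    batch_no = 0
    counts: Counter = Counter()
    for ts, fields in events:
        if batch_start is None:
            batch_start = ts  # assumption: the first batch starts at the first event
        # Close every batch that ended before this event's timestamp.
        while ts >= batch_start + batch_seconds:
            if counts:
                yield batch_no, dict(counts)
            counts = Counter()
            batch_start += batch_seconds
            batch_no += 1
        if fields.get("age", 0) > 30:
            counts[fields["height"]] += 1
    # End of input: flush the open batch (a live stream would wait for its end time).
    if counts:
        yield batch_no, dict(counts)
```

Without the window, the same counters would simply grow forever; the batch boundary is what turns them into periodic, complete results.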
With/without Schema 
Schema-full data: 
strict schema: predefined fields w/ types (or reject) 
schema on read: try to read known fields (or ignore) 
Schema-less data: 
Any field (or ignore), any type (implicit/explicit conversion) 
suited to services under active development,
which describes every internet service, including ours!
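The three ways of handling fields above can be contrasted in a small Python sketch. The `SCHEMA` dictionary and the three function names are hypothetical, and the conversion rule in `schema_less` (call the requested type on the raw value) is just one possible choice of implicit conversion.

```python
from typing import Any, Dict, Optional

# Hypothetical predefined schema: field name -> expected type.
SCHEMA: Dict[str, type] = {"name": str, "age": int}


def strict(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Strict schema: exactly the predefined fields with the right types, or reject (None)."""
    if set(event) != set(SCHEMA):
        return None
    if not all(isinstance(event[k], t) for k, t in SCHEMA.items()):
        return None
    return dict(event)


def schema_on_read(event: Dict[str, Any]) -> Dict[str, Any]:
    """Schema on read: keep known fields that are present and well typed; ignore the rest."""
    return {k: event[k] for k, t in SCHEMA.items()
            if k in event and isinstance(event[k], t)}


def schema_less(event: Dict[str, Any], field: str, as_type: type) -> Optional[Any]:
    """Schema-less: any field may appear; convert it to the type a query needs, or ignore it (None)."""
    if field not in event:
        return None
    try:
        return as_type(event[field])
    except (TypeError, ValueError):
        return None


event = {"name": "tagomoris", "age": "34", "corp": "LINE"}
print(strict(event))                   # None: extra field "corp", and "age" is a string
print(schema_on_read(event))           # {'name': 'tagomoris'}
print(schema_less(event, "age", int))  # 34
```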
Stream Processing & Schema 
Queries first, data second,
for all stream processing
Queries automatically know which fields they need
[Figure (TO BE): a schema-less, mixed data stream of events from the API endpoint, events from the billing service and events of service X; query A and query B each receive only the subset of fields they reference]
break.
Norikra: 
Schema-less Stream Processing with SQL 
Server software, runs on JVM 
Open source software (GPLv2) 
http://norikra.github.io/ 
https://github.com/norikra/norikra
Norikra: 
Schema-less event stream: 
Add/Remove data fields whenever you want 
SQL: 
No more restarts to add/remove queries 
w/ JOINs, w/ SubQueries 
w/ UDFs (in Java or Ruby, distributed via rubygems)
Truly complex events:
Nested Hash/Array, accessible directly from SQL 
HTTP RPC w/ JSON or MessagePack (fluentd plugin available!)
How To Setup Norikra: 
Install JRuby
download jruby.tar.gz, extract it and add its bin directory to $PATH
or use rbenv:
rbenv install jruby-1.7.xx
rbenv shell jruby-..
Install Norikra 
gem install norikra 
Execute Norikra server 
norikra start
Norikra Interface: 
Command line: norikra-client 
norikra-client target open ... 
norikra-client query add ... 
tail -f ... | norikra-client event send ... 
WebUI 
show status 
show/add/remove queries 
HTTP API 
JSON, MessagePack
Norikra Queries: (1) 
SELECT name, age
FROM events
("events" is the target: Norikra's name for an input event stream)
Norikra Queries: (1) 
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Taipei"}
SELECT name, age
FROM events
output: {"name":"tagomoris","age":34}
Norikra Queries: (1) 
{"name":"tagomoris",
"address":"Tokyo",
"corp":"LINE", "current":"Taipei"}
(without "age")
SELECT name, age
FROM events
output: nothing
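A simplified Python model of what these two slides show, assuming a query simply lists the fields it references: an event that lacks any of them produces no output, and extra fields are dropped. This mirrors the slides' behavior only; it is not Norikra's implementation, and `select_fields` is a hypothetical name.

```python
from typing import Any, Dict, Optional, Sequence


def select_fields(event: Dict[str, Any], fields: Sequence[str]) -> Optional[Dict[str, Any]]:
    """Model of `SELECT name, age FROM events` over schema-less events."""
    if any(f not in event for f in fields):
        return None  # a referenced field is missing: the event does not match
    return {f: event[f] for f in fields}


with_age = {"name": "tagomoris", "age": 34, "address": "Tokyo",
            "corp": "LINE", "current": "Taipei"}
without_age = {k: v for k, v in with_age.items() if k != "age"}
print(select_fields(with_age, ["name", "age"]))     # {'name': 'tagomoris', 'age': 34}
print(select_fields(without_age, ["name", "age"]))  # None
```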
Norikra Queries: (2) 
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Taipei"}
SELECT name, age
FROM events
WHERE current="Taipei"
output: {"name":"tagomoris","age":34}
Norikra Queries: (2) 
{"name":"hadoop",
"age":99, "address":"Somewhere",
"corp":"ASF", "current":"Elsewhere"}
SELECT name, age
FROM events
WHERE current="Taipei"
output: nothing
Norikra Queries: (3) 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
GROUP BY age
Norikra Queries: (3) 
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Taipei"}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
output, every 5 mins:
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
Norikra Queries: (4) 
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Taipei"}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
SELECT max(age) as max
FROM events.win:time_batch(5 mins)
output, every 5 mins:
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
{"max":51}
Norikra Queries: (5) 
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Taipei",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
GROUP BY age
Norikra Queries: (5) 
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Taipei",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
GROUP BY user.age
Norikra Queries: (5) 
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Taipei",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Taipei"
AND attend.$0 AND attend.$1 
GROUP BY user.age
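One way to read the nested-field syntax in the last query is as a path lookup: `user.age` walks into a nested hash, and `attend.$0` picks the first array element. The Python sketch below models only that reading; `resolve` and `matches` are hypothetical helpers, not Norikra's internals.

```python
from typing import Any, Dict


def resolve(event: Dict[str, Any], path: str) -> Any:
    """Resolve 'user.age' or 'attend.$0' against nested dicts/lists; None if absent."""
    value: Any = event
    for part in path.split("."):
        if part.startswith("$") and part[1:].isdigit():
            idx = int(part[1:])
            if not isinstance(value, list) or idx >= len(value):
                return None
            value = value[idx]
        elif isinstance(value, dict) and part in value:
            value = value[part]
        else:
            return None
    return value


def matches(event: Dict[str, Any]) -> bool:
    """WHERE current="Taipei" AND attend.$0 AND attend.$1"""
    return (resolve(event, "current") == "Taipei"
            and bool(resolve(event, "attend.$0"))
            and bool(resolve(event, "attend.$1")))


event = {"name": "tagomoris",
         "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
         "current": "Taipei", "speaker": True,
         "attend": [True, True, False]}
print(resolve(event, "user.age"), matches(event))  # 34 True
```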
break. 
next: use cases
Use case 1: 
External API call reports for partners (LINE) 
External API call for LINE Business Connect 
The LINE backend sends requests containing users’ messages
to the partner’s API endpoint
http://developers.linecorp.com/blog/?p=3386
Use case 1: 
External API call reports for partners (LINE) 
API error response summaries 
http://developers.linecorp.com/blog/?p=3386
Use case 1: 
External API call reports for partners (LINE) 
[Figure: the channel gateway sends requests to the partner’s server; the gateway logs feed a query whose results are written to MySQL and sent by Mail]
SELECT channelId AS channel_id, reason, detail,
  count(*) AS error_count,
  min(timestamp) AS first_timestamp,
  max(timestamp) AS last_timestamp
FROM api_error_log.win:time_batch(60 sec)
GROUP BY channelId, reason, detail
HAVING count(*) > 0
http://developers.linecorp.com/blog/?p=3386
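What one 60-second batch of this query computes can be sketched in Python, assuming each log record is a dict with `channelId`, `reason`, `detail` and a numeric `timestamp` (the sample values are made up, and `summarize_errors` is a hypothetical name). In this sketch every group is created by at least one record, so the `HAVING` filter is kept only to mirror the query.

```python
from typing import Any, Dict, Iterable, List, Tuple


def summarize_errors(batch: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One batch of: SELECT channelId AS channel_id, reason, detail, count(*) ...
    GROUP BY channelId, reason, detail HAVING count(*) > 0"""
    groups: Dict[Tuple[Any, Any, Any], Dict[str, Any]] = {}
    for rec in batch:
        key = (rec["channelId"], rec["reason"], rec["detail"])
        ts = rec["timestamp"]
        g = groups.get(key)
        if g is None:
            groups[key] = {"channel_id": key[0], "reason": key[1], "detail": key[2],
                           "error_count": 1, "first_timestamp": ts, "last_timestamp": ts}
        else:
            g["error_count"] += 1
            g["first_timestamp"] = min(g["first_timestamp"], ts)
            g["last_timestamp"] = max(g["last_timestamp"], ts)
    # HAVING count(*) > 0 (always true here; kept to mirror the query).
    return [g for g in groups.values() if g["error_count"] > 0]


sample = [
    {"channelId": 1, "reason": "timeout", "detail": "", "timestamp": 100},
    {"channelId": 1, "reason": "timeout", "detail": "", "timestamp": 130},
    {"channelId": 2, "reason": "http_500", "detail": "", "timestamp": 110},
]
for row in summarize_errors(sample):
    print(row)
```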
Use case 2: 
Prompt reports for Ad service console 
Prompt reports with Norikra + Fixed reports with Hive 
[Figure: app servers send impression logs through Fluentd to HDFS, where the console service executes a Hive query (daily), and to Norikra, whose query results the console service fetches (frequently)]
Use case 2: 
Prompt reports for Ad service console 
Hive query for fixed reports 
SELECT yyyymmdd, hh, campaign_id, region, lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member_id) AS uu
FROM (
  SELECT yyyymmdd, hh,
    get_json_object(log, '$.campaign.id') AS campaign_id,
    get_json_object(log, '$.member.region') AS region,
    get_json_object(log, '$.member.lang') AS lang,
    get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20140913'
    AND get_json_object(log, '$.type')='click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
Use case 2: 
Prompt reports for Ad service console 
Norikra query for prompt reports 
SELECT campaign.id AS campaign_id,
  member.region AS region,
  member.lang AS lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region, member.lang
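The aggregation in this Norikra query can be modeled in Python for one batch, assuming each event is a nested dict shaped like the fields the query references (`type`, `campaign.id`, `member.region`, `member.lang`, `member.id`); the sample values and the helper names are invented. `COUNT(DISTINCT member.id)` becomes a set of member ids per group.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Set, Tuple

Key = Tuple[Any, Any, Any]


def click_report(events: Iterable[Dict[str, Any]]) -> Dict[Key, Dict[str, int]]:
    """One batch of: COUNT(*) AS click, COUNT(DISTINCT member.id) AS uu
    WHERE type="click" GROUP BY campaign.id, member.region, member.lang"""
    clicks: Counter = Counter()
    members: Dict[Key, Set[Any]] = defaultdict(set)
    for e in events:
        if e.get("type") != "click":
            continue
        key = (e["campaign"]["id"], e["member"]["region"], e["member"]["lang"])
        clicks[key] += 1
        members[key].add(e["member"]["id"])
    return {k: {"click": clicks[k], "uu": len(members[k])} for k in clicks}


def ev(kind: str, member_id: int) -> Dict[str, Any]:
    # Hypothetical event with the nested shape the query assumes.
    return {"type": kind, "campaign": {"id": 7},
            "member": {"id": member_id, "region": "TW", "lang": "zh"}}


print(click_report([ev("click", 1), ev("click", 1), ev("click", 2), ev("impression", 3)]))
```

Compared with the Hive version, no `get_json_object` extraction step is needed: the nested fields are addressed directly.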
Use case 3: 
Realtime access dashboard on Google Platform 
Access log visualization 
Count using Norikra (2-step), Store on Google BigQuery 
Dashboard on Google Spreadsheet + Apps Script 
http://qiita.com/kazunori279/items/6329df57635799405547 
https://www.youtube.com/watch?v=EZkw5TDcCGw
Use case 3: 
Realtime access dashboard on Google Platform 
[Figure: on each server, Fluentd reads the nginx access log, sends the access logs to BigQuery, and runs a Norikra query to aggregate locally; the local Norikra query results are sent to a Norikra query on the aggregate node]
http://qiita.com/kazunori279/items/6329df57635799405547
https://www.youtube.com/watch?v=EZkw5TDcCGw
Use case 3: 
Realtime access dashboard on Google Platform 
[Figure: 70 nginx servers, 120,000 requests/sec (or more!); Fluentd stores the logs in Google BigQuery, while the counts per host are combined into a total count shown on Google Spreadsheet + Apps Script]
http://qiita.com/kazunori279/items/6329df57635799405547
https://www.youtube.com/watch?v=EZkw5TDcCGw
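The 2-step counting can be sketched in Python: each server first counts its own requests, and an aggregate node merges those small per-host counts into the total, so it never sees individual requests. The function and host names are hypothetical; in the described setup each step is a Norikra query fed by Fluentd, not Python code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_locally(host: str, requests: Iterable[object]) -> Tuple[str, int]:
    """Step 1, on each nginx server: count this host's requests in the current batch."""
    return host, sum(1 for _ in requests)


def aggregate(per_host: Iterable[Tuple[str, int]]) -> Tuple[Dict[str, int], int]:
    """Step 2, on the aggregate node: counts per host plus the total count."""
    counts: Counter = Counter()
    for host, n in per_host:
        counts[host] += n
    return dict(counts), sum(counts.values())


partials = [count_locally("web01", range(3)), count_locally("web02", range(2))]
print(aggregate(partials))  # ({'web01': 3, 'web02': 2}, 5)
```

With 70 servers, the aggregate node handles 70 small messages per batch instead of every one of the 120,000 requests per second.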
More queries, more simplicity 
and less latency. 
Thanks! 
photo: by my co-workers
See also: 
http://norikra.github.io/ 
“Stream processing and Norikra” 
http://www.slideshare.net/tagomoris/stream-processing-and-norikra 
“Batch processing and Stream processing by SQL” 
http://www.slideshare.net/tagomoris/hcj2014-sql 
“Log analysis systems and its designs in LINE Corp 2014 Early” 
http://www.slideshare.net/tagomoris/log-analysis-system-and-its-designs-in-line-corp-2014-early
“Norikra in Action” 
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
HA? Distributed? 
NO! 
I have some ideas, but no time to implement them
There is no need for HA/distributed processing
Data flow & API? 
Use Fluentd!
Scalability? 
10,000-100,000 events/sec
on a 2 CPU, 8 core server
Storm or Norikra? 
Simple and fixed workload for huge traffic 
Use Storm! 
Complex and frequently changing workload for non-huge traffic
Use Norikra!

Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
 
扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区
 

Mais de SATOSHI TAGOMORI

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speedSATOSHI TAGOMORI
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsSATOSHI TAGOMORI
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of RubySATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)SATOSHI TAGOMORI
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script ConfusingSATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubySATOSHI TAGOMORI
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsSATOSHI TAGOMORI
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the WorldSATOSHI TAGOMORI
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamSATOSHI TAGOMORI
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd SeasonSATOSHI TAGOMORI
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToSATOSHI TAGOMORI
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In RubySATOSHI TAGOMORI
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldSATOSHI TAGOMORI
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceSATOSHI TAGOMORI
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 

Mais de SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 

Último

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Último (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Norikra: Stream Processing with SQL

  • 1. Norikra: Stream Processing With SQL 2014/09/13 HadoopCon 2014 Taiwan Satoshi Tagomori (@tagomoris)
  • 2. Satoshi Tagomori (@tagomoris) LINE Corporation Analytics Platform Team
  • 3. THE ONE THING YOU MUST LEARN TODAY IS
  • 4. Norikra
  • 5. Norikra IS NOT Norika
  • 6. Topics: basics of stream processing; stream processing with SQL; Norikra overview; Norikra queries; use cases in production
  • 7. Stream Processing: less latency, less computing power, no query schedule management
  • 8. Data Flow And Latency: batch processing waits for a whole data window and then executes the query; stream processing executes the query incrementally as data arrives
  • 9. Query For Stored Data [diagram: a table whose rows each hold fields v1,v2,v3,v4,v5,v6] At first, all data MUST be stored.
  • 10. Query For Stored Data [diagram: the same table] SELECT v1,v2,COUNT(*) FROM table WHERE v3='x' GROUP BY v1,v2
  • 11. Query For Stored Data [diagram: the same table, queried twice] SELECT v1,v2,COUNT(*) FROM table WHERE v3='x' GROUP BY v1,v2 / SELECT v4,COUNT(*) FROM table WHERE v1 AND v2 GROUP BY v4
  • 12. Query For Stored Data [diagram: the same table and queries] Storing "all data" means also storing data that no query will ever use.
  • 13. Query For Stream Data [diagram: a stream of rows, each with fields v1,v2,v3,v4,v5,v6, flowing past two queries] SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3='x' GROUP BY v1,v2 / SELECT v4,COUNT(*) FROM table.win:xxx WHERE v1 AND v2 GROUP BY v4
  • 14. Query For Stream Data [diagram: same queries; each arriving row is reduced to the fields each query needs: v1,v2,v3 for the first query, v1,v2,v4 for the second]
  • 15. Query For Stream Data [diagram: same queries; the next row arrives and is reduced the same way]
  • 16. Query For Stream Data [diagram: same queries] All data will be discarded right after insertion. (Bye-bye storage system maintenance!)
  • 17. Incremental Calculation SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3='x' GROUP BY v1,v2 [diagram: rows arrive from the stream] Internal data (memory), COUNT per (v1, v2): (TRUE, TRUE) 0; (TRUE, FALSE) 1; (FALSE, TRUE) 33; (FALSE, FALSE) 2
  • 18. Incremental Calculation [a matching row arrives] (TRUE, TRUE) 1; (TRUE, FALSE) 1; (FALSE, TRUE) 33; (FALSE, FALSE) 2
  • 19. Incremental Calculation [another row arrives] (TRUE, TRUE) 1; (TRUE, FALSE) 1; (FALSE, TRUE) 34; (FALSE, FALSE) 2
  • 20. Incremental Calculation [more rows arrive] (TRUE, TRUE) 1; (TRUE, FALSE) 2; (FALSE, TRUE) 37; (FALSE, FALSE) 3. Only this small internal data has to be kept, so memory is enough to store it.
  • 21. Data Window: the target time (or size) range of a query. Batch: a FROM-TO condition, e.g. WHERE dt >= '2014-09-13 13:30:00' AND dt < '2014-09-13 14:20:00'. Stream: "calculate this query every 50 minutes", which requires extended SQL: SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3='x' GROUP BY v1,v2
  • 22. Stream Processing With SQL: Esper, a Java library for processing streams. It must be embedded in daemon code implemented in Java, and it requires schemas for data and queries. OSS under GPLv2. http://esper.codehaus.org/
  • 23. Esper EPL: select the values of height and weight for all events with age larger than 30: SELECT height, weight FROM tbl WHERE age > 30
  • 24. Esper EPL: count records grouped by height for events with age larger than 30: SELECT height, COUNT(*) AS c FROM tbl WHERE age > 30 GROUP BY height. This query doesn't ever produce results.
  • 25. Esper EPL: count records grouped by height for events with age larger than 30, every hour: SELECT height, COUNT(*) AS c FROM tbl.win:time_batch(1 hour) WHERE age > 30 GROUP BY height
  • 26. With/without Schema. Schema-full data: strict schema (predefined fields with types; anything else is rejected) or schema on read (known fields are read; anything else is ignored). Schema-less data: any field (or ignored), any type (implicit/explicit conversion). Schema-less fits services under development, which means all internet services, including ours!
  • 27. Stream Processing & Schema: in all stream processing, queries come first and data second, so queries can automatically know which fields to read. [diagram, "TO BE": a schema-less (mixed) data stream of events from an API endpoint, events from a billing service, and events of service X; query A reads one subset of fields, query B another]
  • 31. Norikra: Schema-less Stream Processing with SQL. Server software that runs on the JVM. Open source software (GPLv2). http://norikra.github.io/ https://github.com/norikra/norikra
  • 32. Norikra: Schema-less event stream: add/remove data fields whenever you want. SQL: no more restarts to add/remove queries; supports JOINs, subqueries, and UDFs (in Java/Ruby, installed from rubygems). Truly complex events: nested Hash/Array, accessible directly from SQL. HTTP RPC with JSON or MessagePack (fluentd plugin available!)
  • 33. How To Set Up Norikra: Install JRuby (either download jruby.tar.gz, extract it, and export $PATH, or use rbenv: rbenv install jruby-1.7.xx, rbenv shell jruby-..). Install Norikra: gem install norikra. Execute the Norikra server: norikra start
  • 34. Norikra Interface: Command line: norikra-client (norikra-client target open ...; norikra-client query add ...; tail -f ... | norikra-client event send ...). WebUI: show status, show/add/remove queries. HTTP API: JSON, MessagePack
  • 35. Norikra Queries: (1) SELECT name, age FROM events [annotation: "events" is the target]
  • 36. Norikra Queries: (1) Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Taipei"} Query: SELECT name, age FROM events Output: {"name":"tagomoris","age":34}
  • 37. Norikra Queries: (1) Input without "age": {"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"Taipei"} Query: SELECT name, age FROM events Output: nothing
  • 38. Norikra Queries: (2) Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Taipei"} Query: SELECT name, age FROM events WHERE current="Taipei" Output: {"name":"tagomoris","age":34}
  • 39. Norikra Queries: (2) Input: {"name":"hadoop", "age":99, "address":"Somewhere", "corp":"ASF", "current":"Elsewhere"} Query: SELECT name, age FROM events WHERE current="Taipei" Output: nothing
  • 40. Norikra Queries: (3) SELECT age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins) GROUP BY age
  • 41. Norikra Queries: (3) Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Taipei"} Query: SELECT age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins) GROUP BY age Output every 5 mins: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
  • 42. Norikra Queries: (4) Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Taipei"} Queries: SELECT age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins) GROUP BY age, and SELECT max(age) AS max FROM events.win:time_batch(5 mins) Output every 5 mins: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ... and {"max":51}
  • 43. Norikra Queries: (5) Input: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Taipei", "speaker":true, "attend":[true,true,false, ...]} Query: SELECT age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins) GROUP BY age
  • 44. Norikra Queries: (5) Input: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Taipei", "speaker":true, "attend":[true,true,false, ...]} Query: SELECT user.age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins) GROUP BY user.age
  • 45. Norikra Queries: (5) Input: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Taipei", "speaker":true, "attend":[true,true,false, ...]} Query: SELECT user.age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins) WHERE current="Taipei" AND attend.$0 AND attend.$1 GROUP BY user.age
  • 47. Use case 1: External API call reports for partners (LINE). External API calls for LINE Business Connect: the LINE backend sends requests carrying users' messages to each partner's API endpoint. http://developers.linecorp.com/blog/?p=3386
  • 48. Use case 1: External API call reports for partners (LINE). Summaries of API error responses. http://developers.linecorp.com/blog/?p=3386
  • 49. Use case 1: External API call reports for partners (LINE). [diagram: channel gateway, partner's server, logs, query results, MySQL, Mail] SELECT channelId AS channel_id, reason, detail, count(*) AS error_count, min(timestamp) AS first_timestamp, max(timestamp) AS last_timestamp FROM api_error_log.win:time_batch(60 sec) GROUP BY channelId,reason,detail HAVING count(*) > 0 http://developers.linecorp.com/blog/?p=3386
  • 50. Use case 2: Prompt reports for Ad service console. Prompt reports with Norikra + fixed reports with Hive. [diagram: many app servers send impression logs through Fluentd to HDFS; the console service executes a Hive query (daily) and fetches Norikra query results (frequently)]
  • 51. Use case 2: Prompt reports for Ad service console Hive query for fixed reports SELECT yyyymmdd, hh, campaign_id, region, lang, COUNT(*) AS click, COUNT(DISTINCT member_id) AS uu FROM ( SELECT yyyymmdd, hh, get_json_object(log, '$.campaign.id') AS campaign_id, get_json_object(log, '$.member.region') AS region, get_json_object(log, '$.member.lang') AS lang, get_json_object(log, '$.member.id') AS member_id FROM applog WHERE service='myservice' AND yyyymmdd='20140913' AND get_json_object(log, '$.type')='click' ) x GROUP BY yyyymmdd, hh, campaign_id, region, lang
  • 52. Use case 2: Prompt reports for Ad service console Norikra query for prompt reports SELECT campaign.id AS campaign_id, member.region AS region, member.lang AS lang, COUNT(*) AS click, COUNT(DISTINCT member.id) AS uu FROM myservice.win:time_batch(1 hours) WHERE type="click" GROUP BY campaign.id, member.region, member.lang
  • 53. Use case 3: Realtime access dashboard on Google Platform. Access log visualization: count using Norikra (2-step), store on Google BigQuery; dashboard on Google Spreadsheet + Apps Script. http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw
  • 54. Use case 3: Realtime access dashboard on Google Platform. [diagram of one server: nginx access log, Fluentd, access logs to BigQuery, a Norikra node to aggregate locally, Norikra query results, and a Norikra query to aggregate] http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw
  • 55. Use case 3: Realtime access dashboard on Google Platform. [diagram: 70 nginx servers, 120,000 requests/sec (or more!); Fluentd; logs to store; Google BigQuery; Google Spreadsheet + Apps Script; counts per host; total count] http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw
  • 56. More queries, more simplicity and less latency. Thanks! photo: by my co-workers
  • 57. See also: http://norikra.github.io/ "Stream processing and Norikra" http://www.slideshare.net/tagomoris/stream-processing-and-norikra "Batch processing and Stream processing by SQL" http://www.slideshare.net/tagomoris/hcj2014-sql "Log analysis systems and its designs in LINE Corp 2014 Early" http://www.slideshare.net/tagomoris/log-analysis-system-and-its-designs-in-line-corp-2014-early "Norikra in Action" http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
  • 58. HA? Distributed? NO! I have some ideas, but no time to implement them. There is no need for HA/distributed processing.
  • 59. Data flow & API? Use Fluentd!
  • 60. Scalability? 10,000-100,000 events/sec on a 2CPU 8Core server
  • 61. Storm or Norikra? Simple and fixed workload for huge traffic: use Storm! Complex and fragile workload for non-huge traffic: use Norikra!
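Slides 17-20 show a grouped count being maintained incrementally instead of being recomputed over stored data. The Python sketch below illustrates that idea under simple assumptions: events are dicts with keys v1..v6, the filter v3 == 'x' is taken from the slides' query, and the class name `IncrementalCount` is hypothetical. It is not how Esper or Norikra implement aggregation internally; it only shows why the state grows with the number of groups, not with the number of events.

```python
from collections import Counter
from typing import Any, Mapping


class IncrementalCount:
    """Hypothetical incremental version of
    SELECT v1, v2, COUNT(*) ... WHERE v3 = 'x' GROUP BY v1, v2."""

    def __init__(self) -> None:
        # Internal state: one counter per (v1, v2) group; raw events are never stored.
        self.counts: Counter = Counter()

    def on_event(self, event: Mapping[str, Any]) -> None:
        # Apply the WHERE clause first; non-matching events are simply ignored.
        if event.get("v3") != "x":
            return
        # Update exactly one group counter; the event itself can then be discarded.
        self.counts[(event["v1"], event["v2"])] += 1

    def result(self) -> dict:
        # Snapshot of the current aggregate, one entry per group.
        return dict(self.counts)


q = IncrementalCount()
for e in [
    {"v1": True, "v2": True, "v3": "x"},
    {"v1": False, "v2": True, "v3": "x"},
    {"v1": False, "v2": True, "v3": "y"},  # filtered out by WHERE v3 = 'x'
    {"v1": False, "v2": True, "v3": "x"},
]:
    q.on_event(e)

print(q.result())  # {(True, True): 1, (False, True): 2}
```

Only the `counts` dictionary survives between events, which is the "internal data (memory)" table on the slides.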
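Slides 21, 25 and 40-42 use `win:time_batch(...)` to cut an endless stream into consecutive, fixed-length time windows and emit one result per window. The following Python sketch approximates that behaviour for a grouped count. It is an illustration under stated assumptions, not Esper's implementation: timestamps are plain seconds, events arrive in timestamp order, and the function name `time_batch_count` is hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Hashable, Iterable, List, Mapping, Tuple

Event = Mapping[str, Any]


def time_batch_count(
    events: Iterable[Tuple[float, Event]],
    length: float,
    start: float,
    key: Callable[[Event], Hashable],
) -> List[Tuple[float, dict]]:
    """Tumbling-window GROUP BY count in the spirit of win:time_batch(length).

    Assumes events arrive in timestamp order. Returns one (batch_end, counts)
    pair per batch, including empty batches. Unlike a real engine, batches are
    closed only when a later event arrives or the input ends, not by a timer.
    """
    results: List[Tuple[float, dict]] = []
    batch_end = start + length
    counts: Counter = Counter()
    for ts, event in events:
        # Close every batch that ended at or before this event's timestamp.
        while ts >= batch_end:
            results.append((batch_end, dict(counts)))
            counts = Counter()  # the finished window's state is dropped
            batch_end += length
        counts[key(event)] += 1
    results.append((batch_end, dict(counts)))  # flush the last (partial) batch
    return results


# Slide 40's query, SELECT age, COUNT(*) AS cnt ... win:time_batch(5 mins) GROUP BY age,
# with timestamps in seconds (5 minutes = 300 seconds).
stream = [
    (0, {"age": 34}),
    (10, {"age": 34}),
    (120, {"age": 33}),
    (310, {"age": 34}),  # falls into the second batch
]
print(time_batch_count(stream, length=300, start=0, key=lambda e: e["age"]))
# [(300, {34: 2, 33: 1}), (600, {34: 1})]
```

The batch boundaries depend only on `start` and `length`, which matches the slide 21 idea of "calculate this query every 50 minutes" (that would be `length=3000`).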
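Slides 27 and 35-37 describe queries that read only the fields they need from a schema-less stream: an event lacking one of those fields produces nothing for that query. A minimal Python sketch of that matching rule follows, assuming events are plain dicts and each query is described only by a list of field names. `project`, `query_a` and `query_b` are hypothetical names, and this mimics the behaviour shown on the slides rather than Norikra's internal code.

```python
from typing import Any, Dict, List, Mapping, Optional, Sequence


def project(event: Mapping[str, Any], fields: Sequence[str]) -> Optional[Dict[str, Any]]:
    """Return only the fields a query reads, or None if any of them is missing.

    Hypothetical helper mimicking slides 36-37 (an event without "age"
    yields nothing for SELECT name, age).
    """
    if not all(f in event for f in fields):
        return None
    return {f: event[f] for f in fields}


# Each query is described only by the subset of fields it needs.
queries: Dict[str, List[str]] = {
    "query_a": ["name", "age"],
    "query_b": ["name", "current"],
}

stream = [
    {"name": "tagomoris", "age": 34, "address": "Tokyo", "corp": "LINE", "current": "Taipei"},
    {"name": "tagomoris", "address": "Tokyo", "corp": "LINE", "current": "Taipei"},  # no "age"
]

for event in stream:
    for qname, fields in queries.items():
        row = project(event, fields)
        if row is not None:  # events lacking a needed field are skipped for that query
            print(qname, row)
```

The second event reaches `query_b` but not `query_a`, so adding or removing fields in the stream never breaks a query; it only changes which events that query sees.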
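Slides 43-45 access nested values directly in SQL with paths such as `user.age` and `attend.$0`. The Python sketch below shows one way such paths can be resolved against a nested event. The convention ("." descends into a hash, "$N" picks the N-th array element) is read off the slides; the function `resolve`, the `MISSING` sentinel, and the three-element `attend` list are assumptions for illustration, not Norikra's code.

```python
from typing import Any, Mapping

MISSING = object()  # sentinel: the path does not exist in this event


def resolve(event: Mapping[str, Any], path: str) -> Any:
    """Resolve a field path such as "user.age" or "attend.$0".

    Assumed convention: "." descends into a nested hash, and "$N"
    picks the N-th element of an array.
    """
    current: Any = event
    for part in path.split("."):
        if part.startswith("$") and part[1:].isdigit():
            index = int(part[1:])
            if not isinstance(current, list) or index >= len(current):
                return MISSING
            current = current[index]
        elif isinstance(current, Mapping) and part in current:
            current = current[part]
        else:
            return MISSING
    return current


event = {
    "name": "tagomoris",
    "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
    "current": "Taipei",
    "speaker": True,
    "attend": [True, True, False],
}

# A top-level "age" does not exist, which is why slide 43's query has to change.
print(resolve(event, "age") is MISSING)  # True

# WHERE current="Taipei" AND attend.$0 AND attend.$1, then SELECT user.age
if (
    resolve(event, "current") == "Taipei"
    and resolve(event, "attend.$0") is True
    and resolve(event, "attend.$1") is True
):
    print(resolve(event, "user.age"))  # 34
```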
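Slide 49's query summarizes API errors per 60-second batch with a count plus the first and last timestamps of each (channelId, reason, detail) group. As a hedged illustration of what one batch of that query computes, here is a Python sketch; the log record layout is assumed from the column names in the query, and `summarize_errors` and the sample records are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Tuple


def summarize_errors(batch: List[Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """One 60-second batch of slide 49's query:
    GROUP BY channelId, reason, detail with count(*), min(timestamp), max(timestamp).
    """
    groups: Dict[Tuple[Any, Any, Any], Dict[str, Any]] = {}
    for log in batch:
        key = (log["channelId"], log["reason"], log["detail"])
        ts = log["timestamp"]
        row = groups.get(key)
        if row is None:
            # First error seen for this group in the current batch.
            groups[key] = {
                "channel_id": key[0],
                "reason": key[1],
                "detail": key[2],
                "error_count": 1,
                "first_timestamp": ts,
                "last_timestamp": ts,
            }
        else:
            row["error_count"] += 1
            row["first_timestamp"] = min(row["first_timestamp"], ts)
            row["last_timestamp"] = max(row["last_timestamp"], ts)
    # HAVING count(*) > 0: every group that exists has at least one error,
    # so this keeps all of them, and an empty batch yields an empty list.
    return [row for row in groups.values() if row["error_count"] > 0]


batch = [  # hypothetical error logs
    {"channelId": 1, "reason": "timeout", "detail": "read", "timestamp": 1005},
    {"channelId": 1, "reason": "timeout", "detail": "read", "timestamp": 1001},
    {"channelId": 2, "reason": "http_500", "detail": "", "timestamp": 1030},
]
for row in summarize_errors(batch):
    print(row)
```

Each output row is small and fixed in shape, which is what makes it practical to write the results to MySQL or send them by mail as on slide 49.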
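Slide 52's Norikra query includes `COUNT(DISTINCT member.id) AS uu`, which, unlike a plain count, cannot be kept as a single number per group: the window has to remember which ids it has already seen. The Python sketch below computes one batch of that query to make the difference visible. The nested event layout is assumed from the field paths in the query, and `click_report`, `click`, and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Set, Tuple

Key = Tuple[Any, Any, Any]  # (campaign.id, member.region, member.lang)


def click_report(batch: List[Mapping[str, Any]]) -> Dict[Key, Tuple[int, int]]:
    """One batch of slide 52's query: per group, COUNT(*) AS click and
    COUNT(DISTINCT member.id) AS uu, for events WHERE type="click".
    """
    clicks: Dict[Key, int] = defaultdict(int)
    members: Dict[Key, Set[Any]] = defaultdict(set)
    for ev in batch:
        if ev.get("type") != "click":
            continue
        member = ev["member"]
        key = (ev["campaign"]["id"], member["region"], member["lang"])
        clicks[key] += 1
        # COUNT(DISTINCT ...) has to remember every id seen in the window,
        # so this state grows with unique users rather than with groups.
        members[key].add(member["id"])
    return {k: (clicks[k], len(members[k])) for k in clicks}


def click(member_id: int) -> Dict[str, Any]:
    # Hypothetical click event for campaign 7, region "TW", lang "zh".
    return {"type": "click", "campaign": {"id": 7},
            "member": {"id": member_id, "region": "TW", "lang": "zh"}}


batch = [click(1), click(1), click(2), {"type": "impression"}]
print(click_report(batch))  # {(7, 'TW', 'zh'): (3, 2)}
```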
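Slides 53-55 count access logs in two steps: each server aggregates locally, and a second node combines the partial results into per-host counts and a total. The Python sketch below shows that pattern in its simplest form; the host names and request lines are hypothetical, and real deployments would exchange these partial counts through Fluentd and Norikra rather than function calls.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def local_count(host: str, requests: Iterable[str]) -> Tuple[str, int]:
    """Step 1, on each web server: reduce one batch of requests to one number."""
    return host, sum(1 for _ in requests)


def aggregate(partials: Iterable[Tuple[str, int]]) -> Tuple[Dict[str, int], int]:
    """Step 2, on the aggregating node: counts per host plus the total count."""
    per_host: Counter = Counter()
    for host, n in partials:
        per_host[host] += n
    return dict(per_host), sum(per_host.values())


# Hypothetical hosts and request lines for one batch.
partials: List[Tuple[str, int]] = [
    local_count("web01", ["GET /", "GET /a", "GET /"]),
    local_count("web02", ["GET /b", "GET /"]),
]
per_host, total = aggregate(partials)
print(per_host, total)  # {'web01': 3, 'web02': 2} 5

# With slide 55's figures and load assumed to be spread evenly,
# each of the 70 servers handles roughly this many requests/sec:
print(round(120_000 / 70))  # 1714
```

The design choice is that the aggregating node receives one partial count per server per batch (70 small records with slide 55's setup) instead of every individual request.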