How to make Norikra perfect
Stream Processing Casual Talks #1 #streamctjp
Jul 22, 2016
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori (@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, ...
Treasure Data, Inc.
1. How Norikra is perfect
2. How to make Norikra more perfect
http://norikra.github.io/
Norikra: Schema-less Stream Processing using SQL
• Server software, written in JRuby, runs on JVM
• Open source software (GPLv2)
• http://norikra.github.io/
• https://github.com/norikra/norikra
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego"
AND attend.$0 AND attend.$1
GROUP BY user.age
{"name":"tagomoris",
 "user":{"age":35, "corp":"LINE",
         "address":"Tokyo"},
 "current":"San Diego",
 "speaker":true,
 "attend":[true,true,false, ...]
}
{"user.age":35,"cnt":5},
{"user.age":36,"cnt":8}, ...
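How the query above reads nested fields and array elements can be sketched in plain Python. This is only an illustration of the semantics (dotted paths such as user.age, and attend.$0 for the first array element), not Norikra's or Esper's implementation; the helper names get_path and count_by_age and the sample batch are made up for this sketch.

```python
from collections import Counter


def get_path(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0'.

    A '$N' segment indexes into a list. Returns None if any step is missing.
    """
    current = event
    for key in path.split("."):
        if key.startswith("$"):
            index = int(key[1:])
            if not isinstance(current, list) or index >= len(current):
                return None
            current = current[index]
        else:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
    return current


def count_by_age(batch):
    """Apply the WHERE and GROUP BY of the example query to one batch of events."""
    counts = Counter()
    for event in batch:
        age = get_path(event, "user.age")
        if age is None:
            continue  # the event lacks a field the query needs
        if (get_path(event, "current") == "San Diego"
                and get_path(event, "attend.$0") is True
                and get_path(event, "attend.$1") is True):
            counts[age] += 1
    return [{"user.age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]


# Hypothetical events standing in for one 5-minute batch.
batch = [
    {"name": "tagomoris", "user": {"age": 35}, "current": "San Diego",
     "attend": [True, True, False]},
    {"name": "a", "user": {"age": 35}, "current": "San Diego", "attend": [True, True]},
    {"name": "b", "user": {"age": 36}, "current": "Tokyo", "attend": [True, True]},
    {"name": "c", "user": {"age": 36}, "current": "San Diego", "attend": [True, False]},
]
print(count_by_age(batch))  # [{'user.age': 35, 'cnt': 2}]
```

Events "b" and "c" are dropped by the WHERE conditions, so only the age-35 group appears in the output.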
How Norikra is Perfect
• Ultra fast bootstrap
• Schema on read
• Handling complex (nested) events
• Dynamic query registration/unregistration
• Simple Web UI
• Data connector: Fluentd
• Extensible: UDF/Listener plugins
• Performance: good enough for small/mid-size sites
Schema on Read
• Query first, Data next
• Query must know what it requires
• field names, types of fields, ...
• The platform can ingest any data into the processor.
  Each query fetches only the events that match its required schema.
[Diagram: events from the billing service and events from an API endpoint form one schema-less (mixed) data stream; query A and query B each read their own subset of fields.]
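A minimal sketch of the schema-on-read idea, assuming each query declares the field names and types it requires and the platform hands it only the matching events, projected to those fields. The query names, field names, and the matches/route helpers are hypothetical and are not Norikra's API.

```python
def matches(event, required):
    """True if the event has every required field with the required type."""
    return all(name in event and isinstance(event[name], typ)
               for name, typ in required.items())


def route(event, queries):
    """Return {query_name: projected_event} for every query the event satisfies."""
    return {name: {field: event[field] for field in required}
            for name, required in queries.items()
            if matches(event, required)}


# Hypothetical queries: each one knows which fields (and types) it needs.
queries = {
    "query_a": {"user_id": str, "amount": int},  # reads billing events
    "query_b": {"user_id": str, "path": str},    # reads API endpoint events
}

billing_event = {"user_id": "u1", "amount": 300, "currency": "JPY"}
api_event = {"user_id": "u1", "path": "/v1/items", "status": 200}

print(route(billing_event, queries))  # {'query_a': {'user_id': 'u1', 'amount': 300}}
print(route(api_event, queries))      # {'query_b': {'user_id': 'u1', 'path': '/v1/items'}}
```

Both kinds of events travel through the same stream; nothing has to be declared before the data arrives, because the query carries the schema.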
Architecture
[Diagram components]
Norikra Server (on JVM):
• Esper Instance (Query Engine)
• Type Definition Manager
• Output Event Pool
• Norikra Engine
• RPC Server: mizuno (Jetty + Rack), Rack RPC Handler
Norikra Client, connected via msgpack-rpc-over-http
For details :)
• Norikra: Stream Processing with SQL
  http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql
• Norikra: SQL Stream Processing in Ruby
  http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby
• Norikra in Action
  http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
• Landscape of Norikra Features
  http://www.slideshare.net/tagomoris/norikra-meetup-features
• Norikra Recent Updates
  http://www.slideshare.net/tagomoris/norikra-recent-updates
Recent Updates
• v1.4.0: Jul 19, 2016
• Add support for "-D" and "-agentlib" of JVM
• Update msgpack version
• Previous release v1.3.1: May 7, 2015
• Explained in "Norikra Recent Updates" slide
IS IT REALLY PERFECT!?
Good & Bad
• Good for startup:
  Fast bootstrap, SQL, Web UI, Fluentd plugins, handling complex events, ...
• Good for mid-size sites:
  Dynamic query registration, dynamic UDF loading, performance good enough for mid-size sites (10k events/sec), schema on read, ...
• Bad for big players:
  No distribution, no high availability, uncontrollable JVM/Esper behavior (CPU & memory)
Tentative name: Perfect Norikra
Perfect Norikra
• All features of Norikra
• Including "Ultra fast bootstrap"
• Compatible RPC API w/ original Norikra
• Distributed execution on any scheduler
• YARN? Mesos? or ...?
• Automatic failover & retry for failures (HA)
• Automated optimization for load balancing
• Dynamic scaling out
  from 1 to 100 nodes, without any restarts/retries
Rough Sketch
[Diagram components: RPC Server, RPC Handler, Type Definition Manager, Query Compiler, DAG Optimizer / Deoptimizer, DAG Executor, Event Router, Event Buffer. Inputs: Queries, Events. Node types: master node, processor node.]
Rough Sketch
• Brand new query executor
• SQL Parser
• Query compiler into DAG
• SQL operators as sub-DAGs (inspired by TimeStream)
• DAG executor
• Brand new dataflow manager / nodes
• Sync/Async data replication
• Barriers for event stream (inspired by Flink)
• Versioned routing/distribution
Dynamic Scaling Out
• Processing nodes are stateful
• state: limited by available memory size
• growing stream size -> memory overflow :-(
• Scaling strategy must be dynamic
• restarting queries (as static scaling requires) increases latency
Query: COUNT(DISTINCT uid) per 1 day
[Chart: memory usage per node, 7/1 to 7/4, running on 3 nodes each day]
Query: COUNT(DISTINCT uid) per 1 day
[Chart: 7/1 to 7/4 on 3 nodes; burst traffic causes a memory overflow and a crash (failure)]
Crash & Recovery Strategy (1)
Query: COUNT(DISTINCT uid) per 1 day
[Chart, 7/1 to 7/4. Labels: 3 nodes, 3 nodes, 6 nodes, 6 nodes; Crash; Recovery]
• After a crash, restart the query with an increased # of nodes
• After the restart, the query re-reads all data of that window
• After recovery, all nodes go back to realtime calculation
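The crash-and-recovery strategy can be sketched as follows, assuming every event of the current window is kept in durable storage and is routed to a node by a hash of uid. The event layout (ts, uid), the helper names, and the use of CRC32 are assumptions made for this illustration only.

```python
import zlib


def node_for(uid, num_nodes):
    """Pick a node for a uid with a stable hash (assumption: CRC32 modulo node count)."""
    return zlib.crc32(uid.encode("utf-8")) % num_nodes


def replay_window(stored_events, window_start, window_end, num_nodes):
    """Rebuild per-node state by re-reading every stored event of the window."""
    state = [set() for _ in range(num_nodes)]
    for event in stored_events:
        if window_start <= event["ts"] < window_end:
            state[node_for(event["uid"], num_nodes)].add(event["uid"])
    return state


# Hypothetical stored events; the last one belongs to the next window.
stored = [
    {"ts": 10, "uid": "a"},
    {"ts": 20, "uid": "b"},
    {"ts": 30, "uid": "a"},
    {"ts": 130, "uid": "c"},
]

# After a crash on 3 nodes, restart the query on 6 nodes and replay the window.
state = replay_window(stored, 0, 100, 6)
print(sum(len(uids) for uids in state))  # 2
```

Because each uid lands on exactly one node, the per-node counts add up to COUNT(DISTINCT uid). The price is that the whole window has to be stored and re-read after every restart.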
Crash & Recovery Strategy (2)
Query: COUNT(DISTINCT uid) per 1 day
[Chart, 7/1 to 7/4. Labels: 3 nodes, 3 nodes, 6 nodes, 6 nodes; Crash; Recovery]
• Pros: Very easy to implement
• Cons: Requires all data to be stored (distributed filesystem?)
• Cons: Hard to know the # of nodes needed for increasing traffic
• Cons: The recovery state requires more nodes than the normal state
Dynamic Scaling Out strategy(1)
Query: COUNT(DISTINCT uid) per 1 day
[Chart, 7/1 to 7/4. Labels: 3 nodes, 3 nodes, 5 nodes, 5 nodes, 6 nodes; intermediate result; merge results for final result]
• Before a crash happens, increase the # of processing nodes
• Queries always produce intermediate results, along with the # of distribution
• Query results should be produced by merging the intermediate results
Dynamic Scaling Out strategy(2)
Query: COUNT(DISTINCT uid) per 1 day
[Chart, 7/1 to 7/4. Labels: 3 nodes, 3 nodes, 5 nodes, 5 nodes, 6 nodes; intermediate result; merge results for final result]
• Pros: Less latency, less computing power
• Cons: All operators must support such calculation
  - SQL!
For Dynamic Scaling Out
• De-optimization of operators
• Virtual nodes for routing
• ... and many others
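The slides only name "virtual nodes for routing", so the following is one possible reading rather than the planned design: events are hashed to a fixed number of virtual nodes, and each version of the routing table maps virtual nodes to physical nodes. The constant NUM_VNODES, the CRC32 hash, and all function names are assumptions for this sketch.

```python
import zlib

NUM_VNODES = 64  # assumption: fixed, and larger than any expected cluster size


def vnode_of(key):
    """Stable mapping from a routing key to a virtual node."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VNODES


def build_table(num_nodes):
    """One routing table version: index = virtual node, value = physical node."""
    return [vnode % num_nodes for vnode in range(NUM_VNODES)]


def route(key, table):
    """Physical node that should receive events with this key."""
    return table[vnode_of(key)]


table_v1 = build_table(3)  # before scaling out
table_v2 = build_table(6)  # after scaling out

key = "uid-12345"
print(vnode_of(key), route(key, table_v1), route(key, table_v2))
```

A key always maps to the same virtual node; only the table changes when the cluster grows. With this particular table layout, going from 3 to 6 nodes splits each old node's virtual nodes between that node and exactly one new node, so state can be handed over per virtual node instead of being reshuffled as a whole.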
Hard things
• Resource monitoring & limitation
• Multi-tenancy
• UDF and sandbox
• Queries without aggregations
Why not on Spark or Flink?
• Because of schema-less event processing

- it requires dataflow controlled by query manager
• Because of dynamic scaling

- it requires brand new dataflow layer
No Bytes Implemented :P
Stay Tuned!
We are hiring! by Treasure Data
