6. Norikra: Schema-less Stream Processing using SQL
• Server software, written in JRuby, runs on JVM
• Open source software (GPLv2)
• http://norikra.github.io/
• https://github.com/norikra/norikra
7. SELECT user.age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego"
AND attend.$0 AND attend.$1
GROUP BY user.age
Input event:
{"name":"tagomoris",
 "user":{"age":35, "corp":"LINE",
         "address":"Tokyo"},
 "current":"San Diego",
 "speaker":true,
 "attend":[true,true,false, ...]
}
Output events:
{"user.age":35,"cnt":5},
{"user.age":36,"cnt":8}, ...
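To make the semantics of the query above concrete, here is a minimal Python sketch that applies the same filter, grouping, and count to one in-memory batch of events. It is not Norikra or Esper code: the helper names `get_path` and `run_batch`, and the sample events other than the first, are hypothetical and only illustrate how dotted paths such as `user.age` and array accessors such as `attend.$0` address nested data.

```python
from collections import Counter

def get_path(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0' in a nested event."""
    value = event
    for part in path.split("."):
        if part.startswith("$"):
            # '$N' addresses the N-th element of an array
            index = int(part[1:])
            if not isinstance(value, list) or index >= len(value):
                return None
            value = value[index]
        else:
            if not isinstance(value, dict) or part not in value:
                return None
            value = value[part]
    return value

def run_batch(events):
    """Apply the WHERE / GROUP BY / COUNT(*) logic of the query to one batch."""
    counts = Counter()
    for event in events:
        if (get_path(event, "current") == "San Diego"
                and get_path(event, "attend.$0") is True
                and get_path(event, "attend.$1") is True):
            counts[get_path(event, "user.age")] += 1
    return [{"user.age": age, "cnt": cnt} for age, cnt in counts.items()]

# Hypothetical events standing in for one 5-minute batch.
batch = [
    {"name": "tagomoris", "user": {"age": 35, "corp": "LINE", "address": "Tokyo"},
     "current": "San Diego", "speaker": True, "attend": [True, True, False]},
    {"name": "alice", "user": {"age": 35}, "current": "San Diego",
     "attend": [True, True, True]},
    {"name": "bob", "user": {"age": 36}, "current": "Tokyo",
     "attend": [True, True]},
    {"name": "carol", "user": {"age": 36}, "current": "San Diego",
     "attend": [True, False]},
]

result = run_batch(batch)
print(result)
```

Only the first two events pass the filter: the third is not in San Diego, and the fourth has a false `attend.$1`. A real engine would additionally cut the stream into 5-minute batches, which this sketch leaves out.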
8. How Norikra is Perfect
• Ultra fast bootstrap
• Schema on read
• Handling complex (nested) events
• Dynamic query registration/unregistration
• Simple Web UI
• Data connector: Fluentd
• Extensible: UDF/Listener plugins
• Performance: good enough for small and mid-sized sites
9. Schema on Read
• Query first, Data next
• Query must know what it requires
• field names, types of fields, ...
• The platform can ingest any data into the processor.
Each query fetches only the events that match its required schema.
[Figure: events from a billing service and events from an API endpoint flow into one schema-less (mixed) data stream; query A and query B each receive their own subset of fields.]
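The "query first, data next" idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Norikra's implementation: the `Query` class, the `ingest` function, and the event fields (`user_id`, `amount`, `path`, `status`) are all hypothetical, and only flat (non-nested) fields are handled.

```python
class Query:
    """A query declares the fields (and types) it requires before any data arrives."""

    def __init__(self, name, required):
        self.name = name
        self.required = required  # field name -> expected Python type
        self.received = []

    def matches(self, event):
        # An event matches when every required field exists with the right type.
        return all(isinstance(event.get(field), type_)
                   for field, type_ in self.required.items())

    def project(self, event):
        # The query sees only its own subset of fields.
        return {field: event[field] for field in self.required}

def ingest(event, queries):
    """Accept any event; deliver it only to the queries whose schema it satisfies."""
    for query in queries:
        if query.matches(event):
            query.received.append(query.project(event))

query_a = Query("A", {"user_id": int, "amount": int})
query_b = Query("B", {"path": str, "status": int})
queries = [query_a, query_b]

# One mixed stream: a billing event, an API event, and an unrelated event.
ingest({"service": "billing", "user_id": 1, "amount": 120}, queries)
ingest({"service": "api", "path": "/v1/items", "status": 200, "user_id": 1}, queries)
ingest({"note": "free-form event"}, queries)

print(query_a.received)
print(query_b.received)
```

The stream itself never declares a schema. Events that satisfy no query are simply not delivered anywhere, which is what lets the platform accept arbitrary data.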
10. Architecture
• Norikra Server (on JVM)
• Norikra Engine: Esper Instance (Query Engine), Type Definition Manager, Output Event Pool
• RPC Server: mizuno (Jetty + Rack), Rack RPC Handler
• Norikra Client: talks to the RPC Server via msgpack-rpc-over-http
11. For details :)
• Norikra: Stream Processing with SQL
http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql
• Norikra: SQL Stream Processing in Ruby
http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby
• Norikra in Action
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
• Landscape of Norikra Features
http://www.slideshare.net/tagomoris/norikra-meetup-features
• Norikra Recent Updates
http://www.slideshare.net/tagomoris/norikra-recent-updates
12. Recent Updates
• v1.4.0: Jul 19, 2016
• Add support for the JVM options "-D" and "-agentlib"
• Update msgpack version
• Previous release v1.3.1: May 7, 2015
• Explained in "Norikra Recent Updates" slide
14. Good & Bad
• Good for startups:
Fast bootstrap, SQL, Web UI, Fluentd plugins,
Handling complex events, ...
• Good for mid-sized sites:
Dynamic query registration, Dynamic UDF loading,
Good enough performance for mid-sized sites (10k events/sec),
Schema on read, ...
• Bad for big players:
No distribution, no high availability,
Uncontrollable JVM/Esper behavior (CPU & memory)
16. Perfect Norikra
• All features of Norikra
• Including "Ultra fast bootstrap"
• Compatible RPC API w/ original Norikra
• Distributed execution on any scheduler
• YARN? Mesos? or ...?
• Automatic failover & retry for failures (HA)
• Automated optimization for load balancing
• Dynamic scaling out from 1 to 100 nodes, without any restarts/retries
17. Rough Sketch
[Diagram: a master node and processor nodes, fed by queries and events. Components shown: RPC Server, RPC Handler, Type Definition Manager, Query Compiler, DAG Optimizer / Deoptimizer, DAG Executor, Event Router, Event Buffer.]
18. Rough Sketch
• Brand new query executor
• SQL Parser
• Query compiler into DAG
• SQL operators as sub-DAGs (inspired by TimeStream)
• DAG executor
• Brand new dataflow manager / nodes
• Sync/Async data replication
• Barriers for event stream (inspired by Flink)
• Versioned routing/distribution
19. Dynamic Scaling Out
• Processing nodes are stateful
• state: limited by available memory size
• growing stream size -> memory overflow :-(
• Scaling strategy must be dynamic
• restarting queries (as static scaling requires) increases latency
22. Crash & Recovery Strategy (1)
Query: COUNT(DISTINCT uid) per 1 day
[Timeline, 7/1 to 7/4: 3 nodes, 3 nodes, then 6 nodes, 6 nodes; a crash is followed by recovery]
• After crash, restart the query w/ increased # of nodes
• After restart, query re-reads all data of that window
• After recovery, all nodes go back to realtime calculation
23. Crash & Recovery Strategy (2)
Query: COUNT(DISTINCT uid) per 1 day
[Timeline, 7/1 to 7/4: 3 nodes, 3 nodes, then 6 nodes, 6 nodes; a crash is followed by recovery]
• Pros: Very easy to implement
• Cons: Requires all data to be stored (distributed filesystem?)
• Cons: Hard to know the # of nodes needed for increasing traffic
• Cons: Recovery state requires more nodes than normal state
24. Dynamic Scaling Out Strategy (1)
Query: COUNT(DISTINCT uid) per 1 day
[Timeline, 7/1 to 7/4: the number of nodes grows from 3 to 5 to 6; each stage yields an intermediate result, and the results are merged for the final result]
• Before crash, increase # of processing nodes
• Queries always produce intermediate results, along with the degree of distribution (# of nodes)
• Query results should be produced by merging intermediate results
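For COUNT(DISTINCT uid), merging intermediate results means merging sets, not adding counts. The following Python sketch shows the idea under stated assumptions: the phases, the uid values, and the CRC32-based routing are all hypothetical, and exact sets stand in for whatever mergeable state a real operator would keep (a mergeable sketch such as HyperLogLog would be another option).

```python
import zlib

def partial_distinct(uids, num_nodes):
    """Distribute uids over num_nodes; return one partial set per node."""
    partials = [set() for _ in range(num_nodes)]
    for uid in uids:
        node = zlib.crc32(uid.encode("utf-8")) % num_nodes
        partials[node].add(uid)
    return partials

def merge_count_distinct(intermediate_results):
    """Merge partial sets from every phase and node into the final count."""
    merged = set()
    for partials in intermediate_results:
        for partial in partials:
            merged |= partial
    return len(merged)

# Hypothetical day: the cluster grows from 3 to 5 to 6 nodes within one window.
phases = [
    (3, ["u1", "u2", "u3", "u1"]),
    (5, ["u2", "u4", "u5"]),
    (6, ["u1", "u6"]),
]

# Each phase leaves behind its own intermediate result (one set per node).
intermediate = [partial_distinct(uids, n) for n, uids in phases]

total = merge_count_distinct(intermediate)
naive_sum = sum(len(p) for partials in intermediate for p in partials)

print(total, naive_sum)
```

Here `total` is 6 while `naive_sum` is 8, because `u1` and `u2` appear in more than one phase. This is why every operator needs a mergeable intermediate form before the node count can change mid-window.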
25. Dynamic Scaling Out Strategy (2)
Query: COUNT(DISTINCT uid) per 1 day
[Timeline, 7/1 to 7/4: the number of nodes grows from 3 to 5 to 6; each stage yields an intermediate result, and the results are merged for the final result]
• Pros: Lower latency, less computing power
• Cons: All operators must support such calculation
- SQL!
26. For Dynamic Scaling Out
• De-optimization of operators
• Virtual nodes for routing
• ... and many others
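One common reading of "virtual nodes for routing" is a fixed set of virtual partitions that events hash into, plus a small table mapping each partition to a physical node. The slides do not spell out the design, so the Python sketch below is an assumption: `NUM_VNODES`, `initial_table`, and `scale_out` are hypothetical names, and the rebalancing rule is just one simple choice.

```python
import zlib

NUM_VNODES = 60  # hypothetical fixed number of virtual nodes

def vnode_of(key):
    """Events are routed by key to a virtual node; this mapping never changes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VNODES

def initial_table(num_nodes):
    """Assign virtual nodes to physical nodes round-robin."""
    return [v % num_nodes for v in range(NUM_VNODES)]

def scale_out(table, new_num_nodes):
    """Hand some virtual nodes to the new physical nodes, moving as few as needed."""
    old_num_nodes = max(table) + 1
    target = NUM_VNODES // new_num_nodes
    load = [table.count(n) for n in range(old_num_nodes)]
    load += [0] * (new_num_nodes - old_num_nodes)
    new_table = list(table)
    for vnode, owner in enumerate(table):
        if load[owner] <= target:
            continue  # this owner is not overloaded; leave the vnode in place
        for candidate in range(old_num_nodes, new_num_nodes):
            if load[candidate] < target:
                new_table[vnode] = candidate
                load[owner] -= 1
                load[candidate] += 1
                break
    return new_table

table3 = initial_table(3)
table5 = scale_out(table3, 5)
moved = sum(1 for old, new in zip(table3, table5) if old != new)

print(moved, [table5.count(n) for n in range(5)])
print(table5[vnode_of("uid-42")])
```

Going from 3 to 5 nodes moves 24 of the 60 virtual nodes, all of them onto the two new nodes, while the other 36 keep their owner and therefore their state. Only the table changes; the key-to-virtual-node hash stays fixed.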
27. Hard things
• Resource monitoring & limitation
• Multi-tenancy
• UDF and sandbox
• Queries without aggregations
28. Why not on Spark or Flink?
• Because of schema-less event processing
- it requires a dataflow controlled by the query manager
• Because of dynamic scaling
- it requires a brand new dataflow layer