An introduction and status update on Redis' upcoming data structure - the Stream - that is not unlike a log, has some Apache Kafka-like thingamajigs and can also be used for time series data
2. Who We Are
Open source. The leading in-memory database platform,
supporting any high performance operational, analytics or
hybrid use case.
The open source home and commercial provider of Redis
Enterprise (Redise) technology, platform, products & services.
Hello, I am @itamarhaber, Technology Evangelist
3. Redis Streams
• 1st class Redis citizens
• An abstract data type that is not unlike a log
• Designed with time series data in mind
• Provide some "Kafkaesque" messaging abilities
This session is about
4. https://redis.io (emphasis for session context)
Redis is an open source (BSD licensed), in-memory data
structure store, used as a database, cache and message
broker. It supports data structures such as strings,
hashes, lists, sets, sorted sets with range queries,
bitmaps, hyperloglogs and geospatial indexes with
radius queries. Redis has built-in replication, Lua
scripting, LRU eviction, transactions and different levels
of on-disk persistence, and provides high availability via
Redis Sentinel and automatic partitioning with Redis
Cluster.
5. 1. REmote DIctionary Server
2. /rɛdɪs/, pronounced "red-iss"
3. OSS (BSD3), https://github.com/antirez/redis
4. In-memory, but with optional disk persistence
5. By Salvatore Sanfilippo @antirez circa 27/2/09
6. DSL4ADT: A Domain Specific Language (DSL) for
Abstract Data Types (ADT)
7. Designed for performance and simplicity
Redis is
6. Necessity is the mother of invention
There ain't no such thing as a free lunch
The existing data structures (i.e. lists, sorted sets, PubSub) aren't
"good enough" for things like:
• Log-like data patterns
• At-least-once messaging with fan-out
And listpacks, radix trees & reading Kafka :)
Why invent yet another Redis thingamajig?
7. A storage abstraction that is:
• Append-only, can be truncated
• A sequence of records ordered by time
A Logical Log is:
• Based on a logical offset, i.e. time (vs. bytes)
• Therefore time range queries
• Made up of in-memory data structures, naturally
The Log is (hardly a new thing)
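The idea of a logical log can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Redis implements Streams: the `LogicalLog` class and its method names are hypothetical, and a plain sorted list stands in for the real in-memory data structures.

```python
import bisect


class LogicalLog:
    """Toy append-only log whose offsets are logical timestamps (hypothetical)."""

    def __init__(self):
        self._offsets = []  # timestamps in milliseconds, ascending
        self._records = []  # record payloads, parallel to _offsets

    def append(self, ts_ms, record):
        # Append-only: an offset may never be smaller than the last one.
        if self._offsets and ts_ms < self._offsets[-1]:
            raise ValueError("offsets must not go backwards")
        self._offsets.append(ts_ms)
        self._records.append(record)

    def range(self, start_ms, end_ms):
        # Time range query: binary search for both inclusive bounds.
        lo = bisect.bisect_left(self._offsets, start_ms)
        hi = bisect.bisect_right(self._offsets, end_ms)
        return list(zip(self._offsets[lo:hi], self._records[lo:hi]))

    def truncate(self, max_len):
        # Truncation drops the oldest records and keeps the newest max_len.
        excess = len(self._offsets) - max_len
        if excess > 0:
            del self._offsets[:excess]
            del self._records[:excess]


log = LogicalLog()
for ts, temp in [(1000, 41.5), (2000, 42.0), (3000, 43.2), (4000, 44.1)]:
    log.append(ts, {"cpu": temp})

in_range = log.range(2000, 3000)       # the two middle records
log.truncate(2)                        # keep only the two newest records
after_truncate = log.range(0, 10_000)
```

Because the offset is a time rather than a byte position, a range query is just two binary searches over the ordered offsets.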
8. A data stream is a sequence of elements. Consider:
• Real time sensor readings, e.g. particle colliders
• IoT, e.g. the irrigation of avocado groves
• User activity in an application
• …
• Messages in distributed systems
Logging streams of semi-structured data
9. A distributed system is a model in which
components located on networked computers
communicate and coordinate their actions by
passing messages
Distributed Computing, Wikipedia
Includes: client-server, 3/n-tier, peer to peer, SOA,
micro- & nanoservices, FaaS & serverless…
A side note about Distributed Systems
10. There are only two hard problems in
distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery
Mathias Verraes, on Twitter
An observation
11. Fact #1: you can choose one and only one:
• At-most-once delivery, i.e. "shoot and forget"
• At-least-once delivery, i.e. explicit ack
Fact #2: exactly-once delivery doesn't exist
Observation: order is usually important (duh)
Refresher on message delivery semantics
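The difference between the two semantics comes down to when a message is forgotten: before it is handled, or only after an explicit ack. The following is a minimal, Redis-free simulation of that idea; `deliver` and `make_flaky_handler` are hypothetical names made up for this sketch, and a real consumer would also need a retry limit.

```python
def deliver(messages, process, ack_mode):
    """Return the messages whose handler completed successfully.

    "at-most-once": the message is dropped before processing, so a
    failure loses it. "at-least-once": the message stays queued until
    the handler succeeds (the explicit ack), so a failure means redelivery.
    """
    if ack_mode not in ("at-most-once", "at-least-once"):
        raise ValueError("unknown ack_mode")
    queue = list(messages)
    done = []
    while queue:
        msg = queue[0]
        if ack_mode == "at-most-once":
            queue.pop(0)          # forget the message before handling it
            try:
                process(msg)
                done.append(msg)
            except RuntimeError:
                pass              # lost for good
        else:
            try:
                process(msg)
                done.append(msg)
                queue.pop(0)      # explicit ack, only after success
            except RuntimeError:
                pass              # still queued, will be retried
    return done


def make_flaky_handler(fail_on):
    """Handler that fails the first time it sees `fail_on`."""
    attempts = []

    def handler(msg):
        attempts.append(msg)
        if msg == fail_on and attempts.count(msg) == 1:
            raise RuntimeError("consumer crashed")

    return handler, attempts


handler, attempts_most = make_flaky_handler("b")
at_most = deliver(["a", "b", "c"], handler, "at-most-once")

handler, attempts_least = make_flaky_handler("b")
at_least = deliver(["a", "b", "c"], handler, "at-least-once")
```

With at-most-once, message "b" never completes; with at-least-once it completes, but the handler sees it twice, which is why consumers should be idempotent.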
12. Consider the non-exhaustive list at taskqueues.com
• 17 message brokers, including: Apache Kafka,
NATS, RabbitMQ and Redis
• 17 queue solutions, including: Celery, Kue,
Laravel, Sidekiq, Resque and RQ <- all these use
Redis as their backend btw ;)
And that's without considering protocol-based solutions, etc.
This isn't exactly a new challenge
13. Redis (in general and) Streams (in particular) are:
• Everywhere, from the IoT's edge to the cloud
• Blazing fast, massive throughput
• Usable from all(most) languages and platforms
(IoT microcontrollers included)
Note: apropos IoT, they are great async buffers
So again, why "reinvent hot water"?
14. A stream is a sequence of entries (records). It:
• Is "sharded" by key ("topic")
• Has 1+ producers
• Has 0+ consumers
• Can provide at-most- or at-least-once semantics
• Enables stream processing/real time pipelines
(as opposed to batch)
Redis Streams "formalism"
15. A picture of a stream
[Figure: a stream of entries numbered 0 to 13. The producer appends the next entry ("*") at the end, while Consumer 1 and Consumer 2 each keep their own position in the stream.]
16. Every entry has a unique ID that is its logical offset.
The ID is in the following format:
<epoch-milliseconds>-<sequence>
Note: each ID part is a 64-bit unsigned integer
An entry also has one or more ordered field-value
pairs, allowing for total abstraction (the empty
string is a valid field name, good for time series).
Entries in the Stream
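To make the ID format concrete, here is a small Python sketch that parses IDs and picks the next one. It is an illustration under assumptions rather than the server's code: `parse_id` and `next_id` are hypothetical helpers, and the rule of bumping the sequence when the clock has not advanced is this sketch's way of keeping IDs strictly increasing.

```python
import time

MAX_U64 = 2**64 - 1


def parse_id(entry_id):
    """Split "<ms>-<seq>" into a tuple of ints that compares in stream order."""
    ms_text, seq_text = entry_id.split("-")
    ms, seq = int(ms_text), int(seq_text)
    if not (0 <= ms <= MAX_U64 and 0 <= seq <= MAX_U64):
        raise ValueError("each ID part must fit in an unsigned 64-bit integer")
    return ms, seq


def next_id(last_id, now_ms=None):
    """Pick the ID for the next entry so that IDs only ever increase."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    last_ms, last_seq = parse_id(last_id)
    if now_ms > last_ms:
        return f"{now_ms}-0"
    # Same millisecond, or a clock that went backwards: bump the sequence.
    # (Sequence overflow is not handled in this sketch.)
    return f"{last_ms}-{last_seq + 1}"


same_ms = next_id("1000-0", now_ms=1000)
clock_skew = next_id("1000-5", now_ms=999)
later = next_id("1000-5", now_ms=2000)
```

Comparing the parsed tuples rather than the raw strings matters: as text, "1000-10" sorts before "1000-9", but as IDs it comes after.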
17. Streamz Demo
Or how I'm graphing my laptop's CPU and battery temperatures
using only bash, iStats, redis-cli, redis-server, docker, grafana &
a browser
https://github.com/itamarhaber/streams-cpubattmp
21. # And the usual Redis goodness, e.g. TX
redis> MULTI
...
# Or server-side processing
redis> EVAL "return 'Lua Rocks!'" 0
...
# Or your own custom module
redis> MODULE LOAD <your-module-here>
OK
22. A consumer of a stream gets all entries in order,
and will eventually become a bottleneck.
Possible workarounds:
• Add a "type" field to each record - that's dumb
• Shard the stream to multiple keys - meh
• Have the consumer dispatch entries as jobs in
queues… GOTO 10
The problem with scaling consumers
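For reference, the "shard the stream to multiple keys" workaround usually boils down to a deterministic routing function on the producer side. A minimal sketch, assuming a hypothetical `shard_key` helper and a made-up `<base>:<n>` key naming scheme:

```python
import zlib


def shard_key(base_key, routing_value, shards):
    """Map a routing value to one of `shards` stream keys, deterministically."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    slot = zlib.crc32(routing_value.encode("utf-8")) % shards
    return f"{base_key}:{slot}"


key_a = shard_key("temps", "sensor-1", 4)
key_b = shard_key("temps", "sensor-1", 4)  # same input, same shard
```

Each consumer then reads only some of the keys, at the price of losing a single total order across shards and of having to rebalance by hand when the number of shards changes.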
23. … allow multiple consumers to cooperate in
processing messages arriving in a stream, so that
each consumer in a given group takes a subset
of the messages.
Shifts the complexity of recovering from consumer
failures and group management to the Redis server
Consumer Groups
24. We are here :)
• Groups are named and are explicitly (!) created:
XGROUP CREATE temps agg $
• Consumers are also named, and each gets only a
subset of the stream:
XREAD-GROUP GROUP agg CONSUMER escher-01 STREAMS temps >
• XACK/NOACK in XREAD, XCLAIM, XPENDING...
Group orientation
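The bookkeeping that a consumer group moves to the server can be modeled in memory to see how the pieces fit. This is a toy model under assumptions, not the Redis implementation or its API: `ToyGroup` and its methods are hypothetical, and they only loosely mirror reading with `>`, acking, listing pending entries and claiming.

```python
class ToyGroup:
    """In-memory stand-in for one consumer group on one stream (hypothetical)."""

    def __init__(self, entries):
        self.entries = list(entries)  # (entry_id, fields) in stream order
        self.next_index = 0           # first entry never delivered to the group
        self.pending = {}             # entry_id -> consumer that must ack it

    def read(self, consumer, count=1):
        # Like reading with ">": hand out only never-delivered entries.
        batch = self.entries[self.next_index:self.next_index + count]
        self.next_index += len(batch)
        for entry_id, _fields in batch:
            self.pending[entry_id] = consumer
        return batch

    def ack(self, entry_id):
        # Acknowledged entries leave the pending list; True if it was pending.
        return self.pending.pop(entry_id, None) is not None

    def pending_for(self, consumer):
        # Entry IDs delivered to `consumer` and not acknowledged yet.
        return [i for i, owner in self.pending.items() if owner == consumer]

    def claim(self, entry_id, new_consumer):
        # Take over an entry from a consumer that is presumed dead.
        if entry_id not in self.pending:
            raise KeyError(entry_id)
        self.pending[entry_id] = new_consumer


group = ToyGroup([("1-0", {"cpu": "41"}), ("2-0", {"cpu": "42"}), ("3-0", {"cpu": "43"})])
first = group.read("escher-01", count=2)   # gets 1-0 and 2-0
second = group.read("escher-02", count=2)  # gets only 3-0, the rest is taken
group.ack("1-0")                           # escher-01 finished 1-0
group.claim("2-0", "escher-02")            # escher-01 died before acking 2-0
```

Each entry goes to exactly one consumer of the group, and anything delivered but not acked stays in the pending list until someone acks or claims it, which is what gives the group its at-least-once behavior.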
25. Presently OSS Redis Streams are:
• Partially implemented
  - Existing commands are relatively stable
  - Some API corners still missing, e.g. XTRIM
  - Consumer Groups are getting real
• A part of the unstable branch
• Expected to be GA as v5.0 during April 2018
Up to date status (Jan 26th)
26. • From your browser: https://try.redis.io
• Or download it: https://redis.io/download
• Or clone it: https://github.com/antirez/redis
• Or dockerize it: docker run -it redis
• Or try Redis Enterprise by https://redislabs.com
Next, try Redis yourself!
27. • The Redis Manifesto https://github.com/antirez/redis/blob/unstable/MANIFESTO
• Salvatore's blog posts http://antirez.com/news/114 and http://antirez.com/news/116
• Salvatore's Streams demo https://www.youtube.com/watch?v=ELDzy9lCFHQ
• RCP 11 - The stream data type https://github.com/redis/redis-rcp/blob/master/RCP11.md
• Reddit discussion https://www.reddit.com/r/redis/comments/4mmrgr/stream_data_structure_for_redis_lets_design_it/
• Hacker News discussion https://news.ycombinator.com/item?id=15384396
• Consumer groups specification https://gist.github.com/antirez/68e67f3251d10f026861be2d0fe0d2f4
• Consumer groups API https://gist.github.com/antirez/4e7049ce4fce4aa61bf0cfbc3672e64d
• Redis Streams and the Unified Log https://brandur.org/redis-streams
• Introduction to Redis Streams https://hackernoon.com/introduction-to-redis-streams-133f1c375cd3
References
28. Join us next month at
Redis Day Tel Aviv
Thank you,
Questions?