One of the most popular use cases for Apache Druid is building data applications. Data applications put data in the hands of everyone on a team, who use it to make faster, better decisions. To fulfill this role, they need to support granular drill-down, because the devil is in the details, but they must also be extremely fast, because otherwise people won't use them!
In this talk, Gian Merlino will cover:
* The unique technical challenges of powering data-driven applications
* What attributes of Druid make it a good platform for data applications
* Some real-world data applications powered by Druid
5. Druid in the wild
100+ billion rows/day
1+ trillion rows, 1+ year retained
100s of servers
sub-second to few seconds query latency
mix of streaming and batch ingest
6. Online DB pattern
[Diagram: two architectures compared]
Direct to lake: data sources → data mover (Apache Kafka, Apache Airflow, etc.) → data lake (pure storage) → query engines
Online DB: data lake → online DB (stores an optimized copy) → online app
7. Druid as an online DB
● Scale-out, fault-tolerant architecture
● No downtime for software updates
● No downtime for data management
● Heavily optimized storage format
● Integrated storage format and query engine
8. Data apps
● Interactive query speeds
● Always online
● Fresh data from streams
● Quality of service
● Price/performance
9. Interactive query speeds
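● Secondary indexes
● Compression
● Operate on compressed data
● Late materialization
[Figure: a dictionary-encoded column with a bitmap index]
DICT: DC = 0, LA = 1, SF = 2
DATA: 0, 0, 0, 1, 1, 2, 2, 2
INDEX:
DC: rows [0,1,2] (11100000)
LA: rows [3,4] (00011000)
SF: rows [5,6,7] (00000111)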
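The DICT, DATA, and INDEX values on this slide can be reproduced with a small sketch. This is a toy illustration only, not Druid's actual segment code: the function names and the `revenue` column are hypothetical, and plain Python lists stand in for the compressed bitmaps a real storage format would use.

```python
def dictionary_encode(values):
    """Map each distinct string to a small integer ID (sorted order)."""
    dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
    encoded = [dictionary[v] for v in values]
    return dictionary, encoded


def build_bitmap_index(encoded, cardinality):
    """One bitmap per dictionary ID; bit i is set when row i holds that ID."""
    bitmaps = [[0] * len(encoded) for _ in range(cardinality)]
    for row, value_id in enumerate(encoded):
        bitmaps[value_id][row] = 1
    return bitmaps


def bitmap_to_str(bitmap):
    """Render a bitmap the way the slide does, e.g. 11100000."""
    return "".join(str(bit) for bit in bitmap)


def rows_matching(bitmaps, dictionary, value):
    """Answer a filter from the index alone, without scanning the data."""
    value_id = dictionary.get(value)
    if value_id is None:
        return []
    return [row for row, bit in enumerate(bitmaps[value_id]) if bit]


# The eight rows shown on the slide.
cities = ["DC", "DC", "DC", "LA", "LA", "SF", "SF", "SF"]
dictionary, encoded = dictionary_encode(cities)
bitmaps = build_bitmap_index(encoded, len(dictionary))

# Late materialization: read the (hypothetical) revenue column only for
# rows that passed the filter, instead of decoding every row up front.
revenue = [10, 20, 30, 40, 50, 60, 70, 80]
la_rows = rows_matching(bitmaps, dictionary, "LA")
la_revenue = sum(revenue[row] for row in la_rows)

for city, value_id in dictionary.items():
    print(city, value_id, bitmap_to_str(bitmaps[value_id]))
```

The filter on `"LA"` touches only one bitmap, and the metric column is read for just the two matching rows, which is the intuition behind combining secondary indexes with late materialization.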
12. Fresh data from streams
[Architecture diagram]
Master server: Coordinator
Query server: Broker
Data servers: Historical, Indexer
Also shown: Apache ZooKeeper, deep storage
Inputs: streaming data, batch data
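One way to picture why streamed rows are queryable right away: the Broker asks every data server process for a partial result and merges them, so rows still held by an Indexer count alongside rows already served by a Historical. The sketch below is a toy model under that assumption, not Druid code; `partial_counts`, `broker_merge`, and the sample rows are all hypothetical.

```python
from collections import Counter


def partial_counts(rows):
    """Each data server process aggregates only the rows it holds."""
    return Counter(row["city"] for row in rows)


def broker_merge(partials):
    """The query server sums per-process partial results into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Older rows, already served by a Historical in this toy model.
historical_rows = [{"city": "DC"}, {"city": "LA"}, {"city": "SF"}]
# Fresh rows that just arrived from a stream and are held by an Indexer.
indexer_rows = [{"city": "SF"}, {"city": "DC"}]

result = broker_merge([
    partial_counts(historical_rows),
    partial_counts(indexer_rows),
])
print(result)
```

Because each process returns an already-aggregated partial, the merge step only handles one small result per process rather than raw rows.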
15. Price/performance
[Chart: price/performance vs. leading open source SQL engines]
Data sourced from: Correia, José & Costa, Carlos & Santos, Maribel. (2019). Challenging SQL-on-Hadoop Performance with Apache Druid.
18. Stay in touch
Follow the Druid project on Twitter: @druidio
Join the community (mailing lists, Slack, meetups): https://druid.apache.org/community/
19. Time for questions
@gianmerlino
Thank you!
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.