6. We have a few tricks.
● What if we had a table that recorded one row per client and tracked the count of transactions for each client?
id | client_id | count_transactions
1  | 131       | 34406416
2  | 132       | 10587625
3  | 133       | 85095
What if we wired this table up to a SQL parser?
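A minimal sketch of the idea in SQL (the table and column names here are hypothetical, not the actual schema): a per-client aggregate table can answer a count query without ever touching the transaction-level fact table.

  -- Hypothetical fact table: one row per transaction
  CREATE TABLE transactions (
    id         BIGINT,
    client_id  BIGINT,
    user_id    BIGINT,
    amount     DECIMAL(12, 2)
  );

  -- Hypothetical aggregate table: one row per client
  CREATE TABLE agg_client_transactions (
    id                 BIGINT,
    client_id          BIGINT,
    count_transactions BIGINT
  );

  -- This query over the fact table...
  SELECT client_id, count(*) FROM transactions GROUP BY client_id;

  -- ...can be answered from the much smaller aggregate table instead:
  SELECT client_id, count_transactions FROM agg_client_transactions;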
7. Mondrian!
● Robust aggregate table interface
● Auto-recognizes aggregate tables via naming convention (see the sketch after this list)
● Queries are directed to the correct table
● If aggregate tables are missed, falls back to the fact table
● Can define multiple aggregate tables per fact table
● Also has an intelligent segment cache
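A rough sketch of the naming-convention point (names are hypothetical; Mondrian's actual recognition rules are pattern-based and configurable, and its aggregate tables typically carry a fact_count column): several rollups of the same fact table, each named so a pattern like agg_*_transactions can match it.

  -- Rollup of the transactions fact table, one row per client
  CREATE TABLE agg_client_transactions (
    client_id          BIGINT,
    count_transactions BIGINT,
    fact_count         BIGINT  -- how many fact rows each aggregate row summarizes
  );

  -- A second, finer-grained rollup of the same fact table: one row per client per day
  CREATE TABLE agg_client_day_transactions (
    client_id          BIGINT,
    transaction_date   DATE,
    count_transactions BIGINT,
    fact_count         BIGINT
  );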
8. But there's a problem.
● SELECT count(distinct(user_id)) FROM transactions
● Aggregate tables rely on properties of addition operations
● distinct(set_1) + distinct(set_2) != distinct(set_1 + set_2) (a small worked example follows this list)
● We have no choice but to query our fact table.
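A tiny worked example of why that inequality bites (the data is made up): if client 131's transactions come from users {1, 2} and client 132's come from users {2, 3}, the per-client distinct counts sum to 4, but the true distinct count across both clients is 3, so a per-client aggregate table cannot answer the query.

  -- True answer: 3 (users 1, 2, 3)
  SELECT count(DISTINCT user_id) FROM transactions;

  -- Summing per-client distinct counts double-counts user 2 and returns 4:
  SELECT sum(distinct_users) FROM (
    SELECT client_id, count(DISTINCT user_id) AS distinct_users
    FROM transactions
    GROUP BY client_id
  ) per_client;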
11. Much Cluster, Many Computer
● All of these solutions are using distributed systems to query lots of data quickly
● Querying 100 million rows on a single computer is not fast on current hardware
● And we are projecting to have a lot more than 100 million rows this year
16. Nope!
● Vertica has no indexes
● Vertica has “projections”, which are similar to a materialized view
● Projections are transparent to the query (like an index)
● Projections are used to optimize JOIN, GROUP BY, and other sorts of queries (see the sketch after this list)
● Provides a tool to autobuild projections based on query analysis
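A minimal sketch of what a projection looks like (the table, column, and projection names are hypothetical): it is declared up front, pre-sorted and segmented across the cluster to serve GROUP BY queries, and queries continue to reference the base table.

  -- Hypothetical projection over the transactions table,
  -- sorted by client_id and spread across all nodes:
  CREATE PROJECTION transactions_by_client (
    client_id,
    user_id,
    amount
  ) AS
  SELECT client_id, user_id, amount
  FROM transactions
  ORDER BY client_id
  SEGMENTED BY HASH(client_id) ALL NODES;

  -- Queries keep targeting the base table; the planner picks the projection:
  SELECT client_id, count(*) FROM transactions GROUP BY client_id;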
17. Tradeoffs
Columnar              | Row Based
Slow single-row read  | Fast single-row read
Slow single-row write | Fast single-row write
Fast aggregates       | Slow aggregates
Compression (5-10x)   | No compression
18. Distributed
● Data split among servers
● Horizontal scaling
● Data is compressed, so it's stored in memory
● Node failure is tolerated
● Network IO is important
20. So you just drop it in, right?
● 6 or 7 gems needed updates
● Had to roll an ActiveRecord driver
● Arel saved us from a lot of pain
● Still some SQL problems (database drop, multirow insert)
● Lots of DevOps help needed
● Currently deployed to sand and qa, hitting production soon!