Implementing Highly Performant Distributed Aggregates

Michał Jadwiszczak
Junior Software Engineer at ScyllaDB

Why distribute aggregates?
How are aggregate queries executed currently?
■ The coordinator fetches all required data from the other nodes
■ It then aggregates the whole table locally
Reasons to distribute the work
■ The sequential approach is slow and very inefficient
■ The whole workload accumulates on a single machine

Sequential aggregate
SELECT COUNT(*) FROM keyspace.table
[Diagram, four steps: (1) four nodes own the partition ranges p0..p1, p2..p3, p4..p5 and p6..p7; (2) the coordinator, which holds p0..p1, sends "Get data" to the other three nodes; (3) they answer with Return(p2..p3), Return(p4..p5) and Return(p6..p7); (4) the coordinator runs COUNT() over all four ranges by itself.]

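Distributed aggregate
[Diagram, four steps: (1) the same four nodes own p0..p1, p2..p3, p4..p5 and p6..p7; (2) the coordinator, which holds p0..p1, sends SELECT COUNT(p2..p3), SELECT COUNT(p4..p5) and SELECT COUNT(p6..p7) to the owning nodes; (3) every node runs COUNT() over its own range and the remote nodes answer with Return c1, Return c2 and Return c3; (4) the coordinator applies REDUCE() to c0, c1, c2 and c3.]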

Distributing the aggregates
The parallelization of aggregates works in a pretty straightforward way:
■ The coordinator that receives the query becomes the super-coordinator
■ The super-coordinator splits the query into subqueries for the other nodes
● Each node receives a subquery only for the data it holds
● Each partition range appears in exactly one subquery, so no data is processed twice
■ Each participating node executes its subquery and returns a partial result
● A partial result is the state of an aggregate that has not been finalized
■ The super-coordinator receives the partial results, reduces them and finalizes the aggregate
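The steps above can be sketched in a few lines of Python. This is only an illustration of the flow for COUNT, not Scylla's implementation: the node names, the `OWNERS` and `ROWS` tables and all function names are made up for the example, and the range labels are borrowed from the diagrams.

```python
from collections import defaultdict

# Hypothetical cluster layout: partition range -> owning node.
OWNERS = {"p0..p1": "node0", "p2..p3": "node1", "p4..p5": "node2", "p6..p7": "node3"}

# Hypothetical data: the rows stored in each partition range.
ROWS = {"p0..p1": [1, 2, 3], "p2..p3": [4, 5], "p4..p5": [6], "p6..p7": [7, 8, 9, 10]}


def split_into_subqueries(ranges):
    """Group ranges by owner, so every range lands in exactly one subquery."""
    subqueries = defaultdict(list)
    for r in ranges:
        subqueries[OWNERS[r]].append(r)
    return dict(subqueries)


def execute_subquery(ranges):
    """Runs on a participating node: returns the partial COUNT state."""
    return sum(len(ROWS[r]) for r in ranges)


def distributed_count(ranges):
    """Super-coordinator: split, collect partial results, reduce, finalize."""
    partials = [execute_subquery(rs) for rs in split_into_subqueries(ranges).values()]
    state = 0
    for partial in partials:
        state += partial  # reduce step for COUNT: add the partial counts
    return state  # COUNT needs no extra finalization step


print(distributed_count(list(OWNERS)))
```

Because every range appears in exactly one subquery, the reduced total equals what a single node would have counted over the whole table.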

Distributing aggregates
Scylla distributes the work not only at the cluster level: each node also distributes its
subquery across its shards. Each shard again gets a query only for its own data, so it
doesn’t have to communicate with the other shards.
All native aggregates have a distributed implementation.
Scylla uses them by default, without the user having to think about it.
Native aggregates:
■ COUNT
■ SUM
■ AVG
■ MAX
■ MIN
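To make "distributed implementation" concrete, here is a minimal sketch of what a partial state and a reduce step could look like for each of these aggregates. The slides do not show how Scylla represents these states internally, so the `AGGREGATES` table and the helper names below are assumptions made for illustration only.

```python
from functools import reduce


def _opt(f):
    """Lift a binary function so that None (no rows seen yet) acts as identity."""
    def lifted(a, b):
        if a is None:
            return b
        if b is None:
            return a
        return f(a, b)
    return lifted


# name -> (initial state, state function, reduce function, final function)
AGGREGATES = {
    "COUNT": (0, lambda s, v: s + 1, lambda a, b: a + b, lambda s: s),
    "SUM": (0, lambda s, v: s + v, lambda a, b: a + b, lambda s: s),
    "MIN": (None, _opt(min), _opt(min), lambda s: s),
    "MAX": (None, _opt(max), _opt(max), lambda s: s),
    # AVG carries (count, sum): partial averages alone cannot be merged correctly.
    "AVG": ((0, 0),
            lambda s, v: (s[0] + 1, s[1] + v),
            lambda a, b: (a[0] + b[0], a[1] + b[1]),
            lambda s: s[1] / s[0] if s[0] else None),
}


def partial_state(name, values):
    """What one shard computes over its own rows (not finalized)."""
    init, step, _, _ = AGGREGATES[name]
    return reduce(step, values, init)


def distributed(name, shards):
    """Reduce the per-shard partial states, then finalize once."""
    init, _, merge, final = AGGREGATES[name]
    partials = [partial_state(name, shard) for shard in shards]
    return final(reduce(merge, partials, init))


shards = [[3, 1], [], [7, 5]]  # hypothetical values held by three shards
for name in AGGREGATES:
    print(name, distributed(name, shards))
```

AVG is the instructive case: averaging the per-shard averages of [1, 2, 3] and [10] would give 6, while the true average is 4, which is why the partial result has to be the unfinalized state.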

User-Defined Aggregates
UDA consists of:
■ Required fields
● State function
● State type
■ Optional fields
● Final function
● Initial condition
CREATE AGGREGATE my_aggregate(int)
SFUNC state_f
STYPE tuple<int, bigint>
FINALFUNC final_f
INITCOND (0, 0);

Distributed User-Defined Aggregates
UDA consists of:
■ Required fields
● State function
● State type
■ Optional fields
● Reduce function
● Final function
● Initial condition
CREATE AGGREGATE my_aggregate(int)
SFUNC state_f
STYPE tuple<int, bigint>
REDUCEFUNC reduce_f
FINALFUNC final_f
INITCOND (0, 0);
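The role of each field can be sketched with a small Python model. The slide does not show the bodies of state_f, reduce_f and final_f; the versions below assume that my_aggregate computes an average with a (count, sum) state, which is only a guess based on the STYPE. The `Aggregate` class and its method names are hypothetical, and raising an error when no reduce function is given is a simplification of the sketch, not documented behaviour.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Optional


@dataclass
class Aggregate:
    sfunc: Callable[[Any, Any], Any]                       # SFUNC
    initcond: Any                                          # INITCOND
    reducefunc: Optional[Callable[[Any, Any], Any]] = None  # REDUCEFUNC
    finalfunc: Optional[Callable[[Any], Any]] = None        # FINALFUNC

    def _finalize(self, state):
        return self.finalfunc(state) if self.finalfunc else state

    def run_sequential(self, values):
        """Fold every value into one state on a single machine."""
        return self._finalize(reduce(self.sfunc, values, self.initcond))

    def run_distributed(self, chunks):
        """Fold each chunk separately, merge the states, finalize once."""
        if self.reducefunc is None:
            raise ValueError("no reduce function: states cannot be merged")
        partials = [reduce(self.sfunc, chunk, self.initcond) for chunk in chunks]
        state = reduce(self.reducefunc, partials) if partials else self.initcond
        return self._finalize(state)


# Assumed bodies: the state is (count, sum) and the result is the average.
my_aggregate = Aggregate(
    sfunc=lambda s, v: (s[0] + 1, s[1] + v),
    initcond=(0, 0),
    reducefunc=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    finalfunc=lambda s: s[1] / s[0] if s[0] else None,
)

print(my_aggregate.run_sequential([1, 2, 3, 4, 5, 6]))
print(my_aggregate.run_distributed([[1, 2], [3], [4, 5, 6]]))
```

The final function runs exactly once, after all partial states have been merged; running it per chunk would throw away the information needed to combine them.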

Example of UDA
How many people follow Elon Musk on Twitter?
Operations:
■ full table scan
■ iterate over each follows list
CREATE TABLE twitter_users (
    id bigint,
    name text,
    follows list<bigint>,
    PRIMARY KEY (id)
);

Example of UDA
CREATE FUNCTION count_if_follows_elon(acc bigint, follows list<bigint>)
RETURNS NULL ON NULL INPUT
RETURNS bigint
LANGUAGE lua
AS $$
    -- 100 stands for Elon Musk's user id in this example
    for _, f in ipairs(follows) do
        if f == 100 then
            return acc + 1
        end
    end
    return acc
$$;
CREATE FUNCTION reduce_followers_count(acc1 bigint, acc2 bigint)
RETURNS NULL ON NULL INPUT
RETURNS bigint
LANGUAGE lua
AS $$
    return acc1 + acc2
$$;

Example of UDA
CREATE AGGREGATE count_elon_followers(list<bigint>)
SFUNC count_if_follows_elon
STYPE bigint
REDUCEFUNC reduce_followers_count
INITCOND 0;
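To see why this aggregate can be split safely, the two Lua functions can be mirrored in Python and run over a handful of made-up rows. The sample rows and the way they are divided into shards are invented for the example, and the sketch ignores null handling (RETURNS NULL ON NULL INPUT).

```python
from functools import reduce

ELON_ID = 100  # the id the Lua state function compares against


def count_if_follows_elon(acc, follows):
    """State function: add 1 if the follows list contains ELON_ID."""
    for f in follows:
        if f == ELON_ID:
            return acc + 1
    return acc


def reduce_followers_count(acc1, acc2):
    """Reduce function: merge two partial counts."""
    return acc1 + acc2


# Hypothetical contents of the follows column, one list per user.
rows = [[100, 7], [3], [100], [], [8, 100, 100]]

# Sequential: one fold over the whole table, starting from INITCOND 0.
sequential = reduce(count_if_follows_elon, rows, 0)

# Distributed: each shard folds its own rows, then the partial counts are merged.
shards = [rows[:2], rows[2:4], rows[4:]]
partials = [reduce(count_if_follows_elon, shard, 0) for shard in shards]
distributed = reduce(reduce_followers_count, partials, 0)

print(sequential, distributed)
```

Both paths count each user at most once, even when the id appears twice in a list, because the state function returns on the first match.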

Reducing the results
■ Each coordinator reduces the results from its shards and sends the reduced result to the
super-coordinator
■ The super-coordinator is responsible for reducing the partial results from the other
coordinators
■ Reduction is done by executing the proper reduce function
■ All of the partial results are stored in memory (for now!)
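The two levels of reduction can be sketched as follows. The function names and the per-shard numbers are hypothetical. The example also shows a general property of this kind of tree-shaped reduction, not something stated on the slides: the result matches a flat reduction only if the reduce function is associative.

```python
from functools import reduce


def reduce_on_node(shard_partials, reducefunc):
    """A node's coordinator merges the partial results of its own shards."""
    return reduce(reducefunc, shard_partials)


def reduce_on_super_coordinator(node_partials, reducefunc):
    """The super-coordinator merges one partial result per node."""
    return reduce(reducefunc, node_partials)


def two_level(cluster, reducefunc):
    node_partials = [reduce_on_node(shards, reducefunc) for shards in cluster]
    return reduce_on_super_coordinator(node_partials, reducefunc)


def flat(cluster, reducefunc):
    """Reference: reduce every shard result in one pass."""
    return reduce(reducefunc, [p for shards in cluster for p in shards])


# Hypothetical partial counts: three nodes, a few shards each.
cluster = [[4, 1, 7], [2, 2], [9, 0, 3]]


def add(a, b):
    return a + b


def sub(a, b):
    return a - b  # deliberately not associative


print(two_level(cluster, add), flat(cluster, add))
print(two_level(cluster, sub), flat(cluster, sub))
```

With addition both strategies agree; with subtraction they diverge, which is why a reduce function for a distributed aggregate has to give the same answer regardless of how the partial results are grouped.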

What if the subquery fails?
The parallelization service provides a retry mechanism in case a subquery fails.
■ When a subquery fails
● The super-coordinator acts as its coordinator …
● and executes it itself
This way the query is more likely to succeed, for instance by selecting another,
available replica.
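A minimal sketch of that fallback, assuming a simple in-memory model: the `REPLICAS`, `DOWN` and `ROW_COUNTS` tables, the `SubqueryFailed` exception and all function names are invented for the example and do not correspond to Scylla's actual service.

```python
class SubqueryFailed(Exception):
    """Raised when a subquery cannot be executed (hypothetical error type)."""


# Hypothetical state: replicas per range, unavailable nodes, rows per range.
REPLICAS = {"p2..p3": ["node1", "node2"], "p4..p5": ["node2", "node3"]}
DOWN = {"node1"}
ROW_COUNTS = {"p2..p3": 5, "p4..p5": 8}


def forward_subquery(node, rng):
    """Normal path: ask `node` to run COUNT over `rng` on its own data."""
    if node in DOWN:
        raise SubqueryFailed(f"{node} did not answer")
    return ROW_COUNTS[rng]


def coordinate_locally(rng):
    """Fallback path: the super-coordinator coordinates the subquery itself
    and is free to read from any replica that is still available."""
    for replica in REPLICAS[rng]:
        if replica not in DOWN:
            return ROW_COUNTS[rng]
    raise SubqueryFailed(f"no available replica for {rng}")


def run_subquery(node, rng):
    try:
        return forward_subquery(node, rng)
    except SubqueryFailed:
        return coordinate_locally(rng)


print(run_subquery("node1", "p2..p3"))  # node1 is down, node2 serves the range
```

The design choice illustrated here is that a single failed node does not fail the whole aggregate as long as another replica of the same range can be read.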

Limitations
Main limitation: only queries without WHERE and GROUP BY clauses can be distributed.
Because a select statement can’t be serialized, we represent it as a forward_request.
struct forward_request {
    struct aggregation_info {
        db::functions::function_name name;
        std::vector<sstring> column_names;
    };
    query::read_command cmd;
    dht::partition_range_vector pr;
    db::consistency_level cl;
    lowres_clock::time_point timeout;
    std::vector<aggregation_info> aggregation_infos;
};

Benchmarks
Specification:
■ 3-node cluster on AWS
■ Instances: i3.4xlarge (16 vCPUs, 112 GiB of memory)
■ Each result is the average of 10 queries
■ Queries were executed one by one with the “BYPASS CACHE” option
Data:
■ 50 million rows
■ “Twitter accounts” table

Benchmarks - Native aggregates
Aggregation   COUNT(*)   SUM(id)   MIN(id)
Sequential    408s       411s      407s
Distributed   23s        24s       23s

Benchmarks - User-Defined Aggregates
Aggregation   count_elon_followers(follows)
Sequential    1450s
Distributed   78s
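The speedup implied by the benchmark timings can be worked out directly. The numbers below are copied from the two result tables; the `speedup` helper is just illustrative arithmetic.

```python
# Timings in seconds: (sequential, distributed), taken from the benchmark tables.
TIMINGS = {
    "COUNT(*)": (408, 23),
    "SUM(id)": (411, 24),
    "MIN(id)": (407, 23),
    "count_elon_followers(follows)": (1450, 78),
}


def speedup(sequential_s, distributed_s):
    """How many times faster the distributed run is."""
    return sequential_s / distributed_s


for name, (seq, dist) in TIMINGS.items():
    print(f"{name}: {speedup(seq, dist):.1f}x faster")
```

This prints 17.7x, 17.1x, 17.7x and 18.6x respectively, so on this setup the distributed execution is roughly 17 to 19 times faster for both native and user-defined aggregates.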

Benchmarks - CPU Usage
[Two charts, not reproduced here: sequential aggregate vs. distributed aggregate]

Benchmarks - Transmitted packets
[Two charts, not reproduced here: sequential aggregate vs. distributed aggregate]

Michał Jadwiszczak
michal.jadwiszczak@scylladb.com
