About the How and Why of taking Lucene and Elasticsearch and turning them into a Relational Database.
Talk I gave at the Search User Group Berlin September Meetup: http://www.meetup.com/de/Search-UG-Berlin/events/224765731/
3. Crate.io
The Company
• Founded in 2013 in Dornbirn, Austria
• Offices in Dornbirn, Berlin, San Francisco
• Team of 14 people (with and without strong Austrian dialect)
• Won the TechCrunch Disrupt Startup Battlefield
4. SQL Database
TABLES
• Table == Tuple Store
• Primary Key -> Tuple
• Index == B-Tree
  • allowing for equality and range queries in O(log(N))
  • sorting
• Query Planner + Engine for LOCAL query execution
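To make the B-tree bullets concrete, here is a minimal Python sketch that uses a sorted list with binary search as a stand-in for a B-tree. This is an assumption for illustration, not how Crate or any particular database stores its indexes; it only shows why one ordered structure serves equality lookups, range queries and sorting alike.

```python
import bisect


class SortedIndex:
    """Toy stand-in for a B-tree index: distinct keys kept in sorted order.

    Lookups use binary search, so they are O(log N) like a B-tree.
    Inserts here are O(N) because the list shifts elements; a real
    B-tree keeps inserts at O(log N) as well.
    """

    def __init__(self):
        self.keys = []   # distinct index keys, sorted ascending
        self.rows = {}   # key -> primary keys of the tuples with that key

    def insert(self, key, pk):
        if key not in self.rows:
            bisect.insort(self.keys, key)
            self.rows[key] = []
        self.rows[key].append(pk)

    def equal(self, key):
        # Equality query: one binary search.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return list(self.rows[key])
        return []

    def between(self, lo, hi):
        # Inclusive range query: two binary searches, then scan the slice.
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [pk for key in self.keys[i:j] for pk in self.rows[key]]

    def ordered(self, descending=False):
        # Sorting comes for free: the keys are already in order.
        keys = reversed(self.keys) if descending else self.keys
        return [pk for key in keys for pk in self.rows[key]]


# Usage: index a hypothetical "age" column, pointing to primary keys 1..4.
idx = SortedIndex()
for pk, age in [(1, 30), (2, 25), (3, 30), (4, 41)]:
    idx.insert(age, pk)

print(idx.equal(30))        # [1, 3]
print(idx.between(26, 40))  # [1, 3]
print(idx.ordered())        # [2, 1, 3, 4]
```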
5. LUCENE INDEX
• Inverted Index
  • equality queries
  • range queries
  • fulltext search with analyzed queries
  • NO sorting
• Stored Fields
• docValues / FieldCache
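A toy inverted index in Python illustrates the first group of bullets: posting lists answer equality and analyzed fulltext queries directly. The `analyze` function and the `InvertedIndex` class are hypothetical simplifications, not Lucene's API, and range queries (which Lucene answers by walking its sorted term dictionary) are left out.

```python
import re
from collections import defaultdict


def analyze(text):
    # Minimal analyzer: lowercase, split on non-word characters.
    return [term for term in re.split(r"\W+", text.lower()) if term]


class InvertedIndex:
    def __init__(self, fulltext_fields):
        self.fulltext_fields = set(fulltext_fields)
        self.postings = defaultdict(set)  # (field, term) -> doc ids
        self.stored = {}                  # doc id -> original document ("stored fields")

    def add(self, doc_id, doc):
        self.stored[doc_id] = doc
        for field, value in doc.items():
            if field in self.fulltext_fields:
                terms = analyze(value)    # analyzed: many terms per value
            else:
                terms = [str(value)]      # not analyzed: the whole value is one term
            for term in terms:
                self.postings[(field, term)].add(doc_id)

    def term_query(self, field, value):
        # Equality query: a single posting-list lookup.
        return set(self.postings.get((field, str(value)), set()))

    def match(self, field, query):
        # Fulltext query: analyze the query the same way, intersect posting lists.
        lists = [self.postings.get((field, t), set()) for t in analyze(query)]
        return set.intersection(*lists) if lists else set()


idx = InvertedIndex(fulltext_fields=["text"])
idx.add(1, {"name": "Zaphod", "text": "this is a quite full text!"})
idx.add(2, {"name": "Arthur", "text": "Full of tea"})

print(idx.term_query("name", "Arthur"))  # {2}
print(idx.match("text", "full"))         # {1, 2}
print(idx.match("text", "FULL text"))    # {1}
```

The postings map each term to documents, but nothing maps a document back to its field value, so ORDER BY cannot be answered from this structure alone. That reverse direction is what docValues / FieldCache provide.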
6. SQL TABLE
LUCENE INDEX
CREATE TABLE t (
  id int primary key,
  name string,
  marks array(float),
  text string index using fulltext
) clustered into 5 shards
with (number_of_replicas=1)
• 1 TABLE
  • S shards, each with R replicas
  • metadata in cluster state
• 1 SHARD
  • 1 Lucene index (inverted index + stored documents/fields)
  • field mappings
  • field caches
7. SQL TABLE
LUCENE INDEX
Components
• Inverted Index
• Translog (WAL)
• “Tuple Store”: Stored Fields
• Lucene Field Data
• DocValues (on disk)
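The translog entry can be illustrated with a small write-ahead-log sketch: every operation is appended to the log before it touches the index, a commit persists the indexed data and truncates the log, and after a crash the log is replayed. The `Shard` class is hypothetical; the real translog is a checksummed binary file synced to disk, not a Python list.

```python
import json


class Shard:
    """Toy shard with a committed store, an in-memory buffer and a translog."""

    def __init__(self):
        self.committed = {}   # state persisted by the last (expensive) commit
        self.buffer = {}      # changes indexed in memory but not yet committed
        self.translog = []    # serialized operations since the last commit

    def index(self, doc_id, doc):
        # 1. Append the operation to the write-ahead log first ...
        self.translog.append(json.dumps({"op": "index", "id": doc_id, "doc": doc}))
        # 2. ... then apply it to the in-memory index.
        self.buffer[doc_id] = doc

    def flush(self):
        # Commit buffered changes; the log up to this point is no longer needed.
        self.committed.update(self.buffer)
        self.buffer.clear()
        self.translog.clear()

    def crash_and_recover(self):
        # In-memory state is lost; replaying the translog restores it.
        self.buffer = {}
        for line in self.translog:
            entry = json.loads(line)
            if entry["op"] == "index":
                self.buffer[entry["id"]] = entry["doc"]

    def get(self, doc_id):
        return self.buffer.get(doc_id, self.committed.get(doc_id))


s = Shard()
s.index(1, {"name": "Arthur"})
s.flush()                        # doc 1 committed, translog truncated
s.index(2, {"name": "Zaphod"})   # doc 2 lives only in memory + translog
s.crash_and_recover()
print(s.get(1), s.get(2))
```

Committing on every write would make each insert pay for a full persist; the log lets writes be durable cheaply while commits happen less often.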
9. SQL TABLE
LUCENE INDICES
• Differences from relational databases
  • DISTRIBUTED
  • 2 different indices needed for all operations
  • inverted index not suited for all kinds of queries
  • persistence is expensive
  • limited schema altering
  • no pull-based database cursor (yet)
10. Crate
Features
• Distributed SQL database written in Java (7)
• Accessible via HTTP and TCP (TCP for Java clients only)
• Graphical admin interface
• Blob storage
• Plugin infrastructure
• Clients available in Java, Python, Ruby, PHP, Scala, Node.js, Erlang
• Runs on Docker, AWS, GCE, Mesos, …
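As an example of the HTTP access path, the sketch below builds a request for Crate's `/_sql` endpoint with Python's standard library. It assumes a node on `localhost` listening on the default HTTP port 4200 and the JSON body shape `{"stmt": ..., "args": [...]}`; it only builds the request, so `urllib.request.urlopen(req)` would be needed to actually send it.

```python
import json
import urllib.request


def build_sql_request(stmt, args=None, host="localhost", port=4200):
    """Build (but do not send) a POST request to the HTTP SQL endpoint.

    Assumption: a node on the default HTTP port 4200 serving /_sql.
    """
    body = {"stmt": stmt}
    if args is not None:
        body["args"] = args   # values bound to the ? placeholders
    return urllib.request.Request(
        f"http://{host}:{port}/_sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response_body):
    # The response lists column names in "cols" and result tuples in "rows".
    result = json.loads(response_body)
    return [dict(zip(result["cols"], row)) for row in result["rows"]]


req = build_sql_request("SELECT id, name FROM t WHERE id = ?", [42])
print(req.get_method(), req.full_url)
```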
12. Crate
SQL
• Subset of ANSI SQL with extensions
  • arrays and nested objects
  • different types
• Information Schema
• Cluster and node state exposed via tables
• Partitioned tables
• Speaks JDBC, ODBC, SQLAlchemy, ActiveRecord, PHP-PDO/DBAL
14. Crate
SQL … NOT
• JOINs underway
• No subselects or foreign key constraints yet
• No sessions, no client cursor
• No transactions
15. CRATE - A RELATIONAL DATABASE
• Relational Algebra
  • SQL statement -> tree of relational operators
  • mostly tables == leaves
• ES
  • single-table operations only
• No simple SQL wrapper around the ES Query DSL
SELECT
  id,
  substr(name, 4),
  id % 2 as "EVEN",
  text,
  marks
FROM t
WHERE
  name IS NOT NULL
  AND match(text, 'full')
ORDER BY id DESC
LIMIT 10;
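The idea that a statement becomes a tree of relational operators with tables as leaves can be sketched with Python iterators. The operator classes are hypothetical and do not mirror Crate's planner; `match(text, 'full')` is approximated by a whitespace token check, and `substr(name, 4)` (1-based) becomes `name[3:]`.

```python
from itertools import islice


class Table:
    """Leaf operator: yields the rows of a toy in-memory table."""
    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        return iter(self.rows)


class Filter:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def __iter__(self):
        return (row for row in self.child if self.predicate(row))


class OrderBy:
    def __init__(self, child, key, descending=False):
        self.child, self.key, self.descending = child, key, descending

    def __iter__(self):
        return iter(sorted(self.child, key=self.key, reverse=self.descending))


class Limit:
    def __init__(self, child, n):
        self.child, self.n = child, n

    def __iter__(self):
        return islice(self.child, self.n)


class Project:
    def __init__(self, child, fn):
        self.child, self.fn = child, fn

    def __iter__(self):
        return (self.fn(row) for row in self.child)


rows = [  # hypothetical contents of table t
    {"id": 1, "name": "Arthur", "text": "full moon", "marks": [1.0]},
    {"id": 2, "name": None, "text": "full stop", "marks": []},
    {"id": 3, "name": "Zaphod", "text": "half empty", "marks": [2.5]},
    {"id": 42, "name": "42 - Zaphod", "text": "this is a quite full text!", "marks": [1.5, 4.6]},
]

# Operator tree for the query above; the table is the only leaf.
plan = Project(
    Limit(
        OrderBy(
            Filter(Table(rows),
                   lambda r: r["name"] is not None and "full" in r["text"].lower().split()),
            key=lambda r: r["id"], descending=True),
        10),
    lambda r: (r["id"], r["name"][3:], r["id"] % 2, r["text"], r["marks"]),
)
print(list(plan))
```

Each operator only knows its child, so iterating the root pulls rows down through the tree to the `Table` leaf.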
16. Querying crate
• Query Engine
  • node-based query execution
  • operates directly on the Lucene indices
  • circumventing ES query execution
17. SQL TABLE
LUCENE INDEX
INSERT INTO t (id, name, marks, text)
VALUES (
  42,
  format('%d - %s', 42, 'Zaphod'),
  [1.5, 4.6],
  'this is a quite full text!')
ON DUPLICATE KEY UPDATE
  name = 'DUPLICATE';
• INSERT INTO
  • insert values are validated against their configured types
  • types are guessed for new columns
  • primary key and routing values are extracted
  • a JSON _source is created:
    {"id": 42, "name": "42 - Zaphod", "marks": [1.5, 4.6], "text": "this is a quite full text!"}
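The INSERT steps above can be mimicked in a few lines of Python. The schema dictionary, the crude `guess_type` function and routing by primary key are assumptions for illustration; Crate's real type system and its type guessing are richer than this.

```python
import json

PRIMARY_KEY = "id"


def guess_type(value):
    # Rough type guessing for a new column; bool is checked before int
    # because bool is a subclass of int in Python.
    for candidate in (bool, int, float, str, list, dict):
        if isinstance(value, candidate):
            return candidate
    raise TypeError(f"cannot guess a type for {value!r}")


def prepare_insert(schema, columns, values):
    """Validate one row, extend the schema, and build its JSON _source.

    `schema` (column -> Python type) is updated in place for new columns.
    """
    row = dict(zip(columns, values))
    for column, value in row.items():
        expected = schema.get(column)
        if expected is None:
            schema[column] = guess_type(value)        # new column: guess its type
        elif not isinstance(value, expected):
            raise TypeError(f"column {column!r} expects {expected.__name__}")
    pk = row[PRIMARY_KEY]
    routing = pk                  # assumption: rows are routed by primary key
    source = json.dumps(row)      # the JSON document stored as _source
    return pk, routing, source


schema = {"id": int, "name": str, "marks": list, "text": str}  # hypothetical schema of t
pk, routing, source = prepare_insert(
    schema,
    ["id", "name", "marks", "text"],
    [42, "%d - %s" % (42, "Zaphod"), [1.5, 4.6], "this is a quite full text!"],
)
print(source)

# A row with an unknown column extends the schema with a guessed type.
prepare_insert(schema, ["id", "rating"], [43, 4.5])
print(schema["rating"])   # <class 'float'>
```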
18. SQL TABLE
LUCENE INDEX
{
  "id": 42,
  "name": "42 - Zaphod",
  "marks": [1.5, 4.6],
  "text": "this is a quite full text!"
}
• INSERT INTO
  • request is routed by the “id” column to the node containing the shard
  • row is stored on that shard
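Routing by the `id` column boils down to hashing the routing value and taking it modulo the number of shards. The sketch below uses CRC32 as a stand-in hash (Elasticsearch uses its own hash function), and the `shard_to_node` placement is made up for the example.

```python
import zlib


def shard_for(routing_value, number_of_shards):
    # Stand-in hash (CRC32 of the value's string form), NOT the hash function
    # Elasticsearch actually uses; only "hash modulo number of shards" matters here.
    return zlib.crc32(str(routing_value).encode("utf-8")) % number_of_shards


# Hypothetical placement of the 5 primary shards of table t on 3 nodes.
shard_to_node = {0: "node-a", 1: "node-b", 2: "node-c", 3: "node-a", 4: "node-b"}

shard = shard_for(42, 5)
print(f"row id=42 -> shard {shard} on {shard_to_node[shard]}")
```

In this scheme, changing `number_of_shards` would map most rows to a different shard, so the shard count cannot simply change once data is written.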
20. Querying crate
• Sorting and Grouping
  • inverted index not enough
  • per-document values (DocValues)
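The difference between the inverted index and DocValues can be shown with two hypothetical in-memory structures: the postings say which documents match, while a column-oriented array of per-document values supplies the sort and group keys for those documents. Document ids and values here are invented for the example.

```python
from collections import Counter

# Inverted index: term -> matching doc ids (answers "which docs match?").
postings = {"full": [0, 2, 3]}

# DocValues: doc id -> value, one column per field (answers "what is the
# value of this field for doc d?"). Hypothetical values for doc ids 0..3.
id_doc_values = [7, 3, 42, 1]
year_doc_values = [2014, 2015, 2014, 2015]


def top_n_sorted(doc_ids, doc_values, n, descending=True):
    # ORDER BY: look up each matching doc's value, then sort by it.
    return sorted(doc_ids, key=lambda d: doc_values[d], reverse=descending)[:n]


def group_count(doc_ids, doc_values):
    # GROUP BY ... count(*): bucket matching docs by their per-document value.
    return Counter(doc_values[d] for d in doc_ids)


matches = postings["full"]
print(top_n_sorted(matches, id_doc_values, 2))   # [2, 0]
print(group_count(matches, year_doc_values))     # Counter({2014: 2, 2015: 1})
```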
21. Querying crate
• “Simple” SELECT: QTF (Query Then Fetch)
  • extract fields to SELECT
  • route to shards / Lucene indices
  • open and keep a Lucene reader in the query context
  • only collect doc/row identifiers (and all fields necessary for sorting)
  • merge the separate results on the handler
  • apply limit/offset
  • fetch all fields
  • evaluate expressions
  • return Object[][]
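The query-then-fetch flow above, reduced to a single Python process: each shard returns only (sort value, doc id) pairs for its top `LIMIT + OFFSET` hits, the handler merges the sorted lists and applies offset/limit, and only the winning rows are fetched and projected. Shards are plain dicts here and the function names are hypothetical.

```python
import heapq
from itertools import islice


def query_phase(shard_docs, predicate, sort_key, limit, offset):
    # Per shard: keep only (sort value, doc id) for the top LIMIT + OFFSET hits,
    # sorted descending as for ORDER BY ... DESC.
    hits = [(sort_key(doc), doc_id) for doc_id, doc in shard_docs.items() if predicate(doc)]
    return heapq.nlargest(limit + offset, hits)


def query_then_fetch(shards, predicate, sort_key, limit, offset, project):
    # 1. Query phase on every shard (done in parallel in a real cluster).
    per_shard = [
        [(value, shard_no, doc_id)
         for value, doc_id in query_phase(docs, predicate, sort_key, limit, offset)]
        for shard_no, docs in enumerate(shards)
    ]
    # 2. Merge the already sorted shard results on the handler, apply offset/limit.
    winners = islice(heapq.merge(*per_shard, reverse=True), offset, offset + limit)
    # 3. Fetch phase: load full rows only for the winners and evaluate expressions.
    return [project(shards[shard_no][doc_id]) for _, shard_no, doc_id in winners]


shards = [{0: {"id": 5}, 1: {"id": 9}}, {0: {"id": 7}, 1: {"id": 1}}]  # hypothetical data
result = query_then_fetch(shards, lambda d: True, lambda d: d["id"],
                          limit=2, offset=1, project=lambda d: [d["id"]])
print(result)   # [[7], [5]]
```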
22. Querying crate
• INTERNAL PAGING
  • problems with big result sets / high offsets
    • need to fetch LIMIT + OFFSET rows from every shard
  • execution starts at the TOP relation
    • trickles down to the tables (Lucene indices)
  • hybrid of push- and pull-based data flow
SELECT
  id,
  substr(name, 4),
  id % 2 as "EVEN",
  text,
  marks
FROM t
WHERE
  name IS NOT NULL
  AND match(text, 'full')
ORDER BY id DESC
LIMIT 1
OFFSET 10000000;
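The cost of a high offset is easy to quantify: without paging, each of S shards must return LIMIT + OFFSET rows, so this query against the 5 shards of table t would move 5 × 10,000,001 = 50,000,005 rows to return one. The sketch below contrasts that with a lazy, page-wise merge in which the handler pulls another page from a shard only when it needs that shard's next row. The `PagedShard` cursor is hypothetical; this illustrates the pull side of the hybrid data flow, not Crate's actual paging code.

```python
import heapq
from itertools import islice


def naive_rows_fetched(num_shards, limit, offset):
    # Without paging, every shard returns LIMIT + OFFSET rows up front.
    return num_shards * (limit + offset)


class PagedShard:
    """Hypothetical shard cursor handing out its descending rows page by page."""

    def __init__(self, rows_desc, page_size):
        self.rows, self.page_size, self.fetched = rows_desc, page_size, 0

    def __iter__(self):
        for start in range(0, len(self.rows), self.page_size):
            page = self.rows[start:start + self.page_size]   # one request to the shard
            self.fetched += len(page)
            yield from page


def paged_top(shards, limit, offset):
    # heapq.merge only pulls a shard's next row when the handler needs it.
    merged = heapq.merge(*shards, reverse=True)
    return list(islice(merged, offset, offset + limit))


# Three hypothetical shards that together hold the values 0..299.
shards = [PagedShard(list(range(start, -1, -3)), page_size=10) for start in (299, 298, 297)]
top = paged_top(shards, limit=5, offset=20)
fetched = sum(s.fetched for s in shards)
print(top, fetched, naive_rows_fetched(3, 5, 20))
```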
23. Querying crate
• GROUP BY: AGGREGATIONS
  • aggregation framework developed in parallel to Elasticsearch aggregations
  • ES: 2-phase aggregations (HyperLogLog, Moving Averages, Percentiles …)
    • online algorithms on partial (mergeable) data necessary
    • https://github.com/elastic/elasticsearch/issues/4915
SELECT
  avg(temp) as avg,
  stddev(temp) as stddev,
  max(temp) as max,
  min(temp) as min,
  count(distinct temp),
  date_trunc('year', date) as year
FROM t
WHERE temp IS NOT NULL
GROUP BY date_trunc('year', date)
ORDER BY avg DESC
LIMIT 10;
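Mergeable online algorithms are what make per-shard partial aggregation possible. As one example (not necessarily what Crate implements), avg and stddev can share a partial state of count, mean and sum of squared deviations, updated per value with Welford's method and combined across shards with the pairwise formula of Chan et al. The sketch computes the population standard deviation; the per-shard temperatures are invented.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass
class MomentState:
    """Mergeable partial state for avg/stddev."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0   # sum of squared deviations from the mean

    def add(self, x):
        # Welford's online update for one value.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Pairwise combination of two partial states (Chan et al.).
        if other.n == 0:
            return MomentState(self.n, self.mean, self.m2)
        if self.n == 0:
            return MomentState(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return MomentState(n, mean, m2)

    def avg(self):
        return self.mean

    def stddev(self):
        # Population standard deviation in this sketch.
        return math.sqrt(self.m2 / self.n)


shard_values = [[20.5, 21.0, 19.5], [25.0, 18.0], [22.0]]   # hypothetical temps per shard
partials = []
for vals in shard_values:           # phase 1: one partial state per shard
    state = MomentState()
    for v in vals:
        state.add(v)
    partials.append(state)

total = reduce(MomentState.merge, partials, MomentState())   # phase 2: merge
values = [v for vals in shard_values for v in vals]
print(total.n, total.avg(), total.stddev())
```

max and min are trivially mergeable (the max of the partial maxima); the point of a state like this is that avg and stddev can be merged just as safely.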
24. Querying crate
• GROUP BY: AGGREGATIONS
  • split into 3 phases
    • partial aggregation executed on each shard in parallel
    • partial results distributed to “Reduce” nodes by hashing the group keys
    • final aggregation on handler/reducer
    • merge on handler
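The three phases can be simulated in one Python process for `avg(temp)` grouped by year. Partial states are `[sum, count]` pairs, group keys are assigned to reducers by a CRC32 hash (a stand-in, not Crate's hash), and the handler sorts and limits the final rows. All names and data are hypothetical.

```python
import zlib
from collections import defaultdict


def partial_aggregate(rows):
    # Phase 1 (each shard): group key -> [sum, count] partial state for avg(temp).
    states = defaultdict(lambda: [0.0, 0])
    for year, temp in rows:
        states[year][0] += temp
        states[year][1] += 1
    return states


def reducer_for(key, num_reducers):
    # Phase 2: hash the group key so each key lands on exactly one reducer.
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def three_phase_avg(shards, num_reducers, limit):
    reducers = [defaultdict(lambda: [0.0, 0]) for _ in range(num_reducers)]
    for rows in shards:
        for key, (s, c) in partial_aggregate(rows).items():
            state = reducers[reducer_for(key, num_reducers)][key]
            state[0] += s
            state[1] += c
    # Phase 3: reducers finalize their keys; the handler merges, sorts and limits.
    finals = [(key, s / c) for r in reducers for key, (s, c) in r.items()]
    finals.sort(key=lambda kv: kv[1], reverse=True)   # ORDER BY avg DESC
    return finals[:limit]


shard_rows = [   # hypothetical (year, temp) rows per shard
    [(2014, 10.0), (2015, 20.0)],
    [(2014, 14.0), (2016, 5.0)],
    [(2015, 22.0)],
]
print(three_phase_avg(shard_rows, num_reducers=2, limit=10))
```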
25. Querying crate
[Figure: aggregation data flow from the Shards to a REDUCER to the HANDLER]
26. Querying crate
• GROUP BY: AGGREGATIONS
  • row authority by hashing
  • splits huge datasets
  • expensive intermediate aggregation states possible (COUNT DISTINCT)
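Row authority by hashing also explains how COUNT DISTINCT can be split: if every value is owned by exactly one reducer, the reducers' distinct sets are disjoint and the handler only has to add up their sizes. The partial state is still a set that grows with the number of distinct values, which is the expensive part; approximate sketches such as HyperLogLog trade exactness for a fixed-size state. The code below is an illustrative simulation with made-up data and a stand-in CRC32 hash, not Crate's implementation.

```python
import zlib


def owner(value, num_reducers):
    # Row authority: the value's hash decides which reducer owns it.
    return zlib.crc32(repr(value).encode("utf-8")) % num_reducers


def distributed_count_distinct(shards, num_reducers):
    # Intermediate state per reducer is a set: it grows with the number of
    # distinct values, which is why COUNT DISTINCT states can get expensive.
    reducer_sets = [set() for _ in range(num_reducers)]
    for shard_values in shards:
        for v in set(shard_values):                 # shard-local de-duplication
            reducer_sets[owner(v, num_reducers)].add(v)
    # Each value has exactly one owner, so the sets are disjoint and the
    # handler can simply add up the per-reducer counts.
    per_reducer = [len(s) for s in reducer_sets]
    return sum(per_reducer), per_reducer


total, per_reducer = distributed_count_distinct([[1, 2, 2], [2, 3, 7], [7, 9, 42]], 2)
print(total, per_reducer)
```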
27. FINALLY
• GETTING RELATIONAL…
  • still in transition
  • more relational operators to come
  • JOINs are underway
  • CROSS JOINs already “work”