MariaDB and Cassandra Interoperability
1. #CASSANDRA13
Colin Charles | colin@mariadb.org | SkySQL Ab | http://mariadb.org/
@bytebot on Twitter | http://bytebot.net/blog/
2. #CASSANDRA13
whoami
*Work on MariaDB today
*Formerly of MySQL AB (acquired by Sun Microsystems)
*Worked on The Fedora Project & OpenOffice.org previously
*Monty Program Ab is a major sponsor of MariaDB
*SkySQL & Monty Program Ab have merged
*MariaDB governed by MariaDB Foundation
3. #CASSANDRA13
What we will discuss today...
*What is MariaDB?
*MariaDB Architecture
*The Cassandra Storage Engine (CassandraSE)
*Data & Command Mapping
*Use Cases
*Benchmarks
*Conclusions
4. #CASSANDRA13
What is MariaDB?
*Community-developed, feature-enhanced, backward-compatible MySQL
*Drop-in replacement for MySQL
*Shipped in many Linux distributions as a default
*Enhanced features: threadpool, table elimination, optimizer changes
(subqueries materialize!), group commit in the replication binary log,
HandlerSocket, SphinxSE, multi-source replication, dynamic columns
7. #CASSANDRA13
Dynamic Columns
*Store a different set of columns for every row in the table
*Basically a blob with handling functions (GET, CREATE, ADD, DELETE,
EXISTS, LIST, JSON)
*Dynamic columns can be nested
*You can request rows in JSON format
*You can now name dynamic columns as well
INSERT INTO tbl SET
dyncol_blob=COLUMN_CREATE("column_name", "value");
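The handling functions listed above can be sketched roughly as follows, assuming MariaDB 10.0 with named dynamic columns. The table name assets and its column names are hypothetical and exist only for illustration:

```sql
-- Hypothetical table: one ordinary column plus a blob holding the dynamic columns.
CREATE TABLE assets (
  item_name VARCHAR(32) PRIMARY KEY,
  dynamic_cols BLOB
);

-- COLUMN_CREATE builds the blob; each row can carry a different set of columns.
INSERT INTO assets VALUES
  ('shirt',  COLUMN_CREATE('color', 'blue',  'size',  'XL')),
  ('laptop', COLUMN_CREATE('color', 'black', 'price', 500));

-- COLUMN_GET needs the type to read the value as.
SELECT item_name, COLUMN_GET(dynamic_cols, 'color' AS CHAR) AS color
FROM assets;

-- COLUMN_ADD returns a new blob with the column added (or replaced).
UPDATE assets
SET dynamic_cols = COLUMN_ADD(dynamic_cols, 'warranty', '3 years')
WHERE item_name = 'laptop';

-- COLUMN_EXISTS, COLUMN_LIST and COLUMN_JSON inspect the blob.
SELECT item_name,
       COLUMN_EXISTS(dynamic_cols, 'price') AS has_price,
       COLUMN_LIST(dynamic_cols)            AS column_names,
       COLUMN_JSON(dynamic_cols)            AS as_json
FROM assets;

-- COLUMN_DELETE returns a new blob without the named column.
UPDATE assets
SET dynamic_cols = COLUMN_DELETE(dynamic_cols, 'size')
WHERE item_name = 'shirt';
```

None of this needs Cassandra; it runs against an ordinary MariaDB table.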
8. #CASSANDRA13
Cassandra background
*Distributed key/value store (limited range scan support),
optionally flexible schema (pre-defined “static” columns,
ad-hoc dynamic columns), automatic sharding/replication,
eventual consistency
*Column families are like “tables”
*Row key -> column mapping
*Supercolumns are not supported
9. #CASSANDRA13
CQL at work
cqlsh> CREATE KEYSPACE mariadbtest
... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1};
cqlsh> use mariadbtest;
cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key, data1 varchar, data2 bigint ) with compact storage;
cqlsh:mariadbtest> insert into cf1 (pk, data1,data2) values ('row1', 'data-in-cassandra', 1234);
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where pk='row1';
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
cqlsh:mariadbtest> select * from cf1 where data2=1234;
Bad Request: No indexed columns present in by-columns clause with Equal operator
cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2';
Bad Request: line 1:34 missing EOF at 'or'
10. #CASSANDRA13
CQL
*Looks like SQL at first glance
*No joins or subqueries
*No GROUP BY; ORDER BY must be able to use available indexes
*WHERE clause must represent an index lookup
*Simple goal of the Cassandra Storage Engine? Provide a “view” of
Cassandra’s data from MariaDB
11. #CASSANDRA13
Getting started
*Get MariaDB 10.0.3 from https://downloads.mariadb.org/
*Load the Cassandra plugin
- From SQL:
MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so';
- Or start it from my.cnf
[mysqld]
...
plugin-load=ha_cassandra.so
12. #CASSANDRA13
Is everything ok?
*Check to see that it is loaded - SHOW PLUGINS
MariaDB [(none)]> show plugins;
+--------------------+--------+-----------------+-----------------+---------+
| Name | Status | Type | Library | License |
+--------------------+--------+-----------------+-----------------+---------+
...
| CASSANDRA | ACTIVE | STORAGE ENGINE | ha_cassandra.so | GPL |
+--------------------+--------+-----------------+-----------------+---------+
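When the SHOW PLUGINS output is long, the same check can be narrowed down. A small sketch, assuming the plugin registers under the name CASSANDRA as in the output above:

```sql
-- Look up just the Cassandra plugin instead of scanning the full plugin list.
SELECT plugin_name, plugin_status, plugin_library
FROM information_schema.PLUGINS
WHERE plugin_name = 'CASSANDRA';

-- List the engine's system variables (for example cassandra_default_thrift_host).
SHOW VARIABLES LIKE 'cassandra%';
```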
13. #CASSANDRA13
Create an SQL table which is a view of a column family
MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113';
MariaDB [test]> create table t2 (pk varchar(36) primary key,
-> data1 varchar(60),
-> data2 bigint
-> ) engine=cassandra
-> keyspace='mariadbtest'
-> thrift_host='10.196.2.113'
-> column_family='cf1';
*thrift_host can be set per-table
*@@cassandra_default_thrift_host lets you re-point the table to a different node
dynamically, without changing the table DDL when the Cassandra IP changes
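A sketch of the re-pointing described above, assuming the table is created without a per-table thrift_host so that the global default applies. The table name t2_default and the second IP address are made up for illustration:

```sql
-- Set the default node first, then create the table without thrift_host.
set global cassandra_default_thrift_host='10.196.2.113';

create table t2_default (pk varchar(36) primary key,
  data1 varchar(60),
  data2 bigint
) engine=cassandra
  keyspace='mariadbtest'
  column_family='cf1';

-- Later the Cassandra node moves (hypothetical new address).
-- Only the variable changes; the table DDL stays as it is.
set global cassandra_default_thrift_host='10.196.2.114';
```

Whether already-open connections switch over immediately is not covered here; it is worth verifying on a test setup.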
14. #CASSANDRA13
Potential issues
*SELinux/AuditD blocks the connection
ERROR 1429 (HY000): Unable to connect to foreign data source: connect() failed: Permission denied [1]
*Disable SELinux enforcement and stop auditd: echo 0 > /selinux/enforce; service auditd stop
*Cassandra 1.2 with Column Families (CFs) without “COMPACT
STORAGE” attribute (pre-CQL3)
ERROR 1429 (HY000): Unable to connect to foreign data source: Column family cf1 not found in
keyspace mariadbtest
*Thrift-based clients no longer work; this broke Pig as well
(https://issues.apache.org/jira/browse/CASSANDRA-5234); we’ll update this
soon
15. #CASSANDRA13
Accessing Cassandra data from MariaDB
*Get data from Cassandra
MariaDB [test]> select * from t2;
+------+-------------------+-------+
| pk | data1 | data2 |
+------+-------------------+-------+
| row1 | data-in-cassandra | 1234 |
+------+-------------------+-------+
*Insert data into Cassandra
MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123);
*Ensure Cassandra sees inserted data
cqlsh:mariadbtest> select * from cf1;
pk | data1 | data2
------+-------------------+-------
row1 | data-in-cassandra | 1234
row2 | data-from-mariadb | 123
18. #CASSANDRA13
Data mapping between Cassandra and SQL
create table tbl (
pk varchar(36) primary key,
data1 varchar(60),
data2 bigint
) engine=cassandra keyspace='ks1' column_family='cf1'
*MariaDB table represents Cassandra’s Column Family
- can use any table name, column_family=... specifies CF
*Table must have a primary key
- name/type must match Cassandra’s rowkey
*Columns map to Cassandra’s static columns
- name must be same as in Cassandra, datatypes must match, can be subset of CF’s columns
19. #CASSANDRA13
Datatype mapping
Cassandra | MariaDB
----------+--------------------------------------------------------
blob      | BLOB, VARBINARY(n)
ascii     | BLOB, VARCHAR(n), use charset=latin1
text      | BLOB, VARCHAR(n), use charset=utf8
varint    | VARBINARY(n)
int       | INT
bigint    | BIGINT, TINY, SHORT
uuid      | CHAR(36) (text in MariaDB)
timestamp | TIMESTAMP (second), TIMESTAMP(6) (microsecond), BIGINT
boolean   | BOOL
float     | FLOAT
double    | DOUBLE
decimal   | VARBINARY(n)
counter   | BIGINT
20. #CASSANDRA13
Dynamic columns revisited
*Cassandra supports “dynamic column families”, can access ad-hoc
columns
create table tbl
(
rowkey type PRIMARY KEY,
column1 type,
...
dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes
) engine=cassandra keyspace=... column_family=...;
insert into tbl values (1, column_create('col1', 1, 'col2', 'value-2'));
select rowkey, column_get(dynamic_cols, 'uuidcol' as char) from tbl;
21. #CASSANDRA13
All data mapping is safe
*CassandraSE will refuse incorrect mappings (throw errors)
create table t3 (pk varchar(60) primary key, no_such_field int)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not be mapped to any field in Cassandra'
create table t3 (pk varchar(60) primary key, data1 double)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Failed to map column data1 to datatype org.apache.cassandra.db.marshal.UTF8Type'
23. #CASSANDRA13
SELECT command mapping
*MariaDB has a SQL interpreter
*CassandraSE supports lookups and scans
*Can now do:
- arbitrary WHERE clauses
- JOINs between Cassandra tables and MariaDB tables (BKA
supported)
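A rough sketch of such a join, reusing the CassandraSE table t2 from the earlier slides. The local table local_keys is hypothetical, and the optimizer settings are the ones generally needed for Batched Key Access in MariaDB; treat them as an assumption and check the documentation for your version:

```sql
-- Hypothetical local InnoDB table holding the keys we want to look up.
create table local_keys (
  pk varchar(36) primary key,
  note varchar(60)
) engine=innodb;
insert into local_keys values ('row1', 'first'), ('row2', 'second');

-- Assumed BKA settings: MRR and the BKA join cache enabled, join_cache_level >= 5.
set optimizer_switch='mrr=on,join_cache_bka=on';
set join_cache_level=6;

-- Look for "BKA join" in the Extra column to confirm batching is used.
explain
select l.note, t2.data1, t2.data2
from local_keys l join t2 on t2.pk = l.pk
where t2.data2 > 100;

-- The join itself: key lookups go to Cassandra, the data2 filter is applied by MariaDB.
select l.note, t2.data1, t2.data2
from local_keys l join t2 on t2.pk = l.pk
where t2.data2 > 100;
```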
24. #CASSANDRA13
Batched Key Access is fast!
select max(l_extendedprice) from orders, lineitem where
o_orderdate between $DATE1 and $DATE2 and
l_orderkey=o_orderkey
25. #CASSANDRA13
DML command mapping
*No SQL semantics
- INSERT overwrites rows
- UPDATE reads, then writes (have you updated what you read?)
- DELETE reads, then writes (can’t be sure if/what you’ve deleted)
*CassandraSE doesn’t make it SQL!
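To make these semantics concrete, here is a small sketch against the table t2 from the earlier slides. The values are invented, and the comments restate the behaviour described in the bullets above rather than measured output:

```sql
-- First write of the key.
insert into t2 values ('row2', 'data-from-mariadb', 123);

-- Same key again: the row is overwritten rather than rejected
-- with a duplicate-key error.
insert into t2 values ('row2', 'data-from-mariadb-v2', 456);
select * from t2 where pk='row2';

-- UPDATE and DELETE read first, then write. Nothing ties the two steps
-- together, so another client may change the row in between.
update t2 set data2 = data2 + 1 where pk='row2';
delete from t2 where pk='row2';
```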
26. #CASSANDRA13
CassandraSE use cases
*Collect massive amounts of data like web page hits
*Collect massive amounts of data from sensors
*Updates are non-conflicting
- keyed by UUIDs, timestamps
*Reads are served with one lookup
*Good for certain kinds of data (though moving from SQL entirely may be
difficult)
27. #CASSANDRA13
Access Cassandra data from SQL
*Send an update to Cassandra
- be a sensor
*Get a piece of data from Cassandra
- This webpage was last viewed by...
- Last known position of this user was...
- You are user number n of n-thousands...
28. #CASSANDRA13
From MariaDB...
*Want a table that is:
- auto-replicated
- fault-tolerant
- very fast
*Get Cassandra and create a CassandraSE table
29. #CASSANDRA13
A possibly unique use
*MariaDB ships the CONNECT storage engine (XML, ODBC, etc.)
*You can CONNECT to Oracle (via ODBC), join results from Cassandra
(via CassandraSE) and have all your results sit in InnoDB
- yes, collaboration between Oracle, Cassandra and MariaDB is
possible today
*Remember to turn on engine condition pushdown
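One way this could look, as a sketch only. The ODBC data source name, credentials, Oracle table and its columns (ID, NAME, REGION) are all hypothetical, and t2 is the CassandraSE table from the earlier slides:

```sql
-- Hypothetical Oracle table exposed through CONNECT over ODBC;
-- column definitions are left to CONNECT's table discovery.
create table ora_customers
  engine=connect table_type=odbc
  tabname='CUSTOMERS'
  connection='DSN=oracle_dsn;UID=example_user;PWD=example_password';

-- Let engines that support it evaluate WHERE conditions themselves.
set optimizer_switch='engine_condition_pushdown=on';

-- Local InnoDB table that receives the combined result.
create table combined (
  pk varchar(36) primary key,
  customer_name varchar(60),
  data1 varchar(60)
) engine=innodb;

-- Join Oracle rows with Cassandra rows and store the result in InnoDB.
insert into combined
select t2.pk, o.NAME, t2.data1
from ora_customers o join t2 on t2.pk = o.ID
where o.REGION = 'EMEA';
```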
30. #CASSANDRA13
CassandraSE non-use cases
*Huge, sift-through-all-data joins?
- use Pig
*Bulk data transfer to/from Cassandra Cluster?
- use Sqoop
*A replacement for InnoDB?
- remember no full SQL semantics, InnoDB is useful for myriad
reasons
37. #CASSANDRA13
Cassandra SE internals
*Developed against Cassandra 1.1
*Uses Thrift API
- cannot stream CQL resultset in 1.1
- cannot use secondary indexes
*Only supports AllowAllAuthenticator (Cassandra 1.2 has username/password authentication)
*In Cassandra 1.2
- “CQL Binary Protocol” with streaming
- CASSANDRA-5234: Thrift can only read CFs “WITH COMPACT STORAGE”
38. #CASSANDRA13
Running this on localhost
*Use vagrant, Ubuntu (12.04), DataStax Cassandra (1.1)
*http://julien.duponchelle.info/Cassandra-MariaDB-Virtual-Box.html
*It’s nice to be able to run this locally, but beyond testing there is
little to gain from it
39. #CASSANDRA13
Really running this (on EC2)
*Use http://www.datastax.com/docs/1.2/install/install_ami
*minimum is m1.large instance
*--clustername MyCluster --totalnodes 1 --version community