3. SkySQL
•Leading provider of open source
databases, services and
solutions
•Home for the founders and the
original developers of the core
of MySQL
•The creators of MariaDB, the
drop-off, innovative
replacement of MySQL
Friday, 3 May 13
4. What is Big Data?
http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg
Friday, 3 May 13
5. PAGE
Big Data!
Big data is a collection of data
sets so large and complex that it
becomes difficult to process
using on-hand database
management tools or traditional
data processing applications.
5
http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg
Friday, 3 May 13
6. PAGE
Big Data By Structure
6
Unstructured
•Store everything you have/you find
•In any format and shape
•You do not know how to use it, but it may
come handy
•Storing unstructured data is usually cheaper than
storing it in a more structured datastore
•Does not fit well in a relational database
•Examples:
•Text: Plain text, documents, web content,
messages
•Bitmap: Image, audio, video
•Typical approach:
•Mining, pattern recognition, tagging
•Usually batch analysis
Structured
•Store only what you need
•In a good format, ready to be used
•You should already know how to use it, or at
least what it means
•Storing structured data is quite expensive
•Raw data, indexing, denormalisation,
aggregation
•Arelational database is still the best choice
•Examples:
•Machine-Generated Data (MGD)
•Tags, counters, sales
•Typical approach:
•BI tools, reporting
•Real time analysis change data capture
Friday, 3 May 13
7. PAGE
Unstructured
•Store everything you have/you find
•In any format and shape
•You do not know how to use it, but it may
come handy
•Storing unstructured data is usually cheaper than
storing it in a more structured datastore
•Does not fit well in a relational database
•Examples:
•Text: Plain text, documents, web content,
messages
•Bitmap: Image, audio, video
•Typical approach:
•Mining, pattern recognition, tagging
•Usually batch analysis
Structured
•Store only what you need
•In a good format, ready to be used
•You should already know how to use it, or at
least what it means
•Storing structured data is quite expensive
•Raw data, indexing, denormalisation,
aggregation
•Arelational database is still the best choice
•Examples:
•Machine-Generated Data (MGD)
•Tags, counters, sales
•Typical approach:
•BI tools, reporting
•Real time analysis change data capture
Big Data By Structure
7
Friday, 3 May 13
8. PAGE
How “Big” is Big Data?
•Data Factors
•Size
•Speed to collect/
generate
•Variety
•Resources
•Administrators
•Developers
•Infrastructure
•Growth
•Collection
•Processing
•Availability
•To whom?
•For how long?
•In which format?
•Aggregated
•Detailed
8
Friday, 3 May 13
9. PAGE
How to manage Big Data
•Collection - Storage -Archive
•Load - Transform -Analyze
•Access - Explore - Utilize
9
http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it
Friday, 3 May 13
10. Big Data with MySQL
http://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg
Friday, 3 May 13
14. PAGE
MyISAM
•Static, dynamic and compressed
format
•Multiple key cache, CACHE INDEX
and LOAD INDEX
•Compressed tables
•Horizontal partitioning (manual)
•External locking
14
Friday, 3 May 13
15. PAGE
InnoDB/XtraDB
•Data Load
•Pre-order data
•Split data into chunks
•unique_checks = 0;
•foreign_key_checks = 0;
•sql_log_bin = 0;
•innodb_autoinc_lock_mode = 2;
•Compression and block size
•Persistent optimizer stats
•innodb_stats_persistent
•innodb_stats_auto_recalc
15
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = Barracuda;
CREATE TABLE t1
( c1 INT PRIMARY KEY,
c2 VARCHAR(255) )
ROW_FORMAT = COMPRESSED
KEY_BLOCK_SIZE = 8;
LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;
Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec)
Records: 134217728 Deleted: 0 Skipped: 0 Warnings: 0
LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2;
Query OK, 134217728 rows affected (25 min 20.75 sec)
Records: 134217728 Deleted: 0 Skipped: 0 Warnings: 0
Friday, 3 May 13
16. PAGE
Partitioning (MySQL 5.6)
•Partitioning Types
•RANGE, LIST, RANGE COLUMN,
HASH, LINEAR HASH, KEY LINEAR
KEY, sub-partitions
•Partition and lock pruning
•Use of INDEX and DATA
DIRECTORY
•PARTITIONADD, DROP,
REORGANIZE, COALESCE,
TRUNCATE, EXCHANGE,
REBUILD, OPTIMIZE, CHECK,
ANALYZE, REPAIR
16
CREATE TABLE t1 ( c1 INT, c2 DATE )
PARTITION BY RANGE( YEAR( c2 ) )
SUBPARTITION BY HASH ( TO_DAYS( c2 ) )
( PARTITION p0 VALUES LESS THAN (1990) (
SUBPARTITION s0
DATA DIRECTORY = '/disk0/data'
INDEX DIRECTORY = '/disk0/idx',
SUBPARTITION s1
DATA DIRECTORY = '/disk1/data'
INDEX DIRECTORY = '/disk1/idx' ),...
ALTER TABLE t1
EXCHANGE PARTITION p3 WITH TABLE t2;
-- Range and List partitions
ALTER TABLE t1 REORGANIZE PARTITION
p0,p1,p2,p3 INTO (
PARTITION m0 VALUES LESS THAN (1980),
PARTITION m1 VALUES LESS THAN (2000));
-- Hash and Key partitions
ALTER TABLE t1 COALESCE PARTITION 10;
ALTER TABLE t1 ADD PARTITION PARTITIONS 5;
Friday, 3 May 13
17. PAGE
MariaDB Optimizer
•Multi-Range Read (MRR)*
•Index Merge / Sort intersection
•Batch KeyAccess*
•Block hash join
•Cost-based choice of range vs.
index_merge
•ORDER BY ... LIMIT <limit>*
•MariaDB 10
•Subqueries
•Semi-join*
•Materialization*
•subquery cache
•LIMIT ... ROWS EXAMINED
<limit>
17
(*) - Available in MySQL 5.6
Friday, 3 May 13
18. PAGE
Virtual & Dynamic Columns
VIRTUAL COLUMNS
•For InnoDB, MyISAM andAria
•PERSISTENT (stored) or VIRTUAL
(generated)
18
CREATE TABLE t1 (
c1 INT NOT NULL,
c2 VARCHAR(32),
c3 INT AS
( c1 MOD 10 ) VIRTUAL,
c4 VARCHAR(5) AS
( LEFT(B,5) ) PERSISTENT);
DYNAMIC COLUMNS
•Implement a schemaless,
document store
•COLUMN_ CREATE,ADD, GET, LIST,
JSON, EXISTS, CHECK, DELETE
•Nested colums are allowed
•Main datatypes are allowed
•Max 1GB documents
CREATE TABLE assets (
item_name VARCHAR(32) PRIMARY KEY,
dynamic_cols BLOB );
INSERT INTO assets VALUES (
'MariaDB T-shirt',
COLUMN_CREATE( 'color', 'blue',
'size', 'XL' ) );
INSERT INTO assets VALUES (
'Thinkpad Laptop',
COLUMN_CREATE( 'color', 'black',
'price', 500 ) );
Friday, 3 May 13
19. PAGE
Cassandra Storage Engine
•Column Family == Table
•Rowkey, static and dynamic
columns allowed
•Batch key access support
SET cassandra_default_thrift_host =
'192.168.0.10'
CREATE TABLE cassandra_tbl (
rowkey INT PRIMARY KEY,
col1 VARCHAR(25),
col2 BIGINT,
dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes )
ENGINE = cassandra
KEYSPACE = 'cassandra_key_space'
COLUMN_FAMILY = 'column_family_name';
19
Friday, 3 May 13
20. PAGE
Connect Storage Engine
•Any file format as MySQLTABLE:
•ODBC
•Text, XML, *ML
•Excel,Access etc.
•MariaDB CREATE TABLE options
•Multi-file table
•TableAutocreation
•Condition push down
•Read/Write and Multi Storage Engine Join
•CREATE INDEX
20
CREATE TABLE handout
ENGINE = CONNECT
TABLE_TYPE = XML
FILE_NAME = 'handout.htm'
HEADER = yes OPTION_LIST =
'name = TABLE,
coltype = HTML,
attribute =
(border=1;cellpadding=5)';
Friday, 3 May 13
25. PAGE
Join us at the Solutions Day
•Cassandra and Connect Storage Engine
•Map/Reduce approach - Proxy optimisation
•Multiple protocols and more
25
Friday, 3 May 13