Real-Time Loading from MySQL to Hadoop
Featuring Continuent Tungsten
Robert Hodges, CEO
©Continuent 2014

Introducing Continuent

• The leading provider of clustering and replication for open source DBMS
• Our Product: Continuent Tungsten
  • Clustering - Commercial-grade HA, performance scaling and data management for MySQL
  • Replication - Flexible, high-performance data movement

Quick Continuent Facts

• Largest Tungsten installation processes over 700 million transactions daily on 225 terabytes of data
• Tungsten Replicator was Application of the Year at the 2011 MySQL User Conference
• Wide variety of topologies including MySQL, Oracle, Vertica, and MongoDB are in production now
• MySQL to Hadoop deployments are now in progress with multiple customers

Selected Continuent Customers

Five Minute Hadoop Introduction

What Is Hadoop, Exactly?

a. A distributed file system
b. A method of processing massive quantities of data in parallel
c. The Cutting family’s stuffed elephant
d. All of the above

Hadoop Distributed File System

[Diagram: clients (the hadoop command, Java clients, Hive, Pig) find a file through the NameNode (directory) and read its block(s) from the DataNodes (replicated data).]

Map/Reduce

Input block 1:
Acme,2013,4.75
Spitze,2013,25.00
Acme,2013,55.25
Excelsior,2013,1.00
Spitze,2013,5.00

MAP output for block 1:
Acme,60.00
Excelsior,1.00
Spitze,30.00

Input block 2:
Spitze,2014,60.00
Spitze,2014,9.50
Acme,2014,1.00
Acme,2014,4.00
Excelsior,2014,1.00
Excelsior,2014,9.00

MAP output for block 2:
Acme,5.00
Excelsior,10.00
Spitze,69.50

REDUCE output:
Acme,65.00
Excelsior,11.00
Spitze,99.50

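The arithmetic on the Map/Reduce slide can be reproduced with a few lines of plain Python. This is only a sketch of the idea, not Hadoop code: the function names map_block and reduce_totals are hypothetical, and the per-block summing follows the slide's MAP label.

```python
from collections import defaultdict

# Input blocks copied from the slide: company, year, amount.
BLOCK_1 = """Acme,2013,4.75
Spitze,2013,25.00
Acme,2013,55.25
Excelsior,2013,1.00
Spitze,2013,5.00"""

BLOCK_2 = """Spitze,2014,60.00
Spitze,2014,9.50
Acme,2014,1.00
Acme,2014,4.00
Excelsior,2014,1.00
Excelsior,2014,9.00"""


def map_block(text):
    """Sum amounts per company within one input block (MAP on the slide)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        company, _year, amount = line.split(",")
        totals[company] += float(amount)
    return dict(totals)


def reduce_totals(partials):
    """Combine the per-block sums into one total per company (REDUCE)."""
    totals = defaultdict(float)
    for partial in partials:
        for company, amount in partial.items():
            totals[company] += amount
    return dict(sorted(totals.items()))


partials = [map_block(BLOCK_1), map_block(BLOCK_2)]
final_totals = reduce_totals(partials)
for company, amount in final_totals.items():
    print(f"{company},{amount:.2f}")
```

Each block can be mapped independently, which is what lets Hadoop run the MAP step in parallel on the nodes that hold the data.
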
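Typical MySQL to Hadoop Use Case

[Diagram: transaction processing feeds a Hadoop cluster that is queried through Hive (analytics). Open questions on the transaction processing side: App changes? App load? Latency? Open questions on the Hadoop side: Initial load? Changes? Materialized views?]
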
Options for Loading Data

[Diagram: three options for loading data: manual loading via CSV files, Sqoop, and Tungsten Replicator.]

Comparing Methods in Detail

Method | Manual via CSV | Sqoop | Tungsten Replicator
Process | Manual/Scripted | Manual/Scripted | Fully automated
Incremental Loading | Possible with DDL changes | Requires DDL changes | Fully supported
Latency | Full-load | Intermittent | Real-time
Extraction Requirements | Full table scan | Full and partial table scans | Low-impact binlog scan

Replicating MySQL Data to Hadoop using Tungsten Replicator

What is Tungsten Replicator?

A real-time, high-performance, open source database replication engine

GPL V2 license - 100% open source
Download from https://code.google.com/p/tungsten-replicator/
Annual support subscription available from Continuent

“Golden Gate® without the Price Tag”

Tungsten Replicator Overview

[Diagram: the master replicator extracts transactions from the DBMS logs and stores them in the THL (transactions + metadata); the slave replicator receives the THL (transactions + metadata) and applies the transactions.]

Tungsten Replicator 3.0 and Hadoop

• Extract from MySQL or Oracle
• Base Hadoop plus commercial distributions: Cloudera and Hortonworks
• Provision using Sqoop or parallel extraction
• Automatic replication of incremental changes
• Transformation to preferred HDFS formats
• Schema generation for Hive
• Tools for generating materialized views

Basic MySQL to Hadoop Replication

[Diagram: MySQL (binlog_format=row) writes the MySQL binlog; the Tungsten master replicator extracts from the MySQL binlog; the Tungsten slave replicator writes CSV files and loads the raw CSV to HDFS (e.g., via LOAD DATA to Hive) on the Hadoop cluster; the data is accessed via Hive.]

Master-Side Filtering
* pkey - Fill in pkey info
* colnames - Fill in names
* cdc - Add update type and schema/table info
* source - Add source DBMS
* replicate - Subset tables to be replicated

Hadoop Data Loading - Gory Details

[Diagram: the replicator receives transactions from the master and writes data to CSV files; a JavaScript load script (e.g. hadoop.js) loads the CSV files into staging “tables” using the hadoop command; table definitions are generated for the staging tables and the base tables; a Map/Reduce run turns the staging tables into base tables (materialized views).]

Demo #1: Replicating sysbench data

Viewing MySQL Data in Hadoop

Generating Staging Table Schema

$ ddlscan -template ddl-mysql-hive-0.10-staging.vm \
    -user tungsten -pass secret \
    -url jdbc:mysql:thin://logos1:3306/db01 -db db01
...
DROP TABLE IF EXISTS db01.stage_xxx_sbtest;

CREATE EXTERNAL TABLE db01.stage_xxx_sbtest
(
  tungsten_opcode STRING,
  tungsten_seqno INT,
  tungsten_row_id INT,
  id INT,
  k INT,
  c STRING,
  pad STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/tungsten/staging/db01/sbtest';

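To make the staging layout concrete, here is a minimal Python sketch that parses one line of a staging file into the columns declared in the staging table DDL. It assumes fields are separated by the \001 character and ignores escape handling; the StagingRow type, the parse_staging_line helper, and the sample values (including the opcode value "I") are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

FIELD_SEP = "\x01"  # the '\001' field terminator used by the staging table


class StagingRow(NamedTuple):
    """Columns in the order declared by the staging table DDL."""
    tungsten_opcode: str
    tungsten_seqno: int
    tungsten_row_id: int
    id: int
    k: int
    c: str
    pad: str


def parse_staging_line(line: str) -> StagingRow:
    """Split one staging line on the field separator (no escape handling)."""
    opcode, seqno, row_id, id_, k, c, pad = line.rstrip("\n").split(FIELD_SEP)
    return StagingRow(opcode, int(seqno), int(row_id), int(id_), int(k), c, pad)


# Hypothetical sample line; real files are written by the replicator.
sample = FIELD_SEP.join(["I", "42", "1", "7", "100", "some-c", "some-pad"]) + "\n"
row = parse_staging_line(sample)
print(row)
```

The three tungsten_* columns carry the change metadata; the remaining columns mirror the source table.
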
Generating Base Table Schema

$ ddlscan -template ddl-mysql-hive-0.10.vm -user tungsten \
    -pass secret -url jdbc:mysql:thin://logos1:3306/db01 -db db01
...
DROP TABLE IF EXISTS db01.sbtest;

CREATE TABLE db01.sbtest
(
  id INT,
  k INT,
  c STRING,
  pad STRING)
;

Creating a Materialized View in Theory

Input: Log #1, Log #2, ..., Log #N
MAP: Sort by key(s), transaction order
REDUCE: Emit last row per key if not a delete

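The sort-then-keep-last rule can be sketched in a few lines of Python. This is an illustration of the idea only, not the tungsten-reduce script. It assumes change rows carry an opcode of "I" (insert) or "D" (delete) and that an update arrives as a delete followed by an insert; the materialize function and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def materialize(rows):
    """Return the current rows: last change per key, dropped if it is a delete."""
    # MAP: sort by key, then by transaction order (seqno, row_id).
    ordered = sorted(rows, key=itemgetter("id", "seqno", "row_id"))
    view = []
    # REDUCE: for each key, look only at the final change.
    for _key, group in groupby(ordered, key=itemgetter("id")):
        last = list(group)[-1]
        if last["opcode"] != "D":
            view.append({"id": last["id"], "k": last["k"]})
    return view


# Hypothetical change log: id 1 is inserted then updated, id 2 is inserted then deleted.
log = [
    {"opcode": "I", "seqno": 1, "row_id": 1, "id": 1, "k": 10},
    {"opcode": "I", "seqno": 2, "row_id": 1, "id": 2, "k": 20},
    {"opcode": "D", "seqno": 3, "row_id": 1, "id": 1, "k": 10},
    {"opcode": "I", "seqno": 3, "row_id": 2, "id": 1, "k": 11},
    {"opcode": "D", "seqno": 4, "row_id": 1, "id": 2, "k": 20},
]
view = materialize(log)
print(view)
```

Sorting on seqno and row_id within each key is what makes the last row the most recent state, regardless of which log file a change came from.
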
Creating a Materialized View in Hive

$ hive
...
hive> ADD FILE /home/rhodges/github/continuent-tools-hadoop/bin/tungsten-reduce;
hive> FROM (
  SELECT sbx.*
  FROM db01.stage_xxx_sbtest sbx
  DISTRIBUTE BY id
  SORT BY id,tungsten_seqno,tungsten_row_id
) map1
INSERT OVERWRITE TABLE db01.sbtest
SELECT TRANSFORM(
  tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad)
USING 'perl tungsten-reduce -k id -c tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad'
AS id INT,k INT,c STRING,pad STRING;
...

The subquery (DISTRIBUTE BY, SORT BY) is the MAP step; the TRANSFORM ... USING clause is the REDUCE step.

Comparing MySQL and Hadoop Data

$ export TUNGSTEN_EXT_LIBS=/usr/lib/hive/lib
...
$ /opt/continuent/tungsten/bristlecone/bin/dc \
    -url1 jdbc:mysql:thin://logos1:3306/db01 \
    -user1 tungsten -password1 secret \
    -url2 jdbc:hive2://localhost:10000 \
    -user2 'tungsten' -password2 'secret' -schema db01 \
    -table sbtest -verbose -keys id \
    -driver org.apache.hive.jdbc.HiveDriver
22:33:08,093 INFO DC - Data comparison utility
...
22:33:24,526 INFO Tables compare OK

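A key-based comparison of two tables can be illustrated with a short Python sketch. This is not how the dc utility is implemented; it only shows the kind of check a comparison on a key column implies. The compare_tables function and the sample rows are hypothetical.

```python
def compare_tables(source_rows, target_rows, key="id"):
    """Compare two row sets by key and report the differences."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "extra_in_target": sorted(target.keys() - source.keys()),
        "different": sorted(k for k in shared if source[k] != target[k]),
    }


def tables_match(report):
    """True when no key is missing, extra, or different."""
    return not any(report.values())


# Hypothetical rows standing in for the MySQL table and the Hive view.
mysql_rows = [{"id": 1, "k": 11}, {"id": 3, "k": 30}]
hive_rows = [{"id": 1, "k": 11}, {"id": 3, "k": 31}, {"id": 4, "k": 40}]

report = compare_tables(mysql_rows, hive_rows)
print(report)
print("Tables compare OK" if tables_match(report) else "Tables differ")
```

Keying both sides on the primary key keeps the check independent of row order, which matters because Hive does not return rows in MySQL's order.
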
Doing it all at once

$ git clone https://github.com/continuent/continuent-tools-hadoop.git
$ cd continuent-tools-hadoop
$ bin/load-reduce-check \
    -U jdbc:mysql:thin://logos1:3306/db01 \
    -s db01 --verbose

Demo #2: Constructing and Checking a Materialized View

Scaling It Up!

MySQL to Hadoop Fan-In Architecture

[Diagram: three MySQL masters, each with a master replicator service (m1, m2, m3) reading row-based replication (RBR) logs, fan in to slave replicator services (m1, m2, m3) that load a single Hadoop cluster (many nodes).]

Integration with Provisioning

[Diagram: an initial provisioning run loads MySQL data into the Hadoop cluster via Sqoop/ETL; MySQL (binlog_format=row) writes the MySQL binlog, which the Tungsten master reads and the Tungsten slave loads into the Hadoop cluster as CSV files; the data is accessed via Hive.]

On-Demand Provisioning via Parallel Extract

[Diagram: MySQL (binlog_format=row) writes the MySQL binlog; the Tungsten master replicator extracts from MySQL tables; the Tungsten slave replicator writes CSV files and loads the raw CSV to HDFS (e.g., via LOAD DATA to Hive) on the Hadoop cluster; the data is accessed via Hive.]

Master-Side Filtering
* pkey - Fill in pkey info
* colnames - Fill in names
* cdc - Add update type and schema/table info
* source - Add source DBMS
* replicate - Subset tables to be replicated
(other filters as needed)

Tungsten Replicator Roadmap

• Parallel CSV file loading
• Partition loaded data by commit time
• Data formats and tools to support additional Hadoop clients as well as HBase
• Replication out of Hadoop
• Integration with emerging real-time analytics based on HDFS (Impala, Spark/Shark, Stinger,...)

Getting Started with Continuent Tungsten

Where Is Everything?

• Tungsten Replicator 3.0 builds are now available on code.google.com
  http://code.google.com/p/tungsten-replicator/
• Replicator 3.0 documentation is available on the Continuent website
  http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html
• Tungsten Hadoop tools are available on GitHub
  https://github.com/continuent/continuent-tools-hadoop

Contact Continuent for support

Commercial Terms

• Replicator features are open source (GPL V2)
• Investment Elements
  • POC / Development (Walk Away Option)
  • Production Deployment
  • Annual Support Subscription
• Governing Principles
  • Annual Subscription Required
  • More Upfront Investment - Less Annual Subscription

We Do Clustering Too!

Tungsten clusters combine off-the-shelf open source MySQL servers into data services with:

• 24x7 data access
• Scaling of load on replicas
• Simple management commands

...without app changes or data migration

[Diagram: GonzoPortal.com apache/php application servers reach the cluster through Connectors; hosted in Amazon US West.]

In Conclusion: Tungsten Offers...

• Fully automated, real-time replication from MySQL into Hadoop
• Support for automatic transformation to HDFS data formats and creation of full materialized views
• Positions users to take advantage of evolving real-time features in Hadoop

560 S. Winchester Blvd., Suite 500
San Jose, CA 95128
Tel +1 (866) 998-3642
Fax +1 (408) 668-1009
e-mail: sales@continuent.com

Our Blogs:
http://scale-out-blog.blogspot.com
http://mcslp.wordpress.com
http://www.continuent.com/news/blogs

Continuent Web Page:
http://www.continuent.com

Tungsten Replicator 2.0:
http://code.google.com/p/tungsten-replicator
More Related Content

What's hot

Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsContinuent
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayDataWorks Summit
 
Set Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle ReplicationSet Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle ReplicationContinuent
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataYan Wang
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith SharmaNewton Alex
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
New features in Pig 0.11
New features in Pig 0.11New features in Pig 0.11
New features in Pig 0.11Hortonworks
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Guy Harrison
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraDataWorks Summit
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hiverxu
 
Geographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL ClustersGeographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL ClustersContinuent
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Business-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirBusiness-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirContinuent
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Manish Chopra
 

What's hot (20)

Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native way
 
Set Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle ReplicationSet Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle Replication
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure Data
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
New features in Pig 0.11
New features in Pig 0.11New features in Pig 0.11
New features in Pig 0.11
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native Era
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Geographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL ClustersGeographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL Clusters
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Business-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirBusiness-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud Air
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3
 

Similar to Real-Time Loading from MySQL to Hadoop with Continuent Tungsten

Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftContinuent
 
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopActian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopDataWorks Summit
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Continuent
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Continuent
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideDanairat Thanabodithammachari
 
Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseHenk van der Valk
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentContinuent
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseSankar H
 
Hadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High AvailabilityHadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High AvailabilityAlex Dorman
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Edureka!
 
CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 

Similar to Real-Time Loading from MySQL to Hadoop with Continuent Tungsten (20)

Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon Redshift
 
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopActian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guide
 
Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL Polybase
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Hadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High AvailabilityHadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High Availability
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Couchbase Day
Couchbase DayCouchbase Day
Couchbase Day
 
מיכאל
מיכאלמיכאל
מיכאל
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 

More from Continuent

Tungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondTungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondContinuent
 
Continuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS AuroraWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS AuroraContinuent
 
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Continuent
 
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Continuent
 
Webinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent FailoverWebinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent FailoverContinuent
 
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Continuent
 
Training Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten DashboardTraining Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten DashboardContinuent
 
Training Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & KafkaTraining Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & KafkaContinuent
 
Training Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data WarehousesTraining Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data WarehousesContinuent
 
Training Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a ClusterTraining Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a ClusterContinuent
 
Training Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMITraining Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMIContinuent
 
Training Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMITraining Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMIContinuent
 
Training Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProTraining Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProContinuent
 
Training Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & TroubleshootingTraining Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & TroubleshootingContinuent
 
Training Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSLTraining Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSLContinuent
 


Real-Time Loading from MySQL to Hadoop with Continuent Tungsten

  • 1. Real-Time Loading from MySQL to Hadoop Featuring Continuent Tungsten Robert Hodges, CEO ©Continuent 2014
  • 3. Introducing Continuent • The leading provider of clustering and replication for open source DBMS • Our Product: Continuent Tungsten • Clustering - Commercial-grade HA, performance scaling and data management for MySQL • Replication - Flexible, high-performance data movement
  • 4. Quick Continuent Facts • Largest Tungsten installation processes over 700 million transactions daily on 225 terabytes of data • Tungsten Replicator was application of the year at the 2011 MySQL User Conference • Wide variety of topologies including MySQL, Oracle, Vertica, and MongoDB are in production now • MySQL to Hadoop deployments are now in progress with multiple customers
  • 7. What Is Hadoop, Exactly? a. A distributed file system b. A method of processing massive quantities of data in parallel c. The Cutting family’s stuffed elephant d. All of the above
  • 8. Hadoop Distributed File System [Diagram: clients (hadoop command, Java client, Hive, Pig) find a file through the NameNode (directory) and read its block(s) from the DataNodes (replicated data)]
  • 10. Typical MySQL to Hadoop Use Case [Diagram: transaction processing feeding a Hadoop cluster that is queried through Hive (analytics). Open questions on the slide: Initial load? Changes? Materialized views? App changes? App load? Latency?]
  • 11. Options for Loading Data [Diagram: three loading paths: manual loading via CSV files, Sqoop, and Tungsten Replicator]
  • 12. Comparing Methods in Detail
      Columns: Manual via CSV | Sqoop | Tungsten Replicator
      Process: Manual/Scripted | Manual/Scripted | Fully automated
      Incremental Loading: Possible with DDL changes | Requires DDL changes | Fully supported
      Latency: Full-load | Intermittent | Real-time
      Extraction Requirements: Full table scan | Full and partial table scans | Low-impact binlog scan
  • 13. Replicating MySQL Data to Hadoop using Tungsten Replicator
  • 14. What is Tungsten Replicator? A real-time, high-performance, open source database replication engine. GPL V2 license - 100% open source. Download from https://code.google.com/p/tungsten-replicator/. Annual support subscription available from Continuent. “Golden Gate® without the Price Tag”
  • 15. Tungsten Replicator Overview [Diagram: the master replicator extracts transactions from the DBMS logs into its THL (transactions + metadata); the slave replicator keeps its own THL (transactions + metadata) and applies the transactions from it]
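The extract/apply flow on slide 15 can be sketched in a few lines of Python. This is only an illustrative model, not Tungsten's implementation: the names `ThlEvent`, `Thl`, `extract`, and `apply` are hypothetical, a plain list stands in for the DBMS log, and a single in-memory THL stands in for the separate master and slave THLs on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ThlEvent:
    seqno: int      # position in the log; defines transaction order
    metadata: dict  # e.g. source table (illustrative only)
    rows: list      # row changes carried by the transaction


@dataclass
class Thl:
    """Hypothetical in-memory stand-in for a transaction history log."""
    events: list = field(default_factory=list)

    def append(self, metadata, rows):
        event = ThlEvent(seqno=len(self.events), metadata=metadata, rows=rows)
        self.events.append(event)
        return event.seqno

    def read_from(self, seqno):
        return self.events[seqno:]


def extract(dbms_log, thl):
    """Master side: copy each logged transaction into the THL."""
    for txn in dbms_log:
        thl.append(txn["metadata"], txn["rows"])


def apply(thl, target, last_applied=-1):
    """Slave side: apply events after last_applied and return the new position."""
    for event in thl.read_from(last_applied + 1):
        target.extend(event.rows)
        last_applied = event.seqno
    return last_applied


dbms_log = [
    {"metadata": {"table": "db01.sbtest"}, "rows": [("I", 1)]},
    {"metadata": {"table": "db01.sbtest"}, "rows": [("I", 2), ("D", 1)]},
]
thl, target = Thl(), []
extract(dbms_log, thl)
position = apply(thl, target)
```

Because the applier remembers the last sequence number it applied, calling `apply(thl, target, position)` again does nothing until new events are extracted.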
  • 16. Tungsten Replicator 3.0 Hadoop • Extract from MySQL or Oracle • Base Hadoop plus commercial distributions: Cloudera and HortonWorks • Provision using Sqoop or parallel extraction • Automatic replication of incremental changes • Transformation to preferred HDFS formats • Schema generation for Hive • Tools for generating materialized views
  • 17. Basic MySQL to Hadoop Replication
      [Diagram: MySQL (binlog_format=row) -> MySQL binlog -> Tungsten master replicator (extract from MySQL binlog, master-side filtering) -> Tungsten slave replicator -> CSV files -> load raw CSV to HDFS (e.g., via LOAD DATA to Hive) -> Hadoop cluster, accessed via Hive]
      Master-side filtering:
      * pkey - Fill in pkey info
      * colnames - Fill in names
      * cdc - Add update type and schema/table info
      * source - Add source DBMS
      * replicate - Subset tables to be replicated
  • 18. Hadoop Data Loading - Gory Details [Diagram: the replicator receives transactions from the master and writes the data to CSV files; a JavaScript load script (e.g. hadoop.js) loads them with the hadoop command into staging "tables"; table definitions are generated for the staging and base tables; a Map/Reduce run turns staging data into base tables and materialized views]
  • 19. Demo #1: Replicating sysbench data
  • 20. Viewing MySQL Data in Hadoop
  • 21. Generating Staging Table Schema
      $ ddlscan -template ddl-mysql-hive-0.10-staging.vm \
          -user tungsten -pass secret \
          -url jdbc:mysql:thin://logos1:3306/db01 -db db01
      ...
      DROP TABLE IF EXISTS db01.stage_xxx_sbtest;

      CREATE EXTERNAL TABLE db01.stage_xxx_sbtest
      (
        tungsten_opcode STRING,
        tungsten_seqno INT,
        tungsten_row_id INT,
        id INT,
        k INT,
        c STRING,
        pad STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE LOCATION '/user/tungsten/staging/db01/sbtest';
  • 22. Generating Base Table Schema
      $ ddlscan -template ddl-mysql-hive-0.10.vm -user tungsten \
          -pass secret -url jdbc:mysql:thin://logos1:3306/db01 -db db01
      ...
      DROP TABLE IF EXISTS db01.sbtest;

      CREATE TABLE db01.sbtest
      (
        id INT,
        k INT,
        c STRING,
        pad STRING)
      ;
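To make the staging layout on slide 21 concrete, here is a small Python sketch of how one change row might look as a line in a staging file. It assumes the field separator is the Ctrl-A byte (octal 001, Hive's default field delimiter), a backslash escape character, and newline-terminated lines; the helper names `write_staging_line` and `read_staging_line` are hypothetical and are not part of Tungsten.

```python
import csv
import io

# Column order taken from the staging table definition on slide 21.
STAGING_COLUMNS = [
    "tungsten_opcode", "tungsten_seqno", "tungsten_row_id",
    "id", "k", "c", "pad",
]

# Assumed file format: Ctrl-A separated, backslash escaped, no quoting.
CSV_OPTIONS = dict(
    delimiter="\x01", escapechar="\\",
    quoting=csv.QUOTE_NONE, lineterminator="\n",
)


def write_staging_line(row):
    """Serialize one change row (a dict) into a staging-file line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, **CSV_OPTIONS)
    writer.writerow([row[name] for name in STAGING_COLUMNS])
    return buffer.getvalue()


def read_staging_line(line):
    """Parse a staging-file line back into a dict of strings."""
    values = next(csv.reader(io.StringIO(line), **CSV_OPTIONS))
    return dict(zip(STAGING_COLUMNS, values))


row = {
    "tungsten_opcode": "I", "tungsten_seqno": 42, "tungsten_row_id": 1,
    "id": 7, "k": 100, "c": "text", "pad": "pad",
}
line = write_staging_line(row)
parsed = read_staging_line(line)
```

The three `tungsten_*` columns travel with every row, which is what later lets a reducer order changes and tell inserts from deletes.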
  • 23. Creating a Materialized View in Theory [Diagram: Log #1, Log #2, ..., Log #N -> MAP: sort by key(s), transaction order -> REDUCE: emit last row per key if not a delete]
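The map and reduce steps on slide 23 can be sketched in plain Python. This is a minimal in-memory illustration, not the `tungsten-reduce` script: the function name `materialize` is hypothetical, and it assumes the opcode values are "I" for an insert and "D" for a delete.

```python
from itertools import groupby
from operator import itemgetter

# Bookkeeping columns that are dropped from the final view.
CHANGE_COLUMNS = ("opcode", "seqno", "row_id")


def materialize(staging_rows, key="id"):
    """Return the latest surviving row per key from a list of change rows."""
    # MAP: sort by key, then by transaction order (seqno, row_id).
    ordered = sorted(staging_rows, key=itemgetter(key, "seqno", "row_id"))
    view = []
    # REDUCE: keep only the last change per key, unless it is a delete.
    for _, changes in groupby(ordered, key=itemgetter(key)):
        last = list(changes)[-1]
        if last["opcode"] != "D":  # assumed delete marker
            view.append({c: v for c, v in last.items() if c not in CHANGE_COLUMNS})
    return view


rows = [
    {"opcode": "I", "seqno": 1, "row_id": 1, "id": 1, "k": 10},
    {"opcode": "I", "seqno": 2, "row_id": 1, "id": 2, "k": 20},
    {"opcode": "D", "seqno": 3, "row_id": 1, "id": 1, "k": 10},
    {"opcode": "I", "seqno": 3, "row_id": 2, "id": 1, "k": 11},
    {"opcode": "D", "seqno": 4, "row_id": 1, "id": 2, "k": 20},
]
view = materialize(rows)
```

Here key 1 survives with its latest value, because its final change is an insert, while key 2 disappears, because its final change is a delete.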
  • 24. Creating a Materialized View in Hive
      $ hive
      ...
      hive> ADD FILE /home/rhodges/github/continuent-tools-hadoop/bin/tungsten-reduce;
      hive> FROM (
              SELECT sbx.*
              FROM db01.stage_xxx_sbtest sbx
              DISTRIBUTE BY id
              SORT BY id,tungsten_seqno,tungsten_row_id
            ) map1
            INSERT OVERWRITE TABLE db01.sbtest
            SELECT TRANSFORM(
              tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad)
            USING 'perl tungsten-reduce -k id -c tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad'
            AS id INT,k INT,c STRING,pad STRING;
      ...
      [Slide callouts label the inner SELECT as MAP and the TRANSFORM as REDUCE]
  • 25. Comparing MySQL and Hadoop Data
      $ export TUNGSTEN_EXT_LIBS=/usr/lib/hive/lib
      ...
      $ /opt/continuent/tungsten/bristlecone/bin/dc \
          -url1 jdbc:mysql:thin://logos1:3306/db01 \
          -user1 tungsten -password1 secret \
          -url2 jdbc:hive2://localhost:10000 \
          -user2 'tungsten' -password2 'secret' -schema db01 \
          -table sbtest -verbose -keys id \
          -driver org.apache.hive.jdbc.HiveDriver
      22:33:08,093 INFO DC - Data comparison utility
      ...
      22:33:24,526 INFO Tables compare OK
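The idea behind a keyed table comparison like the one on slide 25 can be sketched as follows. This is not how the `dc` utility is implemented; `compare_tables` is a hypothetical helper that assumes both tables fit in memory as lists of dicts and share a single key column.

```python
def compare_tables(source_rows, target_rows, key="id"):
    """Report keys that are missing, extra, or different in the target."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    return {
        # Present in the source but absent from the target.
        "missing": sorted(source.keys() - target.keys()),
        # Present in the target but absent from the source.
        "extra": sorted(target.keys() - source.keys()),
        # Present in both, but with differing column values.
        "different": sorted(
            k for k in source.keys() & target.keys() if source[k] != target[k]
        ),
    }


mysql_rows = [{"id": 1, "k": 11}, {"id": 2, "k": 20}]
hive_rows = [{"id": 1, "k": 11}, {"id": 2, "k": 21}, {"id": 3, "k": 5}]
report = compare_tables(mysql_rows, hive_rows)
tables_match = not any(report.values())
```

In this sample the tables do not match: key 2 differs and key 3 exists only on the Hive side.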
  • 26. Doing it all at once
      $ git clone \
          https://github.com/continuent/continuent-tools-hadoop.git
      $ cd continuent-tools-hadoop
      $ bin/load-reduce-check \
          -U jdbc:mysql:thin://logos1:3306/db01 \
          -s db01 --verbose
  • 27. Demo #2: Constructing and Checking a Materialized View
  • 29. MySQL to Hadoop Fan-In Architecture [Diagram: three masters using row-based replication (RBR), each with its own master replicator (m1, m2, m3), feed slave-side replication for m1, m2, and m3, which loads a single Hadoop cluster (many nodes)]
  • 30. Integration with Provisioning [Diagram: an initial provisioning run copies MySQL data into the Hadoop cluster via Sqoop/ETL; ongoing changes flow MySQL (binlog_format=row) -> MySQL binlog -> Tungsten master -> Tungsten slave -> CSV files -> Hadoop cluster, accessed via Hive]
  • 31. On-Demand Provisioning via Parallel Extract
      [Diagram: MySQL (binlog_format=row) -> Tungsten master replicator (extract from MySQL tables, master-side filtering) -> Tungsten slave replicator -> CSV files -> load raw CSV to HDFS (e.g., via LOAD DATA to Hive) -> Hadoop cluster, accessed via Hive]
      Master-side filtering:
      * pkey - Fill in pkey info
      * colnames - Fill in names
      * cdc - Add update type and schema/table info
      * source - Add source DBMS
      * replicate - Subset tables to be replicated
      (other filters as needed)
  • 32. Tungsten Replicator Roadmap • Parallel CSV file loading • Partition loaded data by commit time • Data formats and tools to support additional Hadoop clients as well as HBase • Replication out of Hadoop • Integration with emerging real-time analytics based on HDFS (Impala, Spark/Shark, Stinger,...)
  • 33. Getting Started with Continuent Tungsten
  • 34. Where Is Everything? • Tungsten Replicator 3.0 builds are now available on code.google.com: http://code.google.com/p/tungsten-replicator/ • Replicator 3.0 documentation is available on the Continuent website: http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html • Tungsten Hadoop tools are available on GitHub: https://github.com/continuent/continuent-tools-hadoop • Contact Continuent for support
  • 35. Commercial Terms • Replicator features are open source (GPL V2) • Investment Elements: POC / Development (Walk Away Option); Production Deployment; Annual Support Subscription • Governing Principles: Annual Subscription Required; More Upfront Investment - Less Annual Subscription
  • 36. We Do Clustering Too! Tungsten clusters combine off-the-shelf open source MySQL servers into data services with: • 24x7 data access • Scaling of load on replicas • Simple management commands ...without app changes or data migration [Diagram: GonzoPortal.com apache/php servers reach a cluster in Amazon US West through two Connectors]
  • 37. In Conclusion: Tungsten Offers... • Fully automated, real-time replication from MySQL into Hadoop • Support for automatic transformation to HDFS data formats and creation of full materialized views • Positions users to take advantage of evolving real-time features in Hadoop
  • 38. 560 S. Winchester Blvd., Suite 500, San Jose, CA 95128. Tel +1 (866) 998-3642, Fax +1 (408) 668-1009, e-mail: sales@continuent.com. Our Blogs: http://scale-out-blog.blogspot.com, http://mcslp.wordpress.com, http://www.continuent.com/news/blogs. Continuent Web Page: http://www.continuent.com. Tungsten Replicator 2.0: http://code.google.com/p/tungsten-replicator