Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
MySQL Applier for Apache Hadoop
Real-Time Event Streaming to HDFS
Mats Kindahl
Neha Kumari
Shubhangi Garg
2013-09-21
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended
for information purposes only, and may not be incorporated into any contract.
It is not a commitment to deliver any material, code, or functionality, and
should not be relied upon in making purchasing decisions. The development,
release, and timing of any features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Presentation Outline
● Why Big Data?
● Working with Big Data
● MySQL Applier for Hadoop
● Road map
Why Big Data?
Traditional Approach
● Reporting
  ● Predefined data
  ● Viewing history
  ● Past occurrences
  ● Using sales data
  ● Typically in a database
Big Data
● Analytics
  ● Data mining
  ● Predicting the future
  ● Trends
  ● Using all available data
    ● Sales
    ● Click stream
    ● Likes/Tweets
Why Big Data?
● Web Recommendations
● Sentiment Analysis
● Marketing Campaign Analysis
● Customer Churn Modeling
● Fraud Detection
● Research and Development
● Risk Modeling
● Machine Learning
90% with pilot projects at end of 2012
Poor data costs 35% in annual revenues
10% improvement in data usability drives $2bn in revenue
Source: http://wikibon.org/blog/big-data-statistics/
Why Hadoop?
● Scales to thousands of nodes
● Combines data from multiple sources
● Handles unstructured data
● Runs queries against all of the data
● Runs on commodity servers
● Easy to set up
● Affordable
● Fault-tolerant
  ● File block replication
  ● Self-healing
● Map/Reduce
  ● Distributed processing model
  ● Good for large data sets
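To make the Map/Reduce processing model concrete, here is a minimal single-process sketch in Python: a map step emits (key, value) pairs, a shuffle groups them by key, and a reduce step combines each group. It only illustrates the programming model; Hadoop runs these phases distributed across many nodes, and the function names and the click-stream format used here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Shuffle buffer: all values emitted for one key end up in one list.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: combine the values of each key into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Example job: count page views per URL from click-stream lines "user url".
def page_view_mapper(line: str) -> Iterable[Tuple[str, int]]:
    _user, url = line.split()
    yield url, 1


def count_reducer(_url: str, counts: List[int]) -> int:
    return sum(counts)


clicks = ["u1 /home", "u2 /cart", "u1 /home", "u3 /home"]
print(map_reduce(clicks, page_view_mapper, count_reducer))  # {'/home': 3, '/cart': 1}
```

Because the reducer only sees one key's values at a time, Hadoop can run many reducers in parallel on different keys; that independence is what lets the model scale to large data sets.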
Example Use-Case: On-Line Retail
[Diagram: data produced on an on-line retail site (customers, browsing, page views, web logs, comments, brands “liked”, preferences, purchases, purchase history) feeding recommendations and updates.]
Big Data Lifecycle
[Diagram: big data lifecycle with the stages Acquire, Organize, Analyze, and Decide; the Applier is shown as part of the cycle.]
Hadoop Tools: In the Lifecycle
● Apache Sqoop
● MySQL Applier for Hadoop
● Apache Flume
● Apache Drill
● Apache Hive
● Apache Pig
Hadoop Tools: Apache Sqoop
● Apache top-level project
● Part of the Hadoop ecosystem
● Originally developed by Cloudera
● Bulk data import and export
  ● Between Hadoop HDFS and external data stores
● Supports a JDBC connector architecture
  ● Supports plug-ins for specific functionality
  ● “Fast-path” connector for MySQL
Hadoop Tools: Apache Sqoop
[Diagram: several Sqoop jobs running in parallel to transfer data into a Hadoop cluster.]
Hadoop Tools: Apache Flume
● Apache top-level project
● Part of the Hadoop ecosystem
● Collects log data
  ● Various sources: Avro, Thrift, Syslog, Netcat
  ● Can aggregate and consolidate data
● Data typically sent to HDFS
  ● Can store data in other “sinks” as well
Hadoop Tools: Apache Flume
[Diagram: a Flume agent passes events from a Source through a Channel to a Sink, which writes them to HDFS.]
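The Source → Channel → Sink flow can be sketched in a few lines of Python. This is not Flume's API: it is a single-process illustration, assuming a bounded `queue.Queue` as the channel, a thread as the sink, and a plain list standing in for HDFS. The bounded channel is the point of the design: it decouples the rate at which events arrive from the rate at which they are written.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # sentinel that tells the sink the source has finished


def source(events: Iterable[str], channel: queue.Queue) -> None:
    # Source: accepts incoming events (for example syslog lines) and puts them on the channel.
    for event in events:
        channel.put(event)  # blocks while the channel is full, throttling the source
    channel.put(_END)


def hdfs_sink(channel: queue.Queue, store: List[str]) -> None:
    # Sink: drains the channel and "writes" each event; a list stands in for HDFS.
    while True:
        event = channel.get()
        if event is _END:
            return
        store.append(event)


def run_agent(events: Iterable[str], capacity: int = 10) -> List[str]:
    channel: queue.Queue = queue.Queue(maxsize=capacity)  # bounded buffer between source and sink
    store: List[str] = []
    sink_thread = threading.Thread(target=hdfs_sink, args=(channel, store))
    sink_thread.start()
    source(events, channel)
    sink_thread.join()
    return store


print(run_agent(["boot", "login", "logout"], capacity=2))  # ['boot', 'login', 'logout']
```

A real Flume channel can also be durable (file-backed), so events survive an agent restart; the in-memory queue here corresponds only to the simplest case.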
New Tool: MySQL Applier for Hadoop
● Using Binlog API
  ● Proof of concept
● Replication from MySQL to HDFS
  ● Exploit replication protocol
  ● Read server binary log
● Fetches changes from MySQL
  ● Using Binary Log API
  ● Row-based replication
  ● Caveat: DDL not handled
● Stores changes into HDFS
  ● Consumable by other tools
  ● Caveat: only row inserts
  ● Considering update/delete
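The fetch side described above can be pictured as a filter over decoded binary log events. The Python sketch below is not the Binlog API; it uses a hypothetical, simplified `BinlogEvent` record whose type names follow MySQL's event type names, and it keeps only inserted rows, skipping DDL, updates, and deletes as the caveats on this slide describe.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class BinlogEvent:
    # Hypothetical, simplified event; the real Applier decodes events through the Binlog API.
    type_code: str   # e.g. "WRITE_ROWS_EVENT", "UPDATE_ROWS_EVENT", "QUERY_EVENT"
    timestamp: int   # seconds since the epoch, taken from the event header
    table: str       # "database.table" the event applies to
    rows: List[Tuple] = field(default_factory=list)


def inserted_rows(events: Iterable[BinlogEvent]) -> Iterator[Tuple[str, int, Tuple]]:
    for event in events:
        if event.type_code == "WRITE_ROWS_EVENT":
            for row in event.rows:
                yield event.table, event.timestamp, row
        # All other events are skipped: QUERY_EVENT (which carries DDL under
        # row-based replication), UPDATE_ROWS_EVENT, and DELETE_ROWS_EVENT.


events = [
    BinlogEvent("QUERY_EVENT", 1379361600, "test.tbl"),
    BinlogEvent("WRITE_ROWS_EVENT", 1379361681, "test.tbl", [(23456, "Sanjai", "Feldhoffer")]),
    BinlogEvent("UPDATE_ROWS_EVENT", 1379361690, "test.tbl", [(23456, "Sanjay", "Feldhoffer")]),
]
print(list(inserted_rows(events)))  # [('test.tbl', 1379361681, (23456, 'Sanjai', 'Feldhoffer'))]
```

Row-based replication is what makes this possible: each row event carries the actual column values, so nothing has to re-execute SQL statements to learn what changed.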
New Tool: MySQL Applier for Hadoop
[Diagram: the MySQL Applier for Hadoop receives binary log events through the Binlog API, decodes each row into a timestamp, primary key, and data, and writes it to HDFS through libhdfs.]
MySQL Applier for Hadoop: Requirements
● MySQL 5.6 or later
  ● Available at http://dev.mysql.com/downloads/mysql
● MySQL Applier for Hadoop
  ● Available at http://labs.mysql.com
● Apache Hadoop 1.0.4 or later
  ● Available at http://hadoop.apache.org/releases.html
● Apache Hive or another Hadoop tool for analysis
  ● Available at http://hive.apache.org/releases.html
MySQL Applier for Hadoop: Mapping Rows
● A timestamp column is added as the first column of each row
● The timestamp is taken from the binary log event
MySQL:
INSERT INTO test.tbl VALUES
  (23456,'Sanjai','Feldhoffer'),
  (23457,'Manohar','Kakkar'),
  (23458,'Christ','Kalefeld'),
  (23459,'Gretta','Varker'),
  (23460,'Masato','Steinauer'),
  (23461,'Baruch','Uchoa');
HDFS:
1379361681,23456,Sanjai,Feldhoffer
1379361685,23457,Manohar,Kakkar
1379361692,23458,Christ,Kalefeld
1379361693,23459,Gretta,Varker
1379361699,23460,Masato,Steinauer
1379361703,23461,Baruch,Uchoa
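The mapping from an inserted row to a line in HDFS can be sketched as follows. This Python version is an illustration, not the Applier's code: `format_row` and `format_rows` are hypothetical names, and the defaults assume a comma field delimiter and a newline row delimiter, matching the output on this slide.

```python
from typing import Iterable, Sequence, Tuple


def format_row(timestamp: int, values: Sequence[object],
               field_delimiter: str = ",", row_delimiter: str = "\n") -> str:
    # The binlog event timestamp becomes the first field, followed by the
    # row's column values in table order. Values are NOT escaped, so a value
    # containing the field delimiter would corrupt the line.
    fields = [str(timestamp)] + [str(value) for value in values]
    return field_delimiter.join(fields) + row_delimiter


def format_rows(rows: Iterable[Tuple[int, Sequence[object]]], **delimiters: str) -> str:
    # Concatenate many (timestamp, values) pairs into one chunk of file content.
    return "".join(format_row(ts, values, **delimiters) for ts, values in rows)


chunk = format_rows([
    (1379361681, (23456, "Sanjai", "Feldhoffer")),
    (1379361685, (23457, "Manohar", "Kakkar")),
])
print(chunk, end="")
# 1379361681,23456,Sanjai,Feldhoffer
# 1379361685,23457,Manohar,Kakkar
```

Putting the timestamp first keeps every line self-describing: a later reader can order changes to the same row without consulting the binary log again.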
MySQL Applier for Hadoop: Using Hive
● Does not handle DDL
● Create the Hive table manually, mirroring the MySQL table with the timestamp column first
● The Applier's field and row delimiters can be controlled:
  --field-delimiter
  --row-delimiter
MySQL (SQL):
CREATE TABLE tbl (
  user_id INT PRIMARY KEY,
  first CHAR(60), last CHAR(60)
);
Hive (HDFS):
CREATE TABLE tbl (
  ts INT,
  user_id INT,
  first STRING, last STRING
) ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;
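The Hive table only reads the Applier's files correctly if both sides agree on the delimiters and column order. The Python sketch below reads delimited text back using the column types of the Hive table above; `read_delimited` and `HIVE_COLUMNS` are hypothetical names. Hive's own text format is more forgiving (it typically yields NULL for fields it cannot parse), whereas this sketch raises an error so that a delimiter mismatch is easy to spot.

```python
from typing import Callable, List, Sequence, Tuple

# Column converters for the Hive table above: ts INT, user_id INT, first STRING, last STRING.
HIVE_COLUMNS: Sequence[Callable[[str], object]] = (int, int, str, str)


def read_delimited(text: str,
                   columns: Sequence[Callable[[str], object]] = HIVE_COLUMNS,
                   field_delimiter: str = ",",
                   row_delimiter: str = "\n") -> List[Tuple[object, ...]]:
    rows: List[Tuple[object, ...]] = []
    for line in text.split(row_delimiter):
        if not line:
            continue  # skip the empty piece after the final row delimiter
        fields = line.split(field_delimiter)
        if len(fields) != len(columns):
            # A wrong delimiter usually shows up as a wrong field count.
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}: {line!r}")
        rows.append(tuple(convert(value) for convert, value in zip(columns, fields)))
    return rows


print(read_delimited("1379361681,23456,Sanjai,Feldhoffer\n"))
# [(1379361681, 23456, 'Sanjai', 'Feldhoffer')]
```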
MySQL Applier for Hadoop
● Start the MySQL Applier for Hadoop:
happlier --field-delimiter=, \
  mysql://root@example.com hdfs://example.com:9000
● Inserts are written to files in the warehouse directory
  ● Default: /user/hive/warehouse
● MySQL table: test.tbl
  HDFS file: /user/hive/warehouse/test.db/tbl/datafile1.txt
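The table-to-file mapping in the example can be expressed as a small path function. This is a sketch under stated assumptions: `hdfs_data_file` is a hypothetical name, the `<warehouse>/<database>.db/<table>/` layout follows the example above, and how the number in `datafileN.txt` advances is not described on the slide, so it is simply a parameter here.

```python
import posixpath


def hdfs_data_file(table: str, file_number: int = 1,
                   warehouse: str = "/user/hive/warehouse") -> str:
    # "test.tbl" -> database "test", table "tbl"; the database gets a
    # "<database>.db" directory and each table its own subdirectory.
    database, name = table.split(".", 1)
    # "datafileN.txt" follows the slide's example; the numbering scheme is assumed.
    return posixpath.join(warehouse, f"{database}.db", name, f"datafile{file_number}.txt")


print(hdfs_data_file("test.tbl"))  # /user/hive/warehouse/test.db/tbl/datafile1.txt
```

Writing into the directory Hive already uses for the table means a Hive table created with the same name picks up the files without any separate load step.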
MySQL Applier for Hadoop: Update and Delete?
● Batch import using Sqoop
  ● Transfers all data each time
  ● If changes are small, bandwidth is wasted
● Incremental import using the Applier
  ● Only changes are imported
  ● Bandwidth is used efficiently
  ● … but what about updates and deletes?
[Diagram: the Applier streaming changes into a Hadoop rack.]
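A quick calculation shows why small change sets make full batch transfers wasteful. The numbers below are hypothetical, and the sketch assumes fixed-size rows and ignores protocol overhead and the extra timestamp column.

```python
from typing import Tuple


def bytes_per_sync(total_rows: int, changed_rows: int, row_bytes: int) -> Tuple[int, int]:
    # Batch import (full copy): every row is transferred again.
    batch = total_rows * row_bytes
    # Incremental import: only the rows changed since the last sync are transferred.
    incremental = changed_rows * row_bytes
    return batch, incremental


# Hypothetical: 10 million rows of ~100 bytes, 5,000 of which changed since the last sync.
batch, incremental = bytes_per_sync(10_000_000, 5_000, 100)
print(batch, incremental, batch // incremental)  # 1000000000 500000 2000
```

The ratio is simply total rows divided by changed rows, so the smaller the fraction of rows that change between syncs, the larger the saving from importing only changes.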
MySQL Applier for Hadoop: Update and Delete?
● Problem:
  ● HDFS is append-only
  ● Inserted rows are appended to the file
  ● How can rows be updated or deleted?
● Idea:
  ● Updated rows are also appended to the file
  ● Rows have a primary key
  ● Each row contains the after-image and the timestamp of the update
  ● For each primary key, pick the row with the latest timestamp
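The "latest timestamp per primary key" idea can be sketched in Python over rows shaped like the Hive table (ts, user_id, first, last). `latest_per_key` is a hypothetical name. One assumption worth stating: binlog timestamps have one-second resolution, so two updates in the same second tie; this sketch then keeps whichever row was appended last, which assumes file order reflects commit order.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int, str, str]  # (ts, user_id, first, last), as in the Hive table


def latest_per_key(rows: Iterable[Row], key_index: int = 1, ts_index: int = 0) -> List[Row]:
    latest: Dict[object, Row] = {}
    for row in rows:
        key = row[key_index]
        current = latest.get(key)
        # Keep the after-image with the newest timestamp; on a tie the later-appended row wins.
        if current is None or row[ts_index] >= current[ts_index]:
            latest[key] = row
    return list(latest.values())


appended = [
    (100, 1, "Sanjai", "Feldhoffer"),
    (101, 2, "Manohar", "Kakkar"),
    (105, 1, "Sanjay", "Feldhoffer"),  # hypothetical update of user 1
]
print(latest_per_key(appended))
# [(105, 1, 'Sanjay', 'Feldhoffer'), (101, 2, 'Manohar', 'Kakkar')]
```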
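MySQL Applier for Hadoop: Update and Delete?
[Diagram: the Applier writing timestamped rows into a Hadoop rack.]
● Timestamped rows to HDFS
● After-image for updates
● Flag deletes
● Customized HiveQL queries
SELECT …
FROM tbl t
JOIN (SELECT key, MAX(ts) AS max_ts
      FROM tbl
      GROUP BY key) latest
  ON t.key = latest.key AND t.ts = latest.max_ts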
MySQL Applier for Hadoop: Update and Delete?
[Diagram: the Applier writes dirty files into the Hadoop rack; a cleaning job reads them and writes clean files.]
● Timestamped rows to HDFS
● After-image for updates
● Flag deletes
● Special “cleaning” job
  ● Reads dirty files
  ● Writes clean files
  ● Moving data inside the rack uses bandwidth efficiently
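A cleaning job combines the latest-image-per-key rule with delete flags. The slides do not specify how deletes are flagged, so this Python sketch assumes a hypothetical dirty-line layout `ts,flag,key,col,...` with `D` marking a delete (the `I`/`U` flags in the example are equally hypothetical). It writes clean lines in the `ts,key,col,...` layout of the Hive table. A real cleaning job would run inside the cluster, for example as a MapReduce job keyed on the primary key; this single-process version only shows the per-key logic.

```python
from typing import Dict, List

DELETE_FLAG = "D"  # hypothetical marker; the slide only says deletes are flagged


def clean(dirty_text: str, delimiter: str = ",") -> str:
    """Compact dirty lines "ts,flag,key,col,..." into clean lines "ts,key,col,..."."""
    latest: Dict[str, List[str]] = {}
    for line in dirty_text.splitlines():
        if not line:
            continue  # ignore blank lines
        fields = line.split(delimiter)
        ts, key = int(fields[0]), fields[2]
        seen = latest.get(key)
        # Keep the newest image per key; on equal timestamps the later line wins.
        if seen is None or ts >= int(seen[0]):
            latest[key] = fields
    clean_lines = []
    for fields in latest.values():
        if fields[1] == DELETE_FLAG:
            continue  # newest image is a delete: the row disappears from the clean file
        clean_lines.append(delimiter.join([fields[0]] + fields[2:]) + "\n")  # drop the flag
    return "".join(clean_lines)


dirty = (
    "100,I,1,Sanjai,Feldhoffer\n"
    "101,I,2,Manohar,Kakkar\n"
    "105,U,1,Sanjay,Feldhoffer\n"
    "110,D,2,Manohar,Kakkar\n"
)
print(clean(dirty), end="")  # 105,1,Sanjay,Feldhoffer
```

After cleaning, queries against the clean files no longer need the join on the latest timestamp, which is the trade-off the cleaning job buys: extra work inside the rack in exchange for simpler, cheaper reads.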
MySQL and Hadoop: Resources and Information
● MySQL and Hadoop: Guide to Big Data Integration
  http://www.mysql.com/why-mysql/white-papers/mysql-and-hadoop-guide-to-big-data-integration
● MySQL Applier for Hadoop
  http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html
● Developer Blogs
  ● Mats Kindahl: http://mysqlmusings.blogspot.com
  ● Shubhangi Garg: http://innovating-technology.blogspot.in
  ● Neha Kumari: http://nehakumari19.blogspot.in
Thank you!

Mais conteúdo relacionado

Mais procurados

Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemDataWorks Summit/Hadoop Summit
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...spinningmatt
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Mich Talebzadeh (Ph.D.)
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache TezGetInData
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2DataWorks Summit
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
 
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...Sergey Lukjanov
 
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...SpringPeople
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013alanfgates
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hiverxu
 

Mais procurados (20)

Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - At...
 
October 2014 HUG : Hive On Spark
October 2014 HUG : Hive On SparkOctober 2014 HUG : Hive On Spark
October 2014 HUG : Hive On Spark
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache Tez
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
 
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 

Destaque

Set Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into HadoopSet Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into HadoopContinuent
 
Anatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale InternalsAnatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale InternalsIvan Zoratti
 
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0Continuent
 
Sharding and Scale-out using MySQL Fabric
Sharding and Scale-out using MySQL FabricSharding and Scale-out using MySQL Fabric
Sharding and Scale-out using MySQL FabricMats Kindahl
 
High-Availability using MySQL Fabric
High-Availability using MySQL FabricHigh-Availability using MySQL Fabric
High-Availability using MySQL FabricMats Kindahl
 
Unlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLUnlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLMatt Lord
 
Building Scalable High Availability Systems using MySQL Fabric
Building Scalable High Availability Systems using MySQL FabricBuilding Scalable High Availability Systems using MySQL Fabric
Building Scalable High Availability Systems using MySQL FabricMats Kindahl
 
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More FlexibilityNOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More FlexibilityIvan Zoratti
 
MariaDB ColumnStore - LONDON MySQL Meetup
MariaDB ColumnStore - LONDON MySQL MeetupMariaDB ColumnStore - LONDON MySQL Meetup
MariaDB ColumnStore - LONDON MySQL MeetupIvan Zoratti
 

Destaque (9)

Set Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into HadoopSet Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into Hadoop
 
Anatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale InternalsAnatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale Internals
 
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
 
Sharding and Scale-out using MySQL Fabric
Sharding and Scale-out using MySQL FabricSharding and Scale-out using MySQL Fabric
Sharding and Scale-out using MySQL Fabric
 
High-Availability using MySQL Fabric
High-Availability using MySQL FabricHigh-Availability using MySQL Fabric
High-Availability using MySQL Fabric
 
Unlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLUnlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQL
 
Building Scalable High Availability Systems using MySQL Fabric
Building Scalable High Availability Systems using MySQL FabricBuilding Scalable High Availability Systems using MySQL Fabric
Building Scalable High Availability Systems using MySQL Fabric
 
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More FlexibilityNOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
 
MariaDB ColumnStore - LONDON MySQL Meetup
MariaDB ColumnStore - LONDON MySQL MeetupMariaDB ColumnStore - LONDON MySQL Meetup
MariaDB ColumnStore - LONDON MySQL Meetup
 

Semelhante a MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS

2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverviewjdijcks
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...EMC
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataPentaho
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleHarald Erb
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...TheInevitableCloud
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderainevitablecloud
 
Replicate data between environments
Replicate data between environmentsReplicate data between environments
Replicate data between environmentsDLT Solutions
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenDataWorks Summit
 
DBA to Data Scientist
DBA to Data ScientistDBA to Data Scientist
DBA to Data Scientistpasalapudi
 
MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014
MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014
MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014Dave Stokes
 
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014Dave Stokes
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data InfrastructureTrivadis
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...
Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...
Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...Milomir Vojvodic
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 

Semelhante a MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS (20)

2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview2013 05 Oracle big_dataapplianceoverview
2013 05 Oracle big_dataapplianceoverview
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by Example
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
 
Replicate data between environments
Replicate data between environmentsReplicate data between environments
Replicate data between environments
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
 
DBA to Data Scientist
DBA to Data ScientistDBA to Data Scientist
DBA to Data Scientist
 
MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014
MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014
MySQL 5.7 New Features to Exploit -- PHPTek/Chicago MySQL User Group May 2014
 
Glusterfs and Hadoop
Glusterfs and HadoopGlusterfs and Hadoop
Glusterfs and Hadoop
 
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data Infrastructure
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...
Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...
Oracle real time replica solution (Oracle GoldenGate) in Telco and FSI vertic...
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Geode Meetup Apachecon
Geode Meetup ApacheconGeode Meetup Apachecon
Geode Meetup Apachecon
 

Mais de Mats Kindahl

Elastic Scalability in MySQL Fabric Using OpenStack
Elastic Scalability in MySQL Fabric Using OpenStackElastic Scalability in MySQL Fabric Using OpenStack
Elastic Scalability in MySQL Fabric Using OpenStackMats Kindahl
 
MySQL Fabric: Easy Management of MySQL Servers
MySQL Fabric: Easy Management of MySQL ServersMySQL Fabric: Easy Management of MySQL Servers
MySQL Fabric: Easy Management of MySQL ServersMats Kindahl
 
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal ScalingMySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal ScalingMats Kindahl
 

MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS

  • 1. MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
    Mats Kindahl, Neha Kumari, Shubhangi Garg
    2013-09-21
    Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
  • 2. Safe Harbor Statement
    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 3. Presentation Outline
    ● Why Big Data?
    ● Working with Big Data
    ● MySQL Applier for Hadoop
    ● Road map
  • 4. Why Big Data?
    Traditional Approach:
    ● Reporting
      ○ Predefined data
    ● Viewing history
      ○ Past occurrences
    ● Using sales data
      ○ Typically in database
    Big Data:
    ● Analytics
      ○ Data mining
    ● Predicting future
      ○ Trends
    ● Using all available data
      ○ Sales
      ○ Click stream
      ○ Likes/Tweets
  • 5. Why Big Data?
    ● Web Recommendations
    ● Sentiment Analysis
    ● Marketing Campaign Analysis
    ● Customer Churn Modeling
    ● Fraud Detection
    ● Research and Development
    ● Risk Modeling
    ● Machine Learning
    Statistics (source: http://wikibon.org/blog/big-data-statistics/):
    ● 90% with pilot projects at end of 2012
    ● Poor data costs 35% in annual revenues
    ● 10% improvement in data usability drives $2bn in revenue
  • 6. Why Hadoop?
    ● Scales to thousands of nodes
    ● Combines data from multiple sources
    ● Handles unstructured data
    ● Runs queries against all of the data
    ● Runs on commodity servers
      ○ Easy to set up
      ○ Affordable
    ● Fault-tolerant
      ○ File block replication
      ○ Self-healing
    ● Map/Reduce
      ○ Distributed processing model
      ○ Good for large data sets
  • 7. Example Use-Case: On-Line Retail
    Diagram labels: Browsing, Recommendations, Updates, Preferences, Brands “Liked”, Web Logs, Page Views, Comments, Customers, Purchase History, Purchases
  • 8. Big Data Lifecycle
    Diagram: a cycle of Acquire, Organize, Analyze, and Decide, with the Applier marked on it
  • 9. Hadoop Tools: In the Lifecycle
    Diagram: Apache Sqoop, MySQL Applier for Hadoop, Apache Flume, Apache Drill, Apache Hive, and Apache Pig placed on the lifecycle
  • 10. Hadoop Tools: Apache Sqoop
    ● Apache top-level project
    ● Part of the Hadoop ecosystem
    ● Originally developed by Cloudera
    ● Bulk data import and export
      ○ Between Hadoop HDFS and external data stores
    ● Supports a JDBC connector architecture
      ○ Supports plug-ins for specific functionality
      ○ “Fast-path” connector for MySQL
  • 11. Hadoop Tools: Apache Sqoop
    Diagram: several Sqoop jobs transferring data into a Hadoop cluster
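The parallel Sqoop jobs in the diagram each import a slice of the table. By default Sqoop splits on the primary key (or on the column given with `--split-by`): it reads the low and high values and gives each map task an evenly sized part of that range. The function below is a minimal sketch of that splitting for an integer key; the name `split_key_range` and its exact boundary arithmetic are assumptions for illustration, not Sqoop's own code.

```python
from typing import List, Tuple


def split_key_range(low: int, high: int, num_splits: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: divide the inclusive key range [low, high] into at most
    num_splits contiguous, non-overlapping ranges of near-equal size."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = high - low + 1
    if total <= 0:
        return []  # empty table or inverted bounds: nothing to import
    num_splits = min(num_splits, total)  # never create empty splits
    base, extra = divmod(total, num_splits)
    ranges = []
    start = low
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first splits
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Four parallel jobs (Sqoop's default number of mappers) over keys 1..10:
print(split_key_range(1, 10, 4))  # [(1, 3), (4, 6), (7, 8), (9, 10)]
```

Each range would correspond to one job's `WHERE` condition on the split column, so the jobs read disjoint rows and can run concurrently.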
  • 12. Hadoop Tools: Apache Flume
    ● Apache top-level project
    ● Part of the Hadoop ecosystem
    ● Collects log data
      ○ Various sources: Avro, Thrift, Syslog, Netcat
    ● Can aggregate and consolidate data
    ● Data is typically sent to HDFS
      ○ Can also store data in other “sinks”
  • 13. Hadoop Tools: Apache Flume
    Diagram: Source → Channel → Sink → HDFS
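The source, channel, sink chain in the diagram decouples the producer of events from the writer: the channel buffers events until the sink takes them. The sketch below models that flow in a single process, with a bounded `queue.Queue` as the channel and a plain list standing in for HDFS; `run_agent` and the event strings are hypothetical and are not Flume's API.

```python
import queue
from typing import Iterable, List


def run_agent(events: Iterable[str], capacity: int = 100) -> List[str]:
    """Toy, single-threaded model of a Flume agent: source -> channel -> sink."""
    channel = queue.Queue(maxsize=capacity)  # the channel buffers events
    hdfs: List[str] = []                     # stand-in for the sink's HDFS destination

    def drain_to_sink() -> None:
        # The sink takes events off the channel and writes them out.
        while not channel.empty():
            hdfs.append(channel.get())

    for event in events:      # the source produces events
        if channel.full():
            drain_to_sink()   # a full channel must be drained before accepting more
        channel.put(event)
    drain_to_sink()           # flush whatever is left at shutdown
    return hdfs


print(run_agent(["GET /a", "GET /b", "GET /c"], capacity=2))
# ['GET /a', 'GET /b', 'GET /c']
```

In a real agent the source and sink run independently, so the channel also absorbs bursts when the sink is slower than the source.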
  • 14. New Tool: MySQL Applier for Hadoop
    ● Proof of concept
    ● Replication from MySQL to HDFS
      ○ Exploits the replication protocol
      ○ Reads the server binary log
    ● Fetches changes from MySQL
      ○ Using the Binary Log API
      ○ Row-based replication
      ○ Caveat: DDL not handled
    ● Stores changes into HDFS
      ○ Consumable by other tools
      ○ Caveat: only row inserts
      ○ Considering update/delete
  • 15. New Tool: MySQL Applier for Hadoop
    Diagram: Binary Log Events → Binlog API → MySQL Applier for Hadoop (Decode Row into Timestamp, Primary Key, Data) → libhdfs → HDFS
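The decode step in the diagram turns each binary log row event into a timestamp, a primary key, and the row data. A minimal sketch of that step follows, assuming events have already been parsed: the `Event` and `DecodedRow` types, the event-kind strings, and the `pk_index` parameter are all hypothetical. Following the caveats on the previous slide, only row inserts are decoded, and DDL, updates, and deletes are skipped.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical, already-parsed binary log event."""
    kind: str                     # e.g. "write_rows", "update_rows", "delete_rows", "query"
    timestamp: int                # event timestamp taken from the binary log
    table: str                    # "db.table"
    rows: Tuple[tuple, ...] = ()  # row images carried by row events


class DecodedRow(NamedTuple):
    """The diagram's Timestamp, Primary Key and Data."""
    table: str
    timestamp: int
    primary_key: object
    data: tuple


def decode_inserts(events: Iterable[Event], pk_index: int = 0) -> Iterator[DecodedRow]:
    """Yield one DecodedRow per inserted row; all other event kinds are skipped."""
    for event in events:
        if event.kind != "write_rows":
            continue  # DDL (query events), updates and deletes are not handled
        for row in event.rows:
            # pk_index is an assumption: the primary key's position in the row
            yield DecodedRow(event.table, event.timestamp, row[pk_index], row)


events = [
    Event("query", 1379361600, "test.tbl"),  # e.g. a CREATE TABLE: ignored
    Event("write_rows", 1379361681, "test.tbl", ((23456, "Sanjai", "Feldhoffer"),)),
    Event("delete_rows", 1379361690, "test.tbl", ((23456, "Sanjai", "Feldhoffer"),)),
]
decoded = list(decode_inserts(events))
print(len(decoded), decoded[0].primary_key, decoded[0].timestamp)  # 1 23456 1379361681
```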
  • 16. MySQL Applier for Hadoop: Requirements
    ● MySQL 5.6 or later
      ○ Available at http://dev.mysql.com/downloads/mysql
    ● MySQL Applier for Hadoop
      ○ Available at http://labs.mysql.com
    ● Apache Hadoop 1.0.4 or later
      ○ Available at http://hadoop.apache.org/releases.html
    ● Apache Hive or another Hadoop tool for analysis
      ○ Available at http://hive.apache.org/releases.html
  • 17. MySQL Applier for Hadoop: Mapping Rows
    ● A timestamp column is added as the first column of each row
    ● The timestamp is taken from the binary log
    MySQL:
      INSERT INTO test.tbl VALUES
        (23456,'Sanjai','Feldhoffer'),
        (23457,'Manohar','Kakkar'),
        (23458,'Christ','Kalefeld'),
        (23459,'Gretta','Varker'),
        (23460,'Masato','Steinauer'),
        (23461,'Baruch','Uchoa');
    HDFS:
      1379361681,23456,Sanjai,Feldhoffer
      1379361685,23457,Manohar,Kakkar
      1379361692,23458,Christ,Kalefeld
      1379361693,23459,Gretta,Varker
      1379361699,23460,Masato,Steinauer
      1379361703,23461,Baruch,Uchoa
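The mapping on this slide (binary log timestamp first, then the column values, joined by the field delimiter) can be reproduced with a small formatter. The function name `to_hdfs_lines` and its `(timestamp, row)` input shape are assumptions for illustration; the sample values and expected lines are the first three rows shown on the slide. This sketch does no escaping, so a value that contains the delimiter would corrupt its line.

```python
from typing import Iterable, List, Sequence, Tuple


def to_hdfs_lines(rows: Iterable[Tuple[int, Sequence[object]]],
                  field_delimiter: str = ",") -> List[str]:
    """Hypothetical formatter: prepend the binary log timestamp and join the fields."""
    return [field_delimiter.join([str(ts), *map(str, values)]) for ts, values in rows]


rows = [
    (1379361681, (23456, "Sanjai", "Feldhoffer")),
    (1379361685, (23457, "Manohar", "Kakkar")),
    (1379361692, (23458, "Christ", "Kalefeld")),
]
for line in to_hdfs_lines(rows):
    print(line)
# 1379361681,23456,Sanjai,Feldhoffer
# 1379361685,23457,Manohar,Kakkar
# 1379361692,23458,Christ,Kalefeld
```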
  • 18. MySQL Applier for Hadoop: Using Hive
    ● The Applier does not handle DDL
      ○ Create the Hive table manually, as shown below
    ● The Applier's field and row delimiters can be controlled with --field-delimiter and --row-delimiter
    MySQL:
      CREATE TABLE tbl (
        user_id INT PRIMARY KEY,
        first CHAR(60), last CHAR(60)
      )
    Hive (data in HDFS):
      CREATE TABLE tbl (
        ts INT,
        user_id INT,
        first STRING, last STRING
      )
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
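Because the Applier does not handle DDL, the Hive table has to mirror the MySQL table by hand, with the extra leading `ts` column. A hypothetical helper that derives the Hive statement on this slide from a list of MySQL column definitions might look like this; `hive_ddl` and `TYPE_MAP` are invented names, and the type mapping covers only the types in the example, so it is an assumption rather than a complete MySQL-to-Hive mapping.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny MySQL -> Hive type mapping.
TYPE_MAP = {"INT": "INT", "CHAR": "STRING", "VARCHAR": "STRING"}


def hive_ddl(table: str, columns: List[Tuple[str, str]], field_delimiter: str = ",") -> str:
    """Build a Hive CREATE TABLE that matches the Applier's output layout."""
    defs = ["ts INT"]  # the Applier writes the binary log timestamp first
    for name, mysql_type in columns:
        base = mysql_type.split("(")[0].upper()  # CHAR(60) -> CHAR
        defs.append(f"{name} {TYPE_MAP[base]}")  # KeyError for unmapped types
    return (
        f"CREATE TABLE {table} ({', '.join(defs)}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{field_delimiter}' "
        "STORED AS TEXTFILE"
    )


print(hive_ddl("tbl", [("user_id", "INT"), ("first", "CHAR(60)"), ("last", "CHAR(60)")]))
```

The `field_delimiter` argument should match the value given to `--field-delimiter` when starting the Applier; otherwise Hive will split the lines incorrectly.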
  • 19. MySQL Applier for Hadoop
    ● Start the MySQL Applier for Hadoop:
        happlier --field-delimiter=, mysql://root@example.com hdfs://example.com:9000
    ● Inserts are written to files in the warehouse directory
      ○ Default: /user/hive/warehouse
    ● Example:
      ○ MySQL table: test.tbl
      ○ HDFS file: /user/hive/warehouse/test.db/tbl/datafile1.txt
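The example on this slide implies a simple naming scheme from a MySQL table to its HDFS location: `<warehouse>/<database>.db/<table>/`. The helper below reproduces that example. The name `hdfs_path_for` is hypothetical, and the data file name is passed in rather than derived, because the slide does not say how the Applier numbers its files beyond `datafile1.txt`.

```python
import posixpath

DEFAULT_WAREHOUSE = "/user/hive/warehouse"  # default shown on the slide


def hdfs_path_for(mysql_table: str, data_file: str, warehouse: str = DEFAULT_WAREHOUSE) -> str:
    """Hypothetical helper: map 'db.table' to '<warehouse>/<db>.db/<table>/<data_file>'."""
    database, table = mysql_table.split(".", 1)
    return posixpath.join(warehouse, f"{database}.db", table, data_file)


print(hdfs_path_for("test.tbl", "datafile1.txt"))
# /user/hive/warehouse/test.db/tbl/datafile1.txt
```

`posixpath` is used instead of `os.path` so that the HDFS path keeps forward slashes on every platform.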
  • 20. MySQL Applier for Hadoop: Update and Delete?
    ● Batch import using Sqoop
      ○ Transfers all data each time
      ○ If changes are small, bandwidth is wasted
    Diagram: Sqoop → Hadoop rack
  • 21. MySQL Applier for Hadoop: Update and Delete?
    ● Batch import using Sqoop
      ○ Transfers all data each time
      ○ If changes are small, bandwidth is wasted
    ● Incremental import using the Applier
      ○ Only changes are imported
      ○ Bandwidth is used efficiently
      ○ … but what about updates and deletes?
    Diagram: Applier → Hadoop rack
  • 22. MySQL Applier for Hadoop: Update and Delete?
    ● Problem:
      ○ HDFS is append-only
      ○ Inserted rows are appended to the file
      ○ How can rows be updated or deleted?
    ● Idea:
      ○ Updated rows are also appended to the file
      ○ Rows have a primary key
      ○ Rows contain the after-image and the timestamp of the update
      ○ For each primary key, pick the row with the latest timestamp
  • 23. MySQL Applier for Hadoop: Update and Delete?
    Diagram: Applier → Hadoop rack
    ● Timestamped rows to HDFS
      ○ After-image for updates
      ○ Flag deletes
    ● Customized HiveQL queries:
        SELECT … FROM tbl t
          JOIN (SELECT key, MAX(ts) AS ts FROM tbl GROUP BY key) m
          ON t.key = m.key AND t.ts = m.ts
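The resolution rule from these slides (append an after-image with a timestamp for every change, then keep only the newest version of each primary key) fits in a few lines. This is a sketch with assumed names: the `Change` record layout and its boolean `deleted` field are hypothetical, since the slides only say that deletes are "flagged". A key whose newest record is a delete disappears from the result, which is what the HiveQL query would also have to filter out.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Change(NamedTuple):
    """Hypothetical record appended to HDFS for every change."""
    ts: int                     # timestamp of the change (from the binary log)
    key: int                    # primary key
    deleted: bool               # hypothetical delete flag
    values: Tuple[object, ...]  # after-image of the row (empty for deletes)


def current_rows(changes: Iterable[Change]) -> Dict[int, Tuple[object, ...]]:
    """For each primary key keep the change with the latest timestamp; drop deleted keys."""
    latest: Dict[int, Change] = {}
    for change in changes:
        seen = latest.get(change.key)
        if seen is None or change.ts > seen.ts:  # ties keep the first record seen
            latest[change.key] = change
    return {key: c.values for key, c in latest.items() if not c.deleted}


log = [
    Change(1379361681, 23456, False, ("Sanjai", "Feldhoffer")),
    Change(1379361685, 23457, False, ("Manohar", "Kakkar")),
    Change(1379361700, 23456, False, ("Sanjay", "Feldhoffer")),  # update: new after-image
    Change(1379361710, 23457, True, ()),                         # delete
]
print(current_rows(log))  # {23456: ('Sanjay', 'Feldhoffer')}
```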
  • 24. MySQL Applier for Hadoop: Update and Delete?
    Diagram: Applier → dirty files → cleaning job → clean files, inside the Hadoop rack
    ● Timestamped rows to HDFS
      ○ After-image for updates
      ○ Flag deletes
    ● Special “cleaning” job
      ○ Reads dirty files
      ○ Writes clean files
      ○ Moving data inside the rack uses bandwidth efficiently
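A cleaning job applies the latest-timestamp rule in batch: it reads a dirty, append-only file, keeps the newest non-deleted version of each primary key, and writes a compacted clean file. The sketch below works on local files in a temporary directory as a stand-in for HDFS. The line format (`ts,key,flag,values...`, with `D` marking a delete) and the function name `clean` are assumptions for illustration, not the Applier's actual format.

```python
import csv
import os
import tempfile
from typing import Dict, List


def clean(dirty_path: str, clean_path: str) -> int:
    """Compact a dirty change file into one line per live primary key.

    Assumed line format: ts,key,flag,col1,col2,... where flag 'D' marks a delete.
    Returns the number of rows written.
    """
    latest: Dict[str, List[str]] = {}
    with open(dirty_path, newline="") as src:
        for record in csv.reader(src):
            ts, key = int(record[0]), record[1]
            if key not in latest or ts > int(latest[key][0]):
                latest[key] = record  # newer version of this key wins
    live = [r for r in latest.values() if r[2] != "D"]  # drop deleted keys
    with open(clean_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for record in live:
            writer.writerow([record[0], record[1], *record[3:]])  # drop the flag column
    return len(live)


with tempfile.TemporaryDirectory() as tmp:
    dirty = os.path.join(tmp, "dirty.txt")
    cleaned = os.path.join(tmp, "clean.txt")
    with open(dirty, "w", newline="") as f:
        f.write("100,1,I,Sanjai,Feldhoffer\n200,2,I,Manohar,Kakkar\n"
                "300,1,U,Sanjay,Feldhoffer\n400,2,D\n")
    print(clean(dirty, cleaned))  # 1
    with open(cleaned) as f:
        print(f.read().strip())   # 300,1,Sanjay,Feldhoffer
```

Running the job where the data already lives is what the slide's bandwidth point refers to: only the compacted output is written, and no data leaves the rack.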
  • 25. MySQL and Hadoop: Resources and Information
    ● MySQL and Hadoop: Guide to Big Data Integration
      ○ http://www.mysql.com/why-mysql/white-papers/mysql-and-hadoop-guide-to-big-data-integration
    ● MySQL Applier for Hadoop
      ○ http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html
    ● Developer Blogs
      ○ Mats Kindahl: http://mysqlmusings.blogspot.com
      ○ Shubhangi Garg: http://innovating-technology.blogspot.in
      ○ Neha Kumari: http://nehakumari19.blogspot.in
  • 26. Thank you!