3. Oracle In-Database Hadoop
Agenda
• In-Database MapReduce
• Why
• Previous Initiatives and Limitations
• Oracle In-Database Hadoop
• Integration with Oracle’s Big Data solution
• Summary
Hadoop Summit 2012, June 13-14, San Jose, California, USA Oracle In-Database Hadoop: When MapReduce Meets RDBMS. Kuassi Mensah
4. In-Database MapReduce
5. MapReduce Paradigm
You All Know This Stuff!
Map: <K1,V1> → {<K2,V2>,…}
Shuffle: {<K2,V2>,…} → {<K2,{V2,…,V2}>,…}
Reduce: <K2,{V2,…,V2}> → {<K3,V3>,…}
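The three signatures above can be sketched with word count in plain Java. This is only an illustration of the paradigm: it uses neither the Hadoop API nor anything from the Oracle prototype, and the class and method names (MapReduceSketch, map, shuffle, reduce, wordCount) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {
    // Map: <K1,V1> -> {<K2,V2>,...}; here K1 is a line number and V1 a line of text.
    static List<Map.Entry<String, Integer>> map(int lineNo, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle: {<K2,V2>,...} -> {<K2,{V2,...,V2}>,...}; groups all values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: <K2,{V2,...,V2}> -> {<K3,V3>,...}; here a single <word, total> pair.
    static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        return Map.entry(word, total);
    }

    // Runs the three phases in sequence over an in-memory list of lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) mapped.addAll(map(i, lines.get(i)));
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(mapped).entrySet()) {
            Map.Entry<String, Integer> r = reduce(g.getKey(), g.getValue());
            result.put(r.getKey(), r.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(List.of("to be or not", "to be")));
    }
}
```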
6. In-Database MapReduce
Why?
• Avoid shipping data that resides in the RDBMS to a separate infrastructure.
• Many initiatives already pursue this.
• Address the top two issues preventing broader adoption of Hadoop in the enterprise:
  • Lack of development and/or administration skills
  • Lack of enterprise-class security
7. In-Database MapReduce
Previous Efforts and Limitations
• SQL-MapReduce, HadoopDB (Hadapt), etc.
• PL/SQL user-defined pipelined table functions and aggregation objects
• Limitations
  • Lack of compatibility with Hadoop
  • Loose integration with Hadoop
  • Dependency on Hadoop infrastructure
8. Oracle In-Database Hadoop
9. Oracle In-Database Hadoop is a prototype (not a feature of Oracle products), built on current Oracle products.
10. Oracle In-Database Hadoop
Goals
• Avoid shipping data residing in Oracle Database to Hadoop clusters
• Preserve the Hadoop programming model
• Reduce dependency on Hadoop infrastructure
• Get enterprise developers up to speed with minimal training
• Get enterprise administrators (DBAs) up to speed with minimal training
• Reduce deployment time
• Bring enterprise-class security to MapReduce
• Seamless integration with Oracle’s Big Data solution
11. Oracle In-Database Hadoop
Compatibility & Minimal Dependency on Hadoop Infrastructure
[Diagram: two rows of database nodes (Node 1, Node 2, Node 3)]
• Top row: a mapping process on each node, implemented as a pipelined table function with a Java implementation
• Between the rows: PARTITION BY and CLUSTER BY clauses
• Bottom row: a reducing process on each node, implemented as a pipelined table function with a Java implementation
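The step between the mapping and reducing processes can be pictured as hash partitioning of the mapper output by key, so that every value for a given key reaches the same reducing process. The plain-Java sketch below only illustrates that idea. It is not the database's PARTITION BY / CLUSTER BY implementation, and the names (PartitionSketch, nodeFor, partition) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Hypothetical helper: pick the reducing node for a key by hashing it.
    static int nodeFor(String key, int nodes) {
        return Math.floorMod(key.hashCode(), nodes);
    }

    // Routes each <key, value> pair to exactly one node. Inside a node, a sorted
    // map keeps equal keys together, loosely analogous to clustering by key.
    static List<Map<String, List<Integer>>> partition(
            List<Map.Entry<String, Integer>> pairs, int nodes) {
        List<Map<String, List<Integer>>> perNode = new ArrayList<>();
        for (int i = 0; i < nodes; i++) perNode.add(new TreeMap<>());
        for (Map.Entry<String, Integer> p : pairs) {
            perNode.get(nodeFor(p.getKey(), nodes))
                   .computeIfAbsent(p.getKey(), k -> new ArrayList<>())
                   .add(p.getValue());
        }
        return perNode;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("to", 1), Map.entry("be", 1), Map.entry("or", 1),
            Map.entry("to", 1), Map.entry("be", 1));
        List<Map<String, List<Integer>>> perNode = partition(pairs, 3);
        for (int i = 0; i < perNode.size(); i++) {
            System.out.println("Node " + (i + 1) + ": " + perNode.get(i));
        }
    }
}
```

Because the node is a pure function of the key, each reducing process sees the complete value list for its keys and needs no coordination with the other nodes.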
12. Oracle In-Database Hadoop
Preserve Hadoop Programming Model
• Source-compatibility
• Job configuration
• Invocation through the Java interface: job.run()
• Direct table access: TableReader and TableWriter
13. Oracle In-Database Hadoop
SQL and MapReduce Integration
• Mix SQL and MapReduce processing for flexibility and efficiency.
• MapReduce steps as pipelined table functions.

INSERT INTO OutTable
SELECT * FROM TABLE
  (Word_Count_Reduce(:ConfKey,
    CURSOR(SELECT * FROM TABLE
      (Word_Count_Map(:ConfKey,
        CURSOR(SELECT * FROM InTable))))))
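The nesting in the statement above, where the reduce function consumes a cursor over the map function's output, can be mimicked in plain Java with lazily evaluated streams. This is a rough analogy rather than the prototype's mechanism: wordCountMap and wordCountReduce are hypothetical stand-ins for Word_Count_Map and Word_Count_Reduce, and no database or Hadoop API is involved.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PipelineSketch {
    // Stand-in for Word_Count_Map: consumes input rows lazily and emits <word, 1> rows.
    static Stream<Map.Entry<String, Integer>> wordCountMap(Stream<String> rows) {
        return rows.flatMap(r -> Arrays.stream(r.toLowerCase().split("\\s+")))
                   .filter(w -> !w.isEmpty())
                   .map(w -> Map.entry(w, 1));
    }

    // Stand-in for Word_Count_Reduce: groups rows by word and sums the counts.
    static Map<String, Integer> wordCountReduce(Stream<Map.Entry<String, Integer>> rows) {
        return rows.collect(Collectors.groupingBy(
                Map.Entry::getKey, TreeMap::new,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        Stream<String> inTable = Stream.of("to be or not", "to be");
        // The nested call mirrors TABLE(Word_Count_Reduce(..., CURSOR(... Word_Count_Map(...)))).
        Map<String, Integer> outTable = wordCountReduce(wordCountMap(inTable));
        System.out.println(outTable);
    }
}
```

The map stage does no work until the reduce stage pulls rows from it, which is the same producer/consumer shape a pipelined table function gives the SQL engine.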
14. Oracle In-Database Hadoop
SQL and Java interfaces
SQL interface:

SELECT * FROM TABLE
  (Reduce_VARCHAR2_NUMBER(:ConfKey,
    CURSOR(SELECT * FROM TABLE
      (Map_VARCHAR2_NUMBER(:ConfKey,
        CURSOR(SELECT * FROM InTable))))))

Java interface:

public class WordCount {
  public static void main() throws Exception {
    /* Set up the parameters and run the job */
    ……
    job.init();
    job.run();
  }
}
15. Oracle In-Database Hadoop
Leverage Enterprise Skills
• Get database developers up to speed, with minimal training, on developing MapReduce jobs by reusing Hadoop Mappers and Reducers
• Get DBAs up to speed on deploying and managing MapReduce jobs with minimal training
16. Oracle Database Security
Bringing Enterprise Class Security to MapReduce
• Auditing and Monitoring
  • Database Activity Auditing
  • Database Firewall Monitoring
  • Centralized Audit Data Warehouse
• Encryption and Masking
  • Transparent Data Encryption
  • Network Encryption/Strong Auth
  • Data Masking for Non-Production
• Privileged User Access Control and Contextual Authorization
  • Separation of Duties for DBAs
  • Protection Realms & Rules
  • Label Based Access Control
17. Seamless integration with Oracle’s Big Data Solution
18. Oracle’s Big Data solution
[Diagram: Oracle Big Data Appliance, Oracle Exadata, and Oracle Exalytics linked by InfiniBand, shown with Endeca Information Discovery and Oracle Real-Time Decisions]
Stages: Acquire, Organize & Discover, Analyze, Decide
19. Oracle Direct Connector for HDFS
Direct Access from Oracle Database
[Diagram: a SQL query in Oracle Database reads an external table; DCH clients connect the external table to HDFS over InfiniBand]
• SQL access to HDFS
• External table view
• Data query or import
20. Oracle In-Database Analytics
[Diagram labels]
• Oracle Advanced Analytics
• Statistical
• Data Mining
• Text
• Graph
• Spatial
• Semantic
21. What Have We Done?
22. Oracle In-Database Hadoop
Summary
A prototype:
• Apply MapReduce processing to data in Oracle RDBMS without the need for a separate infrastructure.
• Compatibility with Hadoop while minimizing dependency on the Apache Hadoop infrastructure.
• Reduce training and deployment time.
• Integration with Oracle SQL, allowing MapReduce steps to be mixed with sophisticated SQL queries.
• Bring enterprise-class security to Hadoop MapReduce.
• Seamless integration with Oracle’s Big Data solution.
23. Demo