The latest releases of OBIEE and ODI come with the ability to connect to Hadoop data sources, using MapReduce to integrate data from clusters of "big data" servers complementing traditional BI data sources. In this presentation, we will look at how these two tools connect to Apache Hadoop and access "big data" sources, and share tips and tricks on making it all work smoothly.
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13
1. Leveraging Hadoop with OBIEE 11g and ODI 11g
Mark Rittman, CTO, Rittman Mead
UKOUG Tech’13 Conference, Manchester, December 2013
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
2. About the Speaker
• Mark Rittman, Co-Founder of Rittman Mead
• Oracle ACE Director, specialising in Oracle BI&DW
• 14 Years Experience with Oracle Technology
• Regular columnist for Oracle Magazine
• Author of two Oracle Press Oracle BI books
• Oracle Business Intelligence Developers Guide
• Oracle Exalytics Revealed
• Writer for Rittman Mead Blog :
http://www.rittmanmead.com/blog
• Email : mark.rittman@rittmanmead.com
• Twitter : @markrittman
3. About Rittman Mead
• Oracle BI and DW Gold partner
• Winner of five UKOUG Partner of the Year awards in 2013 - including BI
• World leading specialist partner for technical excellence,
solutions delivery and innovation in Oracle BI
• Approximately 80 consultants worldwide
• All expert in Oracle BI and DW
• Offices in US (Atlanta), Europe, Australia and India
• Skills in broad range of supporting Oracle tools:
‣OBIEE, OBIA
‣ODI
‣Essbase, Oracle OLAP
‣GoldenGate
‣Endeca
4. Leveraging Hadoop with OBIEE 11g and ODI 11g
Part 1 : Hadoop, Big Data and DW Architectures
5. Traditional Data Warehouse / BI Architectures
• Three-layer architecture - staging, foundation and access/performance
• All three layers stored in a relational database (Oracle)
• ETL used to move data from layer-to-layer
[Diagram: traditional structured data sources load into the staging layer, then move via ETL through the foundation/ODS layer into the performance/dimensional layer; the BI tool (OBIEE) with its metadata layer reads the warehouse directly, while OLAP/in-memory tools load data into their own database]
6. Recent Innovations and Developments in DW Architecture
• The rise of “big data” and “hadoop”
‣New ways to process, store and analyse data
‣New paradigm for TCO - low-cost servers, open-source software, cheap clustering
• Explosion in potential data-source types
‣Unstructured data
‣Social media feeds
‣Schema-less and schema-on-read databases
• New ways of hosting data warehouses
‣In the cloud
‣Do we even need an Oracle database or DW?
• Lots of opportunities for DW/BI developers - make our systems cheaper, wider range of data
7. Introduction of New Data Sources : Unstructured, Big Data
[Diagram: traditional structured, schema-less/NoSQL, unstructured/social/doc and Hadoop/big data sources all load into the staging layer of the traditional relational data warehouse, flowing via ETL through the foundation/ODS layer into the performance/dimensional layer; the BI tool (OBIEE) with its metadata layer reads the warehouse directly, while OLAP/in-memory tools load data into their own database]
8. Unstructured, Semi-Structured and Schema-Less Data
• Gaining access to the vast amounts of non-financial / application data out there
‣Data in documents, spreadsheets etc
- Warranty claims, supporting documents, notes etc
‣Data coming from the cloud / social media
‣Data for which we don’t yet have a structure
‣Data who’s structure we’ll decide when we
Schema-less / NoSQL
choose to access it (“schema-on-read”)
data sources
• All of the above could be useful information
Unstructured/
Social / Doc
to have in our DW and BI systems
data sources
‣But how do we load it in?
Hadoop /
‣And what if we want to access it directly?
Big Data
data sources
9. Hadoop, and the Big Data Ecosystem
• Apache Hadoop is one of the most well-known Big Data technologies
‣Family of open-source products used to store and analyse distributed datasets
‣Hadoop is the enabling framework, automatically parallelises and co-ordinates jobs
‣MapReduce is the programming framework for filtering, sorting and aggregating data
‣Map : filter data and pass on to reducers
‣Reduce : sort, group and return results
‣MapReduce jobs can be written in any language (Java etc), but it is complicated
• Can be used as an extension of the DW staging layer - cheap processing & storage
• And there may be data stored in Hadoop that our BI users might benefit from
10. HDFS: Low-Cost, Clustered, Fault-Tolerant Storage
• The filesystem behind Hadoop, used to store data for Hadoop analysis
‣Unix-like, uses commands such as ls, mkdir, chown, chmod
• Fault-tolerant, with rapid fault detection and recovery
• High-throughput, with streaming data access and large block sizes
• Designed for data-locality, placing data close to where it is processed
• Accessed from the command-line, via hdfs:// URLs, GUI tools etc
[oracle@bigdatalite mapreduce]$ hadoop fs -mkdir /user/oracle/my_stuff
[oracle@bigdatalite mapreduce]$ hadoop fs -ls /user/oracle
Found 5 items
drwx------   - oracle hadoop   0 2013-04-27 16:48 /user/oracle/.staging
drwxrwxrwx   - oracle hadoop   0 2012-09-18 17:02 /user/oracle/moviedemo
drwxrwxrwx   - oracle hadoop   0 2012-10-17 15:58 /user/oracle/moviework
drwxr-xr-x   - oracle hadoop   0 2013-05-03 17:49 /user/oracle/my_stuff
drwxr-xr-x   - oracle hadoop   0 2012-08-10 16:08 /user/oracle/stage
11. Hadoop & HDFS as a Low-Cost Pre-Staging Layer
[Diagram: traditional structured, schema-less/NoSQL, unstructured/social/doc and Hadoop/big data sources land in a low-cost file store (HDFS); pre-ETL filtering & aggregation runs in Hadoop (MapReduce); the results are then ETL'd into the traditional relational data warehouse (staging, foundation/ODS, performance/dimensional), which the BI tool (OBIEE) with its metadata layer reads directly, while OLAP/in-memory tools load data into their own database]
12. Big Data and the Hadoop “Data Warehouse”
• Rather than load Hadoop data into the DW, access it directly
• Hadoop has a “DW layer” called Hive, which provides SQL access
• Could even be used instead of a traditional DW or data mart
• Limited functionality now
• But products maturing
• ...and unbeatable TCO
[Diagram: cloud-based, schema-less/NoSQL, unstructured/social/doc and Hadoop/big data sources load into the low-cost file store (HDFS); pre-ETL filtering & aggregation runs in Hadoop (MapReduce); the BI tool (OBIEE) with its metadata layer reads the Hadoop DW layer (Hive) directly]
13. Hive as the Hadoop “Data Warehouse”
• MapReduce jobs are typically written in Java, but Hive can make this simpler
• Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
• Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically creates MapReduce jobs against data previously loaded into the Hive HDFS tables
• Approach used by ODI and OBIEE to gain access to Hadoop data
• Allows Hadoop data to be accessed just like any other data source (sort of...)
14. How Hive Provides SQL Access over Hadoop
• Hive uses an RDBMS metastore to hold table and column definitions in schemas
• Hive tables then map onto HDFS-stored files
‣Managed tables
‣External tables
• Oracle-like query optimizer, compiler, executor
• JDBC and ODBC drivers, plus CLI etc
[Diagram: Hive driver (compile, optimize, execute) and metastore sit over HDFS]
Managed tables live under /user/hive/warehouse/; external tables map onto directories such as /user/oracle/ and /user/movies/data/
HDFS or local files are loaded into the Hive HDFS area using the HiveQL CREATE TABLE command
HDFS files loaded into HDFS using an external process, then mapped into Hive using the CREATE EXTERNAL TABLE command
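To make the managed/external distinction concrete, a minimal HiveQL sketch (table names, columns and the weblog path are illustrative; the MovieLens file path is the one used later in this deck):

```sql
-- Managed table: LOAD DATA moves the file into Hive's warehouse area
-- (/user/hive/warehouse/), and DROP TABLE would delete the data too.
CREATE TABLE movie_ratings (user_id INT, movie_id INT, rating INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/oracle/movielens_src/u.data'
OVERWRITE INTO TABLE movie_ratings;

-- External table: the files stay where an external process put them;
-- Hive just maps a table definition over the directory.
CREATE EXTERNAL TABLE weblog_raw (log_line STRING)
LOCATION '/user/oracle/weblogs/';
```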
15. Transforming HiveQL Queries into MapReduce Jobs
• HiveQL queries are automatically translated into Java MapReduce jobs
• Selection and filtering part becomes Map tasks
• Aggregation part becomes the Reduce tasks
[Diagram: SELECT a, sum(b) FROM myTable WHERE a<100 GROUP BY a — Map tasks perform the selection and filtering, Reduce tasks perform the grouping and aggregation, and the combined output is returned as the result]
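Hive can show you this translation itself: the EXPLAIN command prints the MapReduce plan it would generate for a query, without running it.

```sql
-- Ask Hive for the execution plan rather than running the query.
EXPLAIN
SELECT a, sum(b)
FROM myTable
WHERE a < 100
GROUP BY a;
-- The plan shows a map stage (TableScan on myTable plus a Filter on a < 100)
-- feeding a reduce stage (Group By Operator computing sum(b)).
```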
16. An example Hive Query Session: Connect and Display Table List
[oracle@bigdatalite ~]$ hive
Hive history file=/tmp/oracle/hive_job_log_oracle_201304170403_1991392312.txt
hive> show tables;
OK
dwh_customer
dwh_customer_tmp
i_dwh_customer
ratings
src_customer
src_sales_person
weblog
weblog_preprocessed
weblog_sessionized
Time taken: 2.925 seconds
The Hive Server lists out all “tables” that have been defined within the Hive environment
17. An example Hive Query Session: Display Table Row Count
hive> select count(*) from src_customer;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_201303171815_0003, Tracking URL =
http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201303171815_0003
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost.localdomain:8021 -kill job_201303171815_0003
2013-04-17 04:06:59,867 Stage-1 map = 0%, reduce = 0%
2013-04-17 04:07:03,926 Stage-1 map = 100%, reduce = 0%
2013-04-17 04:07:14,040 Stage-1 map = 100%, reduce = 33%
2013-04-17 04:07:15,049 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201303171815_0003
OK
25
Time taken: 22.21 seconds
The count(*) request is turned by the Hive server into a MapReduce job that “maps” the table key/value pairs, then reduces the results to a table count; the job runs automatically and the result (25) is returned to the user.
18. Leveraging Hadoop with OBIEE 11g and ODI 11g
Demonstration of Hive and HiveQL
19. DW 2013: The Mixed Architecture with Federated Queries
• Where many organisations are going:
• Traditional DW at core of strategy
• Making increasing use of low-cost, cloud/big data tech for storage / pre-processing
• Access to non-traditional data sources, usually via ETL into the DW
• Federated data access through OBIEE connectivity & metadata layer
20. Oracle’s Big Data Products
• Oracle Big Data Appliance - Engineered System for Big Data Acquisition and Processing
‣Cloudera Distribution of Hadoop
‣Cloudera Manager
‣Open-source R
‣Oracle NoSQL Database Community Edition
‣Oracle Enterprise Linux + Oracle JVM
• Oracle Big Data Connectors
‣Oracle Loader for Hadoop (Hadoop > Oracle RDBMS)
‣Oracle Direct Connector for HDFS (HDFS > Oracle RDBMS)
‣Oracle Data Integration Adapter for Hadoop
‣Oracle R Connector for Hadoop
‣Oracle NoSQL Database (column/key-store DB based on BerkeleyDB)
21. Oracle Loader for Hadoop
• Oracle technology for accessing Hadoop data, and loading it into an Oracle database
• Pushes data transformation, “heavy lifting” to the Hadoop cluster, using MapReduce
• Direct-path loads into Oracle Database, partitioned and non-partitioned
• Online and offline loads
• Key technology for fast load of Hadoop results into Oracle DB
22. Oracle Direct Connector for HDFS
• Enables HDFS as a data-source for Oracle Database external tables
• Effectively provides Oracle SQL access over HDFS
• Supports data query, or import into Oracle DB
• Treat HDFS-stored files in the same way as regular files
‣But with HDFS’s low-cost
‣… and fault-tolerance
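As a sketch of what this looks like on the Oracle side: an external table whose access parameters use the connector's hdfs_stream preprocessor to stream HDFS file content. Table, column and directory names here are illustrative, and the location file would be generated by the connector's own tooling rather than written by hand.

```sql
-- Hypothetical external table over comma-delimited HDFS files,
-- via Oracle Direct Connector for HDFS.
CREATE TABLE sales_hdfs_ext (
  cust_id  NUMBER,
  amount   NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY sales_ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR hdfs_bin_path:hdfs_stream  -- streams the HDFS file content
    FIELDS TERMINATED BY ','
  )
  LOCATION ('sales_hdfs.loc')  -- location file generated by the connector
)
REJECT LIMIT UNLIMITED;

-- Then query or load it like any other external table:
-- INSERT /*+ APPEND */ INTO sales SELECT * FROM sales_hdfs_ext;
```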
23. Oracle Data Integration Adapter for Hadoop
• ODI 11g Application Adapter (pay-extra option) for Hadoop connectivity
• Works for both Windows and Linux installs of ODI Studio
‣Need to source HiveJDBC drivers and JARs from separate Hadoop install
• Provides six new knowledge modules
‣IKM File to Hive (Load Data)
‣IKM Hive Control Append
‣IKM Hive Transform
‣IKM File-Hive to Oracle (OLH)
‣CKM Hive
‣RKM Hive
24. ODI as Part of Oracle’s Big Data Strategy
• ODI is the data integration tool for extracting data from Hadoop/MapReduce, and loading into Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics
• Oracle Application Adaptor for Hadoop provides required data adapters
‣Load data into Hadoop from local filesystem, or HDFS (Hadoop clustered FS)
‣Read data from Hadoop/MapReduce using Apache Hive (JDBC) and HiveQL, load into Oracle RDBMS using Oracle Loader for Hadoop
• Supported by Oracle’s Engineered Systems
‣Exadata
‣Exalytics
‣Big Data Appliance
25. Oracle Business Analytics and Big Data Sources
• OBIEE 11g can also make use of big data sources
‣OBIEE 11.1.1.7+ supports Hive/Hadoop as a data source
‣Oracle R Enterprise can expose R models through DB functions, columns
‣Oracle Exalytics has InfiniBand connectivity to Oracle BDA
• Endeca Information Discovery can analyze unstructured and semi-structured sources
‣Increasingly tight integration between OBIEE and Endeca
26. Opportunities for OBIEE and ODI with Big Data Sources and Tools
• Load data from a Hadoop/HDFS/NoSQL environment into a structured DW for analysis
• Provide OBIEE as an alternative to Java coding or HiveQL for analysts
• Leverage Hadoop & HDFS for massively-parallel staging-layer number crunching
• Make use of low-cost, fault-tolerant hardware for parts of your BI platform
• Provide the reporting and analysis for customers who have bought Oracle Big Data Appliance
27. OBIEE and ODI Access to Hive: MapReduce with no Java Coding
• Requests in HiveQL arrive via HiveODBC, HiveJDBC or through the Hive command shell
• JDBC and ODBC access requires the Thrift server
‣Provides an RPC call interface over Hive for external processes
• All queries then get parsed, optimized and compiled, then sent to the Hadoop NameNode and Job Tracker
• Hadoop then processes the query, generating MapReduce jobs and distributing them to run in parallel across all data nodes
• Hadoop access can still be performed procedurally if needed, typically coded by hand in Java, or through Pig, etc
‣The equivalent of PL/SQL compared to SQL
‣But Hive works well with the OBIEE/ODI paradigm
28. Complementary Technologies: HDFS, Cloudera Manager, Hue etc
• You can download your own Hive binaries, libraries etc from Apache Hadoop website
• Or use pre-built VMs and distributions from the likes of Cloudera
‣Cloudera CDH3/4 is used on Oracle Big Data Appliance
‣Open-source + proprietary tools (Cloudera Manager)
• Other tools for managing Hive, HDFS etc include
‣Hue (HDFS file browser + management)
‣Beeswax (Hive administration + querying)
• Other complementary/required Hadoop tools
‣Sqoop
‣HDFS
‣Thrift
29. Leveraging Hadoop with OBIEE 11g and ODI 11g
Part 2 : ODI 11g and Hadoop / Big Data Sources
30. How ODI Accesses Hadoop Data
• ODI accesses data in Hadoop clusters through Apache Hive
‣Metadata and query layer over MapReduce
‣Provides SQL-like language (HiveQL) and a data dictionary
‣Provides a means to define “tables”, into which file data is loaded, and then queried via MapReduce
‣Accessed via the Hive JDBC driver (separate Hadoop install required on ODI server, for client libs)
• Additional access through Oracle Direct Connector for HDFS and Oracle Loader for Hadoop
[Diagram: ODI 11g sends HiveQL to the Hive Server on the Hadoop cluster, which runs it as MapReduce; direct-path loads into the Oracle RDBMS use Oracle Loader for Hadoop, with transformation logic in MapReduce]
31. Relationship Between ODI and OBIEE with Big Data Sources
• OBIEE now has the ability to report against Hadoop data, via Hive
‣Assumes that data is already loaded into the Hive warehouse tables
• ODI therefore can be used to load the Hive tables, through either:
‣Loading Hive from files
‣Joining and loading from Hive-Hive
‣Loading and transforming via shell scripts (python, perl etc)
• ODI could also extract the Hive data and load into Oracle, if more appropriate
32. Configuring ODI 11.1.1.6+ for Hadoop Connectivity
• Obtain an installation of Hadoop/Hive from somewhere (Cloudera CDH3/4 for example)
• Copy the following files into a temp directory, archive and transfer to ODI environment
$HIVE_HOME/lib/*.jar
$HADOOP_HOME/hadoop-*-core*.jar,
$HADOOP_HOME/hadoop-*-tools*.jar
for example...
/usr/lib/hive/lib/*.jar
/usr/lib/hadoop-0.20/hadoop-*-core*.jar,
/usr/lib/hadoop-0.20/hadoop-*-tools*.jar
• Copy JAR files into the userlib directory and (standalone) agent lib directory
c:\Users\Administrator\AppData\Roaming\odi\oracle\di\userlib
• Restart ODI Studio
33. Registering HDFS and Hive Sources and Targets in ODI
• For Hive sources and targets, use Hive technology
‣JDBC Driver : Apache Hive JDBC Driver
‣JDBC URL : jdbc:hive://[server_name]:10000/default
‣(Flexfield Name) Hive Metastore URIs : thrift://[server_name]:10000
• For HDFS sources, use File technology
‣JDBC URL : hdfs://[server_name]:port
‣Special HDFS “trick” to use File tech (no specific HDFS technology)
34. Reverse Engineering Hive, HDFS and Local File Datastores + Models
• Hive tables reverse-engineer just like regular tables
• Define model in Designer navigator, uses Hive RKM to retrieve table metadata
• Information on Hive-specific metadata stored in flexfields
‣Hive Buckets
‣Hive Partition Column
‣Hive Cluster Column
‣Hive Sort Column
35. Leveraging Hadoop with OBIEE 11g and ODI 11g
Demonstration of ODI 11.1.1.6 Configured for Hadoop
Access, with Hive/HDFS sources and targets registered
36. Oracle Data Integration Adapter for Hadoop
• ODI 11g Application Adapter (pay-extra option) for Hadoop connectivity
• Works for both Windows and Linux installs of ODI Studio
‣Need to source HiveJDBC drivers and JARs from separate Hadoop install
• Provides six new knowledge modules
‣IKM File to Hive (Load Data)
‣IKM Hive Control Append
‣IKM Hive Transform
‣IKM File-Hive to Oracle (OLH)
‣CKM Hive
‣RKM Hive
37. Oracle Loader for Hadoop
• Oracle technology for accessing Hadoop data, and loading it into an Oracle database
• Pushes data transformation, “heavy lifting” to the Hadoop cluster, using MapReduce
• Direct-path loads into Oracle Database, partitioned and non-partitioned
• Online and offline loads
• Key technology for fast load of Hadoop results into Oracle DB
38. Oracle Direct Connector for HDFS
• Enables HDFS as a data-source for Oracle Database external tables
• Effectively provides Oracle SQL access over HDFS
• Supports data query, or import into Oracle DB
• Treat HDFS-stored files in the same way as regular files
‣But with HDFS’s low-cost
‣… and fault-tolerance
39. IKM File to Hive (Load Data): Loading Hive Tables from File or HDFS
• Uses the Hive Load Data command to load from local or HDFS files
• Calls Hadoop FS commands for simple copy/move into/around HDFS
• Commands generated by ODI through IKM File to Hive (Load Data)
hive> load data inpath '/user/oracle/movielens_src/u.data'
> overwrite into table movie_ratings;
Loading data to table default.movie_ratings
Deleted hdfs://localhost.localdomain/user/hive/warehouse/movie_ratings
OK
Time taken: 0.341 seconds
40. IKM File to Hive (Load Data): Loading Hive Tables from File or HDFS
• IKM File to Hive (Load Data) generates the required HiveQL commands using a script template
• Executed over HiveJDBC interface
• Success/Failure/Warning returned to ODI
41. Load Data and Hadoop SerDe (Serializer-Deserializer) Transforms
• Hadoop SerDe transformations can be accessed, for example to transform weblogs
• Hadoop interface that contains:
‣Deserializer - converts incoming data into Java objects for Hive manipulation
‣Serializer - takes Hive Java objects & converts to output for HDFS
• Library of SerDe transformations readily available for use with Hive
• Use the OVERRIDE_ROW_FORMAT option in the IKM to override regular column mappings in the Mapping tab
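A typical use is mapping raw weblogs onto columns with the contrib RegexSerDe; the ROW FORMAT clause below is the kind of thing that goes into OVERRIDE_ROW_FORMAT. The table name, column list and regex here are illustrative.

```sql
-- Hypothetical weblog table parsed by a regex-based SerDe:
-- each capture group in input.regex maps to one column.
CREATE TABLE weblog_parsed (
  host STRING, request_time STRING, request STRING,
  status STRING, bytes_sent STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) .* (\\[[^\\]]*\\]) (\"[^\"]*\") ([0-9]*) ([0-9]*)"
)
STORED AS TEXTFILE;
```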
42. IKM Hive Control Append: Load, Join & Filtering Between Hive Tables
• Hive source and target, transformations according to HiveQL functionality (aggregations, functions etc)
• Ability to join data sources
• Other data sources can be used, but will involve staging tables and additional KMs (as per any multi-source join)
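The kind of HiveQL such a Hive-to-Hive mapping generates can be sketched as follows; the table names come from the earlier Hive session, while the columns, join condition and filter are illustrative.

```sql
-- Join, filter and aggregate between Hive tables,
-- loading the result into a target Hive table.
INSERT OVERWRITE TABLE dwh_customer
SELECT c.cust_id,
       c.cust_name,
       sp.sales_person_name,
       COUNT(*) AS order_count
FROM   src_customer c
JOIN   src_sales_person sp ON (c.sales_person_id = sp.sales_person_id)
WHERE  c.cust_status = 'ACTIVE'
GROUP BY c.cust_id, c.cust_name, sp.sales_person_name;
```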
43. IKM Hive Transform: Use Custom Shell Scripts to Integrate into Hive Table
• Gives developer the ability to transform data programmatically using Python, Perl etc scripts
• Options to map output of script to columns in Hive table
• Useful for more programmatic and complex data transformations
44. IKM File-Hive to Oracle: Extract from Hive into Oracle Tables
• Uses Oracle Loader for Hadoop (OLH) to process any filtering, aggregation, transformation in Hadoop, using MapReduce
• OLH part of Oracle Big Data Connectors (additional cost)
• High-performance loader into Oracle DB
• Optional sort by primary key, pre-partitioning of data
• Can utilise the two OLH loading modes:
‣JDBC or OCI direct load into Oracle
‣Unload to files, Oracle DP into Oracle DB
45. Leveraging Hadoop with OBIEE 11g and ODI 11g
Demonstration of Integration Tasks using ODIAAH Hadoop KMs
46. NoSQL Data Sources and Targets with ODI 11g
• No specific technology or driver for NoSQL databases, but can use Hive external tables
• Requires a specific “Hive Storage Handler” for key/value store sources
‣Hive feature for accessing data from other DB systems, for example MongoDB, Cassandra
‣For example, https://github.com/vilcek/HiveKVStorageHandler
• Additionally needs Hive collect_set aggregation method to aggregate results
‣Has to be defined in Languages panel in Topology
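A storage-handler table definition looks something like the sketch below; the handler class, SerDe properties and store coordinates are all illustrative placeholders (the real names come from whichever handler project you use, such as the HiveKVStorageHandler linked above).

```sql
-- Hypothetical Hive external table over a key/value NoSQL store:
-- STORED BY delegates reads to a custom storage handler class.
CREATE EXTERNAL TABLE movie_ratings_kv (
  movie_id STRING,
  rating   INT
)
STORED BY 'com.example.hive.kv.KVStorageHandler'  -- placeholder class name
WITH SERDEPROPERTIES (
  "kv.key.mapping"   = "movie_id",
  "kv.value.mapping" = "rating"
)
TBLPROPERTIES (
  "kv.host.port" = "bigdatalite:5000",  -- placeholder store endpoint
  "kv.store.name" = "kvstore"
);
```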
47. Pig, Sqoop and other Hadoop Technologies, and Hive
• Future versions of ODI might use other Hadoop technologies
‣Apache Sqoop for bulk transfer between Hadoop and RDBMSs
• Other technologies are not such an obvious fit
‣Apache Pig - the equivalent of PL/SQL for Hive’s SQL
• Commercial vendors may produce “better” versions of Hive, MapReduce etc
‣Cloudera Impala - more “real-time” version of Hive
‣MapR - solves many current issues with MapReduce, 100% Hadoop API compatibility
• Watch this space...!
48. Leveraging Hadoop with OBIEE 11g and ODI 11g
Part 3 : OBIEE 11g and Hadoop / Big Data Sources
49. OBIEE 11g and Hadoop/Big Data Access
• Two main scenarios for OBIEE 11g accessing “big data” sources
1. Through the data warehouse - no different to any other data provided through the DW
2. Directly - through OBIEE 11.1.1.7+ Hadoop/Hive connectivity
50. New in OBIEE 11.1.1.7 : Hadoop Connectivity through Hive
• MapReduce jobs are typically written in Java, but Hive can make this simpler
• Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
• Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically creates MapReduce jobs against data previously loaded into the Hive HDFS tables
• Approach used by ODI and OBIEE to gain access to Hadoop data
• Allows Hadoop data to be accessed just like any other data source
51. Importing Hadoop/Hive Metadata into RPD
• HiveODBC driver has to be installed into Windows environment, so that BI Administration tool can connect to Hive and return table metadata
• Import as ODBC datasource, change physical DB type to Apache Hadoop afterwards
• Note that OBIEE queries cannot span >1 Hive schema (no table prefixes)
52. Set up ODBC Connection at the OBIEE Server
• OBIEE 11.1.1.7+ ships with HiveODBC drivers, need to use 7.x versions though (only Linux supported)
• Configure the ODBC connection in odbc.ini, name needs to match RPD ODBC name
• BI Server should then be able to connect to the Hive server, and Hadoop/MapReduce
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
bigdatalite=Oracle 7.1 Apache Hive Wire Protocol
[bigdatalite]
Driver=/u01/app/Middleware/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/ARhive27.so
Description=Oracle 7.1 Apache Hive Wire Protocol
ArraySize=16384
Database=default
DefaultLongDataBuffLen=1024
EnableLongDataBuffLen=1024
EnableDescribeParam=0
Hostname=bigdatalite
LoginTimeout=30
MaxVarcharSize=2000
PortNumber=10000
RemoveColumnQualifiers=0
StringDescribeType=12
TransactionMode=0
UseCurrentSchema=0
53. Leveraging Hadoop with OBIEE 11g and ODI 11g
Demonstration of OBIEE 11.1.1.7 accessing Hadoop
through Hive Connectivity
54. Dealing with Hadoop / Hive Latency Option 1 : Exalytics
• Hadoop access through Hive can be slow, due to inherent latency in Hive
• Hive queries use MapReduce in the background to query Hadoop
• Spins up a Java VM on each query
• Generates a MapReduce job
• Runs it and collates the answer
• Great for large, distributed queries...
• ...but not so good for “speed-of-thought” dashboards
• So what if we could use Exalytics to speed up Hadoop queries?
55. Oracle Exalytics In-Memory Machine
• Engineered system, complements Oracle Exadata Database Machine (but can work standalone)
• Combination of high-end hardware (Sun x86_64 architecture, 3RU rack-mountable, 1-2TB RAM) and optimized versions of Oracle’s BI, In-Memory Database and OLAP software
• Delivers “in-memory analytics” focusing on analysis, aggregation and UI
‣Rich, interactive dashboards with split-second response times
‣1-2TB (and now 4TB) of RAM, to run your analysis in-memory
‣Infiniband connection to Exadata and Oracle BDA
‣40 CPU cores (and now 128) to support high user numbers
‣Lower TCO through a known configuration and combined patch sets
‣Contains software features only licensable through the Exalytics package
56. Exalytics as the Query Performance Enhancer
[Diagram: Exalytics holds in-memory aggregates, layered over detail-level data in the data warehouse]
• In conjunction with a well-tuned data warehouse, Exalytics adds an in-memory analysis layer
• Based around Oracle TimesTen for Exalytics, Oracle’s In-Memory Database
• Aggregates are recommended based on query patterns, automatically created in TimesTen
• Summary Advisor makes recommendations, which adapt as queries change
• Meant to be “plug-and-play” - no need for expensive data warehouse tuning
• So can we use this for speeding up Hadoop/Hive queries?
57. Summary Advisor for Aggregate Recommendation & Creation
• Utility within Oracle BI Administrator tool that recommends aggregates
• Bases recommendations on usage tracking and summary statistics data
• Captured based on past activity
• Runs an iterative algorithm that searches, on each iteration, for the best aggregate
58. Running Some Sample Hadoop / Hive Queries
• A simple Hadoop / Hive BMM was created, based on a single Hive table
• Queries were run against that BMM that requested aggregates
• The query details, and the requested aggregates, go into the usage tracking & summary statistics tables
• Avg. query response time = 30 secs+
select avg(T44678.age) as c1,
       T44678.sales_pers as c2,
       sum(T44678.age) as c3,
       count(T44678.age) as c4
from dwh_customer T44678
group by T44678.sales_pers
59. Generate Aggregate Recommendations using Summary Advisor
• Ensure the BMM has one or more logical dimensions, with two or more logical levels
• Ensure the S_NQ_SUMMARY_ADVISOR table has aggregate recordings and level details
• Generate summary recommendations using the Summary Advisor, output as an nqcmd script
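The Summary Advisor output is an Aggregate Persistence script, run through nqcmd against the BI Server. A minimal sketch of what such a script looks like - the business model, level and connection pool names here are hypothetical, not the Advisor's actual output:

```sql
-- Logical SQL (Aggregate Persistence), executed via nqcmd
-- Creates and populates an aggregate table in TimesTen, and
-- maps it into the RPD automatically
create aggregates
"ag_cust_by_salesperson"
for "Hive Sales"."Fact Customers"
at levels ("Hive Sales"."Customers"."Sales Person")
using connection pool "TimesTen"."TT_CP"
in "TimesTen".."EXALYTICS";
```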
60. Implement Recommendations, Review Updated RPD
• Run the generated logical SQL (Aggregate Persistence) script to create & populate the TimesTen tables
• Automatically updates the RPD to “plug in” the new TimesTen aggregate tables
61. Re-run Reports, now with TimesTen for Exalytics Acceleration
• Reports can now be re-run to test the improvement from in-memory aggregation
• Response time is now near-instantaneous
• Aggregates will need to be refreshed once new data is loaded into Hadoop
• Can also be used to improve the speed of federated RDBMS - Hadoop - OLAP queries
‣But - this relies on query caching - it doesn’t make Hadoop itself “faster”…
62. Dealing with Hadoop / Hive Latency Option 2 : Use Impala
• Hive is slow - because it’s meant to be used for batch-mode queries
• Many companies / projects are trying to improve on Hive - one of which is Cloudera
• Cloudera Impala is an open-source but commercially-sponsored in-memory MPP query engine
• Replaces Hive and MapReduce in the Hadoop stack
• Can we use this, instead of Hive, to access Hadoop?
‣It will need to work with OBIEE
‣Warning - it won’t be a supported data source (yet…)
63. How Impala Works
• A replacement for Hive, but uses Hive concepts and the Hive data dictionary (metastore)
• An MPP (Massively Parallel Processing) query engine that runs within Hadoop
‣Uses the same file formats, security and resource management as Hadoop
• Processes queries in-memory
• Accesses standard HDFS file data
• Option to use Apache Avro, RCFile, LZO or Parquet (column-store)
• Designed for interactive, real-time SQL-like access to Hadoop
[Diagram: the BI Server and Presentation Server connect through the Cloudera Impala ODBC driver to Impala daemons running on each node of a multi-node Hadoop cluster, each accessing HDFS data directly]
64. Connecting OBIEE 11.1.1.7 to Cloudera Impala
• Warning - this is an unsupported source, with limited testing and no support from MOS
• Requires the Cloudera Impala ODBC drivers - Windows or Linux (RHEL etc/SLES) - 32/64-bit
• The ODBC driver / DSN connection steps are similar to Hive
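As a sketch, an odbc.ini entry for the Impala driver follows the same pattern as the Hive one shown earlier; the driver path and hostname below are assumptions for a typical 64-bit Linux install, not values from this environment:

```ini
[impaladsn]
; Cloudera ODBC Driver for Impala - path is install-dependent
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=bigdatalite
; 21050 is Impala's default HiveServer2-protocol port
PORT=21050
Database=default
```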
65. Importing Impala Metadata
• Import the Impala tables (via the Hive metastore) into the RPD
• Set the database type to “Apache Hadoop”
‣Warning - don’t set the ODBC type to Hadoop - leave it at ODBC 2.0
‣Create physical layer keys, joins etc. as normal
66. Building the RPD using Impala Metadata
• Create BMM layer, Presentation layer as normal
• Use “View Rows” feature to check connectivity back to Impala / Hadoop
67. Impala / OBIEE Issue with ORDER BY Clause
• Although checking rows in the BI Administration tool worked, any query that aggregates data in the dashboard will fail
• The issue is that Impala requires a LIMIT clause with every ORDER BY
‣OBIEE could use LIMIT, but doesn’t for Impala at the moment (because it’s not a supported source)
• Workaround - disable ORDER BY in the Database Features, and have the BI Server do the sorting
‣Not ideal - but it works, until Impala is supported
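The restriction is easy to reproduce in impala-shell; a sketch using the sample table from earlier in this deck:

```sql
-- Fails in Impala (1.x): ORDER BY without LIMIT is rejected
SELECT sales_pers, AVG(age)
FROM   dwh_customer
GROUP  BY sales_pers
ORDER  BY sales_pers;

-- Works: an explicit row limit satisfies Impala's requirement
SELECT sales_pers, AVG(age)
FROM   dwh_customer
GROUP  BY sales_pers
ORDER  BY sales_pers
LIMIT  1000;
```

OBIEE generates the first form, hence the workaround of pushing the sort up into the BI Server.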
68. So Does Impala Work, as a Hive Substitute?
• With ORDER BY disabled in the DB features, it appears to
• But it hasn’t been extensively tested by me, or by Oracle
• But it’s certainly interesting
• Reduces 30s and 180s queries down to 1s and 10s etc.
• Impala, or one of the competitor projects (Drill, Dremel etc), is assumed to become the real-time query replacement for Hive, in time
‣Oracle announced planned support for Impala at OOW2013 - watch this space
69. Thank You for Attending!
• Thank you for attending this presentation; more information can be found at http://www.rittmanmead.com
• Contact us at info@rittmanmead.com or mark.rittman@rittmanmead.com
• Look out for our book, “Oracle Business Intelligence Developers Guide”, out now!
• Follow us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)