Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12
Developing a Successful
Big Data Strategy
Best Practices for Development
Raul Goycoolea S.
Solution Architect Manager
Oracle Latin America
Architecture Team
Mexico Developer Day, Apr 2014
Keep in Touch
Raul Goycoolea Seoane
Twitter: http://twitter.com/raul_goycoolea
Facebook: http://www.facebook.com/raul.goycoolea
LinkedIn: http://www.linkedin.com/in/raulgoy
Blog: http://blogs.oracle.com/raulgoy/
Raul Goycoolea S.
Multiprocessor Programming
16 February 2012
Agenda
• Introduction
• Architecture/Design Pattern
• Use Cases
Who are you?
http://goo.gl/XkwxwM
Sample of Big Data Use Cases Today
• Media / Entertainment: viewers and advertising effectiveness; cross sell
• Communications: location-based advertising
• Education & Research: experiment sensor analysis
• Retail / CPG: sentiment analysis; hot products; optimized marketing
• Health Care: patient sensors, monitoring, EHRs; quality of care
• Life Sciences: clinical trials; genomics
• High Technology / Industrial Mfg.: mfg quality; warranty analysis
• Oil & Gas: drilling exploration sensor analysis
• Financial Services: risk & portfolio analysis; new products
• Automotive: auto sensors reporting location, problems
• Games: adjust to player behavior; in-game ads
• Law Enforcement & Defense: threat analysis (social media monitoring, photo analysis)
• Travel & Transportation: sensor analysis for optimal traffic flows; customer sentiment
• Utilities: Smart Meter analysis for network capacity
• On-line Services / Social Media: people & career matching; web-site optimization
What is the main difference in this data?
Volume, Velocity, Variety
These Characteristics Challenge Existing Architectures
Make Better Decisions Using Big Data
Big Data in Action
[Diagram: a cycle of four stages: Acquire, Organize, Analyze, Decide]
Analyze: analyze all your data, at once
Strategic Transformations
FROM: Fragmented View; Historical Reporting; Results
TO: Unified View; Real-time, predictive; Insight-driven optimization
Traditional Data Sources - Reporting
New Data Sources - Predicting
Big Data Analysis Characteristics
• Integrate
– Traditional and New data
• Explore
– More data, More sources
• Discover
– Plan, Visualize, Model, Act
Big Data Analysis In Retail: The Problem
• Fashion retailer sees flat and declining sales
• No apparent differences by geography or standard demographics
• New marketing program didn’t help
Step 1: New Segmentation
• Analyze weblog files
– Response rates
– Frequency and duration of visits
– Shopping cart activity
– Devices used to access
• Cross reference with demographics
– Affinity program
– Online profiles
• New insight: younger, affluent women are not buying
Step 2: Sentiment Analysis
• Analyze all comments
– Social media, forums
• Cross reference with customer information
– Affinity programs
– Online activity
– Sales records
• New insight: new segment expresses “out of stock”
Step 3: Inventory Analysis
• Analyze promoted products
– No stocking problems
• Cross-reference with all shopper activities
– Online shopping cart activity
– Affinity program
– Shopper location information
– “Out of stock” comments
• Key insight: matching accessories are out of stock
Big Data Analysis In Retail: The Answer
Young women with higher disposable income (and smart phones) did not buy a designer sweater when the matching sleeveless top was out of stock.
Oracle Big Data Platform
[Diagram: Big Data Appliance, Exadata, and Exalytics spanning the Acquire, Organize, Analyze, and Decide stages]
Oracle Exadata Database Machine
• Fastest Data Warehouse & OLTP
• Best Cost/Performance Data Warehouse & OLTP
• Optimized Hardware (per rack)
• Processor: up to 128 Intel Cores and 2 TB DRAM
• Network: 880 Gb/Sec Throughput
• Storage: 5 TB Flash and up to 336 TB Disk
• Software Breakthroughs
• Exadata Smart Storage Grid
• Smart Flash Cache
• Hybrid Columnar Compression
• Parallel Scale-Out Database and Storage
• Scales from ¼ Rack to 8 Full Racks
Data Warehousing, Transaction Processing, Consolidation
Oracle In-Database Analytics Platform
• Data Layer: XML, Relational, OLAP, Spatial, RDF, Media
• Parallel Processing Engine
• Analytics: Oracle R Enterprise, Oracle Data Mining, Text and Search, Spatial Analytics, SQL Analytics, Oracle MapReduce
Oracle In-Database Analytics
New: Oracle Advanced Analytics
• Statistical, Data Mining, Text, Graph, Spatial, Semantic
Oracle Exalytics In-Memory Machine
• First engineered system for analytics
• Visual Analysis without limits
• Smarter analytic applications
End-user Experience with Exalytics
Speed of Thought Interactive Analysis
Interactive Analysis
Free Exploration
Dense Visualizations
Fully Mobile
Over 80 Analytic Applications Run on Exalytics
No application changes required
Financials, HR
Sales, marketing
Planning, forecasting
Many industries
Analyzing Big Data
• Comprehensive
• Enterprise ready
• Engineered to work together
• Optimized for extreme analytics
Oracle Big Data Platform
Stages: Stream, Acquire, Organize, Discover & Analyze
• Oracle Big Data Appliance: optimized for Hadoop, R, and NoSQL processing (Hadoop, Open Source R, Applications, Oracle NoSQL Database, Oracle Big Data Connectors, Oracle Data Integrator)
• Oracle Big Data Connectors
• Oracle Exadata: “System of Record”, optimized for DW/OLTP (Data Warehouse, Oracle Advanced Analytics, Oracle Database)
• Oracle Exalytics: optimized for analytics & in-memory workloads (Oracle Enterprise Performance Management, Oracle Business Intelligence Applications, Oracle Business Intelligence Tools, Oracle Endeca Information Discovery)
Use Case Introduction
• Oracle MoviePlex is an on-line movie streaming company
• Like many other on-line stores, they needed a cost-effective approach to tackle their “big data” challenges
• They recently implemented Oracle’s Big Data Platform to better manage their business, identify key opportunities and enhance customer satisfaction
Common Big Data Challenge
• Applications are generating massive volumes of unstructured data that describe user behavior and application performance
• Today, most companies are unable to fully capitalize on this potentially valuable information due to cost and complexity
• How do you capitalize on this raw data to gain better insights into your customers, enhance their user experience and ultimately improve profitability?
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
{"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
{"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}
{"custId":1067283,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:55","recommended":null,"activity":9}
{"custId":1377537,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:58","recommended":null,"activity":9}
{"custId":1347836,"movieId":null,"genreId":null,"time":"2012-07-01:00:02:03","recommended":null,"activity":8}
{"custId":1137285,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:39","recommended":null,"activity":8}
{"custId":1354924,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:51","recommended":null,"activity":9}
{"custId":1036191,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:55","recommended":null,"activity":8}
{"custId":1143971,"movieId":1017161,"genreId":44,"time":"2012-07-01:00:04:00","recommended":"Y","activity":7}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:04:03","recommended":"Y","activity":5}
{"custId":1273464,"movieId":null,"genreId":null,"time":"2012-07-01:00:04:39","recommended":null,"activity":9}
{"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}
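As a first step toward using this raw data, the records can be parsed one JSON object per line and summarized. The sketch below is only an illustration under stated assumptions: the helper names (parse_log, activity_counts, recommended_share) are hypothetical, the three records are copied from the sample above, and the meaning of each numeric activity code is not given in the deck, so the codes are only counted.

```python
import json
from collections import Counter

# Three records copied from the sample log; a real log would be read from a file.
SAMPLE_LOG = """\
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
"""


def parse_log(text):
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def activity_counts(records):
    """Count records per numeric activity code; the codes' meanings are not stated in the deck."""
    return Counter(r["activity"] for r in records)


def recommended_share(records):
    """Fraction of records with a non-null 'recommended' flag whose flag is 'Y'."""
    flagged = [r for r in records if r["recommended"] is not None]
    if not flagged:
        return 0.0
    return sum(r["recommended"] == "Y" for r in flagged) / len(flagged)


records = parse_log(SAMPLE_LOG)
print(activity_counts(records))
print(recommended_share(records))
```

JSON null values arrive in Python as None, which is why the recommended flag is filtered before the share is computed.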
How can you get answers to….
Derive Value from Big Data
How can you …
• Make the right movie offers at the right time?
• Better understand the viewing trends of various customer segments?
• Optimize marketing spend by targeting customers with optimal promotional offers?
• Minimize infrastructure spend by understanding bandwidth usage over time?
• Prepare to answer questions that you haven’t thought of yet!
MoviePlex Architecture
• Application Log: log of all activity on site; streamed into HDFS using Flume
• Oracle NoSQL DB: customer profile (e.g. recommended movies); capture activity nec. for MoviePlex site
• Oracle Big Data Appliance: HDFS, with MapReduce jobs
– ORCH - CF Recs.
– Hive - Activities
– Pig - Sessionize
• Load Recommendations
• Oracle Big Data Connectors: Load Session & Activity Data
• Oracle Exadata: Oracle Advanced Analytics (Clustering/Market Basket, “Mood” Recommendations)
• Oracle Exalytics: Endeca Information Discovery, Oracle Business Intelligence EE
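The "Sessionize" step in the architecture groups each customer's activity records into sessions. The deck does this with Pig and does not show the script, so the following Python sketch is a stand-in, not the author's implementation. It assumes a session ends when the gap between two consecutive events of the same customer exceeds a threshold; the function name sessionize and the max_gap parameter are hypothetical, and the timestamp format is taken from the "time" field of the sample log.

```python
from collections import defaultdict
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d:%H:%M:%S"  # matches the "time" field in the sample log


def sessionize(events, max_gap=timedelta(minutes=30)):
    """Group (cust_id, time_string) events into per-customer sessions.

    Assumption: a new session starts whenever the gap to the customer's
    previous event exceeds max_gap. The 30-minute default is illustrative.
    """
    by_customer = defaultdict(list)
    for cust_id, time_str in events:
        by_customer[cust_id].append(datetime.strptime(time_str, TIME_FORMAT))

    sessions = {}
    for cust_id, times in by_customer.items():
        times.sort()
        current = [times[0]]
        grouped = [current]
        for t in times[1:]:
            if t - current[-1] > max_gap:
                current = [t]          # gap too large: open a new session
                grouped.append(current)
            else:
                current.append(t)      # still inside the current session
        sessions[cust_id] = grouped
    return sessions


# Events taken from the sample log (custId and time only).
events = [
    (1143971, "2012-07-01:00:00:43"),
    (1143971, "2012-07-01:00:01:07"),
    (1143971, "2012-07-01:00:04:00"),
    (1354924, "2012-07-01:00:00:22"),
]
result = sessionize(events, max_gap=timedelta(minutes=2))
print({cust: [len(s) for s in groups] for cust, groups in result.items()})
```

With the two-minute gap used in the usage example, customer 1143971's third event falls 173 seconds after the second one and therefore opens a new session.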
Acknowledgements
• Movie information courtesy of The Internet Movie Database (http://www.imdb.com). Used with permission.
• Movie images are provided by the TMDb API; this product is not endorsed or certified by TMDb
• All customer information and session details are completely fictitious
Oracle’s Big Data Platform
[Diagram: Stream, Acquire, Organize, Discover, Analyze, Visualize, Decide, Operationalize]
Program Agenda
• Oracle SQL Connector for HDFS
– Brief Overview
– Hands-on Exercises
• Oracle Loader for Hadoop
– Brief Overview
– Hands-on Exercises
• (Optional exercise): Use both connectors together
Loading and Accessing Data from Hadoop
[Diagram: log files (Input 1, Input 2) pass through MapReduce jobs (Map, Shuffle/Sort, Reduce); Oracle Loader for Hadoop and Oracle SQL Connector for HDFS connect the results to Oracle Database]
Oracle’s Big Data Platform
[Diagram: Oracle Big Data Connectors link Hadoop and Oracle Database]
Oracle’s Big Data Platform
Acquire, Organize, Analyze
Big Data Connectors work with
• Oracle’s engineered systems, and
• Other hardware
[Diagram: Hadoop MapReduce jobs (Map, Shuffle/Sort, Reduce) and Oracle Big Data Connectors]
Oracle SQL Connector for HDFS
Oracle SQL Connector for HDFS
High Performance Access and Load from Hadoop with Oracle SQL
• Oracle SQL access to Hive tables and HDFS files
• Automated generation of external table to access the data
• Query data in-place or load
• Access or load data in parallel
[Diagram: a SQL query in Oracle Database reads an external table through the connector (labelled OSCH/ODCH), which accesses data in Hadoop]
Part 1a: Reading Hive Tables with Oracle SQL Connector for HDFS
• cd /home/oracle/movie/moviework/osch
– This directory contains the script genloc_moviefact_hive.sh and its configuration file moviefact_hive.xml
• Execute the script
– sh genloc_moviefact_hive.sh
– (the password is: welcome1)
Part 1a
• The script genloc_moviefact_hive.sh
hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
  oracle.hadoop.exttab.ExternalTable \
  -conf /home/oracle/movie/moviework/osch/moviefact_hive.xml \
  -createTable
Part 1a
• Examine the Hadoop configuration properties
– more moviefact_hive.xml
[Screenshot: contents of moviefact_hive.xml]
Part 1a
• Query the table
– sqlplus moviework/oracle
SQL> select count(*) from movie_fact_ext_tab_hive;
SQL> select custid from movie_fact_ext_tab_hive where rownum < 10;
SQL> select custid, title from movie_fact_ext_tab_hive p, movie q where p.movieid = q.movieid and rownum < 10;
Installing Oracle SQL Connector for HDFS
• Hadoop Cluster: OSCH, Hive Client
• Oracle Database System: OSCH, Hadoop Client, External Table
Part 1b: Reading Text Files on HDFS with Oracle SQL Connector for HDFS
• cd /home/oracle/movie/moviework/osch
– This directory contains the script genloc_moviefact_text.sh and its configuration file moviefact_text.xml
• Execute the script
– sh genloc_moviefact_text.sh
– (the password is: welcome1)
Part 1b
• The script genloc_moviefact_text.sh
hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
  oracle.hadoop.exttab.ExternalTable \
  -conf /home/oracle/movie/moviework/osch/moviefact_text.xml \
  -createTable
Part 1b
• Examine the Hadoop configuration properties
– more moviefact_text.xml
[Screenshot: contents of moviefact_text.xml]
Performance Comparison
[Chart: Load speed comparison, load rate (TB/hour): Fuse-DFS vs. Oracle Direct Connector for HDFS]
[Chart: CPU usage comparison, CPU seconds used per GB: Fuse-DFS vs. Oracle Direct Connector for HDFS]
Oracle Loader for Hadoop
Oracle Loader for Hadoop
High Performance Loader
• Convert data into Oracle-ready data types on Hadoop
• Offload data pre-processing from the database server to Hadoop
• Load pre-processed data online or offline
• Automatically handle input data skew
• Works with a range of input formats
[Diagram: Oracle Loader for Hadoop running as a MapReduce job (Map, Shuffle/Sort, Reduce)]
• Partition, sort, and convert into Oracle data types on Hadoop
• Connect to the database from reducer nodes, load into database partitions in parallel
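The partition-and-sort idea, together with a simple way of coping with skewed partitions, can be sketched in a few lines. This is an illustration only and does not reproduce how Oracle Loader for Hadoop works internally: the function names are hypothetical, partitioning by genreId is an arbitrary choice, and the greedy largest-first assignment is a generic load-balancing heuristic chosen here for the example.

```python
import heapq
from collections import defaultdict


def partition_and_sort(rows, partition_key, sort_key):
    """Group rows by partition key and sort each partition by sort key."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[partition_key]].append(row)
    return {key: sorted(group, key=lambda r: r[sort_key]) for key, group in parts.items()}


def assign_to_reducers(partitions, num_reducers):
    """Greedy balancing: give the largest remaining partition to the least-loaded reducer.

    Assumption: this heuristic only stands in for skew handling; it is not
    the algorithm used by Oracle Loader for Hadoop.
    """
    heap = [(0, i) for i in range(num_reducers)]  # (rows assigned so far, reducer index)
    heapq.heapify(heap)
    assignment = {i: [] for i in range(num_reducers)}
    largest_first = sorted(partitions.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for key, group in largest_first:
        load, idx = heapq.heappop(heap)
        assignment[idx].append(key)
        heapq.heappush(heap, (load + len(group), idx))
    return assignment


# Rows taken from the sample log (genreId and custId only).
rows = [
    {"genreId": 9, "custId": 1354924},
    {"genreId": 44, "custId": 1234182},
    {"genreId": 9, "custId": 1067283},
    {"genreId": 9, "custId": 1126174},
    {"genreId": 6, "custId": 1351777},
    {"genreId": 44, "custId": 1010220},
]
parts = partition_and_sort(rows, "genreId", "custId")
print(assign_to_reducers(parts, 2))
```

In the usage example the three-row partition gets a reducer to itself while the two smaller partitions share the other, so both reducers end up with three rows.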
Part 2: Oracle Loader for Hadoop
• Examine the data files on HDFS
– hadoop fs -ls /user/oracle/moviedemo/session
Part 2: Oracle Loader for Hadoop
• cd /home/oracle/movie/moviework/olh
– This directory contains all the necessary scripts: moviesession.sql, moviesession.xml, loaderMap_moviesession.xml, runolh_session.sh
Part 2
• Create the table that the data will be loaded into
– sqlplus moviedemo/welcome1
SQL> @moviesession.sql
Part 2
• Submit the Oracle Loader for Hadoop MapReduce job
– sh runolh_session.sh
hadoop jar ${OLH_HOME}/jlib/oraloader.jar \
  oracle.hadoop.loader.OraLoader \
  -conf /home/oracle/movie/moviework/olh/moviesession.xml
Part 2
• Examine the Hadoop configuration properties
– more moviesession.xml
[Screenshot: contents of moviesession.xml]
Part 2
• Examine the loaderMap file
– more loaderMap_moviesession.xml
[Screenshot: contents of loaderMap_moviesession.xml]
Installing Oracle Loader for Hadoop
• Hadoop Cluster: OLH, Hive Client
• Oracle Database System: Target Table
Performance Comparison
[Chart: Load speed comparison, load rate (TB/hour): comparable third party product vs. Oracle Loader for Hadoop]
[Chart: CPU usage comparison, CPU seconds used per GB: comparable third party product vs. Oracle Loader for Hadoop]
Versions
• Certified Versions
– Oracle Database 11.2.0.2 and higher
– Hadoop distributions: CDH3, CDH4 (versions of Cloudera’s Distribution including Apache Hadoop); Apache Hadoop 1.0.x, 1.1.1
• Should work with Hadoop distros based on certified Apache Hadoop versions
Oracle Loader for Hadoop and Oracle SQL Connector for HDFS
• Offline load: data is pre-processed by Oracle Loader for Hadoop and written in Oracle Data Pump format in HDFS.
• Oracle Data Pump files in HDFS are queried (and loaded if necessary) with Oracle SQL Connector for HDFS.
[Diagram: Oracle Loader for Hadoop (Map, Shuffle/Sort, Reduce) writes to HDFS; a SQL query in Oracle Database reads an external table through Oracle SQL Connector for HDFS (labelled OSCH/ODCH) and the HDFS client]
Thank You!
http://goo.gl/XkwxwM
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
How to Identify, Train or Become a Data Scientist
How to Identify, Train or Become a Data ScientistHow to Identify, Train or Become a Data Scientist
How to Identify, Train or Become a Data Scientist
 
Are you getting the most out of your data?
Are you getting the most out of your data?Are you getting the most out of your data?
Are you getting the most out of your data?
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
 
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
 
How to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect
How to Build a Successful Data Team - Florian Douetteau @ PAPIs ConnectHow to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect
How to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
A6 big data_in_the_cloud
A6 big data_in_the_cloudA6 big data_in_the_cloud
A6 big data_in_the_cloud
 

Último

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
 

Último (20)

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 

Best Practices for Development Apps for Big Data

  • 1. Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12 1
  • 2. Developing a Successful Big Data Strategy: Best Practices for Development. Raul Goycoolea S., Solution Architect Manager, Oracle Latin America Architecture Team. Mexico Developer Day, Apr 2014
  • 3. Keep in Touch: Raul Goycoolea Seoane. Twitter: http://twitter.com/raul_goycoolea Facebook: http://www.facebook.com/raul.goycoolea LinkedIn: http://www.linkedin.com/in/raulgoy Blog: http://blogs.oracle.com/raulgoy/ Raul Goycoolea S., Multiprocessor Programming, 16 February 2012
  • 4. Agenda • Introduction • Architecture/Design Pattern • Use Cases
  • 5. Who are you? http://goo.gl/XkwxwM
  • 6. Sample of Big Data Use Cases Today. MEDIA/ENTERTAINMENT: viewers / advertising effectiveness, cross sell; COMMUNICATIONS: location-based advertising; EDUCATION & RESEARCH: experiment sensor analysis; RETAIL / CPG: sentiment analysis, hot products, optimized marketing; HEALTH CARE: patient sensors, monitoring, EHRs, quality of care; LIFE SCIENCES: clinical trials, genomics; HIGH TECHNOLOGY / INDUSTRIAL MFG.: mfg quality, warranty analysis; OIL & GAS: drilling exploration sensor analysis; FINANCIAL SERVICES: risk & portfolio analysis, new products; AUTOMOTIVE: auto sensors reporting location, problems; GAMES: adjust to player behavior, in-game ads; LAW ENFORCEMENT & DEFENSE: threat analysis - social media monitoring, photo analysis; TRAVEL & TRANSPORTATION: sensor analysis for optimal traffic flows, customer sentiment; UTILITIES: smart meter analysis for network capacity; ON-LINE SERVICES / SOCIAL MEDIA: people & career matching, web-site optimization. What is the main difference in this data? Volume, Velocity, Variety. These characteristics challenge existing architectures.
  • 7. Make Better Decisions Using Big Data. Big Data in Action (cycle): ANALYZE, DECIDE, ACQUIRE, ORGANIZE
  • 8. Analyze all your data, at once. Big Data in Action (cycle): ANALYZE, DECIDE, ACQUIRE, ORGANIZE
  • 11. New Data Sources - Predicting
  • 12. Big Data Analysis Characteristics • Integrate – Traditional and New data • Explore – More data, More sources • Discover – Plan, Visualize, Model, Act
  • 13. Big Data Analysis In Retail: The Problem • Fashion retailer sees flat and declining sales • No apparent differences by geography or standard demographics • New marketing program didn’t help
  • 14. Step 1: New Segmentation • Analyze weblog files – Response rates – Frequency and duration of visits – Shopping cart activity – Devices used to access • Cross reference with demographics – Affinity program – Online profiles • New insight: younger, affluent women are not buying
  • 15. Step 2: Sentiment Analysis • Analyze all comments – Social media, forums • Cross reference with customer information – Affinity programs – Online activity – Sales records • New insight: new segment expresses “out of stock”
  • 16. Step 3: Inventory Analysis • Analyze promoted products – No stocking problems • Cross-reference with all shopper activities – Online shopping cart activity – Affinity program – Shopper location information – “Out of stock” comments • Key insight: matching accessories are out of stock
  • 17. Big Data Analysis In Retail: The Answer Young women with higher disposable income (and smart phones) did not buy a designer sweater when the matching sleeveless top was out of stock.
  • 18. Oracle Big Data Platform (diagram): Big Data Appliance, Exadata, Exalytics; stages ACQUIRE, ORGANIZE, DECIDE, ANALYZE
  • 19. Oracle Exadata Database Machine • Fastest Data Warehouse & OLTP • Best Cost/Performance Data Warehouse & OLTP • Optimized Hardware (per rack) • Processor: up to 128 Intel Cores and 2 TB DRAM • Network: 880 Gb/Sec Throughput • Storage: 5 TB Flash and up to 336 TB Disk • Software Breakthroughs • Exadata Smart Storage Grid • Smart Flash Cache • Hybrid Columnar Compression • Parallel Scale-Out Database and Storage • Scales from ¼ Rack to 8 Full Racks. Data Warehousing, Transaction Processing, Consolidation
  • 20. Oracle In-Database Analytics Platform XML Relational OLAP Spatial Data Layer RDF Media Parallel Processing Engine Oracle R Enterprise Oracle Data Mining Text and Search Spatial Analytics SQL Analytics Oracle MapReduce
  • 21. Oracle In-Database Analytics New: Oracle Advanced Analytics 2 miles Statistical Data Mining Text Graph Spatial Semantic
  • 22. Oracle Exalytics In-Memory Machine First engineered system for analytics Visual Analysis without limits Smarter analytic applications
  • 23. End-user Experience with Exalytics Speed of Thought Interactive Analysis Interactive Analysis Free Exploration Dense Visualizations Fully Mobile
  • 24. Over 80 Analytic Applications Run on Exalytics No application changes required Financials, HR Sales, marketing Planning, forecasting Many industries
  • 25. Analyzing Big Data • Comprehensive • Enterprise ready • Engineered to work together • Optimized for extreme analytics
  • 26. Oracle Big Data Platform. Stages: Stream, Acquire, Organize, Discover & Analyze. Oracle Big Data Appliance (optimized for Hadoop, R, and NoSQL processing): Hadoop, Open Source R, Applications, Oracle NoSQL Database, Oracle Big Data Connectors, Oracle Data Integrator. Oracle Exadata (“System of Record”, optimized for DW/OLTP): Data Warehouse, Oracle Advanced Analytics, Oracle Database. Oracle Exalytics (optimized for analytics & in-memory workloads): Oracle Enterprise Performance Management, Oracle Business Intelligence Applications, Oracle Business Intelligence Tools, Oracle Endeca Information Discovery. Oracle Big Data Connectors link the Big Data Appliance to Exadata.
  • 27. Use Case Introduction • Oracle MoviePlex is an on-line movie streaming company • Like many other on-line stores, they needed a cost-effective approach to tackle their “big data” challenges • They recently implemented Oracle’s Big Data Platform to better manage their business, identify key opportunities and enhance customer satisfaction
  • 28. Common Big Data Challenge • Applications are generating massive volumes of unstructured data that describe user behavior and application performance • Today, most companies are unable to fully capitalize on this potentially valuable information due to cost and complexity • How do you capitalize on this raw data to gain better insights into your customers, enhance their user experience and ultimately improve profitability? Sample application log:
      {"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
      {"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
      {"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
      {"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
      {"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
      {"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
      {"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
      {"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
      {"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
      {"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
      {"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
      {"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
      {"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}
      {"custId":1067283,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:55","recommended":null,"activity":9}
      {"custId":1377537,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:58","recommended":null,"activity":9}
      {"custId":1347836,"movieId":null,"genreId":null,"time":"2012-07-01:00:02:03","recommended":null,"activity":8}
      {"custId":1137285,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:39","recommended":null,"activity":8}
      {"custId":1354924,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:51","recommended":null,"activity":9}
      {"custId":1036191,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:55","recommended":null,"activity":8}
      {"custId":1143971,"movieId":1017161,"genreId":44,"time":"2012-07-01:00:04:00","recommended":"Y","activity":7}
      {"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:04:03","recommended":"Y","activity":5}
      {"custId":1273464,"movieId":null,"genreId":null,"time":"2012-07-01:00:04:39","recommended":null,"activity":9}
      {"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}
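The sample log on slide 28 looks like newline-delimited JSON, one activity event per record. As a minimal sketch of a first pass over such a file, the Python below parses a few of those records and counts events per activity code. The timestamp format is inferred from the sample values, the function names are hypothetical, and the meaning of the numeric activity codes is not stated on the slide, so the code only counts them.

```python
import json
from collections import Counter
from datetime import datetime

# Three records copied from the slide's sample log.
SAMPLE_LOG = """\
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
"""

# Assumption: inferred from the sample timestamps, not documented on the slide.
TIME_FORMAT = "%Y-%m-%d:%H:%M:%S"


def parse_log(text):
    """Parse newline-delimited JSON records and convert 'time' to datetime."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        record["time"] = datetime.strptime(record["time"], TIME_FORMAT)
        records.append(record)
    return records


def activity_counts(records):
    """Count how many records carry each numeric activity code."""
    return Counter(record["activity"] for record in records)


records = parse_log(SAMPLE_LOG)
counts = activity_counts(records)
print(dict(counts))
```

JSON `null` becomes Python `None`, so records without a movie (browsing or login-style events, presumably) can be filtered with a plain `is None` check before any per-movie aggregation.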
  • 29. Common Big Data Challenge. How can you get answers to….
  • 30. Derive Value from Big Data. How can you …. • Make the right movie offers at the right time? • Better understand the viewing trends of various customer segments? • Optimize marketing spend by targeting customers with optimal promotional offers? • Minimize infrastructure spend by understanding bandwidth usage over time? • Prepare to answer questions that you haven’t thought of yet!
  • 31. MoviePlex Architecture. Application Log: log of all activity on site, streamed into HDFS using Flume. Oracle NoSQL DB: customer profile (e.g. recommended movies); captures activity necessary for the MoviePlex site. Oracle Big Data Appliance: HDFS with MapReduce jobs (ORCH - CF Recs., Hive - Activities, Pig - Sessionize). Oracle Big Data Connectors: load recommendations; load session & activity data. Oracle Exadata: Oracle Advanced Analytics (clustering / market basket). Oracle Exalytics: Endeca Information Discovery, Oracle Business Intelligence EE. “Mood” Recommendations.
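The architecture names a Pig job that "sessionizes" the activity log but does not say how. A common reading is that each customer's events are grouped into sessions separated by a period of inactivity. The Python sketch below illustrates that idea only; the 30-minute gap, the function name, and the later event at 02:00 are assumptions for illustration, not details from the deck.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Assumption: the slide does not give the inactivity threshold.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Assign a per-customer session number to (cust_id, time) events.

    Returns (cust_id, time, session_no) tuples sorted by customer, then time.
    A new session starts when the gap since the previous event exceeds `gap`.
    """
    result = []
    ordered = sorted(events, key=itemgetter(0, 1))
    for cust_id, group in groupby(ordered, key=itemgetter(0)):
        session_no = 0
        previous = None
        for _, timestamp in group:
            if previous is None or timestamp - previous > gap:
                session_no += 1
            result.append((cust_id, timestamp, session_no))
            previous = timestamp
    return result


events = [
    (1143971, datetime(2012, 7, 1, 0, 0, 43)),
    (1143971, datetime(2012, 7, 1, 0, 1, 7)),
    (1143971, datetime(2012, 7, 1, 0, 4, 0)),
    (1143971, datetime(2012, 7, 1, 2, 0, 0)),  # hypothetical later event
    (1354924, datetime(2012, 7, 1, 0, 0, 22)),
]
sessions = sessionize(events)
for row in sessions:
    print(row)
```

On a cluster the same logic would run per customer key after a shuffle, which is why it fits a MapReduce-style job: the sort by customer and time is the expensive, distributable part.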
  • 32. Acknowledgements • Movie information courtesy of The Internet Movie Database (http://www.imdb.com). Used with permission. • Movie images are provided by the TMDb API; this product is not endorsed or certified by TMDb • All customer information and session details are completely fictitious
  • 33. Oracle’s Big Data Platform: ANALYZE, DECIDE, ACQUIRE, ORGANIZE, DISCOVER, VISUALIZE, STREAM, OPERATIONALIZE
  • 34. Program Agenda • Oracle SQL Connector for HDFS – Brief Overview – Hands-on Exercises • Oracle Loader for Hadoop – Brief Overview – Hands-on Exercises • (Optional exercise): Use both connectors together
  • 35. Loading and Accessing Data from Hadoop (diagram): LOG FILES and other inputs (INPUT 1, INPUT 2) pass through MAP, SHUFFLE/SORT and REDUCE stages on Hadoop; results reach Oracle Database through Oracle Loader for Hadoop and Oracle SQL Connector for HDFS
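The MAP, SHUFFLE/SORT and REDUCE stages in this diagram can be sketched in a few lines of single-process Python. This is an illustration of the programming model, not of Hadoop's implementation; the per-genre count and the function names are chosen here for the example, with field names borrowed from the activity log on slide 28.

```python
from collections import defaultdict


def map_phase(records):
    """MAP: emit a (genreId, 1) pair for every record that has a genre."""
    for record in records:
        if record["genreId"] is not None:
            yield record["genreId"], 1


def shuffle_sort(pairs):
    """SHUFFLE/SORT: group values by key and order the groups by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """REDUCE: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped}


records = [
    {"custId": 1354924, "genreId": 9},
    {"custId": 1083711, "genreId": None},
    {"custId": 1234182, "genreId": 44},
    {"custId": 1363545, "genreId": 9},
]
genre_counts = reduce_phase(shuffle_sort(map_phase(records)))
print(genre_counts)
```

In a real cluster the map calls run on many nodes in parallel and the shuffle moves each key's values to one reducer; the three functions above keep the same contract on a single machine.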
  • 36. Oracle’s Big Data Platform: Hadoop, Oracle Big Data Connectors, Oracle Database
  • 37. Oracle’s Big Data Platform: ACQUIRE, ORGANIZE, ANALYZE. Oracle Big Data Connectors sit between Hadoop (MAP, SHUFFLE/SORT, REDUCE) and the database. Big Data Connectors work with • Oracle’s engineered systems, and • Other hardware
  • 38. Oracle SQL Connector for HDFS
  • 39. Oracle SQL Connector for HDFS: High Performance Access and Load from Hadoop with Oracle SQL • Oracle SQL access to Hive tables and HDFS files • Automated generation of external table to access the data • Query data in-place or load • Access or load data in parallel. Diagram labels: SQL Query, External Table, ODCH, OSCH, Hadoop, Oracle Database
  • 40. Part 1a: Reading Hive Tables with Oracle SQL Connector for HDFS • cd /home/oracle/movie/moviework/osch – This directory contains the script genloc_moviefact_hive.sh and its configuration file moviefact_hive.xml • Execute the script: sh genloc_moviefact_hive.sh – (the password is: welcome1)
  • 41. Part 1a • The script genloc_moviefact_hive.sh runs: hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -conf /home/oracle/movie/moviework/osch/moviefact_hive.xml -createTable
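Parts 1a and 1b run the same command with different configuration files, so the invocation can be assembled programmatically. The Python sketch below only builds the argument list from the pieces shown on the slide and prints it; the function name and the `/opt/osch` install location are hypothetical, and nothing is executed.

```python
import os
import shlex


def build_osch_command(conf_path, osch_home=None):
    """Return the argument list for the -createTable call shown on the slide.

    Falls back to the OSCH_HOME environment variable, or to the literal
    "$OSCH_HOME" placeholder when that variable is not set.
    """
    osch_home = osch_home or os.environ.get("OSCH_HOME", "$OSCH_HOME")
    return [
        "hadoop", "jar", f"{osch_home}/jlib/orahdfs.jar",
        "oracle.hadoop.exttab.ExternalTable",
        "-conf", conf_path,
        "-createTable",
    ]


command = build_osch_command(
    "/home/oracle/movie/moviework/osch/moviefact_hive.xml",
    osch_home="/opt/osch",  # hypothetical install location
)
print(shlex.join(command))
# To run it on a machine with Hadoop and the connector installed:
#   subprocess.run(command, check=True)
```

Passing an argument list rather than a single shell string avoids quoting problems if a path ever contains spaces.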
  • 42. Part 1a • Examine the Hadoop configuration properties – more moviefact_hive.xml (the slide shows the contents of moviefact_hive.xml)
  • 43. Part 1a • Query the table – sqlplus moviework/oracle SQL> select count(*) from movie_fact_ext_tab_hive; SQL> select custid from movie_fact_ext_tab_hive where rownum < 10; SQL> select custid, title from movie_fact_ext_tab_hive p, movie q where p.movieid = q.movieid and rownum < 10;
  • 44. Installing Oracle SQL Connector for HDFS. Diagram labels: External Table, Hadoop Cluster, Oracle Database System, OSCH, Hadoop Client, Hive Client, OSCH
  • 45. Part 1b: Reading Text Files on HDFS with Oracle SQL Connector for HDFS • cd /home/oracle/movie/moviework/osch – This directory contains the script genloc_moviefact_text.sh and its configuration file moviefact_text.xml • Execute the script: sh genloc_moviefact_text.sh – (the password is: welcome1)
  • 46. Part 1b • The script genloc_moviefact_text.sh runs: hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -conf /home/oracle/movie/moviework/osch/moviefact_text.xml -createTable
  • 47. Part 1b • Examine the Hadoop configuration properties – more moviefact_text.xml (the slide shows the contents of moviefact_text.xml)
  • 48. Performance Comparison: Fuse-DFS vs. Oracle Direct Connector for HDFS. Two bar charts: load speed comparison (load rate, TB/hour) and CPU usage comparison (CPU seconds used per GB)
  • 49. Oracle Loader for Hadoop
  • 50. Oracle Loader for Hadoop: High Performance Loader • Partition, sort, and convert into Oracle data types on Hadoop • Connect to the database from reducer nodes, load into database partitions in parallel • Convert data into Oracle-ready data types on Hadoop • Offload data pre-processing from the database server to Hadoop • Load pre-processed data online or offline • Automatically handle input data skew • Works with a range of input formats
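The "partition, sort, and convert" step described for Oracle Loader for Hadoop can be pictured with a toy example. The Python below is a conceptual sketch only and says nothing about how the loader is implemented: it assumes a simple modulo partitioning on customer ID, and the function name and input rows are made up for illustration (the timestamp format follows the sample log on slide 28).

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d:%H:%M:%S"  # as in the slide 28 sample log


def partition_sort_convert(rows, num_partitions):
    """Convert text fields to typed values, bucket by partition, sort each bucket.

    rows: iterable of (cust_id_text, time_text) string pairs.
    Returns {partition_number: sorted list of (int, datetime) tuples}.
    """
    partitions = defaultdict(list)
    for cust_id_text, time_text in rows:
        converted = (int(cust_id_text), datetime.strptime(time_text, TIME_FORMAT))
        # Assumption: modulo partitioning stands in for the target table's scheme.
        partitions[converted[0] % num_partitions].append(converted)
    return {number: sorted(batch) for number, batch in sorted(partitions.items())}


rows = [
    ("1354924", "2012-07-01:00:03:51"),
    ("1143971", "2012-07-01:00:01:07"),
    ("1354924", "2012-07-01:00:00:22"),
    ("1143971", "2012-07-01:00:00:43"),
]
batches = partition_sort_convert(rows, num_partitions=2)
for number, batch in batches.items():
    print(number, batch)
```

Doing the conversion and sorting before the load means each batch arrives typed and ordered for one target partition, which is the work the slide describes as being offloaded from the database server to Hadoop.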
  • 51. Part 2: Oracle Loader for Hadoop • Examine the data files on HDFS – hadoop fs -ls /user/oracle/moviedemo/session
  • 52. Part 2: Oracle Loader for Hadoop • cd /home/oracle/movie/moviework/olh – This directory contains all the necessary files: moviesession.sql, moviesession.xml, loaderMap_moviesession.xml, runolh_session.sh
  • 53. Part 2 • Create the table that the data will be loaded into – sqlplus moviedemo/welcome1 SQL> @moviesession.sql
  • 54. Part 2 • Submit the Oracle Loader for Hadoop MapReduce job – sh runolh_session.sh, which runs: hadoop jar ${OLH_HOME}/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf /home/oracle/movie/moviework/olh/moviesession.xml
  • 55. Part 2 • Examine the Hadoop configuration properties – more moviesession.xml (the slide shows the contents of moviesession.xml)
  • 56. Part 2 • Examine the loaderMap file – more loaderMap_moviesession.xml (the slide shows the contents of loaderMap_moviesession.xml)
  • 57. Installing Oracle Loader for Hadoop. [Diagram labels: Target Table, Hadoop Cluster, Oracle Database System, Hive Client, OLH.]
  • 58. Performance Comparison. [Charts: load speed comparison (load rate in TB/hour) and CPU usage comparison (CPU seconds used per GB), each for Oracle Loader for Hadoop versus a comparable third-party product.]
  • 59. Versions. Certified versions: Oracle Database 11.2.0.2 and higher; Hadoop distributions CDH3 and CDH4 (versions of Cloudera’s Distribution including Apache Hadoop) and Apache Hadoop 1.0.x, 1.1.1. Should work with Hadoop distributions based on certified Apache Hadoop versions.
  • 60. Oracle Loader for Hadoop and Oracle SQL Connector for HDFS. [Diagram labels: MAP, SHUFFLE/SORT, REDUCE, ORACLE LOADER FOR HADOOP, External Table, ODCH, OSCH, SQL Query, HDFS Client, Oracle Database, ORACLE SQL CONNECTOR FOR HDFS.] Offline load: data is pre-processed and written in Oracle Data Pump format in HDFS. The Oracle Data Pump files in HDFS are queried (and loaded if necessary) with Oracle SQL Connector for HDFS.
  • 61. Thank You! http://goo.gl/XkwxwM
  • 62. Keep in Touch: Raul Goycoolea Seoane | Twitter: http://twitter.com/raul_goycoolea | Facebook: http://www.facebook.com/raul.goycoolea | LinkedIn: http://www.linkedin.com/in/raulgoy | Blog: http://blogs.oracle.com/raulgoy/
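
The job submission on slide 54 can be sketched as a small Python helper that assembles the same `hadoop jar` command line. This is only an illustration: the function name `build_olh_command` and the `/opt/olh` install location are hypothetical, while the jar name, the `oracle.hadoop.loader.OraLoader` class, and the configuration file come from slides 52 and 54. The helper only builds the argument list; it does not run Hadoop.

```python
import os
import posixpath
import shlex


def build_olh_command(conf_path, olh_home=None):
    """Return the argv list for an Oracle Loader for Hadoop job (slide 54).

    olh_home falls back to the OLH_HOME environment variable; the
    "/opt/olh" default is a placeholder, not a documented location.
    """
    olh_home = olh_home or os.environ.get("OLH_HOME", "/opt/olh")
    return [
        "hadoop", "jar",
        posixpath.join(olh_home, "jlib", "oraloader.jar"),
        "oracle.hadoop.loader.OraLoader",
        "-conf", conf_path,
    ]


cmd = build_olh_command(
    "/home/oracle/movie/moviework/olh/moviesession.xml",
    olh_home="/opt/olh",  # hypothetical install directory
)
print(shlex.join(cmd))
```

On a machine with the `hadoop` client installed, passing the list to `subprocess.run(cmd, check=True)` would be one way to submit the job from a script instead of from `runolh_session.sh`.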

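The contents of moviesession.xml (slide 55) are not reproduced in this transcript, but the speaker notes name three of its properties. As a rough sketch, a Hadoop-style configuration file with those properties could be generated as follows. The property names come from the notes; the fully qualified class names and the use of the slide 51 HDFS directory as the input path are assumptions, and a real file would also need database connection settings that are not shown here.

```python
import xml.etree.ElementTree as ET

# Property names come from the speaker notes for slide 55; the fully
# qualified class names and the input directory are assumptions.
PROPERTIES = {
    "mapreduce.inputformat.class":
        "oracle.hadoop.loader.lib.input.DelimitedTextInputFormat",
    "mapred.input.dir": "/user/oracle/moviedemo/session",
    "mapreduce.outputformat.class":
        "oracle.hadoop.loader.lib.output.OCIOutputFormat",
}


def build_configuration(properties):
    """Build a Hadoop-style <configuration> element from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


config_xml = ET.tostring(build_configuration(PROPERTIES), encoding="unicode")
print(config_xml)
```
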
Editor's Notes

  1. Start by introducing the platform. We’ll talk about use cases, and then zero in on the use case that you will be working with as part of your hands-on labs (HOLs). Across these use cases, you’ll find similar data processing flows. We’ll review the Oracle MoviePlex design pattern/architecture.
  2. In the rest of the presentation we’ll walk through the lifecycle of big data. Big data is all about making better business decisions to grow revenue and lower costs. The lifecycle of big data is: acquire, organize, analyze, decide.
  3. The platform consists of: Big Data Appliance, to source unstructured and semi-structured data; Exadata, to combine the data the customer has now structured with traditional schema-based data and run in-database analytics on it; and Exalytics, for in-memory extreme analytics. All are connected by InfiniBand.
  4. Added standalone software components. To summarize: I think we have the industry’s most complete and integrated solution for acquiring, organizing, and analyzing big data. If someone comes to you and needs you to deploy big data in a few weeks, we can help you do this: fastest time to value. We have the software: NoSQL Database, Enterprise Manager Cloud Control (EM CC), Hadoop, Data Integrator for Hadoop, Loader for Hadoop, R, BIEE. Plus we have the Big Data Appliance, Exadata, and Exalytics to provide engineered solutions for running the software. In closing, I hope this session has been informative and that you can all go back to your organizations and tell them what big data is (high volume, low value) and how it can be acquired, organized, loaded into your existing data warehouse, and analyzed to bring new value to your business.
  5. So you have big data. You’re running MapReduce on that data. You want to load or access some of that data in Oracle Database for further analytics. This is what the Oracle Big Data Connectors are for. Note that the data is transformed into a structured form before it is loaded or accessed by the connectors.
  6. You have seen a similar slide in other big data presentations from Oracle, outlining the different stages in a big data application. The potential treasure trove of less structured data such as weblogs, social media, email, sensors, and location data can provide a wealth of useful information for business applications. Hadoop provides a massively parallel architecture to distill desired information from huge volumes of unstructured and semi-structured content. Frequently, this data needs to be analyzed with existing data in relational databases, the platform for most commercial applications. The two sets of data need to be combined so that users can derive greater insights from the less structured data that is processed and stored on Hadoop clusters, using the data in relational databases. A set of technologies and utilities referred to as “connectors” is necessary to make the data on Hadoop available to the database for analysis with the data in the database. Oracle Loader for Hadoop and Oracle SQL Connector for HDFS are two high-performance connectors to load and access very large volumes of data on Hadoop. We see here the different stages in a big data solution. Oracle has engineered solutions for each of these stages: Oracle Big Data Appliance, Oracle Exadata (an engineered system for running Oracle Database), and Oracle Exalytics (an engineered system for BI applications), all connected by InfiniBand, the super highway that integrates Oracle’s engineered systems.
Note that the Big Data Connectors work both with the engineered systems and with generic Hadoop and database installations (I will discuss specific versions later in the presentation). The platform consists of: Big Data Appliance, to source unstructured and semi-structured data; Exadata, to combine the data the customer has now structured with traditional schema-based data and run in-database analytics on it; and Exalytics, for in-memory extreme analytics. All are connected by InfiniBand.
Set up for today’s conversation. You know a lot about Exadata and Exalytics (Oracle BI). We have been hard at work developing a key component of the Big Data Platform, the BDA, and I am excited to speak to you about that today. We are leveraging both Oracle’s appliance expertise and, importantly, the advice and technology of industry experts to create an open platform. Although it’s new, it offers a solid foundation, using technology that is well tested by the biggest players in the market. We then took this open system and optimized it for Oracle, delivering unique capabilities that simplify connections to the rest of your Oracle ecosystem and deliver outstanding performance. Level set: introduce the system, then step through a use case that illustrates the flow of information across the system. Highlight the optimizations along the way, the things that are unique to Oracle. InfiniBand is a key enabler and an example of Oracle’s superior technology. Without InfiniBand (the super highway that integrates Oracle’s engineered solutions), customers will try to squeeze all these capabilities into one box for either a performance or a price advantage, and they will fail at both. With InfiniBand, customers have the right tool optimized for the right job: the value of integrated Oracle solutions is greater than the sum of the parts.
  7. The connectors work with Oracle’s engineered systems and also with other Hadoop distributions and Oracle databases (as long as they are versions we support).
  8. Parallelism: PQ slaves in the database read data in parallel. If you have 64 PQ slaves, 64 files will be read in parallel. The number of PQ slaves is limited by the number of location files.
  9. When OSCH is invoked with the -createTable option, the external table definition is generated, the external table is created, and the location files are populated. You can examine the location files if you like. Their contents were also displayed on screen, along with the external table definition.
  10. Interesting properties: tableName (name of the external table), sourceType, hive.tableName, hive.databaseName
  11. Let us try some queries on this external table
  12. You will see that the external table has two location files, because of the value we specified in the locationFileCount property. You can see that the URIs of the smaller data files have been grouped into one location file. OSCH does this to load balance the reading of data as much as possible. URIs in the location files are read in parallel. You can examine the location files if you like.
  13. Interesting properties: tableName (external table that will be created), sourceType, dataPaths, locationFileCount
  14. How does this perform? The alternative to OSCH is to use Fuse-dfs. We are 5 times faster than Fuse-dfs, while using 75% less CPU. The test was performed on BDA (18 Sun X4270 M2 servers, 216 cores, 48 GB memory per server, 864 GB total) and a single-instance Exadata X2-8 (8 Intel Xeon X7560 servers, 64 cores, 1 TB memory). The data size used in the CPU usage graph is 0.25 TB.
  15. OLH is a MapReduce job that runs on the Hadoop cluster. The job is submitted to the cluster like any MapReduce job. Data is read through input formats. Database table partitions are loaded in parallel by reducer tasks. There are online and offline modes. Online: pre-process and load in the same job. Offline: write out data files on HDFS (text or Oracle Data Pump) to load later. The data pre-processing performs partitioning, sorting, and data conversion on Hadoop.
  16. Now let us look at Oracle Loader for Hadoop. In addition to the file containing the configuration parameters, we have a loader map file that describes the columns in the target table we are loading into. If all columns in the table are loaded, and the date columns have the default date format, this file is not needed. Here the date format in the data is different from the default and is specified in the loader map file.
  17. We first create the target table in the database that we want to load data into.
  18. The mapreduce.outputformat.class property specifies OCIOutputFormat, which means that the online load option with direct path load will be used. mapred.input.dir specifies the path to the data files. mapreduce.inputformat.class specifies DelimitedTextInputFormat, meaning that the data is delimited text.
  19. Loader Map file. Note the specification of the date format.
  20. We use 85% less CPU, and are more than ten times as fast. The data size used in the CPU usage graph is 0.25 TB.
  21. This is a big deal. We spend significant time and effort keeping up with the versions. This saves you the time it would take to make a connector work with the Hadoop distribution you are using.
  22. The connectors can be used together. Oracle Data Pump files can be created by Oracle Loader for Hadoop, and then accessed or loaded into Oracle Database using Oracle SQL Connector for HDFS. So if the input is not delimited text files, Oracle Loader for Hadoop can first be used to transform the data into Data Pump files (or delimited text files), which are then loaded or accessed by Oracle SQL Connector for HDFS. This is also a good time to highlight the offline load option of OLH.
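
Notes 8 and 12 describe two related ideas: OSCH groups data file URIs into a fixed number of location files to balance the read load, and the number of location files caps how many PQ slaves can read in parallel. The sketch below illustrates both with a generic greedy heuristic (largest file first, into the lightest location file). It is an assumption made for illustration, not the algorithm OSCH actually uses, and the file names and sizes are invented.

```python
import heapq


def group_into_location_files(file_sizes, location_file_count):
    """Greedily assign each URI to the currently lightest location file.

    file_sizes maps a data file URI to its size in bytes. This is a generic
    load-balancing heuristic, not the algorithm OSCH actually uses.
    """
    # Heap entries: (total bytes assigned, location file index, URIs).
    heap = [(0, i, []) for i in range(location_file_count)]
    heapq.heapify(heap)
    for uri, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        total, index, uris = heapq.heappop(heap)
        uris.append(uri)
        heapq.heappush(heap, (total + size, index, uris))
    return [uris for _, _, uris in sorted(heap, key=lambda e: e[1])]


def effective_parallelism(pq_slaves, location_file_count):
    """Readers are capped by the number of location files (note 8)."""
    return min(pq_slaves, location_file_count)


# Hypothetical data files: one large file and three small ones.
sizes = {
    "part-00000": 900,
    "part-00001": 300,
    "part-00002": 300,
    "part-00003": 300,
}
groups = group_into_location_files(sizes, location_file_count=2)
print(groups)
print(effective_parallelism(pq_slaves=64, location_file_count=2))
```

With two location files, the three small files end up grouped together opposite the large one, which mirrors the behaviour described in note 12, and only two readers can work in parallel regardless of how many PQ slaves are available.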