Oracle	and	Hadoop,
let	them	talk	together	!!!
Laurent	Leturgez
Whoami
• Oracle	Consultant	since	2001	
• Former	developer	(C,	Java,	perl,	PL/SQL)
• Hadoop aficionado
• Owner@Premiseo:	Data	Management	on	Premise	and	in	the	Cloud
• Blogger	since	2004	:	http://laurent-leturgez.com
• Mail:	laurent.leturgez@premiseo.com
• Twitter	:	@lleturgez
3	Membership	Tiers
• Oracle	ACE	Director
• Oracle	ACE
• Oracle	ACE	Associate
bit.ly/OracleACEProgram
500+	Technical	Experts	
Helping	Peers	Globally
Connect:
Nominate	yourself	or	someone	you	know:	acenomination.oracle.com
@oracleace
Facebook.com/oracleaces
oracle-ace_ww@oracle.com
Hadoop	&	Oracle:	let	them	talk	together
• Agenda
• Introduction
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Introduction
• The	project (and	database usage)	approach has	changed
• 10-15	years ago …	the	« product »	approach
• New	project
• I	usually use	Oracle	or	SQLServer
• So	…	I	will use	Oracle	or	SQL	Server	for	this project
• Nowadays … the « solution » approach
• New project
• What kind of data will my system store?
• Do I have expectations regarding consistency, security, sizing, pricing/licensing etc.?
• So … I will use the right tool for the right job!
Companies are now living in a heterogeneous World !!!
Hadoop	:	
What	it	is,	how	it	works,	and	what	it	can	do	?
• Hadoop	is a	framework used:
• For	distributed storage (HDFS	:	Hadoop	Distributed	File	System)
• To	process	high	volumes	of	data	using MapReduce	programming model	(not	only)
• Hadoop	is open	source
• But	some enterprise distributions	exist
• Hadoop	is designed to	analyze lots	of	data
• Hadoop is designed to scale to tens of petabytes
• Hadoop	is designed to	manage	structured and	unstructured data
• Hadoop	is designed to	run	on	commodity servers
• Hadoop	is initially designed for	analytics and	batch	workloads
Hadoop	:	
What	it	is,	how	it	works,	and	what	it	can	do	?
Hadoop	:	
What	it	is,	how	it	works,	and	what	it	can	do	?
• Cluster	Architecture
• Oracle RAC uses « Shared Everything » clusters
• Designed to run on a large number of machines
• All the servers share the disks and a global view of memory
Hadoop	:	
What	it	is,	how	it	works,	and	what	it	can	do	?
• Cluster	Architecture
• Hadoop uses « Shared Nothing » clusters
• Designed to run on a large number of machines
• The servers share neither memory nor disks
• Scaled for	high	capacity and	throughput on	commodity servers
• Designed for	distributed storage and	distributed processing !!!
Hadoop	:	
What	it	is,	how	it	works,	and	what	it	can	do	?
• One	Data,	multiple	processing engines
[Diagram: HDFS stores Parquet, Avro, ORC and delimited files; Hive, Impala, MapReduce/YARN, Flume and Spark all process the same data]
Right	tool for	the	right	job	?
• Oracle
• Perfect for	OLTP	processing
• Ideal as an all-in-one database:
• Structured: tables, constraints, typed data etc.
• Unstructured: images, videos, binary
• Many formats: XML, JSON
• Sharded database
• Hadoop
• Free	and	scalable
• Many open	data	formats	(Avro,	Parquet,	Kudu	etc.)	
• Many processing tools (MapReduce,	Spark,	Kafka	etc.)
• Analytic workloads
• Designed to manage large amounts of data quickly
How can I connect Hadoop to Oracle, Oracle to Hadoop, and query data?
Which solutions to exchange data between Oracle and Hadoop?
How can I reuse my Oracle data in a Hadoop workload?
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop is a	tool to	move	data	between rdbms and	Hadoop	(HDFS)
• Basically, a tool to run data exports and imports from the hadoop cluster
• Scenarios
• Enrich analytic workloads with multiple	data	sources
[Diagram: Oracle data (moved with Sqoop) and unstructured data land in Hadoop/HDFS, the analytic workload runs there, and results are written to HDFS or back to Oracle]
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop is a	tool to	move	data	between rdbms and	Hadoop	(HDFS)
• Basically,	a	tool to	run	data	export	and	import	
• Scenarios	
• Offload analytic workloads on	hadoop
[Diagram: Oracle data is moved with Sqoop to Hadoop/HDFS, the analytic workload runs there, and results are written to HDFS or back to Oracle]
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop is a	tool to	move	data	between rdbms and	Hadoop	(HDFS)
• Basically,	a	tool to	run	data	export	and	import	
• Scenarios
• Offload analytic workloads on	hadoop and	keep data	on	hdfs
[Diagram: Oracle data is moved with Sqoop to Hadoop/HDFS, the analytic workload runs there, and results stay on HDFS]
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop is a	tool to	move	data	between rdbms and	Hadoop	(HDFS)
• Basically,	a	tool to	run	data	export	and	import	
• Scenarios
• Data	archiving into Hadoop
[Diagram: Oracle data is archived with Sqoop into Hadoop/HDFS as compressed filesets]
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop import	is from RDBMS	to	Hadoop	(Example	with Hive)
$ sqoop import \
> --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl \
> --username sh --password sh \
> --table S \
> --hive-import \
> --hive-overwrite \
> --hive-table S_HIVE \
> --hive-database hive_sample \
> --split-by PROD_ID -m 4 \
> --map-column-hive "TIME_ID"=timestamp
0: jdbc:hive2://hadoop1:10001/hive_sample> show create table s_hive;
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE TABLE `s_hive`( |
| `prod_id` double, |
| `cust_id` double, |
| `time_id` timestamp, |
| `channel_id` double, |
| `promo_id` double, |
| `quantity_sold` double, |
| `amount_sold` double) |
| COMMENT 'Imported by sqoop on 2017/09/18 14:55:40' |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'field.delim'='\u0001', |
| 'line.delim'='\n', |
| 'serialization.format'='\u0001') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://hadoop1.localdomain:8020/user/hive/warehouse/hive_sample.db/s_hive' |
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='true', |
| 'numFiles'='4', |
| 'numRows'='0', |
| 'rawDataSize'='0', |
| 'totalSize'='122802447', |
| 'transient_lastDdlTime'='1505739346') |
+----------------------------------------------------+--+
$ hdfs dfs -ls -C /user/hive/warehouse/hive_sample.db/s_hive*
/user/hive/warehouse/hive_sample.db/s_hive/part-m-00000
/user/hive/warehouse/hive_sample.db/s_hive/part-m-00001
/user/hive/warehouse/hive_sample.db/s_hive/part-m-00002
/user/hive/warehouse/hive_sample.db/s_hive/part-m-00003
0: jdbc:hive2://hadoop1:10001/hive_sample> select count(*) from s_hive;
+----------+--+
| _c0 |
+----------+--+
| 2756529 |
+----------+--+
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop import	is from RDBMS	to	Hadoop
• One	Oracle	session	per	mapper
• Reads are	done in	direct	path mode
• A SQL statement can be used to filter the data to import (see the hedged sketch after this list)
• Results can be stored in various formats: delimited text, Hive, Parquet, compressed or not
• The key issue is data type conversion
• Hive datatype mapping (--map-column-hive "TIME_ID"=timestamp)
• Java datatype mapping (--map-column-java "ID"=Integer, "VALUE"=String)
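A minimal sketch (hypothetical filter, target directory and output format) of a filtered import using a free-form query; Sqoop requires the $CONDITIONS placeholder so that it can split the query across the mappers:
$ sqoop import \
> --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl \
> --username sh --password sh \
> --query 'SELECT prod_id, cust_id, time_id, amount_sold FROM sales WHERE amount_sold > 1500 AND $CONDITIONS' \
> --split-by prod_id -m 4 \
> --target-dir /user/laurent/sales_filtered \
> --as-parquetfile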
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop export	is from Hadoop	to	RDBMS
• Destination	table	has	to	be created first
• Direct	mode	is possible
• Two modes
• Insert mode (shown below)
• Update mode (see the hedged sketch after the example)
$ sqoop export \
> --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl \
> --username sh --password sh \
> --direct \
> --table S_RESULT \
> --export-dir=/user/hive/warehouse/hive_sample.db/s_result \
> --input-fields-terminated-by '\001'
SQL> select * from sh.s_result;
PROD_ID P_SUM P_MIN
---------- ---------- ----------
47 1132200.93 25.97
46 749501.85 21.23
45 1527220.89 42.09
44 889945.74 42.09
…/…
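A minimal update-mode sketch, assuming PROD_ID is the logical key of S_RESULT (an assumption, not shown in the slides); matching rows are updated and, with allowinsert, missing rows are inserted:
$ sqoop export \
> --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl \
> --username sh --password sh \
> --table S_RESULT \
> --export-dir=/user/hive/warehouse/hive_sample.db/s_result \
> --input-fields-terminated-by '\001' \
> --update-key PROD_ID \
> --update-mode allowinsert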
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Hadoop	&	Oracle:	let	them	talk	together
• Spark	for	hadoop
• Spark	is an	Open	Source	distributed computing framework
• Fault tolerant by	design
• Can	work with various cluster	managers
• YARN
• MESOS
• Spark	Standalone
• Kubernetes (Experimental)
• Centered on	a	data	structure	called RDD	(Resilient	Distributed	Dataset)
• Based on	various components
• Spark	Core
• Spark	SQL	(Data	Abstraction)
• Spark	streaming	(Data	Ingestion)
• Spark	MLlib (Machine	Learning)
• Spark Graph (graph processing on top of Spark)
[Diagram: Spark SQL, Spark Streaming, Spark MLlib and Spark Graph all run on top of Spark Core]
Hadoop	&	Oracle:	let	them	talk	together
• Spark	for	hadoop
• Resilient	Distributed	DataSets (RDD)
• Read	Only Structure
• Distributed	over	cluster’s machines
• Maintained in	a	fault tolerant Way
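A minimal spark-shell sketch (hypothetical HDFS path) showing how an RDD is built and transformed; the recorded lineage is what allows Spark to recompute lost partitions in a fault-tolerant way:
scala> val lines = sc.textFile("hdfs://hadoop1.localdomain:8020/user/laurent/products.csv")   // RDD[String] read from HDFS
scala> val longLines = lines.filter(line => line.length > 100)                                // transformation, evaluated lazily
scala> longLines.count()                                                                      // action, triggers the distributed job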
Hadoop	&	Oracle:	let	them	talk	together
• Spark	for	hadoop
• Evolution
• RDD,	DataFrames,	and	DataSets can	be filled from an	Oracle	Data	source
[Diagram: RDD → DataFrame (RDD + named-columns organization) → DataSet (DataFrame specialization); untyped: DataFrame = DataSet[Row], or typed: DataSet[T]; best for Spark SQL]
Hadoop	&	Oracle:	let	them	talk	together
• Spark	for	hadoop
• Spark API languages: Scala, Java, Python, R
Hadoop	&	Oracle:	let	them	talk	together
• Spark for hadoop: Spark vs MapReduce
• MR is batch oriented (Map then Reduce); Spark is in-memory and also suits interactive and near-real-time workloads
• MR stores intermediate data on disk; Spark keeps it in memory
• MR is written in Java, Spark is written in Scala
• Performance comparison
• WordCount on a 2 GB file
• Execution time with and without optimization (mapper, reducer, memory, partitioning etc.) → cf. http://repository.stcloudstate.edu/cgi/viewcontent.cgi?article=1008&context=csit_etds
                      MR       Spark
Without optimization  3' 53''  34''
With optimization     2' 23''  29''
→ Spark is roughly 5x faster
Hadoop	&	Oracle:	let	them	talk	together
• Spark	for	hadoop and	Oracle	…	use	cases
[Diagram: Spark reads from Oracle over JDBC, from unstructured data on HDFS, and from other datasources (S3 etc.) into RDDs / DataFrames; the analytic workload runs in Spark and results can be written back to Oracle or to HDFS]
Hadoop	&	Oracle:	let	them	talk	together
• Spark	for	hadoop and	Oracle	…	example (Spark	1.6	/	CDH	5.9)
scala> val sales_df = sqlContext.read.format("jdbc").option("url", "jdbc:oracle:thin:@//192.168.99.8:1521/orcl").option("driver",
"oracle.jdbc.OracleDriver").option("dbtable", "sales").option("user", "sh").option("password", "sh").load()
sales_df: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(38,10), CUST_ID: decimal(38,10), TIME_ID: timestamp, CHANNEL_ID: decimal(38,10), PROMO_ID:
decimal(38,10), QUANTITY_SOLD: decimal(10,2), AMOUNT_SOLD: decimal(10,2)]
scala> sales_df.groupBy("PROD_ID").count.show(10,false)
+-------------+-----+
|PROD_ID |count|
+-------------+-----+
|31.0000000000|23108|
|32.0000000000|11253|
|33.0000000000|22768|
|34.0000000000|13043|
|35.0000000000|16494|
|36.0000000000|13008|
|37.0000000000|17430|
|38.0000000000|9523 |
|39.0000000000|13319|
|40.0000000000|27114|
+-------------+-----+
only showing top 10 rows
Hadoop	&	Oracle:	let	them	talk	together
• Spark	for	hadoop and	Oracle	…	example (Spark	1.6	/	CDH	5.9)
scala> val s = sqlContext.read.format("jdbc").option("url", "jdbc:oracle:thin:@//192.168.99.8:1521/orcl").option("driver",
"oracle.jdbc.OracleDriver").option("dbtable", "sales").option("user", "sh").option("password", "sh").load()
scala> val p = sqlContext.read.format("com.databricks.spark.csv").option("delimiter",";").load("/user/laurent/products.csv")
--
scala> s.registerTempTable("sales")
scala> p.registerTempTable("products")
--
scala> val f = sqlContext.sql(""" select products.prod_name,sum(sales.amount_sold) from sales, products where sales.prod_id=products.prod_id group by
products.prod_name""")
scala> f.write.parquet("/user/laurent/spark_parquet")
$ impala-shell
[hadoop1.localdomain:21000] > create external table t
> LIKE PARQUET '/user/laurent/spark_parquet/_metadata'
> STORED AS PARQUET location '/user/laurent/spark_parquet/';
[hadoop1.localdomain:21000] > select * from t;
+------------------------------------------------+-------------+
| prod_name | _c1 |
+------------------------------------------------+-------------+
| O/S Documentation Set - Kanji | 509073.63 |
| Keyboard Wrist Rest | 348408.98 |
| Extension Cable | 60713.47 |
| 17" LCD w/built-in HDTV Tuner | 7189171.77 |
.../...
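A complementary sketch writing the aggregated DataFrame back to Oracle over JDBC instead of Parquet; the target table name S_RESULT_SPARK is a hypothetical example:
scala> import java.util.Properties
scala> val props = new Properties()
scala> props.setProperty("user", "sh")
scala> props.setProperty("password", "sh")
scala> props.setProperty("driver", "oracle.jdbc.OracleDriver")
scala> f.write.mode("append").jdbc("jdbc:oracle:thin:@//192.168.99.8:1521/orcl", "S_RESULT_SPARK", props)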
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Hadoop	&	Oracle:	let	them	talk	together
• ODBC	Connectors
• Cloudera	delivers ODBC	drivers	for
• Hive
• Impala
• ODBC	drivers	are	available for	:
• HortonWorks (Hive,	SparkSQL)
• MapR (Hive)
• AWS	EMR	(Hive,	Impala,	Hbase)
• Azure HDInsight (Hive) → only for Windows clients
Hadoop	&	Oracle:	let	them	talk	together
• ODBC	Connectors
• Install	driver	on	Oracle	Host
• Configure	ODBC	on	Oracle	host
• Configure	a	heterogeneous datasource based on	ODBC
• Create a database link using this datasource (a hedged configuration sketch follows below)
[Diagram: the Oracle instance reaches Hive/Impala through the listener, a dg4odbc agent (heterogeneous services), the ODBC driver manager and the ODBC driver; the DSN is defined in odbc.ini, and the database link resolves through tnsnames.ora]
Example join:
select c1, c2 from orcl_t a, t@hive_lnk b where a.id = b.id and b > 1000
Filter predicates are pushed down to Hive / Impala
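A minimal configuration sketch, assuming the Cloudera Hive ODBC driver and a DSN named HIVEDSN; file paths, driver location and values are illustrative assumptions, not the exact setup behind the slides:
# /etc/odbc.ini : DSN used by the gateway (driver path is an assumption)
[HIVEDSN]
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
Host=hadoop1.localdomain
Port=10000
HiveServerType=2
# $ORACLE_HOME/hs/admin/initHIVEDSN.ora : heterogeneous services parameters
HS_FDS_CONNECT_INFO = HIVEDSN
HS_FDS_SHAREABLE_NAME = /usr/lib64/libodbc.so
# listener.ora : static entry launching the dg4odbc agent
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = HIVEDSN)
      (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
      (PROGRAM = dg4odbc)))
# tnsnames.ora : alias referenced by the database link (note HS = OK)
HIVEDSN =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oel6.localdomain)(PORT = 1521))
    (CONNECT_DATA = (SID = HIVEDSN))
    (HS = OK))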
Hadoop	&	Oracle:	let	them	talk	together
SQL> create public database link hivedsn connect to "laurent" identified by "laurent" using 'HIVEDSN';
Database link created.
SQL> show user
USER is "SH"
SQL> select p.prod_name,sum(s."quantity_sold")
2 from products p, s_hive@hivedsn s
3 where p.prod_id=s."prod_id"
4 and s."amount_sold">1500
5 group by p.prod_name;
PROD_NAME SUM(S."QUANTITY_SOLD")
-------------------------------------------------- ----------------------
Mini DV Camcorder with 3.5" Swivel LCD 3732
Envoy Ambassador 20607
256MB Memory Card 3
Hadoop	&	Oracle:	let	them	talk	together
------------------------------------------------------------------------------------------------
SQL_ID abb5vb85sd3kt, child number 0
-------------------------------------
select p.prod_name,sum(s."quantity_sold") from products p,
s_hive@hivedsn s where p.prod_id=s."prod_id" and s."amount_sold">1500
group by p.prod_name
Plan hash value: 3779319722
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT|
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 204 (100)| | | |
| 1 | HASH GROUP BY | | 71 | 4899 | 204 (1)| 00:00:01 | | |
|* 2 | HASH JOIN | | 100 | 6900 | 203 (0)| 00:00:01 | | |
| 3 | TABLE ACCESS FULL| PRODUCTS | 72 | 2160 | 3 (0)| 00:00:01 | | |
| 4 | REMOTE | S_HIVE | 100 | 3900 | 200 (0)| 00:00:01 | HIVED~ | R->S |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."prod_id")
Remote SQL Information (identified by operation id):
----------------------------------------------------
4 - SELECT `prod_id`,`quantity_sold`,`amount_sold` FROM `S_HIVE` WHERE
`amount_sold`>1500 (accessing 'HIVEDSN' )
hive> show create table s_hive;
.../...
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE TABLE `s_hive`( |
.../...
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='true', |
| 'numFiles'='6', |
| 'numRows'='2756529', |
| 'rawDataSize'='120045918', |
| 'totalSize'='122802447', |
| 'transient_lastDdlTime'='1505740396') |
+----------------------------------------------------+--
SQL> select count(*) from s_hive@hivedsn where "amount_sold">1500;
COUNT(*)
----------
24342
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData Connectors
• Components	available
• Oracle	Datasource for	Apache	Hadoop
• Oracle	Loader	for	Hadoop
• Oracle	SQL	Connector for	HDFS
• Oracle	R	Advanced	Analytics	for	Hadoop
• Oracle	XQuery	for	Hadoop
• Oracle	Data	Integrator Enterprise	Edition
• BigData Connectors are	licensed separately from Big	Data	Appliance	(BDA)
• BigData Connectors can be installed on BDA or on any hadoop cluster
• BigData Connectors must	be licensed for	all	processors	of	a	hadoop cluster	
• public	price:	$2000	per	Oracle	Processor
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData Connectors :	Oracle	Datasource for	Apache	Hadoop
• Available for	Hive and	Spark
• Enables	Oracle	table	as	datasource in	Hive or	Spark
• Based on	Hive external tables
• Metadata are	stored in	Hcatalog
• Data	are	located on	Oracle	Server
• Secured (Wallet and	Kerberos	integration)
• Writing data from Hive to Oracle is possible (see the hedged sketch after the example below)
• Performance
• Filter predicates are	pushed down
• Projection	Pushdown to	retrieve only required columns
• Partition	pruning enabled
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData Connectors :	Oracle	Datasource for	Apache	Hadoop
hive> create external table s_od4h (
> prod_id double,
> cust_id double,
> time_id timestamp,
> channel_id double,
> promo_id double,
> quantity_sold double,
> amount_sold double)
> STORED BY 'oracle.hcat.osh.OracleStorageHandler'
> WITH SERDEPROPERTIES (
> 'oracle.hcat.osh.columns.mapping' = 'prod_id,cust_id,time_id,channel_id,promo_id,quantity_sold,amount_sold')
> TBLPROPERTIES (
> 'mapreduce.jdbc.url' = 'jdbc:oracle:thin:@//192.168.99.8:1521/orcl',
> 'mapreduce.jdbc.input.table.name' = 'SALES',
> 'mapreduce.jdbc.username' = 'SH',
> 'mapreduce.jdbc.password' = 'sh',
> 'oracle.hcat.osh.splitterKind' = 'SINGLE_SPLITTER'
> );
hive> select count(prod_id) from s_od4h;
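A hedged write-path sketch, assuming the OracleStorageHandler table accepts INSERT and that the source columns line up; this is illustrative and not taken from the slides:
hive> insert into table s_od4h
    > select prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold
    > from s_hive;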
Hadoop	&	Oracle:	let	them	talk	together
• Oracle BigData Connectors : Oracle Loader for Hadoop
• Loads data from hadoop into an Oracle table
• Java MapReduce application
• Online and offline modes
• Needs several XML input files (a hedged connection-file sketch follows the command below):
• A loader map: describes the destination table (types, format etc.)
• An input file description: AVRO, delimited, KV (if Oracle NoSQL file)
• An output file description: JDBC (online), OCI (online), delimited (offline), DataPump (offline)
• A database connection description file
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData Connectors :	Oracle	Loader	for	Hadoop
• Oracle	Shell	for	Hadoop	Loaders	(OHSH)
• Set	of	declarative commands to	copy	contents	from Oracle	to	Hadoop	(Hive)
• Needs Copy To Hadoop, which is included in the BigData SQL licence
$ hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
-D oracle.hadoop.loader.jobName=HDFSUSER_sales_sh_loadJdbc \
-D mapred.reduce.tasks=0 \
-D mapred.input.dir=/user/laurent/sqoop_raw \
-D mapred.output.dir=/user/laurent/OracleLoader \
-conf /home/laurent/OL_connection.xml \
-conf /home/laurent/OL_inputFormat.xml \
-conf /home/laurent/OL_mapconf.xml \
-conf /home/laurent/OL_outputFormat.xml
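A minimal sketch of what the connection description file (OL_connection.xml above) might contain, assuming the standard Hadoop configuration XML format and the documented OLH connection properties; the values reuse the connection details shown elsewhere in these examples:
<?xml version="1.0"?>
<configuration>
  <!-- JDBC connection used by the online (JDBC / OCI) output formats -->
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@//192.168.99.8:1521/orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>SH</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value>sh</value>
  </property>
</configuration>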
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData Connectors :	Oracle	SQL	Connector for	Hadoop
• Java	MapReduce	application
• Create an	Oracle	External table	and	link it to	HDFS	files
• Same Oracle	External tables’	limitations
• No	insert,	update	or	delete
• Parallel query enabled, with automatic load balancing
• Full scan only
• Indexing is not possible
• Two commands:
• createTable: creates the external table and links its location files to HDFS files
• publish: refreshes the HDFS file list referenced by the table DDL (a hedged sketch follows the createTable example below)
• Can be used to read:
• Datapump files on HDFS
• Delimited text files on HDFS
• Delimited text files in Hive tables
• Note: data is not updated in real time
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData Connectors :	Oracle	SQL	Connector for	Hadoop
• CreateTable
$ hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=T1_EXT \
-D oracle.hadoop.exttab.sourceType=hive \
-D oracle.hadoop.exttab.hive.tableName=T1 \
-D oracle.hadoop.exttab.hive.databaseName=hive_sample \
-D oracle.hadoop.exttab.defaultDirectory=SALES_HIVE_DIR \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//192.168.99.8:1521/ORCL \
-D oracle.hadoop.connection.user=sh \
-D oracle.hadoop.exttab.printStackTrace=true \
-createTable
CREATE TABLE "SH"."T1_EXT"
( "ID" NUMBER(*,0),
"V" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "SALES_HIVE_DIR"
ACCESS PARAMETERS
( RECORDS DELIMITED BY 0X'0A'
CHARACTERSET AL32UTF8
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
FIELDS TERMINATED BY 0X'01'
MISSING FIELD VALUES ARE NULL
(
"ID" CHAR NULLIF "ID"=0X'5C4E',
"V" CHAR(4000) NULLIF "V"=0X'5C4E'
)
)
LOCATION
( 'osch-20170919025617-5707-1',
'osch-20170919025617-5707-2',
'osch-20170919025617-5707-3'
)
)
REJECT LIMIT UNLIMITED
PARALLEL
$ grep uri /data/sales_hive/osch-20170919025617-5707-1
<uri_list>
<uri_list_item size="9" compressionCodec="">
hdfs://hadoop1.localdomain:8020/user/hive/warehouse/hive_sample.db/t1/000000_0
</uri_list_item>
</uri_list>
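A hedged sketch of the second command: once new files have landed in the Hive table's HDFS location, -publish regenerates the location files referenced by the external table (same properties as the createTable call above, shortened here):
$ hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=T1_EXT \
-D oracle.hadoop.exttab.sourceType=hive \
-D oracle.hadoop.exttab.hive.tableName=T1 \
-D oracle.hadoop.exttab.hive.databaseName=hive_sample \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//192.168.99.8:1521/ORCL \
-D oracle.hadoop.connection.user=sh \
-publish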
Hadoop	&	Oracle:	let	them	talk	together
SQL_ID d9w2xvzydkdhn, child number 0
-------------------------------------
select p.prod_name,sum(s."QUANTITY_SOLD") from products p,
"SALES_HIVE_XTAB" s where p.prod_id=s."PROD_ID" and s."AMOUNT_SOLD">300
group by p.prod_name
Plan hash value: 2918324602
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 99 (100)| | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 71 | 4899 | 99 (3)| 00:00:01 | Q1,02 | P->S | QC (RAND) |
| 3 | HASH GROUP BY | | 71 | 4899 | 99 (3)| 00:00:01 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 71 | 4899 | 99 (3)| 00:00:01 | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | 71 | 4899 | 99 (3)| 00:00:01 | Q1,01 | P->P | HASH |
| 6 | HASH GROUP BY | | 71 | 4899 | 99 (3)| 00:00:01 | Q1,01 | PCWP | |
|* 7 | HASH JOIN | | 10210 | 687K| 98 (2)| 00:00:01 | Q1,01 | PCWP | |
| 8 | PX RECEIVE | | 72 | 2160 | 3 (0)| 00:00:01 | Q1,01 | PCWP | |
| 9 | PX SEND BROADCAST | :TQ10000 | 72 | 2160 | 3 (0)| 00:00:01 | Q1,00 | S->P | BROADCAST |
| 10 | PX SELECTOR | | | | | | Q1,00 | SCWC | |
| 11 | TABLE ACCESS FULL | PRODUCTS | 72 | 2160 | 3 (0)| 00:00:01 | Q1,00 | SCWP | |
| 12 | PX BLOCK ITERATOR | | 10210 | 388K| 95 (2)| 00:00:01 | Q1,01 | PCWC | |
|* 13 | EXTERNAL TABLE ACCESS FULL| SALES_HIVE_XTAB | 10210 | 388K| 95 (2)| 00:00:01 | Q1,01 | PCWP | |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("P"."PROD_ID"="S"."PROD_ID")
13 - filter("S"."AMOUNT_SOLD">300)
Note
-----
- Degree of Parallelism is 8 because of table property
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData SQL
• Support	for	queries	against	non-relational	datasources
• Apache	Hive
• HDFS
• Oracle	NoSQL
• Apache	Hbase
• Other	NoSQL	Databases
• “Cold”	tablespaces	(and	datafiles)	storage	on	Hadoop/HDFS
• Licensing
• BigData SQL	is	licensed	separately	from	Big	Data	Appliance
• Installation	on	a	BDA	not	mandatory
• BigData SQL	is	licensed	per-disk	drive	per	hadoop cluster	
• Public	price	:	$4000	per	disk	drive
• All	disks	in	a	hadoop cluster	have	to	be	licensed
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData SQL
• Three-phase installation
• BigData SQL parcel deployment (CDH)
• Database server bundle configuration
• Package deployment on the database server
• For Oracle 12.1.0.2 and above
• Needs some patches !! → Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1)
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData SQL	:	Features
• External Table	with new	Access	drivers
• ORACLE_HIVE:	Existing Hive tables.	Metadata is stored in	Hcatalog
• ORACLE_HDFS:	Create External table	directly on	HDFS.	Metadata is declared through access
parameters (mandatory)
• Smart	Scan	for	HDFS
• Oracle	External Tables	typically require Full	Scan
• BigData SQL extends Smart Scan capabilities to external tables:
• smaller result sets are sent to the Oracle server
• data movement and network traffic are reduced
• Storage	Indexes	(Only for	HIVE	and	HDFS	sources)
• Oracle	External Tables	cannot have	indexes
• BigData SQL	maintains SI	automatically
• Available for =, <, <=, !=, >=, >, IS NULL and IS NOT NULL
• Predicate pushdown and	column projection
• Read	only tablespaces	on	HDFS
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData SQL	:	example
• External Table	creation on	a	pre-existing Hive Table
SQL> CREATE TABLE "LAURENT"."ORA_TRACKS" (
2 ref1 VARCHAR2(4000),
3 ref2 VARCHAR2(4000),
4 artist VARCHAR2(4000),
5 title VARCHAR2(4000))
6 ORGANIZATION EXTERNAL
7 (TYPE ORACLE_HIVE
8 DEFAULT DIRECTORY DEFAULT_DIR
9 ACCESS PARAMETERS (
10 com.oracle.bigdata.cluster=clusterName
11 com.oracle.bigdata.tablename=hive_lolo.tracks_h)
12 )
13 PARALLEL 2 REJECT LIMIT UNLIMITED
14 /
Table created.
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData SQL	:	example
• External Table	creation on	HDFS	files
SQL> CREATE TABLE sales_hdfs (
2 PROD_ID number,
3 CUST_ID number,
4 TIME_ID date,
5 CHANNEL_ID number,
6 PROMO_ID number,
7 AMOUNT_SOLD number
8 )
9 ORGANIZATION EXTERNAL
10 (
11 TYPE ORACLE_HDFS
12 DEFAULT DIRECTORY DEFAULT_DIR
13 ACCESS PARAMETERS (
14 com.oracle.bigdata.rowformat: DELIMITED FIELDS TERMINATED BY ","
15 com.oracle.bigdata.erroropt: [{"action":"replace", "value":"-1", "col":["PROD_ID","CUST_ID","CHANNEL_ID","PROMO_ID"]} , 
16 {"action":"reject", "col":"AMOUNT_SOLD"} , 
17 {"action":"setnull", "col":"TIME_ID"} ]
18 )
19 LOCATION ('hdfs://192.168.99.101/user/laurent/sqoop_raw/*')
20 )
21 /
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData SQL	:	example
SQL_ID dm5u21rng1mf4, child number 0
-------------------------------------
select p.prod_name,sum(s."QUANTITY_SOLD") from products p,
laurent.sales_hdfs s where p.prod_id=s."PROD_ID" and
s."AMOUNT_SOLD">300 group by p.prod_name
Plan hash value: 4039843832
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1364 (100)| |
| 1 | HASH GROUP BY | | 71 | 4899 | 1364 (1)| 00:00:01 |
|* 2 | HASH JOIN | | 20404 | 1374K| 1363 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL | PRODUCTS | 72 | 2160 | 3 (0)| 00:00:01 |
|* 4 | EXTERNAL TABLE ACCESS STORAGE FULL| SALES_HDFS | 20404 | 777K| 1360 (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."PROD_ID")
4 - filter("S"."AMOUNT_SOLD">300)
24 rows selected.
Hadoop	&	Oracle:	let	them	talk	together
• Oracle	BigData SQL	:	Read	only Tablespace	offload
• Move cold data in a read-only tablespace to HDFS
• Use a FUSE mount point to the HDFS root (a rough sketch of the sequence follows the example below)
SQL> select tablespace_name,STATUS from dba_tablespaces where tablespace_name='MYTBS';
TABLESPACE_NAME STATUS
------------------------------ ---------
MYTBS READ ONLY
SQL> select tablespace_name,status,file_name from dba_data_files where tablespace_name='MYTBS';
TABLESPACE_NAME STATUS FILE_NAME
------------------------------ --------- --------------------------------------------------------------------------------
MYTBS AVAILABLE /u01/app/oracle/product/12.1.0/dbhome_1/dbs/hdfs:cluster/user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf
[oracle@oel6 MYTBS]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1/dbs/hdfs:cluster/user/oracle/cluster-oel6.localdomain-orcl/MYTBS
[oracle@oel6 MYTBS]$ ls -l
total 4
lrwxrwxrwx 1 oracle oinstall 82 Sep 19 18:21 mytbs01.dbf -> /mnt/fuse-cluster-hdfs/user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf
$ hdfs dfs -ls /user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf
-rw-r--r-- 3 oracle oinstall 104865792 2017-09-19 18:21 /user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf
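A rough sketch of the manual sequence behind this example, assuming a FUSE mount of HDFS at /mnt/fuse-cluster-hdfs and a hypothetical original datafile path; the supported procedure is described in the Big Data SQL documentation and may differ:
SQL> alter tablespace MYTBS read only;
SQL> alter tablespace MYTBS offline;
$ hdfs dfs -put /u01/app/oracle/oradata/ORCL/mytbs01.dbf /user/oracle/cluster-oel6.localdomain-orcl/MYTBS/
$ ln -s /mnt/fuse-cluster-hdfs/user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf \
  /u01/app/oracle/product/12.1.0/dbhome_1/dbs/hdfs:cluster/user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf
SQL> alter database rename file '/u01/app/oracle/oradata/ORCL/mytbs01.dbf' to
  2  '/u01/app/oracle/product/12.1.0/dbhome_1/dbs/hdfs:cluster/user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf';
SQL> alter tablespace MYTBS online;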
Hadoop	&	Oracle:	let	them	talk	together
• Sqoop: import and export data between Oracle and Hadoop
• Spark	for	hadoop
• ODBC	Connectors
• Oracle	BigData Connectors
• Oracle	BigData SQL
• Gluent	Data	Platform
Hadoop	&	Oracle:	let	them	talk	together
• Gluent	Data	Platform
• Presents data stored in Hadoop (in various formats) to any compatible RDBMS (Oracle, SQL Server, Teradata)
• Offloads your data and your workload to hadoop
• Tables or contiguous partitions
• Takes advantage of a distributed platform (storage and processing)
• Advises which schemas or data can be safely offloaded to Hadoop
Hadoop	&	Oracle:	let	them	talk	together
• Conclusion
• Hadoop integration with Oracle can help:
• To take advantage of distributed storage and processing
• To optimize storage placement
• To reduce TCO (workload offloading, Oracle Data Mining Option etc.)
• Many scenarios
• Many products for many solutions
• Many prices
• Choose the best solution(s) for your specific problems!
Questions	?
Mais conteúdo relacionado

Mais procurados

Hw09 Sqoop Database Import For Hadoop
Hw09   Sqoop Database Import For HadoopHw09   Sqoop Database Import For Hadoop
Hw09 Sqoop Database Import For HadoopCloudera, Inc.
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprisesnvvrajesh
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hiveReza Ameri
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1Sperasoft
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopJeyamariappan Guru
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Marcel Krcah
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Edureka!
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversityAlex Zeltov
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014alanfgates
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkTaras Matyashovsky
 
Hadoop databases for oracle DBAs
Hadoop databases for oracle DBAsHadoop databases for oracle DBAs
Hadoop databases for oracle DBAsMaxym Kharchenko
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinAlex Zeltov
 
Jaws - Data Warehouse with Spark SQL by Ema Orhian
Jaws - Data Warehouse with Spark SQL by Ema OrhianJaws - Data Warehouse with Spark SQL by Ema Orhian
Jaws - Data Warehouse with Spark SQL by Ema OrhianSpark Summit
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3Gwen (Chen) Shapira
 

Mais procurados (20)

Hw09 Sqoop Database Import For Hadoop
Hw09   Sqoop Database Import For HadoopHw09   Sqoop Database Import For Hadoop
Hw09 Sqoop Database Import For Hadoop
 
Apache sqoop
Apache sqoopApache sqoop
Apache sqoop
 
Sqoop tutorial
Sqoop tutorialSqoop tutorial
Sqoop tutorial
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprises
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hive
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and Sqoop
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
 
Hadoop databases for oracle DBAs
Hadoop databases for oracle DBAsHadoop databases for oracle DBAs
Hadoop databases for oracle DBAs
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Jaws - Data Warehouse with Spark SQL by Ema Orhian
Jaws - Data Warehouse with Spark SQL by Ema OrhianJaws - Data Warehouse with Spark SQL by Ema Orhian
Jaws - Data Warehouse with Spark SQL by Ema Orhian
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3
 

Semelhante a Oracle hadoop let them talk together !

Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDavid Mann
 
Connecting Hadoop and Oracle
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and OracleTanel Poder
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Dataconomy Media
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Mats Uddenfeldt
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Connor McDonald
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Andrew Brust
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesAlfredo Abate
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 

Semelhante a Oracle hadoop let them talk together ! (20)

Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
DMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4ReportingDMann-SQLDeveloper4Reporting
DMann-SQLDeveloper4Reporting
 
Connecting Hadoop and Oracle
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and Oracle
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_Features
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 

Mais de Laurent Leturgez

Python and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementLaurent Leturgez
 
Oracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachOracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachLaurent Leturgez
 
Improve oracle 12c security
Improve oracle 12c securityImprove oracle 12c security
Improve oracle 12c securityLaurent Leturgez
 
Which cloud provider for your oracle database
Which cloud provider for your oracle databaseWhich cloud provider for your oracle database
Which cloud provider for your oracle databaseLaurent Leturgez
 
SIMD inside and outside Oracle 12c In Memory
SIMD inside and outside Oracle 12c In MemorySIMD inside and outside Oracle 12c In Memory
SIMD inside and outside Oracle 12c In MemoryLaurent Leturgez
 

Mais de Laurent Leturgez (6)

Python and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data management
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Oracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachOracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approach
 
Improve oracle 12c security
Improve oracle 12c securityImprove oracle 12c security
Improve oracle 12c security
 
Which cloud provider for your oracle database
Which cloud provider for your oracle databaseWhich cloud provider for your oracle database
Which cloud provider for your oracle database
 
SIMD inside and outside Oracle 12c In Memory
SIMD inside and outside Oracle 12c In MemorySIMD inside and outside Oracle 12c In Memory
SIMD inside and outside Oracle 12c In Memory
 

Último

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 

Último (20)

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Project Based Learning (A.I).pptx detail explanation

Oracle and Hadoop, let them talk together!

  • 2. Whoami • Oracle Consultant since 2001 • Former developer (C, Java, perl, PL/SQL) • Hadoop aficionado • Owner@Premiseo: Data Management on Premise and in the Cloud • Blogger since 2004 : http://laurent-leturgez.com • Mail: laurent.leturgez@premiseo.com • Twitter : @lleturgez
  • 3. 3 Membership Tiers • Oracle ACE Director • Oracle ACE • Oracle ACE Associate bit.ly/OracleACEProgram 500+ Technical Experts Helping Peers Globally Connect: Nominate yourself or someone you know: acenomination.oracle.com @oracleace Facebook.com/oracleaces oracle-ace_ww@oracle.com
  • 4. Hadoop & Oracle: let them talk together • Agenda • Introduction • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 5. Introduction • The project (and database usage) approach has changed • 10-15 years ago … the « product » approach • New project • I usually use Oracle or SQL Server • So … I will use Oracle or SQL Server for this project • Nowadays … the « solution » approach • New project • What kind of data will my system store ? • Do I have expectations regarding consistency, security, sizing, pricing/licensing etc. ? • So … I will use the right tool for the right job ! Companies are now living in a heterogeneous world !!!
  • 6. Hadoop : What it is, how it works, and what it can do ? • Hadoop is a framework used: • For distributed storage (HDFS : Hadoop Distributed File System) • To process high volumes of data using the MapReduce programming model (among others) • Hadoop is open source • But some enterprise distributions exist • Hadoop is designed to analyze lots of data • Hadoop is designed to scale to tens of petabytes • Hadoop is designed to manage structured and unstructured data • Hadoop is designed to run on commodity servers • Hadoop is initially designed for analytics and batch workloads
  • 8. Hadoop : What it is, how it works, and what it can do ? • Cluster Architecture • Oracle RAC uses « Shared Everything » clusters • Designed to run on a large number of machines • All these servers share memory (as a single global memory) and disks
  • 9. Hadoop : What it is, how it works, and what it can do ? • Cluster Architecture • Hadoop uses « Shared Nothing » clusters • Designed to run on a large number of machines • These servers share neither memory nor disks • Scaled for high capacity and throughput on commodity servers • Designed for distributed storage and distributed processing !!!
  • 11. Right tool for the right job ? • Oracle • Perfect for OLTP processing • Ideal for all-in-one databases : • Structured : tables, constraints, typed data etc. • Unstructured : images, videos, binary • Many formats: XML, JSON • Sharded database • Hadoop • Free and scalable • Many open data formats (Avro, Parquet, Kudu etc.) • Many processing tools (MapReduce, Spark, Kafka etc.) • Analytic workloads • Designed to manage large amounts of data quickly How can I connect Hadoop to Oracle, Oracle to Hadoop, and query data ? Which solutions to exchange data between Oracle and Hadoop ? How can I reuse my Oracle data in a Hadoop workload ?
  • 12. Hadoop & Oracle: let them talk together • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 13. Hadoop & Oracle: let them talk together • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 14. Hadoop & Oracle: let them talk together • Sqoop is a tool to move data between an RDBMS and Hadoop (HDFS) • Basically, a tool to run data exports and imports from the Hadoop cluster • Scenarios • Enrich analytic workloads with multiple data sources (diagram: data from an Oracle RDBMS and unstructured data are brought together on Hadoop / HDFS via Sqoop, the analytic workload runs there, and its results land on HDFS and back in Oracle)
  • 15. Hadoop & Oracle: let them talk together • Sqoop is a tool to move data between an RDBMS and Hadoop (HDFS) • Basically, a tool to run data exports and imports • Scenarios • Offload analytic workloads onto Hadoop (diagram: Oracle data is copied to Hadoop / HDFS with Sqoop, the analytic workload runs on Hadoop, and the results are exported back to Oracle)
  • 16. Hadoop & Oracle: let them talk together • Sqoop is a tool to move data between an RDBMS and Hadoop (HDFS) • Basically, a tool to run data exports and imports • Scenarios • Offload analytic workloads onto Hadoop and keep the data on HDFS (diagram: Oracle data is copied to Hadoop / HDFS with Sqoop, the analytic workload runs on Hadoop, and the results stay on HDFS)
  • 17. Hadoop & Oracle: let them talk together • Sqoop is a tool to move data between an RDBMS and Hadoop (HDFS) • Basically, a tool to run data exports and imports • Scenarios • Data archiving into Hadoop (diagram: Oracle data is archived with Sqoop as compressed filesets on HDFS)
  • 18. Hadoop & Oracle: let them talk together • Sqoop import is from RDBMS to Hadoop (Example with Hive) $ sqoop import > --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl > --username sh --password sh > --table S > --hive-import > --hive-overwrite > --hive-table S_HIVE > --hive-database hive_sample > --split-by PROD_ID -m 4 > --map-column-hive "TIME_ID"=timestamp 0: jdbc:hive2://hadoop1:10001/hive_sample> show create table s_hive; +----------------------------------------------------+--+ | createtab_stmt | +----------------------------------------------------+--+ | CREATE TABLE `s_hive`( | | `prod_id` double, | | `cust_id` double, | | `time_id` timestamp, | | `channel_id` double, | | `promo_id` double, | | `quantity_sold` double, | | `amount_sold` double) | | COMMENT 'Imported by sqoop on 2017/09/18 14:55:40' | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | WITH SERDEPROPERTIES ( | | 'field.delim'='u0001', | | 'line.delim'='n', | | 'serialization.format'='u0001') | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.mapred.TextInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION | | 'hdfs://hadoop1.localdomain:8020/user/hive/warehouse/hive_sample.db/s_hive' | | TBLPROPERTIES ( | | 'COLUMN_STATS_ACCURATE'='true', | | 'numFiles'='4', | | 'numRows'='0', | | 'rawDataSize'='0', | | 'totalSize'='122802447', | | 'transient_lastDdlTime'='1505739346') | +----------------------------------------------------+--+ $ hdfs dfs -ls -C /user/hive/warehouse/hive_sample.db/s_hive* /user/hive/warehouse/hive_sample.db/s_hive/part-m-00000 /user/hive/warehouse/hive_sample.db/s_hive/part-m-00001 /user/hive/warehouse/hive_sample.db/s_hive/part-m-00002 /user/hive/warehouse/hive_sample.db/s_hive/part-m-00003 0: jdbc:hive2://hadoop1:10001/hive_sample> select count(*) from s_hive; +----------+--+ | _c0 | +----------+--+ | 2756529 | +----------+--+
  • 19. Hadoop & Oracle: let them talk together • Sqoop import is from RDBMS to Hadoop • One Oracle session per mapper • Reads are done in direct path mode • A SQL statement can be used to filter the data to import • Results can be stored in various formats: delimited text, Hive, Parquet, compressed or not • The key issue is data type conversion • Hive datatype mapping (--map-column-hive "TIME_ID"=timestamp) • Java datatype mapping (--map-column-java "ID"=Integer, "VALUE"=String)
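To make these options concrete, here is a minimal sketch of an import that filters rows with a free-form query and lands the result in a plain HDFS directory (the connection string and columns reuse the SALES example from this deck, the target path is a hypothetical choice; check the exact option behaviour against your Sqoop version, and add --as-parquetfile if Parquet output is wanted):

# Filtered import into HDFS; the $CONDITIONS token is required by Sqoop whenever --query is used
$ sqoop import \
    --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl \
    --username sh --password sh \
    --query 'SELECT prod_id, cust_id, amount_sold FROM sales WHERE amount_sold > 1500 AND $CONDITIONS' \
    --split-by prod_id -m 4 \
    --target-dir /user/laurent/sales_filtered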
  • 20. Hadoop & Oracle: let them talk together • Sqoop export is from Hadoop to RDBMS • Destination table has to be created first • Direct mode is possible • Two modes • Insert mode • Update mode $ sqoop export > --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl > --username sh --password sh > --direct > --table S_RESULT > --export-dir=/user/hive/warehouse/hive_sample.db/s_result > --input-fields-terminated-by '\001' SQL> select * from sh.s_result; PROD_ID P_SUM P_MIN ---------- ---------- ---------- 47 1132200.93 25.97 46 749501.85 21.23 45 1527220.89 42.09 44 889945.74 42.09 …/…
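The update mode mentioned above is driven by --update-key; a minimal sketch reusing the table from the example (the key column is a hypothetical choice, and --update-mode allowinsert turns the export into an upsert):

# Export that updates existing Oracle rows by key instead of inserting duplicates
$ sqoop export \
    --connect jdbc:oracle:thin:@//192.168.99.8:1521/orcl \
    --username sh --password sh \
    --table S_RESULT \
    --export-dir /user/hive/warehouse/hive_sample.db/s_result \
    --update-key PROD_ID \
    --update-mode allowinsert \
    --input-fields-terminated-by '\001'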
  • 21. Hadoop & Oracle: let them talk together • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 22. Hadoop & Oracle: let them talk together • Spark for Hadoop • Spark is an open source distributed computing framework • Fault tolerant by design • Can work with various cluster managers • YARN • MESOS • Spark Standalone • Kubernetes (experimental) • Centered on a data structure called RDD (Resilient Distributed Dataset) • Based on various components • Spark Core • Spark SQL (data abstraction) • Spark Streaming (data ingestion) • Spark MLlib (machine learning) • Spark GraphX (graph processing on top of Spark) (diagram: Spark Core with Spark SQL, Spark Streaming, Spark MLlib and GraphX on top)
  • 23. Hadoop & Oracle: let them talk together • Spark for Hadoop • Resilient Distributed Datasets (RDD) • Read-only structure • Distributed over the cluster's machines • Maintained in a fault-tolerant way
  • 24. Hadoop & Oracle: let them talk together • Spark for Hadoop • Evolution • RDDs, DataFrames, and Datasets can be filled from an Oracle data source • RDD • DataFrame : RDD + named columns organization • Dataset : DataFrame specialization, untyped (DataFrame = Dataset[Row]) or typed (Dataset[T]), best for Spark SQL
  • 26. Hadoop & Oracle: let them talk together • Spark for Hadoop : Spark vs MapReduce • MR is batch oriented (Map then Reduce), Spark can also serve near-real-time workloads • MR stores intermediate data on disk, Spark keeps data in memory • MR is written in Java, Spark is written in Scala • Performance comparison • WordCount on a 2 GB file • Execution time with and without optimization (mapper, reducer, memory, partitioning etc.) → cf. http://repository.stcloudstate.edu/cgi/viewcontent.cgi?article=1008&context=csit_etds • Without optimization: MR 3' 53'', Spark 34'' • With optimization: MR 2' 23'', Spark 29'' → roughly 5x faster
  • 28. Hadoop & Oracle: let them talk together • Spark for hadoop and Oracle … example (Spark 1.6 / CDH 5.9) scala> val sales_df = sqlContext.read.format("jdbc").option("url", "jdbc:oracle:thin:@//192.168.99.8:1521/orcl").option("driver", "oracle.jdbc.OracleDriver").option("dbtable", "sales").option("user", "sh").option("password", "sh").load() sales_df: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(38,10), CUST_ID: decimal(38,10), TIME_ID: timestamp, CHANNEL_ID: decimal(38,10), PROMO_ID: decimal(38,10), QUANTITY_SOLD: decimal(10,2), AMOUNT_SOLD: decimal(10,2)] scala> sales_df.groupBy("PROD_ID").count.show(10,false) +-------------+-----+ |PROD_ID |count| +-------------+-----+ |31.0000000000|23108| |32.0000000000|11253| |33.0000000000|22768| |34.0000000000|13043| |35.0000000000|16494| |36.0000000000|13008| |37.0000000000|17430| |38.0000000000|9523 | |39.0000000000|13319| |40.0000000000|27114| +-------------+-----+ only showing top 10 rows
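To reproduce a read like the one above, the Oracle JDBC driver must be visible to both the Spark driver and the executors; a minimal sketch (the jar path is a hypothetical example):

# Start spark-shell with the Oracle JDBC driver on both classpaths
$ spark-shell --jars /opt/oracle/ojdbc7.jar --driver-class-path /opt/oracle/ojdbc7.jar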
  • 29. Hadoop & Oracle: let them talk together • Spark for hadoop and Oracle … example (Spark 1.6 / CDH 5.9) scala> val s = sqlContext.read.format("jdbc").option("url", "jdbc:oracle:thin:@//192.168.99.8:1521/orcl").option("driver", "oracle.jdbc.OracleDriver").option("dbtable", "sales").option("user", "sh").option("password", "sh").load() scala> val p = sqlContext.read.format("com.databricks.spark.csv").option("delimiter",";").load("/user/laurent/products.csv") -- scala> s.registerTempTable("sales") scala> p.registerTempTable("products") -- scala> val f = sqlContext.sql(""" select products.prod_name,sum(sales.amount_sold) from sales, products where sales.prod_id=products.prod_id group by products.prod_name""") scala> f.write.parquet("/user/laurent/spark_parquet") $ impala-shell [hadoop1.localdomain:21000] > create external table t > LIKE PARQUET '/user/laurent/spark_parquet/_metadata' > STORED AS PARQUET location '/user/laurent/spark_parquet/’; [hadoop1.localdomain:21000] > select * from t; +------------------------------------------------+-------------+ | prod_name | _c1 | +------------------------------------------------+-------------+ | O/S Documentation Set - Kanji | 509073.63 | | Keyboard Wrist Rest | 348408.98 | | Extension Cable | 60713.47 | | 17" LCD w/built-in HDTV Tuner | 7189171.77 | .../...
  • 30. Hadoop & Oracle: let them talk together • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 31. Hadoop & Oracle: let them talk together • ODBC Connectors • Cloudera delivers ODBC drivers for • Hive • Impala • ODBC drivers are also available for : • HortonWorks (Hive, SparkSQL) • MapR (Hive) • AWS EMR (Hive, Impala, HBase) • Azure HDInsight (Hive) → only for Windows clients
  • 32. Hadoop & Oracle: let them talk together • ODBC Connectors • Install the ODBC driver on the Oracle host • Configure ODBC on the Oracle host • Configure a heterogeneous data source (DG4ODBC) based on this ODBC DSN • Create a database link using this data source (a configuration sketch follows)
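A minimal configuration sketch of these four steps (the DSN name, file paths and driver keywords are hypothetical examples and must be adapted to your ODBC driver and Oracle release):

# 1. DSN declared to the ODBC driver manager (e.g. unixODBC)
$ cat /etc/odbc.ini
[HIVEDSN]
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
Host=hadoop1.localdomain
Port=10000
HiveServerType=2

# 2. DG4ODBC init file, named after the heterogeneous service SID
$ cat $ORACLE_HOME/hs/admin/initHIVEDSN.ora
HS_FDS_CONNECT_INFO = HIVEDSN
HS_FDS_SHAREABLE_NAME = /usr/lib64/libodbc.so

# 3. Listener and tnsnames entries for the DG4ODBC agent
#    listener.ora : (SID_DESC=(SID_NAME=HIVEDSN)(ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1)(PROGRAM=dg4odbc))
#    tnsnames.ora : HIVEDSN=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oel6)(PORT=1521))(CONNECT_DATA=(SID=HIVEDSN))(HS=OK))

# 4. Database link, as shown on the next slides:
#    SQL> create public database link hivedsn connect to "laurent" identified by "laurent" using 'HIVEDSN';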
  • 33. Hadoop & Oracle: let them talk together • ODBC Connectors (architecture diagram: Oracle listener → dg4odbc agent service → ODBC driver manager → ODBC driver → non-Oracle source; the client DSN is defined in odbc.ini and the service is referenced from tnsnames.ora) • Example join: select c1,c2 from orcl_t a, t@hive_lnk b where a.id=b.id and b>1000 • Filter predicates are pushed down to Hive / Impala
  • 34. Hadoop & Oracle: let them talk together SQL> create public database link hivedsn connect to "laurent" identified by "laurent" using 'HIVEDSN'; Database link created. SQL> show user USER is "SH" SQL> select p.prod_name,sum(s."quantity_sold") 2 from products p, s_hive@hivedsn s 3 where p.prod_id=s."prod_id" 4 and s."amount_sold">1500 5 group by p.prod_name; PROD_NAME SUM(S."QUANTITY_SOLD") -------------------------------------------------- ---------------------- Mini DV Camcorder with 3.5" Swivel LCD 3732 Envoy Ambassador 20607 256MB Memory Card 3
  • 35. Hadoop & Oracle: let them talk together ------------------------------------------------------------------------------------------------ SQL_ID abb5vb85sd3kt, child number 0 ------------------------------------- select p.prod_name,sum(s."quantity_sold") from products p, s_hive@hivedsn s where p.prod_id=s."prod_id" and s."amount_sold">1500 group by p.prod_name Plan hash value: 3779319722 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |IN-OUT| ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | 204 (100)| | | | | 1 | HASH GROUP BY | | 71 | 4899 | 204 (1)| 00:00:01 | | | |* 2 | HASH JOIN | | 100 | 6900 | 203 (0)| 00:00:01 | | | | 3 | TABLE ACCESS FULL| PRODUCTS | 72 | 2160 | 3 (0)| 00:00:01 | | | | 4 | REMOTE | S_HIVE | 100 | 3900 | 200 (0)| 00:00:01 | HIVED~ | R->S | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("P"."PROD_ID"="S"."prod_id") Remote SQL Information (identified by operation id): ---------------------------------------------------- 4 - SELECT `prod_id`,`quantity_sold`,`amount_sold` FROM `S_HIVE` WHERE `amount_sold`>1500 (accessing 'HIVEDSN' ) hive> show create table s_hive; .../... +----------------------------------------------------+--+ | createtab_stmt | +----------------------------------------------------+--+ | CREATE TABLE `s_hive`( | .../... | TBLPROPERTIES ( | | 'COLUMN_STATS_ACCURATE'='true', | | 'numFiles'='6', | | 'numRows'='2756529', | | 'rawDataSize'='120045918', | | 'totalSize'='122802447', | | 'transient_lastDdlTime'='1505740396') | +----------------------------------------------------+-- SQL> select count(*) from s_hive@hivedsn where "amount_sold">1500; COUNT(*) ---------- 24342
  • 36. Hadoop & Oracle: let them talk together • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 37. Hadoop & Oracle: let them talk together • Oracle BigData Connectors • Components available • Oracle Datasource for Apache Hadoop • Oracle Loader for Hadoop • Oracle SQL Connector for HDFS • Oracle R Advanced Analytics for Hadoop • Oracle XQuery for Hadoop • Oracle Data Integrator Enterprise Edition • BigData Connectors are licensed separately from the Big Data Appliance (BDA) • BigData Connectors can be installed on a BDA or any Hadoop cluster • BigData Connectors must be licensed for all processors of the Hadoop cluster • Public price: $2000 per Oracle Processor
  • 38. Hadoop & Oracle: let them talk together • Oracle BigData Connectors : Oracle Datasource for Apache Hadoop • Available for Hive and Spark • Enables an Oracle table as a data source in Hive or Spark • Based on Hive external tables • Metadata is stored in HCatalog • Data stays on the Oracle server • Secured (Wallet and Kerberos integration) • Writing data from Hive to Oracle is possible • Performance • Filter predicates are pushed down • Projection pushdown to retrieve only the required columns • Partition pruning enabled
  • 39. Hadoop & Oracle: let them talk together • Oracle BigData Connectors : Oracle Datasource for Apache Hadoop hive> create external table s_od4h ( > prod_id double, > cust_id double, > time_id timestamp, > channel_id double, > promo_id double, > quantity_sold double, > amount_sold double) > STORED BY 'oracle.hcat.osh.OracleStorageHandler' > WITH SERDEPROPERTIES ( > 'oracle.hcat.osh.columns.mapping' = 'prod_id,cust_id,time_id,channel_id,promo_id,quantity_sold,amount_sold') > TBLPROPERTIES ( > 'mapreduce.jdbc.url' = 'jdbc:oracle:thin:@//192.168.99.8:1521/orcl', > 'mapreduce.jdbc.input.table.name' = 'SALES', > 'mapreduce.jdbc.username' = 'SH', > 'mapreduce.jdbc.password' = 'sh', > 'oracle.hcat.osh.splitterKind' = 'SINGLE_SPLITTER' > ); hive> select count(prod_id) from s_od4h;
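Once this external table exists, it can be queried like any Hive table, and thanks to the pushdown features listed on the previous slide a filter such as the one below can be evaluated on the Oracle side; a minimal command-line sketch, assuming the table was created in the hive_sample database shown earlier (authentication options may be required on your cluster):

# Query the OD4H table through beeline; the predicate and the column list are candidates for pushdown to Oracle
$ beeline -u jdbc:hive2://hadoop1:10001/hive_sample \
    -e "select prod_id, sum(quantity_sold) from s_od4h where amount_sold > 1500 group by prod_id"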
  • 40. Hadoop & Oracle: let them talk together • Oracle BigData Connectors : Oracle Loader for Hadoop • Loads data from Hadoop into an Oracle table • Java MapReduce application • Online and offline modes • Needs several XML input files • A loader map: describes the destination table (types, format etc.) • An input file description: AVRO, delimited, KV (if Oracle NoSQL file) • An output file description: JDBC (online), OCI (online), delimited (offline), DataPump (offline) • A database connection description file
  • 41. Hadoop & Oracle: let them talk together • Oracle BigData Connectors : Oracle Loader for Hadoop • Oracle Shell for Hadoop Loaders (OHSH) • Set of declarative commands to copy contents from Oracle to Hadoop (Hive) • Needs Copy To Hadoop, which is included in the BigData SQL license $ hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -D oracle.hadoop.loader.jobName=HDFSUSER_sales_sh_loadJdbc -D mapred.reduce.tasks=0 -D mapred.input.dir=/user/laurent/sqoop_raw -D mapred.output.dir=/user/laurent/OracleLoader -conf /home/laurent/OL_connection.xml -conf /home/laurent/OL_inputFormat.xml -conf /home/laurent/OL_mapconf.xml -conf /home/laurent/OL_outputFormat.xml
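For illustration, a minimal sketch of the connection file passed with -conf above (standard Hadoop configuration file format; the oracle.hadoop.loader.connection.* property names should be verified against the OLH documentation for your release):

# Hypothetical content of OL_connection.xml
$ cat /home/laurent/OL_connection.xml
<configuration>
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@//192.168.99.8:1521/orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>SH</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value>sh</value>
  </property>
</configuration>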
  • 42. Hadoop & Oracle: let them talk together • Oracle BigData Connectors : Oracle SQL Connector for Hadoop • Java MapReduce application • Creates an Oracle external table and links it to HDFS files • Same limitations as Oracle external tables • No insert, update or delete • Parallel query enabled with automatic load balancing • Full scan only • Indexing is not possible • Two commands: • createTable: create the external table and link its location files to HDFS files • publish: refresh the file locations in the table DDL (see the sketch after the next slide) • Can be used to read : • DataPump files on HDFS • Delimited text files on HDFS • Delimited text files behind Hive tables • Data is not updated in real time
  • 43. Hadoop & Oracle: let them talk together • Oracle BigData Connectors : Oracle SQL Connector for Hadoop • CreateTable $ hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable -D oracle.hadoop.exttab.tableName=T1_EXT -D oracle.hadoop.exttab.sourceType=hive -D oracle.hadoop.exttab.hive.tableName=T1 -D oracle.hadoop.exttab.hive.databaseName=hive_sample -D oracle.hadoop.exttab.defaultDirectory=SALES_HIVE_DIR -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//192.168.99.8:1521/ORCL -D oracle.hadoop.connection.user=sh -D oracle.hadoop.exttab.printStackTrace=true -createTable CREATE TABLE "SH"."T1_EXT" ( "ID" NUMBER(*,0), "V" VARCHAR2(4000) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY "SALES_HIVE_DIR" ACCESS PARAMETERS ( RECORDS DELIMITED BY 0X'0A' CHARACTERSET AL32UTF8 PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream' FIELDS TERMINATED BY 0X'01' MISSING FIELD VALUES ARE NULL ( "ID" CHAR NULLIF "ID"=0X'5C4E', "V" CHAR(4000) NULLIF "V"=0X'5C4E' ) ) LOCATION ( 'osch-20170919025617-5707-1', 'osch-20170919025617-5707-2', 'osch-20170919025617-5707-3' ) ) REJECT LIMIT UNLIMITED PARALLEL $ grep uri /data/sales_hive/osch-20170919025617-5707-1 <uri_list> <uri_list_item size="9" compressionCodec=""> hdfs://hadoop1.localdomain:8020/user/hive/warehouse/hive_sample.db/t1/000000_0 </uri_list_item> </uri_list>
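The publish command mentioned two slides back refreshes the osch-* location files once new data has landed in HDFS; a minimal sketch reusing the same connection properties (Hive source properties may have to be repeated, and options should be checked against your OSCH version):

# Refresh the external table location files so newly added HDFS content becomes visible
$ hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable \
    -D oracle.hadoop.exttab.tableName=T1_EXT \
    -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//192.168.99.8:1521/ORCL \
    -D oracle.hadoop.connection.user=sh \
    -publish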
  • 44. Hadoop & Oracle: let them talk together SQL_ID d9w2xvzydkdhn, child number 0 ------------------------------------- select p.prod_name,sum(s."QUANTITY_SOLD") from products p, "SALES_HIVE_XTAB" s where p.prod_id=s."PROD_ID" and s."AMOUNT_SOLD">300 group by p.prod_name Plan hash value: 2918324602 ----------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | ----------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 99 (100)| | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 71 | 4899 | 99 (3)| 00:00:01 | Q1,02 | P->S | QC (RAND) | | 3 | HASH GROUP BY | | 71 | 4899 | 99 (3)| 00:00:01 | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 71 | 4899 | 99 (3)| 00:00:01 | Q1,02 | PCWP | | | 5 | PX SEND HASH | :TQ10001 | 71 | 4899 | 99 (3)| 00:00:01 | Q1,01 | P->P | HASH | | 6 | HASH GROUP BY | | 71 | 4899 | 99 (3)| 00:00:01 | Q1,01 | PCWP | | |* 7 | HASH JOIN | | 10210 | 687K| 98 (2)| 00:00:01 | Q1,01 | PCWP | | | 8 | PX RECEIVE | | 72 | 2160 | 3 (0)| 00:00:01 | Q1,01 | PCWP | | | 9 | PX SEND BROADCAST | :TQ10000 | 72 | 2160 | 3 (0)| 00:00:01 | Q1,00 | S->P | BROADCAST | | 10 | PX SELECTOR | | | | | | Q1,00 | SCWC | | | 11 | TABLE ACCESS FULL | PRODUCTS | 72 | 2160 | 3 (0)| 00:00:01 | Q1,00 | SCWP | | | 12 | PX BLOCK ITERATOR | | 10210 | 388K| 95 (2)| 00:00:01 | Q1,01 | PCWC | | |* 13 | EXTERNAL TABLE ACCESS FULL| SALES_HIVE_XTAB | 10210 | 388K| 95 (2)| 00:00:01 | Q1,01 | PCWP | | ----------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 7 - access("P"."PROD_ID"="S"."PROD_ID") 13 - filter("S"."AMOUNT_SOLD">300) Note ----- - Degree of Parallelism is 8 because of table property
  • 45. Hadoop & Oracle: let them talk together • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 46. Hadoop & Oracle: let them talk together • Oracle BigData SQL • Support for queries against non-relational data sources • Apache Hive • HDFS • Oracle NoSQL • Apache HBase • Other NoSQL databases • “Cold” tablespaces (and datafiles) stored on Hadoop/HDFS • Licensing • BigData SQL is licensed separately from the Big Data Appliance • Installation on a BDA is not mandatory • BigData SQL is licensed per disk drive per Hadoop cluster • Public price : $4000 per disk drive • All disks in the Hadoop cluster have to be licensed
  • 47. Hadoop & Oracle: let them talk together • Oracle BigData SQL • Three-phase installation • BigDataSQL parcel deployment (CDH) • Database server bundle configuration • Package deployment on the database server • For Oracle 12.1.0.2 and above • Needs some patches !! → see Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1)
  • 48. Hadoop & Oracle: let them talk together • Oracle BigData SQL : Features • External tables with new access drivers • ORACLE_HIVE: existing Hive tables. Metadata is stored in HCatalog • ORACLE_HDFS: create an external table directly on HDFS. Metadata is declared through access parameters (mandatory) • Smart Scan for HDFS • Oracle external tables typically require a full scan • BigData SQL extends Smart Scan capabilities to external tables: • smaller result sets sent to the Oracle server • Data movement and network traffic reduced • Storage Indexes (only for HIVE and HDFS sources) • Oracle external tables cannot have indexes • BigData SQL maintains storage indexes automatically • Available for =, <, <=, !=, >=, >, IS NULL and IS NOT NULL • Predicate pushdown and column projection • Read-only tablespaces on HDFS
  • 49. Hadoop & Oracle: let them talk together • Oracle BigData SQL : example • External Table creation on a pre-existing Hive Table SQL> CREATE TABLE "LAURENT"."ORA_TRACKS" ( 2 ref1 VARCHAR2(4000), 3 ref2 VARCHAR2(4000), 4 artist VARCHAR2(4000), 5 title VARCHAR2(4000)) 6 ORGANIZATION EXTERNAL 7 (TYPE ORACLE_HIVE 8 DEFAULT DIRECTORY DEFAULT_DIR 9 ACCESS PARAMETERS ( 10 com.oracle.bigdata.cluster=clusterName 11 com.oracle.bigdata.tablename=hive_lolo.tracks_h) 12 ) 13 PARALLEL 2 REJECT LIMIT UNLIMITED 14 / Table created.
  • 50. Hadoop & Oracle: let them talk together • Oracle BigData SQL : example • External Table creation on HDFS files SQL> CREATE TABLE sales_hdfs ( 2 PROD_ID number, 3 CUST_ID number, 4 TIME_ID date, 5 CHANNEL_ID number, 6 PROMO_ID number, 7 AMOUNT_SOLD number 8 ) 9 ORGANIZATION EXTERNAL 10 ( 11 TYPE ORACLE_HDFS 12 DEFAULT DIRECTORY DEFAULT_DIR 13 ACCESS PARAMETERS ( 14 com.oracle.bigdata.rowformat: DELIMITED FIELDS TERMINATED BY "," 15 com.oracle.bigdata.erroropt: [{"action":"replace", "value":"-1", "col":["PROD_ID","CUST_ID","CHANNEL_ID","PROMO_ID"]} , 16 {"action":"reject", "col":"AMOUNT_SOLD"} , 17 {"action":"setnull", "col":"TIME_ID"} ] 18 ) 19 LOCATION ('hdfs://192.168.99.101/user/laurent/sqoop_raw/*') 20 ) 21 /
  • 51. Hadoop & Oracle: let them talk together • Oracle BigData SQL : example SQL_ID dm5u21rng1mf4, child number 0 ------------------------------------- select p.prod_name,sum(s."QUANTITY_SOLD") from products p, laurent.sales_hdfs s where p.prod_id=s."PROD_ID" and s."AMOUNT_SOLD">300 group by p.prod_name Plan hash value: 4039843832 --------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 1364 (100)| | | 1 | HASH GROUP BY | | 71 | 4899 | 1364 (1)| 00:00:01 | |* 2 | HASH JOIN | | 20404 | 1374K| 1363 (1)| 00:00:01 | | 3 | TABLE ACCESS FULL | PRODUCTS | 72 | 2160 | 3 (0)| 00:00:01 | |* 4 | EXTERNAL TABLE ACCESS STORAGE FULL| SALES_HDFS | 20404 | 777K| 1360 (1)| 00:00:01 | --------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("P"."PROD_ID"="S"."PROD_ID") 4 - filter("S"."AMOUNT_SOLD">300) 24 rows selected.
  • 52. Hadoop & Oracle: let them talk together • Oracle BigData SQL : Read only Tablespace offload • Move cold data in a read only tablespace to HDFS • Use a FUSE mount point to HDFS root SQL> select tablespace_name,STATUS from dba_tablespaces where tablespace_name='MYTBS'; TABLESPACE_NAME STATUS ------------------------------ --------- MYTBS READ ONLY SQL> select tablespace_name,status,file_name from dba_data_files where tablespace_name='MYTBS'; TABLESPACE_NAME STATUS FILE_NAME ------------------------------ --------- -------------------------------------------------------------------------------- MYTBS AVAILABLE /u01/app/oracle/product/12.1.0/dbhome_1/dbs/hdfs:cluster/user/oracle/cluster-oel 6.localdomain-orcl/MYTBS/mytbs01.dbf [oracle@oel6 MYTBS]$ pwd /u01/app/oracle/product/12.1.0/dbhome_1/dbs/hdfs:cluster/user/oracle/cluster-oel6.localdomain-orcl/MYTBS [oracle@oel6 MYTBS]$ ls -l total 4 lrwxrwxrwx 1 oracle oinstall 82 Sep 19 18:21 mytbs01.dbf -> /mnt/fuse-cluster-hdfs/user/oracle/cluster-oel6.localdomain- orcl/MYTBS/mytbs01.dbf $ hdfs dfs -ls /user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf -rw-r--r-- 3 oracle oinstall 104865792 2017-09-19 18:21 /user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf
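The slide above shows the end state; a manual sketch of how to get there, assuming an HDFS FUSE mount as on the slide (the original datafile location is a hypothetical example, and recent BigData SQL releases ship helper scripts that automate the copy):

# 1. In SQL*Plus: alter tablespace MYTBS read only;  (take it offline while the file is moved)
# 2. Copy the datafile into HDFS
$ hdfs dfs -put /u01/app/oracle/oradata/orcl/mytbs01.dbf \
      /user/oracle/cluster-oel6.localdomain-orcl/MYTBS/
# 3. Point the expected datafile path at the HDFS copy through the FUSE mount
$ ln -s /mnt/fuse-cluster-hdfs/user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf \
      "/u01/app/oracle/product/12.1.0/dbhome_1/dbs/hdfs:cluster/user/oracle/cluster-oel6.localdomain-orcl/MYTBS/mytbs01.dbf"
# 4. In SQL*Plus: bring the tablespace back online; its blocks are now read from HDFS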
  • 53. Hadoop & Oracle: let them talk together • Sqoop: import and export data between Oracle and Hadoop • Spark for Hadoop • ODBC Connectors • Oracle BigData Connectors • Oracle BigData SQL • Gluent Data Platform
  • 54. Hadoop & Oracle: let them talk together • Gluent Data Platform • Presents data stored in Hadoop (in various formats) to any compatible RDBMS (Oracle, SQL Server, Teradata) • Offloads your data and your workload into Hadoop • Table or contiguous partitions • Takes advantage of a distributed platform (storage and processing) • Advises you which schemas or data can be safely offloaded to Hadoop
  • 55. Hadoop & Oracle: let them talk together • Conclusion • Hadoop integration with Oracle can help • To take advantage of distributed storage and processing • To optimize storage placement • To reduce TCO (workload offloading, Oracle Data Mining Option etc.) • Many scenarios • Many products for many solutions • Many prices • Choose the best solution(s) for your specific problems !