SlideShare uma empresa Scribd logo
1 de 52
Baixar para ler offline
Open Source Big Data in OPC
Edelweiss Kammermann
Frank MunzJava One 2017
munz & more #2
© IT Convergence 2016. All rights reserved.
© IT Convergence 2016. All rights reserved.
About Me
à Computer Engineer, BI and Data Integration Specialist
à Over 20 years of Consulting and Project Management experience in Oracle
technology.
à Co-founder and Vice President of Uruguayan Oracle User Group (UYOUG)
à Director of Community of LAOUC
à Head of BI Team CMS at ITConvergence
à Writer and frequent speaker at international conferences:
à Collaborate, OTN Tour LA, UKOUG Tech & Apps, OOW, Rittman Mead BI Forum
à Oracle ACE Director
© IT Convergence 2016. All rights reserved.
Uruguay
6
Dr. Frank Munz
•Founded munz & more in 2007
•17 years Oracle Middleware,
Cloud, and Distributed Computing
•Consulting and
High-End Training
•Wrote two Oracle WLS and
one Cloud book
#1
Hadoop
© IT Convergence 2016. All rights reserved.
What is Big Data?
à Volume: The high amount of data
à Variety: The wide range of different data formats and schemas.
Unstructured and semi-structured data
à Velocity: The speed which data is created or consumed
à Oracle added another V in this definition
à Value: Data has intrinsic value—but it must be discovered.
© IT Convergence 2016. All rights reserved.
What is Oracle Big Data Cloud Compute Edition?
à Big Data Platform that integrates Oracle Big Data solution with
Open Source tools
à Fully Elastic
à Integrated with Other Paas Services as Database Cloud Service, MySQL Cloud
Service, Event Hub Cloud Service
à Access, Data and Network Security
à REST access to all the funcitonality
© IT Convergence 2016. All rights reserved.
Big Data Cloud Service – Compute Edition (BDCS-CE)
© IT Convergence 2016. All rights reserved.
BDCS-CE Notebook: Interactive Analysis
à Apache Zeppelin Notebook (version0.7) to interactively work with data
© IT Convergence 2016. All rights reserved.
What is Hadoop?
à An open source software platform for distributed storage and
processing
à Manage huge volumes of unstructured data
à Parallel processing of large data set
à Highly scalable
à Fault-tolerant
à Two main components:
à HDFS: Hadoop Distributed File System for storing information
à MapReduce: programming framework that process information
© IT Convergence 2016. All rights reserved.
Hadoop Components: HFDS
à Stores the data on the cluster
à Namenode: block registry
à DataNode: block containers themselves (Datanode)
à HDFS cartoon by Mvarshney
© IT Convergence 2016. All rights reserved.
Hadoop Components: MapReduce
à Retrieves data from HDFS
à A MapReduce program is composed by
à Map() method: performs filtering and sorting of the <key, value> inputs
à Reduce() method: summarize the <key,value> pairs provided by the Mappers
à Code can be written in many languages (Perl, Python, Java etc)
© IT Convergence 2016. All rights reserved.
MapReduce Example
© IT Convergence 2016. All rights reserved.
Code Example
© IT Convergence 2016. All rights reserved.
Code Example
© IT Convergence 2016. All rights reserved.
#2
Hive
© IT Convergence 2016. All rights reserved.
What is Hive?
à An open source data warehouse software on top of Apache Hadoop
à Analyze and query data stored in HDFS
à Structure the data into tables
à Tools for simple ETL
à SQL- like queries (HiveQL)
à Procedural language with HPL-SQL
à Metadata storage in a RDBMS
© IT Convergence 2016. All rights reserved.
Hadoop & Hive Demo
#3
Spark
Revisited: Map Reduce I/O
munz & more #23
Source:	Hadoop	Application	Architecture	Book
Spark
• Orders of magnitude(s) faster than M/R
• Higher level Scala, Java or Python API
• Standalone, in Hadoop, or Mesos
• Principle: Run an operation on all data
-> ”Spark is the new MapReduce”
• See also: Apache Storm, etc
• Uses RDDs, or Dataframes, or Datasets
munz & more #24
https://stackoverflow.com/questions/31508083/difference-between-
dataframe-in-spark-2-0-i-e-datasetrow-and-rdd-in-spark
https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
RDDs
Resilient Distributed Datasets
Where do they come from?
Collection of data grouped into named columns.
Supports text, JSON, Apache Parquet, sequence.
Read	in	
HDFS,	Local	FS,	S3,	Hbase
Parallelize	
existing	Collection
Transform
other	RDD
->	RDDs	are	immutable
Lazy Evaluation
munz & more #26
Nothing	is	executed Execution
Transformations:
map(), flatMap(),
reduceByKey(), groupByKey()
Actions:
collect(), count(), first(), takeOrdered(),
saveAsTextFile(), …
http://spark.apache.org/docs/2.1.1/programming-guide.html
map(func) Return	a	new	distributed	dataset	formed	
by	passing	each	element	of	the	source	
through	a	function func.
flatMap(func) Similar	to	map,	but	each	input	item	can	be	
mapped	to	0	or	more	output	items	(so func
should	return	a	Seq rather	than	a	single	
item).
reduceByKey(func,	[numTasks]) When	called	on	a	dataset	of	(K,	V)	pairs,	
returns	a	dataset	of	(K,	V)	pairs	where	the	
values	for	each	key	are	aggregated	using	
the	given	reduce	function func,	which	must	
be	of	type	(V,V)	=>	V.	
groupByKey([numTasks]) When	called	on	a	dataset	of	(K,	V)	pairs,	
returns	a	dataset	of	(K,	Iterable<V>)	pairs.
Transformations
Spark Demo
munz & more #30
Apache Zeppelin Notebook
munz & more #31
Word Count and Histogram
munz & more #32
res =
t.flatMap(lambda line: line.split(" "))
.map(lambda word: (word, 1))
.reduceByKey(lambda a, b: a + b)
res.takeOrdered(5, key = lambda x: -x[1])
Zeppelin Notebooks
munz & more #33
Big Data Compute Service CE
munz & more #34
#4
Kafka
Kafka
Partitioned, replicated commit log
munz & more #36
0 1 2 3 4 … n
Immutable	log:	Messages	with	offset
Producer
Consumer	A
Consumer	B
https://www.quora.com/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it-
is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that
Broker1
Broker2
Broker3
Topic	A	
(1)
Topic	A	
(2)
Topic	A	
(3)
Partition	/
Leader
Repl A	
(1)
Repl A	
(2)
Repl A	
(3)
Producer
Replication	/
Follower
Zoo-
keeper
Zoo-
keeper
Zoo-
keeper
State	/
HA
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
- 1 topic
- 1 partition
- Contains every article published
since 1851
- Multiple producers / consumers
Example	for	
Stream	/	Table	Duality
Kafka Clients
SDKs Connect Streams
- OOTB:	Java,	Scala
- Confluent:	Python,	C,	
C++
Confluent:
- HDFS	sink,	
- JDBC	source,
- S3	sink
- Elastic	search	sink
- Plugin	.jar	file
- JDBC:	Change	data	
capture	(CDC)
- Real-time	data	ingestion
- Microservices
- KSQL:	SQL	streaming	
engine	for	streaming	
ETL,	anomaly	detection,	
monitoring
- .jar	file	runs	anywhere
High	/	low	level	Kafka	API Configuration	only
Integrate	external	Systems
Data	in	Motion
Stream	/	Table	duality	
REST
- Language	
agnostic
- Easy	for	
mobile	apps
- Easy	to	
tunnel	
through	FW	
etc.
Lightweight
Oracle Event Hub Cloud Service
• PaaS: Managed Kafka 0.10.2
• Two deployment modes
– Basic (Broker and ZK on 1 node)
– Recommended (distributed)
• REST Proxy
– Separate sever(s) running REST Proxy
munz & more #40
Event Hub
munz & more #41
Event Hub Service
munz & more #42
Ports
You must open ports to allow access for
external clients
• Kafka Broker (from OPC connect string)
• Zookeeper with port 2181
munz & more #43
Scaling
munz & more #44
horizontal (up)vertical
Event Hub REST Interface
munz & more #45
https://129.151.91.31:1080/restproxy/topics/a12345orderTopic
Service = Topic
Interesting to Know
• Event Hub topics are prefixed with ID domain
• With Kafka CLI topics with ID Domain can be
created
• Topics without ID domain are not shown in
OPC console
46
#5
Conclusion
TL;DR #bigData #openSource #OPC
OpenSource: entry point to Oracle Big
Data world / Low(er) setup times /
Check for resource usage & limits in
Big Data OPC / BDCS-CE: managed
Hadoop, Hive, Spark + Event hub:
Kafka / Attend a hands-on workshop! /
Next level: Oracle Big Data tools
@EdelweissK
@FrankMunz
www.linkedin.com/in/frankmunz/
www.munzandmore.com/blog
facebook.com/cloudcomputingbook
facebook.com/weblogicbook
@frankmunz
youtube.com/weblogicbook
-> more than 50 web casts
Don’t be
shy J
email:	ekammermann@itconvergence.com
Twitter:	@EdelweissK
3	Membership	Tiers
• Oracle	ACE	Director
• Oracle	ACE
• Oracle	ACE	Associate
bit.ly/OracleACEProgram
500+	Technical	Experts	
Helping	Peers	Globally
Connect:
Nominate	yourself	or	someone	you	know:	acenomination.oracle.com
@oracleace
Facebook.com/oracleaces
oracle-ace_ww@oracle.com
Sign up for Free Trial
http://cloud.oracle.com

Mais conteúdo relacionado

Mais procurados

Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and cephShapeBlue
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalleybuildacloud
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosHeiko Loewe
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesosnelsonadpresent
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittalbuildacloud
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerYahoo Developer Network
 
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...ClusterHQ
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech dayArthur Berezin
 
Cloud OS development
Cloud OS developmentCloud OS development
Cloud OS developmentSean Chang
 
Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)Lucas Jellema
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsBlueData, Inc.
 
Build public private cloud using openstack
Build public private cloud using openstackBuild public private cloud using openstack
Build public private cloud using openstackFramgia Vietnam
 
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David NalleyJenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David Nalleybuildacloud
 
Cloud stack design camp on jun 15
Cloud stack design camp on jun 15Cloud stack design camp on jun 15
Cloud stack design camp on jun 15Isaac Chiang
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Janos Matyas
 

Mais procurados (20)

Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and ceph
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalley
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with Docker
 
Server 2016 sneak peek
Server 2016 sneak peekServer 2016 sneak peek
Server 2016 sneak peek
 
OpenStack Cinder
OpenStack CinderOpenStack Cinder
OpenStack Cinder
 
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech day
 
Cloud OS development
Cloud OS developmentCloud OS development
Cloud OS development
 
CloudStack Hyderabad Meetup: How the Apache community works
CloudStack Hyderabad Meetup: How the Apache community worksCloudStack Hyderabad Meetup: How the Apache community works
CloudStack Hyderabad Meetup: How the Apache community works
 
Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)Introducing Node.js in an Oracle technology environment (including hands-on)
Introducing Node.js in an Oracle technology environment (including hands-on)
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
 
Build public private cloud using openstack
Build public private cloud using openstackBuild public private cloud using openstack
Build public private cloud using openstack
 
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David NalleyJenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
 
Cloud stack design camp on jun 15
Cloud stack design camp on jun 15Cloud stack design camp on jun 15
Cloud stack design camp on jun 15
 
Cloud stack for_beginners
Cloud stack for_beginnersCloud stack for_beginners
Cloud stack for_beginners
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
 
Cloud data center and openstack
Cloud data center and openstackCloud data center and openstack
Cloud data center and openstack
 

Semelhante a Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark and Kafka

The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersEdelweiss Kammermann
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsgagravarr
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?samthemonad
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Hortonworks
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big dealeduarderwee
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologiesinside-BigData.com
 
Impala turbocharge your big data access
Impala   turbocharge your big data accessImpala   turbocharge your big data access
Impala turbocharge your big data accessOphir Cohen
 

Semelhante a Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark and Kafka (20)

The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needs
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
 
Impala turbocharge your big data access
Impala   turbocharge your big data accessImpala   turbocharge your big data access
Impala turbocharge your big data access
 

Mais de Frank Munz

Microservices Runtimes
Microservices RuntimesMicroservices Runtimes
Microservices RuntimesFrank Munz
 
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017Frank Munz
 
Docker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCSDocker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCSFrank Munz
 
Oracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to KnowOracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to KnowFrank Munz
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...Frank Munz
 
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12cDocker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12cFrank Munz
 
12 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla1512 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla15Frank Munz
 
WebLogic JMX for DevOps
WebLogic JMX for DevOpsWebLogic JMX for DevOps
WebLogic JMX for DevOpsFrank Munz
 
Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial Frank Munz
 

Mais de Frank Munz (9)

Microservices Runtimes
Microservices RuntimesMicroservices Runtimes
Microservices Runtimes
 
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
From Docker Swarm to OCCS and Wercker: Live-hacking at Oracle CODE Mexico 2017
 
Docker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCSDocker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCS
 
Oracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to KnowOracle Service Bus 12c (12.2.1) What You Always Wanted to Know
Oracle Service Bus 12c (12.2.1) What You Always Wanted to Know
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
 
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12cDocker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
 
12 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla1512 Things About WebLogic 12.1.3 #oow2014 #otnla15
12 Things About WebLogic 12.1.3 #oow2014 #otnla15
 
WebLogic JMX for DevOps
WebLogic JMX for DevOpsWebLogic JMX for DevOps
WebLogic JMX for DevOps
 
Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial Oracle Service Bus (OSB) for the Busy IT Professonial
Oracle Service Bus (OSB) for the Busy IT Professonial
 

Último

20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...SUHANI PANDEY
 
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...SUHANI PANDEY
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...SUHANI PANDEY
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...kajalverma014
 
💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋nirzagarg
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirtrahman018755
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceDelhi Call girls
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...SUHANI PANDEY
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Delhi Call girls
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftAanSulistiyo
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"growthgrids
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...SUHANI PANDEY
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...SUHANI PANDEY
 

Último (20)

20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Salem Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
 
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Pollachi 7001035870 Whatsapp Number, 24/07 Booking
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
 

Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark and Kafka

  • 1. Open Source Big Data in OPC Edelweiss Kammermann Frank MunzJava One 2017
  • 3. © IT Convergence 2016. All rights reserved.
  • 4. © IT Convergence 2016. All rights reserved. About Me à Computer Engineer, BI and Data Integration Specialist à Over 20 years of Consulting and Project Management experience in Oracle technology. à Co-founder and Vice President of Uruguayan Oracle User Group (UYOUG) à Director of Community of LAOUC à Head of BI Team CMS at ITConvergence à Writer and frequent speaker at international conferences: à Collaborate, OTN Tour LA, UKOUG Tech & Apps, OOW, Rittman Mead BI Forum à Oracle ACE Director
  • 5. © IT Convergence 2016. All rights reserved. Uruguay
  • 6. 6 Dr. Frank Munz •Founded munz & more in 2007 •17 years Oracle Middleware, Cloud, and Distributed Computing •Consulting and High-End Training •Wrote two Oracle WLS and one Cloud book
  • 7.
  • 9. © IT Convergence 2016. All rights reserved. What is Big Data? à Volume: The high amount of data à Variety: The wide range of different data formats and schemas. Unstructured and semi-structured data à Velocity: The speed which data is created or consumed à Oracle added another V in this definition à Value: Data has intrinsic value—but it must be discovered.
  • 10. © IT Convergence 2016. All rights reserved. What is Oracle Big Data Cloud Compute Edition? à Big Data Platform that integrates Oracle Big Data solution with Open Source tools à Fully Elastic à Integrated with Other Paas Services as Database Cloud Service, MySQL Cloud Service, Event Hub Cloud Service à Access, Data and Network Security à REST access to all the funcitonality
  • 11. © IT Convergence 2016. All rights reserved. Big Data Cloud Service – Compute Edition (BDCS-CE)
  • 12. © IT Convergence 2016. All rights reserved. BDCS-CE Notebook: Interactive Analysis à Apache Zeppelin Notebook (version0.7) to interactively work with data
  • 13. © IT Convergence 2016. All rights reserved. What is Hadoop? à An open source software platform for distributed storage and processing à Manage huge volumes of unstructured data à Parallel processing of large data set à Highly scalable à Fault-tolerant à Two main components: à HDFS: Hadoop Distributed File System for storing information à MapReduce: programming framework that process information
  • 14. © IT Convergence 2016. All rights reserved. Hadoop Components: HFDS à Stores the data on the cluster à Namenode: block registry à DataNode: block containers themselves (Datanode) à HDFS cartoon by Mvarshney
  • 15. © IT Convergence 2016. All rights reserved. Hadoop Components: MapReduce à Retrieves data from HDFS à A MapReduce program is composed by à Map() method: performs filtering and sorting of the <key, value> inputs à Reduce() method: summarize the <key,value> pairs provided by the Mappers à Code can be written in many languages (Perl, Python, Java etc)
  • 16. © IT Convergence 2016. All rights reserved. MapReduce Example
  • 17. © IT Convergence 2016. All rights reserved. Code Example
  • 18. © IT Convergence 2016. All rights reserved. Code Example
  • 19. © IT Convergence 2016. All rights reserved. #2 Hive
  • 20. © IT Convergence 2016. All rights reserved. What is Hive? à An open source data warehouse software on top of Apache Hadoop à Analyze and query data stored in HDFS à Structure the data into tables à Tools for simple ETL à SQL- like queries (HiveQL) à Procedural language with HPL-SQL à Metadata storage in a RDBMS
  • 21. © IT Convergence 2016. All rights reserved. Hadoop & Hive Demo
  • 23. Revisited: Map Reduce I/O munz & more #23 Source: Hadoop Application Architecture Book
  • 24. Spark • Orders of magnitude(s) faster than M/R • Higher level Scala, Java or Python API • Standalone, in Hadoop, or Mesos • Principle: Run an operation on all data -> ”Spark is the new MapReduce” • See also: Apache Storm, etc • Uses RDDs, or Dataframes, or Datasets munz & more #24 https://stackoverflow.com/questions/31508083/difference-between- dataframe-in-spark-2-0-i-e-datasetrow-and-rdd-in-spark https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
  • 25. RDDs Resilient Distributed Datasets Where do they come from? Collection of data grouped into named columns. Supports text, JSON, Apache Parquet, sequence. Read in HDFS, Local FS, S3, Hbase Parallelize existing Collection Transform other RDD -> RDDs are immutable
  • 26. Lazy Evaluation munz & more #26 Nothing is executed Execution Transformations: map(), flatMap(), reduceByKey(), groupByKey() Actions: collect(), count(), first(), takeOrdered(), saveAsTextFile(), … http://spark.apache.org/docs/2.1.1/programming-guide.html
  • 27. map(func) Return a new distributed dataset formed by passing each element of the source through a function func. flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item). reduceByKey(func, [numTasks]) When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V. groupByKey([numTasks]) When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs. Transformations
  • 28.
  • 29.
  • 30. Spark Demo munz & more #30
  • 32. Word Count and Histogram munz & more #32 res = t.flatMap(lambda line: line.split(" ")) .map(lambda word: (word, 1)) .reduceByKey(lambda a, b: a + b) res.takeOrdered(5, key = lambda x: -x[1])
  • 34. Big Data Compute Service CE munz & more #34
  • 36. Kafka Partitioned, replicated commit log munz & more #36 0 1 2 3 4 … n Immutable log: Messages with offset Producer Consumer A Consumer B https://www.quora.com/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it- is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that
  • 37. Broker1 Broker2 Broker3 Topic A (1) Topic A (2) Topic A (3) Partition / Leader Repl A (1) Repl A (2) Repl A (3) Producer Replication / Follower Zoo- keeper Zoo- keeper Zoo- keeper State / HA
  • 38. https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ - 1 topic - 1 partition - Contains every article published since 1851 - Multiple producers / consumers Example for Stream / Table Duality
  • 39. Kafka Clients SDKs Connect Streams - OOTB: Java, Scala - Confluent: Python, C, C++ Confluent: - HDFS sink, - JDBC source, - S3 sink - Elastic search sink - Plugin .jar file - JDBC: Change data capture (CDC) - Real-time data ingestion - Microservices - KSQL: SQL streaming engine for streaming ETL, anomaly detection, monitoring - .jar file runs anywhere High / low level Kafka API Configuration only Integrate external Systems Data in Motion Stream / Table duality REST - Language agnostic - Easy for mobile apps - Easy to tunnel through FW etc. Lightweight
  • 40. Oracle Event Hub Cloud Service • PaaS: Managed Kafka 0.10.2 • Two deployment modes – Basic (Broker and ZK on 1 node) – Recommended (distributed) • REST Proxy – Separate sever(s) running REST Proxy munz & more #40
  • 41. Event Hub munz & more #41
  • 43. Ports You must open ports to allow access for external clients • Kafka Broker (from OPC connect string) • Zookeeper with port 2181 munz & more #43
  • 44. Scaling munz & more #44 horizontal (up)vertical
  • 45. Event Hub REST Interface munz & more #45 https://129.151.91.31:1080/restproxy/topics/a12345orderTopic Service = Topic
  • 46. Interesting to Know • Event Hub topics are prefixed with ID domain • With Kafka CLI topics with ID Domain can be created • Topics without ID domain are not shown in OPC console 46
  • 48. TL;DR #bigData #openSource #OPC OpenSource: entry point to Oracle Big Data world / Low(er) setup times / Check for resource usage & limits in Big Data OPC / BDCS-CE: managed Hadoop, Hive, Spark + Event hub: Kafka / Attend a hands-on workshop! / Next level: Oracle Big Data tools @EdelweissK @FrankMunz
  • 51. 3 Membership Tiers • Oracle ACE Director • Oracle ACE • Oracle ACE Associate bit.ly/OracleACEProgram 500+ Technical Experts Helping Peers Globally Connect: Nominate yourself or someone you know: acenomination.oracle.com @oracleace Facebook.com/oracleaces oracle-ace_ww@oracle.com
  • 52. Sign up for Free Trial http://cloud.oracle.com