Mark Rittman, Oracle ACE Director
NEW WORLD HADOOP ARCHITECTURES (& WHAT PROBLEMS THEY REALLY SOLVE) FOR DBAS
UKOUG DATABASE SIG MEETING
London, February 2017
About The Presenter
•Oracle ACE Director, Independent Analyst
•Past ODTUG Exec Board Member + Oracle Scene Editor
•Author of two books on Oracle BI
•Co-founder & CTO of Rittman Mead
•15+ Years in Oracle BI, DW, ETL + now Big Data
•Host of the Drill to Detail Podcast (www.drilltodetail.com)
•Based in Brighton, working in London, UK
BACK IN FEBRUARY
“Hi Mark, In things I have seen and read quite often people start with a high-level overview of a product (e.g. Hadoop, Kafka), then describe the technical concepts (using all the appropriate terminology) …”
“but I am usually left missing something. I think it's around the area of what problems these technologies are solving and how they are doing it? Without that context I'm finding it all very academic”
“Many people say traditional systems will still be needed. Are these new technologies solving completely different problems to those handled by traditional IT? Is there an overlap?”
20 Years in Old-school BI & Data Warehousing
•Started back in 1996 on a bank Oracle DW project
•Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
•Data warehouses provided a unified view of the business
  •Single place to store key data and metrics
  •Joined-up view of the business
  •Aggregates and conformed dimensions
  •ETL routines to load, cleanse and conform data
•BI tools for simple, guided access to information
  •Tabular data access using SQL-generating tools
  •Drill paths, hierarchies, facts, attributes
  •Fast access to pre-computed aggregates
  •Packaged BI for fast-start ERP analytics
Data Warehousing and BI at “Peak Oracle”
Oracle Data Management Platform as of Today
What Happened?
Let’s Go Back to 2003…
Google File System and MapReduce
•Google needed to store and query its vast volume of server log files
•And wanted to do so using cheap, commodity hardware
•Google File System and MapReduce were designed together for this purpose
Google File System + MapReduce Key Innovations
•GFS was optimised for the particular task at hand: computing PageRank for sites
  •Streaming reads for PageRank calculations, block writes for the crawler's whole-site dumps
•The master node only holds metadata
  •Stops client/master I/O becoming a bottleneck; also acts as traffic controller for clients
•Simple design, optimised for a specific Google need
•MapReduce provided an abstraction framework for simple computations
  •Select & filter (map) and reduce (aggregate) functions, easy to distribute across a cluster
•MapReduce abstracted cluster compute; GFS (and later HDFS) abstracted cluster storage
•These were the projects that inspired Apache Hadoop + HDFS
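To make the map/reduce split concrete, here is a minimal single-process sketch in Python. It is not Google's or Hadoop's API: the log format, the sample lines and the function names are all hypothetical. The map function selects and filters records, the framework groups intermediate pairs by key (the "shuffle"), and the reduce function aggregates each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

# Hypothetical sample: one web-server log line per record ("<ip> <url> <status>").
LOG_LINES = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /about.html 200",
    "10.0.0.1 /index.html 304",
    "10.0.0.3 /index.html 200",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Select & filter: emit (url, 1) for each successful (HTTP 200) request."""
    _ip, url, status = line.split()
    if status == "200":
        yield url, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict:
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> Tuple[str, int]:
    """Aggregate: sum the counts for one key."""
    return key, sum(values)


def run_mapreduce(lines: Iterable[str]) -> dict:
    # On a real cluster each phase runs in parallel on many nodes, close to
    # where the data blocks live; here everything runs sequentially.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(run_mapreduce(LOG_LINES))
```

Because map_phase looks at one record at a time and reduce_phase at one key at a time, both can be spread across machines without the programmer writing any distribution code, which is the point the slide makes.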
How Traditional RDBMS Data Warehousing Scaled-Up
Shared-Everything Architectures (e.g. Oracle RAC, Exadata)
Shared-Nothing Architectures (e.g. Teradata, Netezza)
Problem #1 That Hadoop / NoSQL Solved: Scaling Affordably
“Oracle scales infinitely and is free. Period”
Cost and Complexity around Scaling DW Clusters
•High-end enterprise RDBMSs such as Oracle can scale
  •Clustering for single-instance DBs can scale to >PB
  •Exadata scales further by offloading queries to storage
  •Sharded databases (e.g. Netezza) can scale further still
•But cost (and complexity) become limiting factors
  •$1m per node is not uncommon
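Sharded, shared-nothing databases scale by giving each node its own slice of the data and its own CPU and disk. The sketch below shows the general idea under simplifying assumptions: rows are hash-distributed on a distribution key, every node aggregates only its local rows, and a coordinator merges the partial results. The table, column names and hashing choice are hypothetical; Teradata and Netezza each have their own distribution and execution mechanics.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # hypothetical (customer_id, sale_amount) fact row


def node_for(key: str, num_nodes: int) -> int:
    """Pick the owning node by hashing the distribution key (crc32 is stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: List[Row], num_nodes: int) -> List[List[Row]]:
    """Spread rows across nodes; each node stores only its own slice (shared-nothing)."""
    shards: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        shards[node_for(row[0], num_nodes)].append(row)
    return shards


def local_aggregate(shard: List[Row]) -> Counter:
    """Each node sums sales per customer over its local data only."""
    totals: Counter = Counter()
    for customer, amount in shard:
        totals[customer] += amount
    return totals


def query_total_sales(rows: List[Row], num_nodes: int) -> Dict[str, float]:
    # On a real MPP system the nodes work in parallel; a coordinator then merges.
    partials = [local_aggregate(shard) for shard in distribute(rows, num_nodes)]
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values rather than replacing them
    return dict(merged)


sales = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 1.0)]
print(query_total_sales(sales, num_nodes=3))
```

Adding nodes spreads the same work more thinly, which is why this design scales; the slide's point is that with proprietary appliances each extra node carries a large price tag.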
Hadoop’s Original Appeal to Data Warehouse Owners
•A cheap, easily expandable way of storing (non-relational) data
•Gave us a way of scaling beyond TB-size without paying $$$
•First use-cases were offline storage and active archive of data
Hadoop Ecosystem Expanded Beyond MapReduce (c. 2013)
•Core Hadoop, MapReduce and HDFS
•HBase and other NoSQL Databases
•Apache Hive and SQL-on-Hadoop
•Storm, Spark and Stream Processing
•Apache YARN and Hadoop 2.0
Google BigTable, HBase and NoSQL Databases
•Solution to the problem of storing semi-structured data at scale
•Built on Google File System
•Scale for capacity, e.g. the webtable:
  •100,000,000,000 pages
  •10 versions per page
  •20 KB / version = 20 PB of data
•Scale for throughput
  •Hundreds of millions of users
  •Tens of thousands to millions of queries/sec
  •At low latency with high reliability
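The webtable capacity figure can be checked directly. Assuming decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes), it works out as:

```latex
\[
\underbrace{10^{11}}_{\text{pages}}
\times \underbrace{10}_{\text{versions/page}}
\times \underbrace{20 \times 10^{3}\ \text{B}}_{\text{per version}}
= 2 \times 10^{16}\ \text{B}
= 20\ \text{PB}
\]
```

With binary units (KiB, PiB) the same product comes to roughly 17.8 PiB, so the slide's round number reflects decimal units.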
How BigTable Scaled Beyond Traditional RDBMSs
•Optimised for a particular task: fast lookups of timestamp-versioned web data
•Data stored in a multidimensional map keyed on row, column + timestamp
•Master + data tablets stored on GFS cluster nodes
•Simple key/value lookup, with the client doing the interpretation
•Innovation: focus on a single job with different needs from OLTP
•Formed the inspiration for Apache HBase
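The data model in the bullets above, a sparse map keyed on (row, column, timestamp) whose values are uninterpreted bytes, can be sketched in a few lines of Python. This is an in-memory illustration only: the class name, the retention setting and the reversed-URL row key are hypothetical, and it leaves out everything that makes BigTable or HBase scale (tablets, tablet servers, GFS/HDFS persistence).

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyBigTable:
    """In-memory sketch of a sparse map: (row key, column, timestamp) -> bytes.

    Values are opaque bytes; the client decides how to interpret them.
    """

    def __init__(self, max_versions: int = 10) -> None:
        self.max_versions = max_versions
        # row -> column -> list of (timestamp, value), kept sorted by timestamp
        self._cells: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = defaultdict(dict)

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._cells[row].setdefault(column, [])
        bisect.insort(versions, (timestamp, value))
        # Drop the oldest versions beyond the retention limit.
        del versions[: max(0, len(versions) - self.max_versions)]

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest value at or before `timestamp` (the latest if None)."""
        versions = self._cells.get(row, {}).get(column, [])
        for ts, value in reversed(versions):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = ToyBigTable(max_versions=2)
for ts, html in [(1, b"<html>v1</html>"), (2, b"<html>v2</html>"), (3, b"<html>v3</html>")]:
    table.put("com.example.www", "contents:html", ts, html)

print(table.get("com.example.www", "contents:html"))
print(table.get("com.example.www", "contents:html", timestamp=2))
```

Every read is a single keyed lookup with no joins or query planning, which is the trade-off the slide describes: a narrow job done very fast, rather than general OLTP.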
Hive - Hadoop Discovers Set-Based Processing
•Originally developed at Facebook, now foundational within Hadoop
•SQL-like language (HiveQL) that compiles to MapReduce or Spark jobs, and can also query HBase
•Solved the problem of enabling non-programmers to access big data
•And made writing Hadoop data transformation and aggregation code more productive
•JDBC and ODBC drivers for tool integration
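The productivity gain comes from writing one declarative statement and letting the engine generate the map and reduce stages. The sketch below is a toy illustration of that idea, not how Hive's compiler actually works (Hive builds and optimises operator plans): a hypothetical `compile_group_by_count` turns a `GROUP BY ... COUNT(*)` into a mapper and reducer, and the result is checked against the same query run by SQLite.

```python
import sqlite3
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def compile_group_by_count(key_column: str) -> Tuple[Callable, Callable]:
    """Toy 'compiler' for: SELECT key, COUNT(*) FROM t GROUP BY key.

    Returns a mapper and a reducer, loosely mirroring how a SQL-on-Hadoop
    engine turns a set-based statement into map and reduce stages.
    """

    def mapper(row: Row) -> Iterable[Tuple[object, int]]:
        yield row[key_column], 1

    def reducer(key: object, values: List[int]) -> Tuple[object, int]:
        return key, sum(values)

    return mapper, reducer


def run(rows: List[Row], mapper: Callable, reducer: Callable) -> Dict[object, int]:
    """Single-process stand-in for the cluster: map, group by key, reduce."""
    groups: Dict[object, List[int]] = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())


# Hypothetical page-view rows.
rows = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]

mapper, reducer = compile_group_by_count("page")
mr_result = run(rows, mapper, reducer)

# The same query written declaratively and run by an RDBMS (SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?)", [(r["page"],) for r in rows])
sql_result = dict(conn.execute("SELECT page, COUNT(*) FROM page_views GROUP BY page"))

print(mr_result == sql_result)
```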
Apache Hive as SQL Access Engine For Everything
•Hive is extensible, to help with accessing and integrating new data sets
  •SerDes: Serializer-Deserializers that interpret semi-structured sources
  •UDFs + Hive Streaming: user-defined functions and streaming input
  •File Formats: make use of compressed and/or optimised file storage
  •Storage Handlers: use storage other than HDFS (e.g. MongoDB)
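A SerDe sits between the raw bytes in a file and the columns of a Hive table. The following Python sketch shows only the idea, for JSON text records; Hive's real SerDes are Java classes with their own interface, and the class name, columns and sample record here are hypothetical. On read, each declared column is looked up by name, cast to its type, and set to NULL (None) if absent; extra fields are ignored. On write, the row is turned back into a JSON line.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


class ToyJsonSerDe:
    """Python sketch of what a Hive SerDe does for JSON text files.

    Not Hive's Java SerDe interface: just the idea of mapping each raw
    record to the table's declared columns on read, and back on write.
    """

    def __init__(self, columns: List[Tuple[str, type]]) -> None:
        self.columns = columns  # declared table schema: (name, Python type)

    def deserialize(self, line: str) -> Tuple[Optional[Any], ...]:
        record: Dict[str, Any] = json.loads(line)
        row = []
        for name, col_type in self.columns:
            value = record.get(name)
            # Missing fields become NULL (None); present ones are cast to the column type.
            row.append(None if value is None else col_type(value))
        return tuple(row)

    def serialize(self, row: Tuple[Any, ...]) -> str:
        return json.dumps({name: value for (name, _), value in zip(self.columns, row)})


serde = ToyJsonSerDe([("user_id", int), ("event", str), ("amount", float)])
row = serde.deserialize('{"user_id": "42", "event": "purchase", "amount": 9.5, "extra": true}')
print(row)
```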
Common Hadoop/NoSQL Use-Case (c. 2014)
•Hadoop as low-cost ETL pre-processing engine: “ETL-offload”
•NoSQL database for landing real-time data at high speed/low latency
•Incoming data then aggregated and stored in RDBMS DW
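The ETL-offload pattern can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions: a Python list stands in for the raw events landed in the Hadoop/NoSQL tier, SQLite stands in for the relational data warehouse, and the event fields, table name and dates are hypothetical. The point is that the bulky pre-aggregation happens outside the DW and only compact aggregates are loaded.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw events as landed in the Hadoop/NoSQL tier: (event_date, product, quantity).
RAW_EVENTS: List[Tuple[str, str, int]] = [
    ("2017-02-01", "widget", 3),
    ("2017-02-01", "widget", 2),
    ("2017-02-01", "gadget", 1),
    ("2017-02-02", "widget", 4),
]


def offload_aggregate(events: List[Tuple[str, str, int]]) -> Dict[Tuple[str, str], int]:
    """The 'ETL-offload' step: cheap bulk pre-aggregation outside the DW."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for event_date, product, quantity in events:
        totals[(event_date, product)] += quantity
    return totals


def load_into_dw(conn: sqlite3.Connection, totals: Dict[Tuple[str, str], int]) -> None:
    """Load only the compact aggregates into the (stand-in) relational warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "sale_date TEXT, product TEXT, quantity INTEGER, "
        "PRIMARY KEY (sale_date, product))"
    )
    # INSERT OR REPLACE makes re-running a load for the same day idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)",
        [(d, p, q) for (d, p), q in totals.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_into_dw(conn, offload_aggregate(RAW_EVENTS))
for row in conn.execute("SELECT * FROM daily_sales ORDER BY sale_date, product"):
    print(row)
```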
[Diagram: Hadoop (online, scalable, flexible, cost-effective) with the data warehouse, marts and business intelligence]
Jump Ahead to 2012…
Data Warehousing and ETL Needed Some Agility
•Driven by the pace of business, and user demands for more agility and control
•Traditional IT-governed data loading not always appropriate
•Not all data needed to be modelled right away
•Not all data was suited to tabular storage
•New ways of analysing data beyond SQL
  •Graph analysis
  •Machine learning
Problem #2 That Hadoop / NoSQL Solved: Making Data Warehousing Agile
Advent of Schema-on-Read, and Data Lakes
•Storing data in the format it arrived in, then applying a schema at query time
•Suits data that may be analysed in different ways by different tools
•In addition, some data formats have their schema embedded in the file
•Key benefit: fast-arriving data of unknown value can get to users earlier
•Made possible by tools such as Apache Hive + SerDes, Apache Drill and self-describing file formats, plus HDFS storage
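Schema-on-read means the bytes are stored untouched and each consumer decides at query time which fields it needs and what they mean. In this minimal Python sketch (the CSV layout, field positions and both "views" are hypothetical) the same raw file is read with two different schemas: one for user activity, one for revenue. Nothing had to be modelled before the data was landed.

```python
import csv
import io
from typing import Dict, List

# Raw data landed exactly as it arrived: nothing is parsed or validated on write.
RAW_FILE = """2017-02-01T09:00:00,login,alice,
2017-02-01T09:05:00,purchase,alice,19.99
2017-02-01T09:07:00,purchase,bob,5.00
"""


def read_with_schema(raw: str, schema: Dict[int, str]) -> List[Dict[str, str]]:
    """Apply a schema at query time: pick and name the fields this reader cares about."""
    return [
        {name: fields[pos] for pos, name in schema.items()}
        for fields in csv.reader(io.StringIO(raw))
    ]


# Two hypothetical consumers interpret the same bytes differently.
activity_view = read_with_schema(RAW_FILE, {0: "ts", 1: "event", 2: "user"})
revenue_view = [
    {"user": r["user"], "amount": float(r["amount"])}
    for r in read_with_schema(RAW_FILE, {1: "event", 2: "user", 3: "amount"})
    if r["event"] == "purchase"
]

print(len(activity_view), sum(r["amount"] for r in revenue_view))
```

The trade-off is that parsing and type errors surface at query time rather than at load time, which is the price of getting data to users earlier.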
Meet the New Data Warehouse: The “Data Lake”
•Data now landed in Hadoop clusters, NoSQL databases and cloud storage
•Flexible data storage platform with cheap storage, flexible schema support + compute
•Solves the problem of how to store new types of data and choose the best time/way to process it
•Hadoop/NoSQL increasingly used for all store/transform/query tasks
[Diagram: data reservoir architecture on a Hadoop platform. Data Transfer: operational data (transactions, customer master data), unstructured data (voice + chat transcripts) and data streams arrive via file-based, stream-based and ETL-based integration. Data Factory and Data Reservoir: Raw Customer Data (data stored in its original format, usually files, such as SS7, ASN.1, JSON etc.) and Mapped Customer Data (data sets produced by mapping and transforming raw data). Data Access: business intelligence tools; marketing/sales applications (models, machine learning, segments); Discovery & Development Labs (a safe & secure discovery and development environment, with data sets and samples, models and programs).]
Hadoop 2.0 and YARN (“Yet Another Resource Negotiator”)
Key Innovation: Separating how data is stored from how it is processed
Hadoop 2.0 - Enabling Multiple Query Engines
•Hadoop started out synonymous with MapReduce and Java coding
•But YARN (Yet Another Resource Negotiator) broke this dependency
•Hadoop (through YARN) now just handles resource management
•Multiple different query engines can run against data in-place
  •General-purpose (e.g. MapReduce)
  •Graph processing
  •Machine learning
  •Real-time processing
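The separation YARN introduced can be illustrated with a toy resource manager that hands out containers from one shared cluster pool without knowing what kind of engine is asking. This is a deliberately simplified sketch, not YARN's API: the class, method and application names are hypothetical, and it only tracks memory and vcores, whereas real YARN adds queues, pluggable schedulers, node managers, per-application masters and data locality.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyResourceManager:
    """Sketch of YARN's role: grant containers from one shared cluster pool.

    The manager knows nothing about MapReduce, Spark or graph engines;
    it only tracks how much memory and how many vcores are in use.
    """

    total_memory_mb: int
    total_vcores: int
    allocations: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def _used(self) -> Tuple[int, int]:
        containers = [c for cs in self.allocations.values() for c in cs]
        return sum(m for m, _ in containers), sum(v for _, v in containers)

    def request_container(self, app: str, memory_mb: int, vcores: int) -> bool:
        used_mem, used_cores = self._used()
        if used_mem + memory_mb > self.total_memory_mb or used_cores + vcores > self.total_vcores:
            return False  # insufficient capacity: the application has to wait
        self.allocations.setdefault(app, []).append((memory_mb, vcores))
        return True

    def release_all(self, app: str) -> None:
        self.allocations.pop(app, None)


# Different (hypothetical) engines share the same cluster and the same data.
rm = ToyResourceManager(total_memory_mb=8192, total_vcores=8)
results = [
    rm.request_container("mapreduce-etl", 4096, 2),
    rm.request_container("spark-ml", 4096, 4),
    rm.request_container("graph-job", 1024, 1),  # memory is exhausted
]
rm.release_all("mapreduce-etl")
results.append(rm.request_container("graph-job", 1024, 1))
print(results)
```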
Technologies Emerged to Bridge Old/New World
FAST FORWARD TO NOW…
Elastically-Scalable Data Warehouse-as-a-Service
•New generation of big data platform services from Google, Amazon, Oracle
•Combines three key innovations from earlier technologies:
  •Organisation of data into tables and columns (from RDBMS DWs)
  •Massively-scalable and distributed storage and query (from Big Data)
  •Elastically-scalable Platform-as-a-Service (from Cloud)
… Which Is What I’m Working On Right Now
Example Architecture : Google BigQuery
What Problem Did Analytics-as-a-Service Solve?
•On-premise Hadoop, even with simple resilient clustering, will hit limits
•Clusters can reach 5000+ nodes, and need to scale up for demand peaks etc.
•Scale limits are encountered far beyond those for DWs…
•… but the future is elastically scaled query and compute as-a-service
Oracle Big Data Cloud Compute Edition
Free $300 developer credit at: https://cloud.oracle.com/en_US/tryit
BigQuery: Big Data Meets Data Warehousing
•And things come full circle … analytics typically requires tabular data
•Google BigQuery is based on the DremelX massively-parallel query engine
•But stores data in columnar form and provides a SQL interface
•Solves the problem of providing DW-like functionality at scale, as-a-service
•This is the future … ;-)
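Columnar storage is a large part of why an engine like this can scan huge tables quickly: a query reads only the columns it references. The Python sketch below pivots hypothetical row-oriented records into one list per column and answers a filtered sum by touching just two of the four columns. It illustrates the layout only; BigQuery's actual storage format, compression and distributed execution are not modelled here.

```python
from typing import Any, Dict, List

# Hypothetical sales rows, as a row-oriented store would lay them out.
ROWS: List[Dict[str, Any]] = [
    {"order_id": 1, "country": "UK", "amount": 120, "notes": "gift wrap"},
    {"order_id": 2, "country": "US", "amount": 80, "notes": ""},
    {"order_id": 3, "country": "UK", "amount": 45, "notes": "express"},
]


def to_columnar(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot rows into one contiguous list per column (a columnar layout)."""
    return {col: [row[col] for row in rows] for col in rows[0]}


def sum_amount_for(columns: Dict[str, List[Any]], country: str) -> int:
    """SELECT SUM(amount) ... WHERE country = ?: reads only two of the four columns."""
    return sum(
        amount
        for c, amount in zip(columns["country"], columns["amount"])
        if c == country
    )


columns = to_columnar(ROWS)
print(sum_amount_for(columns, "UK"))
```

Storing each column contiguously also groups similar values together, which is why columnar formats typically compress well.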
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataRittman Analytics
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionRittman Analytics
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataRittman Analytics
 
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...Rittman Analytics
 
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 HoursAnalytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 HoursRittman Analytics
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Rittman Analytics
 
Petabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerPetabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerRittman Analytics
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Rittman Analytics
 

Mais de Rittman Analytics (16)

From Zero to One with Rittman Analytics
From Zero to One with Rittman AnalyticsFrom Zero to One with Rittman Analytics
From Zero to One with Rittman Analytics
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
User Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity ModelUser Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity Model
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Planning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data WarehousingPlanning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data Warehousing
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data WarehouseData Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
 
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 HoursAnalytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
 
Petabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerPetabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and Looker
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
 

Último

World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 

Último (20)

World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 

Hadoop Architectures and the Problems They Solve

  • 1. Mark Rittman, Oracle ACE Director NEW WORLD HADOOP ARCHITECTURES (& WHAT PROBLEMS THEY REALLY SOLVE) FOR DBAS UKOUG DATABASE SIG MEETING London, February 2017
  • 2. About The Presenter
      • Oracle ACE Director, Independent Analyst
      • Past ODTUG Exec Board Member + Oracle Scene Editor
      • Author of two books on Oracle BI
      • Co-founder & CTO of Rittman Mead
      • 15+ years in Oracle BI, DW, ETL + now Big Data
      • Host of the Drill to Detail Podcast (www.drilltodetail.com)
      • Based in Brighton & work in London, UK
  • 4. “Hi Mark, In things I have seen and read quite often people start with a high-level overview of a product (e.g. Hadoop, Kafka), then describe the technical concepts (using all the appropriate terminology) …”
      “but I am usually left missing something. I think it's around the area of what problems these technologies are solving and how they are doing it? Without that context I'm finding it all very academic”
      “Many people say traditional systems will still be needed. Are these new technologies solving completely different problems to those handled by traditional IT? Is there an overlap?”
  • 5. 20 Years in Old-school BI & Data Warehousing
      • Started back in 1996 on a bank Oracle DW project
      • Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
      • Data warehouses provided a unified view of the business
      • Single place to store key data and metrics
      • Joined-up view of the business
      • Aggregates and conformed dimensions
      • ETL routines to load, cleanse and conform data
      • BI tools for simple, guided access to information
      • Tabular data access using SQL-generating tools
      • Drill paths, hierarchies, facts, attributes
      • Fast access to pre-computed aggregates
      • Packaged BI for fast-start ERP analytics
  • 7. Data Warehousing and BI at “Peak Oracle”
  • 8. Oracle Data Management Platform as of Today
  • 10. Let’s Go Back to 2003…
  • 12. Google File System and MapReduce
      • Google needed to store and query its vast volumes of server log files
      • And wanted to do so using cheap, commodity hardware
      • Google File System and MapReduce were designed together for this use
  • 13. Google File System + MapReduce Key Innovations
      • GFS was optimised for the particular task at hand: computing PageRank for sites
      • Streaming reads for PageRank calculations, block writes for the crawler's whole-site dumps
      • Master node only holds metadata
      • This stops client/master I/O becoming a bottleneck; the master also acts as a traffic controller for clients
      • Simple design, optimised for a specific Google need
      • MapReduce focused on simple computations within an abstraction framework
      • Select & filter (map) and aggregate (reduce) functions, easy to distribute across a cluster
      • MapReduce abstracted cluster compute, GFS abstracted cluster storage
      • These projects inspired Apache Hadoop + HDFS
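To make the map / shuffle / reduce split concrete, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not Hadoop's or Google's API: the input is a hypothetical list of simplified "METHOD /path" log lines, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for the example. On a real cluster the framework would run the map calls on many nodes and move the intermediate pairs between them during the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # MAP: select & filter. Keep only GET requests and emit (path, 1) for each one.
    parts = line.split()
    if len(parts) == 2 and parts[0] == "GET":
        yield parts[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # SHUFFLE: group intermediate values by key (done by the framework between phases).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # REDUCE: aggregate every value that shares a key.
    return key, sum(values)


def run_job(lines: List[str]) -> Dict[str, int]:
    # Hypothetical driver: everything runs in one process here, purely to show the data flow.
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    logs = ["GET /index", "GET /about", "POST /login", "GET /index"]
    print(run_job(logs))  # {'/index': 2, '/about': 1}
```

Because each map call looks at only one record and each reduce call at only one key's values, both phases can be spread across as many nodes as the cluster has, which is what makes them easy to distribute.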
  • 14. How Traditional RDBMS Data Warehousing Scaled Up
      • Shared-everything architectures (e.g. Oracle RAC, Exadata)
      • Shared-nothing architectures (e.g. Teradata, Netezza)
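For DBAs more used to shared-everything RAC, the shared-nothing idea can be sketched in a few lines of Python. This is a toy model under explicit assumptions, not how Teradata or Netezza are implemented: the three-node cluster, the `(customer, amount)` rows and the crc32-based distribution function are all hypothetical. Each row is placed on exactly one node by hashing its distribution key, each node aggregates only the rows it owns, and a coordinator merges the partial results.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # hypothetical (customer_id, amount) row
NUM_NODES = 3            # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    # crc32 is deterministic across runs (unlike Python's salted str hash),
    # so the same key always lands on the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: List[Row], num_nodes: int = NUM_NODES) -> List[List[Row]]:
    # Each node owns its own slice of the data: nothing is shared between nodes.
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        nodes[node_for(customer, num_nodes)].append((customer, amount))
    return nodes


def local_aggregate(node_rows: List[Row]) -> Dict[str, float]:
    # Runs independently on each node (in parallel on a real MPP system).
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in node_rows:
        totals[customer] += amount
    return dict(totals)


def sum_by_customer(rows: List[Row], num_nodes: int = NUM_NODES) -> Dict[str, float]:
    # Coordinator step: rows are distributed on the GROUP BY key, so no customer
    # appears on two nodes and the partial results can simply be merged.
    result: Dict[str, float] = {}
    for node_rows in distribute(rows, num_nodes):
        result.update(local_aggregate(node_rows))
    return result


if __name__ == "__main__":
    sales = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 1.0)]
    print(sum_by_customer(sales))
```

The design choice worth noticing is the distribution key: if a query grouped by something other than the key used to place rows, nodes would first have to exchange data before aggregating.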
  • 15. Problem #1 That Hadoop / NoSQL Solved: Scaling Affordably
  • 16. “Oracle scales infinitely and is free. Period”
  • 17. Cost and Complexity around Scaling DW Clusters
      • Enterprise high-end RDBMSs such as Oracle can scale
      • Clustering for single-instance DBs can scale to >1 PB
      • Exadata scales further by offloading queries to storage
      • Sharded databases (e.g. Netezza) can scale further
      • But cost (and complexity) become limiting factors
      • Costs of around $1m per node are not uncommon
  • 18. Hadoop’s Original Appeal to Data Warehouse Owners (c. 2013)
      • A way of storing (non-relational) data cheaply, on easily expandable storage
      • Gave us a way of scaling beyond TB-size without paying $$$
      • First use-cases were offline storage and active archive of data
  • 19. Hadoop Ecosystem Expanded Beyond MapReduce
      • Core Hadoop, MapReduce and HDFS
      • HBase and other NoSQL databases
      • Apache Hive and SQL-on-Hadoop
      • Storm, Spark and stream processing
      • Apache YARN and Hadoop 2.0
  • 20. Google BigTable, HBase and NoSQL Databases
      • Solution to the problem of storing semi-structured data at scale
      • Built on Google File System
      • Scale for capacity, e.g. the webtable:
          100,000,000,000 pages × 10 versions per page × 20 KB per version = 20 PB of data
      • Scale for throughput
          Hundreds of millions of users
          Tens of thousands to millions of queries/sec
          At low latency with high reliability
  • 21. How BigTable Scaled Beyond Traditional RDBMSs
      • Optimised for a particular task: fast lookups of timestamp-versioned web data
      • Data stored in a multidimensional map keyed on row, column + timestamp
      • Master + data tablets stored on GFS cluster nodes
      • Simple key/value lookup, with the client doing the interpretation
      • Innovation: focus on a single job with different needs from OLTP
      • Formed the inspiration for Apache HBase
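The "multidimensional map keyed on row, column + timestamp" idea can be illustrated with a small in-memory Python class. This is a hypothetical sketch, not the BigTable or HBase client API: the `VersionedTable` class, its `put`/`get` methods and the example row key are invented for illustration. Values are opaque strings and, as on the slide, interpreting them is left to the client.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


class VersionedTable:
    """Toy (row, column, timestamp) -> value map in the spirit of BigTable/HBase."""

    def __init__(self) -> None:
        # For each (row, column) cell, keep (timestamp, value) pairs sorted by timestamp.
        self._cells: DefaultDict[Tuple[str, str], List[Tuple[int, str]]] = defaultdict(list)

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        # Writes never overwrite: each write adds a new timestamped version.
        bisect.insort(self._cells[(row, column)], (ts, value))

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[str]:
        # Return the newest version at or before `ts` (or the latest if ts is None).
        versions = self._cells.get((row, column), [])
        if not versions:
            return None
        if ts is None:
            return versions[-1][1]
        timestamps = [t for t, _ in versions]
        idx = bisect.bisect_right(timestamps, ts)
        return versions[idx - 1][1] if idx > 0 else None


if __name__ == "__main__":
    table = VersionedTable()
    table.put("com.example.www", "contents:", 3, "<html>v3</html>")
    table.put("com.example.www", "contents:", 5, "<html>v5</html>")
    print(table.get("com.example.www", "contents:"))        # <html>v5</html>
    print(table.get("com.example.www", "contents:", ts=4))  # <html>v3</html>
```

A lookup is a single keyed access with no joins or SQL planning, which is the narrow, fast path the slide contrasts with general-purpose OLTP.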
  • 22. Hive: Hadoop Discovers Set-Based Processing
      • Originally developed at Facebook, now foundational within Hadoop
      • SQL-like language that compiles to MapReduce or Spark jobs, and can also query HBase tables
      • Solved the problem of enabling non-programmers to access big data
      • And made Hadoop data transformation and aggregation code more productive
      • JDBC and ODBC drivers for tool integration
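The productivity gain Hive brought is easiest to see side by side: the same "count successful hits per page" logic written procedurally, as a MapReduce programmer would, and as a declarative SQL query. In this sketch Python's built-in `sqlite3` module stands in for Hive purely so the example runs anywhere; that substitution is an assumption for illustration, not how Hive executes queries, and the `weblogs` table and its columns are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Tuple

Hit = Tuple[str, int]  # hypothetical (path, http_status) record


def hand_coded_job(records: List[Hit]) -> Dict[str, int]:
    # Procedural version: filter (map side), then count per key (reduce side).
    return dict(Counter(path for path, status in records if status == 200))


def declarative_query(records: List[Hit]) -> Dict[str, int]:
    # Set-based version: say WHAT you want; the engine decides HOW to run it.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE weblogs (path TEXT, status INTEGER)")
        conn.executemany("INSERT INTO weblogs VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT path, COUNT(*) FROM weblogs WHERE status = 200 GROUP BY path"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


if __name__ == "__main__":
    hits = [("/index", 200), ("/about", 200), ("/index", 404), ("/index", 200)]
    print(hand_coded_job(hits) == declarative_query(hits))  # True
```

In Hive, a `SELECT ... GROUP BY` like this is compiled into map and reduce stages for you, which is the plumbing the first function makes you write by hand.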
  • 23. Apache Hive as SQL Access Engine For Everything
      • Hive is extensible to help with accessing and integrating new data sets
      • SerDes: Serializer-Deserializers that interpret semi-structured sources
      • UDFs + Hive Streaming: user-defined functions and streaming input
      • File formats: make use of compressed and/or optimised file storage
      • Storage handlers: use storage other than HDFS (e.g. MongoDB)
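A SerDe's job is to turn raw bytes in a file into columns on read (deserialize) and back again on write (serialize). The Python sketch below mimics that contract for a simplified, hypothetical access-log line format using a regular expression, in the spirit of Hive's regex-based SerDe; it is not Hive's Java SerDe interface, and the pattern, column names and sample line are assumptions.

```python
import re
from typing import Optional, Tuple

# Hypothetical simplified log format: <ip> [<timestamp>] "<METHOD> <path>" <status>
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)" (?P<status>\d{3})$'
)

LogRow = Tuple[str, str, str, str, int]  # (ip, ts, method, path, status)


def deserialize(line: str) -> Optional[LogRow]:
    # Read side: interpret one semi-structured line as a typed row.
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # this sketch simply skips lines that do not fit the pattern
    return (
        match["ip"],
        match["ts"],
        match["method"],
        match["path"],
        int(match["status"]),
    )


def serialize(row: LogRow) -> str:
    # Write side: turn a row back into the on-disk text format.
    ip, ts, method, path, status = row
    return f'{ip} [{ts}] "{method} {path}" {status}'


if __name__ == "__main__":
    line = '10.0.0.1 [01/Feb/2017:10:00:00 +0000] "GET /index" 200'
    row = deserialize(line)
    print(row)  # ('10.0.0.1', '01/Feb/2017:10:00:00 +0000', 'GET', '/index', 200)
    print(serialize(row) == line)  # True
```

The file itself never changes; only the interpretation layer does, which is why swapping or adding a SerDe lets the same SQL engine read new kinds of source data.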
  • 24. Common Hadoop/NoSQL Use-Case (c. 2014)
      • Hadoop as low-cost ETL pre-processing engine: “ETL-offload”
      • NoSQL database for landing real-time data at high speed/low latency
      • Incoming data then aggregated and stored in the RDBMS DW
      [Diagram: Hadoop (online, scalable, flexible, cost effective) feeding aggregated (Σ) data into the data warehouse and marts for business intelligence]
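The ETL-offload pattern on this slide, heavy aggregation on cheap Hadoop storage followed by a much smaller load into the RDBMS warehouse, can be sketched as two steps. In the Python below, a plain function stands in for the Hadoop job and an in-memory `sqlite3` database stands in for the data warehouse; the `sales_daily` table, the event tuples and both function names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str, float]    # hypothetical raw event: (date, product, amount)
Summary = Tuple[str, str, float]  # aggregated row: (date, product, total)


def offload_aggregate(events: List[Event]) -> List[Summary]:
    # Stand-in for the Hadoop job: crunch the high-volume raw data cheaply
    # and reduce it to one row per (date, product).
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for date, product, amount in events:
        totals[(date, product)] += amount
    return sorted((date, product, total) for (date, product), total in totals.items())


def load_into_dw(conn: sqlite3.Connection, rows: List[Summary]) -> None:
    # Only the compact summary reaches the (more expensive) warehouse.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_daily (sale_date TEXT, product TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO sales_daily VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    raw = [
        ("2017-02-01", "A", 10.0),
        ("2017-02-01", "A", 5.0),
        ("2017-02-01", "B", 7.5),
        ("2017-02-02", "A", 2.5),
    ]
    dw = sqlite3.connect(":memory:")
    load_into_dw(dw, offload_aggregate(raw))
    print(dw.execute("SELECT COUNT(*), SUM(total) FROM sales_daily").fetchone())  # (3, 25.0)
```

Four raw events become three warehouse rows here; at real data volumes that ratio is what keeps the expensive warehouse tier small.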
  • 25. Jump Ahead to 2012…
  • 29. Data Warehousing and ETL Needed Some Agility
      • Driven by the pace of business, and user demands for more agility and control
      • Traditional IT-governed data loading not always appropriate
      • Not all data needed to be modelled right away
      • Not all data suited storage in tabular form
      • New ways of analysing data beyond SQL
          Graph analysis
          Machine learning
  • 30. Problem #2 That Hadoop / NoSQL Solved: Making Data Warehousing Agile
  • 31. Advent of Schema-on-Read, and Data Lakes
      • Storing data in the format it arrived in, and then applying schema at query time
      • Suits data that may be analysed in different ways by different tools
      • In addition, some datatypes may have schema embedded in the file format
      • Key benefit: fast-arriving data of unknown value can get to users earlier
      • Made possible by tools such as Apache Hive + SerDes, Apache Drill and self-describing file formats, and HDFS storage
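Schema-on-read can be shown with a few lines of Python: the raw JSON records are stored untouched, and each reader decides at query time which columns it wants. This is a hypothetical sketch: the records, field names and the `read_with_schema` helper are illustrative, standing in for what engines such as Hive (with a JSON SerDe) or Drill do at much larger scale.

```python
import json
from typing import Any, Iterable, Iterator, Sequence, Tuple

# Raw records kept exactly as they arrived (JSON lines); no schema was enforced on write.
RAW_LINES = [
    '{"user": "u1", "event": "click", "page": "/home"}',
    '{"user": "u2", "event": "purchase", "amount": 19.99}',
    '{"user": "u1", "event": "click", "page": "/offers", "device": "mobile"}',
]


def read_with_schema(lines: Iterable[str], columns: Sequence[str]) -> Iterator[Tuple[Any, ...]]:
    # Schema-on-read: the column list is chosen by the reader at query time.
    # Fields a record lacks come back as None; extra fields are ignored.
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(col) for col in columns)


if __name__ == "__main__":
    # Two consumers project different schemas over the same stored data.
    clicks = [r for r in read_with_schema(RAW_LINES, ("user", "event", "page")) if r[1] == "click"]
    revenue = [r for r in read_with_schema(RAW_LINES, ("user", "amount")) if r[1] is not None]
    print(clicks)   # [('u1', 'click', '/home'), ('u1', 'click', '/offers')]
    print(revenue)  # [('u2', 19.99)]
```

Note how the third record's extra `device` field did not break either reader, and the purchase record simply has no `page`.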
  • 32. Meet the New Data Warehouse: The “Data Lake”
      • Data now landed in Hadoop clusters, NoSQL databases and cloud storage
      • Flexible data storage platform with cheap storage, flexible schema support + compute
      • Solves the problem of how to store new types of data + choose the best time/way to process it
      • Hadoop/NoSQL increasingly used for all store/transform/query tasks
      [Diagram: data reservoir architecture. Operational data (transactions, customer master data) and unstructured data (voice + chat transcripts) arrive through file-based, stream-based and ETL-based integration and data streams into the Hadoop platform. The data reservoir holds raw customer data (stored in its original format, usually files, such as SS7, ASN.1, JSON) and mapped customer data (data sets produced by mapping and transforming raw data). Data factory, data transfer and data access layers serve business intelligence tools, marketing/sales applications (models, machine learning, segments) and discovery & development labs (a safe & secure discovery and development environment with data sets and samples, models and programs).]
  • 33. Hadoop 2.0 and YARN (“Yet Another Resource Negotiator”)
      Key innovation: separating how data is stored from how it is processed
  • 35. Hadoop 2.0: Enabling Multiple Query Engines
      • Hadoop started by being synonymous with MapReduce and Java coding
      • But YARN (Yet Another Resource Negotiator) broke this dependency
      • Hadoop, through YARN, now just handles resource management
      • Multiple different query engines can run against data in place
          General-purpose (e.g. MapReduce)
          Graph processing
          Machine learning
          Real-time processing
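The separation YARN introduced, where a resource manager hands out CPU and memory containers and any engine can ask for them, can be sketched as a toy first-fit scheduler. This is a hypothetical illustration, not the YARN API or its schedulers (which add queues, capacity guarantees and data locality): `Node`, `ResourceManager`, `request_container` and the application names are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_mem_mb: int
    free_vcores: int


@dataclass
class ResourceManager:
    """Toy first-fit allocator; it knows nothing about MapReduce, Spark or any engine."""

    nodes: List[Node]
    allocations: List[Tuple[str, str]] = field(default_factory=list)  # (app, node)

    def request_container(self, app: str, mem_mb: int, vcores: int) -> Optional[str]:
        # Grant the request on the first node with enough free memory and cores.
        for node in self.nodes:
            if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
                node.free_mem_mb -= mem_mb
                node.free_vcores -= vcores
                self.allocations.append((app, node.name))
                return node.name
        return None  # no capacity: a real scheduler would queue the request


if __name__ == "__main__":
    rm = ResourceManager([Node("node1", 8192, 4), Node("node2", 8192, 4)])
    # Different engines share the same cluster through the same interface.
    print(rm.request_container("mapreduce-etl", 4096, 2))  # node1
    print(rm.request_container("spark-ml", 4096, 2))       # node1
    print(rm.request_container("graph-job", 6144, 1))      # node2
    print(rm.request_container("streaming", 4096, 2))      # None
```

Because the allocator only deals in memory and cores, a new processing engine needs no change to it, which is the decoupling the slide describes.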
  • 36. Technologies Emerged to Bridge Old/New World
  • 37. FAST FORWARD TO NOW…
  • 38. Elastically-Scalable Data Warehouse-as-a-Service
      • New generation of big data platform services from Google, Amazon, Oracle
      • Combines three key innovations from earlier technologies:
          Organisation of data into tables and columns (from RDBMS DWs)
          Massively-scalable and distributed storage and query (from Big Data)
          Elastically-scalable Platform-as-a-Service (from Cloud)
  • 39. … Which Is What I’m Working On Right Now
  • 40. Example Architecture: Google BigQuery
  • 42. What Problem Did Analytics-as-a-Service Solve?
      • On-premise Hadoop, even with simple resilient clustering, will hit limits
      • Clusters can reach 5,000+ nodes, and need to be scaled up for demand peaks etc.
      • These scale limits are encountered far beyond those for DWs…
      • … but the future is elastically-scaled query and compute-as-a-service
      Oracle Big Data Cloud Compute Edition: free $300 developer credit at https://cloud.oracle.com/en_US/tryit
  • 43. BigQuery: Big Data Meets Data Warehousing
      • And things come full circle … analytics typically requires tabular data
      • Google BigQuery is based on the DremelX massively-parallel query engine
      • But stores data in columnar form and provides a SQL interface
      • Solves the problem of providing DW-like functionality at scale, as-a-service
      • This is the future … ;-)
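Why columnar storage suits analytic queries can be shown by counting how many stored values a query has to touch. The Python below is a hypothetical sketch, not BigQuery's storage format: the `ORDERS` records, field names and helper functions are assumptions, and "values scanned" is only a rough proxy for I/O. A query that filters on `region` and sums `amount` must read whole rows in a row store, but only those two columns in a column store.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical order records in row-oriented form.
ORDERS: List[Dict[str, Any]] = [
    {"order_id": 1, "customer": "c1", "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "customer": "c2", "region": "APAC", "amount": 80.0},
    {"order_id": 3, "customer": "c3", "region": "EMEA", "amount": 30.0},
]


def to_columnar(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # Pivot rows into one list per column, as a columnar store lays data out.
    columns: Dict[str, List[Any]] = {}
    for rec in records:
        for name, value in rec.items():
            columns.setdefault(name, []).append(value)
    return columns


def region_total_row_store(records: List[Dict[str, Any]], region: str) -> Tuple[float, int]:
    # Row layout: every field of every record is read, even unused ones.
    scanned, total = 0, 0.0
    for rec in records:
        scanned += len(rec)
        if rec["region"] == region:
            total += rec["amount"]
    return total, scanned


def region_total_column_store(columns: Dict[str, List[Any]], region: str) -> Tuple[float, int]:
    # Column layout: only the two referenced columns are read.
    scanned = len(columns["region"]) + len(columns["amount"])
    total = sum(a for r, a in zip(columns["region"], columns["amount"]) if r == region)
    return total, scanned


if __name__ == "__main__":
    print(region_total_row_store(ORDERS, "EMEA"))                  # (150.0, 12)
    print(region_total_column_store(to_columnar(ORDERS), "EMEA"))  # (150.0, 6)
```

With only four columns the saving is 2x; on wide fact tables with dozens or hundreds of columns, the gap grows accordingly.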