SlideShare uma empresa Scribd logo
1 de 46
Baixar para ler offline
Hadoop : Past, Present and
Future
Chris Harris
Email : charris@hortonworks.com
Twitter : cj_harris5

© Hortonworks Inc. 2013
Past

© Hortonworks Inc. 2013

Page 2
A little history… it’s 2005

© Hortonworks Inc. 2013
A Brief History of Apache Hadoop

Apache Project
Established

Yahoo! begins to
Operate at scale

Hortonworks
Data Platform

2013
2004

2006

2008

2010

2012

Enterprise
Hadoop

2005: Yahoo! creates
team under E14 to
work on Hadoop

© Hortonworks Inc. 2013

Page 4
Key Hadoop Data Types
1.  Sentiment
Understand how your customers feel about your brand and
products – right now

2.  Clickstream
Capture and analyze website visitors’ data trails and
optimize your website

3.  Sensor/Machine
Discover patterns in data streaming automatically from
remote sensors and machines

4.  Geographic
Analyze location-based data to manage operations where
they occur

5.  Server Logs
Research logs to diagnose process failures and prevent
security breaches

6.  Unstructured (txt, video, pictures, etc..)
Understand patterns in files across millions of web pages,
emails, and documents

© Hortonworks Inc. 2013

Value
Hadoop is NOT
!
!
!
!
!
!

ESB
NoSQL
HPC
Relational
Real-time
The Jack of all Trades

© Hortonworks Inc. 2013
Hadoop 1
•  Limited up to 4,000 nodes per cluster
•  O(# of tasks in a cluster)
•  JobTracker bottleneck - resource management,
job scheduling and monitoring
•  Only has one namespace for managing HDFS
•  Map and Reduce slots are static
•  Only job to run is MapReduce

© Hortonworks Inc. 2013
Hadoop 1 - Basics
MapReduce (Computation Framework)

A

B

C

C

B

B

C

A

A

A

HDFS (Storage Framework)
© Hortonworks Inc. 2013
Hadoop 1 - Reading Files
NameNode
read file

SNameNode
(fsimage/edit)

Hadoop Client
return DNs,
block ids, etc.

checkpoint

heartbeat/
block report

read blocks

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

Rack1

Rack2

Rack3

© Hortonworks Inc. 2013

RackN
Hadoop 1 - Writing Files
NameNode
request write

Hadoop Client

SNameNode
(fsimage/edit)
checkpoint

return DNs, etc.

block report

write blocks

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

Rack1

Rack2

Rack3

© Hortonworks Inc. 2013

RackN

replication pipelining
Hadoop 1 - Running Jobs
Hadoop Client

submit job

JobTracker

map
deploy job

shuffle
DN | TT

part 0

© Hortonworks Inc. 2013

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

DN | TT

Rack1

reduce

DN | TT

Rack2

Rack3

RackN
Hadoop 1 - Security

authN/authZ

LDAP/AD

Users

F
I
R
E
W
A
L
L

KDC

service request

Hadoop Cluster

block token
delegate token

Client Node/
Spoke Server

Encryption Plugin

* block token is for accessing data
* delegate token is for running jobs

© Hortonworks Inc. 2013
Hadoop 1 - APIs
!
!
!
!

org.apache.hadoop.mapreduce.Partitioner
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Job

© Hortonworks Inc. 2013
Present

© Hortonworks Inc. 2013

Page 14
Hadoop 2
!
!
!
!
!
!
!

Potentially up to 10,000 nodes per cluster
O(cluster size)
Supports multiple namespace for managing HDFS
Efficient cluster utilization (YARN)
MRv1 backward and forward compatible
Any apps can integrate with Hadoop
Beyond Java

© Hortonworks Inc. 2013
Hadoop 2 - Basics

© Hortonworks Inc. 2013
Hadoop 2 - Reading Files
(w/ NN Federation)
SNameNode
per NN
Hadoop Client

NN1/ns1 NN2/ns2 NN3/ns3 NN4/ns4

fsimage/edit copy

read file

checkpoint

or

return DNs,
block ids, etc.
fs sync
read blocks

Backup NN
per NN

checkpoint

register/
heartbeat/
block report

Block Pools
DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

ns1

Rack1

Rack2

© Hortonworks Inc. 2013

Rack3

RackN

ns2

ns3

ns4

dn1, dn2
dn1, dn3
dn4, dn5 dn4, dn5
Hadoop 2 - Writing Files
SNameNode
per NN
Hadoop Client

NN1/ns1 NN2/ns2 NN3/ns3 NN4/ns4

request write

fsimage/edit copy
checkpoint

or

return DNs, etc.
fs sync
write blocks
block report

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

DN | NM

checkpoint

DN | NM

DN | NM

Backup NN
per NN

Rack1

Rack2

© Hortonworks Inc. 2013

Rack3

RackN

replication pipelining
Hadoop 2 - Running Jobs
create app1

Hadoop Client 1

submit app1

ASM
NM

ResourceManager

.......negotiates....... Containers
.......reports to....... ASM

Scheduler .......partitions.......
Resources

create app2

Hadoop Client 2

submit app2

Scheduler

ASM

queues
status report

NodeManager
C2.1
NodeManager
C2.2
NodeManager
AM2

Rack1

© Hortonworks Inc. 2013

NodeManager

NodeManager

C1.3
NodeManager
C2.3

C1.2

NodeManager
AM1

Rack2

NodeManager
C1.4
NodeManager
C1.1

RackN
Hadoop 2 - Security
DMZ
KDC
LDAP/AD
Knox Gateway Cluster

Enterprise/
Cloud SSO
Provider
JDBC Client

F
I
R
E
W
A
L
L

F
I
R
E
W
A
L
L

Hadoop Cluster

REST Client
Native Hive/HBase Encryption

Browser(HUE)

© Hortonworks Inc. 2013
Hadoop 2 - APIs
! org.apache.hadoop.yarn.api.ApplicationClientProt
ocol
! org.apache.hadoop.yarn.api.ApplicationMasterPro
tocol
! org.apache.hadoop.yarn.api.ContainerManagemen
tProtocol

© Hortonworks Inc. 2013
Future

© Hortonworks Inc. 2013

Page 22
Apache Tez
A New Hadoop Data Processing Framework

© Hortonworks Inc. 2013

Page 23
HDP: Enterprise Hadoop Distribution
OPERATIONAL	
  
SERVICES	
  
AMBARI	
  

FLUME	
  
PIG	
  

FALCON*	
  
OOZIE	
  

Hortonworks
Data Platform (HDP)

DATA	
  
SERVICES	
  

SQOOP	
  

HIVE	
  &	
  

HCATALOG	
  

HBASE	
  

Enterprise Hadoop

OTHER	
  

•  The ONLY 100% open source
and complete distribution

LOAD	
  &	
  	
  
EXTRACT	
  

HADOOP	
  	
  
CORE	
  
PLATFORM	
  	
  
SERVICES	
  

NFS	
  
WebHDFS	
  

KNOX*	
  

MAP	
  	
  
REDUCE*	
   TEZ*	
  
	
  

YARN*	
  	
  	
  
HDFS	
  
Enterprise Readiness
High Availability, Disaster
Recovery, Rolling Upgrades,
Security and Snapshots

HORTONWORKS	
  	
  
DATA	
  PLATFORM	
  (HDP)	
  

© Hortonworks Inc. 2013

•  Enterprise grade, proven and
tested at scale
•  Ecosystem endorsed to
ensure interoperability

Page 24
Tez (“Speed”)
• What is it?
– A data processing framework as an alternative to MapReduce
– A new incubation project in the ASF

• Who else is involved?
– 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo,
Microsoft

• Why does it matter?
– Widens the platform for Hadoop use cases
– Crucial to improving the performance of low-latency applications
– Core to the Stinger initiative
– Evidence of Hortonworks leading the community in the evolution
of Enterprise Hadoop
© Hortonworks Inc. 2013
Moving Hadoop Beyond MapReduce
• Low level data-processing execution engine
• Built on YARN
• Enables pipelining of jobs
• Removes task and job launch times
• Does not write intermediate output to HDFS
– Much lighter disk and network usage

• New base of MapReduce, Hive, Pig, Cascading etc.
• Hive and Pig jobs no longer need to move to the end
of the queue between steps in the pipeline

© Hortonworks Inc. 2013
Tez - Core Idea
Task with pluggable Input, Processor & Output

Input

Processor

Output

Task

Tez Task - <Input, Processor, Output>

YARN ApplicationMaster to run DAG of Tez Tasks
© Hortonworks Inc. 2013
Building Blocks for Tasks
MapReduce ‘Map’

HDFS
Input

Map
Processor

Sorted
Output

MapReduce ‘Map’ Task

Special Pig/Hive ‘Map’

HDFS
Input

Map
Processor

Pipelin
e
Sorter
Output

Tez Task

© Hortonworks Inc. 2013

MapReduce ‘Reduce’

Shuffle
Input

Reduce
Processor

HDFS
Output

MapReduce ‘Reduce’ Task

Special Pig/Hive ‘Reduce’

Shuffle
Skipmerge
Input

Reduce
Processor

Tez Task

Sorted
Output

Intermediate ‘Reduce’ for
Map-Reduce-Reduce

Shuffle
Input

Reduce
Processor

Sorted
Output

Intermediate ‘Reduce’ for
Map-Reduce-Reduce

In-memory Map

HDFSI
nput

Map
Processor

Tez Task

Inmemor
y
Sorted
Output
Pig/Hive-MR versus Pig/Hive-Tez
SELECT a.state, COUNT(*), AVERAGE(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state

Job 1

Job 2
I/O Synchronization
Barrier

I/O Synchronization
Barrier

Single Job
Job 3

Pig/Hive - MR
© Hortonworks Inc. 2013

Pig/Hive - Tez
Tez on YARN: Going Beyond Batch

Tez Task

Tez Optimizes Execution
New runtime engine for
more efficient data processing

© Hortonworks Inc. 2013

Always-On Tez Service
Low latency processing for
all Hadoop data processing
Apache Knox
Secure Access to Hadoop

© Hortonworks Inc. 2013
Knox Initiative

Make Hadoop security simple

Simplify Security

Aggregate Access

Client Agility

Simplify security for both users
and operators.

Deliver unified and centralized
access to the Hadoop cluster.

Provide seamless access for
users while securing cluster at
the perimeter, shielding the
intricacies of the security
implementation.

Make Hadoop feel like a single
application to users.

Ensure service users are
abstracted from where services
are located and how services
are configured & scaled.

© Hortonworks Inc. 2013
Knox: Make Hadoop Security Simple
Authentication &
Verification
User Store
KDC, AD, LDAP

Client

{REST}!

© Hortonworks Inc. 2013

Knox
Gateway

Hadoop Cluster
Knox: Next Generation of Hadoop Security
•  All users see one end-point
website
online apps
+
analytics tools

end users

•  All online systems see one endpoint RESTful service

Gateway

•  Consistency across all interfaces
and capabilities
•  Firewalled cluster that no end
users need to access
•  More IT-friendly. Enables:
–  Systems admins
–  DB admins
–  Security admins
–  Network admins
© Hortonworks Inc. 2013

firewall

Hadoop cluster

firewall
Apache Falcon
Data Lifecycle Management for Hadoop

© Hortonworks Inc. 2013
Data Lifecycle on Hadoop is Challenging

Data Management Needs

Tools

Data Processing

Oozie

Replication

Sqoop

Retention

Distcp

Scheduling

Flume

Reprocessing

Map / Reduce

Multi Cluster Management

Hive and Pig Jobs

Problem: Patchwork of tools complicate data lifecycle management.
Result:
Long development cycles and quality challenges.

© Hortonworks Inc. 2013
Falcon: One-stop Shop for Data Lifecycle
Apache Falcon
Provides

Orchestrates

Data Management Needs

Tools

Data Processing

Oozie

Replication

Sqoop

Retention

Distcp

Scheduling

Flume

Reprocessing

Map / Reduce

Multi Cluster Management

Hive and Pig Jobs

Falcon provides a single interface to orchestrate data lifecycle.
Sophisticated DLM easily added to Hadoop applications.

© Hortonworks Inc. 2013
Falcon At A Glance
Data Processing Applications

Spec Files or
REST APIs

Falcon Data Lifecycle Management Service
Data Import
and
Replication

Scheduling
and
Coordination

Data Lifecycle
Policies

Multi-Cluster
Management

SLA
Management

>  Falcon provides the key services data processing applications need.
>  Complex data processing logic handled by Falcon instead of hard-coded in apps.
>  Faster development and higher quality for ETL, reporting and other data
processing apps on Hadoop.

© Hortonworks Inc. 2013
Falcon Core Capabilities
• Core Functionality
– Pipeline processing
– Replication
– Retention
– Late data handling

• Automates
– Scheduling and retry
– Recording audit, lineage and metrics

• Operations and Management
– Monitoring, management, metering
– Alerts and notifications
– Multi Cluster Federation

• CLI and REST API
© Hortonworks Inc. 2013
Falcon Example: Multi-Cluster Failover
Primary Hadoop Cluster
Cleansed
Data

Conformed
Data

Presented
Data

BI and Analytics

Replication

Staged
Data

Staged
Data

Presented
Data

Failover Hadoop Cluster

>  Falcon manages workflow, replication or both.
>  Enables business continuity without requiring full data reprocessing.
>  Failover clusters require less storage and CPU.

© Hortonworks Inc. 2013
Falcon Example: Retention Policies

Staged Data

Cleansed Data

Conformed
Data

Presented
Data

Retain 5 Years

Retain 3 Years

Retain 3 Years

Retain Last
Copy Only

>  Sophisticated retention policies expressed in one place.
>  Simplify data retention for audit, compliance, or for data re-processing.

© Hortonworks Inc. 2013
Falcon Example: Late Data Handling
Online
Transaction
Data (Pull via
Sqoop)
Wait up to 4
hours for FTP data
to arrive

Staging Area

Combined
Dataset

Web Log Data
(Push via FTP)

>  Processing waits until all data is available.
>  Developers don’t write complex data handling rules within applications.

© Hortonworks Inc. 2013
Multi Cluster Management with Prism

>  Prism is the part of Falcon that handles multi-cluster.
>  Key use cases: Replication and data processing that spans clusters.

© Hortonworks Inc. 2013

Page 43
Hortonworks Sandbox
Go from Zero to Big Data in 15 minutes

© Hortonworks Inc. 2013

Page 44
Sandbox: A Guided Tour of HDP
Tutorials and videos give
a guided tour of HDP and
Hadoop
Perfect for beginners or
anyone learning more
about Hadoop
Installs easily on your
laptop or desktop

Easy-to-use editors
for Apache Pig and Hive
© Hortonworks Inc. 2013

Easily import data
and create tables

Browse and manage
HDFS files

Latest tutorials pushed
directly to your Sandbox
Page 45
THANK YOU!
Chris Harris
charris@hortonworks.com

Download Sandbox
hortonworks.com/sandbox
© Hortonworks Inc. 2013

Page 46

Mais conteúdo relacionado

Mais procurados

Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successDataWorks Summit
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)Hortonworks
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtechYuta Imai
 
Pig Out to Hadoop
Pig Out to HadoopPig Out to Hadoop
Pig Out to HadoopHortonworks
 
TEZ-8 UI Walkthrough
TEZ-8 UI WalkthroughTEZ-8 UI Walkthrough
TEZ-8 UI Walkthrought3rmin4t0r
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceDataWorks Summit
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Pivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache HadoopPivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache Hadoopmarklpollack
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into ProductionMapR Technologies
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache DrillMapR Technologies
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 
Killing ETL with Apache Drill
Killing ETL with Apache DrillKilling ETL with Apache Drill
Killing ETL with Apache DrillCharles Givre
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drilltshiran
 

Mais procurados (20)

Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)Hdp r-google charttools-webinar-3-5-2013 (2)
Hdp r-google charttools-webinar-3-5-2013 (2)
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtech
 
Pig Out to Hadoop
Pig Out to HadoopPig Out to Hadoop
Pig Out to Hadoop
 
TEZ-8 UI Walkthrough
TEZ-8 UI WalkthroughTEZ-8 UI Walkthrough
TEZ-8 UI Walkthrough
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of Service
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Pivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache HadoopPivotal HD and Spring for Apache Hadoop
Pivotal HD and Spring for Apache Hadoop
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Killing ETL with Apache Drill
Killing ETL with Apache DrillKilling ETL with Apache Drill
Killing ETL with Apache Drill
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 

Destaque

Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop
Up-Armoring The Elephant: Adding Kerberos-based Security to HadoopUp-Armoring The Elephant: Adding Kerberos-based Security to Hadoop
Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoopblueboxtraveler
 
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010Yahoo Developer Network
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revJason Shih
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyAnurag Shrivastava
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recoverySandeep Singh
 

Destaque (6)

Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop
Up-Armoring The Elephant: Adding Kerberos-based Security to HadoopUp-Armoring The Elephant: Adding Kerberos-based Security to Hadoop
Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117rev
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recovery
 

Semelhante a Hadoop past, present and future

Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis
Process and Visualize Your Data with Revolution R, Hadoop and GoogleVisProcess and Visualize Your Data with Revolution R, Hadoop and GoogleVis
Process and Visualize Your Data with Revolution R, Hadoop and GoogleVisHortonworks
 
Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2Wes Floyd
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
Hadoop: Beyond MapReduce
Hadoop: Beyond MapReduceHadoop: Beyond MapReduce
Hadoop: Beyond MapReduceSteve Loughran
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, HortonworksHortonworks
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
Hadoop Now, Next and Beyond
Hadoop Now, Next and BeyondHadoop Now, Next and Beyond
Hadoop Now, Next and BeyondDataWorks Summit
 
Apache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitApache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitSaptak Sen
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream DataDataWorks Summit
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster
April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times FasterApril 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster
April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times FasterYahoo Developer Network
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrowSteve Loughran
 

Semelhante a Hadoop past, present and future (20)

Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis
Process and Visualize Your Data with Revolution R, Hadoop and GoogleVisProcess and Visualize Your Data with Revolution R, Hadoop and GoogleVis
Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis
 
Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
Hadoop: Beyond MapReduce
Hadoop: Beyond MapReduceHadoop: Beyond MapReduce
Hadoop: Beyond MapReduce
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Hadoop Now, Next and Beyond
Hadoop Now, Next and BeyondHadoop Now, Next and Beyond
Hadoop Now, Next and Beyond
 
Apache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitApache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop Summit
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster
April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times FasterApril 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster
April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 

Mais de Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaCodemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserCodemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 

Mais de Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Último

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Último (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Hadoop past, present and future

  • 1. Hadoop : Past, Present and Future Chris Harris Email : charris@hortonworks.com Twitter : cj_harris5 © Hortonworks Inc. 2013
  • 3. A little history… it’s 2005 © Hortonworks Inc. 2013
  • 4. A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013 2004 2006 2008 2010 2012 Enterprise Hadoop 2005: Yahoo! creates team under E14 to work on Hadoop © Hortonworks Inc. 2013 Page 4
  • 5. Key Hadoop Data Types 1.  Sentiment Understand how your customers feel about your brand and products – right now 2.  Clickstream Capture and analyze website visitors’ data trails and optimize your website 3.  Sensor/Machine Discover patterns in data streaming automatically from remote sensors and machines 4.  Geographic Analyze location-based data to manage operations where they occur 5.  Server Logs Research logs to diagnose process failures and prevent security breaches 6.  Unstructured (txt, video, pictures, etc..) Understand patterns in files across millions of web pages, emails, and documents © Hortonworks Inc. 2013 Value
  • 6. Hadoop is NOT ! ! ! ! ! ! ESB NoSQL HPC Relational Real-time The Jack of all Trades © Hortonworks Inc. 2013
  • 7. Hadoop 1 •  Limited up to 4,000 nodes per cluster •  O(# of tasks in a cluster) •  JobTracker bottleneck - resource management, job scheduling and monitoring •  Only has one namespace for managing HDFS •  Map and Reduce slots are static •  Only job to run is MapReduce © Hortonworks Inc. 2013
  • 8. Hadoop 1 - Basics MapReduce (Computation Framework) A B C C B B C A A A HDFS (Storage Framework) © Hortonworks Inc. 2013
  • 9. Hadoop 1 - Reading Files NameNode read file SNameNode (fsimage/edit) Hadoop Client return DNs, block ids, etc. checkpoint heartbeat/ block report read blocks DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT Rack1 Rack2 Rack3 © Hortonworks Inc. 2013 RackN
  • 10. Hadoop 1 - Writing Files NameNode request write Hadoop Client SNameNode (fsimage/edit) checkpoint return DNs, etc. block report write blocks DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT Rack1 Rack2 Rack3 © Hortonworks Inc. 2013 RackN replication pipelining
  • 11. Hadoop 1 - Running Jobs Hadoop Client submit job JobTracker map deploy job shuffle DN | TT part 0 © Hortonworks Inc. 2013 DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT Rack1 reduce DN | TT Rack2 Rack3 RackN
  • 12. Hadoop 1 - Security authN/authZ LDAP/AD Users F I R E W A L L KDC service request Hadoop Cluster block token delegate token Client Node/ Spoke Server Encryption Plugin * block token is for accessing data * delegate token is for running jobs © Hortonworks Inc. 2013
  • 13. Hadoop 1 - APIs ! ! ! ! org.apache.hadoop.mapreduce.Partitioner org.apache.hadoop.mapreduce.Mapper org.apache.hadoop.mapreduce.Reducer org.apache.hadoop.mapreduce.Job © Hortonworks Inc. 2013
  • 15. Hadoop 2 ! ! ! ! ! ! ! Potentially up to 10,000 nodes per cluster O(cluster size) Supports multiple namespace for managing HDFS Efficient cluster utilization (YARN) MRv1 backward and forward compatible Any apps can integrate with Hadoop Beyond Java © Hortonworks Inc. 2013
  • 16. Hadoop 2 - Basics © Hortonworks Inc. 2013
  • 17. Hadoop 2 - Reading Files (w/ NN Federation) SNameNode per NN Hadoop Client NN1/ns1 NN2/ns2 NN3/ns3 NN4/ns4 fsimage/edit copy read file checkpoint or return DNs, block ids, etc. fs sync read blocks Backup NN per NN checkpoint register/ heartbeat/ block report Block Pools DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM ns1 Rack1 Rack2 © Hortonworks Inc. 2013 Rack3 RackN ns2 ns3 ns4 dn1, dn2 dn1, dn3 dn4, dn5 dn4, dn5
  • 18. Hadoop 2 - Writing Files SNameNode per NN Hadoop Client NN1/ns1 NN2/ns2 NN3/ns3 NN4/ns4 request write fsimage/edit copy checkpoint or return DNs, etc. fs sync write blocks block report DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM checkpoint DN | NM DN | NM Backup NN per NN Rack1 Rack2 © Hortonworks Inc. 2013 Rack3 RackN replication pipelining
  • 19. Hadoop 2 - Running Jobs create app1 Hadoop Client 1 submit app1 ASM NM ResourceManager .......negotiates....... Containers .......reports to....... ASM Scheduler .......partitions....... Resources create app2 Hadoop Client 2 submit app2 Scheduler ASM queues status report NodeManager C2.1 NodeManager C2.2 NodeManager AM2 Rack1 © Hortonworks Inc. 2013 NodeManager NodeManager C1.3 NodeManager C2.3 C1.2 NodeManager AM1 Rack2 NodeManager C1.4 NodeManager C1.1 RackN
  • 20. Hadoop 2 - Security DMZ KDC LDAP/AD Knox Gateway Cluster Enterprise/ Cloud SSO Provider JDBC Client F I R E W A L L F I R E W A L L Hadoop Cluster REST Client Native Hive/HBase Encryption Browser(HUE) © Hortonworks Inc. 2013
  • 21. Hadoop 2 - APIs ! org.apache.hadoop.yarn.api.ApplicationClientProt ocol ! org.apache.hadoop.yarn.api.ApplicationMasterPro tocol ! org.apache.hadoop.yarn.api.ContainerManagemen tProtocol © Hortonworks Inc. 2013
  • 23. Apache Tez A New Hadoop Data Processing Framework © Hortonworks Inc. 2013 Page 23
  • 24. HDP: Enterprise Hadoop Distribution OPERATIONAL   SERVICES   AMBARI   FLUME   PIG   FALCON*   OOZIE   Hortonworks Data Platform (HDP) DATA   SERVICES   SQOOP   HIVE  &   HCATALOG   HBASE   Enterprise Hadoop OTHER   •  The ONLY 100% open source and complete distribution LOAD  &     EXTRACT   HADOOP     CORE   PLATFORM     SERVICES   NFS   WebHDFS   KNOX*   MAP     REDUCE*   TEZ*     YARN*       HDFS   Enterprise Readiness High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots HORTONWORKS     DATA  PLATFORM  (HDP)   © Hortonworks Inc. 2013 •  Enterprise grade, proven and tested at scale •  Ecosystem endorsed to ensure interoperability Page 24
  • 25. Tez (“Speed”) • What is it? – A data processing framework as an alternative to MapReduce – A new incubation project in the ASF • Who else is involved? – 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft • Why does it matter? – Widens the platform for Hadoop use cases – Crucial to improving the performance of low-latency applications – Core to the Stinger initiative – Evidence of Hortonworks leading the community in the evolution of Enterprise Hadoop © Hortonworks Inc. 2013
  • 26. Moving Hadoop Beyond MapReduce • Low level data-processing execution engine • Built on YARN • Enables pipelining of jobs • Removes task and job launch times • Does not write intermediate output to HDFS – Much lighter disk and network usage • New base of MapReduce, Hive, Pig, Cascading etc. • Hive and Pig jobs no longer need to move to the end of the queue between steps in the pipeline © Hortonworks Inc. 2013
  • 27. Tez - Core Idea Task with pluggable Input, Processor & Output Input Processor Output Task Tez Task - <Input, Processor, Output> YARN ApplicationMaster to run DAG of Tez Tasks © Hortonworks Inc. 2013
  • 28. Building Blocks for Tasks MapReduce ‘Map’ HDFS Input Map Processor Sorted Output MapReduce ‘Map’ Task Special Pig/Hive ‘Map’ HDFS Input Map Processor Pipelin e Sorter Output Tez Task © Hortonworks Inc. 2013 MapReduce ‘Reduce’ Shuffle Input Reduce Processor HDFS Output MapReduce ‘Reduce’ Task Special Pig/Hive ‘Reduce’ Shuffle Skipmerge Input Reduce Processor Tez Task Sorted Output Intermediate ‘Reduce’ for Map-Reduce-Reduce Shuffle Input Reduce Processor Sorted Output Intermediate ‘Reduce’ for Map-Reduce-Reduce In-memory Map HDFSI nput Map Processor Tez Task Inmemor y Sorted Output
  • 29. Pig/Hive-MR versus Pig/Hive-Tez SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state Job 1 Job 2 I/O Synchronization Barrier I/O Synchronization Barrier Single Job Job 3 Pig/Hive - MR © Hortonworks Inc. 2013 Pig/Hive - Tez
  • 30. Tez on YARN: Going Beyond Batch Tez Task Tez Optimizes Execution New runtime engine for more efficient data processing © Hortonworks Inc. 2013 Always-On Tez Service Low latency processing for all Hadoop data processing
  • 31. Apache Knox Secure Access to Hadoop © Hortonworks Inc. 2013
  • 32. Knox Initiative Make Hadoop security simple Simplify Security Aggregate Access Client Agility Simplify security for both users and operators. Deliver unified and centralized access to the Hadoop cluster. Provide seamless access for users while securing cluster at the perimeter, shielding the intricacies of the security implementation. Make Hadoop feel like a single application to users. Ensure service users are abstracted from where services are located and how services are configured & scaled. © Hortonworks Inc. 2013
  • 33. Knox: Make Hadoop Security Simple Authentication & Verification User Store KDC, AD, LDAP Client {REST}! © Hortonworks Inc. 2013 Knox Gateway Hadoop Cluster
  • 34. Knox: Next Generation of Hadoop Security •  All users see one end-point website online apps + analytics tools end users •  All online systems see one endpoint RESTful service Gateway •  Consistency across all interfaces and capabilities •  Firewalled cluster that no end users need to access •  More IT-friendly. Enables: –  Systems admins –  DB admins –  Security admins –  Network admins © Hortonworks Inc. 2013 firewall Hadoop cluster firewall
  • 35. Apache Falcon Data Lifecycle Management for Hadoop © Hortonworks Inc. 2013
  • 36. Data Lifecycle on Hadoop is Challenging Data Management Needs Tools Data Processing Oozie Replication Sqoop Retention Distcp Scheduling Flume Reprocessing Map / Reduce Multi Cluster Management Hive and Pig Jobs Problem: Patchwork of tools complicate data lifecycle management. Result: Long development cycles and quality challenges. © Hortonworks Inc. 2013
  • 37. Falcon: One-stop Shop for Data Lifecycle Apache Falcon Provides Orchestrates Data Management Needs Tools Data Processing Oozie Replication Sqoop Retention Distcp Scheduling Flume Reprocessing Map / Reduce Multi Cluster Management Hive and Pig Jobs Falcon provides a single interface to orchestrate data lifecycle. Sophisticated DLM easily added to Hadoop applications. © Hortonworks Inc. 2013
  • 38. Falcon At A Glance Data Processing Applications Spec Files or REST APIs Falcon Data Lifecycle Management Service Data Import and Replication Scheduling and Coordination Data Lifecycle Policies Multi-Cluster Management SLA Management >  Falcon provides the key services data processing applications need. >  Complex data processing logic handled by Falcon instead of hard-coded in apps. >  Faster development and higher quality for ETL, reporting and other data processing apps on Hadoop. © Hortonworks Inc. 2013
  • 39. Falcon Core Capabilities • Core Functionality – Pipeline processing – Replication – Retention – Late data handling • Automates – Scheduling and retry – Recording audit, lineage and metrics • Operations and Management – Monitoring, management, metering – Alerts and notifications – Multi Cluster Federation • CLI and REST API © Hortonworks Inc. 2013
  • 40. Falcon Example: Multi-Cluster Failover Primary Hadoop Cluster Cleansed Data Conformed Data Presented Data BI and Analytics Replication Staged Data Staged Data Presented Data Failover Hadoop Cluster >  Falcon manages workflow, replication or both. >  Enables business continuity without requiring full data reprocessing. >  Failover clusters require less storage and CPU. © Hortonworks Inc. 2013
  • 41. Falcon Example: Retention Policies Staged Data Cleansed Data Conformed Data Presented Data Retain 5 Years Retain 3 Years Retain 3 Years Retain Last Copy Only >  Sophisticated retention policies expressed in one place. >  Simplify data retention for audit, compliance, or for data re-processing. © Hortonworks Inc. 2013
  • 42. Falcon Example: Late Data Handling Online Transaction Data (Pull via Sqoop) Wait up to 4 hours for FTP data to arrive Staging Area Combined Dataset Web Log Data (Push via FTP) >  Processing waits until all data is available. >  Developers don’t write complex data handling rules within applications. © Hortonworks Inc. 2013
  • 43. Multi Cluster Management with Prism >  Prism is the part of Falcon that handles multi-cluster. >  Key use cases: Replication and data processing that spans clusters. © Hortonworks Inc. 2013 Page 43
  • 44. Hortonworks Sandbox Go from Zero to Big Data in 15 minutes © Hortonworks Inc. 2013 Page 44
  • 45. Sandbox: A Guided Tour of HDP Tutorials and videos give a guided tour of HDP and Hadoop Perfect for beginners or anyone learning more about Hadoop Installs easily on your laptop or desktop Easy-to-use editors for Apache Pig and Hive © Hortonworks Inc. 2013 Easily import data and create tables Browse and manage HDFS files Latest tutorials pushed directly to your Sandbox Page 45
  • 46. THANK YOU! Chris Harris charris@hortonworks.com Download Sandbox hortonworks.com/sandbox © Hortonworks Inc. 2013 Page 46