Hortonworks: We Do Hadoop.
Our mission is to enable your Modern Data Architecture
by delivering One Enterprise Hadoop

November 2013

© Hortonworks Inc. 2013 - Confidential

Agenda
• Hortonworks Overview of Tez
– Quick and painless

• A driver for Tez: The Stinger Initiative
• Tez Deep Dive
• Demo

A Brief History of Apache Hadoop
[Timeline, 2004 to 2013. Milestones marked: Apache Project established; Yahoo! begins to operate at scale; Hortonworks Data Platform]
• 2005: Hadoop created at Yahoo!
– Focus on INNOVATION
• 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters
– Focus on OPERATIONS
• 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo
– Enterprise Hadoop: STABILITY
Our Mission:

Enable your Modern Data Architecture
by delivering One Enterprise Hadoop

Our Commitment
Headquarters: Palo Alto, CA
Employees: 240+ and growing
Customers: 120+ and growing
Investors: Benchmark, Index,
Yahoo, Dragoneer, Tenaya

Innovate in the Open
We employ the core architects and operators of Hadoop and
drive innovation through open source Apache Foundation
projects to avoid vendor lock-in

Certify for the Enterprise
Trusted Partners with:

We engineer, test and certify the Hortonworks Data Platform for
enterprise usage and deliver the highest quality of support

Interoperate with the Ecosystem
We work with partners to deeply integrate Hadoop with key
technologies so you can leverage existing skills and investments

DATA SYSTEM

APPLICATIONS

Goal: Interoperable and Familiar
BusinessObjects BI

DEV & DATA TOOLS

OPERATIONAL TOOLS

RDBMS

HANA

EDW

MPP

SOURCES

INFRASTRUCTURE

Existing Sources

Emerging Sources

(CRM, ERP, Clickstream, Logs)

(Sensor, Sentiment, Geo, Unstructured)

Betting on Hortonworks…

HDInsight &
HDP for Windows

Teradata Portfolio
for Hadoop

• Only Hadoop Distribution
for Windows Azure &
Windows Server

• Seamless data access
between Teradata and
Hadoop (SQL-H)

• Native integration with
SQL Server, Excel, and
System Center

• Simple management &
monitoring with Viewpoint
integration

• Extends Hadoop to .NET
community

• Flexible deployment
options

Instant Access +
Infinite Scale
• SAP can assure their
customers they are
deploying an SAP HANA
+ Hadoop architecture
fully supported by SAP
• Enables analytics apps
(BOBJ) to interact with
Hadoop

Complete Portfolio for Hadoop

UDA
Diagram
Appliances

Hortonworks Approach to Enterprise Hadoop
Community Driven Enterprise Apache Hadoop
Identify and introduce enterprise
requirements into the public domain
Work with the community to advance and
incubate open source projects

Apply Enterprise Rigor to provide the most
stable and reliable distribution

Driving Hadoop Innovation
Total Net Lines Contributed
to Apache Hadoop

End Users
449,768 lines

Hortonworks engineers focus on making
Apache Hadoop an enterprise viable
platform that powers modern data
architectures and deeply integrates
with existing data center technologies

614,041 lines

147,933 lines
10 Others

21

63
total

LinkedIn: 3

IBM: 3
Facebook: 5

Yahoo: 10
Cloudera: 7

Total Number of Committers
to Apache Hadoop

HDP: Enterprise Hadoop Platform
OPERATIONAL
SERVICES
AMBARI

FLUME
HBASE

FALCON*
OOZIE

Hortonworks
Data Platform (HDP)

DATA
SERVICES
PIG

SQOOP

HIVE &
HCATALOG

• The ONLY 100% open source
and complete platform

LOAD &
EXTRACT

HADOOP
CORE

NFS
WebHDFS

MAP
REDUCE

TEZ

YARN
HDFS
Enterprise Readiness

PLATFORM
SERVICES

KNOX*

High Availability, Disaster
Recovery, Rolling
Upgrades, Security and
Snapshots

HORTONWORKS
DATA PLATFORM (HDP)
OS/VM

Cloud


• Integrates full range of
enterprise-ready services
• Certified and tested at scale
• Engineered for deep
ecosystem interoperability

Appliance

Hortonworks: The Value of “Open” for You
Connect With the Hadoop Community
We employ a large number of Apache project committers & innovators so
that you are represented in the open source community

Avoid Vendor Lock-in
Hortonworks Data Platform remains as close to the open source trunk as
possible and is developed 100% in the open so you are never locked in

The partners you rely on, rely on Hortonworks
We work with partners to deeply integrate Hadoop with data center
technologies so you can leverage existing skills and investments

Certified for the Enterprise
We engineer, test and certify the Hortonworks Data Platform at scale to
ensure reliability and stability you require for enterprise use

Support from the experts
We provide the highest quality of support for deploying at scale. You are
supported by hundreds of years of Hadoop experience

SQL-in-Hadoop with Apache Hive
Business
Analytics

Custom
Apps
SQL

Hadoop

Hive
MapReduce

Tez

YARN

• Apache Hive is the standard for
SQL interaction with Hadoop
– Enterprise makes final purchasing
decision on two key characteristics:
'compatibility' with existing
investments (60%) and skills (20%)
– Most applications claim Hive
compatibility TODAY*

HDFS

• Stinger Initiative: Simple Focus
Improves existing
tools & preserves
investments

– Performance
– SQL-Compatibility
* Claims publicly made by: Teradata, Microsoft, Oracle, Microstrategy, IBM, Information
Builders, SAS, QlikTech, SAP, Tableau, Tibco, Actuate, Jaspersoft, Alteryx, Datameer, Pentaho

Stinger Initiative Goals
Execution
Engine

+

Tez

Windowing
&
Subqueries

Query
Planner
Hive

+

Data
Types

+

File
Format

= 100X

ORC file

= SQL Compatible

• Enables Hive to support interactive workloads
• Improves existing tools & preserves investments
Stinger: Hive For All Analytics
Parameterized Reports

Enterprise Reports
Dashboard / Scorecard

Data Mining

Visualization

100X Faster
+
SQL Compatible

Interactive


Batch
Stinger Roadmap
• Join optimizations
• ORCFile
• SQL:2003
windowing
functions
DATA TYPES
• Subqueries for
IN, NOT
IN, HAVING
• Datatypes:
CHAR, VARCHAR,
DATETIME
• Improvements to
DECIMAL datatype
• Integration with Tez
and Tez Service
• Vectorization
Preview
• Intelligent Optimizer
• Column Statistics
• Authentication and
Authorization
Enhancements
• Full vector query

Stinger: Some early Results

• Query Engine Work ONLY
• Uses TPC “style” benchmark
• Just a few weeks of work
• OTHER work coming
Apache Tez : Accelerating
Hadoop Query Processing

Tez – Introduction
• Distributed execution
framework targeted towards
data-processing applications.
• Based on expressing a
computation as a dataflow
graph.
• Built on top of YARN – the
resource management
framework for Hadoop.
• Open source Apache incubator
project and Apache licensed.

Old School Hadoop: MapReduce

Fundamentals of YARN
• The fundamental idea of YARN is to split up the two
major responsibilities of the JobTracker/TaskTracker
into separate entities:
– a global ResourceManager,
– a per-application ApplicationMaster,
– a per-node slave NodeManager, and
– a per-application Container running on a NodeManager.

New School Hadoop with YARN
Node
Manager

Container

App Mstr

Client
Resource
Manager

Node
Manager

Client
App Mstr

MapReduce Status
Job Submission
Node Status
Resource Request


Container

Node
Manager

Container

Container
Tez – Design Themes
• Empowering End Users
• Execution Performance

Tez – Empowering End Users
• Expressive dataflow definition APIs
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying deployment

Tez – Empowering End Users
• Expressive dataflow definition APIs
– Enable definition of complex data flow pipelines using simple
graph connection APIs. Tez expands the logical plan at runtime.
– Targeted towards data processing applications like Hive/Pig but
not limited to them. Hive/Pig query plans naturally map to Tez dataflow
graphs with no translation impedance.
TaskA-1

TaskA-2

TaskD-1


TaskB-1

TaskD-2

TaskB-2

TaskC-1

TaskE-1

TaskC-2

TaskE-2
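
The expansion of a logical plan into physical tasks, as in the TaskA-1 … TaskE-2 labels above, can be sketched roughly as follows. This is only an illustration: `PlanExpansion`, `expand` and the fixed parallelism of 2 per vertex are assumptions made for the example and are not part of the Tez API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not Tez classes.
final class PlanExpansion {
    // Expand each logical vertex into `parallelism` physical task names,
    // e.g. vertex "A" with parallelism 2 becomes TaskA-1 and TaskA-2.
    static Map<String, List<String>> expand(Map<String, Integer> parallelismByVertex) {
        Map<String, List<String>> tasks = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : parallelismByVertex.entrySet()) {
            List<String> names = new ArrayList<>();
            for (int i = 1; i <= entry.getValue(); i++) {
                names.add("Task" + entry.getKey() + "-" + i);
            }
            tasks.put(entry.getKey(), names);
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<String, Integer> logicalPlan = new LinkedHashMap<>();
        for (String vertex : List.of("A", "B", "C", "D", "E")) {
            logicalPlan.put(vertex, 2); // assumed parallelism, chosen to match the figure
        }
        System.out.println(expand(logicalPlan));
    }
}
```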

Tez – Empowering End Users
• Expressive dataflow definition APIs
Task-1

Task-2

Task-1

Task-2

Samples
Sampler

Preprocessor Stage

Ranges

Distributed Sort


Task-1

Task-2

Partition Stage

Aggregate Stage

Tez – Empowering End Users
• Flexible Input-Processor-Output runtime model
– Construct physical runtime executors dynamically by connecting
different inputs, processors and outputs.
– End goal is to have a library of inputs, outputs and processors that
can be programmatically composed to generate useful tasks.
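
One way to picture the Input-Processor-Output model is as three small pieces wired into a task. The sketch below uses plain `java.util.function` types as stand-ins; `IpoSketch` and `task` are hypothetical names, and the real Tez runtime interfaces differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration; not the Tez runtime API.
final class IpoSketch {
    // A task is an input, a processor and an output composed together:
    // read records, transform them, hand the result to the output.
    static <I, O> Runnable task(Supplier<List<I>> input,
                                Function<List<I>, List<O>> processor,
                                Consumer<List<O>> output) {
        return () -> output.accept(processor.apply(input.get()));
    }

    public static void main(String[] args) {
        List<Integer> sink = new ArrayList<>();
        Runnable sumTask = IpoSketch.<Integer, Integer>task(
                () -> List.of(1, 2, 3),                                    // input
                records -> List.of(records.stream()
                        .mapToInt(Integer::intValue).sum()),               // processor
                sink::addAll);                                             // output
        sumTask.run();
        System.out.println(sink); // prints [6]
    }
}
```

Swapping any one of the three arguments yields a different task without touching the other two, which is the composition idea the slide describes.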

ShuffleInput

ShuffleInput

ReduceProcessor

ReduceProcessor

JoinProcessor

FileSortedOutput

HDFSOutput

FileSortedOutput

IntermediateReduce

FinalReduce

PairwiseJoin


Input1

Input2

Tez – Empowering End Users
• Data type agnostic
– Tez is only concerned with the movement of data. Files and
streams of bytes.
– Does not impose any data format on the user application. MR
application can use Key-Value pairs on top of Tez. Hive and Pig
can use tuple oriented formats that are natural and native to them.

Tez
Task

File
Bytes

User Code
Key Value

Bytes
Tuples

Stream

Tez – Empowering End Users
• Simplifying deployment
– Tez is a completely client side application.
– No deployments to do. Simply upload to any accessible
FileSystem and change local Tez configuration to point to that.
– Enables running different versions concurrently. Easy to test new
functionality while keeping stable versions for production.
– Leverages YARN local resources.
HDFS
Tez Lib 1

Tez Lib 2

TezClient

TezTask

TezTask

TezClient

Client
Machine

Node
Manager

Node
Manager

Client
Machine

Tez – Empowering End Users
• Expressive dataflow definition APIs
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying usage
With powerful APIs come great responsibilities
Tez is a framework on which end user applications can
be built
Tez – Execution Performance
• Performance gains over Map Reduce
• Optimal resource management
• Plan reconfiguration at runtime
• Dynamic physical data flow decisions

Tez – Execution Performance
• Performance gains over Map Reduce
– Eliminate replicated write barrier between successive
computations.
– Eliminate job launch overhead of workflow jobs.
– Eliminate extra stage of map reads in every workflow job.
– Eliminate queue and resource contention suffered by workflow
jobs that are started after a predecessor job completes.

Pig/Hive - MR


Pig/Hive - Tez

Tez – Execution Performance
• Optimal resource management
– Reuse YARN containers to launch new tasks.
– Reuse YARN containers to enable shared objects across tasks.

Start Task

Tez
Application Master

Task Done

Start Task

YARN Container


TezTask1

TezTask2

Shared Objects

TezTask Host

YARN Container

Tez – Execution Performance
• Plan reconfiguration at runtime
– Dynamic runtime concurrency control based on data size, user
operator resources, available cluster resources and locality.
– Advanced changes in dataflow graph structure.
– Progressive graph construction in concert with user optimizer.

[Figure: Stage 1 (50 maps, 100 partitions) feeds Stage 2 (100 reducers). With only 10 GB of data, Stage 2 is reconfigured at runtime from 100 reducers to 10. Other labels: HDFS Blocks, YARN Resources.]
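
The reducer-count adjustment in this example (100 planned reducers cut to 10 for 10 GB of data) can be sketched as a simple size-based rule. The 1 GB-per-reducer target and the names `ReducerParallelism` and `chooseReducers` are assumptions for illustration; the slide does not state the rule Tez applies.

```java
// Hypothetical helper; the per-reducer target is an assumed value, not a Tez default.
final class ReducerParallelism {
    // Pick enough reducers to give each roughly `targetBytesPerReducer` of input,
    // never more than originally planned and never fewer than one.
    static int chooseReducers(long inputBytes, long targetBytesPerReducer, int plannedReducers) {
        long needed = (inputBytes + targetBytesPerReducer - 1) / targetBytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(plannedReducers, needed));
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 10 GB of map output, 1 GB per reducer, 100 reducers planned.
        System.out.println(chooseReducers(10 * gb, gb, 100)); // prints 10
    }
}
```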

Tez – Execution Performance
• Dynamic physical data flow decisions
– Decide the type of physical byte movement and storage on the fly.
– Store intermediate data on distributed store, local store or in-memory.
– Transfer bytes via blocking files or streaming and the spectrum in
between.
Producer
(small size)

Producer

Local File

Consumer


At Runtime

In-Memory

Consumer
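
A runtime choice between in-memory, local file and distributed storage could look roughly like the sketch below. The decision inputs (output size, a memory budget, and whether the data must survive node loss) and all names are assumptions made for the example, not Tez behaviour described in the slides.

```java
// Hypothetical decision rule for illustration; thresholds and names are assumed.
final class TransportChoice {
    enum Transport { IN_MEMORY, LOCAL_FILE, DISTRIBUTED_STORE }

    // Small outputs stay in memory, larger ones spill to a local file,
    // and outputs that must outlive the node go to the distributed store.
    static Transport choose(long outputBytes, long memoryBudgetBytes, boolean mustSurviveNodeLoss) {
        if (mustSurviveNodeLoss) {
            return Transport.DISTRIBUTED_STORE;
        }
        return outputBytes <= memoryBudgetBytes ? Transport.IN_MEMORY : Transport.LOCAL_FILE;
    }

    public static void main(String[] args) {
        long budget = 64L << 20; // assumed 64 MB in-memory budget
        System.out.println(choose(1L << 20, budget, false)); // small producer output
        System.out.println(choose(1L << 30, budget, false)); // large producer output
    }
}
```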

Tez – Deep Dive – API
Simple DAG definition API
DAG dag = new DAG();
// One vertex per processing step
Vertex map1 = new Vertex(MapProcessor.class);
Vertex map2 = new Vertex(MapProcessor.class);
Vertex reduce1 = new Vertex(ReduceProcessor.class);
Vertex reduce2 = new Vertex(ReduceProcessor.class);
Vertex join1 = new Vertex(JoinProcessor.class);
…….
// Each edge connects a producer vertex to a consumer vertex
Edge edge1 = new Edge(map1, reduce1, SCATTER_GATHER,
    PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
Edge edge2 = new Edge(map2, reduce2, SCATTER_GATHER,
    PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
Edge edge3 = new Edge(reduce1, join1, SCATTER_GATHER,
    PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
Edge edge4 = new Edge(reduce2, join1, SCATTER_GATHER,
    PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
…….
// Assemble the graph
dag.addVertex(map1).addVertex(map2)
    .addVertex(reduce1).addVertex(reduce2)
    .addVertex(join1)
    .addEdge(edge1).addEdge(edge2)
    .addEdge(edge3).addEdge(edge4);

map1

map2
Scatter_Gather
Bipartite Sequential

reduce1

reduce2
Scatter_Gather
Bipartite Sequential

join1
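
To make the shape of this graph concrete, here is a self-contained sketch that builds the same five-vertex DAG with a tiny stand-in class and derives an order in which the vertices could run. `MiniDag` is hypothetical and is not the Tez `DAG` class; the ordering uses Kahn's topological sort, chosen here for illustration rather than taken from the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the Tez DAG class.
final class MiniDag {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    MiniDag addVertex(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    MiniDag addEdge(String producer, String consumer) {
        addVertex(producer).addVertex(consumer);
        successors.get(producer).add(consumer);
        inDegree.merge(consumer, 1, Integer::sum);
        return this;
    }

    // Kahn's algorithm: a vertex becomes ready once all of its producers have been emitted.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((vertex, degree) -> { if (degree == 0) ready.add(vertex); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.remove();
            order.add(vertex);
            for (String next : successors.get(vertex)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("graph has a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        MiniDag dag = new MiniDag()
                .addVertex("map1").addVertex("map2")
                .addVertex("reduce1").addVertex("reduce2").addVertex("join1")
                .addEdge("map1", "reduce1").addEdge("map2", "reduce2")
                .addEdge("reduce1", "join1").addEdge("reduce2", "join1");
        System.out.println(dag.executionOrder()); // prints [map1, map2, reduce1, reduce2, join1]
    }
}
```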

Tez – Deep Dive – API
Edge properties define the connection between
producer and consumer vertices in the DAG
• Data movement – Defines routing of data between tasks
– One-To-One : Data from the ith producer task routes to the ith consumer
task.
– Broadcast : Data from a producer task routes to all consumer tasks.
– Scatter-Gather : Producer tasks scatter data into shards and consumer
tasks gather the data. The ith shard from all producer tasks routes to the ith
consumer task.

• Scheduling – Defines when a consumer task is scheduled
– Sequential : Consumer task may be scheduled after a producer task
completes.
– Concurrent : Consumer task must be co-scheduled with a producer task.

• Data source – Defines the lifetime/reliability of a task output
– Persisted : Output will be available after the task exits. Output may be lost
later on.
– Persisted-Reliable : Output is reliably stored and will always be available
– Ephemeral : Output is available only while the producer task is running
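
The three data movement types above can be written as a small routing function. This is a sketch under the definitions given on this slide; `EdgeRouting`, `route` and the zero-based indices are illustrative choices, not Tez identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical routing sketch following the slide's definitions; not Tez code.
final class EdgeRouting {
    enum DataMovement { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    // Returns the consumer task indices (0-based) that receive shard `shard`
    // produced by producer task `producer`, given `consumers` consumer tasks.
    static List<Integer> route(DataMovement movement, int producer, int shard, int consumers) {
        List<Integer> targets = new ArrayList<>();
        switch (movement) {
            case ONE_TO_ONE:
                targets.add(producer);            // ith producer -> ith consumer
                break;
            case BROADCAST:
                for (int c = 0; c < consumers; c++) {
                    targets.add(c);               // every consumer gets the data
                }
                break;
            case SCATTER_GATHER:
                targets.add(shard);               // ith shard of any producer -> ith consumer
                break;
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(route(DataMovement.ONE_TO_ONE, 2, 0, 4));     // prints [2]
        System.out.println(route(DataMovement.BROADCAST, 2, 0, 4));      // prints [0, 1, 2, 3]
        System.out.println(route(DataMovement.SCATTER_GATHER, 2, 1, 4)); // prints [1]
    }
}
```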

Tez – Deep Dive – Scheduling
Start
vertex

• Vertex Scheduler
Determines when
tasks in a vertex
can start

Get container

map1

Get Priority

• DAG Scheduler
Determines
priority of task

Start
vertex

• Task Scheduler
Allocates
containers from
YARN and
assigns them to
tasks

Vertex Scheduler

DAG
Scheduler

Task
Scheduler

Start
tasks

reduce1

Get Priority
Get container

Tez – Deep Dive – Task Execution
• Start task shell with user specified env, resources etc.
• Fetch and instantiate Input, Processor, Output objects
• Receive (incremental) input information and process the input
• Provide output information


Task Attempt
(logical in AM)
Env, cmd
line, resources
Input
Processor
Output

Task Attempt
(real on machine)
Start container

Task JVM

Get Task
Input
Processor

Data
Information

Data Events

Output

Tez - Sessions
• The amount of work programmed into a script/query may
not be doable within a single Tez DAG.

Tez - Sessions

• Even better performance gains may be achieved through
caching with the session: Within AM or container
Tez – Automatic Reduce Parallelism
Event Model
Map tasks send
data statistics
events to the
Reduce Vertex
Manager.
Vertex Manager
Pluggable user logic
that understands the
data statistics and
can formulate the
correct parallelism.
Advises vertex
controller on
parallelism

Data Size Statistics

Vertex Manager

Map Vertex

Set Parallelism
Re-Route

Vertex State
Machine

App Master

Reduce Vertex
Cancel Task

Tez – Reduce Slow Start/Pre-launch
Event Model
Map completion
events sent to the
Reduce Vertex
Manager.
Vertex Manager
Pluggable user logic
that understands the
data size. Advises the
vertex controller to
launch the reducers
before all maps have
completed so that
shuffle can start.
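
A slow-start rule of this kind can be sketched as a threshold on the fraction of completed map tasks. The slide says the pluggable logic also considers data size; the version below deliberately simplifies to completion counts only, and `SlowStart`, `shouldStartReducers` and the 0.25 fraction are assumptions for the example.

```java
// Hypothetical, simplified slow-start rule; real vertex manager logic may also weigh data size.
final class SlowStart {
    // Reducers may launch once at least `minFraction` of the map tasks have completed.
    static boolean shouldStartReducers(int completedMaps, int totalMaps, double minFraction) {
        if (totalMaps <= 0) {
            return true; // nothing to wait for
        }
        return (double) completedMaps / totalMaps >= minFraction;
    }

    public static void main(String[] args) {
        System.out.println(shouldStartReducers(10, 50, 0.25)); // prints false (20% done)
        System.out.println(shouldStartReducers(25, 50, 0.25)); // prints true (50% done)
    }
}
```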


Task Completed

Vertex Manager

Map Vertex

Start Tasks

Vertex State
Machine

App Master

Start

Reduce Vertex

Tez – Current status
• Apache Incubator Project
– Rapid development. Over 330 jiras opened. Over 220 resolved.
– Growing community.

• Focus on stability
– Testing and quality are highest priority.
– Working on Tez+YARN to fix basic performance overheads.
– Code ready and deployed on multi-node environments.

• DAG of MR processing is working
– Already functionally equivalent to Map Reduce. Existing Map
Reduce jobs can be executed on Tez with few or no changes.
– Working Hive prototype that can target Tez for execution of
queries (HIVE-4660).
– Work started on prototype of Pig that can target Tez.

Tez – Current status
[Diagram labels: Fact Table, Dimension Table 1, Dimension Table 2, Dimension Table 3, Join, Result Table 1, Result Table 2, Result Table 3]
• Typical pattern in a TPC-DS query
• Optimization for small data sets
• Both can now run as a single Tez job
Tez – MRR Performance
TPC-DS Query 12 with Hive on Tez
[Bar chart: Elapsed Time (seconds), axis 0 to 80, for RC File and ORC File at Scale 200 and Scale 1000. Series: Traditional Map-Reduce; Tez Map Reduce Reduce. Bar labels: 75, 65, 55, 55, 54, 46, 35, 34.]
Tez – Roadmap
• Full DAG support
– Multi-way input and output.
– Other graph connection patterns.

• Performance optimizations
– Container reuse
– Cross task shared resources
– Using HDFS data caching

• Runtime plan optimizations
– Automatic input (map) parallelism
– Automatic aggregation (reduce) parallelism

• Usability.
– Stability and testability
– Recovery and history
Tez – Community
• Early adopters and contributors welcome
– Adopters to drive more scenarios. Contributors to make them
happen.
– Hive and Pig communities are on-board and making great
progress - HIVE-4660 and PIG-3446

• Stay tuned for Tez meetups with deep dives on Tez
architecture and using Tez
– http://www.meetup.com/Apache-Tez-User-Group

• Useful links
– Work tracking: https://issues.apache.org/jira/browse/TEZ
– Code: https://github.com/apache/incubator-tez
– Developer list: dev@tez.incubator.apache.org
– User list: user@tez.incubator.apache.org
– Issues list: issues@tez.incubator.apache.org
Tez – Takeaways
• Distributed execution framework that works on
computations represented as dataflow graphs
• Naturally maps to execution plans produced by query
optimizers
• Execution architecture designed to enable dynamic
performance optimizations at runtime
• Open source Apache project – your use-cases and
code are welcome
• It works and is already being used by Hive

Tez
Tez: https://github.com/apache/tez.git
Demo: https://github.com/t3rmin4t0r/tez-autobuild

Thanks for your time and attention!
Questions?

© Hortonworks Inc. 2013 - Confidential

Page 48

Mais conteúdo relacionado

Mais procurados

Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hortonworks
 
Introduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformIntroduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformHortonworks
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopHortonworks
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Hortonworks
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopHortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveHortonworks
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 

Mais procurados (20)

Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
 
Introduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformIntroduction to Hortonworks Data Platform
Introduction to Hortonworks Data Platform
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache Hadoop
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 


Munich HUG 21.11.2013

  • 1. Hortonworks: We Do Hadoop. Our mission is to enable your Modern Data Architecture by delivering One Enterprise Hadoop November 2013 © Hortonworks Inc. 2013 - Confidential Page 1
  • 2. Agenda • Hortonworks Overview of Tez – Quick and painless • A driver for Tez: The Stinger Initiative • Tez Deep Dive • Demo
  • 3. A Brief History of Apache Hadoop [Timeline, 2004 to 2013; milestones: Apache Project Established, Yahoo! begins to operate at scale, Hortonworks Data Platform, Enterprise Hadoop] 2005: Hadoop created at Yahoo! Focus on INNOVATION. 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters. Focus on OPERATIONS. 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo. Focus on STABILITY.
  • 4. Our Mission: Enable your Modern Data Architecture by delivering One Enterprise Hadoop. Headquarters: Palo Alto, CA. Employees: 240+ and growing. Customers: 120+ and growing. Investors: Benchmark, Index, Yahoo, Dragoneer, Tenaya. Trusted Partners with: [partner logos]. Our Commitment: • Innovate in the Open: We employ the core architects and operators of Hadoop and drive innovation through open source Apache Foundation projects to avoid vendor lock-in • Certify for the Enterprise: We engineer, test and certify the Hortonworks Data Platform for enterprise usage and deliver the highest quality of support • Interoperate with the Ecosystem: We work with partners to deeply integrate Hadoop with key technologies so you can leverage existing skills and investments
  • 5. Goal: Interoperable and Familiar [Architecture diagram labels: APPLICATIONS, BusinessObjects BI, DEV & DATA TOOLS, OPERATIONAL TOOLS, DATA SYSTEM, RDBMS, HANA, EDW, MPP, INFRASTRUCTURE, SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured)]
  • 6. Betting on Hortonworks… HDInsight & HDP for Windows: • Only Hadoop Distribution for Windows Azure & Windows Server • Native integration with SQL Server, Excel, and System Center • Extends Hadoop to .NET community. Teradata Portfolio for Hadoop: • Seamless data access between Teradata and Hadoop (SQL-H) • Simple management & monitoring with Viewpoint integration • Flexible deployment options. Instant Access + Infinite Scale: • SAP can assure their customers they are deploying an SAP HANA + Hadoop architecture fully supported by SAP • Enables analytics apps (BOBJ) to interact with Hadoop [Figure labels: Complete Portfolio for Hadoop, UDA Diagram, Appliances]
  • 7. Hortonworks Approach to Enterprise Hadoop: Community Driven Enterprise Apache Hadoop • Identify and introduce enterprise requirements into the public domain • Work with the community to advance and incubate open source projects • Apply Enterprise Rigor to provide the most stable and reliable distribution
  • 8. Driving Hadoop Innovation. Hortonworks engineers focus on making Apache Hadoop an enterprise viable platform that powers modern data architectures and deeply integrates with existing data center technologies. [Chart: Total Net Lines Contributed to Apache Hadoop; labels: End Users, 449,768 lines, 614,041 lines, 147,933 lines] [Chart: Total Number of Committers to Apache Hadoop; labels: 10, Others, 21, 63 total, LinkedIn: 3, IBM: 3, Facebook: 5, Yahoo: 10, Cloudera: 7]
  • 9. HDP: Enterprise Hadoop Platform. Hortonworks Data Platform (HDP): • The ONLY 100% open source and complete platform • Integrates full range of enterprise-ready services • Certified and tested at scale • Engineered for deep ecosystem interoperability [Stack diagram labels: OPERATIONAL SERVICES, AMBARI, FLUME, HBASE, FALCON*, OOZIE, DATA SERVICES, PIG, SQOOP, HIVE & HCATALOG, LOAD & EXTRACT, HADOOP CORE, NFS, WebHDFS, MAP REDUCE, TEZ, YARN, HDFS, PLATFORM SERVICES, KNOX*, Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots; OS/VM, Cloud, Appliance]
  • 10. Hortonworks: The Value of “Open” for You • Connect With the Hadoop Community: We employ a large number of Apache project committers & innovators so that you are represented in the open source community • Avoid Vendor Lock-in: Hortonworks Data Platform remains as close to the open source trunk as possible and is developed 100% in the open so you are never locked in • The partners you rely on, rely on Hortonworks: We work with partners to deeply integrate Hadoop with data center technologies so you can leverage existing skills and investments • Certified for the Enterprise: We engineer, test and certify the Hortonworks Data Platform at scale to ensure the reliability and stability you require for enterprise use • Support from the experts: We provide the highest quality of support for deploying at scale. You are supported by hundreds of years of Hadoop experience
  • 11. SQL-in-Hadoop with Apache Hive [Stack diagram labels: Business Analytics, Custom Apps, SQL, Hadoop, Hive, MapReduce, Tez, YARN, HDFS] • Apache Hive is the standard for SQL interaction with Hadoop – Enterprises make final purchasing decisions on two key characteristics: 'compatibility' with existing investments (60%) and skills (20%) – Most applications claim Hive compatibility TODAY* • Stinger Initiative: Simple Focus – Performance – SQL-Compatibility. Improves existing tools & preserves investments. * Claims publicly made by: Teradata, Microsoft, Oracle, Microstrategy, IBM, Information Builders, SAS, QlikTech, SAP, Tableau, Tibco, Actuate, Jaspersoft, Alteryx, Datameer, Pentaho
  • 12. Stinger Initiative Goals [Diagram labels: Hive, Execution Engine, Tez, Query Planner, File Format, ORC file, = 100X; Windowing & Subqueries, Data Types, = SQL Compatible] • Enables Hive to support interactive workloads • Improves existing tools & preserves investments
  • 13. Stinger: Hive For All Analytics. 100X Faster + SQL Compatible. [Diagram: workloads spanning Interactive and Batch: Parameterized Reports, Enterprise Reports, Dashboard / Scorecard, Data Mining, Visualization]
  • 14. Stinger Roadmap • Join optimizations • ORCFile • SQL:2003 windowing functions • Subqueries for IN, NOT IN, HAVING • Datatypes: CHAR, VARCHAR, DATETIME • Improvements to DECIMAL datatype • Integration with Tez and Tez Service • Vectorization Preview • Intelligent Optimizer • Column Statistics • Authentication and Authorization Enhancements • Full vector query
  • 15. Stinger: Some early Results • Query Engine Work ONLY • Uses TPC “style” benchmark • Just a few weeks of work • OTHER work coming
  • 16. Apache Tez: Accelerating Hadoop Query Processing
  • 17. Tez – Introduction • Distributed execution framework targeted towards data-processing applications. • Based on expressing a computation as a dataflow graph. • Built on top of YARN – the resource management framework for Hadoop. • Open source Apache incubator project and Apache licensed.
  • 18. Old School Hadoop: MapReduce
  • 19. Fundamentals of YARN • The fundamental idea of YARN is to split up the two major responsibilities of the JobTracker/TaskTracker into separate entities: – a global ResourceManager – a per-application ApplicationMaster – a per-node slave NodeManager – a per-application Container running on a NodeManager
  • 20. New School Hadoop with YARN [Diagram labels: Client (x2), Resource Manager, Node Manager (x3), App Mstr (x2), Container (x4); arrows: MapReduce Status, Job Submission, Node Status, Resource Request]
  • 21. Tez – Design Themes • Empowering End Users • Execution Performance
  • 22. Tez – Empowering End Users • Expressive dataflow definition APIs • Flexible Input-Processor-Output runtime model • Data type agnostic • Simplifying deployment
  • 23. Tez – Empowering End Users • Expressive dataflow definition APIs – Enable definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical plan at runtime. – Targeted towards data processing applications like Hive/Pig but not limited to them. Hive/Pig query plans naturally map to Tez dataflow graphs with no translation impedance. [Diagram: task graph with TaskA-1, TaskA-2, TaskB-1, TaskB-2, TaskC-1, TaskC-2, TaskD-1, TaskD-2, TaskE-1, TaskE-2]
  • 24. Tez – Empowering End Users • Expressive dataflow definition APIs [Diagram: Distributed Sort with a Preprocessor Stage, a Sampler (Samples, Ranges), a Partition Stage and an Aggregate Stage; each stage runs Task-1 and Task-2]
  • 25. Tez – Empowering End Users • Flexible Input-Processor-Output runtime model – Construct physical runtime executors dynamically by connecting different inputs, processors and outputs. – End goal is to have a library of inputs, outputs and processors that can be programmatically composed to generate useful tasks. Examples: IntermediateReduce = ShuffleInput, ReduceProcessor, FileSortedOutput; FinalReduce = ShuffleInput, ReduceProcessor, HDFSOutput; PairwiseJoin = Input1 and Input2, JoinProcessor, FileSortedOutput.
  • 26. Tez – Empowering End Users • Data type agnostic – Tez is only concerned with the movement of data: files and streams of bytes. – Does not impose any data format on the user application. An MR application can use Key-Value pairs on top of Tez. Hive and Pig can use tuple-oriented formats that are natural and native to them. [Diagram labels: Tez Task, File, Stream, Bytes, User Code, Key Value, Tuples]
  • 27. Tez – Empowering End Users • Simplifying deployment – Tez is a completely client-side application. – No deployments to do. Simply upload to any accessible FileSystem and change local Tez configuration to point to that. – Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production. – Leverages YARN local resources. [Diagram labels: HDFS holding Tez Lib 1 and Tez Lib 2; TezClient on each Client Machine; TezTask on each Node Manager]
  • 28. Tez – Empowering End Users • Expressive dataflow definition APIs • Flexible Input-Processor-Output runtime model • Data type agnostic • Simplifying usage. With great power APIs come great responsibilities :) Tez is a framework on which end user applications can be built
  • 29. Tez – Execution Performance • Performance gains over Map Reduce • Optimal resource management • Plan reconfiguration at runtime • Dynamic physical data flow decisions
  • 30. Tez – Execution Performance • Performance gains over Map Reduce – Eliminate replicated write barrier between successive computations. – Eliminate job launch overhead of workflow jobs. – Eliminate extra stage of map reads in every workflow job. – Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes. [Diagram: Pig/Hive - MR compared with Pig/Hive - Tez]
  • 31. Tez – Execution Performance • Optimal resource management – Reuse YARN containers to launch new tasks. – Reuse YARN containers to enable shared objects across tasks. [Diagram labels: Tez Application Master, Start Task, Task Done, Start Task, YARN Container, TezTask1, TezTask2; TezTask Host, Shared Objects, YARN Container]
  • 32. Tez – Execution Performance • Plan reconfiguration at runtime – Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality. – Advanced changes in dataflow graph structure. – Progressive graph construction in concert with user optimizer. [Diagram labels: HDFS Blocks, YARN Resources. Before: Stage 1 (50 maps, 100 partitions), Stage 2 (100 reducers). With only 10 GB of data: Stage 1 (50 maps, 100 partitions), Stage 2 (100 reducers replaced by 10 reducers)]
  • 33. Tez – Execution Performance • Dynamic physical data flow decisions – Decide the type of physical byte movement and storage on the fly. – Store intermediate data on distributed store, local store or in memory. – Transfer bytes via blocking files or streaming and the spectrum in between. [Diagram labels: Producer, Local File, Consumer; At Runtime: Producer (small size), In-Memory, Consumer]
  • 34. Tez – Deep Dive – API: Simple DAG definition API
      // One vertex per processing stage
      DAG dag = new DAG();
      Vertex map1 = new Vertex(MapProcessor.class);
      Vertex map2 = new Vertex(MapProcessor.class);
      Vertex reduce1 = new Vertex(ReduceProcessor.class);
      Vertex reduce2 = new Vertex(ReduceProcessor.class);
      Vertex join1 = new Vertex(JoinProcessor.class);
      …….
      // Edges carry the data movement, data source and scheduling properties
      Edge edge1 = new Edge(map1, reduce1, SCATTER_GATHER, PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
      Edge edge2 = new Edge(map2, reduce2, SCATTER_GATHER, PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
      Edge edge3 = new Edge(reduce1, join1, SCATTER_GATHER, PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
      Edge edge4 = new Edge(reduce2, join1, SCATTER_GATHER, PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
      …….
      // Assemble the graph
      dag.addVertex(map1).addVertex(map2)
         .addVertex(reduce1).addVertex(reduce2)
         .addVertex(join1)
         .addEdge(edge1).addEdge(edge2)
         .addEdge(edge3).addEdge(edge4);
    [Diagram: map1 and map2 feed reduce1 and reduce2 (Scatter_Gather, Bipartite, Sequential); reduce1 and reduce2 feed join1 (Scatter_Gather, Bipartite, Sequential)]
  • 35. Tez – Deep Dive – API: Edge properties define the connection between producer and consumer vertices in the DAG • Data movement – Defines routing of data between tasks – One-To-One: Data from the ith producer task routes to the ith consumer task. – Broadcast: Data from a producer task routes to all consumer tasks. – Scatter-Gather: Producer tasks scatter data into shards and consumer tasks gather the data. The ith shard from all producer tasks routes to the ith consumer task. • Scheduling – Defines when a consumer task is scheduled – Sequential: Consumer task may be scheduled after a producer task completes. – Concurrent: Consumer task must be co-scheduled with a producer task. • Data source – Defines the lifetime/reliability of a task output – Persisted: Output will be available after the task exits. Output may be lost later on. – Persisted-Reliable: Output is reliably stored and will always be available. – Ephemeral: Output is available only while the producer task is running.
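The three data movement types on slide 35 can be illustrated with a small routing function. This is a minimal sketch and not the Tez API: the class `EdgeRoutingSketch`, the enum `DataMovement` and the method `consumersFor` are hypothetical names introduced only to show which consumer tasks would receive a given producer output under each rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only, not the Tez API: models the routing rules described on slide 35. */
public class EdgeRoutingSketch {

    /** Hypothetical enum mirroring the three data movement types named on the slide. */
    enum DataMovement { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    /**
     * Returns the indices of the consumer tasks that receive output from the
     * given producer task. The shard index is only used for SCATTER_GATHER.
     */
    static List<Integer> consumersFor(DataMovement type, int producerTask,
                                      int shard, int numConsumers) {
        List<Integer> targets = new ArrayList<>();
        switch (type) {
            case ONE_TO_ONE:
                // The ith producer task routes to the ith consumer task.
                targets.add(producerTask);
                break;
            case BROADCAST:
                // A producer task routes to every consumer task.
                for (int c = 0; c < numConsumers; c++) {
                    targets.add(c);
                }
                break;
            case SCATTER_GATHER:
                // The ith shard of every producer routes to the ith consumer task.
                targets.add(shard);
                break;
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(consumersFor(DataMovement.ONE_TO_ONE, 2, 0, 4));
        System.out.println(consumersFor(DataMovement.BROADCAST, 2, 0, 4));
        System.out.println(consumersFor(DataMovement.SCATTER_GATHER, 2, 3, 4));
    }
}
```

Note that under scatter-gather the producer index does not influence the target: every producer sends its shard 3 to consumer 3, which is what lets a consumer gather one partition from all producers.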
  • 36. Tez – Deep Dive – Scheduling Start vertex • Vertex Scheduler Determines when tasks in a vertex can start Get container map1 Get Priority • DAG Scheduler Determines priority of task Start vertex • Task Scheduler Allocates containers from YARN and assigns them to tasks Vertex Scheduler DAG Scheduler Task Scheduler Start tasks reduce1 Get Priority Get container © Hortonworks Inc. 2013 - Confidential Page 36
  • 37. Tez – Deep Dive – Task Execution • Start task shell with user specified env, resources etc. • Fetch and instantiate Input, Processor, O utput objects • Receive (incremental) input information and process the input • Provide output information © Hortonworks Inc. 2013 - Confidential Task Attempt (logical in AM) Env, cmd line, resources Input Processor Output Task Attempt (real on machine) Start container Task JVM Get Task Input Processor Data Information Data Events Output Page 37
  • 38. Tez – Sessions
    • The amount of work programmed into a script/query may not be doable within a single Tez DAG.
  • 39. Tez – Sessions
    • Even better performance gains may be achieved through caching within the session, either in the AM or in a container.
  • 40. Tez – Automatic Reduce Parallelism
    • Event Model: Map tasks send data statistics events to the Reduce Vertex Manager.
    • Vertex Manager: Pluggable user logic that understands the data statistics and can formulate the correct parallelism. Advises the vertex controller on parallelism.
    [Diagram: inside the App Master, the Map Vertex sends Data Size Statistics to the Vertex Manager, which drives the Vertex State Machine of the Reduce Vertex; actions labeled Set Parallelism, Re-Route, Cancel Task]
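
A vertex manager of the kind described on slide 40 could be sketched as follows: it accumulates the output sizes reported by map tasks and derives a reducer count from them. The class name, the "desired bytes per reducer" target and the clamping rule are assumptions for this illustration; the slide only says that the logic is pluggable and driven by data size statistics.

```java
// Hypothetical vertex-manager logic; the sizing rule is an assumption, not Tez's documented policy.
class ReduceParallelismSketch {
    private final int configuredReducers;       // parallelism the vertex was planned with
    private final long desiredBytesPerReducer;  // assumed tuning knob
    private long totalMapOutputBytes = 0;

    ReduceParallelismSketch(int configuredReducers, long desiredBytesPerReducer) {
        if (configuredReducers < 1 || desiredBytesPerReducer < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        this.configuredReducers = configuredReducers;
        this.desiredBytesPerReducer = desiredBytesPerReducer;
    }

    // Called once per "data size statistics" event from a map task.
    void onDataSizeEvent(long mapOutputBytes) {
        totalMapOutputBytes += mapOutputBytes;
    }

    // Enough reducers to keep each near the target size, never more than planned, never fewer than one.
    int recommendedParallelism() {
        long needed = (totalMapOutputBytes + desiredBytesPerReducer - 1) / desiredBytesPerReducer;
        return (int) Math.max(1, Math.min(configuredReducers, needed));
    }

    public static void main(String[] args) {
        ReduceParallelismSketch manager = new ReduceParallelismSketch(10, 100);
        manager.onDataSizeEvent(120);
        manager.onDataSizeEvent(80);
        manager.onDataSizeEvent(50);
        System.out.println("recommended reducers: " + manager.recommendedParallelism());
    }
}
```

In this sketch the manager only ever reduces parallelism below the planned value, which matches the diagram's Cancel Task and Re-Route actions: surplus reduce tasks are dropped and their input is redirected to the remaining ones.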
  • 41. Tez – Reduce Slow Start/Pre-launch
    • Event Model: Map completion events are sent to the Reduce Vertex Manager.
    • Vertex Manager: Pluggable user logic that understands the data size. Advises the vertex controller to launch the reducers before all maps have completed so that shuffle can start.
    [Diagram: inside the App Master, the Map Vertex sends Task Completed events to the Vertex Manager, which tells the Vertex State Machine of the Reduce Vertex to Start Tasks]
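
The slow-start behavior on slide 41 can be sketched as a counter over map completion events that fires once a chosen fraction of map tasks has finished. The class name and the single `startFraction` threshold are assumptions made for illustration; the slide does not specify the actual rule.

```java
// Hypothetical slow-start logic; the single-threshold rule is an assumption for illustration.
class SlowStartSketch {
    private final int totalMapTasks;
    private final double startFraction;  // fraction of completed maps that triggers the reducers
    private int completedMapTasks = 0;
    private boolean reducersStarted = false;

    SlowStartSketch(int totalMapTasks, double startFraction) {
        if (totalMapTasks < 1 || startFraction < 0.0 || startFraction > 1.0) {
            throw new IllegalArgumentException("invalid task count or fraction");
        }
        this.totalMapTasks = totalMapTasks;
        this.startFraction = startFraction;
    }

    // Called per map completion event. Returns true exactly once:
    // on the event that crosses the threshold, when reducers should be launched.
    boolean onMapTaskCompleted() {
        completedMapTasks++;
        if (!reducersStarted && completedMapTasks >= startFraction * totalMapTasks) {
            reducersStarted = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SlowStartSketch manager = new SlowStartSketch(4, 0.5);
        for (int i = 1; i <= 4; i++) {
            System.out.println("map " + i + " done, start reducers now: " + manager.onMapTaskCompleted());
        }
    }
}
```

Launching reducers at the threshold lets them begin fetching the output of finished maps while the remaining maps are still running, which is the overlap the slide refers to as letting the shuffle start early.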
  • 42. Tez – Current status
    • Apache Incubator Project
      – Rapid development. Over 330 JIRAs opened. Over 220 resolved.
      – Growing community.
    • Focus on stability
      – Testing and quality are highest priority.
      – Working on Tez+YARN to fix basic performance overheads.
      – Code ready and deployed on multi-node environments.
    • DAG of MR processing is working
      – Already functionally equivalent to MapReduce. Existing MapReduce jobs can be executed on Tez with few or no changes.
      – Working Hive prototype that can target Tez for execution of queries (HIVE-4660).
      – Work started on prototype of Pig that can target Tez.
  • 43. Tez – Current status
    [Two diagrams, captioned "Optimization for small data sets" and "Typical pattern in a TPC-DS query", showing a Fact Table joined with Dimension Tables 1, 2 and 3 to produce Result Tables 1, 2 and 3]
    • Both can now run as a single Tez job
  • 44. Tez – MRR Performance
    [Bar chart: TPC-DS Query 12 with Hive on Tez. Y-axis: Elapsed Time (seconds), 0 to 80. Groups: RC File Scale 200, ORC File Scale 200, RC File Scale 1000, ORC File Scale 1000. Series: Traditional Map-Reduce, Tez Map Reduce Reduce]
  • 45. Tez – Roadmap
    • Full DAG support
      – Multi-way input and output.
      – Other graph connection patterns.
    • Performance optimizations
      – Container reuse
      – Cross task shared resources
      – Using HDFS data caching
    • Runtime plan optimizations
      – Automatic input (map) parallelism
      – Automatic aggregation (reduce) parallelism
    • Usability
      – Stability and testability
      – Recovery and history
  • 46. Tez – Community
    • Early adopters and contributors welcome
      – Adopters to drive more scenarios. Contributors to make them happen.
      – Hive and Pig communities are on board and making great progress: HIVE-4660 and PIG-3446
    • Stay tuned for Tez meetups with deep dives on Tez architecture and using Tez
      – http://www.meetup.com/Apache-Tez-User-Group
    • Useful links
      – Work tracking: https://issues.apache.org/jira/browse/TEZ
      – Code: https://github.com/apache/incubator-tez
      – Developer list: dev@tez.incubator.apache.org
      – User list: user@tez.incubator.apache.org
      – Issues list: issues@tez.incubator.apache.org
  • 47. Tez – Takeaways
    • Distributed execution framework that works on computations represented as dataflow graphs
    • Naturally maps to execution plans produced by query optimizers
    • Execution architecture designed to enable dynamic performance optimizations at runtime
    • Open source Apache project: your use cases and code are welcome
    • It works and is already being used by Hive

Editor's Notes

  1. I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop. What we now know as Hadoop really started back in 2005, when the team at Yahoo! started to work on a project to build a large-scale data storage and processing technology that would allow them to store and process massive amounts of data to underpin Yahoo’s most critical application, Search. The initial focus was on building out the technology (the key components being HDFS and MapReduce) that would become the core of what we think of as Hadoop today, and continuing to innovate it to meet the needs of this specific application. By 2008, Hadoop usage had greatly expanded inside of Yahoo, to the point that many applications were now using this data management platform, and as a result the team’s focus extended to include a focus on Operations: now that applications were beginning to propagate around the organization, sophisticated capabilities for operating it at scale were necessary. It was also at this time that usage began to expand well beyond Yahoo, with many notable organizations (including Facebook and others) adopting Hadoop as the basis of their large-scale data processing and storage applications and necessitating a focus on operations to support what was by now a large variety of critical business applications. In 2011, recognizing that more mainstream adoption of Hadoop was beginning to take off and with an objective of facilitating it, the core team left, with the blessing of Yahoo, to form Hortonworks. The goal of the group was to facilitate broader adoption by addressing the Enterprise capabilities that would enable a larger number of organizations to adopt and expand their usage of Hadoop. [Note: if useful as a talk track, Cloudera was formed in 2008, well BEFORE the operational expertise of running Hadoop at scale was established inside of Yahoo.]
  2. Make Hadoop an enterprise data platform. Innovate core platform, data, & operational services. Integrate deeply with enterprise ecosystem. Provide world-class enterprise support. Drive 100% open source software development and releases through the core Apache projects. Address enterprise needs in community projects. Establish Apache foundation projects as “the standard”. Promote open community vs. vendor control / lock-in. Enable the Hadoop market to function. Make it easy for enterprises to deploy at scale. Be the best at enabling deep ecosystem integration. Create a pull market with key strategic partners.
  3. Make Hadoop an enterprise data platform. Innovate core platform, data, & operational services. Integrate deeply with enterprise ecosystem. Provide world-class enterprise support. Drive 100% open source software development and releases through the core Apache projects. Address enterprise needs in community projects. Establish Apache foundation projects as “the standard”. Promote open community vs. vendor control / lock-in. Enable the Hadoop market to function. Make it easy for enterprises to deploy at scale. Be the best at enabling deep ecosystem integration. Create a pull market with key strategic partners.
  4. Buzz about low latency access in Hadoop
  5. Hortonworks Unveils Stinger Initiative to Make Apache Hive 100X Faster for Interactive Queries. Hortonworks is leading the effort with a group of community contributors focusing on enhancing Apache Hive, the de facto standard for SQL access to Hadoop. Enterprise Reports – Your cell phone bill is an example. Dashboard – KPI tracking. Parameterized Reports – What are the hot prospects in my region? Visualization – Visual exploration of data. Data Mining – Large-scale data processing and extraction, usually fed to other tools. How? Improve Latency & Throughput: query engine improvements; new “Optimized RCFile” column store; next-gen runtime (eliminates M/R latency). Extend Deep Analytical Ability: analytics functions; improved SQL coverage; continued focus on core Hive use cases.
  6. Time (y-axis) in seconds. Smaller is better.