1. Hortonworks Data Platform 1.2 focuses on continued innovation with Apache Ambari and enhanced security and performance for Hive and HCatalog.
2. Key features include root cause analysis, usage heat maps, and improved ecosystem integration in Ambari, as well as enhanced security models and concurrency improvements for Hive and HCatalog.
3. Hortonworks ensures tight alignment with open source Apache projects by certifying the latest stable components and contributing leadership and code back to projects.
Committed to building 100% open source Hadoop for the Enterprise
So how does this get brought together into our distribution? It is really pretty straightforward, but also very unique.

We start with the group of open source projects that I described, which we are continually driving forward in the OSS community. [CLICK] We then package the appropriate versions of those open source projects, integrate and test them using a full test suite (including all the IP for regression testing contributed by Yahoo), and [CLICK] contribute all of the bug fixes back to the open source tree. From there, we package and certify a distribution in the form of the Hortonworks Data Platform (HDP) that includes both Hadoop Core and the related projects required by the Enterprise user, and provide it to our customers.

Through this application of Enterprise software development process to the open source projects, the result is a 100% open source distribution that has been packaged, tested, and certified by Hortonworks. It is also 100% in sync with the open source trees.
100% Open Source: Eliminating Lock-In
Quarterly Cadence: regular innovation every three months
Validated & Tested by our ecosystem partners
Embargo Date: January 15
HDP tracks closely to Apache project releases.

CDH forks early and patches often, maintaining CDH distributions off to the side of the Apache community projects, resulting in unnecessary drift and risk of lock-in. The “+923.423” and the “+541” parts of the version numbers represent how many patches these components have drifted away from the corresponding Apache projects. While some drift is to be expected, patch counts in the hundreds result in lock-in and eliminate the virtuous cycle that the upstream community should help drive.
I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop.

What we now know as Hadoop really started back in 2005, when Eric Baldeschwieler – known as “E14” – started work on a project to build a large-scale data storage and processing technology that would allow Yahoo to store and process massive amounts of data to underpin its most critical application, Search. The initial focus was on building out the technology – the key components being HDFS and MapReduce – that would become the core of what we think of as Hadoop today, and continuing to innovate it to meet the needs of this specific application.

By 2008, Hadoop usage had greatly expanded inside of Yahoo, to the point that many applications were now using this data management platform. As a result, the team’s focus extended to include Operations: now that applications were beginning to propagate around the organization, sophisticated capabilities for operating Hadoop at scale were necessary. It was also at this time that usage began to expand well beyond Yahoo, with many notable organizations (including Facebook and others) adopting Hadoop as the basis of their large-scale data processing and storage applications, necessitating a focus on operations to support what was by now a large variety of critical business applications.

In 2011, recognizing that more mainstream adoption of Hadoop was beginning to take off and with an objective of facilitating it, the core team left – with the blessing of Yahoo – to form Hortonworks. The goal of the group was to facilitate broader adoption by addressing the Enterprise capabilities that would enable a larger number of organizations to adopt and expand their usage of Hadoop.

[note: if useful as a talk track, Cloudera was formed in 2008, well BEFORE the operational expertise of running Hadoop at scale was established inside of Yahoo]
In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the:
Core Services
Platform Services
Data Services
Operational Services
required by the Enterprise user.

And all of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. And finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets).

Instead, we increasingly see Hadoop – and HDP in particular – being introduced as a complement to the traditional approaches. It is not replacing the database but rather complementing it, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with:
Existing applications, such as Tableau, SAS, Business Objects, etc. (see the sketch after this list)
Existing databases and data warehouses, for loading data to and from the data warehouse
Development tools used for building custom applications
Operational tools for managing and monitoring
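To make that first integration point concrete, here is a minimal sketch of how a JDBC-based application reaches Hive; BI tools such as Tableau or Business Objects consume Hive through the same JDBC/ODBC interface via their connectors. It assumes a HiveServer2 endpoint, and the host “hdp-master”, the user “hdpuser”, and the “web_logs” table are hypothetical placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcSketch {
        public static void main(String[] args) throws Exception {
            // Register the HiveServer2 JDBC driver shipped with Hive.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Host, port, user, and table below are hypothetical; substitute
            // your own cluster details.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hdp-master:10000/default", "hdpuser", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT status, COUNT(*) FROM web_logs GROUP BY status")) {
                // Results flow back through the standard java.sql API, which
                // is exactly how JDBC-based tools consume Hive tables.
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }

The point of the sketch is that Hive exposes data stored in HDFS over the same SQL-over-JDBC channel existing applications already speak, so complementing the database does not require replacing the surrounding tooling.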
Eric and team created the Hadoop project as open source, and that is and always will be central to our approach. We believe strongly that the technology needs to be community driven and open source.

In terms of open source mechanics, Apache Hadoop is governed by the Apache Software Foundation (ASF), which provides structure to what inside a commercial software company would be a tightly governed development, test, and release process. When we think of Core Hadoop, the ASF has helped to manage this process for several years now.

However, as Hadoop has become more widely used, it has spawned a set of ancillary open source projects that introduce capabilities required for more mainstream use. These projects are generally classified as either being related to:
“Data Services” – those that enable the storage, processing, and accessing of data
“Operational Services” – those that enable the management and operations of the infrastructure

The projects within these categories are run as independent projects with their own teams, and include some of the technologies you likely know of: Data Services include projects such as Hive, Pig, HBase, and HCatalog, while Operational Services include Apache Ambari and more.

Hortonworkers have always played a critical role in the development, test, and release process for Core Apache Hadoop, but also play leading roles in these ancillary projects that are required for enterprise usage. This includes every role from committer to release manager and, in many cases, project lead. For example, Arun Murthy is the project lead for Core Hadoop.

Current Hortonworks PMC members by project:
Hadoop: Arun Murthy, Devaraj Das, Enis Soztutar, Giridharan Kesavan, Jitendra Nath Pandey, Mahadev Konar, Matt Foley, Owen O'Malley, Sanjay Radia, Suresh Srinivas, Nicholas Sze, Vinod Kumar Vavilapalli
Pig: Daniel Dai, Alan Gates, Giridharan Kesavan, Ashutosh Chauhan, Thejas Nair
Hive: Ashutosh Chauhan
HBase: None
Oozie: Devaraj Das, Alan Gates
Sqoop: None
Flume: None
Bigtop: Alan Gates, Steve Loughran, Owen O'Malley
Incubator (not a Hadoop project, but shows who's helping grow new projects in Apache): Arun Murthy, Devaraj Das, Alan Gates, Mahadev Konar, Steve Loughran, Owen O'Malley, Enis Soztutar
We are believers in open source: for us, it is the most efficient way to develop enterprise software.

But more importantly, we believe that 100% open source is the best approach for our customers. In the data management market in particular, our customers are acutely aware of the implications of growing their database usage with a proprietary vendor who can then exert pricing pressure (Oracle).

Particularly when it comes to data storage, which we can all anticipate will continue to grow exponentially, you don’t want to be penalized for scale. By choosing an open source approach, organizations can build their operational processes on open technologies without concern that they will be locked in to a particular vendor. And they can be confident that as their usage grows, they can choose from flexible pricing alternatives – by node or by storage – that align best to their needs.

It is ultimately about mitigating risk, and in this regard open source has been proven the safest approach. I would also caution you to look beyond the open source label used by some vendors: are they harvesting open source work, forking the code, and then working independently (“fork early / patch often”)? Or, like Hortonworks, have they embraced and committed to the community open source approach, which allows them to stay in sync with the innovation of the community? In the Hadoop community, Hortonworks is unquestioned in taking the community-driven approach.