© Hortonworks Inc. 2013
Best Practices
Virtualizing Hadoop
George Trujillo
George Trujillo
‱ Master Principal Big Data Specialist - Hortonworks
‱ Tier One Big Data/BCA Specialist – VMware Center of Excellence
‱ VMware Certified Instructor (VMware Certified Professional)
‱ MySQL Certified DBA
‱ Sun Microsystems Ambassador for Java Platforms
‱ Author of Linux Administration and Advanced Linux Administration Video Training
‱ Recognized Oracle Double ACE by Oracle Corporation
‱ Served on Oracle Fusion Council & Oracle Beta Leadership Council and the Independent Oracle Users Group (IOUG) Board of Directors; recognized as one of the “Oracles of Oracle” by IOUG
Agenda
‱ Hypervisors today
‱ Building an enterprise virtual platform
‱ Virtualizing Master and Slave servers
‱ Best practices
‱ Deploying Hadoop in public and private clouds
Hypervisors Today: Faster/Less Overhead
‱ VMware vSphere, Microsoft Hyper-V Server, Citrix XenServer and Red Hat RHEV

Hypervisor   Performance Benchmarks                         % Overhead
VMware       1M IOPS with 1 microsecond of latency (5.1)    2 – 10%
KVM          1M transactions/minute (IBM hardware, RHEL)    < 10%

Hypervisor Performance       vSphere 5.1
VMware vCPUs                 64
RAM per VM / RAM per host    1 TB / 2 TB
Network                      36 Gb/s
IOPS/VM                      1,000,000
Why Virtualize Hadoop?
‱ Virtual Servers offer advantages over Physical Servers
‱ Standardization on a single common software stack
‱ Higher consistency and reliability due to abstracting the
hardware environment
‱ Operational flexibility with vMotion, Storage vMotion, Live
Cloning, template deployments, hot memory and CPU add,
Distributed Resource Scheduling, private VLANs, Storage and
Network I/O control, etc.
‱ Virtualization is a natural step towards the cloud
‱ Enabling Hadoop as a service in a public or private cloud
‱ Cloud providers are making it easy to deploy Hadoop for POCs,
dev and test environments
‱ Cloud and virtualization vendors are offering elastic MapReduce
solutions
Virtualization Features
Faster provisioning               Live Cloning
Live migrations                   Templates
Live storage migrations           Distributed Resource Scheduling
High Availability                 Hot CPU and Memory add
Live Cloning                      VM Replication
Network isolation using VXLANs    Multi-VM trust zones
VM Backups                        Distributed Power Management
Elasticity                        Multi-tenancy
Storage/Network I/O Control       Private virtual networks
16Gb FC Support                   iSCSI Jumbo Frame Support
Note: Features/functionality dependent on the hypervisor
Hortonworks Data Platform
Building an Enterprise Virtual Platform
[Figure: Hortonworks Data Platform stack running on Hardware, a Hypervisor, and Linux or Windows. Groupings shown: Core Hadoop (kernel), Hadoop Essentials, Data Extraction and Load, Management, Monitoring. Components shown: Distributed Storage (HDFS), Distributed Processing (MapReduce), Hive (Query), Pig (Scripting), HCatalog (Metadata Mgmt), Zookeeper (Coordination), HBase (Column DB), WebHCatalog (REST-like APIs), Ambari (Management), Mahout (Machine Learning), Oozie (Workflow), Ganglia (Monitoring), Nagios (Alerts), Sqoop (DB Transfer), WebHDFS (REST API), FlumeNG (Data Transfer), and “Others” (Talend, Informatica, etc.).]
Virtualizing Hadoop
‱ The primary goal of virtualizing master and slave servers is the same: to maximize operational efficiency and leverage existing hardware.
‱ However, the strategy for virtualizing Hadoop master servers is different from the strategy for virtualizing Hadoop slave servers.
– Hadoop master servers can follow virtualization best practices and guidelines for tier 1 and business critical environments.
– Hadoop slave servers need to follow virtualization best practices and also use Hadoop Virtualization Extensions (HVE) so a Hadoop cluster is “virtual aware”.
Virtualizing Master Servers
‱ Virtualize the master servers (NameNode, JobTracker,
HBase Master, Secondary NameNode)
– Consider any key management servers: Ganglia, Nagios, Ambari,
Active Directory, Metadata databases
‱ Goals of a virtual enterprise Hadoop platform:
– Less downtime (live migrations, cloning, 
)
– A more reliable software stack
– A higher Quality of Service
– Reduced CapEx and OpEx
– Increased operational flexibility with virtualization features
– VMware High Availability (with five clicks)
‱ Shared storage for the Hadoop master servers is required
to fully leverage virtualization features.
Configure Environment Properly
‱ Do not overcommit SLA or production environments
‱ Size virtual machines to avoid entering the host “soft” memory state and the likely breaking of host large pages into small pages. Leaving at least 6% of memory for the hypervisor and VM memory overhead is a conservative guideline.
– If free memory drops below minFree (“soft” memory state),
memory will be reclaimed through ballooning and other memory
management techniques. All these techniques require breaking
host large pages into small pages.
‱ Leverage hyperthreading – make sure there is hardware
and BIOS support
– Hyper Threading – can improve performance up to 20%
‱ Do not set memory limits on production servers.
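The 6% guideline and the no-overcommit rule above can be turned into a quick sizing check. This is a minimal sketch, assuming the reserve is taken as a fraction of total host RAM; the function names and the example host are hypothetical and do not come from any hypervisor API.

```python
from typing import Sequence


def vm_memory_budget_gb(host_ram_gb: float, reserve_fraction: float = 0.06) -> float:
    """Return the RAM (GB) left for VMs after reserving a share for the hypervisor.

    The 0.06 default follows the "at least 6%" guideline; applying it to total
    host RAM is an assumption of this sketch.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return host_ram_gb * (1 - reserve_fraction)


def is_overcommitted(host_ram_gb: float, vm_sizes_gb: Sequence[float],
                     reserve_fraction: float = 0.06) -> bool:
    """True if the configured VM memory exceeds the budget left after the reserve."""
    return sum(vm_sizes_gb) > vm_memory_budget_gb(host_ram_gb, reserve_fraction)


if __name__ == "__main__":
    # Hypothetical host with 256 GB of RAM and four 60 GB VMs.
    print(vm_memory_budget_gb(256))
    print(is_overcommitted(256, [60, 60, 60, 60]))
```

A check like this only covers configured memory; it says nothing about ballooning behavior at run time, which depends on the hypervisor.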
Configure Environment Properly (2)
‱ Run latest version of hypervisor, BIOS and virtual tools
‱ Verify BIOS settings enable all populated processor
sockets and enable all cores in each socket.
‱ Enable “Turbo Boost” in BIOS if processors support it.
‱ Disabling hardware devices (in BIOS) can free interrupt
resources.
– COM and LPT ports, USB controllers, floppy drives, network
interfaces, optical drives, storage controllers, etc
‱ Enable virtualization features in BIOS (VT-x, AMD-V, EPT,
RVI)
‱ Initially leave memory scrubbing rate at manufacturer’s
default setting.
More Best Practices
‱ Configure an OS kernel as a single-core or multi-core
kernel based on the number of vCPUs being used.
‱ Understand how NUMA affects your VMs – try to keep the
VM size within the NUMA node
– Look at disabling node interleaving (leave NUMA enabled)
– Maintain memory locality
‱ Let the hypervisor control power management by setting the BIOS to “OS Controlled Mode”
‱ Enable C1E in BIOS
‱ Have a very good reason for using CPU affinity; otherwise, avoid it like the plague
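The advice to keep a VM within a NUMA node can be checked with simple arithmetic before a VM is sized. The following is a rough sketch, assuming all NUMA nodes on the host are identical; `NumaNode` and `fits_in_numa_node` are hypothetical names, and real hosts should be inspected with the hypervisor's own tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaNode:
    cores: int        # physical cores in one NUMA node
    memory_gb: float  # RAM attached to that node


def fits_in_numa_node(vcpus: int, vm_memory_gb: float, node: NumaNode) -> bool:
    """True if the VM's vCPUs and memory both fit inside a single NUMA node."""
    return vcpus <= node.cores and vm_memory_gb <= node.memory_gb


if __name__ == "__main__":
    # Hypothetical two-socket host: each NUMA node has 8 cores and 64 GB.
    node = NumaNode(cores=8, memory_gb=64)
    print(fits_in_numa_node(8, 48, node))
    print(fits_in_numa_node(12, 48, node))
```

A VM that fails this check will span nodes, so some of its memory accesses become remote and memory locality is lost.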
Linux Best Practices
‱ Resource limits and kernel parameters:
– nofile=16384
– nproc=32000
– Mount with the noatime and nodiratime options (access-time updates disabled)
– File descriptors set to 65535
– File system read-ahead buffer should be increased to 1024 or 2048.
– Epoll file descriptor limit should be increased to 4096
‱ Turn off swapping
‱ Use ext4 or xfs (mount noatime)
– ext4 can be about 5% better on reads than xfs
– xfs can be 12-25% better on writes (and auto defrags in the background)
‱ Linux 2.6.30+ can give 60% better energy efficiency.
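One way to verify the nofile and nproc values on a running node is to compare them against the current process limits. This is a minimal sketch using Python's standard `resource` module on a Unix-like system; mapping the slide's `nofile` and `nproc` to `RLIMIT_NOFILE` and `RLIMIT_NPROC` is an assumption, and the slide also mentions 65535 for file descriptors, so adjust the targets to whichever value you adopt.

```python
import resource

# Targets taken from the list above; the RLIMIT_* mapping is an assumption.
RECOMMENDED = {
    "nofile": (resource.RLIMIT_NOFILE, 16384),
    "nproc": (resource.RLIMIT_NPROC, 32000),
}


def below_target(soft_limit: int, target: int) -> bool:
    """True if a soft limit is finite and lower than the target value."""
    return soft_limit != resource.RLIM_INFINITY and soft_limit < target


def report() -> dict:
    """Map each limit name to True when the current soft limit is too low."""
    result = {}
    for name, (rlimit, target) in RECOMMENDED.items():
        soft, _hard = resource.getrlimit(rlimit)
        result[name] = below_target(soft, target)
    return result


if __name__ == "__main__":
    for name, too_low in report().items():
        print(f"{name}: {'raise it' if too_low else 'ok'}")
```

The limits reported are those of the user running the script, so it should be run as the account that starts the Hadoop daemons.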
Networking Best Practices
‱ Separate VM traffic from live migration and management
traffic
– Separate NICs with separate vSwitches
‱ Leverage NIC teaming (at least 2 NICs per vSwitch)
‱ Leverage latest adapters and drivers from hypervisor
vendor
‱ Be careful with multi-queue networking: Hadoop drives a
high packet rate, but not high enough to justify the
overhead of multi-queue.
‱ Network:
– Channel bonding two GbE ports can give better I/O performance
– 8 Queues per port
Networking Best Practices (2)
‱ Evaluate these features with network adapters to leverage
hardware features:
– Checksum offload
– TCP segmentation offload (TSO)
– Jumbo frames (JF)
– Large receive offload (LRO)
– Ability to handle high-memory DMA (that is, 64-bit DMA
addresses)
– Ability to handle multiple Scatter Gather elements per Tx frame
‱ Optimize 10 Gigabit Ethernet network adapters
– Features like NetQueue can significantly improve performance of
10 Gigabit Ethernet network adapters in virtualized environments.
Storage Best Practices
‱ Make good storage decisions
– e.g. VMFS or Raw Device Mappings (RDM)
– VMDK – leverages all features of virtualization
– RDM – leverages features of storage vendors (replication, snapshots, 
)
– Run in Advanced Host Controller Interface (AHCI) mode.
– Enable Native Command Queuing (NCQ).
‱ Use multiple vSCSI adapters and evenly distribute target devices
‱ Use eagerzeroedthick for VMDK files or uncheck the Windows “Quick Format” option
‱ Make sure there is block alignment for storage
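Block alignment comes down to whether a partition's starting offset is a whole multiple of the underlying block size. A small sketch of that check follows; the 4096-byte default and the function name are assumptions for illustration, and the right block or stripe size depends on the storage array.

```python
def is_aligned(start_offset_bytes: int, block_size_bytes: int = 4096) -> bool:
    """True if a partition's starting offset is a multiple of the block size."""
    if block_size_bytes <= 0:
        raise ValueError("block_size_bytes must be positive")
    return start_offset_bytes % block_size_bytes == 0


if __name__ == "__main__":
    sector = 512  # bytes per sector, assumed for this example
    # A partition starting at sector 63 versus one starting at sector 2048.
    print(is_aligned(63 * sector))
    print(is_aligned(2048 * sector))
```

With these inputs the sector-63 start is misaligned and the sector-2048 start is aligned, which is why a 1 MiB starting offset is a common choice.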
Virtualizing Data Servers
‱ HVE (Hadoop Virtualization Extensions) is a new feature that extends the Hadoop topology awareness mechanism to support rack and node groups with hosts containing VMs.
– Data locality-related policies maintained within a virtual layer
‱ HVE merged into branch-1
– Available in Apache Hadoop 1.2, HDP 1.2
– https://issues.apache.org/jira/browse/HADOOP-8817
‱ Extensions include:
– Block placement and removal policies
– Balancer policies
– Task scheduling
– Network topology awareness
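To see why an extra node-group layer matters for locality decisions, it helps to compute tree distance between two VMs. The sketch below is a simplification for illustration, not Hadoop's implementation; the `/rack/nodegroup/vm` path layout and all names are assumptions.

```python
def distance(path_a: str, path_b: str) -> int:
    """Hops between two leaves, counted through their closest common ancestor."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    if len(a) != len(b):
        raise ValueError("paths must have the same depth")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    print(distance("/rack1/ng1/vm1", "/rack1/ng1/vm1"))  # same VM
    print(distance("/rack1/ng1/vm1", "/rack1/ng1/vm2"))  # same node group
    print(distance("/rack1/ng1/vm1", "/rack1/ng2/vm5"))  # same rack
    print(distance("/rack1/ng1/vm1", "/rack2/ng3/vm9"))  # different rack
```

Without the node-group level, two VMs on the same physical host would look no closer than two VMs elsewhere in the rack, so a scheduler could not prefer the cheaper path.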
HVE: Virtualization Topology Awareness
[Figure: HVE topology tree. A Data Center contains Rack1 and Rack2; below the racks are four node groups (NodeG 1 to NodeG 4) and eight hosts (Host1 to Host8); each host runs several VMs.]
HVE: Replica Policies
Standard Replica Policies
‱ 1st replica is on the local (closest) node of the writer
‱ 2nd replica is on a separate rack from the 1st replica
‱ 3rd replica is on the same rack as the 2nd replica
‱ Remaining replicas are placed randomly across racks to meet the minimum restriction
Extension Replica Policies
‱ Multiple replicas are not placed on the same node or on nodes under the same node group
‱ 1st replica is on the local node or local node group of the writer
‱ 2nd replica is on a remote rack from the 1st replica
Multiple replicas are not placed on the same node with standard or extension replica placement/removal policies. Rules are maintained for the balancer.
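The extension rules above can be expressed as a small validator over a proposed set of replica locations. This is a simplified sketch for illustration only, not the placement policy shipped with Hadoop; `Location` and `satisfies_extension_policy` are hypothetical names, and only two of the rules are checked.

```python
from typing import NamedTuple, Sequence


class Location(NamedTuple):
    rack: str
    node_group: str
    node: str


def satisfies_extension_policy(replicas: Sequence[Location]) -> bool:
    """Check a replica set against two of the node-group rules.

    Rules checked: no two replicas share a node group (which also rules out
    sharing a node), and the 2nd replica is on a different rack from the 1st.
    """
    groups = [(r.rack, r.node_group) for r in replicas]
    if len(set(groups)) != len(groups):
        return False
    if len(replicas) >= 2 and replicas[0].rack == replicas[1].rack:
        return False
    return True


if __name__ == "__main__":
    placement = [
        Location("rack1", "ng1", "vm1"),
        Location("rack2", "ng3", "vm9"),
        Location("rack2", "ng4", "vm13"),
    ]
    print(satisfies_extension_policy(placement))
```

The node-group rule is the one that matters in a virtual cluster: two replicas on different VMs of the same host would both be lost if that host failed.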
Follow Virtualization Best Practices
‱ Hardware: Validate virtualization and Hadoop configurations with vendor hardware compatibility lists.
‱ Hadoop: Follow recommended Hadoop reference architectures.
‱ Storage: Review storage vendor recommendations.
‱ Virtualization: Follow virtualization vendors' best practices, deployment guides and workload characterizations.
‱ Internal: Validate internal guidelines and best practices for configuring and managing corporate VMs.
Benefits of Running Hadoop in a Private Cloud
Elastic Hadoop
‱ Create pool of cluster nodes
‱ On demand cluster scale up/down
Multi-tenant Hadoop
‱ Better isolate workloads and enforce organizational security boundaries
These lead to:
CapEx reduction
‱ Better utilization of physical servers
‱ Cluster ‘timeshare’
‱ Promote responsible usage through chargeback/showback
OpEx reduction
‱ Rapid provisioning & self provisioning
‱ Simplify cluster maintenance
Hortonworks & Rackspace Partnership
‱ Goal:
– Enable Hadoop to run efficiently in OpenStack based public and private cloud environments
‱ Where we stand
– Rackspace public cloud service available soon (Q3 CY13)
– Continued work on enabling Hortonworks Data Platform to run efficiently on Rackspace OpenStack private cloud platform
‱ Project Savannah
– Automate the deployment of Hadoop on enterprise class OpenStack clouds.
Final Thoughts
‱ Virtualization features can provide operational advantages
to a Hadoop cluster.
‱ A lot of companies have expertise in virtualizing tier two/three platforms but not tier one. Be careful of growing pains.
‱ Can your organization handle the jump of moving to
Hadoop and managing an enterprise virtual infrastructure
at the same time?
‱ Give Hadoop Virtualization Extensions (HVE) time to bake.
‱ Organizations are increasing their percentage of virtual
servers and cloud deployments. They do not want to take
a step back into physical servers unless they have to.
Next Steps
Download Hortonworks Sandbox
www.hortonworks.com/sandbox
Download Hortonworks Data Platform
www.hortonworks.com/download
Register for Hadoop Series
www.hortonworks.com/webinars
Hadoop Summit
Architecting the Future of Big Data
‱ June 26-27, 2013 - San Jose Convention Center
‱ Co-hosted by Hortonworks & Yahoo!
‱ Theme: Enabling the Next Generation Enterprise Data Platform
‱ 90+ Sessions and 7 Tracks
‱ Community Focused Event
– Sessions selected by a Conference Committee
– Community Choice allowed public to vote for sessions they want to see
‱ Training classes offered pre event
– Apache Hadoop Essentials: A Technical Understanding for Business Users
– Understanding Microsoft HDInsight and Apache Hadoop
– Developing Solutions with Apache Hadoop – HDFS and MapReduce
– Applying Data Science using Apache Hadoop
hadoopsummit.org
Thank You
For Attending
Best Practices for Virtualizing Hadoop
George Trujillo
Blog: http://cloud-dba-journey.blogspot.com
Twitter: GeorgeTrujillo

Mais conteĂșdo relacionado

Mais procurados

Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHortonworks
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Bikas sahathe next generation of hadoop– hadoop 2 and yarn
Bikas sahathe next generation of hadoop– hadoop 2 and yarnBikas sahathe next generation of hadoop– hadoop 2 and yarn
Bikas sahathe next generation of hadoop– hadoop 2 and yarnhdhappy001
 
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hortonworks
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About YarnCloudera, Inc.
 
Gunther hagleitnerapache hive & stinger
Gunther hagleitnerapache hive & stingerGunther hagleitnerapache hive & stinger
Gunther hagleitnerapache hive & stingerhdhappy001
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveHortonworks
 
Deploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderDeploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderHortonworks
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionCloudera, Inc.
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop SecurityTimothy Spann
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutionssolarisyougood
 
A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 DataWorks Summit
 
Configuring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseConfiguring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseCloudera, Inc.
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 

Mais procurados (20)

Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Bikas sahathe next generation of hadoop– hadoop 2 and yarn
Bikas sahathe next generation of hadoop– hadoop 2 and yarnBikas sahathe next generation of hadoop– hadoop 2 and yarn
Bikas sahathe next generation of hadoop– hadoop 2 and yarn
 
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
Gunther hagleitnerapache hive & stinger
Gunther hagleitnerapache hive & stingerGunther hagleitnerapache hive & stinger
Gunther hagleitnerapache hive & stinger
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
Deploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderDeploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via Slider
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3
 
Configuring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseConfiguring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the Enterprise
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 

Destaque

Hadoop on Virtual Machines
Hadoop on Virtual MachinesHadoop on Virtual Machines
Hadoop on Virtual MachinesRichard McDougall
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? DataWorks Summit
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloudSteve Loughran
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop MapR Technologies
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesDataWorks Summit
 
certificate 100 best graduates
certificate 100 best graduatescertificate 100 best graduates
certificate 100 best graduatesToma Gaidyte
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePete Kisich
 
Deploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data Platform
Deploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data PlatformDeploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data Platform
Deploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data PlatformRackspace
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Cloudera, Inc.
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopDataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Haoyuan Li
 
Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1Anuchit Chalothorn
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...Spark Summit
 
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreLinux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreMark Wong
 
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Spark Summit
 
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanThe Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanGalder Zamarreño
 

Destaque (20)

Hadoop on Virtual Machines
Hadoop on Virtual MachinesHadoop on Virtual Machines
Hadoop on Virtual Machines
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud?
 
Hadoop on VMware
Hadoop on VMwareHadoop on VMware
Hadoop on VMware
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual Machines
 
certificate 100 best graduates
certificate 100 best graduatescertificate 100 best graduates
certificate 100 best graduates
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS Storage
 
Deploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data Platform
Deploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data PlatformDeploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data Platform
Deploy Apache Sparkℱ on Rackspace OnMetalℱ for Cloud Big Data Platform
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing Hadoop
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
 
Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
 
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreLinux Filesystems, RAID, and more
Linux Filesystems, RAID, and more
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
 
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanThe Hot Rod Protocol in Infinispan
The Hot Rod Protocol in Infinispan
 

Semelhante a Best Practices for Virtualizing Apache Hadoop

Whats new in Microsoft Windows Server 2016 Clustering and Storage
Whats new in Microsoft Windows Server 2016 Clustering and StorageWhats new in Microsoft Windows Server 2016 Clustering and Storage
Whats new in Microsoft Windows Server 2016 Clustering and StorageJohn Moran
 
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...Hendrik van Run
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Getting Started with Apache CloudStack
Getting Started with Apache CloudStackGetting Started with Apache CloudStack
Getting Started with Apache CloudStackJoe Brockmeier
 
OpenNebula TechDay Waterloo 2015 - Hyperconvergence and OpenNebula
OpenNebula TechDay Waterloo 2015 - Hyperconvergence  and  OpenNebulaOpenNebula TechDay Waterloo 2015 - Hyperconvergence  and  OpenNebula

Best Practices for Virtualizing Apache Hadoop

  • 1. © Hortonworks Inc. 2013 Best Practices Virtualizing Hadoop George Trujillo
  • 2. George Trujillo
      - Master Principal Big Data Specialist, Hortonworks
      - Tier One Big Data/BCA Specialist, VMware Center of Excellence
      - VMware Certified Instructor (VMware Certified Professional)
      - MySQL Certified DBA
      - Sun Microsystems Ambassador for Java Platforms
      - Author of the Linux Administration and Advanced Linux Administration video training
      - Recognized as an Oracle Double ACE by Oracle Corporation
      - Served on the Oracle Fusion Council, the Oracle Beta Leadership Council, and the Independent Oracle Users Group (IOUG) Board of Directors; recognized as one of the "Oracles of Oracle" by IOUG
  • 3. Agenda
      - Hypervisors today
      - Building an enterprise virtual platform
      - Virtualizing master and slave servers
      - Best practices
      - Deploying Hadoop in public and private clouds
  • 4. Hypervisors Today: Faster/Less Overhead
      - VMware vSphere, Microsoft Hyper-V Server, Citrix XenServer and Red Hat RHEV
      Hypervisor performance benchmarks and overhead:
      - VMware: 1M IOPS with 1 microsecond of latency (vSphere 5.1); 2-10% overhead
      - KVM: 1M transactions/minute (IBM hardware, RHEL); < 10% overhead
      VMware performance figures (vSphere 5.1):
      - vCPUs: 64
      - RAM per VM / RAM per host: 1 TB / 2 TB
      - Network: 36 GB/s
      - IOPS per VM: 1,000,000
  • 5. Why Virtualize Hadoop?
      - Virtual servers offer advantages over physical servers
      - Standardization on a single common software stack
      - Higher consistency and reliability from abstracting the hardware environment
      - Operational flexibility with vMotion, Storage vMotion, live cloning, template deployments, hot memory and CPU add, Distributed Resource Scheduling, private VLANs, Storage and Network I/O Control, etc.
      - Virtualization is a natural step towards the cloud
      - It enables Hadoop as a service in a public or private cloud
      - Cloud providers are making it easy to deploy Hadoop for POCs, dev and test environments
      - Cloud and virtualization vendors are offering elastic MapReduce solutions
  • 6. Virtualization Features
      Faster provisioning; live cloning; live migrations; templates; live storage migrations; Distributed Resource Scheduling; High Availability; hot CPU and memory add; VM replication; network isolation using VXLANs; multi-VM trust zones; VM backups; Distributed Power Management; elasticity; multi-tenancy; Storage/Network I/O Control; private virtual networks; 16Gb FC support; iSCSI jumbo frame support.
      Note: Features and functionality depend on the hypervisor.
  • 7. Building an Enterprise Virtual Platform
      [Diagram: the Hortonworks Data Platform stack running on a hypervisor over hardware, with Linux and Windows as operating systems. Labeled layers: Core Hadoop (kernel), Hadoop Essentials, Data Extraction and Load, Management, Monitoring.]
      Components shown: Distributed Storage (HDFS), Distributed Processing (MapReduce), Hive (Query), Pig (Scripting), HCatalog (Metadata Mgmt), Zookeeper (Coordination), HBase (Column DB), WebHCatalog (REST-like APIs), Ambari (Management), Mahout (Machine Learning), Oozie (Workflow), Ganglia (Monitoring), Nagios (Alerts), Sqoop (DB Transfer), WebHDFS (REST API), FlumeNG (Data Transfer), "Others" (Talend, Informatica, etc.)
  • 8. Virtualizing Hadoop
      - The primary goal of virtualizing master and slave servers is the same: to maximize operational efficiency and leverage existing hardware.
      - However, the strategy for virtualizing Hadoop master servers differs from the strategy for virtualizing Hadoop slave servers.
          - Hadoop master servers can follow virtualization best practices and guidelines for tier-1 and business-critical environments.
          - Hadoop slave servers need to follow virtualization best practices and also use Hadoop Virtual Extensions (HVE) so that a Hadoop cluster is "virtual aware".
  • 9. Virtualizing Master Servers
      - Virtualize the master servers (NameNode, JobTracker, HBase Master, Secondary NameNode)
          - Consider any key management servers: Ganglia, Nagios, Ambari, Active Directory, metadata databases
      - Goals of a virtual enterprise Hadoop platform:
          - Less downtime (live migrations, cloning, ...)
          - A more reliable software stack
          - A higher quality of service
          - Reduced CapEx and OpEx
          - Increased operational flexibility with virtualization features
          - VMware High Availability (with five clicks)
      - Shared storage for the Hadoop master servers is required to fully leverage virtualization features.
  • 10. Configure Environment Properly
      - Do not overcommit SLA or production environments.
      - Size virtual machines so the host does not enter the "soft" memory state, which is likely to break host large pages into small pages. Leaving at least 6% of memory for the hypervisor and VM memory overhead is a conservative guideline.
          - If free memory drops below minFree (the "soft" memory state), memory is reclaimed through ballooning and other memory management techniques. All of these techniques require breaking host large pages into small pages.
      - Leverage hyperthreading, and make sure there is hardware and BIOS support for it.
          - Hyperthreading can improve performance by up to 20%.
      - Do not set memory limits on production servers.
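The sizing guidance on this slide can be turned into a quick arithmetic check. The sketch below is only an illustration: the function names are hypothetical, and the 6% reserve is the slide's conservative figure rather than a value any hypervisor reports.

```python
def vm_memory_budget_gb(host_ram_gb: float, reserve_fraction: float = 0.06) -> float:
    """RAM (GB) left for VMs after reserving a share for the hypervisor and VM overhead."""
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return host_ram_gb * (1 - reserve_fraction)


def is_overcommitted(vm_sizes_gb, host_ram_gb: float, reserve_fraction: float = 0.06) -> bool:
    """True if the configured VM memory exceeds the budget left after the reserve."""
    return sum(vm_sizes_gb) > vm_memory_budget_gb(host_ram_gb, reserve_fraction)


if __name__ == "__main__":
    # Hypothetical host: 256 GB of RAM, four VMs of 64 GB each.
    print(vm_memory_budget_gb(256))
    print(is_overcommitted([64, 64, 64, 64], 256))
```

On a hypothetical 256 GB host, four 64 GB VMs use all of the RAM and leave nothing for the reserve, so the check flags the layout as overcommitted; four 60 GB VMs fit within the budget.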
  • 11. Configure Environment Properly (2)
      - Run the latest version of the hypervisor, BIOS and virtual tools.
      - Verify that BIOS settings enable all populated processor sockets and all cores in each socket.
      - Enable "Turbo Boost" in the BIOS if the processors support it.
      - Disabling unused hardware devices in the BIOS can free interrupt resources.
          - COM and LPT ports, USB controllers, floppy drives, network interfaces, optical drives, storage controllers, etc.
      - Enable virtualization features in the BIOS (VT-x, AMD-V, EPT, RVI).
      - Initially leave the memory scrubbing rate at the manufacturer's default setting.
  • 12. More Best Practices
      - Configure the OS kernel as a single-core or multi-core kernel based on the number of vCPUs being used.
      - Understand how NUMA affects your VMs, and try to keep the VM size within a NUMA node.
          - Look at disabling node interleaving (leave NUMA enabled)
          - Maintain memory locality
      - Let the hypervisor control power management by setting the BIOS to "OS Controlled Mode".
      - Enable C1E in the BIOS.
      - Have a very good reason for using CPU affinity; otherwise avoid it like the plague.
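The advice to keep a VM within a NUMA node amounts to a simple comparison of the VM's size with the node's cores and local memory. The sketch below assumes a hypothetical host description (`NumaNode`) supplied by hand; it does not query any hypervisor or hardware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaNode:
    """Hypothetical description of one NUMA node on a host."""
    cores: int
    memory_gb: float


def fits_in_numa_node(vcpus: int, vm_memory_gb: float, node: NumaNode) -> bool:
    """True if the VM's vCPUs and memory both fit inside a single NUMA node."""
    return vcpus <= node.cores and vm_memory_gb <= node.memory_gb


if __name__ == "__main__":
    # Assumed example: a two-socket host with 8 cores and 128 GB per node.
    node = NumaNode(cores=8, memory_gb=128)
    for vcpus, mem in [(8, 96), (12, 64), (4, 160)]:
        print(vcpus, mem, fits_in_numa_node(vcpus, mem, node))
```

A VM that fails this check spans nodes, so some of its memory accesses are remote, which is the locality loss the slide warns about.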
  • 13. Linux Best Practices
      - Kernel parameters and limits:
          - nofile=16384
          - nproc=32000
          - Mount with the noatime and nodiratime options so that access-time updates are disabled
          - File descriptors set to 65535
          - File system read-ahead buffer should be increased to 1024 or 2048
          - Epoll file descriptor limit should be increased to 4096
      - Turn off swapping
      - Use ext4 or xfs (mounted with noatime)
          - ext4 can be about 5% better on reads than xfs
          - xfs can be 12-25% better on writes (and defragments automatically in the background)
      - Linux 2.6.30+ can improve energy consumption by 60%.
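One way to verify the mount recommendation is to inspect the options column of `/proc/mounts`. The following is a minimal sketch: the helper names and the sample mount points (`/data/1`, `/data/2`) are made up for illustration, and the parser works on text passed in so it can be tried without touching a real system.

```python
def mount_options(proc_mounts_text: str, mount_point: str) -> set:
    """Return the mount options for mount_point from /proc/mounts-style text."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Format: device, mount point, fs type, comma-separated options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    raise KeyError(f"mount point not found: {mount_point}")


def has_noatime(proc_mounts_text: str, mount_point: str) -> bool:
    """True if the file system at mount_point is mounted with noatime."""
    return "noatime" in mount_options(proc_mounts_text, mount_point)


if __name__ == "__main__":
    # Hypothetical sample; on a real host, read the text from /proc/mounts instead.
    sample = (
        "/dev/sdb1 /data/1 ext4 rw,noatime,nodiratime 0 0\n"
        "/dev/sdc1 /data/2 xfs rw,relatime 0 0\n"
    )
    for mp in ("/data/1", "/data/2"):
        print(mp, has_noatime(sample, mp))
```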
  • 14. Networking Best Practices
      - Separate VM traffic from live migration and management traffic
          - Separate NICs with separate vSwitches
      - Leverage NIC teaming (at least 2 NICs per vSwitch)
      - Leverage the latest adapters and drivers from the hypervisor vendor
      - Be careful with multi-queue networking: Hadoop drives a high packet rate, but not high enough to justify the overhead of multi-queue.
      - Network:
          - Channel bonding two GbE ports can give better I/O performance
          - 8 queues per port
  • 15. Networking Best Practices (2) • Evaluate these features with network adapters to leverage hardware features: – Checksum offload – TCP segmentation offload (TSO) – Jumbo frames (JF) – Large receive offload (LRO) – Ability to handle high-memory DMA (that is, 64-bit DMA addresses) – Ability to handle multiple Scatter Gather elements per Tx frame • Optimize 10 Gigabit Ethernet network adapters – Features like NetQueue can significantly improve performance of 10 Gigabit Ethernet network adapters in virtualized environments.
  • 16. Storage Best Practices • Make good storage decisions – i.e. VMFS or Raw Device Mappings (RDM) – VMDK: leverages all features of virtualization – RDM: leverages features of storage vendors (replication, snapshots, …) – Run in Advanced Host Controller Interface (AHCI) mode. – Enable Native Command Queuing (NCQ) • Use multiple vSCSI adapters and evenly distribute target devices • Use eagerzeroedthick for VMDK files or uncheck the Windows “Quick Format” option • Make sure storage is block-aligned
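Block alignment on slide 16 can be checked with a modulo test on a partition's starting offset. The Python sketch below assumes 512-byte sectors and a 1 MiB boundary; both are illustrative defaults, and the right boundary depends on the storage array and file system in use.

```python
def is_aligned(start_sector, sector_bytes=512, boundary_bytes=1024 * 1024):
    """Return True if the partition's starting byte offset is a multiple of the boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


print(is_aligned(2048))  # True: 2048 * 512 bytes is exactly 1 MiB
print(is_aligned(63))    # False: a start at sector 63 is 32,256 bytes in
```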
  • 17. Virtualizing Data Servers • HVE is a new feature that extends the Hadoop topology awareness mechanism to support rack and node groups with hosts containing VMs. – Data locality-related policies maintained within a virtual layer • HVE merged into branch-1 – Available in Apache Hadoop 1.2, HDP 1.2 – https://issues.apache.org/jira/browse/HADOOP-8817 • Extensions include: – Block placement and removal policies – Balancer policies – Task scheduling – Network topology awareness
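To illustrate what an extra node-group layer adds to topology awareness, the Python sketch below measures tree distance between two nodes identified by `/rack/nodegroup/node` paths. This is a generic tree-distance calculation written for illustration, not Hadoop's own implementation; the path layout and all names are assumptions.

```python
def topology_distance(path_a, path_b):
    """Count tree edges between two leaves given as '/rack/nodegroup/node' paths."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Climb from each leaf up to the deepest shared ancestor.
    return (len(a) - common) + (len(b) - common)


print(topology_distance("/rack1/ng1/vm1", "/rack1/ng1/vm1"))  # 0: same node
print(topology_distance("/rack1/ng1/vm1", "/rack1/ng1/vm2"))  # 2: same node group
print(topology_distance("/rack1/ng1/vm1", "/rack1/ng2/vm3"))  # 4: same rack
print(topology_distance("/rack1/ng1/vm1", "/rack2/ng3/vm5"))  # 6: different rack
```

Without the node-group level, two VMs in the same node group and two VMs elsewhere in the same rack would look equally close, which is the gap the extension is meant to close.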
  • 18. HVE: Virtualization Topology Awareness [Diagram: a data center with two racks (Rack1, Rack2), four node groups (NodeG 1 to NodeG 4) and eight hosts (Host1 to Host8), each host running several VMs] • HVE is a new feature that extends the Hadoop topology awareness mechanism to support rack and node groups with hosts containing VMs. – Data locality-related policies maintained within a virtual layer.
  • 19. HVE: Replica Policies • Standard replica policies: – 1st replica is on the local (closest) node of the writer – 2nd replica is on a separate rack from the 1st replica – 3rd replica is on the same rack as the 2nd replica – Remaining replicas are placed randomly across racks to meet the minimum restriction • Extension replica policies: – Multiple replicas are not placed on the same node or on nodes under the same node group – 1st replica is on the local node or local node group of the writer – 2nd replica is on a remote rack from the 1st replica • Multiple replicas are not placed on the same node with standard or extension replica placement/removal policies. Rules are maintained for the balancer.
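The first extension rule on slide 19 (no two replicas on the same node or under the same node group) can be expressed as a small validity check. The Python sketch below covers only that one rule; the tuple layout and all names are hypothetical, and it does not reproduce Hadoop's placement code.

```python
def violates_extension_policy(replicas):
    """Return True if two replicas of a block share a node group.

    replicas: list of (rack, node_group, node) tuples. Two replicas on the
    same node necessarily share a node group, so that case is covered too.
    """
    seen = set()
    for rack, node_group, _node in replicas:
        key = (rack, node_group)
        if key in seen:
            return True
        seen.add(key)
    return False


ok = [("rack1", "ng1", "vm1"), ("rack2", "ng3", "vm9"), ("rack2", "ng4", "vm12")]
bad = [("rack1", "ng1", "vm1"), ("rack1", "ng1", "vm2"), ("rack2", "ng3", "vm9")]

print(violates_extension_policy(ok))   # False
print(violates_extension_policy(bad))  # True: vm1 and vm2 share node group ng1
```

In the `bad` placement the two replicas sit on different VMs, so a check that only compares nodes would accept it even though both copies share one node group.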
  • 20. Follow Virtualization Best Practices • Hardware: Validate virtualization and Hadoop configurations with vendor hardware compatibility lists. • Hadoop: Follow recommended Hadoop reference architectures. • Storage: Review storage vendor recommendations. • Virtualization: Follow virtualization vendors’ best practices, deployment guides and workload characterizations. • Internal: Validate internal guidelines and best practices for configuring and managing corporate VMs.
  • 21. Benefits of Running Hadoop in a Private Cloud • Elastic Hadoop – Create pool of cluster nodes – On-demand cluster scale up/down • Multi-tenant Hadoop – Better isolate workloads and enforce organizational security boundaries • These lead to: • CapEx reduction – Better utilization of physical servers – Cluster ‘timeshare’ – Promote responsible usage through chargeback/showback • OpEx reduction – Rapid provisioning & self provisioning – Simplify cluster maintenance
  • 22. Hortonworks & Rackspace Partnership • Goal: – Enable Hadoop to run efficiently in OpenStack-based public and private cloud environments • Where we stand – Rackspace public cloud service available soon (Q3 CY13) – Continued work on enabling Hortonworks Data Platform to run efficiently on the Rackspace OpenStack private cloud platform • Project Savannah – Automate the deployment of Hadoop on enterprise-class OpenStack clouds.
  • 23. Final Thoughts • Virtualization features can provide operational advantages to a Hadoop cluster. • A lot of companies have expertise in virtualizing tier two/three platforms but not tier one. Be careful of growing pains. • Can your organization handle the jump of moving to Hadoop and managing an enterprise virtual infrastructure at the same time? • Give Hadoop Virtual Extensions time to bake. • Organizations are increasing their percentage of virtual servers and cloud deployments. They do not want to take a step back into physical servers unless they have to.
  • 24. Next Steps • Download Hortonworks Sandbox: www.hortonworks.com/sandbox • Download Hortonworks Data Platform: www.hortonworks.com/download • Register for Hadoop Series: www.hortonworks.com/webinars
  • 25. Hadoop Summit: Architecting the Future of Big Data • June 26-27, 2013, San Jose Convention Center • Co-hosted by Hortonworks & Yahoo! • Theme: Enabling the Next Generation Enterprise Data Platform • 90+ Sessions and 7 Tracks • Community Focused Event – Sessions selected by a Conference Committee – Community Choice allowed public to vote for sessions they want to see • Training classes offered pre-event – Apache Hadoop Essentials: A Technical Understanding for Business Users – Understanding Microsoft HDInsight and Apache Hadoop – Developing Solutions with Apache Hadoop: HDFS and MapReduce – Applying Data Science using Apache Hadoop • hadoopsummit.org
  • 26. Thank You For Attending Best Practices for Virtualizing Hadoop George Trujillo Blog: http://cloud-dba-journey.blogspot.com Twitter: GeorgeTrujillo