SlideShare uma empresa Scribd logo
1 de 4
Baixar para ler offline
Even as virtualization has spread throughout the data center, Apache Hadoop continues to be deployed almost exclusively on
bare-metal physical servers. Processing overhead and I/O latency typically associated with virtualization have prevented big
data architects from virtualizing Hadoop implementations.
As a result, most Hadoop initiatives have been limited in terms of agility, with infrastructure changes such as provisioning a
new server for Hadoop often taking weeks or even months. This infrastructure complexity continues to slow down adoption
in enterprise deployments. Apache Spark is a relatively new big data technology, but interest is growing rapidly; many of these
same deployment challenges apply to on-premises Spark implementations.
The BlueData EPIC (Elastic Private Instant Clusters) software platform addresses these limitations, enabling data center
operators to accelerate Hadoop and Spark implementations on Intel® architecture-based servers.
The BlueData EPIC* software platform offers data center
operators the agility and cost performance of virtualized
infrastructure for big data, with high manageability and flexibility
when integrating into existing data center environments.
Introduction to BlueData EPIC
The BlueData EPIC software platform
reduces the complexity of big data
infrastructure deployments, providing
the ability for end users to quickly and
easily deploy Hadoop or Spark clusters
in a virtualized environment running on
Docker containers. These clusters can
deliver faster time-to-value for big data,
providing the cloud-like experience of
Hadoop-as-a-Service or Spark-as-a-
Service in their own data centers.
The BlueData EPIC platform helps
improve hardware utilization, reduces
cluster sprawl, and minimizes the
need to move data for big data
analytics. BlueData EPIC also provides
for simplified deployment and
administration, while making virtual
clusters look and feel like physical
clusters for big data analytics.
Taking advantage of the power of
containers and virtualization, BlueData’s
software helps deliver greater agility
and cost-efficiency for on-premises
big data infrastructure. The benefits of
these capabilities include the following:
• Business agility. Virtual clusters
can be spun up or down in minutes,
providing elasticity for capacity spikes,
as well as rapid response to emerging
business needs.
• Data protection. Multiple virtual
workloads can co-exist on the same
multi-tenant physical cluster, while
isolating data on each virtual cluster
from the others.
• Resource efficiency. Multiple business
units and user groups can share
physical cluster resources, avoiding
the cost and complexity of each having
its own big data infrastructure.
To meet varying customer needs, the
EPIC software platform is available in
two editions. EPIC Lite is a community
edition of the platform that is available
for a single instance, free of charge; it
is intended for evaluation purposes
and for personal use. EPIC Enterprise
is a fully supported, highly scalable
commercial edition that is available on
a subscription basis for up to hundreds
of physical nodes. For a full comparison
of the two product editions, see
www.bluedata.com/product/comparison.
white paper
BlueData Enables
Virtualization of Enterprise
Hadoop* and Spark* Workloads
Virtualization for Big Data
Intel® Xeon® Processor E5-2600 v3 Product Family
BlueData EPIC Software Architecture
The core components of the EPIC platform—ElasticPlane*, IOBoost*, and
DataTap*—are illustrated in Figure 1 and described below.
ElasticPlane: Virtual Clusters on Demand
ElasticPlane enables spinning up virtual
clusters on demand via self-service in a
secure multi-tenant environment, with
a policy engine for automated QoS and
SLA management. End users can easily
create virtual Hadoop or Spark clusters
with BlueData EPIC ’s ElasticPlane
functionality and self-service interface.
BlueData also provides multi-tenancy
and data isolation to help ensure logical
separation between each group within
the organization.
The solution enables different project
teams or departments across the
enterprise to share the same physical
infrastructure—and access the same
data sources—for their big data
analytics. The platform integrates with
enterprise security and authentication
mechanisms such as LDAP, Active
Directory, and Kerberos*.
IOBoost: Enhanced Performance
IOBoost enhances the I/O performance
and scalability of virtual clusters
with hierarchical data caching and
tiering, plus single-copy data transfer
from physical storage to the virtual
cluster. The IOBoost functionality of
the BlueData EPIC platform provides
application-aware caching and elastic
resource management that adapts
dynamically to changing application
requirements, helping drive up
performance.
Write-dominant workloads in particular
benefit from IOBoost, which takes
advantage of knowing how the
application will access data. BlueData’s
IOBoost technology provides a non-
persistent memory cache, the behavior
of which changes to improve the
efficiency of access to physical storage
devices. IOBoost accesses the external
file system by means of BlueData’s
DataTap file system connector.
Hadoop-as-a-Service or
Spark-as-a-Service
in an On-Premises
Deployment Model
The BlueData EPIC* software
platform gives business users
the ability to set up self-service
virtual Hadoop* or Spark*
clusters without having to submit
requests for scarce IT resources
and then wait for an environment
to be set up for them. Within
minutes, data scientists and
analysts can deploy big data
services and applications to meet
their needs on demand.
The ability to explore, analyze,
and draw insights from data
allows users to seize business
opportunities while they are still
relevant.
• Ad hoc analytics. Identify
emerging trends and
relationships to enhance
decision support.
• “Fail-fast” experimentation.
Try out new approaches to big
data challenges with minimal
investment.
• Rapid response. Spin virtual
clusters up and down fast,
as changing needs and
opportunities dictate.
Figure 1. BlueData EPIC* software architecture.
Marketing
Local HDFS
Remote
HDFS
CEPHNFS Gluster Object
Store
R  D Support Sales
BI/Analytics Tools
Manufacturing
ElasticPlane™ - Self-service, multi-tenant clusters
BlueData EPIC™ Platform
IOBoost™ - Extreme performance and scalability
DataTap™ - In-place access to enterprise data stores
2BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads
Specifically, as the application writes
data to the Hadoop Distributed File
System (HDFS*), IOBoost functions as
a write-behind cache, optimizing the
performance of sequential writes.
DataTap: In-Place Processing of Data
DataTap allows in-place processing of
data, eliminating the need to duplicate
data across Hadoop systems. DataTap
provides HDFS protocol abstraction
that allows big data applications to
run unmodified with fast access to
data sources other than HDFS. With
BlueData EPIC’s DataTap capability,
organizations can access data from any
shared storage system (including HDFS
as well as NFS*, GlusterFS*, CEPH*, and
Swift*) for big data analytics.
That means organizations don’t need to
make multiple copies of data or move
data into HDFS before running their
analysis. Sensitive data can stay in their
secure storage system with enterprise-
grade data governance, without the cost
and risks of creating and maintaining
multiple copies.
DataTap effectively decouples compute
from storage, providing the ability
to independently scale compute and
storage on an as-needed basis. This
approach helps enable more effective
utilization of infrastructure resources
and lower data center operating costs.
Deployment Considerations
and Guidance
BlueData’s software applies patent-
pending innovations to enable
virtualization that is specifically tailored
to the needs of big data. The use
of Docker containers is completely
transparent, but BlueData customers
benefit from greater performance
and deployment flexibility due to
their lightweight nature. This enables
enterprises to quickly and easily deploy
Hadoop or Spark in a lightweight
container environment, running on
either bare-metal physical servers or on
virtual machines.
BlueData EPIC supports Hadoop and
Spark applications without requiring
those applications to be modified in
any way. Likewise, the platform utilizes
the underlying features of the physical
storage devices for data backup,
replication, and high availability, so
it is not necessary for organizations
to modify their existing processes to
facilitate security and durability of
their data.
Synergies with Intel® Architecture
The ability to run on large clusters of
mainstream two-socket servers extends
the cost-performance advantages
of virtualizing big data workloads.
BlueData software running on the Intel®
Xeon® processor E5-2600 v3 product
family is a powerful combination to
overcome key virtualization challenges
such as network latency, infrastructure
security, and power inefficiencies.
Deploying BlueData EPIC environments
on two-socket servers powered by
these processors takes advantage of
the following benefits:
• Improved performance and
virtualization density. With increased
core counts, larger cache, and higher
memory bandwidth, the processor
delivers dramatic improvements over
its predecessors.
• Hardware-based security. Intel®
Platform Protection Technology,
including Intel® Trusted Execution
Technology, Intel® OS Guard, and BIOS
Guard, enhances protection against
malicious attacks.
• Increased power efficiency. Per-
core P states dynamically respond to
changing workloads and adapt power
levels on each individual core, to
deliver better performance per watt
than predecessor platforms.
Beyond the processor platform used
with BlueData deployments, using Intel®
Solid-State Drives (Intel® SSDs) helps
optimize the execution environment at
the system level. For example, the
Intel® SSD Data Center (Intel SSD
DC) P3608 Series1
delivers high
performance and low latency that
help accelerate virtualized Hadoop
workloads, using connectivity based
on the Non-Volatile Memory Express
(NVMe) standard and eight lanes of PCI
Express* (PCIe*) 3.0.
Created by an industry coalition
including Intel, NVMe replaces the older
SATA standard with a new technology
developed specifically to deliver
latency and throughput advantages
for high-speed SSDs and other non-
volatile memory-based storage. The
Intel SSD DC P3608 Series builds on
those capabilities with a unique dual-
controller architecture that improves
scaling across the execution cores of
Intel® Xeon® processors. It is available
in a low-profile PCIe form factor, in
capacities up to 4 TB.
Intel® Ethernet Controllers help
accelerate workloads including
those based on virtualized Hadoop
with purpose-built capabilities for
virtualization, such as intelligent offload
of traffic management to network
hardware. By handling traffic functions
in network silicon, Intel Ethernet
removes the associated burden from
the processor, freeing execution
resources for other work.
Configuration Best Practices
Ongoing experimentation by Intel and
BlueData indicates that the following
suggested guidelines may help data
center operators achieve optimal
throughput on virtualized Hadoop and
Spark workloads using the BlueData
EPIC software platform. While detailed
examination is beyond the scope of
this paper, the following guidance is
particularly relevant to I/O-bound
workloads.
• Configure systems to enhance disk
performance. The performance
of the storage where the files are
stored must be sufficient to avoid a
bottleneck.
3BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads
• Provide sufficient network
throughput. The performance of
the network connectivity between
the EPIC hosts must be sufficient
to avoid bottlenecks.
• Deploy using current-generation
Intel® processors. The architecture
of each generation of Intel Xeon
processors provides advances in terms
of performance and power efficiency,
which can have significant benefits to
Hadoop or Spark workloads running
on BlueData EPIC.
Conclusion
The BlueData EPIC software platform
enables virtualization of Hadoop and
Spark workloads as a viable alternative
to bare-metal implementations.
This breakthrough capability makes
BlueData and Intel Xeon processors key
ingredients in big data deployments for
customers that want to take advantage
of containers and virtualization to
enhance the performance, efficiency,
and agility of their implementations.
As a result, Hadoop or Spark on
BlueData EPIC software and Intel
architecture has become the solution
stack of choice for many big data
initiatives, providing an optimized set
of building blocks to meet emerging
needs faced by businesses of all
types and sizes. Going forward, big
data deployments can be a part of
broader virtualization initiatives,
including taking advantage of ongoing
improvements to Intel Xeon processor-
based platforms.
In particular, companies can provide
Hadoop-as-a-Service or Spark-as-
a-Service for their business users,
empowering them to design and
implement analytics on demand while
leveraging on-premises infrastructure,
governance, and security. These
capabilities allow them to put data to
use more easily as changing business
needs warrant, supporting the evolving
demands of business units for self-
determination while also making IT
more responsive to the needs of the
business as a whole.
Supporting Resources
• Intel blog: Simplify
Big Data Deployment
http://blogs.intel.com/
evangelists/2015/08/25/
simplify-big-data-deployment/
• BlueData blog: New Funding and
Strategic Collaboration with Intel
http://bluedata.com/blog/2015/08/
new-funding-and-strategic-
collaboration-with-intel
For more information, visit intel.com/bigdata and bluedata.com
BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads
	1	
www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-p3608-series.html.
		Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.
No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
		Copyright © 2015 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
		*Other names may be trademarks of their respective owners.
	
	Printed in USA 1115/MW/MESH/PDF Please Recycle 332621-001US

Mais conteúdo relacionado

Mais procurados

Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsAlluxio, Inc.
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSMesosphere Inc.
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...DataWorks Summit
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 Andrey Vykhodtsev
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
Jax Cloud 2016 Microsoft Ignite Recap
Jax Cloud 2016 Microsoft Ignite RecapJax Cloud 2016 Microsoft Ignite Recap
Jax Cloud 2016 Microsoft Ignite RecapBen Stegink
 
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutionssolarisyougood
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure passJason Strate
 
Joint NetApp and Cisco Solutions for SAP: FlexPod and HANA
Joint NetApp and Cisco Solutions for SAP: FlexPod and HANAJoint NetApp and Cisco Solutions for SAP: FlexPod and HANA
Joint NetApp and Cisco Solutions for SAP: FlexPod and HANANetApp
 
Migrating legacy ERP data into Hadoop
Migrating legacy ERP data into HadoopMigrating legacy ERP data into Hadoop
Migrating legacy ERP data into HadoopDataWorks Summit
 
Understanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageUnderstanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageIBM Power Systems
 
EDB Postgres Platform
EDB Postgres PlatformEDB Postgres Platform
EDB Postgres PlatformEDB
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
The hadoop ecosystem table
The hadoop ecosystem tableThe hadoop ecosystem table
The hadoop ecosystem tableMohamed Magdy
 

Mais procurados (20)

Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Hybrid is the New Normal
Hybrid is the New NormalHybrid is the New Normal
Hybrid is the New Normal
 
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
DataOps with Project Amaterasu
DataOps with Project AmaterasuDataOps with Project Amaterasu
DataOps with Project Amaterasu
 
Jax Cloud 2016 Microsoft Ignite Recap
Jax Cloud 2016 Microsoft Ignite RecapJax Cloud 2016 Microsoft Ignite Recap
Jax Cloud 2016 Microsoft Ignite Recap
 
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
 
Joint NetApp and Cisco Solutions for SAP: FlexPod and HANA
Joint NetApp and Cisco Solutions for SAP: FlexPod and HANAJoint NetApp and Cisco Solutions for SAP: FlexPod and HANA
Joint NetApp and Cisco Solutions for SAP: FlexPod and HANA
 
Migrating legacy ERP data into Hadoop
Migrating legacy ERP data into HadoopMigrating legacy ERP data into Hadoop
Migrating legacy ERP data into Hadoop
 
Understanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageUnderstanding the IBM Power Systems Advantage
Understanding the IBM Power Systems Advantage
 
EDB Postgres Platform
EDB Postgres PlatformEDB Postgres Platform
EDB Postgres Platform
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
The hadoop ecosystem table
The hadoop ecosystem tableThe hadoop ecosystem table
The hadoop ecosystem table
 

Destaque

UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"
UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"
UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"CTSI at UCSF
 
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...CTSI at UCSF
 
빅데이터 기본개념
빅데이터 기본개념빅데이터 기본개념
빅데이터 기본개념현주 유
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXBMC Software
 
빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)
빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)
빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)Wonjin Lee
 
Linux Introduction (Commands)
Linux Introduction (Commands)Linux Introduction (Commands)
Linux Introduction (Commands)anandvaidya
 
Linux ppt
Linux pptLinux ppt
Linux pptlincy21
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Introduction to linux ppt
Introduction to linux pptIntroduction to linux ppt
Introduction to linux pptOmi Vichare
 
Linux.ppt
Linux.ppt Linux.ppt
Linux.ppt onu9
 

Destaque (14)

UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"
UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"
UCSF Informatics Day 2014 - David Dobbs, "Enterprise Data Warehouse"
 
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
UCSF Informatics Day 2014 - Doug Berman, "A Brief Tour of UCSF’s Clinical Dat...
 
Hadoop and other animals
Hadoop and other animalsHadoop and other animals
Hadoop and other animals
 
빅데이터 기본개념
빅데이터 기본개념빅데이터 기본개념
빅데이터 기본개념
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Final report ehr1
Final report ehr1Final report ehr1
Final report ehr1
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
 
빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)
빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)
빅데이터의 개념과 이해 그리고 활용사례 (Introduction to big data and use cases)
 
Linux Introduction (Commands)
Linux Introduction (Commands)Linux Introduction (Commands)
Linux Introduction (Commands)
 
Linux ppt
Linux pptLinux ppt
Linux ppt
 
UNIX/Linux training
UNIX/Linux trainingUNIX/Linux training
UNIX/Linux training
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Introduction to linux ppt
Introduction to linux pptIntroduction to linux ppt
Introduction to linux ppt
 
Linux.ppt
Linux.ppt Linux.ppt
Linux.ppt
 

Semelhante a Hadoop Virtualization - Intel White Paper

Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAlluxio, Inc.
 
An Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google CloudAn Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google CloudFibonalabs
 
Cisco Big Data Use Case
Cisco Big Data Use CaseCisco Big Data Use Case
Cisco Big Data Use CaseErni Susanti
 
cisco_bigdata_case_study_1
cisco_bigdata_case_study_1cisco_bigdata_case_study_1
cisco_bigdata_case_study_1Erni Susanti
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData, Inc.
 
Comprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data AnalyticsComprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data AnalyticsSparity1
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresCCG
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_SuiteRobin Fong 方俊强
 
Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358Ory Chhean
 
Cloudwatt pioneers big_data
Cloudwatt pioneers big_dataCloudwatt pioneers big_data
Cloudwatt pioneers big_dataxband
 
Analytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsAnalytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsRay Février
 
Cisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR DistributionCisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR DistributionAppfluent Technology
 
Azure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerAzure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerMicrosoft Tech Community
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Andrew Underwood
 
Open Source DWBI-A Primer
Open Source DWBI-A PrimerOpen Source DWBI-A Primer
Open Source DWBI-A Primerpartha69
 
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEMLEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEMmyteratak
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Conduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminarConduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminarSyed Shaaf
 
Open stack london keynote
Open stack london keynoteOpen stack london keynote
Open stack london keynoteJohn Griffith
 

Semelhante a Hadoop Virtualization - Intel White Paper (20)

BlueData DataSheet
BlueData DataSheetBlueData DataSheet
BlueData DataSheet
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud World
 
An Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google CloudAn Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google Cloud
 
Cisco Big Data Use Case
Cisco Big Data Use CaseCisco Big Data Use Case
Cisco Big Data Use Case
 
cisco_bigdata_case_study_1
cisco_bigdata_case_study_1cisco_bigdata_case_study_1
cisco_bigdata_case_study_1
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)
 
Comprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data AnalyticsComprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data Analytics
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suite
 
Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358
 
Cloudwatt pioneers big_data
Cloudwatt pioneers big_dataCloudwatt pioneers big_data
Cloudwatt pioneers big_data
 
Analytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsAnalytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle Applications
 
Cisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR DistributionCisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR Distribution
 
Azure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerAzure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layer
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016
 
Open Source DWBI-A Primer
Open Source DWBI-A PrimerOpen Source DWBI-A Primer
Open Source DWBI-A Primer
 
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEMLEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Conduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminarConduct JBoss EAP 6 seminar
Conduct JBoss EAP 6 seminar
 
Open stack london keynote
Open stack london keynoteOpen stack london keynote
Open stack london keynote
 

Mais de BlueData, Inc.

Introduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupIntroduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupBlueData, Inc.
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataBlueData, Inc.
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentBlueData, Inc.
 
BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData, Inc.
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBlueData, Inc.
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsBlueData, Inc.
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData, Inc.
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersBlueData, Inc.
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorBlueData, Inc.
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorBlueData, Inc.
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made EasyBlueData, Inc.
 

Mais de BlueData, Inc. (12)

Introduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupIntroduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes Meetup
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big Data
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec Sheet
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline Accelerator
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab Accelerator
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made Easy
 

Último

Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 

Último (20)

Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 

Hadoop Virtualization - Intel White Paper

  • 1. Even as virtualization has spread throughout the data center, Apache Hadoop continues to be deployed almost exclusively on bare-metal physical servers. Processing overhead and I/O latency typically associated with virtualization have prevented big data architects from virtualizing Hadoop implementations. As a result, most Hadoop initiatives have been limited in terms of agility, with infrastructure changes such as provisioning a new server for Hadoop often taking weeks or even months. This infrastructure complexity continues to slow down adoption in enterprise deployments. Apache Spark is a relatively new big data technology, but interest is growing rapidly; many of these same deployment challenges apply to on-premises Spark implementations. The BlueData EPIC (Elastic Private Instant Clusters) software platform addresses these limitations, enabling data center operators to accelerate Hadoop and Spark implementations on Intel® architecture-based servers. The BlueData EPIC* software platform offers data center operators the agility and cost performance of virtualized infrastructure for big data, with high manageability and flexibility when integrating into existing data center environments. Introduction to BlueData EPIC The BlueData EPIC software platform reduces the complexity of big data infrastructure deployments, providing the ability for end users to quickly and easily deploy Hadoop or Spark clusters in a virtualized environment running on Docker containers. These clusters can deliver faster time-to-value for big data, providing the cloud-like experience of Hadoop-as-a-Service or Spark-as-a- Service in their own data centers. The BlueData EPIC platform helps improve hardware utilization, reduces cluster sprawl, and minimizes the need to move data for big data analytics. BlueData EPIC also provides for simplified deployment and administration, while making virtual clusters look and feel like physical clusters for big data analytics. Taking advantage of the power of containers and virtualization, BlueData’s software helps deliver greater agility and cost-efficiency for on-premises big data infrastructure. The benefits of these capabilities include the following: • Business agility. Virtual clusters can be spun up or down in minutes, providing elasticity for capacity spikes, as well as rapid response to emerging business needs. • Data protection. Multiple virtual workloads can co-exist on the same multi-tenant physical cluster, while isolating data on each virtual cluster from the others. • Resource efficiency. Multiple business units and user groups can share physical cluster resources, avoiding the cost and complexity of each having its own big data infrastructure. To meet varying customer needs, the EPIC software platform is available in two editions. EPIC Lite is a community edition of the platform that is available for a single instance, free of charge; it is intended for evaluation purposes and for personal use. EPIC Enterprise is a fully supported, highly scalable commercial edition that is available on a subscription basis for up to hundreds of physical nodes. For a full comparison of the two product editions, see www.bluedata.com/product/comparison. white paper BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads Virtualization for Big Data Intel® Xeon® Processor E5-2600 v3 Product Family
  • 2. BlueData EPIC Software Architecture The core components of the EPIC platform—ElasticPlane*, IOBoost*, and DataTap*—are illustrated in Figure 1 and described below. ElasticPlane: Virtual Clusters on Demand ElasticPlane enables spinning up virtual clusters on demand via self-service in a secure multi-tenant environment, with a policy engine for automated QoS and SLA management. End users can easily create virtual Hadoop or Spark clusters with BlueData EPIC ’s ElasticPlane functionality and self-service interface. BlueData also provides multi-tenancy and data isolation to help ensure logical separation between each group within the organization. The solution enables different project teams or departments across the enterprise to share the same physical infrastructure—and access the same data sources—for their big data analytics. The platform integrates with enterprise security and authentication mechanisms such as LDAP, Active Directory, and Kerberos*. IOBoost: Enhanced Performance IOBoost enhances the I/O performance and scalability of virtual clusters with hierarchical data caching and tiering, plus single-copy data transfer from physical storage to the virtual cluster. The IOBoost functionality of the BlueData EPIC platform provides application-aware caching and elastic resource management that adapts dynamically to changing application requirements, helping drive up performance. Write-dominant workloads in particular benefit from IOBoost, which takes advantage of knowing how the application will access data. BlueData’s IOBoost technology provides a non- persistent memory cache, the behavior of which changes to improve the efficiency of access to physical storage devices. IOBoost accesses the external file system by means of BlueData’s DataTap file system connector. Hadoop-as-a-Service or Spark-as-a-Service in an On-Premises Deployment Model The BlueData EPIC* software platform gives business users the ability to set up self-service virtual Hadoop* or Spark* clusters without having to submit requests for scarce IT resources and then wait for an environment to be set up for them. Within minutes, data scientists and analysts can deploy big data services and applications to meet their needs on demand. The ability to explore, analyze, and draw insights from data allows users to seize business opportunities while they are still relevant. • Ad hoc analytics. Identify emerging trends and relationships to enhance decision support. • “Fail-fast” experimentation. Try out new approaches to big data challenges with minimal investment. • Rapid response. Spin virtual clusters up and down fast, as changing needs and opportunities dictate. Figure 1. BlueData EPIC* software architecture. Marketing Local HDFS Remote HDFS CEPHNFS Gluster Object Store R D Support Sales BI/Analytics Tools Manufacturing ElasticPlane™ - Self-service, multi-tenant clusters BlueData EPIC™ Platform IOBoost™ - Extreme performance and scalability DataTap™ - In-place access to enterprise data stores 2BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads
  • 3. Specifically, as the application writes data to the Hadoop Distributed File System (HDFS*), IOBoost functions as a write-behind cache, optimizing the performance of sequential writes. DataTap: In-Place Processing of Data DataTap allows in-place processing of data, eliminating the need to duplicate data across Hadoop systems. DataTap provides HDFS protocol abstraction that allows big data applications to run unmodified with fast access to data sources other than HDFS. With BlueData EPIC’s DataTap capability, organizations can access data from any shared storage system (including HDFS as well as NFS*, GlusterFS*, CEPH*, and Swift*) for big data analytics. That means organizations don’t need to make multiple copies of data or move data into HDFS before running their analysis. Sensitive data can stay in their secure storage system with enterprise- grade data governance, without the cost and risks of creating and maintaining multiple copies. DataTap effectively decouples compute from storage, providing the ability to independently scale compute and storage on an as-needed basis. This approach helps enable more effective utilization of infrastructure resources and lower data center operating costs. Deployment Considerations and Guidance BlueData’s software applies patent- pending innovations to enable virtualization that is specifically tailored to the needs of big data. The use of Docker containers is completely transparent, but BlueData customers benefit from greater performance and deployment flexibility due to their lightweight nature. This enables enterprises to quickly and easily deploy Hadoop or Spark in a lightweight container environment, running on either bare-metal physical servers or on virtual machines. BlueData EPIC supports Hadoop and Spark applications without requiring those applications to be modified in any way. Likewise, the platform utilizes the underlying features of the physical storage devices for data backup, replication, and high availability, so it is not necessary for organizations to modify their existing processes to facilitate security and durability of their data. Synergies with Intel® Architecture The ability to run on large clusters of mainstream two-socket servers extends the cost-performance advantages of virtualizing big data workloads. BlueData software running on the Intel® Xeon® processor E5-2600 v3 product family is a powerful combination to overcome key virtualization challenges such as network latency, infrastructure security, and power inefficiencies. Deploying BlueData EPIC environments on two-socket servers powered by these processors takes advantage of the following benefits: • Improved performance and virtualization density. With increased core counts, larger cache, and higher memory bandwidth, the processor delivers dramatic improvements over its predecessors. • Hardware-based security. Intel® Platform Protection Technology, including Intel® Trusted Execution Technology, Intel® OS Guard, and BIOS Guard, enhances protection against malicious attacks. • Increased power efficiency. Per- core P states dynamically respond to changing workloads and adapt power levels on each individual core, to deliver better performance per watt than predecessor platforms. Beyond the processor platform used with BlueData deployments, using Intel® Solid-State Drives (Intel® SSDs) helps optimize the execution environment at the system level. For example, the Intel® SSD Data Center (Intel SSD DC) P3608 Series1 delivers high performance and low latency that help accelerate virtualized Hadoop workloads, using connectivity based on the Non-Volatile Memory Express (NVMe) standard and eight lanes of PCI Express* (PCIe*) 3.0. Created by an industry coalition including Intel, NVMe replaces the older SATA standard with a new technology developed specifically to deliver latency and throughput advantages for high-speed SSDs and other non- volatile memory-based storage. The Intel SSD DC P3608 Series builds on those capabilities with a unique dual- controller architecture that improves scaling across the execution cores of Intel® Xeon® processors. It is available in a low-profile PCIe form factor, in capacities up to 4 TB. Intel® Ethernet Controllers help accelerate workloads including those based on virtualized Hadoop with purpose-built capabilities for virtualization, such as intelligent offload of traffic management to network hardware. By handling traffic functions in network silicon, Intel Ethernet removes the associated burden from the processor, freeing execution resources for other work. Configuration Best Practices Ongoing experimentation by Intel and BlueData indicates that the following suggested guidelines may help data center operators achieve optimal throughput on virtualized Hadoop and Spark workloads using the BlueData EPIC software platform. While detailed examination is beyond the scope of this paper, the following guidance is particularly relevant to I/O-bound workloads. • Configure systems to enhance disk performance. The performance of the storage where the files are stored must be sufficient to avoid a bottleneck. 3BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads
  • 4. • Provide sufficient network throughput. The performance of the network connectivity between the EPIC hosts must be sufficient to avoid bottlenecks. • Deploy using current-generation Intel® processors. The architecture of each generation of Intel Xeon processors provides advances in terms of performance and power efficiency, which can have significant benefits to Hadoop or Spark workloads running on BlueData EPIC. Conclusion The BlueData EPIC software platform enables virtualization of Hadoop and Spark workloads as a viable alternative to bare-metal implementations. This breakthrough capability makes BlueData and Intel Xeon processors key ingredients in big data deployments for customers that want to take advantage of containers and virtualization to enhance the performance, efficiency, and agility of their implementations. As a result, Hadoop or Spark on BlueData EPIC software and Intel architecture has become the solution stack of choice for many big data initiatives, providing an optimized set of building blocks to meet emerging needs faced by businesses of all types and sizes. Going forward, big data deployments can be a part of broader virtualization initiatives, including taking advantage of ongoing improvements to Intel Xeon processor- based platforms. In particular, companies can provide Hadoop-as-a-Service or Spark-as- a-Service for their business users, empowering them to design and implement analytics on demand while leveraging on-premises infrastructure, governance, and security. These capabilities allow them to put data to use more easily as changing business needs warrant, supporting the evolving demands of business units for self- determination while also making IT more responsive to the needs of the business as a whole. Supporting Resources • Intel blog: Simplify Big Data Deployment http://blogs.intel.com/ evangelists/2015/08/25/ simplify-big-data-deployment/ • BlueData blog: New Funding and Strategic Collaboration with Intel http://bluedata.com/blog/2015/08/ new-funding-and-strategic- collaboration-with-intel For more information, visit intel.com/bigdata and bluedata.com BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads 1 www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-p3608-series.html. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. Copyright © 2015 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries. *Other names may be trademarks of their respective owners. Printed in USA 1115/MW/MESH/PDF Please Recycle 332621-001US