White Paper




Cisco and Greenplum
Partner to Deliver
High-Performance
Hadoop Reference
Configurations




September 2012



Contents

Next-Generation Hadoop Solution
Greenplum MR: Hadoop Reengineered
Cisco UCS: The Exclusive Platform for Greenplum MR
Reference Configurations
Excellence from Cisco and Greenplum
   Complete Big Data Analysis Solution
   Designed for High Availability and Reliability
   High Performance and Exceptional Scalability
   Simplified Management
   Coexistence with Enterprise Applications
   Rapid Deployment
   Enterprise Service and Support
For More Information




© 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information.                                                            Page 2




Greenplum MR on Cisco UCS provides companies with an integrated Hadoop solution that delivers advanced performance, full data protection, no single point of failure, and improved data-access features that can expedite the implementation of big data analytics environments.

Highlights

Optimized for Performance
• Greenplum and Cisco deliver an integrated Hadoop solution specifically engineered to handle the most demanding Hadoop workloads.

Cisco UCS Creates a Flexible Appliance Platform
• The Cisco Unified Computing System™ (Cisco UCS®) provides a flexible, high-performance platform that can be optimized and easily scaled for any size of Hadoop cluster.

Ease of Management
• Cisco UCS Manager provides unified, embedded management of all computing, networking, and storage-access resources.

Choice of Configurations
• The solution provides a choice of Cisco UCS configurations, letting organizations select performance and capacity as their needs dictate.

Enterprise-Class Support and Services
• The Greenplum MR on Cisco UCS Reference Configurations combine the support and services of two of the world’s largest technology companies.

Next-Generation Hadoop Solution

The worldwide leader in data center networking, and now a leading competitor in the server market, Cisco is partnering with Greenplum to provide best-in-class big data solutions that meet a range of needs. The Greenplum MR on Cisco UCS® Reference Configurations deliver integrated, end-to-end software and hardware infrastructure that accelerates big data initiatives with a choice of performance and capacity. The combination of world-leading Cisco Unified Computing System™ (Cisco UCS) and Greenplum MR enables companies to significantly reduce time-to-value and the operating expenses associated with Apache Hadoop implementations.

Greenplum MR: Hadoop Reengineered

Greenplum MR, based on the MapR M5 Distribution, is an implementation of the Apache Hadoop stack that enables near-real-time collection and organization of high volumes of structured, semistructured, and unstructured data distributed across a cluster of servers. Greenplum MR provides direct data input and output to the cluster with MapR Direct Access Network File System (NFS), offers real-time analytics, and is the first distribution to provide true high availability at all levels. Greenplum MR introduces the concept of logical volumes to Hadoop: a means of grouping data and applying policy consistently across an entire data set. Greenplum MR provides hardware status information and control with the MapR Control System, a comprehensive user interface that includes a heatmap that displays the health of the entire cluster at a glance.
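
Because Direct Access NFS presents the cluster as a network file system, a client that has mounted it can, in principle, read and write cluster data with ordinary file operations rather than a Hadoop-specific API. The following is a minimal Python sketch of that idea, not a Greenplum MR or MapR interface: the `landing/events.csv` path, the field names, and both helper functions are hypothetical, and a temporary directory stands in for the mount point.

```python
import csv
import tempfile
from pathlib import Path


def write_events(mount_root: Path, rows: list[dict]) -> Path:
    """Write rows as CSV beneath the mount point using ordinary file I/O."""
    # Hypothetical directory layout; nothing here is specific to MapR.
    target = mount_root / "landing" / "events.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["user", "action"])
        writer.writeheader()
        writer.writerows(rows)
    return target


def read_events(path: Path) -> list[dict]:
    """Read the same file back with the standard csv module."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the NFS mount point.
    # On a real client this would be wherever the cluster export is mounted.
    with tempfile.TemporaryDirectory() as stand_in_mount:
        path = write_events(Path(stand_in_mount), [{"user": "u1", "action": "login"}])
        print(read_events(path))
```

The point of the sketch is only that no special client library appears in the code: any tool that can write to a mounted file system could feed data to the cluster in the same way.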






                                                     Cisco UCS and Greenplum MR can help businesses manage many different big data
                                                     scenarios. The examples in Table 1 show how the Greenplum MR on Cisco UCS
                                                     Reference Configurations can accelerate big data initiatives.




Table 1. Sample Use Cases for Cisco UCS and Greenplum MR

  Scenario                                  Cisco Greenplum MR Reference Configuration Capabilities


  Content management                        Collect and store unstructured and semi-structured data in a fault-resilient, scalable data store that can be
                                            organized and sorted for indexing and analysis.


  Batch processing unstructured data        Batch-process large quantities of unstructured and semi-structured data: for example, data warehousing extract,
                                            transform, and load (ETL) processing.


  Medium-term data archive                  Archive data (medium-term, 12 to 36 months) from an enterprise data warehouse (EDW) database management
                                            system (DBMS) to increase the length of time that data is retained or to meet data retention and compliance
                                            policies.


  Integration with data warehouse           Transfer data stored in Hadoop to and from a separate DBMS for advanced analytics.


  Customer risk analysis                    Perform a comprehensive data assessment of customer-side risk, based on activity and behavior across products
                                            and accounts.


  Personalization and asset management      Create and model investor strategy and goals based on market data, individual asset characteristics, and reports
                                            entered into an online recommendation system.


  Trade analytics                           Analyze historical volume and trading data for individual stock symbols, variable cost of trades, and allocation of
                                            expenses.


  Credit scoring                            Update credit-scoring models using cross-functional transaction data and recent outcomes, to respond to changes
                                            such as the collapse of bubble markets. Sweep recent credit history to build transactional and temporal models.


  Retailer compromise                       Prevent or catch fraud resulting from a breach of retailer cards or accounts by monitoring, modeling, and analyzing
                                            high volumes of transaction data and extracting features and patterns.


  Miscategorized credit card fraud          Reduce false positives and prevent miscategorization of legitimate transactions as fraud, using high volumes of data
                                            to build good models.


  Next-generation credit card fraud         Perform daily cross-sectional analysis of portfolio using transaction similarities to find accounts that are being
                                            cultivated for eventual fraud, using common application elements, temporal patterns, vendors, and transaction
                                            amounts to detect similar accounts before the fraud is perpetrated.


  Customer retention                        Combine transactional customer contact information and social network data to perform attrition modeling to learn
                                            social and transaction markers for attrition and retention.


  Sentiment analysis (opinion mining) and   Find better indicators to predict bankruptcy among existing customers using sentiment analysis from social
  bankruptcy                                networking, responding quickly before the warning horizon.
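
To make the batch-processing row of Table 1 more concrete, the following is a small extract, transform, and load (ETL) sketch in Python. It is an illustration only and runs on a single machine with the standard library, not on Hadoop: the sample records, the `txn` table, and all function names are assumptions, and an in-memory SQLite database stands in for the target data warehouse.

```python
import json
import sqlite3

# Hypothetical semi-structured input: one JSON document per line.
RAW_LINES = [
    '{"account": "A-1", "amount": "19.99", "channel": "web"}',
    '{"account": "A-2", "amount": "5"}',  # optional field missing
    'not valid json',                     # malformed record
]


def extract(lines):
    """Yield parsed records, skipping lines that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(record):
    """Normalize one record into a fixed (account, amount, channel) row."""
    return (
        record["account"],
        float(record["amount"]),
        record.get("channel", "unknown"),
    )


def load(rows, connection):
    """Insert the normalized rows into the stand-in warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS txn (account TEXT, amount REAL, channel TEXT)"
    )
    connection.executemany("INSERT INTO txn VALUES (?, ?, ?)", rows)
    connection.commit()


def run_etl(lines, connection):
    """Run extract, transform, and load, then return (row count, total amount)."""
    load([transform(record) for record in extract(lines)], connection)
    return connection.execute("SELECT COUNT(*), SUM(amount) FROM txn").fetchone()


if __name__ == "__main__":
    print(run_etl(RAW_LINES, sqlite3.connect(":memory:")))
```

In a cluster setting the extract and transform steps would typically be distributed across nodes, while the overall shape of the pipeline stays the same.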






Greenplum MR

• Complete enterprise-class solution
• Elimination of the common problems experienced with HDFS
• Direct access through NFS
• High availability through JobTracker enhancements and a no-NameNode architecture
• From two to five times faster performance than other Hadoop distributions
• Advanced management for clusters of all sizes
• Robust data protection features, including snapshots and intercluster mirroring
• Comprehensive network of enterprise business intelligence tools

As shown in Figure 1, the major components of Greenplum MR provide a Hadoop solution that is easy, dependable, and fast, with components that include:

• Advanced storage services: A replacement for the Hadoop Distributed File System (HDFS), MapR advanced storage services provide multidimensional scalability and accelerate MapReduce performance. The services allow random read and write operations while automatically compressing data in real time.
• MapR Heatmap: MapR Heatmap provides visibility, access, and tools that offer insight into the state of the cluster. Graphical and programmatic interfaces are designed to scale with the largest clusters.
• MapR Control System: MapR Control System provides real-time monitoring of the cluster health, including alarms to notify you of conditions that need to be corrected. Alarms can also be configured to trigger email notifications.
• MapReduce: Part of the Apache Hadoop framework, MapReduce simplifies the creation of applications that process large amounts of unstructured and structured data in parallel. Underlying hardware failures are handled transparently for user applications, providing a reliable and fault-tolerant capability.
• Hive: Hive is a data warehouse system for Hadoop that facilitates data summarization, impromptu queries, and analysis of large data sets. This SQL-like interface increases the compression of stored data for improved storage resource utilization without affecting access speed.
• Pig: Pig is a high-level procedural language for processing data sets in parallel using the Hadoop MapReduce platform. Its intuitive syntax simplifies the development of MapReduce jobs, providing an alternative programming language to Java.
• HBase: HBase is the distributed, versioned, column-oriented database that delivers random, real-time, read-write access to big data.
• ZooKeeper: ZooKeeper is a highly available system for coordinating distributed processes. Applications use ZooKeeper to store and mediate updates to important configuration information.

Figure 1. Greenplum MR Architecture and Components
[Figure: Greenplum MR for Apache Hadoop. MapR Control System: MapR Heatmap; LDAP and NIS integration; quotas, alerts, and alarms; CLI and REST API. Ecosystem components: Hive, Pig, Oozie, Sqoop, HBase, Whirr, HCatalog, Mahout, Cascading, Nagios integration, Ganglia integration, Flume, ZooKeeper. Easy, dependable, fast: Direct Access NFS, real-time streaming, volumes, mirrors, snapshots, data placement. MapR Storage Services: no-NameNode architecture, high-performance direct shuffle, stateful failover and self-healing.]
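
The MapReduce programming model named in the component list can be illustrated with a word count, the customary introductory example. The sketch below runs in a single Python process and only imitates the map, shuffle, and reduce phases; it does not use Hadoop or Greenplum MR, and all function names are chosen here for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over an iterable of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle, which is why the application code only needs to express the two per-record functions.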

Cisco UCS: The Exclusive Platform for Greenplum MR

Validated through an extensive testing and development process at Greenplum and Cisco, Cisco UCS is the exclusive hardware platform for Greenplum MR. Cisco UCS innovations combine industry-standard, x86-architecture servers with networking and storage access into a single converged system (Figure 2). The system is entirely programmable using unified, model-based management to simplify and accelerate the deployment of enterprise-class applications and services running in bare-metal, virtualized, and cloud-computing environments.
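
The idea behind model-based management is that the desired configuration is described once as data and then applied uniformly to many servers. The following Python sketch illustrates that general pattern only. It is not the Cisco UCS Manager API: the `NodeModel` and `Server` classes, their fields, and the `apply_model` function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeModel:
    """Hypothetical desired-state model for one class of cluster node."""
    role: str
    boot_order: tuple
    vlan: int


@dataclass
class Server:
    """Hypothetical managed server holding its currently applied settings."""
    name: str
    settings: dict = field(default_factory=dict)


def apply_model(model: NodeModel, servers: list) -> list:
    """Align each server with the model and return the names that changed."""
    desired = {"role": model.role, "boot_order": model.boot_order, "vlan": model.vlan}
    changed = []
    for server in servers:
        if server.settings != desired:
            server.settings = dict(desired)
            changed.append(server.name)
    return changed


if __name__ == "__main__":
    nodes = [Server("node-1"), Server("node-2")]
    model = NodeModel(role="data-node", boot_order=("pxe", "disk"), vlan=100)
    print(apply_model(model, nodes))  # first pass configures every node
    print(apply_model(model, nodes))  # second pass finds nothing to change
```

Applying the same model twice changes nothing on the second pass, which is the property that makes adding nodes to a cluster repeatable: a new server only needs to be associated with an existing model.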



[Figure: Cisco Unified Computing System Components]

Cisco UCS Fabric Interconnects (Cisco UCS 6200 Series: Cisco UCS 6248UP or Cisco UCS 6296UP, with embedded Cisco UCS Manager)
• Integrate all components into a single management domain
• Up to 2 matching interconnects per system
• Low-latency, lossless, 10 GE and FCoE connectivity
• Uplinks to data center networks: 10 GE, native Fibre Channel, or FCoE
• Embedded unified, model-based management

Cisco Fabric Extenders (data and management planes)
• Distribute the unified fabric to blade and rack servers
• Scale data and management planes without complexity
• Cisco UCS 2208XP: Up to 160 Gbps per blade chassis (with Cisco UCS 6200 Series Fabric Interconnects)
• Cisco UCS 2104XP: Up to 80 Gbps per blade chassis
• Cisco Nexus 2232PP: Integrates rack-mount servers into the system

Cisco UCS 5108 Blade Chassis
• Accommodates up to 8 half-width blade servers or 4 full-width blade servers
• Accommodates up to two fabric extenders for connectivity and management
• Straight-through airflow, 92% efficient power supplies, N+1, and grid redundancy

Servers (scale out): Cisco UCS B-Series Blade Servers (including UCS B22 M3) and Cisco UCS C-Series Rack Servers (including UCS C24 M3)
                                                                                                                                                                                                                                                                                                          C22 M3




                                                Cisco UCS B22 M3                                                                                                                                                                                               Web 2.0           Cisco UCS C22 M3                             Cisco UCS C24 M3


                                                                                                                                                                                                                                                                                                                                                           Cisco UCS Servers
                     •                                                                                                                                                                                                                                                                                                                                     Powered exclusively by Intel Xeon processors
                                                                                                                                                                                                                                                                                                                                                           World-record-setting performance
                                                                                                                                                                                                                                                   CONSOLE




                  Cisco UCS B200 M3                                                                                                                        Cisco UCS B420 M3                                                                                                                                                                               Comprehensive product line for ease of matching
                                                                                                                                                                                                                                                               Enterprise
                                                                                                                                                                                                                                                                                                                    CONSOLE




                                                                                                                                                                                                                                                                                                                                                              servers to workloads
                                                                                                                                                                                                                                                                                   CONSOLE




                                                                                                                                                                                                                                                               - Class -
                                                                                                                                                                                                                                                                                 Cisco UCS C220 M3                            Cisco UCS C240 M3            Every aspect of identity, personality, and connectivity
                                        Cisco UCS B250 M2                                                                                                                                                                                                                                                                                                     configured by Cisco UCS Manager




                                                                                                                                                                                                                                                                 Mission
                         1   2




                                                                                                                                                                                                                                                               - Critical -
                    Cisco UCS B230 M2                                                                                                                Cisco UCS B440 M2                                                                                                           Cisco UCS C260 M2                            Cisco UCS C460 M2




                   Figure 2. Cisco UCS Components






                                                    Cisco UCS helps organizations gain more than efficiency: it helps them become
                                                    more effective through technologies that foster simplicity rather than complexity.
                                                    The result is a flexible, agile, and high-performance platform that reduces operating
                                                    costs with increased uptime through automation and enables more rapid return on
                                                    investment (ROI).

   Cisco UCS
   The Cisco Unified Computing System delivers a radical simplification of
   traditional architecture with the first self-aware, self-integrating,
   converged system that automates system configuration in a reproducible,
   scalable manner.
   • More than 60 world records on critical benchmarks
   • The benefits of centralized computing, through a single point of
     management, delivered to massive scale-out applications
   • Self-aware and self-integrating system
   • Automatic server provisioning through association of models with
     system resources
   • Standards-based, high-bandwidth, low-latency, lossless Ethernet network

                                                    Reference Configurations
                                                    The Greenplum MR on Cisco UCS Reference Configurations are based on Cisco’s
                                                    big data common platform architecture (CPA), a highly scalable architecture
                                                    designed to meet a variety of scale-out application demands with transparent data
                                                    integration and management integration capabilities using the following components:

                                                    • Cisco UCS 6200 Series Fabric Interconnects: The Cisco UCS 6200 Series
                                                       Fabric Interconnects are a core part of Cisco UCS, providing both network
                                                       connectivity and management capabilities across Cisco UCS 5100 Series Blade
                                                       Server Chassis and Cisco UCS C-Series Rack Servers. Deployed in redundant
                                                       pairs, the fabric interconnects offer line-rate, low-latency, lossless 10 Gigabit
                                                       Ethernet connectivity and unified management with Cisco UCS Manager in a
                                                       highly available management domain.
                                                    • Cisco UCS 2200 Series Fabric Extenders: Cisco UCS 2200 Series Fabric
                                                       Extenders behave like remote line cards for a parent switch and provide a highly
                                                       scalable and extremely cost-effective unified server-access platform.
                                                    •	 Cisco UCS rack servers: Specific models are used to support the base, high-
                                                       performance, and high-capacity configurations:
                                                       -- Cisco UCS C210 M2 General-Purpose Rack Server: Cisco UCS C210 M2
                                                          servers are general-purpose 2-socket platforms based on the Intel® Xeon®
                                                          processor 5600 series. These servers support up to 192 GB of main memory
                                                          and 16 internal front-accessible, hot-swappable, Small Form Factor (SFF) disk
                                                          drives, with a choice of one or two RAID controllers for data performance and
                                                          protection.
                                                       -- Cisco UCS C240 M3 Rack Server: Cisco UCS C240 M3 Servers are designed
                                                          for both performance and expandability over a wide range of storage-
                                                          intensive infrastructure workloads. Each server provides sockets for up to two
                                                          processors from the Intel Xeon processor E5-2600 product family and up to
                                                          768 GB of main memory. Up to 24 SFF or 12 Large Form Factor (LFF) disk
                                                          drives are supported, along with four Gigabit Ethernet LAN-on-motherboard
                                                          (LOM) ports.
                                                       -- Cisco UCS virtual interface cards (VICs): Unique to Cisco, Cisco UCS VICs
                                                          incorporate next-generation converged network adapter (CNA) technology
                                                          from Cisco and offer dual 10-Gbps ports designed for use with Cisco UCS
                                                          C-Series Rack Servers. Optimized for virtualized networking, these cards
                                                          deliver high performance and bandwidth utilization and support up to 256
                                                          virtual devices.






                                                    •	 Cisco UCS Manager: Cisco UCS Manager resides in the Cisco UCS 6200
                                                       Series Fabric Interconnects. It makes the system self-aware and self-integrating,
                                                       managing all the system components as a single logical entity. Cisco UCS
                                                       Manager can be accessed through an intuitive GUI, a command-line interface
                                                       (CLI), or an XML API. Cisco UCS Manager uses service profiles to define the
                                                       personality, configuration, and connectivity of all resources within Cisco UCS,
                                                       radically simplifying provisioning of resources so that the process takes minutes
                                                       instead of days. This simplification allows IT departments to shift their focus
                                                       from constant maintenance to strategic business initiatives. It also provides the
                                                       most streamlined, simplified approach commercially available today to firmware
                                                       updating for all server components.

                                                    Organizations deploying Hadoop clusters have different needs depending on
                                                    the nature of their computational applications and storage requirements. Cisco
                                                    understands these important distinctions and has structured its Hadoop reference
                                                    configurations to accommodate a range of diverse needs:

                                                    •	 Base reference configuration: Cisco’s base reference configuration is designed
                                                       for organizations that require balanced computing and storage capacity. The
                                                       reference configuration uses Cisco UCS C210 M2 servers, each with two Intel®
                                                       Xeon® processors X5675, 96 GB of memory, and 16 1-TB 7,200-rpm SATA disk
                                                       drives (Figure 3).




                                                      • 2 Intel Xeon Processors X5675
                                                      • 96 GB Memory
                                                      • 1 LSI 6G MegaRAID 9261-8i Card
                                                      • 16 1-TB 7200 RPM SFF SATA Disk Drives
                                                      • Redundant Hot-Swappable Power Supplies and Fans
                                                      • 1 Cisco UCS P81E VIC (2x 10 Gbps)
                                                      • Embedded Cisco IMC (2x 1 Gbps)
                                                      • 2 Integrated Gigabit Ethernet Ports
                                                      • Red Hat Enterprise Linux Server Standard




                                                       Figure 3. Cisco UCS B210 M2 Cluster Node for Base Configuration




                                                    •	 High-performance reference configuration: Many organizations need to optimize
                                                       computing and memory performance in their Hadoop clusters, so Cisco’s high-
                                                       performance reference configuration uses Cisco UCS C240 M3 servers, each
                                                       with two Intel Xeon processors E5-2665, 256 GB of memory, and 24 1-TB
                                                       7,200-rpm SFF SATA disk drives (Figure 4).
                                                    • High-capacity reference configuration: For organizations that require
                                                       abundant storage capacity for the Hadoop cluster, the high-capacity reference
                                                       configuration uses Cisco UCS C240 M3 servers, each with two Intel Xeon
                                                       processors E5-2640, 128 GB of memory, and 12 3-TB 7,200-rpm SAS disk
                                                       drives (Figure 4).








                                                      Common to both configurations:
                                                      • 2 Intel Xeon Processors E5 Family
                                                      • Cisco UCS VIC 1225 (2x 10 Gbps)
                                                      • Embedded Cisco IMC (2x 1 Gbps)
                                                      • LSI MegaRAID SAS 9226CV-8i Card
                                                      • 4 Integrated Gigabit Ethernet Ports
                                                      • Red Hat Enterprise Linux Server Standard
                                                      • Redundant Hot-Swappable Power Supplies

                                                      High-Performance Configuration (shown):
                                                      • 256 GB Memory
                                                      • 24 1-TB SATA 7200 RPM SFF Disk Drives

                                                      High-Capacity Configuration (not shown):
                                                      • 128 GB Memory
                                                      • 12 3-TB SAS 7200 RPM LFF Disk Drives

                                                       Figure 4. Cisco UCS C240 M3 Cluster Node for High Performance and High Capacity




                                                    Table 2 summarizes the capabilities of the three reference configurations.

                                                    Table 2. Base, High-Performance, and High-Capacity Reference Configurations

                                                       Component                          Base                  High-Performance      High-Capacity
                                                                                          Configuration         Configuration         Configuration

                                                       Server-level capacity
                                                         Cisco UCS servers                Cisco UCS C210 M2     Cisco UCS C240 M3     Cisco UCS C240 M3
                                                         Processor                        Intel Xeon X5675      Intel Xeon E5-2665    Intel Xeon E5-2640
                                                         Memory                           96 GB                 256 GB                128 GB
                                                         Storage                          16 1-TB SFF           24 1-TB SFF           12 3-TB LFF
                                                                                          7.2K-RPM SATA         7.2K-RPM SATA         7.2K-RPM SAS

                                                       Rack-level capacity (16 servers)
                                                         Processor cores and threads      192 cores and         256 cores and         192 cores and
                                                                                          384 threads           512 threads           384 threads
                                                         Memory                           1536 GB               4096 GB               2048 GB
                                                         Typical user storage capacity    320 terabytes (TB)    480 TB                720 TB
                                                         (3-way replication, compressed)
                                                         I/O bandwidth                    21 GBps               32 GBps               17 GBps

                                                       Cisco UCS domain-level capacity (10 racks)
                                                         Processor cores and threads      1920 cores and        2560 cores and        1920 cores and
                                                                                          3840 threads          5120 threads          3840 threads
                                                         Typical user storage capacity    3.2 petabytes (PB)    4.8 PB                7.2 PB
                                                         (3-way replication, compressed)
                                                         I/O bandwidth                    210 GBps              320 GBps              170 GBps
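
The rack-level processor and memory figures in Table 2 follow from the per-server specifications by simple multiplication. The sketch below reproduces that arithmetic. The cores-per-socket values (6, 8, and 6) are assumptions inferred from the rack-level core counts in the table, and the names `NodeSpec` and `rack_totals` are illustrative only. The raw storage total it computes is not the "typical user storage capacity" from the table, which also depends on replication and compression and is not modeled here.

```python
from dataclasses import dataclass

SERVERS_PER_RACK = 16   # from Table 2
RACKS_PER_DOMAIN = 10   # from Table 2


@dataclass(frozen=True)
class NodeSpec:
    """Per-server figures for one reference configuration (illustrative type)."""
    name: str
    sockets: int
    cores_per_socket: int   # assumption: inferred from the rack-level core counts
    memory_gb: int
    drives: int
    drive_tb: int


def rack_totals(node: NodeSpec, servers: int = SERVERS_PER_RACK) -> dict:
    """Multiply per-server figures out to a rack of `servers` nodes."""
    cores = node.sockets * node.cores_per_socket * servers
    return {
        "cores": cores,
        "threads": cores * 2,  # Table 2 lists two threads per core
        "memory_gb": node.memory_gb * servers,
        "raw_storage_tb": node.drives * node.drive_tb * servers,
    }


BASE = NodeSpec("base", 2, 6, 96, 16, 1)
HIGH_PERFORMANCE = NodeSpec("high-performance", 2, 8, 256, 24, 1)
HIGH_CAPACITY = NodeSpec("high-capacity", 2, 6, 128, 12, 3)

if __name__ == "__main__":
    for node in (BASE, HIGH_PERFORMANCE, HIGH_CAPACITY):
        rack = rack_totals(node)
        domain_cores = rack["cores"] * RACKS_PER_DOMAIN
        print(node.name, rack, "domain cores:", domain_cores)
```

Scaling a rack total by `RACKS_PER_DOMAIN` gives the domain-level core and thread counts listed in the table.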




                                                    The Greenplum MR on Cisco UCS Reference Configurations come in single- and
                                                    multiple-rack form factors. The single-rack configuration consists of two fully
                                                    redundant Cisco UCS 6248UP 48-Port Fabric Interconnects (for up to five racks)
                                                    or Cisco UCS 6296UP 96-port Fabric Interconnects (for up to 10 racks) along
                                                    with two Cisco Nexus® 2232PP 10GE Fabric Extenders (Figure 5). Each node in
                                                    the configuration connects to the unified fabric through two active-active 10-Gbps
                                                    links using a Cisco UCS VIC (for data traffic) and Cisco Integrated Management
                                                    Controller (IMC; for management traffic). Multiple-rack configurations include the
                                                    components for a single rack and two Cisco Nexus 2232PP fabric extenders for
                                                    every additional rack.
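
The component counts described above can be tallied for a given number of racks. The following is a minimal sketch under stated assumptions: the function name `fabric_bill_of_materials` is hypothetical, the 16 servers per rack come from the reference configurations, and choosing the 6248UP whenever five or fewer racks are planned is one possible reading of the guidance (the 6296UP could also be chosen for smaller clusters that expect to grow).

```python
def fabric_bill_of_materials(racks: int) -> dict:
    """Tally fabric components for a cluster of `racks` racks.

    Assumption: the smaller fabric interconnect model is chosen whenever
    it covers the requested number of racks.
    """
    if not 1 <= racks <= 10:
        raise ValueError("the reference configurations cover 1 to 10 racks")
    model = "Cisco UCS 6248UP" if racks <= 5 else "Cisco UCS 6296UP"
    return {
        "fabric_interconnect_model": model,
        "fabric_interconnects": 2,      # one redundant pair per cluster
        "fabric_extenders": 2 * racks,  # two Cisco Nexus 2232PP per rack
        "servers": 16 * racks,          # 16 Cisco UCS servers per rack
    }


if __name__ == "__main__":
    for racks in (1, 5, 10):
        print(racks, fabric_bill_of_materials(racks))
```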


                                                       [Diagram labels, top to bottom: Cisco UCS 6248UP or 6296UP Fabric Interconnects;
                                                       Cisco Nexus 2232PP 10GE Fabric Extenders; Cisco UCS C210 M2 or C240 M3 Rack Server.
                                                       Legend: Combined Data and Management Traffic; Management Traffic; Data Traffic.]

                                                       Figure 5. Cisco UCS Fabric Architecture






                                                    Figure 6 shows the components of the single-rack and multiple-rack configurations.
                                                    Table 3 provides general guidelines for the number of instances of each service to
                                                    run in a solution.



                                                       Single-Rack Configuration:
                                                       • 2x Cisco UCS 6248UP or 6296UP Fabric Interconnects
                                                       • 2x Cisco Nexus 2232PP 10GE Fabric Extenders
                                                       • 16x Cisco UCS Servers

                                                       Multiple-Rack Configuration:
                                                       • 2x Cisco Nexus 2232PP 10GE Fabric Extenders for Each Additional Rack
                                                       • 16x Cisco UCS Servers for Each Additional Rack

                                                       Figure 6. Base, High-Performance, and High-Capacity Configurations Can Be Built as
                                                       Single- and Multiple-Rack Configurations


                                                    Table 3. Node Recommendations for Greenplum MR Services

                                                       Service                                             Number of Nodes (per Rack)

                                                       Container location database                         1 to 3
                                                       FileServer                                          Most or all nodes
                                                       HBase Master                                        1 to 3
                                                       HBase RegionServer                                  Varies
                                                       JobTracker                                          1 to 3
                                                       NFS                                                 Varies
                                                       TaskTracker                                         Most or all nodes
                                                       WebServer                                           1 or more
                                                       ZooKeeper                                           1, 3, 5, or a higher odd number
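
The guidelines in Table 3 can be expressed as a simple check on a planned service layout. The sketch below is illustrative only: `check_layout` is a hypothetical helper rather than part of any Greenplum MR tooling, "most or all nodes" is read as more than half of the nodes in the rack (an assumption), and services marked "Varies" are not checked.

```python
def check_layout(counts: dict, nodes_in_rack: int = 16) -> list:
    """Return a list of guideline violations for a per-rack service layout.

    `counts` maps a service name from Table 3 to the number of nodes
    that run it in one rack.
    """
    problems = []

    for service in ("Container location database", "HBase Master", "JobTracker"):
        n = counts.get(service, 0)
        if not 1 <= n <= 3:
            problems.append(f"{service}: expected 1 to 3 nodes, got {n}")

    for service in ("FileServer", "TaskTracker"):
        n = counts.get(service, 0)
        # Assumption: "most" means strictly more than half of the rack.
        if n * 2 <= nodes_in_rack or n > nodes_in_rack:
            problems.append(f"{service}: expected most or all nodes, got {n}")

    if counts.get("WebServer", 0) < 1:
        problems.append("WebServer: expected 1 or more nodes")

    zk = counts.get("ZooKeeper", 0)
    if zk < 1 or zk % 2 == 0:
        problems.append(f"ZooKeeper: expected an odd number of nodes, got {zk}")

    return problems


if __name__ == "__main__":
    layout = {
        "Container location database": 3,
        "FileServer": 16,
        "HBase Master": 1,
        "JobTracker": 2,
        "TaskTracker": 14,
        "WebServer": 1,
        "ZooKeeper": 3,
    }
    print(check_layout(layout) or "layout follows the guidelines")
```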







                                                    Excellence from Cisco and Greenplum
                                                    Hadoop implementations can present a number of challenges to enterprise
                                                    environments. Many of these challenges arise from the dichotomy between the
                                                    introduction of innovative new technology and the enterprise-class performance,
                                                    reliability, and support demanded by mission-critical systems. The collaboration
                                                    between Cisco and Greenplum is specifically designed to provide a solution to these
                                                    challenges. The joint Cisco and Greenplum solution delivers all the characteristics
                                                    expected of a fully integrated solution, including radically simplified deployment and
                                                    management, high availability, excellent performance, exceptional scalability, and
                                                    enterprise-class service and support.

                                                    Complete Big Data Analysis Solution
                                                    The comprehensive solution from Cisco and Greenplum helps organizations
                                                    deploy big data solutions quickly, with validated configurations that scale easily
                                                    and predictably, as demand dictates. The Greenplum MR on Cisco UCS Reference
                                                    Configurations provide an end-to-end solution that has been tested and validated
                                                    and that enables enterprise customers to accelerate big data initiatives.

                                                    Designed for High Availability and Reliability
                                                    Every component in the Cisco and Greenplum MR Hadoop solution is fully
                                                    redundant. The Greenplum MR architecture provides JobTracker high availability
                                                    (HA) and a no-NameNode architecture to help prevent lost jobs, time-consuming
                                                    restarts, and failover incidents. Combined with the core networking capabilities
                                                    of Cisco, this solution can be extended to include remote mirroring, which helps
                                                    ensure data reliability by synchronizing a copy of the cluster’s data at a remote
                                                    site so that data analysis can continue in the event of a disaster. Locally,
                                                    snapshots protect data from application errors or accidental deletion. Snapshots
                                                    also enable easy data recovery to a specific point in time: simply copy a file or
                                                    directory from the snapshot location to the desired destination directory.
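
Recovering data from a snapshot, as described above, is an ordinary copy operation. The sketch below assumes the cluster file system is mounted locally (for example over NFS) and that a snapshot appears as a read-only directory. The function name `restore_from_snapshot` and all paths passed to it are hypothetical and are not part of Greenplum MR.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_dir: Path, relative_path: str, dest_dir: Path) -> Path:
    """Copy a file or directory out of a snapshot into dest_dir.

    snapshot_dir is assumed to be a mounted, read-only snapshot directory.
    """
    source = snapshot_dir / relative_path
    if not source.exists():
        raise FileNotFoundError(f"{source} is not present in the snapshot")

    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name

    if source.is_dir():
        # Recreate the directory tree as it existed at snapshot time.
        shutil.copytree(source, target, dirs_exist_ok=True)
    else:
        # copy2 copies the contents and tries to preserve file metadata.
        shutil.copy2(source, target)
    return target
```

The snapshot itself is never modified. The function only reads from it and writes the recovered copy to the destination, so the same point-in-time state can be restored again later.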

                                                    High Performance and Exceptional Scalability
                                                    The Cisco UCS unified fabric architecture provides fully redundant, highly
                                                    scalable, lossless 10-Gbps connectivity for big data traffic. Powered by the
                                                    latest Intel Xeon processors, the Cisco and Greenplum solution combines
                                                    best-in-class performance with internal storage capacity to deliver at least
                                                    twice the performance of Apache Hadoop. The Greenplum MR on Cisco UCS Reference
                                                    Configurations can easily scale to support a large number of nodes when required
                                                    by business demands. The advanced management capabilities of Cisco UCS
                                                    radically simplify this process with a single point of management that spans all
                                                    nodes in the cluster.






                                                    Simplified Management
                                                    Hadoop implementations tend to involve very large numbers of servers. In traditional
                                                    environments, it can be challenging to manage these large numbers of servers
                                                    effectively. Cisco UCS Manager delivers unified, model-based management
                                                    that applies personality and configures server, network, and storage connectivity
                                                    resources, making it as easy to deploy hundreds of servers as it is to deploy a single
                                                    server. Additionally, Cisco UCS Manager can perform system maintenance activities
                                                    such as firmware update operations across the entire cluster as a single operation.
                                                    To ease the monitoring of these large clusters, the Greenplum MR control system
                                                    raises alarms and sends notifications to alert IT personnel about cluster health and
                                                    the status of services.

                                                    Coexistence with Enterprise Applications
                                                    In addition to introducing a Hadoop deployment, organizations need ways to
                                                    transfer data transparently between their enterprise applications and Hadoop. This
                                                    solution can connect, across the same management plane, to other Cisco UCS
                                                    deployments running enterprise applications, thereby radically simplifying data
                                                    center management and connectivity.

                                                    Rapid Deployment
                                                    Deployment of large numbers of servers can take time. Systems need to be racked,
                                                    networked, configured, and provisioned before they can be put into use. Cisco UCS
                                                    Manager uses a model-based approach to provision servers by applying a desired
                                                    configuration to the physical infrastructure quickly, accurately, and automatically.
                                                    The capability to create consistent configurations improves business agility and
                                                    eliminates a major source of errors that can cause downtime.

                                                    Enterprise Service and Support
                                                    Enterprises want to know that the vendors providing a solution have the expertise to
                                                    help them quickly proceed through the design, deployment, and testing of strategic
                                                    big data initiatives. Businesses also need to have confidence that if a critical system
                                                    fails, they will be able to get timely and competent support. The Greenplum MR on
                                                    Cisco UCS Reference Configurations bring together world-class service and support
                                                    from long-time collaborators Cisco and Greenplum.


                                                    For More Information
                                                    For complete details about Cisco UCS, please visit http://www.cisco.com/go/ucs.

                                                    For more information about Greenplum MR on Cisco UCS, please visit:
                                                    http://www.cisco.com/go/greenplum.








Americas Headquarters                                        Asia Pacific Headquarters                                    Europe Headquarters
Cisco Systems, Inc.                                          Cisco Systems (USA) Pte. Ltd.                                Cisco Systems International BV Amsterdam,
San Jose, CA                                                 Singapore                                                    The Netherlands
Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco Website at www.cisco.com/go/offices.

Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this
URL: www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership
relationship between Cisco and any other company. (1110R)                                                                                                       LE-35101-04 09/12

Wp greenplum

  • 1. White Paper Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations September 2012
  • 2. Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations September 2012 Contents Next-Generation Hadoop Solution.................................................................... 3 Greenplum MR: Hadoop Reengineered................................................................. 3 Cisco UCS: The Exclusive Platform for Greenplum MR.......................................... 6 Reference Configurations................................................................................. 7 Excellence from Cisco and Greenplum............................................................. 12 Complete Big Data Analysis Solution..................................................................... 12 Designed for High Availability and Reliability........................................................... 12 High Performance and Exceptional Scalability....................................................... 12 Simplified Management......................................................................................... 13 Coexistence with Enterprise Applications.............................................................. 13 Rapid Deployment.................................................................................................. 13 Enterprise Service and Support ............................................................................ 13 For More Information........................................................................................ 13 © 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 2
  • 3. Greenplum MR on Cisco UCS provides companies with an integrated Hadoop solution that delivers advanced performance, full data protection, no single point of failure, and improved data-access features that can expedite the implementation of big data analytics environments.

    Highlights
      • Optimized for Performance: Greenplum and Cisco deliver an integrated Hadoop solution specifically engineered to handle the most demanding Hadoop workloads.
      • Cisco UCS Creates a Flexible Appliance Platform: The Cisco Unified Computing System™ (Cisco UCS®) provides a flexible, high-performance platform that can be optimized and easily scaled for any size of Hadoop cluster.
      • Ease of Management: Cisco UCS Manager provides unified, embedded management of all computing, networking, and storage-access resources.
      • Choice of Configurations: The solution provides a choice of Cisco UCS configurations, letting organizations select performance and capacity as their needs dictate.
      • Enterprise-Class Support and Services: The Greenplum MR on Cisco UCS Reference Configurations combine the support and services of two of the world’s largest technology companies.

    Next-Generation Hadoop Solution
    The worldwide leader in data center networking, and now a leading competitor in the server market, Cisco is partnering with Greenplum to provide best-in-class big data solutions that meet a range of needs. The Greenplum MR on Cisco UCS® Reference Configurations deliver integrated, end-to-end software and hardware infrastructure that accelerates big data initiatives with a choice of performance and capacity. The combination of world-leading Cisco Unified Computing System™ (Cisco UCS) and Greenplum MR enables companies to significantly reduce time-to-value and the operating expenses associated with Apache Hadoop implementations.

    Greenplum MR: Hadoop Reengineered
    Greenplum MR, based on the MapR M5 Distribution, is an implementation of the Apache Hadoop stack that enables near-real-time collection and organization of high volumes of structured, semistructured, and unstructured data distributed across a cluster of servers. Greenplum MR provides direct data input and output to the cluster with MapR Direct Access Network File System (NFS), offers real-time analytics, and is the first distribution to provide true high availability at all levels. Greenplum MR introduces the concept of logical volumes to Hadoop: a means of grouping data and applying policy consistently across an entire data set. Greenplum MR provides hardware status information and control with the MapR Control System, a comprehensive user interface that includes a heatmap that displays the health of the entire cluster at a glance.
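Because Direct Access NFS presents the cluster as an ordinary file system, applications can in principle read and write cluster data with standard file I/O rather than a Hadoop-specific client. The following is a minimal sketch of that idea, assuming the cluster is already NFS-mounted on the client. The function names, the mount point mentioned in the comments, and the file names are all hypothetical; the demo writes to a temporary directory so that it runs without a cluster.

```python
import tempfile
from pathlib import Path


def append_event(cluster_root: Path, relative_path: str, line: str) -> Path:
    """Append one record to a file below an NFS-mounted cluster path."""
    target = cluster_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(line.rstrip("\n") + "\n")
    return target


def read_events(cluster_root: Path, relative_path: str) -> list[str]:
    """Read the records back with ordinary file I/O."""
    return (cluster_root / relative_path).read_text(encoding="utf-8").splitlines()


if __name__ == "__main__":
    # On a real client the root would be the NFS mount point, for example
    # Path("/mapr/my.cluster.com") (hypothetical). A temp dir stands in here.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        append_event(root, "logs/web/clicks.log", "user=42 action=view")
        append_event(root, "logs/web/clicks.log", "user=42 action=buy")
        print(read_events(root, "logs/web/clicks.log"))
```

The same code path works for any tool that understands files, which is the practical benefit the white paper attributes to direct NFS access.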
  • 4. Cisco UCS and Greenplum MR can help businesses manage many different big data scenarios. The examples in Table 1 show how the Greenplum MR on Cisco UCS Reference Configurations can accelerate big data initiatives.

    Table 1. Sample Use Cases for Cisco UCS and Greenplum MR

    Scenario | Cisco Greenplum MR Reference Configuration Capabilities
    Content management | Collect and store unstructured and semi-structured data in a fault-resilient, scalable data store that can be organized and sorted for indexing and analysis.
    Batch processing unstructured data | Batch-process large quantities of unstructured and semi-structured data: for example, data warehousing extract, transform, and load (ETL) processing.
    Medium-term data archive | Archive data (medium-term, 12 to 36 months) from an enterprise data warehouse (EDW) database management system (DBMS) to increase the length of time that data is retained or to meet data retention and compliance policies.
    Integration with data warehouse | Transfer data stored in Hadoop to and from a separate DBMS for advanced analytics.
    Customer risk analysis | Perform a comprehensive data assessment of customer-side risk, based on activity and behavior across products and accounts.
    Personalization and asset management | Create and model investor strategy and goals based on market data, individual asset characteristics, and reports entered into an online recommendation system.
    Trade analytics | Analyze historical volume and trading data for individual stock symbols, variable cost of trades, and allocation of expenses.
    Credit scoring | Update credit-scoring models using cross-functional transaction data and recent outcomes, to respond to changes such as the collapse of bubble markets. Sweep recent credit history to build transactional and temporal models.
    Retailer compromise | Prevent or catch fraud resulting from a breach of retailer cards or accounts by monitoring, modeling, and analyzing high volumes of transaction data and extracting features and patterns.
    Miscategorized credit card fraud | Reduce false positives and prevent miscategorization of legitimate transactions as fraud, using high volumes of data to build good models.
    Next-generation credit card fraud | Perform daily cross-sectional analysis of portfolio using transaction similarities to find accounts that are being cultivated for eventual fraud, using common application elements, temporal patterns, vendors, and transaction amounts to detect similar accounts before the fraud is perpetrated.
    Customer retention | Combine transactional customer contact information and social network data to perform attrition modeling to learn social and transaction markers for attrition and retention.
    Sentiment analysis (opinion mining) and bankruptcy | Find better indicators to predict bankruptcy among existing customers using sentiment analysis from social networking, responding quickly before the warning horizon.
  • 5. As shown in Figure 1, the major components of Greenplum MR provide a Hadoop solution that is easy, dependable, and fast, with components that include:
      • Advanced storage services: A replacement for the Hadoop Distributed File System (HDFS), MapR advanced storage services provide multidimensional scalability and accelerate MapReduce performance. The services allow random read and write operations while automatically compressing data in real time.
      • MapR Heatmap: MapR Heatmap provides visibility, access, and tools that offer insight into the state of the cluster. Graphical and programmatic interfaces are designed to scale with the largest clusters.
      • MapR Control System: MapR Control System provides real-time monitoring of the cluster health, including alarms to notify you of conditions that need to be corrected. Alarms can also be configured to trigger email notifications.
      • MapReduce: Part of the Apache Hadoop framework, MapReduce simplifies the creation of applications that process large amounts of unstructured and structured data in parallel. Underlying hardware failures are handled transparently for user applications, providing a reliable and fault-tolerant capability.
      • Hive: Hive is a data warehouse system for Hadoop that facilitates data summarization, impromptu queries, and analysis of large data sets. This SQL-like interface increases the compression of stored data for improved storage resource utilization without affecting access speed.
      • Pig: Pig is a high-level procedural language for processing data sets in parallel using the Hadoop MapReduce platform. Its intuitive syntax simplifies the development of MapReduce jobs, providing an alternative programming language to Java.

    Greenplum MR
      • Complete enterprise-class solution
      • Elimination of the common problems experienced with HDFS
      • Direct access through NFS
      • High availability through JobTracker enhancements and a no-NameNode architecture
      • From two to five times faster performance than other Hadoop distributions
      • Advanced management for clusters of all sizes
      • Robust data protection features, including snapshots and intercluster mirroring
      • Comprehensive network of enterprise business intelligence tools

    Figure 1. Greenplum MR Architecture and Components (diagram: Greenplum MR for Apache Hadoop, with the MapR Control System and MapR Heatmap, LDAP and NIS integration, quotas, alerts, and alarms, and CLI and REST API at the top; Hive, Pig, Oozie, Sqoop, HBase, Whirr, HCatalog, Mahout, Cascading, Flume, and Zookeeper, plus Nagios and Ganglia integration, in the middle; and MapR Storage Services at the base, grouped as Easy, Dependable, and Fast)
  • 6. • HBase: HBase is the distributed, versioned, column-oriented database that delivers random, real-time, read-write access to big data.
      • ZooKeeper: ZooKeeper is a highly available system for coordinating distributed processes. Applications use ZooKeeper to store and mediate updates to important configuration information.

    Cisco UCS: The Exclusive Platform for Greenplum MR
    Validated through an extensive testing and development process at Greenplum and Cisco, Cisco UCS is the exclusive hardware platform for Greenplum MR. Cisco UCS innovations combine industry-standard, x86-architecture servers with networking and storage access into a single converged system (Figure 2). The system is entirely programmable using unified, model-based management to simplify and accelerate the deployment of enterprise-class applications and services running in bare-metal, virtualized, and cloud-computing environments.
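The MapReduce programming model described in the component list can be illustrated without a cluster. The sketch below is plain Python, not Hadoop code: it runs the map, shuffle, and reduce phases of a word count in a single process, whereas a real framework distributes each phase across nodes and reruns tasks that fail. All function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can run in parallel on separate nodes, which is what lets the framework hide hardware failures from the application.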
    Figure 2. Cisco UCS Components (diagram of the Cisco Unified Computing System, showing:
      • Cisco UCS 6200 Series Fabric Interconnects (Cisco UCS 6248UP or 6296UP) with embedded Cisco UCS Manager: integrate all components into a single management domain; up to two matching interconnects per system; low-latency, lossless 10 GE and FCoE connectivity; uplinks to data center networks over 10 GE, native Fibre Channel, or FCoE.
      • Cisco fabric extenders: distribute the unified fabric to blade and rack servers. Cisco UCS 2208XP: up to 160 Gbps per blade chassis (with Cisco UCS 6200 Series Fabric Interconnects). Cisco UCS 2104XP: up to 80 Gbps per blade chassis. Cisco Nexus 2232PP: integrates rack-mount servers into the system.
      • Cisco UCS 5108 Blade Server Chassis: accommodates up to 8 half-width or 4 full-width blade servers and up to two fabric extenders; straight-through airflow, 92% efficient power supplies, N+1 and grid redundancy.
      • Cisco UCS B-Series Blade Servers and C-Series Rack Servers, powered exclusively by Intel Xeon processors, with every aspect of identity, personality, and connectivity configured by Cisco UCS Manager.)
  • 7. Cisco UCS helps organizations gain more than efficiency: it helps them become more effective through technologies that foster simplicity rather than complexity. The result is a flexible, agile, and high-performance platform that reduces operating costs with increased uptime through automation and enables more rapid return on investment (ROI).

    Cisco UCS
    The Cisco Unified Computing System delivers a radical simplification of traditional architecture with the first self-aware, self-integrating, converged system that automates system configuration in a reproducible, scalable manner.
      • More than 60 world records on critical benchmarks
      • The benefits of centralized computing, through a single point of management, delivered to massive scale-out applications
      • Self-aware and self-integrating system
      • Automatic server provisioning through association of models with system resources
      • Standards-based, high-bandwidth, low-latency, lossless Ethernet network

    Reference Configurations
    The Greenplum MR on Cisco UCS Reference Configurations are based on Cisco’s big data common platform architecture (CPA), a highly scalable architecture designed to meet a variety of scale-out application demands with transparent data integration and management integration capabilities using the following components:
      • Cisco UCS 6200 Series Fabric Interconnects: The Cisco UCS 6200 Series Fabric Interconnects are a core part of Cisco UCS, providing both network connectivity and management capabilities across Cisco UCS 5100 Series Blade Server Chassis and Cisco UCS C-Series Rack Servers. Deployed in redundant pairs, the fabric interconnects offer line-rate, low-latency, lossless 10 Gigabit Ethernet connectivity and unified management with Cisco UCS Manager in a highly available management domain.
      • Cisco UCS 2200 Series Fabric Extenders: Cisco UCS 2200 Series Fabric Extenders behave like remote line cards for a parent switch and provide a highly scalable and extremely cost-effective unified server-access platform.
      • Cisco UCS rack servers: Specific models are used to support the base, high-performance, and high-capacity configurations:
        -- Cisco UCS C210 M2 General-Purpose Rack Server: Cisco UCS C210 M2 servers are general-purpose 2-socket platforms based on the Intel® Xeon® processor 5600 series. These servers support up to 192 GB of main memory and 16 internal front-accessible, hot-swappable, Small Form Factor (SFF) disk drives, with a choice of one or two RAID controllers for data performance and protection.
        -- Cisco UCS C240 M3 Rack Server: Cisco UCS C240 M3 Servers are designed for both performance and expandability over a wide range of storage-intensive infrastructure workloads. Each server provides sockets for up to two processors from the Intel Xeon processor E5-2600 product family and up to 768 GB of main memory. Up to 24 SFF or 12 Large Form Factor (LFF) disk drives are supported, along with four Gigabit Ethernet LAN-on-motherboard (LOM) ports.
        -- Cisco UCS virtual interface cards (VICs): Unique to Cisco, Cisco UCS VICs incorporate next-generation converged network adapter (CNA) technology from Cisco and offer dual 10-Gbps ports designed for use with Cisco UCS C-Series Rack Servers. Optimized for virtualized networking, these cards deliver high performance and bandwidth utilization and support up to 256 virtual devices.
  • 8. • Cisco UCS Manager: Cisco UCS Manager resides in the Cisco UCS 6200 Series Fabric Interconnects. It makes the system self-aware and self-integrating, managing all the system components as a single logical entity. Cisco UCS Manager can be accessed through an intuitive GUI, a command-line interface (CLI), or an XML API. Cisco UCS Manager uses service profiles to define the personality, configuration, and connectivity of all resources within Cisco UCS, radically simplifying provisioning of resources so that the process takes minutes instead of days. This simplification allows IT departments to shift their focus from constant maintenance to strategic business initiatives. It also provides the most streamlined, simplified approach commercially available today to firmware updating for all server components.

    Organizations deploying Hadoop clusters have different needs depending on the nature of their computational applications and storage requirements. Cisco understands these important distinctions and has structured its Hadoop reference configurations to accommodate a range of diverse needs:
      • Base reference configuration: Cisco’s base reference configuration is designed for organizations that require balanced computing and storage capacity. The reference configuration uses Cisco UCS C210 M2 servers, each with two Intel® Xeon® processors X5675, 96 GB of memory, and 16 1-TB 7,200-rpm SATA disk drives (Figure 3).
      • High-performance reference configuration: Many organizations need to optimize computing and memory performance in their Hadoop clusters, so Cisco’s high-performance reference configuration uses Cisco UCS C240 M3 servers, each with two Intel Xeon processors E5-2665, 256 GB of memory, and 24 1-TB 7,200-rpm SFF SATA disk drives (Figure 4).
      • High-capacity reference configuration: For organizations that require abundant storage capacity for the Hadoop cluster, the high-capacity reference configuration uses Cisco UCS C240 M3 servers, each with two Intel Xeon processors E5-2640, 128 GB of memory, and 12 3-TB 7,200-rpm SAS disk drives (Figure 4).

    Figure 3. Cisco UCS C210 M2 Cluster Node for Base Configuration
      • 2 Intel Xeon Processors X5675
      • 96 GB Memory
      • 1 LSI 6G MegaRAID 9261-8i Card
      • 16 1-TB 7,200-RPM SFF SATA Disk Drives
      • 1 Cisco UCS P81E VIC (2x 10 Gbps)
      • Embedded Cisco IMC (2x 1 Gbps)
      • 2 Integrated Gigabit Ethernet Ports
      • Red Hat Enterprise Linux Server Standard
      • Redundant Hot-Swappable Power Supplies and Fans
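The XML API mentioned in the Cisco UCS Manager description is what makes the system scriptable. The sketch below only builds and parses the XML documents for a login and an inventory query; it does not open a network connection. The method names (`aaaLogin`, `configResolveClass`), attribute names, the `computeRackUnit` class, and the `/nuova` endpoint path reflect the UCS Manager XML API as commonly documented, but they are not taken from this white paper and should be verified against the Cisco UCS Manager XML API reference before use. The Python function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: requests are POSTed over HTTPS to this path on the fabric
# interconnect's management address. Not stated in the white paper.
UCSM_XML_API_PATH = "/nuova"


def build_login_request(username: str, password: str) -> str:
    """Build the XML body for a login call (assumed method: aaaLogin)."""
    element = ET.Element("aaaLogin", {"inName": username, "inPassword": password})
    return ET.tostring(element, encoding="unicode")


def extract_cookie(login_response: str) -> str:
    """Pull the session cookie (assumed attribute: outCookie) from a response."""
    cookie = ET.fromstring(login_response).get("outCookie")
    if not cookie:
        raise ValueError("login response did not contain a session cookie")
    return cookie


def build_rack_server_query(cookie: str) -> str:
    """Build a query for all rack servers (assumed class: computeRackUnit)."""
    element = ET.Element(
        "configResolveClass",
        {"cookie": cookie, "classId": "computeRackUnit", "inHierarchical": "false"},
    )
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(build_login_request("admin", "example-password"))
    # A made-up response, used only to exercise the parser.
    sample = '<aaaLogin response="yes" outCookie="1234/abcd" />'
    print(build_rack_server_query(extract_cookie(sample)))
```

Keeping request construction separate from transport makes the documents easy to inspect and test before they are sent to a production management domain.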
  • 9. Figure 4. Cisco UCS C240 M3 Cluster Node for High Performance and High Capacity
      Common to both configurations:
      • 2 Intel Xeon Processors E5 Family
      • Cisco UCS VIC 1225 (2x 10 Gbps)
      • Embedded Cisco IMC (2x 1 Gbps)
      • LSI MegaRAID SAS 9226CV-8i Card
      • 4 Integrated Gigabit Ethernet Ports
      • Red Hat Enterprise Linux Server Standard
      • Redundant Hot-Swappable Power Supplies
      High-Performance Configuration (shown):
      • 256 GB Memory
      • 24 1-TB SATA 7200 RPM SFF Disk Drives
      High-Capacity Configuration (not shown):
      • 128 GB Memory
      • 12 3-TB SAS 7200 RPM LFF Disk Drives

    Table 2 summarizes the capabilities of the three reference configurations.

    Table 2. Base, High-Performance, and High-Capacity Reference Configurations

    Level | Component | Base Configuration | High-Performance Configuration | High-Capacity Configuration
    Server-level capacity | Cisco UCS servers | Cisco UCS C210 M2 | Cisco UCS C240 M3 | Cisco UCS C240 M3
    Server-level capacity | Processor | Intel Xeon X5675 | Intel Xeon E5-2665 | Intel Xeon E5-2640
    Server-level capacity | Memory | 96 GB | 256 GB | 128 GB
    Server-level capacity | Storage | 16 1-TB SFF 7.2K-RPM SATA | 24 1-TB SFF 7.2K-RPM SATA | 12 3-TB LFF 7.2K-RPM SAS
    Rack-level capacity (16 servers) | Processor cores and threads | 192 cores and 384 threads | 256 cores and 512 threads | 192 cores and 384 threads
    Rack-level capacity (16 servers) | Memory | 1536 GB | 4096 GB | 2048 GB
    Rack-level capacity (16 servers) | Typical user storage capacity (3-way replication, compressed) | 320 TB | 480 TB | 720 TB
    Rack-level capacity (16 servers) | I/O bandwidth | 21 GBps | 32 GBps | 17 GBps
    Cisco UCS domain-level capacity (10 racks) | Processor cores and threads | 1920 cores and 3840 threads | 2560 cores and 5120 threads | 1920 cores and 3840 threads
    Cisco UCS domain-level capacity (10 racks) | Typical user storage capacity (3-way replication, compressed) | 3.2 PB | 4.8 PB | 7.2 PB
    Cisco UCS domain-level capacity (10 racks) | I/O bandwidth | 210 GBps | 320 GBps | 170 GBps
  • 10. The Greenplum MR on Cisco UCS Reference Configurations come in single- and multiple-rack form factors. The single-rack configuration consists of two fully redundant Cisco UCS 6248UP 48-Port Fabric Interconnects (for up to five racks) or Cisco UCS 6296UP 96-Port Fabric Interconnects (for up to 10 racks) along with two Cisco Nexus® 2232PP 10GE Fabric Extenders (Figure 5). Each node in the configuration connects to the unified fabric through two active-active 10-Gbps links using a Cisco UCS VIC (for data traffic) and Cisco Integrated Management Controller (IMC; for management traffic). Multiple-rack configurations include the components for a single rack and two Cisco Nexus 2232PP fabric extenders for every additional rack.

    Figure 5. Cisco UCS Fabric Architecture (diagram: Cisco UCS 6248UP or 6296UP Fabric Interconnects connected to Cisco Nexus 2232PP 10GE Fabric Extenders by links carrying combined data and management traffic; the fabric extenders connect to each Cisco UCS C210 M2 or C240 M3 rack server over separate data and management links)
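The rack-level and domain-level rows of Table 2 follow from the per-server specifications by simple multiplication. The sketch below reproduces that arithmetic. The cores-per-socket values (6 for the Xeon X5675 and E5-2640, 8 for the E5-2665) and two threads per core are Intel specifications rather than figures stated in the white paper, and the class and function names are illustrative.

```python
from dataclasses import dataclass

SERVERS_PER_RACK = 16   # from Table 2
RACKS_PER_DOMAIN = 10   # from Table 2
THREADS_PER_CORE = 2    # assumption: Intel Hyper-Threading enabled


@dataclass(frozen=True)
class NodeSpec:
    name: str
    sockets: int
    cores_per_socket: int  # assumption: Intel spec, not in the white paper
    memory_gb: int
    drives: int
    drive_tb: int


BASE = NodeSpec("Base", 2, 6, 96, 16, 1)
HIGH_PERFORMANCE = NodeSpec("High-Performance", 2, 8, 256, 24, 1)
HIGH_CAPACITY = NodeSpec("High-Capacity", 2, 6, 128, 12, 3)


def rack_totals(spec: NodeSpec, servers: int = SERVERS_PER_RACK) -> dict[str, int]:
    """Aggregate per-server resources across one rack."""
    cores = spec.sockets * spec.cores_per_socket * servers
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "memory_gb": spec.memory_gb * servers,
        "raw_storage_tb": spec.drives * spec.drive_tb * servers,
    }


def domain_totals(spec: NodeSpec, racks: int = RACKS_PER_DOMAIN) -> dict[str, int]:
    """Scale the rack totals to a multi-rack Cisco UCS domain."""
    return {key: value * racks for key, value in rack_totals(spec).items()}


if __name__ == "__main__":
    for spec in (BASE, HIGH_PERFORMANCE, HIGH_CAPACITY):
        print(spec.name, rack_totals(spec), domain_totals(spec))
```

The core, thread, and memory totals match Table 2 exactly. Raw disk capacity per rack works out to 256 TB, 384 TB, and 576 TB, so the table's "typical user storage capacity" is 1.25 times raw in each case. That would correspond to roughly 3.75:1 compression offsetting three-way replication; the ratio is inferred from the numbers and is not stated in the source.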
  • 11. Figure 6 shows the components of the single-rack and multiple-rack configurations. Table 3 provides general guidelines for the number of instances of each service to run in a solution.

    Figure 6. Base, High-Performance, and High-Capacity Configurations Can Be Built as Single- and Multiple-Rack Configurations
      Single-rack configuration:
      • 2x Cisco UCS 6248UP or 6296UP Fabric Interconnects
      • 2x Cisco Nexus 2232PP 10GE Fabric Extenders
      • 16x Cisco UCS Servers
      Multiple-rack configuration (for each additional rack):
      • 2x Cisco Nexus 2232PP 10GE Fabric Extenders
      • 16x Cisco UCS Servers

    Table 3. Node Recommendations for Greenplum MR Services

    Service | Number of Nodes (per Rack)
    Container location database | 1 to 3
    FileServer | Most or all nodes
    HBase Master | 1 to 3
    HBase RegionServer | Varies
    JobTracker | 1 to 3
    NFS | Varies
    TaskTracker | Most or all nodes
    WebServer | 1 or more
    Zookeeper | 1, 3, 5, or a higher odd number
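The rack-scaling rules in Figure 6 and the odd-number guideline for Zookeeper in Table 3 can both be expressed as short calculations. The sketch below is illustrative only. It assumes the smaller fabric interconnect model is chosen whenever it suffices, which the white paper does not require, and the function names are hypothetical. The Zookeeper helper relies on the general fact that a Zookeeper ensemble needs a strict majority of its members to stay available.

```python
SERVERS_PER_RACK = 16
FABRIC_EXTENDERS_PER_RACK = 2
FABRIC_INTERCONNECTS_PER_DOMAIN = 2  # one redundant pair


def plan_hardware(racks: int) -> dict[str, object]:
    """Return hardware counts for a domain of the given number of racks."""
    if not 1 <= racks <= 10:
        raise ValueError("the reference configurations cover 1 to 10 racks")
    # Assumption: prefer the 6248UP (up to 5 racks), else use the 6296UP.
    model = "Cisco UCS 6248UP" if racks <= 5 else "Cisco UCS 6296UP"
    return {
        "fabric_interconnect_model": model,
        "fabric_interconnects": FABRIC_INTERCONNECTS_PER_DOMAIN,
        "fabric_extenders": FABRIC_EXTENDERS_PER_RACK * racks,
        "servers": SERVERS_PER_RACK * racks,
    }


def zookeeper_failures_tolerated(ensemble_size: int) -> int:
    """Number of members that can fail while a strict majority survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble size must be at least 1")
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    print(plan_hardware(1))
    print(plan_hardware(7))
    for size in (1, 3, 4, 5):
        print(size, "members tolerate", zookeeper_failures_tolerated(size), "failures")
```

An ensemble of four tolerates the same single failure as an ensemble of three, which is why an even member count adds cost without adding resilience.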
  • 12. Excellence from Cisco and Greenplum
    Hadoop implementations can present a number of challenges to enterprise environments. Many of these challenges arise from the dichotomy between the introduction of innovative new technology and the enterprise-class performance, reliability, and support demanded by mission-critical systems. The collaboration between Cisco and Greenplum is specifically designed to provide a solution to these challenges. The joint Cisco and Greenplum solution delivers all the characteristics expected of a fully integrated solution, including radically simplified deployment and management, high availability, excellent performance, exceptional scalability, and enterprise-class service and support.

    Complete Big Data Analysis Solution
    The comprehensive solution from Cisco and Greenplum helps organizations deploy big data solutions quickly, with validated configurations that scale easily and predictably, as demand dictates. The Greenplum MR on Cisco UCS Reference Configurations provide an end-to-end solution that has been tested and validated and that enables enterprise customers to accelerate big data initiatives.

    Designed for High Availability and Reliability
    Every component in the Cisco and Greenplum MR Hadoop solution is fully redundant. The Greenplum MR architecture provides JobTracker high availability (HA) and a no-NameNode architecture to prevent lost jobs from causing time-consuming restarts or failover incidents. Combined with the core networking capabilities of Cisco, this solution can be extended to include remote mirroring to help ensure data reliability by synchronizing a copy of the cluster’s data at a remote site so that data analysis can continue in the event of a disaster. Locally, snapshots protect data from application errors or accidental deletion.
    Snapshots also enable easy data recovery to a specific point in time by simply copying a file or directory from the snapshot location to the desired destination directory.

    High Performance and Exceptional Scalability
    Cisco UCS unified fabric architecture provides fully redundant, highly scalable, lossless 10-Gbps unified fabric connectivity for big data traffic. Powered by the latest Intel Xeon processors, the Cisco and Greenplum solution delivers best-in-class performance and internal storage capacity, running at least two times faster than Apache Hadoop. The Greenplum MR on Cisco UCS Reference Configurations can easily scale to support a large number of nodes when required by business demands. The advanced management capabilities of Cisco UCS radically simplify this process with a single point of management that spans all nodes in the cluster.
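Point-in-time recovery as described above amounts to an ordinary file copy out of the snapshot location. The sketch below shows that copy, assuming snapshots are exposed as read-only directories named `.snapshot/<snapshot name>` under the volume root. That directory layout, the function name, and all paths are assumptions for illustration, and the demo builds a stand-in volume in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def restore_from_snapshot(
    volume_root: Path,
    snapshot_name: str,
    relative_path: str,
    snapshot_dir: str = ".snapshot",  # assumption: where snapshots appear
) -> Path:
    """Copy a file or directory from a snapshot back into the live volume."""
    source = volume_root / snapshot_dir / snapshot_name / relative_path
    destination = volume_root / relative_path
    if not source.exists():
        raise FileNotFoundError(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    if source.is_dir():
        # Merge into the destination, overwriting files that already exist.
        shutil.copytree(source, destination, dirs_exist_ok=True)
    else:
        shutil.copy2(source, destination)
    return destination


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = Path(tmp)
        # Simulate a snapshot that still holds a file deleted from the volume.
        saved = volume / ".snapshot" / "nightly" / "reports" / "q3.csv"
        saved.parent.mkdir(parents=True)
        saved.write_text("region,revenue\n")
        restored = restore_from_snapshot(volume, "nightly", "reports/q3.csv")
        print(restored.read_text())
```

Because the restore is a plain copy, recovery can be limited to a single file without rolling back the rest of the volume.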
  • 13. Simplified Management
    Hadoop implementations tend to involve very large numbers of servers. In traditional environments, it can be challenging to manage these large numbers of servers effectively. Cisco UCS Manager delivers unified, model-based management that applies personality and configures server, network, and storage connectivity resources, making it as easy to deploy hundreds of servers as it is to deploy a single server. Additionally, Cisco UCS Manager can perform system maintenance activities such as firmware update operations across the entire cluster as a single operation. To ease the monitoring of these large clusters, the Greenplum MR control system raises alarms and sends notifications to alert IT personnel about cluster health and the status of services.

    Coexistence with Enterprise Applications
    In addition to introducing a Hadoop deployment, organizations need ways to transfer data transparently between their enterprise applications and Hadoop. This solution can connect, across the same management plane, to other Cisco UCS deployments running enterprise applications, thereby radically simplifying data center management and connectivity.

    Rapid Deployment
    Deployment of large numbers of servers can take time. Systems need to be racked, networked, configured, and provisioned before they can be put into use. Cisco UCS Manager uses a model-based approach to provision servers by applying a desired configuration to the physical infrastructure quickly, accurately, and automatically. The capability to create consistent configurations improves business agility and eliminates a major source of errors that can cause downtime.

    Enterprise Service and Support
    Enterprises want to know that the vendors providing a solution have the expertise to help them quickly proceed through the design, deployment, and testing of strategic big data initiatives. Businesses also need to have confidence that if a critical system fails, they will be able to get timely and competent support. The Greenplum MR on Cisco UCS Reference Configurations bring together world-class service and support from long-time collaborators Cisco and Greenplum.

    For More Information
    For complete details about Cisco UCS, please visit http://www.cisco.com/go/ucs. For more information about Greenplum MR on Cisco UCS, please visit http://www.cisco.com/go/greenplum.
  • 14. Americas Headquarters: Cisco Systems, Inc., San Jose, CA
    Asia Pacific Headquarters: Cisco Systems (USA) Pte. Ltd., Singapore
    Europe Headquarters: Cisco Systems International BV, Amsterdam, The Netherlands

    Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco Website at www.cisco.com/go/offices.

    Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL: www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1110R) LE-35101-04 09/12