The Infrastructure of Tomorrow, Today –
Integrating Supermicro, Greenplum and SAS
to Enable Big Data Analytics

Jeff Tsai 蔡穎碩
Solution Manager

© Supermicro 2012
Agenda

• Big Data Analytics Platform & Infrastructure
• EMC + Supermicro: 1,000-Node Hadoop Cluster
THE ERA OF BIG DATA IS HERE…

“Big Data Is Less About Size, And More About Freedom” – TechCrunch
“Findings: ‘Big Data’ Is More Extreme Than Volume” – Gartner
“Total data: ‘bigger’ than big data” – 451 Group
“Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World” – IDB
Data Sources Are Expanding

THE DIGITAL UNIVERSE WILL GROW 44X IN THE NEXT 10 YEARS

Source: 2011 IDC Digital Universe Study
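As a rough sanity check, a 44X increase over 10 years can be converted into an implied compound annual growth rate (CAGR), which is easier to compare with annual storage CAGRs. The Python sketch below assumes steady compounding over exactly 10 years; the function name `implied_cagr` is illustrative only.

```python
# Implied compound annual growth rate (CAGR) behind a total growth multiple.
# Assumption: "44X in the next 10 years" means the final size is 44 times the
# starting size after exactly 10 years of steady compounding.


def implied_cagr(multiple: float, years: float) -> float:
    """Return the constant annual growth rate that yields `multiple` over `years`."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = implied_cagr(44, 10)
    print(f"44x over 10 years is about {rate:.1%} growth per year")
```

Under that assumption, 44X in a decade works out to roughly 46% growth per year.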
BIG Data Is Just a Bunch of Data to Store…? OR

[Chart: Big Data sources, 2009-2014. File based: 60.7% CAGR; block based: 21.8% CAGR]

By 2012, 80% of all storage capacity sold will be for file-based data.

Source: IDC
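The two CAGRs on this chart compound very differently. This hypothetical Python sketch assumes each rate applies to every year from 2009 to 2014 and normalizes both series to 1.0 in 2009, since the chart's absolute values are not recoverable here; the function names are illustrative.

```python
# Compare how two compound annual growth rates (CAGRs) diverge over time.
# Assumptions: the 60.7% (file-based) and 21.8% (block-based) CAGRs apply
# uniformly to every year from 2009 to 2014, and both series start from the
# same hypothetical base of 1.0 in 2009, so only relative growth is shown.


def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def project(cagr: float, start_year: int, end_year: int) -> dict[int, float]:
    """Relative size in each year, normalized to 1.0 in `start_year`."""
    return {
        year: growth_multiple(cagr, year - start_year)
        for year in range(start_year, end_year + 1)
    }


if __name__ == "__main__":
    file_based = project(0.607, 2009, 2014)
    block_based = project(0.218, 2009, 2014)
    for year in file_based:
        print(f"{year}: file x{file_based[year]:.2f}  block x{block_based[year]:.2f}")
```

With those assumptions, file-based data grows about 10.7x by 2014 versus about 2.7x for block-based data, a gap that illustrates IDC's file-based projection.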
To Create Significant Value for Your Business…

HOW?
Make BIG Data Accessible
• Identify the data source
• Store the data
• Connect applications and users
• Utilize the data in different views
EMC UAP Solutions – Analytics Platform

This is what my analytics environment looks like…
Building the Big Data Analytics “Stack” (top to bottom)

• Analytic Toolsets (Business Analytics, BI, Statistics, etc.)
• Greenplum Chorus: Enterprise Collaboration Platform for Data
• Greenplum Data Computing Appliances: Purpose-built for Big Data Analytics
• Greenplum Database (Enterprise & Community Editions): World’s Most Scalable MPP Database Platform
• Greenplum HD (Hadoop Enterprise & Community Editions): Enterprise Analytics Platform for Unstructured Data
Greenplum Becomes the Foundation of EMC’s Data Computing Division
EMC ACQUIRED GREENPLUM IN JULY 2010

“For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant…”
– Gartner
SAS at a Glance
Company Highlights:
•   Founded 1976: 11,000+ employees in 400+ offices
•   2010 worldwide revenue: $2.43B
•   IDC: SAS is the leader in analytics, with a 34.5% market share (Analytics and Reporting)
•   4.5 million users worldwide
•   50,000+ sites in 114 countries
•   From tools to vertical solutions
[Pie chart: SAS revenue by industry. Financial Services 42%, Government 14%, Communications 8%, Healthcare & Life Sciences 8%, Manufacturing 6%, Other 4%, Education 3%, Energy & Utilities 2%; Services and Retail account for the remaining 11% and 2%.]
Overview

[Map: SMC Inc. (HQ), San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan]

Founded in 1993; HQ in San Jose, CA; NASDAQ: SMCI since 2007

Revenues:           FY09 $500M, FY10 $721M, FY11 ~$1B
Global Footprint:   >100 countries
Production:         Production facilities in the US, EU and Asia
Engineering:        70% of workforce in engineering (30% growth through the recession)
Market Share:       #1 server channel (SMCI enables ~10% of the global server market)
Brand Equity:       Growing public profile since the 2007 IPO
Corporate Focus:    Energy efficiency, earth-friendly, green technology innovation
Product Family
• Resource Optimized (WIO/UIO)
• Twin Architecture
• GPU SuperComputing
• Data Center Optimized
• Embedded
• Application Optimized: Multi I/O
• SuperBlade
• Workstation
• Mainstream Business Solutions
• Storage Server
In-House Design and Server Building Block Solutions®

[Diagram: Server Building Block Solutions® linking Technology Partners with Customer Requirements: Application Optimized, OEM Specs, Tri-Lab, Optimized Data Center]

In-House Design: Server Building Block Solutions®
• >550 Motherboards
• >1,300 Chassis
• >350 Cooling Modules
• >140 Power Supplies
• Open CPU/Memory
• Operating Systems/Applications

(1) As of Q2, 2009
Big Data Analytics on Hadoop
Internet companies are not built on SQL; they are building analytics on Hadoop/NoSQL.

This is what I think my analytics environment looks like…

[Diagram: Existing Hadoop users (Internet). Top: BI & Reporting, ETL Tools, Web Apps. Middle: Pig, Hive, HBase, with Management & Coordination. Hadoop System: MapReduce Layer over Hadoop Storage. Data sources: Web Portal, Social Networks.]
Hadoop Components (hadoop.apache.org)
• HDFS: Hadoop Distributed File System
• MapReduce: Framework for writing scalable data applications
• Pig: Procedural language that abstracts lower-level MapReduce
• ZooKeeper: Highly reliable distributed coordination
• Hive: Data warehouse infrastructure built on top of Hadoop
• HBase: Database for random, real-time read/write access
• Oozie: Workflow/coordination service to manage jobs
• Mahout: Scalable machine learning libraries
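The MapReduce entry above can be made concrete with the classic word-count example. This is a minimal single-process sketch in plain Python that imitates the map, shuffle/sort and reduce phases; it does not use Hadoop's APIs, and the function names are hypothetical. On a real cluster, Hadoop runs many map and reduce tasks in parallel over HDFS blocks (for Python code, typically via Hadoop Streaming).

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is here", "big data is real"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'here': 1, 'is': 2, 'real': 1}
```

The mapper and reducer only see their own inputs, which is what lets the framework distribute them across nodes; the grouping step in between is the part Hadoop performs for you.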
What Can Hadoop Do for You?

• Financial Services: better knowledge of customers; risk analysis and management; fraud detection and security analytics
• Web & e-Tailing: web usage and clickstream behavior; market and customer segmentation; customer ad targeting; online fraud detection
• Telecommunications: customer churn prevention; price optimization and marketing; network analysis and optimization; customer experience management
• Government: fraud detection; compliance and regulatory analytics
• Retail: market and consumer segmentation; merchandising and cross-selling; promotion and campaign analysis
• Healthcare: patient care quality; drug development

Data Source: Cloudera
Hadoop Use Cases

• LinkedIn: “People You May Know” and other facts
• Yahoo!: Hadoop to support ad systems and web search
• Visa: Credit card fraud detection and analysis
• T-Mobile: Churn analysis, user experience
• Amazon, Baidu, AOL, eBay, Facebook, Twitter, …

Data Source: Cloudera
Hadoop Cluster HW Selection
What’s the HW configuration for Hadoop clusters? It depends: workloads matter.

CPU intensive: machine learning, natural language processing, complex data mining, feature extraction
I/O intensive: data importing and exporting, indexing, searching, grouping, decoding/decompressing

Data storage: capacity; number of data copies (mirroring)
TCO: rack space; power consumption; different workloads

General configuration:
• 2 quad-core CPUs
• 16-96GB memory
• 2 x GbE
• 1TB-2TB disk x n
• 1U/2U rack mount
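The storage-side factors above (capacity and the number of data copies) can be turned into a rough data-node count. This sketch assumes HDFS-style replication (HDFS's default `dfs.replication` is 3), a hypothetical 25% reserve for intermediate and temporary data, and 12 disks of 2 TB per node as one example of the slide's "1TB-2TB disk x n"; the function name and defaults are illustrative, not a vendor sizing formula.

```python
import math


def data_nodes_needed(
    data_tb: float,
    replication: int = 3,            # HDFS default dfs.replication
    disks_per_node: int = 12,        # hypothetical "n" in "1TB-2TB disk x n"
    disk_tb: float = 2.0,            # upper end of the slide's 1TB-2TB range
    reserve_fraction: float = 0.25,  # assumed headroom for intermediate/temp data
) -> int:
    """Rough count of data nodes needed to hold `data_tb` of user data."""
    if data_tb < 0 or replication < 1 or disks_per_node < 1 or disk_tb <= 0:
        raise ValueError("invalid sizing parameters")
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    raw_needed_tb = data_tb * replication  # every block is stored `replication` times
    usable_per_node_tb = disks_per_node * disk_tb * (1 - reserve_fraction)
    return math.ceil(raw_needed_tb / usable_per_node_tb)


if __name__ == "__main__":
    print(data_nodes_needed(500))  # 84 nodes for 500 TB of user data
```

This covers only the storage side; CPU-intensive workloads may run out of cores or memory before they run out of disk, which is why the workload mix matters.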
Proven at Scale with Worldwide Support
Production-scale testing of Apache trunk & hosted environment for customer POCs

• Industry’s largest Hadoop support team
    - Industry’s most accomplished Hadoop talent (from Yahoo!, LinkedIn, Talend, etc.)
• Tested at scale on the Greenplum Analytics Workbench
    - 1,000-node, 24-petabyte cluster
    - Multi-million-dollar investment by EMC and partners
    - Reduced risk for EMC customers
    - Certification of partner products

Bringing Rapid Innovation to Hadoop
Supermicro Server Functions in the Cluster
• Supermicro Data Nodes: 2U Storage Server
• Supermicro Infrastructure Nodes: 2U Twin2 Server

• 1,000+ physical Supermicro server nodes (10k virtual nodes)
• 12,000 processor cores
• 24 petabytes of storage capacity (6Gbps SATA)
• 48 terabytes of RAM
• 56 Gbps InfiniBand connectivity
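The cluster totals can be broken down into average per-node figures. This sketch assumes exactly 1,000 physical nodes (the slide says 1,000+), decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and an even spread of resources, which a mix of data and infrastructure nodes is unlikely to have exactly; `CLUSTER_TOTALS` and `per_node` are illustrative names.

```python
# Average per-node resources implied by the cluster-wide totals.
# Assumptions: exactly 1,000 physical nodes, decimal units, and resources
# spread evenly across nodes (real data and infrastructure nodes differ).

CLUSTER_TOTALS = {
    "cores": 12_000,
    "storage_tb": 24 * 1_000,  # 24 PB
    "ram_gb": 48 * 1_000,      # 48 TB
}


def per_node(totals: dict[str, int], nodes: int) -> dict[str, float]:
    """Divide each cluster-wide total evenly across `nodes`."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return {name: value / nodes for name, value in totals.items()}


if __name__ == "__main__":
    for name, value in per_node(CLUSTER_TOTALS, 1_000).items():
        print(f"{name}: {value:g} per node")
```

That works out to about 12 cores, 24 TB of disk and 48 GB of RAM per node on average; 48 GB falls within the 16-96GB memory range given for a general Hadoop node configuration.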
Supermicro Multi-Node Server Solutions
Switch Data Center, Las Vegas, NV
Initial Benchmark Data
[Chart: initial benchmark results, y-axis in minutes]
…Results before fine-tuning.
World-record performance results are expected to be announced before 2013.
Other Testing Programs: Supermicro & Intel
[Chart: CPU benchmark]
Supermicro Advantages
Why Supermicro…

Building Blocks for Different Workloads & Requirements
• Meet any Hadoop workload with the right model: I/O, CPU, disks, density
• Customize for specific workload requirements

High Efficiency, High Quality
• Green IT
• High-efficiency power
• High quality for the highest system availability and best utilization

Proven Solutions
• EMC Greenplum proven solutions
• 100% Apache Hadoop compatible
• Benchmark and testing programs with partners

TCO
• Solutions for cost-effective Hadoop clusters
• Best choice of Hadoop hardware platforms
Turnkey Hadoop: Supermicro Complete Rack Solutions
• One-stop shop for hardware: end-to-end total solutions
• Speed up deployment with ready-to-run rack systems
• Single source, consistent build quality and delivery time
• Multi-vendor compatibility testing, zero compatibility issues
• Premium service with competitive pricing

Shipped directly from the US, NL and TW
Broad Product Portfolio and Building Blocks
The Best Platform for Your Hadoop Cluster
  Q&A
Thank You


101 ab 1415-1445

  • 1. The Infrastructure of Tomorrow, Today – Integrating Supermicro, Greenplum and SAS to Enable Big Data Analytics. Jeff Tsai 蔡穎碩, Solution Manager. © Supermicro 2012
  • 2. Agenda Big Data Analytics Platform & Infrastructure EMC+Supermicro  1,000 Nodes Hadoop Cluster
  • 3. The Era of Big Data Is Here… “Big Data Is Less About Size, And More About Freedom” (TechCrunch); “Findings: ‘Big Data’ Is More Extreme Than Volume” (Gartner); “Total data: ‘bigger’ than big data” (451 Group); “Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World” (IDB)
  • 4. Data Sources Are Expanding: the digital universe will grow 44x in the next 10 years. (Source: 2011 IDC Digital Universe Study)
  • 5. Big Data Is Just a Bunch of Data to Store…? Or… [Chart: Big Data Sources, 2009-2014; file-based: 60.7% CAGR; block-based: 21.8% CAGR.] By 2012, 80% of all storage capacity sold will be for file-based data. (Source: IDC)
  • 6. To Create Significant Value for Your Business… How?
  • 7. Make Big Data Accessible: identify the data sources; store the data; connect applications and users; utilize the data in different views.
  • 8. EMC UAP Solutions – Analytics Platform: “This is what my analytics environment looks like…”
  • 9. Building the Big Data Analytics “Stack”: analytic toolsets (business analytics, BI, statistics, etc.); Greenplum Chorus, an enterprise collaboration platform for data; Greenplum Data Computing Appliances, purpose-built for big data analytics; Greenplum Database, the world’s most scalable MPP database platform; Greenplum HD, Enterprise & Community Editions of Hadoop, an enterprise analytics platform for unstructured data.
  • 10. Greenplum Becomes the Foundation of EMC’s Data Computing Division. EMC acquires Greenplum in July 2010. “For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant….” (Gartner)
  • 11.
  • 12. SAS at a Glance. Company highlights: • Founded in 1976; 11,000+ employees in 400+ offices • 2010 worldwide revenue: $2.43B • IDC: SAS is the leader in analytics, with a 34.5% market share (analytics and reporting) • 4.5 million users worldwide • 50,000+ sites in 114 countries • From tools to vertical solutions [Pie chart by industry: Financial Services, Government, Communications, Healthcare & Life Sciences, Manufacturing, Retail, Services, Education, Energy & Utilities, Other]
  • 13. Overview. SMC Inc., HQ, San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan. Founded in 1993; HQ in San Jose, CA; NASDAQ: SMCI since 2007. Revenues: FY09 $500M, FY10 $721M, FY11 ~$1B. Global footprint: >100 countries. Production: facilities in the US, EU and Asia. Engineering: 70% of the workforce (30% growth through the recession). Market share: #1 server channel (SMCI enables ~10% of the global server market). Brand equity: growing public profile since the 2007 IPO. Corporate focus: energy efficiency, earth-friendly green technology, innovation.
  • 14. Product Family: Resource Optimized (WIO/UIO); Twin Architecture; GPU SuperComputing; Data Center Optimized; Embedded; Application Optimized: Multi I/O; SuperBlade; Workstation; Mainstream Business Solutions; Storage Server
  • 15. In-House Design and Server Building Block Solutions® [Diagram: technology partners, customer requirements, and OEM specs; application-optimized, Tri-Lab, and data-center-optimized systems; building blocks include systems/motherboards, chassis, power supplies, cooling, CPU/memory modules, and operating systems/applications, with counts per category ranging from >140 to >1,300 (as of Q2 2009).]
  • 16. Big Data Analytics on Hadoop. Internet companies are not built on SQL; they are building analytics on Hadoop/NoSQL. Existing Hadoop users (Internet): “This is what I think my analytics environment looks like…” [Diagram: BI & ETL tools, reporting, and web apps; management & coordination; Pig, Hive, HBase; Hadoop system (MapReduce layer); Hadoop storage; data from web portals and social networks.]
  • 17. Hadoop Components (hadoop.apache.org): HDFS: Hadoop Distributed File System; MapReduce: framework for writing scalable data applications; Pig: procedural language that abstracts lower-level MapReduce; ZooKeeper: highly reliable distributed coordination; Hive: data warehouse infrastructure built on top of Hadoop; HBase: database for random, real-time read/write access; Oozie: workflow/coordination to manage jobs; Mahout: scalable machine learning libraries
  • 18. What Can Hadoop Do for You? Financial services: better knowing customers; risk analysis and management; fraud detection and security. Web & e-tailing: web usage and clickstream behavior; market and customer segmentation; ad/customer targeting analytics; online fraud detection. Telecommunications: customer churn prevention; price optimization and marketing; network analysis and optimization. Government: fraud detection; compliance and regulatory analytics; customer experience management. Retail: market and consumer segmentation; merchandising and cross-selling; promotion and campaign analysis. Healthcare: patient care quality; drug development. (Data source: Cloudera)
  • 19. Hadoop Use Cases: LinkedIn: “People You May Know” and other facts; Yahoo!: Hadoop to support ad systems and web search; Visa: credit card fraud detection and analysis; T-Mobile: churn analysis, user experience; Amazon, Baidu, AOL, eBay, Facebook, Twitter, … (Data source: Cloudera)
  • 20. Hadoop Cluster HW Selection. What’s the HW configuration for Hadoop clusters? It depends: workloads matter. CPU-intensive: machine learning, natural language processing, complex data mining, feature extraction, decoding/decompressing. I/O-intensive: data importing and exporting, indexing, searching, grouping. Considerations: data storage capacity, number of data mirrors (replicas), TCO, rack space, power consumption, different workloads. General configuration: 2 quad-core CPUs, 16-96 GB memory, 2 x GbE, n x 1-2 TB disks, 1U/2U rack mount.
  • 21. Proven at Scale with Worldwide Support: production-scale testing of Apache trunk and a hosted environment for customer POCs; the industry’s largest Hadoop support team; the industry’s most accomplished Hadoop talent (from Yahoo!, LinkedIn, Talend, etc.); tested at scale on the Greenplum Analytics Workbench, a 1,000-node, 24-petabyte cluster and a multi-million-dollar investment by EMC and partners; reduced risk for EMC customers; bringing rapid innovation to Hadoop; certification of partner products.
  • 22. Supermicro Server Functions in the Cluster: Supermicro data nodes (2U storage server); Supermicro infrastructure nodes (2U Twin² server). • 1,000+ physical Supermicro server nodes (10k virtual nodes) • 12,000 processor cores • 24 petabytes of storage capacity (6 Gbps SATA) • 48 terabytes of RAM • 56 Gbps InfiniBand connectivity
  • 23. Supermicro Multi-Node Server Solutions at the Switch data center, Las Vegas, NV
  • 24. Initial Benchmark Data [chart; times in minutes]: results before fine-tuning. World-record performance results are expected to be announced before 2013.
  • 25. Other testing programs – Supermicro & Intel CPU Benchmark
  • 26. Supermicro Advantages: Why Supermicro… Building blocks for different workloads & requirements: meet any Hadoop workload by model; I/O, CPU, disks, density; customize by specific workload requirement. High efficiency, high quality: green IT; high-efficiency power; high quality for the highest system availability and best utilization. Proven solutions: EMC Greenplum proven solutions; 100% Apache Hadoop compatible; benchmark and testing programs with partners. TCO: solutions for cost-effective Hadoop clusters; best choice of Hadoop hardware platforms.
  • 27. Turnkey Hadoop: Supermicro Complete Rack Solutions. One-stop shop for hardware, end-to-end total solutions; faster deployment with ready-to-run rack systems; single source, with consistent build quality and delivery time; multi-vendor compatibility testing, zero compatibility issues; premium service with competitive pricing; shipped directly from the US, NL and TW.
  • 28. Broad Product Portfolio and Building Blocks: the best platform for your Hadoop cluster
  • 29. Q&A. Thank you. SMC Inc., HQ, San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan
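Slide 4's "44x in the next 10 years" can be restated as an annual growth rate. The sketch below assumes smooth, constant compounding over the ten years, which the IDC headline does not itself claim. The function name is illustrative.

```python
def implied_cagr(growth_multiple: float, years: float) -> float:
    """Constant annual growth rate that compounds to `growth_multiple` over `years`.

    Assumes smooth year-over-year compounding (an illustration, not IDC's model).
    """
    if growth_multiple <= 0 or years <= 0:
        raise ValueError("growth_multiple and years must be positive")
    return growth_multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = implied_cagr(44, 10)
    print(f"44x over 10 years is about {rate:.1%} per year")
```

Under that assumption, 44x in a decade works out to roughly 46% growth per year.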
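Slide 5's two growth rates (60.7% CAGR for file-based data, 21.8% for block-based) diverge sharply once compounded. The sketch assumes both rates hold for each of the five years from 2009 to 2014. The chart's underlying values and units are not recoverable from the transcript, so it compares growth factors only.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` once per year for `years` years."""
    return (1.0 + cagr) ** years


if __name__ == "__main__":
    years = 2014 - 2009  # five compounding periods between the first and last chart years
    for label, cagr in (("file-based", 0.607), ("block-based", 0.218)):
        print(f"{label}: {growth_multiple(cagr, years):.1f}x over {years} years")
```

At these rates, file-based capacity grows about 10.7x over the period versus about 2.7x for block-based. That gap is consistent with the slide's point that file-based data comes to dominate new storage capacity.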
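Slide 17 describes MapReduce as a framework for writing scalable data applications. The single-process sketch below imitates its three phases (map, shuffle/group by key, reduce) on the classic word-count problem. It does not use Hadoop's Java API, and the function names are hypothetical. On a real cluster, the framework runs many mapper and reducer tasks in parallel and, where it can, schedules mappers near the HDFS blocks they read.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated, lower-cased token."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence; keys come out sorted, as after a sort phase."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = ["big data is here", "big data is more than volume"]
    print(word_count(splits))
```

Each input string stands in for one input split. Because reducers see all values for a key together, the sum is correct no matter which mapper produced each pair.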
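Slide 20 lists data storage capacity and the number of data mirrors (replicas) among the sizing considerations, with a general configuration of 1-2 TB disks x n per node. A rough data-node count follows from multiplying the logical data by the replication factor and dividing by usable disk per node. In the sketch, the default of 3 replicas matches HDFS's stock `dfs.replication` setting. The 25% scratch reservation for intermediate data and the 12-disk node in the usage note are hypothetical assumptions, not figures from the slide.

```python
import math


def data_nodes_needed(
    data_tb: float,
    disks_per_node: int,
    disk_tb: float,
    replication: int = 3,            # HDFS's default dfs.replication
    scratch_fraction: float = 0.25,  # ASSUMPTION: raw space held back for intermediate/temp data
) -> int:
    """Rough count of data nodes needed to store `data_tb` of logical data in HDFS."""
    if min(data_tb, disks_per_node, disk_tb, replication) <= 0:
        raise ValueError("data size, disk count, disk size and replication must be positive")
    if not 0 <= scratch_fraction < 1:
        raise ValueError("scratch_fraction must be in [0, 1)")
    raw_per_node = disks_per_node * disk_tb
    usable_per_node = raw_per_node * (1 - scratch_fraction)
    # Every logical byte is stored `replication` times across the cluster.
    return math.ceil(data_tb * replication / usable_per_node)


if __name__ == "__main__":
    # Hypothetical example: 100 TB of data on nodes with 12 x 2 TB disks each.
    print(data_nodes_needed(100, disks_per_node=12, disk_tb=2.0))
```

With these hypothetical inputs, 300 TB of replicated data over 18 TB of usable space per node rounds up to 17 data nodes. Real sizing would also account for data growth, compression, and the workload mix the slide emphasizes.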
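Slide 22's cluster totals can be turned into per-node averages, which are easier to compare with Slide 20's general configuration. The sketch divides by exactly 1,000 physical nodes. Because the slide says "1,000+", these averages are upper bounds. They also blend data nodes and infrastructure nodes together. Decimal units (1 PB = 1,000 TB) are assumed, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClusterTotals:
    physical_nodes: int
    virtual_nodes: int
    cores: int
    storage_pb: float  # decimal petabytes (assumed)
    ram_tb: float      # decimal terabytes (assumed)


def per_node_averages(c: ClusterTotals) -> Dict[str, float]:
    """Average resources per physical node, using decimal units (1 PB = 1,000 TB)."""
    n = c.physical_nodes
    return {
        "cores": c.cores / n,
        "storage_tb": c.storage_pb * 1000 / n,
        "ram_gb": c.ram_tb * 1000 / n,
        "vms_per_host": c.virtual_nodes / n,
    }


if __name__ == "__main__":
    workbench = ClusterTotals(physical_nodes=1000, virtual_nodes=10_000,
                              cores=12_000, storage_pb=24.0, ram_tb=48.0)
    for name, value in per_node_averages(workbench).items():
        print(f"{name}: {value:g}")
```

That is about 12 cores, 24 TB of disk, 48 GB of RAM, and 10 virtual nodes per physical server. The slide does not say whether the cluster used HDFS's default replication factor of 3. If it did, 24 PB of raw capacity would hold roughly 8 PB of logical data before other overheads.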