Architecting a Business-Critical Enterprise Application: Automated Support


Kumar Palaniappan
Enterprise Architect, NetApp
Agenda

•  NetApp’s Business Challenge
•  Solution Architecture
•  Best Practices
•  Performance Benchmarks
•  Questions




The AutoSupport Family
The Foundation of NetApp Support Strategies

•  Catch issues before they become critical
•  Secure automated “call-home” service
•  System monitoring and nonintrusive alerting
•  RMA requests without customer action
•  Enables faster incident management

      “My AutoSupport Upgrade Advisor tool does all the hard work
       for me, saving me 4 to 5 hours of work per storage system and
       providing an upgrade plan that’s complete and easy to follow.”

AutoSupport – Why Does it Matter?
Audiences: Customers, Partners, NetApp

Product Planning & Development
•  Product Adoption & Usage
•  Install Base Mgmt
•  Data Mining

Pre Sales
•  Lead Generation
•  Stickiness Measurements
•  “What If” Scenarios & Capacity Planning

Deployment
•  Establish Initial Call Home
•  Measure Implementation Effectiveness
•  Storage usage Monitoring & Billing (NAFS)

Technical Support
•  Event-Based Triggers & Alerts
•  Automated Case Creation
•  Automated Parts & Support Dispatch
•  Automated E2E Case Handling

Proactive Planning & Optimization
•  SAM Services: 1) Proactive Health Checks 2) Upgrade Planning
•  Storage Efficiency Measurements & Recommendations
•  PS Consulting: 1) Perf Analysis & Opt. Recommendations 2) Storage Capacity Planning

Product Feedback
•  Critical to Quality Metrics
•  Adoption & Usage Metrics
•  Quality & Reliability Metrics

NetApp Confidential – Limited Use
Business Challenges




Gateways
•  600K ASUPs every week
•  40% coming over the weekend
•  0.5% growth week over week

ETL
•  Data needs to be parsed and loaded in 15 mins

Data Warehouse
•  Only 5% of data goes into the data warehouse, rest unstructured. It’s growing 6-8TB per month
•  Oracle DBMS struggling to scale, maintenance and backups challenging
•  No easy way to access this unstructured content

Reporting
•  Numerous mining requests are not satisfied currently
•  Huge untapped potential of valuable information for lead generation, supportability, and BI


       Finally, the incoming load doubles every 16 months!
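
The growth figures on this slide can be related with a little compound-growth arithmetic. The sketch below is illustrative only: the function names and the 52/12 weeks-per-month conversion are assumptions, not part of the original deck. It shows that doubling every 16 months corresponds to roughly 1% compound growth per week, while 0.5% weekly growth on its own would double in about 32 months.

```python
import math

# Assumption: an average month is 52/12 weeks long.
WEEKS_PER_MONTH = 52 / 12


def weekly_growth_for_doubling(months: float) -> float:
    """Weekly compound growth rate that doubles a quantity in `months`."""
    weeks = months * WEEKS_PER_MONTH
    return 2 ** (1 / weeks) - 1


def doubling_time_months(weekly_growth: float) -> float:
    """Months needed to double at a given weekly compound growth rate."""
    weeks = math.log(2) / math.log(1 + weekly_growth)
    return weeks / WEEKS_PER_MONTH


if __name__ == "__main__":
    # Rate implied by "doubles every 16 months".
    print(f"{weekly_growth_for_doubling(16):.2%} per week")
    # Doubling time implied by "0.5% growth week over week".
    print(f"{doubling_time_months(0.005):.1f} months")
```

One possible reading is that the 0.5% figure tracks message count while the 16-month figure tracks total load (count and size together); the slide does not say so explicitly.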
Incoming AutoSupport Volumes and TB Consumption
[Chart: TB consumed, Jan-00 through Jan-17; y-axis 0 to 6,000 TB. Series: Actual (TB); Projected: Double, High Count & Size, Low Count & Size]
    •  At projected current rate of growth, total storage requirements continue doubling every 16 months
    •  Cost Model:
        > $15M per year Ecosystem costs

New Functionality Needed


[Diagram: use cases plotted by time scale (y-axis, seconds to weeks) against data volume (x-axis, gigabytes to petabytes): Product Analysis, Service, Cross Sell & Up Sell, Performance Planning, Customer Intelligence, Sales, License Management, Proactive Support, Customer Self Service, Product Development]
Solution Architecture




Hadoop Architecture




[Diagram]
Ingest: Flume -> HDFS (logs, performance and raw config) and ASUP config data
Lookup: REST -> Tools
Analyze: MapReduce, Pig (Subscribe)
Output: Metrics, Analytics, EBI
Data Ingestion
•  Use of Flume (v1) to consume large XML objects, up to 20 MB compressed each
•  4 agents feed 2 collectors in production
•  Basic process control using supervisord (ZK in R2?)
•  Reliability mode: Disk Failover (Store on Failure)
•  Separate sinks for text and binary sections
•  Arrival time bucketing by minute
•  Snappy Sequence Files with JSON values
•  Evaluating Flume NG
•  Ingesting 4.5 TB uncompressed per week, 80% of it in an 8-hour window
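
As an illustration of "arrival time bucketing by minute", the sketch below maps a message's arrival timestamp to a per-minute directory. It is a minimal sketch under stated assumptions: the function name, the `/asup/incoming` root, and the `YYYY/MM/DD/HH/MM` layout are hypothetical and are not taken from the deck or from Flume's own configuration.

```python
from datetime import datetime, timezone


def minute_bucket(arrival: datetime, root: str = "/asup/incoming") -> str:
    """Return the per-minute directory for a timezone-aware arrival time.

    The root path and directory layout are hypothetical.
    """
    if arrival.tzinfo is None:
        raise ValueError("arrival must be timezone-aware")
    utc = arrival.astimezone(timezone.utc)
    # Seconds and microseconds are dropped, so every message that
    # arrives within the same minute lands in the same directory.
    return f"{root}/{utc:%Y/%m/%d/%H/%M}"


if __name__ == "__main__":
    ts = datetime(2012, 6, 13, 17, 42, 59, tzinfo=timezone.utc)
    print(minute_bucket(ts))  # /asup/incoming/2012/06/13/17/42
```

Bucketing on arrival time rather than on a timestamp inside the payload means a downstream job can treat a bucket as complete once its minute has passed, without parsing the messages first.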
Data Transformation
•  Ingested data processed every minute (with a 5-minute lag)
  –  Relies on Fair Scheduler to meet SLA
  –  Oozie (R0) -> Pentaho PDI (R1) for scheduling
•  Configuration data written to HBase using Avro
•  Duplicate data written to HDFS as Hive / JSON for ad hoc queries
•  User scans of HBase for ad hoc queries avoided to meet SLA
•  Also simplifies data access
    –  query tools don’t yet have support for Avro serialization in HBase
    –  they all assume String keys and values (evolving to support Avro)
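
The "every minute, with a 5-minute lag" schedule can be sketched as a small helper that decides which minute buckets a run may pick up. This is one interpretation, stated as an assumption: a bucket becomes eligible once it has been closed for at least five minutes. The function names are hypothetical and the sketch does not reflect how Oozie or Pentaho PDI were actually configured.

```python
from datetime import datetime, timedelta, timezone

LAG = timedelta(minutes=5)      # lag quoted on the slide
ONE_MINUTE = timedelta(minutes=1)


def latest_ready_bucket(now: datetime, lag: timedelta = LAG) -> datetime:
    """Start time of the newest minute bucket closed for at least `lag`."""
    cutoff = now - lag
    # A bucket starting at T covers [T, T + 1 min); it is ready when
    # T + 1 min <= cutoff, so take the minute boundary at or before the
    # cutoff and step back one minute.
    return cutoff.replace(second=0, microsecond=0) - ONE_MINUTE


def ready_buckets(last_done: datetime, now: datetime,
                  lag: timedelta = LAG) -> list[datetime]:
    """Buckets after `last_done` that are eligible now, oldest first."""
    newest = latest_ready_bucket(now, lag)
    buckets = []
    bucket = last_done + ONE_MINUTE
    while bucket <= newest:
        buckets.append(bucket)
        bucket += ONE_MINUTE
    return buckets


if __name__ == "__main__":
    now = datetime(2012, 6, 13, 17, 42, 30, tzinfo=timezone.utc)
    done = datetime(2012, 6, 13, 17, 33, tzinfo=timezone.utc)
    for b in ready_buckets(done, now):
        print(b.isoformat())
```

Tracking the last completed bucket lets a delayed run catch up on every missed minute instead of silently skipping them.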
Low Latency Application Data Access
•  High performance REST lookups
•  Data stored as Avro serialized objects for performance and versioning
•  Solr used to search for objects (one core per region)
•  Then details pulled from HBase
•  Large objects (logs) indexed and pulled from HDFS
•  ~100 HBase regions (500 GB each)
  –  no splitting
  –  Snappy compressed tables
•  Future: HBase coprocessors to keep Solr indexes up to date
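
The search-then-fetch pattern in the bullets above (find matching objects in a search index, then pull details from a key-value store) can be sketched with in-memory stand-ins. Everything here is hypothetical: `LookupService`, the row-key format, and the sample records are invented for illustration, a plain dictionary plays the role of HBase, a term-to-keys map plays the role of Solr, and JSON stands in for Avro so the sketch needs only the standard library.

```python
import json
from collections import defaultdict


class LookupService:
    """Two-step lookup: search an index for row keys, then fetch rows."""

    def __init__(self) -> None:
        self._index = defaultdict(set)  # term -> row keys (index stand-in)
        self._store = {}                # row key -> serialized bytes

    def put(self, row_key: str, record: dict) -> None:
        """Store the serialized record and index each word of its values."""
        self._store[row_key] = json.dumps(record).encode("utf-8")
        for value in record.values():
            for term in str(value).lower().split():
                self._index[term].add(row_key)

    def search(self, term: str) -> list[str]:
        """Step 1: return the row keys whose records contain `term`."""
        return sorted(self._index.get(term.lower(), ()))

    def lookup(self, term: str) -> list[dict]:
        """Step 2: fetch and deserialize the full record for each hit."""
        return [json.loads(self._store[key]) for key in self.search(term)]


if __name__ == "__main__":
    svc = LookupService()
    svc.put("sys-001|2012-06-13", {"system_id": "sys-001", "model": "FAS2040"})
    svc.put("sys-002|2012-06-13", {"system_id": "sys-002", "model": "FAS2040"})
    print(svc.search("fas2040"))
    print(svc.lookup("sys-001"))
```

Keeping the index small (keys only) and the store authoritative is what lets the index be rebuilt or updated independently, which is the role the slide anticipates for coprocessors.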
Export to Oracle DSS

•  Pentaho pulls data from HBase and HDFS
•  Pushes into Oracle star schema
•  Daily export
 –  530 million rows and 350 GB on peak days
•  Runs on 2 VMs
 –  64 GB RAM, 12 cores
•  Enables existing BI tools (OBIEE) to query DSS database
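
The peak-day export figures give a rough sense of row size and required load rate. The arithmetic below is a back-of-the-envelope sketch: it assumes decimal gigabytes, an even split across the two VMs, and a hypothetical 24-hour window (the deck only says the export is daily, so the real window may be shorter and the required rate higher).

```python
PEAK_ROWS = 530_000_000     # rows on a peak day, from the slide
PEAK_BYTES = 350 * 10**9    # 350 GB, assuming decimal gigabytes
EXPORT_VMS = 2              # from the slide


def avg_row_bytes(rows: int, total_bytes: int) -> float:
    """Average size of one exported row in bytes."""
    return total_bytes / rows


def rows_per_second_per_worker(rows: int, window_hours: float,
                               workers: int) -> float:
    """Sustained rate each worker needs to finish within the window."""
    return rows / (window_hours * 3600) / workers


if __name__ == "__main__":
    print(f"{avg_row_bytes(PEAK_ROWS, PEAK_BYTES):.0f} bytes per row")
    # Assumption: the whole day is available for the export.
    rate = rows_per_second_per_worker(PEAK_ROWS, 24, EXPORT_VMS)
    print(f"{rate:.0f} rows/s per VM")
```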
Disaster Recovery
•  DR cluster with 75% of production capacity
    –  in Release 2
•  Active/active from Flume back
    –  Primary cluster is the sole HTTP/SMTP responder
•  SLA: cannot lose >1 hour of data
  –  can be lost in front-end switchover
•  HBase incremental backups
•  Staging is used frequently for engineering tests and is operationally expensive, so it is not used for DR
NetApp Open Solution for Hadoop (NOSH)
HDFS Storage: Key Needs
Attribute     Key Drivers                                 Requirement

Performance   •  Fast response time for                   •  Minimize Network bottlenecks
                 search, ad-hoc, and real-                •  Optimize server workload
                 time queries                             •  Leverage storage HW to
              •  High replication counts                     increase cluster performance
                 impact throughput

Opex          •  Lower operational costs for              •  Optimize usable storage
                 managing huge amounts of                    capacity
                 data                                     •  Decouple storage from
              •  Controlling staff costs and                 compute nodes to decrease
                 cluster management costs                    the need to add more
                 as clusters scale                           compute nodes

Enterprise    •  Protect SPOF at the                      •  Protect cluster metadata from
Robustness       Hadoop name node                            SPOF
              •  Minimize cluster rebuild                 •  Minimize risks where
                                                             equipment tends to fail

NetApp Open Solution for Hadoop
•  Easy to Deploy, Manage and Scale
•  Uses High Performance storage
    –  Resilient and Compact
    –  RAID Protection of Data
    –  Less Network Congestion
•  Raw Capacity and density
    –  120TB or 180TB in 4U
    –  Fully serviceable storage system
•  Reliability
    –  Hardware RAID and hot swap prevent job restarts caused by a node going offline after a media failure
    –  Reliable metadata (NameNode)

Enterprise Class Hadoop

[Diagram: HDFS NameNode and Secondary NameNode backed by a FAS2040 via NFS over 1GbE; MapReduce JobTracker; DataNodes / TaskTrackers, each attached to an E2660 via 6Gb/s SAS Direct Connect (1 per DataNode), with 4 separate shared-nothing partitions per DataNode; 10GbE links (1 per node)]
Performance and Scaling
Linear Throughput Scaling as DataNode Count Increases
[Chart: Read/Write Throughput. y-axis: throughput, 0 to 6,000 MB/s; x-axis: DataNodes per configuration tested (4, 8, 12, 24). Series: Total Read Throughput (MB/s), Total Write Throughput (MB/s)]
Summary




Takeaways
•  Hadoop-based Big Data architecture enables
  –  Cost effective scaling
  –  Low latency access to data
  –  Ad hoc issues & pattern detection
  –  Predictive modeling in future
•  Using our own innovative Hadoop storage technology NOSH
•  An enterprise transformation

Kumar Palaniappan
@megamda


© 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without
prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp,
the NetApp logo, and Go further, faster, are trademarks or registered trademarks of NetApp, Inc.
in the United States and/or other countries. All other brands or products are trademarks or
registered trademarks of their respective holders and should be treated as such.

 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Architecting BigData Enterprise Application-HadoopSummit2012

  • 1. Architecting business critical enterprise application: Automated Support
      Kumar Palaniappan, Enterprise Architect, NetApp
  • 2. Agenda
      ◦ NetApp’s Business Challenge
      ◦ Solution Architecture
      ◦ Best Practices
      ◦ Performance Benchmarks
      ◦ Questions
  • 3. The AutoSupport Family: The Foundation of NetApp Support Strategies
      ◦ Catch issues before they become critical
      ◦ Secure automated “call-home” service
      ◦ System monitoring and nonintrusive alerting
      ◦ RMA requests without customer action
      ◦ Enables faster incident management
      “My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that’s complete and easy to follow.”
  • 4. AutoSupport – Why Does it Matter?
      [Diagram: AutoSupport uses by lifecycle stage, across Customers, Partners, and NetApp]
      Lifecycle stages: Product Planning & Development; Pre Sales; Deployment; Technical Support; Proactive Planning & Optimization; Feedback
      Uses shown: Product Adoption & Usage; Install Base Mgmt; Data Mining; Lead Generation; Stickiness Measurements; “What If” Scenarios & Capacity Planning; Establish Initial Call Home; Measure Implementation Effectiveness; Storage Usage Monitoring & Billing (NAFS); Event-Based Triggers & Alerts; Automated Case Creation; Automated E2E Case Handling; Automated… …Parts & Support Dispatch; SAM Services: 1) Proactive Health Checks 2) Upgrade Planning; Storage Efficiency Measurements & Recommendations; PS Consulting: 1) Perf Analysis & Opt. Recommendations 2) Storage Capacity Planning; Critical to Quality Metrics; Product Adoption & Usage Metrics; Quality & Reliability Metrics
      NetApp Confidential – Limited Use
  • 5. Business Challenges
      Gateways
      ◦ 600K ASUPs every week
      ◦ 40% coming over the weekend
      ◦ 0.5% growth week over week
      ETL
      ◦ Data needs to be parsed and loaded in 15 mins
      Data Warehouse
      ◦ Only 5% of data goes into the data warehouse, rest unstructured. It’s growing 6-8TB per month
      ◦ Oracle DBMS struggling to scale, maintenance and backups challenging
      ◦ No easy way to access this unstructured content
      Reporting
      ◦ Numerous mining requests are not satisfied currently
      ◦ Huge untapped potential of valuable information for lead generation, supportability, and BI
      Finally, the incoming load doubles every 16 months!
  • 6. Incoming AutoSupport Volumes and TB Consumption
      [Chart: TB consumed, Jan-00 to Jan-17, y-axis 0 to 6,000; series: Actual (TB), Projected, Double, High Count & Size, Low Count & Size]
      ◦ At the projected current rate of growth, total storage requirements continue doubling every 16 months
      ◦ Cost Model: > $15M per year Ecosystem costs
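The doubling claim on this slide can be turned into a quick projection. The sketch below assumes smooth exponential growth with a 16-month doubling period; the function name and the starting size in the demo are illustrative and do not come from the deck.

```python
def projected_storage_tb(current_tb: float, months_ahead: float,
                         doubling_months: float = 16.0) -> float:
    """Project storage need, assuming it doubles every `doubling_months`."""
    return current_tb * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # 1,000 TB is a hypothetical starting point, not a figure from the slide.
    for months in (0, 12, 16, 32, 48):
        print(months, round(projected_storage_tb(1000, months)))
```

Under this assumption, a 16-month doubling period corresponds to roughly 68% growth per year, since 2^(12/16) is about 1.68.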
  • 7. New Functionality Needed
      [Diagram: use cases plotted on two axes, one running from Seconds to Weeks and one from Gigabytes to Petabytes]
      Use cases shown: Product Analysis; Service Performance Planning; Cross Sell & Up Sell; Customer Intelligence; Sales License Management; Proactive Support; Customer Self Service; Product Development
  • 9. Hadoop Architecture
      [Diagram labels: Ingest; ASUP; Flume; HDFS; Logs, Performance Data and raw config; Lookup; Config; REST; Tools; Subscribe; MapReduce; Pig; Analyze; Metrics, Analytics, EBI]
  • 11. Data Ingestion
      ◦ Use of Flume (v1) to consume large XML objects up to 20 MB compressed each
      ◦ 4 agents feed 2 collectors in production
      ◦ Basic Process Control using supervisord (ZK in R2?)
      ◦ Reliability Mode: Disk Failover (Store on Failure)
      ◦ Separate sinks for Text and Binary sections
      ◦ Arrival time bucketing by minute
      ◦ Snappy Sequence Files with JSON values
      ◦ Evaluating Flume NG
      ◦ Ingesting 4.5 TB uncompressed/week, 80% in an 8 hour window
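Two of the ingestion points above lend themselves to a small sketch: bucketing by arrival minute and the sustained rate implied by the weekly volume. The directory root and the year/month/day/hour/minute layout are hypothetical (the slide does not show the actual paths), and the rate calculation assumes that 80% of the weekly 4.5 TB arrives inside a single 8 hour window, using decimal units.

```python
from datetime import datetime, timezone


def minute_bucket(arrival: datetime, root: str = "/asup/incoming") -> str:
    """Return the bucket directory for the minute in which a message arrived.

    `root` and the path layout are hypothetical. Pass a timezone-aware
    datetime; it is normalized to UTC before formatting.
    """
    utc = arrival.astimezone(timezone.utc)
    return f"{root}/{utc:%Y/%m/%d/%H/%M}"


def peak_rate_mb_per_s(weekly_tb: float = 4.5, share: float = 0.8,
                       window_hours: float = 8.0) -> float:
    """Average rate (decimal MB/s) needed inside the peak window."""
    peak_bytes = weekly_tb * 1e12 * share
    return peak_bytes / (window_hours * 3600) / 1e6


if __name__ == "__main__":
    arrived = datetime(2012, 6, 13, 14, 5, 59, tzinfo=timezone.utc)
    print(minute_bucket(arrived))
    print(round(peak_rate_mb_per_s()), "MB/s during the peak window")
```

With the slide's figures and these assumptions, the peak window works out to about 125 MB/s of uncompressed data on average.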
  • 12. Data Transformation
      ◦ Ingested data processed every 1 min. (w/ 5 min. lag)
          – Relies on Fair Scheduler to meet SLA
          – Oozie (R0) -> Pentaho PDI (R1) for scheduling
      ◦ Configuration data written to HBase using Avro
      ◦ Duplicate data written to HDFS as Hive / JSON for ad hoc queries
      ◦ User scans of HBase for ad hoc queries avoided to meet SLA
      ◦ Also simplifies data access
          – query tools don’t yet have support for Avro serialization in HBase
          – they all assume String keys and values (evolving to support Avro)
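One possible reading of "processed every 1 min. (w/ 5 min. lag)" is that each scheduler run picks up every minute bucket that is at least five minutes old and has not been processed yet. The sketch below illustrates that reading only; the function names are hypothetical and the deck does not describe the actual scheduling logic.

```python
from datetime import datetime, timedelta, timezone


def newest_ready_bucket(now: datetime, lag_minutes: int = 5) -> datetime:
    """Start of the most recent minute bucket old enough to process."""
    floored = now.replace(second=0, microsecond=0)
    return floored - timedelta(minutes=lag_minutes)


def buckets_to_process(last_done: datetime, now: datetime,
                       lag_minutes: int = 5) -> list[datetime]:
    """Minute buckets after `last_done`, up to the newest ready bucket."""
    ready = newest_ready_bucket(now, lag_minutes)
    pending = []
    cursor = last_done + timedelta(minutes=1)
    while cursor <= ready:
        pending.append(cursor)
        cursor += timedelta(minutes=1)
    return pending


if __name__ == "__main__":
    now = datetime(2012, 6, 13, 14, 10, 30, tzinfo=timezone.utc)
    last = datetime(2012, 6, 13, 14, 3, tzinfo=timezone.utc)
    for bucket in buckets_to_process(last, now):
        print(bucket.isoformat())
```

Tracking the last completed bucket lets a late or skipped run catch up on the next invocation instead of dropping a minute of data.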
  • 13. Low Latency Application Data Access
      ◦ High performance REST lookups
      ◦ Data stored as Avro serialized objects for performance and versioning
      ◦ Solr used to search for objects (one core per region)
      ◦ Then details pulled from HBase
      ◦ Large objects (logs) indexed and pulled from HDFS
      ◦ ~100 HBase regions (500 GB each)
          – no splitting
          – Snappy compressed tables
      ◦ Future: HBase coprocessors to keep Solr indexes up to date
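The search-then-fetch pattern described above (Solr to find objects, then HBase for the details) can be sketched with in-memory stand-ins. Nothing here calls a real Solr or HBase client: `SearchIndex`, `KeyValueStore`, and `lookup` are hypothetical names used only to show the two-step access path that avoids scanning the store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchIndex:
    """Stand-in for a Solr core: maps a search term to matching row keys."""
    postings: dict[str, list[str]] = field(default_factory=dict)

    def add(self, row_key: str, terms: list[str]) -> None:
        for term in terms:
            self.postings.setdefault(term, []).append(row_key)

    def search(self, term: str) -> list[str]:
        return list(self.postings.get(term, []))


@dataclass
class KeyValueStore:
    """Stand-in for an HBase table: row key -> serialized object bytes."""
    rows: dict[str, bytes] = field(default_factory=dict)

    def get(self, row_key: str) -> Optional[bytes]:
        return self.rows.get(row_key)


def lookup(index: SearchIndex, store: KeyValueStore, term: str) -> list[bytes]:
    """Search the index first, then fetch details by key (no table scan)."""
    results = []
    for key in index.search(term):
        value = store.get(key)
        if value is not None:
            results.append(value)
    return results


if __name__ == "__main__":
    index, store = SearchIndex(), KeyValueStore()
    store.rows["sys-001"] = b"serialized-config"  # placeholder payload
    index.add("sys-001", ["FAS2040"])
    print(lookup(index, store, "FAS2040"))
```

The point of the split is that the index answers "which objects match" while the store only ever serves point reads by key, which keeps lookup latency predictable.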
  • 14. Export to Oracle DSS
      ◦ Pentaho pulls data from HBase and HDFS
      ◦ Pushes into Oracle star schema
      ◦ Daily export
          – 530 million rows and 350 GB on peak days
      ◦ Runs on 2 VMs
          – 64 GB RAM, 12 cores
      ◦ Enables existing BI tools (OBIE) to query DSS database
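The peak-day export figures translate into a rough per-row size and sustained rate. This back-of-the-envelope sketch assumes decimal gigabytes and, purely for illustration, that the export is spread evenly over 24 hours; the slide does not state how long the job actually runs.

```python
def export_stats(rows: float = 530e6, gigabytes: float = 350.0,
                 window_hours: float = 24.0) -> dict[str, float]:
    """Average row size and sustained rates for a daily export.

    `window_hours` is an assumption; a shorter job needs proportionally
    higher rates.
    """
    seconds = window_hours * 3600
    total_bytes = gigabytes * 1e9
    return {
        "bytes_per_row": total_bytes / rows,
        "rows_per_s": rows / seconds,
        "mb_per_s": total_bytes / seconds / 1e6,
    }


if __name__ == "__main__":
    for name, value in export_stats().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions a peak day averages about 660 bytes per row and roughly 6,100 rows per second.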
  • 15. Disaster Recovery
      ◦ DR cluster with 75% of production capacity
          – in Release 2
      ◦ Active/active from Flume back
          – Primary cluster the one HTTP/SMTP responder
      ◦ SLA: cannot lose >1 hour of data
          – can be lost in front-end switchover
      ◦ HBase incremental backups
      ◦ Staging used frequently for engineering test, operationally expensive so not used for DR
  • 16. NetApp Open Solution for Hadoop (NOSH)
  • 17. HDFS Storage: Key Needs
      Performance
          Key drivers:
          ◦ Fast response time for search, ad-hoc, and real-time queries
          ◦ High replication counts impact throughput
          Requirements:
          ◦ Minimize network bottlenecks
          ◦ Optimize server workload
          ◦ Leverage storage HW to increase cluster performance
      Opex
          Key drivers:
          ◦ Lower operational costs for managing huge amounts of data
          ◦ Controlling staff costs and cluster management costs as clusters scale
          Requirements:
          ◦ Optimize usable storage capacity
          ◦ Decouple storage from compute nodes to decrease the need to add more compute nodes
      Enterprise Robustness
          Key drivers:
          ◦ Protect SPOF at the Hadoop name node
          ◦ Minimize cluster rebuild
          Requirements:
          ◦ Protect cluster metadata from SPOF
          ◦ Minimize risks where equipment tends to fail
  • 18. NetApp Open Solution for Hadoop
      ◦ Easy to Deploy, Manage and Scale
      ◦ Uses High Performance storage
          – Resilient and Compact
          – RAID Protection of Data
          – Less Network Congestion
      ◦ Raw Capacity and density
          – 120TB or 180TB in 4U
          – Fully serviceable storage system
      ◦ Reliability
          – Hardware RAID & hot swap prevent job restarts caused by a node going offline after a media failure
          – Reliable metadata (Name Node)
      [Diagram labels: HDFS; NameNode; Secondary NameNode; FAS2040; NFS over 1GbE; MapReduce; JobTracker; DataNodes / TaskTracker; E2660; 4 separate shared nothing partitions per datanode; 6Gb/s SAS Direct Connect (1 per DataNode); 10GbE Links (1 per Node); Enterprise Class Hadoop]
  • 20. Linear Throughput Scaling as DataNode Count Increases
      [Chart: Read/Write Throughput; series: Total Read Throughput (MB/s), Total Write Throughput (MB/s); y-axis 0 to 6000; x-axis: DataNodes per configuration tested (4, 8, 12, 24)]
  • 21. Summary
  • 22. Takeaways
      ◦ Hadoop-based Big Data architecture enables
          – Cost effective scaling
          – Low latency access to data
          – Ad hoc issues & pattern detection
          – Predictive modeling in future
      ◦ Using our own innovative Hadoop storage technology NOSH
      ◦ An enterprise transformation
  • 23. Kumar Palaniappan @megamda
      © 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, and Go further, faster, are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.