Hadoop in the Enterprise:
Legacy Rides the Elephant


  Dr. Phil Shelley
  CTO Sears Holdings
  Founder and CEO MetaScale
Hadoop has changed the enterprise big data game.

Are you languishing in the past or adopting outdated trends? Legacy rides the elephant!
Why Hadoop and Why Now?
THE ADVANTAGES:
Cost reduction
Alleviating performance bottlenecks
ETL that has become too expensive and complex
Mainframe and data warehouse processing → Hadoop

THE CHALLENGE:
Traditional enterprises' lack of awareness

THE SOLUTION:
Leverage the growing support system for Hadoop
Make Hadoop the data hub in the enterprise
Use Hadoop for processing batch and analytic jobs
The Classic Enterprise Challenge

The challenge is a convergence of pressures:
•  Growing data volumes
•  Shortened processing windows
•  Escalating costs
•  Hitting scalability ceilings
•  Demanding business requirements
•  ETL complexity
•  Latency in data
•  Tight IT budgets
The Sears Holdings Approach
Key to our approach:
1)  Allowing users to continue to use familiar consumption interfaces
2)  Providing inherent HA
3)  Enabling businesses to unlock previously unusable data

The six steps:
1.  Implement a Hadoop-centric reference architecture
2.  Move enterprise batch processing to Hadoop
3.  Make Hadoop the single point of truth
4.  Massively reduce ETL by transforming within Hadoop
5.  Move results and aggregates back to legacy systems for consumption
6.  Retain, within Hadoop, source files at the finest granularity for re-use
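Steps 4–6 above describe one pattern: keep the granular source rows in Hadoop, do the transformation there, and ship only compact aggregates back to legacy systems. As a rough illustration of that shape (the deck's actual implementations used Pig and Java UDFs at cluster scale; the record fields here are invented for the example), a minimal MapReduce-style sketch in plain Python:

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs from granular sales records."""
    for rec in records:
        yield (rec["store"], rec["sku"]), rec["amount"]

def reduce_phase(pairs):
    """Aggregate values per key -- the summary sent back to legacy systems."""
    totals = defaultdict(float)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)

# Granular source rows stay in Hadoop at full fidelity (step 6);
# only the per-(store, sku) aggregates flow back out (step 5).
sales = [
    {"store": "S1", "sku": "A", "amount": 10.0},
    {"store": "S1", "sku": "A", "amount": 5.0},
    {"store": "S2", "sku": "B", "amount": 7.5},
]
aggregates = reduce_phase(map_phase(sales))
print(aggregates)  # {('S1', 'A'): 15.0, ('S2', 'B'): 7.5}
```

Because the raw detail is retained, a new question later needs only a new reduce over the same source files, not a fresh extract from the legacy systems.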
The Architecture

•  Enterprise solutions using Hadoop must be an eco-system

•  Large companies have a complex environment:
   –  Transactional systems
   –  Services
   –  EDW and data marts
   –  Reporting tools and needs

•  We needed to build an entire solution
The Sears Holdings Architecture

(architecture diagram)
The Learning
Over two years of experience using Hadoop for enterprise legacy workloads.

HADOOP
✓  We can dramatically reduce batch processing times for mainframe and EDW
✓  We can retain and analyze data at a much more granular level, with longer history
✓  Hadoop must be part of an overall solution and eco-system

IMPLEMENTATION
✓  We can reliably meet our production deliverable time-windows by using Hadoop
✓  We can largely eliminate the use of traditional ETL tools
✓  New tools allow an improved user experience on very large data sets

UNIQUE VALUE
✓  We developed tools and skills – the learning curve is not to be underestimated
✓  We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop, with spectacular results
Some Examples
Use-Cases at Sears Holdings

The Challenge – Use-Case #1

Scale of the problem:
•  Sales: 8.9B line items
•  Offers: 1.4B SKUs
•  Elasticity: 12.6B parameters
•  Items: 11.3M SKUs
•  Stores: 3,200 sites
•  Inventory: 1.8B rows
•  Price sync: daily
•  Timing: weekly

•  Intensive computational and large storage requirements
•  Needed to calculate item price elasticity based on 8 billion rows of sales data
•  Could only be run quarterly and on a subset of data – needed more often
•  Business need: react to market conditions and new product launches
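Price elasticity of demand is the ratio of the relative change in quantity sold to the relative change in price; the workload above fits one such parameter per store-item pair. An illustrative Python sketch of the underlying formula (using the midpoint "arc" form; not the production code, which ran over billions of rows on Hadoop):

```python
def arc_price_elasticity(q0, q1, p0, p1):
    """Arc elasticity of demand: % change in quantity / % change in price,
    with midpoint averages so the result is symmetric in direction."""
    dq = (q1 - q0) / ((q0 + q1) / 2)   # relative quantity change
    dp = (p1 - p0) / ((p0 + p1) / 2)   # relative price change
    return dq / dp

# Hypothetical item: a 10% price cut lifts unit sales by 22 units.
e = arc_price_elasticity(q0=100, q1=122, p0=10.0, p1=9.0)
print(round(e, 2))  # -1.88 -- demand is elastic (|e| > 1)
```

Run weekly per store-item pair, this single division becomes the 12.6B-parameter job the slide describes.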
The Result – Use-Case #1

Business Problem:
•  Intensive computational and large storage requirements
•  Needed to calculate store-item price elasticity based on 8 billion rows of sales data
•  Could only be run quarterly and on a subset of data
•  Business missing the opportunity to react to changing market conditions and new product launches

With Hadoop:
•  Price elasticity calculated weekly
•  New business capability enabled
•  100% of data set and granularity
•  Meets all SLAs
The Challenge – Use-Case #2

Scale of the problem:
•  Data sources: 30+
•  Input records: billions
•  Mainframe scalability: unable to scale 100-fold
•  Mainframe cost: 100 MIPS on 1% of the data

•  Mainframe batch business process would not scale
•  Needed to process 100 times more detail to handle business-critical functionality
•  Business need required processing billions of records from 30 input data sources
•  Complex business logic and financial calculations
•  SLA for this cyclic process was 2 hours per run
The Result – Use-Case #2

Business Problem:
•  Mainframe batch business process would not scale
•  Needed to process 100 times more detail to handle the rollout of high-value, business-critical functionality
•  Time-sensitive business need required processing billions of records from 30 input data sources
•  Complex business logic and financial calculations
•  SLA for this cyclic process was 2 hours per run

With Hadoop:
•  Teradata and mainframe data on Hadoop
•  Implemented Pig for processing
•  Java UDFs for financial calculations
•  Scalable solution in 8 weeks
•  Processing met a tighter SLA
•  $600K annual savings
•  6,000 lines of code reduced to 400 lines of Pig
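The deck doesn't show the Pig scripts or Java UDFs themselves. As a rough, hypothetical Python equivalent of the pattern, here is a user-defined financial calculation applied per grouped record, the shape a Pig GROUP/FOREACH pipeline takes when it invokes a UDF (the field names and rates below are invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

def net_amount(gross, tax_rate, discount):
    """Hypothetical 'UDF': the per-record financial calculation."""
    return round(gross * (1 - discount) * (1 + tax_rate), 2)

# GROUP records BY account, then apply the UDF and sum per group --
# the same shape as a Pig GROUP ... / FOREACH ... GENERATE pipeline.
records = [
    {"account": "A", "gross": 100.0, "tax_rate": 0.05, "discount": 0.10},
    {"account": "A", "gross": 50.0,  "tax_rate": 0.05, "discount": 0.00},
    {"account": "B", "gross": 200.0, "tax_rate": 0.05, "discount": 0.25},
]
records.sort(key=itemgetter("account"))  # groupby needs sorted input
totals = {
    acct: sum(net_amount(r["gross"], r["tax_rate"], r["discount"]) for r in rows)
    for acct, rows in groupby(records, key=itemgetter("account"))
}
print(totals)  # {'A': 147.0, 'B': 157.5}
```

Declaring only the grouping and the calculation, and letting the platform handle distribution, is what let 6,000 lines of procedural code collapse to 400 lines of Pig.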
The Challenge – Use-Case #3

Scale of the problem:
•  Data storage: mainframe DB2 tables
•  Price data: 500M records
•  Processing window: 3.5 hours
•  Mainframe jobs: 64

Mainframe unable to meet SLAs on growing data volume
The Result – Use-Case #3

Business Problem:
Mainframe unable to meet SLAs on growing data volume

With Hadoop:
•  Source data in Hadoop
•  Job runs over 100% faster – now in 1.5 hours
•  $100K in annual savings
•  Maintenance improvement – under 50 lines of Pig code
The Challenge – Use-Case #4

Scale of the problem:
•  Teradata accessed via Business Objects
•  Transformation: on Teradata
•  User experience: unacceptable
•  Batch processing output: .CSV files
•  History retained: none
•  New report development: slow

•  Needed to enhance the user experience and the ability to perform analytics on granular data
•  Restricted availability of data due to space constraints
•  Needed to retain granular data
•  Needed agile, Excel-style interaction on data sources of 100 million records
The Result – Use-Case #4

Business Problem:
•  Needed to enhance the user experience and the ability to perform analytics on granular data
•  Restricted availability of data due to space constraints
•  Needed to retain granular data
•  Needed agile, Excel-style interaction on data sources of 100 million records

With Hadoop:
•  Sourcing data directly to Hadoop
•  Redundant storage eliminated
•  Transformation moved to Hadoop
•  User experience expectations met
•  Datameer for additional analytics
•  Over 50 data sources retained in Hadoop
•  Pig scripts to ease code maintenance
•  Granular history retained
•  Business's single source of truth
Summary

•  Hadoop can handle enterprise workloads
•  Can reduce strain on legacy platforms
•  Can reduce cost
•  Can bring new business opportunities

•  Must be an eco-system
•  Must be part of an overall data strategy
•  Not to be underestimated
The Horizon – What Do We Need Next?
•  Automation tools and techniques that ease the enterprise integration of Hadoop
•  Education for traditional enterprise IT organizations about the possibilities and reasons to deploy Hadoop
•  Continued development of a reusable framework for legacy workload migration
For more information, visit:

www.metascale.com
Follow us on Twitter @BigDataMadeEasy
Join us on LinkedIn: www.linkedin.com/company/metascale-llc

 
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAmazon Web Services
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarCloudera, Inc.
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deckKeithETD_CTO
 
Scale Presentation 3 3 09
Scale Presentation 3 3 09Scale Presentation 3 3 09
Scale Presentation 3 3 09sdewall
 
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Odinot Stanislas
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big DataNetApp
 
Has Your Data Gone Rogue?
Has Your Data Gone Rogue?Has Your Data Gone Rogue?
Has Your Data Gone Rogue?Tony Pearson
 
Hadoop Data Reservoir Webinar
Hadoop Data Reservoir WebinarHadoop Data Reservoir Webinar
Hadoop Data Reservoir WebinarPlatfora
 

Semelhante a Hadoop in the Enterprise: Legacy Rides the Elephant (20)

Big Data and HPC
Big Data and HPCBig Data and HPC
Big Data and HPC
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
Enterprise Integration of Disruptive Technologies
Enterprise Integration of Disruptive TechnologiesEnterprise Integration of Disruptive Technologies
Enterprise Integration of Disruptive Technologies
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
 
Advancing the Traditional Enterprise: An EA Story
Advancing the Traditional Enterprise: An EA Story Advancing the Traditional Enterprise: An EA Story
Advancing the Traditional Enterprise: An EA Story
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
BI Forum 2009 - Principy architektury MPP datového skladu
BI Forum 2009 - Principy architektury MPP datového skladuBI Forum 2009 - Principy architektury MPP datového skladu
BI Forum 2009 - Principy architektury MPP datového skladu
 
Measure Data Quality
Measure Data QualityMeasure Data Quality
Measure Data Quality
 
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deck
 
Scale Presentation 3 3 09
Scale Presentation 3 3 09Scale Presentation 3 3 09
Scale Presentation 3 3 09
 
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Has Your Data Gone Rogue?
Has Your Data Gone Rogue?Has Your Data Gone Rogue?
Has Your Data Gone Rogue?
 
Hadoop Data Reservoir Webinar
Hadoop Data Reservoir WebinarHadoop Data Reservoir Webinar
Hadoop Data Reservoir Webinar
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Último (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Hadoop in the Enterprise: Legacy Rides the Elephant

  • 1. Hadoop in the Enterprise: Legacy Rides the Elephant. Dr. Phil Shelley, CTO Sears Holdings; Founder and CEO, MetaScale
  • 2. Hadoop has changed the enterprise big data game. Are you languishing in the past or adopting outdated trends? Legacy rides the elephant! Page 2
  • 3. Why Hadoop and Why Now? THE ADVANTAGES: Cost reduction; Alleviate performance bottlenecks; ETL too expensive and complex; Mainframe and Data Warehouse processing → Hadoop. THE CHALLENGE: Traditional enterprises' lack of awareness. THE SOLUTION: Leverage the growing support system for Hadoop; Make Hadoop the data hub in the enterprise; Use Hadoop for processing batch and analytic jobs. Page 3
  • 4. The Classic Enterprise Challenge. Pressures surrounding "The Challenge": Growing Data Volumes; Shortened Processing Windows; Escalating Costs; Hitting Scalability Ceilings; Demanding Business Requirements; ETL Complexity; Latency in Data; Tight IT Budgets. Page 4
  • 5. The Sears Holdings Approach. Key to our Approach: 1) allowing users to continue to use familiar consumption interfaces; 2) providing inherent HA; 3) enabling businesses to unlock previously unusable data. The six steps: 1. Make Hadoop the single point of truth; 2. Massively reduce ETL by transforming within Hadoop; 3. Move enterprise batch processing to Hadoop; 4. Move results and aggregates back to legacy systems for consumption; 5. Retain, within Hadoop, source files at the finest granularity for re-use; 6. Implement a Hadoop-centric reference architecture. Page 5
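The "transform within Hadoop" step above is an ELT pattern: land the raw source files first, then apply the transformation inside the cluster instead of in a separate ETL tool. A minimal sketch of that idea (the deck shows no code; the pipe-delimited layout and field names here are hypothetical):

```python
# Minimal ELT sketch. Raw records are landed as-is; the parse/transform step
# runs after loading, where the full-granularity data already lives.
def parse_line(line):
    """Transform applied post-load: raw pipe-delimited record -> dict."""
    sku, store, qty, amount = line.rstrip("\n").split("|")
    return {"sku": sku, "store": store, "qty": int(qty), "amount": float(amount)}

raw = ["1001|S42|3|29.97", "1002|S42|1|5.49"]   # raw lines as landed
rows = [parse_line(l) for l in raw]             # transform inside the platform
total = sum(r["amount"] for r in rows)
```

Because the untransformed files are retained at the finest granularity (step 5), a new downstream need only requires a new transform, not a re-extract from the source system.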
  • 6. The Architecture. Enterprise solutions using Hadoop must be an eco-system. Large companies have a complex environment: transactional systems; services; EDW and data marts; reporting tools and needs. We needed to build an entire solution. Page 6
  • 7. The Sears Holdings Architecture Page 7
  • 8. The Learning. Over two years of experience using Hadoop for enterprise legacy workloads: We can dramatically reduce batch processing times for mainframe and EDW; We can retain and analyze data at a much more granular level, with longer history; Hadoop must be part of an overall solution and eco-system; We can reliably meet our production deliverable time-windows by using Hadoop; We can largely eliminate the use of traditional ETL tools; New tools allow improved user experience on very large data sets; We developed tools and skills (the learning curve is not to be underestimated); We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop with spectacular results. Page 8
  • 9. Some Examples Use-Cases at Sears Holdings
  • 10. The Challenge – Use-Case #1. Sales: 8.9B Line Items; Price Sync: 12.6B SKUs; Elasticity: 1.4B Parameters; Offers: Daily; Items: 11.3M SKUs; Stores: 3200 Sites; Inventory: 1.8B rows; Timing: Weekly. Intensive computational and large storage requirements; Needed to calculate item price elasticity based on 8 billion rows of sales data; Could only be run quarterly and on a subset of data – needed more often; Business need: react to market conditions and new product launches. Page 10
  • 11. The Result – Use-Case #1. Business Problem: Intensive computational and large storage requirements; Needed to calculate store-item price elasticity based on 8 billion rows of sales data; Could only be run quarterly and on a subset of data; Business missing the opportunity to react to changing market conditions and new product launches. With Hadoop: Price elasticity calculated weekly; New business capability enabled; 100% of data set and granularity; Meets all SLAs. Page 11
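To make the elasticity workload concrete: the per-SKU calculation behind use-case #1 is, at its core, the ratio of percentage change in quantity sold to percentage change in price. The deck shows no formulas or code, so this is an illustrative sketch using the standard midpoint (arc) method, run per store-item pair across billions of observations:

```python
# Illustrative sketch: arc price elasticity of demand for one store-item pair,
# from (price, quantity) observations in two periods. In the Hadoop job this
# function would run once per SKU-store group after the shuffle.
def arc_elasticity(p1, q1, p2, q2):
    """%change in quantity / %change in price, midpoint method."""
    dq = (q2 - q1) / ((q1 + q2) / 2)
    dp = (p2 - p1) / ((p1 + p2) / 2)
    return dq / dp

# e.g. price drops 10.00 -> 9.00 and weekly units rise 100 -> 120:
e = arc_elasticity(10.0, 100, 9.0, 120)   # about -1.73: demand is elastic
```

Running this weekly over the full data set, rather than quarterly over a sample, is what the slide means by "100% of data set and granularity".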
  • 12. The Challenge – Use-Case #2. Mainframe: 100 MIPS; Data Sources: 30+ Input; Records: Billions; Scalability: Unable to scale 100-fold on 1% of data. Mainframe batch business process would not scale; Needed to process 100 times more detail to handle business-critical functionality; Business need required processing billions of records from 30 input data sources; Complex business logic and financial calculations; SLA for this cyclic process was 2 hours per run. Page 12
  • 13. The Result – Use-Case #2. Business Problem: Mainframe batch business process would not scale; Needed to process 100 times more detail to handle rollout of high-value business-critical functionality; Time-sensitive business need required processing billions of records from 30 input data sources; Complex business logic and financial calculations; SLA for this cyclic process was 2 hours per run. With Hadoop: Teradata and mainframe data on Hadoop; Implemented PIG for processing; JAVA UDFs for financial calculations; Scalable solution in 8 weeks; 6000 lines reduced to 400 lines of PIG; Processing met tighter SLA; $600K annual savings. Page 13
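The shape of the job that replaced the mainframe batch is a group-then-calculate pattern: group billions of input records by a business key, then apply a financial calculation to each group (in the deck this was PIG's GROUP BY plus a Java UDF). A hedged Python sketch of that pattern, with hypothetical field names since the actual record layout is not shown:

```python
# Illustrative only: the production job used PIG with Java UDFs.
# Group records by account key, then apply a per-group financial calculation.
from collections import defaultdict

def net_position(transactions):
    """Hypothetical stand-in for a financial UDF: sum signed amounts."""
    return round(sum(t["amount"] for t in transactions), 2)

def run_batch(records):
    groups = defaultdict(list)
    for r in records:          # on Hadoop, this grouping is the shuffle phase
        groups[r["account"]].append(r)
    return {acct: net_position(txns) for acct, txns in groups.items()}

demo = [{"account": "A", "amount": 10.50},
        {"account": "B", "amount": -2.25},
        {"account": "A", "amount": -3.00}]
result = run_batch(demo)
```

Expressing the job this declaratively (group, then one calculation function) is why the 6000-line procedural mainframe program could shrink to a few hundred lines of PIG.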
• 14. The Challenge – Use-Case #3

[Scale: Data Storage: mainframe DB2 tables; Price Data: 500M records; Mainframe Jobs: 64; Processing Window: 3.5 hours]

•  Mainframe unable to meet SLAs on growing data volume

Page 14
• 15. The Result – Use-Case #3

Business Problem:
•  Mainframe unable to meet SLAs on growing data volume

Hadoop Results:
•  Job runs 100% faster – now in 1.5 hours
•  Over $100K in annual savings
•  Maintenance improvement – <50 lines of Pig code
•  Source data in Hadoop

Page 15
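A price-processing batch of this shape is typically a join of an item master against the latest price update per SKU, followed by a projection – the sort of step a short Pig script states in a handful of lines. The deck does not show the actual job, so the following Python sketch uses hypothetical tables:

```python
def reprice(items, price_updates):
    """Join item rows with the most recent price update per SKU.
    items: (sku, description); price_updates: (sku, effective_date, price)."""
    latest = {}
    for sku, effective_date, price in price_updates:
        # Keep only the newest effective date per SKU (ISO dates sort lexically)
        if sku not in latest or effective_date > latest[sku][0]:
            latest[sku] = (effective_date, price)
    return {sku: latest[sku][1] for sku, _desc in items if sku in latest}

items = [("SKU-1", "hammer"), ("SKU-2", "drill")]
updates = [("SKU-1", "2012-01-05", 9.99), ("SKU-1", "2012-03-01", 8.49),
           ("SKU-2", "2012-02-10", 129.00)]
print(reprice(items, updates))  # {'SKU-1': 8.49, 'SKU-2': 129.0}
```

Spread across a Hadoop cluster, the same join scales past the 500M-record DB2 tables that broke the 3.5-hour mainframe window.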
• 16. The Challenge – Use-Case #4

[Scale: Transformation: on Teradata user objects via batch processing; Business user experience: unacceptable; History retained: no; Report output: .CSV files; New report development: slow]

•  Needed to enhance user experience and ability to perform analytics on granular data
•  Restricted availability of data due to space constraints
•  Needed to retain granular data
•  Needed Excel-style interaction on data sources of hundreds of millions of records, with agility

Page 16
• 17. The Result – Use-Case #4

Business Problem:
•  Needed to enhance user experience and ability to perform analytics on granular data
•  Restricted availability of data due to space constraints
•  Needed to retain granular data
•  Needed Excel-style interaction on data sources of hundreds of millions of records, with agility

Hadoop Results:
•  User experience expectations met
•  Redundant storage eliminated
•  Transformation moved to Hadoop
•  Sourcing data directly to Hadoop
•  Over 50 data sources retained in Hadoop
•  Business's granular history retained
•  Data: single source of truth
•  Datameer for additional analytics
•  Pig scripts to ease code maintenance

Page 17
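The Excel-facing end of this pipeline reduces to aggregating granular rows in Hadoop and emitting a small CSV for the business user. The deck names no schema, so this Python sketch uses hypothetical columns purely to show the shape of that last step:

```python
import csv
import io

def export_summary(rows, group_key, measure):
    """Aggregate granular rows by one key and emit CSV text suitable
    for opening in Excel. rows are dicts with hypothetical columns."""
    totals = {}
    for row in rows:
        k = row[group_key]
        totals[k] = totals.get(k, 0) + row[measure]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([group_key, measure])
    for k in sorted(totals):
        writer.writerow([k, totals[k]])
    return buf.getvalue()

rows = [{"store": "0042", "units": 3}, {"store": "0042", "units": 5},
        {"store": "0101", "units": 2}]
print(export_summary(rows, "store", "units"))
```

The heavy aggregation runs in Hadoop over the full granular history; only the summarized CSV travels to the desktop, which is what removes the Teradata space constraint.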
• 18. Summary
•  Hadoop can handle Enterprise workloads
•  Can reduce strain on legacy platforms
•  Can reduce cost
•  Can bring new business opportunities
•  Must be an eco-system
•  Must be part of an overall data strategy
•  Not to be underestimated

Page 18
• 19. The Horizon – What Do We Need Next?
•  Automation tools and techniques that ease the Enterprise integration of Hadoop
•  Education for traditional Enterprise IT organizations about the possibilities and reasons to deploy Hadoop
•  Continued development of a reusable framework for legacy workload migration

Page 19
• 20. For more information, visit: www.metascale.com
Follow us on Twitter: @BigDataMadeEasy
Join us on LinkedIn: www.linkedin.com/company/metascale-llc

Page 20