The Hidden Value of Hadoop Migration

Many organizations focus on Hadoop licensing costs when considering a migration to a cloud platform. Other costs should be considered too, and the biggest impact is the benefit of having a modern analytics platform that can handle all of your use cases. This session covers lessons learned from helping hundreds of companies migrate from Hadoop to Databricks.

  1. Modernize Your Data Analytics Architecture with a Unified Approach to Data + AI. Anand Venugopal, Global Leader - Industry Solutions (Migrations), Databricks
  2. Topics: Why migrate from Hadoop to Databricks? Success stories, technical and business benefits. How can you migrate fast with low cost and low risk?
  3. Legacy On-Prem Analytics Architectures Are Not Keeping Up: Hadoop costs are rising when costs need to be cut. Innovation hinges on ML and predictive insights. Business agility requires real-time data. This is preventing teams from driving high-impact business outcomes.
  4. Why Migrate to Databricks? A Forrester study finds 417% ROI for companies switching to Databricks: 47% cost savings from retiring legacy infrastructure, 5% increase in revenue, 25% increase in data team productivity.
  5. Hadoop Is Costly, Complex and Ineffective. DEVOPS INTENSIVE: the Hadoop ecosystem is complex, hard to manage and prone to failures (low productivity). RIGID AND INELASTIC: 24/7 HDFS clusters need to be built for peak use and are costly to upgrade (cost prohibitive). LACKS AI CAPABILITIES: no out-of-the-box Hadoop support for ML/AI, and separate environments for data and AI (slow innovation).
  6. Enterprises Need a Modern Data Analytics Architecture. CRITICAL REQUIREMENTS: cost-effective scale and performance in the cloud; easy to manage and highly reliable for diverse data; predictive and real-time insights to drive innovation.
  7. Building a Modern Cloud Analytics Architecture with Databricks. Data Science Workspace. Outcomes: enhanced productivity, lower cost at scale, new insights faster. EASY TO MANAGE: managed cloud platform that can reliably handle all types of data. MASSIVE SCALE: on-demand, elastic autoscale clusters with optimized Apache Spark. AI-ENABLED INNOVATION: unified and collaborative notebooks with built-in ML capabilities.
  8. Databricks Unified Data Analytics. High performance query engine: DELTA ENGINE. One platform for every use case: Streaming Analytics, BI, Data Science, Machine Learning. Data lake for all your data: structured, semi-structured and unstructured data. Structured transactional layer.
  9. Powering Innovation with Modern Data Analytics: Customers that migrated from Hadoop
  10. Business value: What did they do with us? “The Un-carrier strategy is an approach that seeks to listen to the customer, address their pain points, bring innovation to the industry and improve the wireless experience for all.” Situation: ○ Every network interaction (call, website load, text, app) is logged in a 1,600-node HDP data lake (30 PB). ○ 4-5 “large scale” pipelines, with hundreds of downstream pipelines feeding the business: ● PCMD (network measurement data), CDR (call records), EDR (DNS (website)), LSR (location). ○ Call data is processed to get critical network insights: call-failure reasons and network outages. ○ PCMD (Per Call Measurement Data): ● provides insights on call failures at a granular level; ● is the best source to determine outage cause and effect; ● provides rich information about Sprint customers roaming in the T-Mobile network.
  11. Solution: Holistic transformation instead of ‘lift & shift’. Overview: ● Migration and transformation of streaming data analytics from Apache Storm and Hive on Hortonworks to Azure Databricks. ● Data was streaming in at an average of 2M records per second, 375 GB per batch, 23 TB per day (uncompressed). Results: accelerating key insights, e.g. hourly dashboards protecting revenue and customer churn. 78.5x performance gain versus on-prem operation. BEFORE (with Hive on Tez): 47 mins for 15k cores to do the job. AFTER: 35 mins for 256 cores to do the same job. KPI computations took one quarter of the time, enabling new hourly dashboards (without optimizations such as warm pools and others still in process). 40% reduction in use of the 1,600-node on-prem cluster.
  12. Customer examples. Supply chain decisions: apply ML to data from 5,000+ stores. Impact: • 70% reduction in operational costs • Accelerated business growth. Demand forecasting: 500K stores, 2 TB, 250 pipelines. Impact: • 10X more capacity • 2X faster data pipelines. Predict bakery food spoilage: 10+ large Hadoop clusters. Impact: • $100M in fresh food spoilage saved • $900K costs down, time: 7 hr → 40 min. Optimize programming: • Could not process 90 days of data with a large Hadoop cluster. Impact: • 26% team productivity increase • More data, lower costs, low devops.
  13. Databricks Drives New Business Value at 3 Levels (Databricks Value Framework: from the data platform to the business outcome, ordered from more value to less value). $$$ BUSINESS-IMPACTING USE CASES: Databricks accelerates and expands the realization of value from business-oriented use cases that use net-new capabilities vs. Hadoop. $$ PRODUCTIVITY: higher productivity among data scientists and data engineers by eliminating manual tasks. $ INFRASTRUCTURE: reduced infrastructure spend with the performance of the Databricks runtime.
  14. $12.8M in value delivered with Databricks. Databricks customer example: large U.S. telco, 156-node cluster. Source: Databricks value model. Value of Databricks: ■ Removed Cloudera licensing ■ No need to add expensive new hardware for additional capacity ■ Avoided data center costs ■ Avoided Hadoop administration costs. Chart: Cloudera costs vs. Databricks value & investment (units: $, cumulative PV over 3 years). Series: potential value with Databricks; Cloudera (cost of inaction); investment (Databricks, migration & cloud); net impact (includes cost of both solutions during migration). Values shown: $13.8M, -$18.7M, -$4.9M, $12.8M. Cloudera costs: ■ Data center, Hadoop administration, new hardware, licensing. Databricks investment: ■ Databricks usage & support ■ Migration ■ Cloud compute.
  15. Work with us for a Tailored Value Case for Your Migration. Tailored Financial Analysis: a tailored business case is produced by answering 4 core questions: (1) How many nodes are in your Hadoop environment? (2) How many people support your Hadoop environment? (3) When is your Cloudera renewal? (4) How do you expect your data needs to grow over time? Customer example.
  16. Proven Migration Strategy: Reduce Risk, Costs. AUTOMATION, TOOLS AND PROVEN METHODOLOGIES: Databricks Expert Team; System Integrators; Tools, ISV Partners; Cloud Partners. COMPONENTS TO MIGRATE: Data + Metadata; Workloads/Jobs; Security & governance; Other tools, integrations. SUCCESSFUL MIGRATION. Strategy options: lift & shift (faster, automatable) or transformation (higher impact).
  17. Automated conversion for most workload types. Categories: Data Migration, Metastore Migration, SQL Migration, Security, Scheduled Data Pulls, Orchestration. Sources (in slide order): HDFS; Hive Databases/Tables/Views; Impala Databases/Tables/Views; HDFS; Hive Queries; Spark Queries; Sentry permissions/Ranger policies; HDFS access permissions; Sqoop statements; Oozie Jobs. Targets (in slide order): Azure ADLS Gen 2, AWS S3; Databricks Tables; Databricks Tables; Spark SQL, Databricks Notebooks; Spark SQL, Databricks Notebooks; Databricks Notebooks; Databricks permissions; AWS IAM, ADLS ACLs; Databricks-compatible PySpark code; Airflow DAGs & Databricks Jobs.
  18. Typically, customers save 55-66% in costs and see a 2-3x reduction in timelines by using the automation tool. Manual migration (17-20 weeks): Assessment & Design; Data Migration; Workloads Migration, Validation; Cutover; Operations. Using automation (8 weeks): Accelerated Assessment & Design; Accelerated Data & Workloads Migration, Validation; Cutover; Operations. * Typical implementation scenario considered: ~4 PB of data and 3,000 jobs with mixed workloads.
  19. Our Partner Ecosystem Will Accelerate Migrations: ISV Partners and Migration Tools; Security; Governance; Consulting & SI Partners; Cloud; Databricks Migration SWAT team + CS Packaged Services for Migration.
  20. Customized Hadoop Migration Success Plan with a Free Expert-led Assessment, in three parts. Pre-questionnaire + discovery, education workshops led by experts: ▪ Learn how Databricks works and how your current workloads, tools and processes map and transform in the future state in the cloud. Proposal and recommendations for the path forward: ▪ The expert team summarizes all the findings and walks through the proposed costs, business value summary and recommended migration plan. Technical, use-case and business value analysis: ▪ High-level current and future state architecture; discuss and prioritize use cases; understand how $$ value is driven by the migration.
  21. Databricks Experts Know Hadoop ▪ More than 100 years of combined experience in Hadoop ▪ Practitioners, architects, engineers, consultants, open source contributors and committers ▪ Expertise with all Hadoop ecosystem components and distributions
  22. Hadoop migration to Databricks - recap. Why: costs, productivity, innovation → business impact. Your competitors and market leaders are doing it NOW. Databricks experts and an automation strategy can help you migrate faster, with much lower cost and risk.
  23. Thank you. Please visit databricks.com/migration
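
Two of the headline figures in the deck can be sanity-checked with simple arithmetic. The sketch below is not from the presentation: it assumes the performance gain on slide 11 is a ratio of core-minutes (job duration multiplied by core count), and that the timeline reduction on slide 18 is the ratio of manual to automated weeks. The function names are hypothetical; only the input numbers come from the slides.

```python
# Inputs quoted on slide 11 (Hive on Tez on-prem vs. Azure Databricks).
BEFORE_MINUTES, BEFORE_CORES = 47, 15_000
AFTER_MINUTES, AFTER_CORES = 35, 256

# Inputs quoted on slide 18 (migration timelines, in weeks).
MANUAL_WEEKS = (17, 20)
AUTOMATED_WEEKS = 8


def core_minutes(minutes: float, cores: int) -> float:
    """Compute consumed by one job run, expressed in core-minutes."""
    return minutes * cores


def resource_gain() -> float:
    """Core-minutes before the migration divided by core-minutes after.

    Assumption: this is how the slide's performance figure was derived;
    the deck does not state its method.
    """
    before = core_minutes(BEFORE_MINUTES, BEFORE_CORES)
    after = core_minutes(AFTER_MINUTES, AFTER_CORES)
    return before / after


def timeline_speedup() -> tuple[float, float]:
    """Low and high ratio of manual to automated migration duration."""
    low, high = MANUAL_WEEKS
    return low / AUTOMATED_WEEKS, high / AUTOMATED_WEEKS


if __name__ == "__main__":
    print(f"core-minute gain: {resource_gain():.1f}x")
    low, high = timeline_speedup()
    print(f"timeline speedup: {low:.1f}x to {high:.1f}x")
```

Under these assumptions the core-minute ratio comes out at roughly 78.7x, close to the 78.5x quoted on slide 11, and the timeline ratio falls between about 2.1x and 2.5x, within the "2-3x" range quoted on slide 18.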