Data Warehouse Offload
John Berns
Presented at BigData.SG, October 2013
1.
© MapR Technologies - Confidential
Data Warehouse Offload (ETL and ELT and Preprocessing, Oh My!)
2.
Introduce Myself
John Berns, Solutions Architect, APAC for MapR
• I’ve been involved in Big Data for three years, using Hadoop for two. (I go waaaaay back!)
• I’m also co-founder of BigData.SG and Hadoop.SG
  http://bigdata.sg
  http://hadoop.sg
• I’m a Hadoop nerd, and proud of it.
3.
Traditional Data Warehouse
4.
Arrival of Big Data Impacts DW
BIG DATA
• Volume: prohibitively expensive storage costs
• Variety: inability to process unstructured formats
• Velocity: faster arrival and processing needs
DW needs to accommodate Big Data
5.
Scaling the Data Warehouse: MPP Databases
6.
But There Are Some Problems Scaling
• Cost: a data warehouse costs $$$,000’s per terabyte
• Works only on relational data; doesn’t like unstructured data
• Fixed schema: you can only query the data in ways that are predefined by the existing schema
7.
Accommodating Big Data
FROM
Sources: RDBMS
DW: ETL + Long Term Storage
• Only structured data
• $50K – 100K per TB
• Limited analytics
TO
Sources: RDBMS, Sensor Data, Web Logs
Hadoop: ETL + Long Term Storage
DW: Query + Present
• Both structured and unstructured data
• 50x-100x cost savings: $1K per TB
• Expanded analytics with MapReduce, NoSQL etc.
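The 50x-100x figure follows directly from the per-terabyte prices quoted on this slide. A minimal Python sketch of that arithmetic, assuming the slide's prices ($50K to $100K per TB for the data warehouse, $1K per TB for Hadoop); the 200 TB volume and the function name are hypothetical and used only for illustration:

```python
# Per-terabyte storage costs quoted on the slide (US dollars).
DW_COST_PER_TB = (50_000, 100_000)  # low and high end of the DW range
HADOOP_COST_PER_TB = 1_000


def savings_factor(dw_cost_per_tb, hadoop_cost_per_tb=HADOOP_COST_PER_TB):
    """Return how many times cheaper Hadoop storage is per terabyte."""
    return dw_cost_per_tb / hadoop_cost_per_tb


low, high = (savings_factor(cost) for cost in DW_COST_PER_TB)
print(f"Cost savings: {low:.0f}x to {high:.0f}x")

# Hypothetical data volume, not taken from the slides.
volume_tb = 200
dw_low = volume_tb * DW_COST_PER_TB[0]
dw_high = volume_tb * DW_COST_PER_TB[1]
hadoop_total = volume_tb * HADOOP_COST_PER_TB
print(f"{volume_tb} TB in the DW: ${dw_low:,} to ${dw_high:,}")
print(f"{volume_tb} TB in Hadoop: ${hadoop_total:,}")
```

The sketch covers storage price only; it says nothing about query performance or operating costs.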
8.
Data Warehouse Meets Big Data
• Use ELT to handle semi-structured (or even unstructured) data
• ELT applies structure after the data is loaded
• Use compute power to do the transformation
• Can be done in parallel: that’s what Hadoop is good for!
• ELT for ETL – process semi-structured data & save structured data
• Connect via ODBC or JDBC and execute queries on the fly
9.
ELT: Applying Schema on Load
  -- One STRING column per capture group in input.regex
  CREATE TABLE apachelog (
    host STRING,
    identity STRING,
    user STRING,
    time STRING,
    request STRING,
    status STRING,
    size STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
    "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
  )
  STORED AS TEXTFILE;
10.
Read Semi-Structured Data & Create Structure
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
host      127.0.0.1
identity  1001
user      frank
time      10/Oct/2000:13:55:36 -0700
request   GET /apache_pb.gif HTTP/1.0
status    200
size      2326
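The same idea can be tried outside Hive. A minimal Python sketch that applies a seven-group regular expression to the sample log line and names each captured group after the columns of the apachelog table. The pattern is an assumption modeled on the slide's input.regex, and parse_log_line is a hypothetical helper name, not part of Hive or MapR:

```python
import re

# Column names from the slide's CREATE TABLE statement.
FIELDS = ["host", "identity", "user", "time", "request", "status", "size"]

# Assumed pattern: one capture group per column, separated by single spaces.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)


def parse_log_line(line):
    """Return a dict of column name to raw string, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


raw = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
       '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_log_line(raw)
for name in FIELDS:
    print(f"{name:<9} {record[name]}")
```

Every value comes back as a string, matching the all-STRING table definition. The captured time keeps its square brackets and the request keeps its double quotes, so a later step would have to trim them to get the tidy values shown on the slide.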
11.
Accommodating Big Data
FROM
Sources: RDBMS
DW: ETL + Long Term Storage
• Only structured data
• $50K – 100K per TB
• Limited analytics
TO
Sources: RDBMS, Sensor Data, Web Logs
Hadoop: ETL + Long Term Storage
DW: Query + Present
• Both structured and unstructured data
• 50x-100x cost savings: $1K per TB
• Expanded analytics with MapReduce, NoSQL etc.
12.
MapR Strengths for DW Offload
Best ROI
• 2x Performance
• No custom connectors
• Unlimited scale
Easiest Integration
• Works with existing tools
• Streaming ingestion and extraction
Enterprise Grade Platform
• 99.999% HA
• Full data protection
• Disaster recovery
13.
MapR Customer Case Study
Large Telecom Company
• Deployed billing applications using Teradata
• Hundreds of users and applications across the enterprise
OLD (Teradata)
• All ETL steps done in Teradata
• Cost prohibitive scaling
• Data warehouse team not able to handle new data formats
NEW (Hadoop, Teradata)
• Replaced 5 out of 7 ETL steps
• Only hot data is stored in EDW
• Existing applications not affected
• Extensively leverage NFS to directly ingest data into Teradata
14.
Data Shape: Telecom
• Lots of data
• Lots of scans across large sets
• Throughput important
15.
Original Flow – ELTL
[Diagram labels: ETL; CDR billing records; Billing reports; Data Warehouse; Customer bills]
16.
With ETL Offload
[Diagram labels: ETL; CDR billing records; Billing reports; Data Warehouse; Customer billing]
17.
Price Performance
EDW strategy
– 1.5x performance
– $30 million
MapR strategy
– 3x performance
– $3 million
20x cost/performance advantage for MapR strategy
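The 20x figure can be reproduced from the four numbers on this slide. A small Python sketch, assuming "cost/performance advantage" means the ratio of performance gained per million dollars spent; the function name is hypothetical:

```python
def performance_per_million(performance_gain, cost_millions):
    """Performance multiple obtained per million dollars spent."""
    return performance_gain / cost_millions


# Figures from the slide: (performance multiple, cost in $ millions).
edw = performance_per_million(1.5, 30)  # EDW strategy
mapr = performance_per_million(3.0, 3)  # MapR strategy

advantage = mapr / edw
print(f"EDW:  {edw:.2f}x performance per $1M")
print(f"MapR: {mapr:.2f}x performance per $1M")
print(f"Advantage: {advantage:.0f}x")
```

Put differently, the MapR strategy is quoted at twice the performance gain for one tenth of the cost, and 2 × 10 = 20.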
18.
MapR Customer Case Study continued
Business Impact:
• Saved $30M in 5-year TCO
• Able to store all data and have a scalable architecture for the future
• Do not have to maintain any special connectors
• A happy Ops team enhancing services for its internal customers with MapReduce
• Implemented the change without impacting internal users
19.
Wrapping It Up…
My contact info:
  jberns@maprtech.com
  http://www.linkedin.com/in/jfxberns
Find the slides at: http://www.slideshare.net
Whitepaper with more details on Data Warehouse Offload: http://www.mapr.com/solutions/data-warehouse-offload
Editor's Notes
Meeting Notes (3/22/13 11:57): Add a before and after broader data sources…. data