Data Warehouse Offload
John Berns
Presented at BigData.SG, October 2013
1.
Data Warehouse Offload (ETL and ELT and Preprocessing, Oh My!)
© MapR Technologies - Confidential
2.
Introduce Myself
• John Berns, Solutions Architect, APAC for MapR
• I've been involved in Big Data for three years, using Hadoop for two. (I go waaaaay back!)
• I'm also co-founder of BigData.SG and Hadoop.SG: http://bigdata.sg and http://hadoop.sg
• I'm a Hadoop nerd, and proud of it.
3.
Traditional Data Warehouse
4.
Arrival of Big Data Impacts DW
BIG DATA
• Volume: prohibitively expensive storage costs
• Variety: inability to process unstructured formats
• Velocity: faster arrival and processing needs
DW needs to accommodate Big Data
5.
Scaling the Data Warehouse: MPP Databases
6.
But There Are Some Problems Scaling
• Cost: Data Warehouse costs $$$,000's per terabyte
• Works only on relational data; doesn't like unstructured data
• Fixed schema: you can only query the data in ways that are predefined by the existing schema
7.
Accommodating Big Data
FROM: RDBMS sources feed the DW (ETL + Long Term Storage)
• Only structured data
• $50K-100K per TB
• Limited analytics
TO: RDBMS, Sensor Data, and Web Logs feed Hadoop (ETL + Long Term Storage); the DW handles Query + Present
• Both structured and unstructured data
• 50x-100x cost savings: $1K per TB
• Expanded analytics with MapReduce, NoSQL, etc.
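The 50x-100x figure follows directly from the per-terabyte prices on this slide. A minimal sketch of that arithmetic is below; the 200 TB archive size is a hypothetical number added only to show the scale, and the constant and function names are illustrative.

```python
# Per-terabyte figures quoted on the slide (USD).
DW_COST_PER_TB = (50_000, 100_000)  # low and high end of the quoted range
HADOOP_COST_PER_TB = 1_000


def savings_factor(dw_cost, hadoop_cost=HADOOP_COST_PER_TB):
    """Return how many times cheaper Hadoop storage is per terabyte."""
    return dw_cost / hadoop_cost


low, high = (savings_factor(cost) for cost in DW_COST_PER_TB)

# Hypothetical 200 TB archive, chosen only to illustrate the scale.
ARCHIVE_TB = 200
dw_total = ARCHIVE_TB * DW_COST_PER_TB[0]
hadoop_total = ARCHIVE_TB * HADOOP_COST_PER_TB

print(f"Savings factor: {low:.0f}x to {high:.0f}x")
print(f"{ARCHIVE_TB} TB: ${dw_total:,} in the DW vs ${hadoop_total:,} in Hadoop")
```

These are storage list prices only; the comparison ignores operations, licensing, and migration effort.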
8.
Data Warehouse Meets Big Data
• Use ELT to handle semi-structured (or even unstructured) data
• ELT applies structure after the data is loaded
• Use compute power to do the transformation
• Can be done in parallel, which is what Hadoop is good for!
• ELT for ETL: process semi-structured data and save structured data
• Connect via ODBC or JDBC and execute queries on the fly
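The ELT idea on this slide (load raw data first, apply structure afterwards, and do the transformation in parallel) can be sketched on a single machine. This is only an illustration under stated assumptions: the `key=value` record format, the sample records, and the function names are hypothetical, and a thread pool stands in for the cluster-wide parallelism Hadoop would provide.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw records, stored untouched (the load step comes first in ELT).
RAW_RECORDS = [
    "sensor=a17;temp=21.5;unit=C",
    "sensor=b02;temp=19.0;unit=C",
    "malformed line without pairs",
    "sensor=a17;temp=22.1;unit=C",
]


def transform(line):
    """Apply structure to one raw record; return None if it does not fit."""
    pairs = [part.split("=", 1) for part in line.split(";")]
    if any(len(pair) != 2 for pair in pairs):
        return None
    fields = dict(pairs)
    if "temp" in fields:
        fields["temp"] = float(fields["temp"])
    return fields


def elt_transform(records, workers=4):
    """Transform already-loaded records concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parsed = list(pool.map(transform, records))
    # Records that did not match the expected shape are dropped here.
    return [row for row in parsed if row is not None]


structured = elt_transform(RAW_RECORDS)
print(structured)
```

Because each record is transformed independently, the work splits cleanly across workers, which is the property that makes this step a good fit for Hadoop.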
9.
ELT: Applying Schema on Load
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;
10.
Read Semi-Structured Data & Create Structure
Raw log line:
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Structured record:
host      127.0.0.1
identity  user-identifier
user      frank
time      10/Oct/2000:13:55:36 -0700
request   GET /apache_pb.gif HTTP/1.0
status    200
size      2326
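To make the mapping from raw line to fields concrete, here is a small sketch that applies a seven-group pattern to the sample log line. It is a plain-Python rendering of the idea, not the Hive SerDe itself; the function name `parse_log_line` is hypothetical, and stripping the brackets and quotes from `time` and `request` is an assumption made to match the values shown on the slide.

```python
import re

# Seven capture groups, one per column of the apachelog table.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)
FIELDS = ("host", "identity", "user", "time", "request", "status", "size")


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    row = dict(zip(FIELDS, match.groups()))
    # Drop the surrounding [ ] and " " delimiters for readability.
    row["time"] = row["time"].strip("[]")
    row["request"] = row["request"].strip('"')
    return row


sample = (
    '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
row = parse_log_line(sample)
for name in FIELDS:
    print(f"{name:<9} {row[name]}")
```

All values stay strings here, mirroring the all-STRING columns in the table definition; casting `status` and `size` to integers would be a later transformation step.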
11.
Accommodating Big Data
FROM: RDBMS sources feed the DW (ETL + Long Term Storage)
• Only structured data
• $50K-100K per TB
• Limited analytics
TO: RDBMS, Sensor Data, and Web Logs feed Hadoop (ETL + Long Term Storage); the DW handles Query + Present
• Both structured and unstructured data
• 50x-100x cost savings: $1K per TB
• Expanded analytics with MapReduce, NoSQL, etc.
12.
MapR Strengths for DW Offload
Best ROI
• 2x performance
• No custom connectors
• Unlimited scale
Easiest Integration
• Works with existing tools
• Streaming ingestion and extraction
Enterprise Grade Platform
• 99.999% HA
• Full data protection
• Disaster recovery
13.
MapR Customer Case Study
Large Telecom Company
• Deployed billing applications using Teradata
• Hundreds of users and applications across the enterprise
OLD (Teradata)
• All ETL steps done in Teradata
• Cost-prohibitive scaling
• Data warehouse team not able to handle new data formats
NEW (Hadoop + Teradata)
• Replaced 5 out of 7 ETL steps
• Only hot data is stored in EDW
• Existing applications not affected
• Extensively leverage NFS to directly ingest data into Teradata
14.
Data Shape: Telecom
• Lots of data
• Lots of scans across large sets
• Throughput important
15.
Original Flow: ELTL
Diagram labels: CDR billing records; ETL; Data Warehouse; Billing reports; Customer bills
16.
With ETL Offload
Diagram labels: CDR billing records; ETL; Data Warehouse; Billing reports; Customer billing
17.
Price Performance
EDW strategy
• 1.5x performance
• $30 million
MapR strategy
• 3x performance
• $3 million
20x cost/performance advantage for MapR strategy
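The 20x figure can be reproduced from the four numbers on this slide by comparing cost per unit of performance. A minimal sketch of that arithmetic follows; the function and variable names are illustrative, and it assumes both performance multipliers are measured against the same baseline.

```python
def cost_per_performance(cost_musd, speedup):
    """Cost in millions of USD for each 1x of performance delivered."""
    return cost_musd / speedup


edw = cost_per_performance(30.0, 1.5)   # EDW strategy: $30M for 1.5x
mapr = cost_per_performance(3.0, 3.0)   # MapR strategy: $3M for 3x

advantage = edw / mapr
print(f"EDW: ${edw:.0f}M per 1x, MapR: ${mapr:.0f}M per 1x, advantage: {advantage:.0f}x")
```

Put differently, the advantage is the product of a 10x cost difference and a 2x performance difference.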
18.
MapR Customer Case Study (continued)
Business Impact:
• Saved $30M in 5-year TCO
• Able to store all data and have a scalable architecture for the future
• Do not have to maintain any special connectors
• A happy Ops team enhancing services for its internal customers with MapReduce
• Implemented the change without impacting internal users
19.
Wrapping It Up…
• My contact info: jberns@maprtech.com, http://www.linkedin.com/in/jfxberns
• Find the slides at: http://www.slideshare.net
• Whitepaper with more details on Data Warehouse Offload: http://www.mapr.com/solutions/data-warehouse-offload
Editor's Notes
Meeting Notes (3/22/13 11:57): Add a before and after broader data sources…. data