Hadoop vs. RDBMS for
Advanced Analytics
Josh Wills
April 26th, 2012
About Me

• jwills@cloudera.com
• Formerly of Google (2008 – 2011)
   • Worked on the ad auction
   • Led the team that built the data infrastructure for Google+
• Before that: a bunch of startups
   • Sometimes as a software engineer, sometimes as a statistician
• Math degree from Duke and a half-finished PhD from The
  University of Texas at Austin
• Now: Director of Data Science at Cloudera




                       Copyright 2012 Cloudera Inc. All rights reserved
Getting Started with Hadoop: Apache Hive

 • Stick with the relational
   models that you are
   used to working with

 • Great for the common
   starter use cases
    • Logs processing
    • Online data archival
    • ETL/ELT


Hadoop for Advanced Analytics




When Should I Use Hadoop instead of an RDBMS?




First Symptom: COUNT DISTINCT




Second Symptom: Cursors




Third Symptom: ALTER TABLE OF_DOOM




The Unit of Analysis Problem

 • Data warehouses are
   optimized to analyze
   transactions
   • Awesome for finance
     and ERP
   • Not ideal for product
     and marketing
 • A function of what
   databases are good at


What Are You Trying to Analyze?

           Simple Entities                                    Complex Entities
 •   Static attributes                               •    Evolving attributes
 •   Flat data structure                             •    Hierarchical data structure
 •   Transient                                       •    Persistent
 •   Examples                                        •    Examples
     • SKUs                                                 • Customers
     • Line items from an invoice                           • Suppliers
     • Log messages                                         • Website visitors




Rods and Cones vs. Facial Recognition




Structure the Data to Fit the Problem

 • HDFS Lets Us Store Our
   Data However We Want
 • We can choose storage
   schemas that are:
   •   Flexible
   •   Evolvable
   •   Compact
   •   Fast serialization/deserialization
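
As a rough illustration of the bullets above (not taken from the talk), a complex entity such as a website visitor can be kept as one nested record instead of rows spread across several tables. The `Visitor` and `Visit` types and their fields are hypothetical, and JSON only stands in for whichever serialization format is actually chosen.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Visit:
    # Hypothetical fields, chosen only for illustration.
    timestamp: int
    pages: List[str]


@dataclass
class Visitor:
    visitor_id: str
    # Nested, variable-length history lives inside the entity itself.
    visits: List[Visit] = field(default_factory=list)


def to_record(v: Visitor) -> str:
    """Serialize one visitor as a single JSON line."""
    return json.dumps(asdict(v), sort_keys=True)


def from_record(line: str) -> Visitor:
    """Parse a JSON line; a missing 'visits' field defaults to an empty list."""
    raw = json.loads(line)
    visits = [Visit(**x) for x in raw.get("visits", [])]
    return Visitor(visitor_id=raw["visitor_id"], visits=visits)


visitor = Visitor("v1", [Visit(1335400000, ["/home", "/pricing"])])
record = to_record(visitor)
restored = from_record(record)
```

Because the reader tolerates a missing field, older records written before `visits` existed can still be loaded, which is one simple way a schema can stay evolvable.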

Advanced Analytics: Use Cases




Simple Counts on Complex Objects




Self-Self-Self-Joins




Matching Problems




We’re Hiring.
jwills@cloudera.com

Editor's Notes

  1. How do you know you have a unit of analysis problem? You’re doing a bunch of COUNT DISTINCT queries. You’re doing LAG/LEAD-style queries, or using a cursor.
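
A minimal sketch of what changing the unit of analysis can look like, assuming transaction-level rows of the hypothetical form `(user_id, timestamp, page)`: once the rows are regrouped into one object per user, the distinct count and the LAG-style gap between consecutive events become simple per-object computations. The function names and sample data are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int, str]   # (user_id, timestamp, page), hypothetical layout
Visit = Tuple[int, str]        # (timestamp, page)


def group_by_user(events: List[Event]) -> Dict[str, List[Visit]]:
    """Regroup transaction rows into one time-ordered list per user."""
    by_user: Dict[str, List[Visit]] = defaultdict(list)
    for user_id, ts, page in events:
        by_user[user_id].append((ts, page))
    for visits in by_user.values():
        visits.sort()  # order by timestamp so gaps are well defined
    return dict(by_user)


def summarize(visits: List[Visit]) -> dict:
    """Per-user stats that would need COUNT DISTINCT and LAG on flat rows."""
    distinct_pages = len({page for _, page in visits})
    gaps = [b[0] - a[0] for a, b in zip(visits, visits[1:])]
    return {"distinct_pages": distinct_pages, "gaps": gaps}


events = [
    ("u1", 10, "/home"),
    ("u2", 12, "/home"),
    ("u1", 25, "/pricing"),
    ("u1", 40, "/home"),
]
summaries = {user: summarize(v) for user, v in group_by_user(events).items()}
```

In a MapReduce setting the grouping step corresponds roughly to the shuffle on `user_id`, and `summarize` to the work done in the reducer.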