Blue Apron uses Looker and BigQuery to Advance their Analytics
May 2, 2017
Data Modeling in the Age of Cloud Warehouses
Daniel Mintz
Chief Data Evangelist
Some History
First wave
Why?
● Slow, expensive hardware
● IT-centric
● Manual data entry
Advantages
● Reliable answers
● Pixel-perfect
● Fast (for 1990)
Disadvantages
● Inflexible
● Locked down
● Low-resolution
Second wave
Why?
● Faster PCs
● IT bottleneck
● Growing data volumes
Advantages
● Agility
● Faster time-to-insight
● Higher-resolution
Disadvantages
● No shared model
● Dependent on data extracts
● Tool explosion
But…
...what if?
1. Storing big data got cheap?
2. Querying big data was fast?
3. Warehouses scaled elastically?
4. Ops was handled?
5. You paid for what you used?
Duplicate records are “free”
● Repeated data costs ~$0
● It’s OK to duplicate if it doesn’t create inconsistency
Distributed systems are the new normal
● JOIN/Shuffle incurs a new cost
● What does any given JOIN improve?
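The trade-off on the last two slides (duplicate data versus paying for a JOIN) can be sketched on a toy dataset. This is a minimal illustration, assuming hypothetical `customers`, `orders`, and `orders_wide` tables that are not in the deck. It uses Python's built-in `sqlite3`, which is a single-node engine, so it shows only that the duplicated table gives the same answer without a JOIN; it does not measure shuffle cost.

```python
import sqlite3

# Hypothetical toy schema; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);

    -- Denormalized copy: region is repeated on every order row.
    CREATE TABLE orders_wide AS
    SELECT o.id, o.customer_id, o.total, c.region
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id;
""")

# Normalized model: the answer requires a JOIN at query time.
joined = conn.execute("""
    SELECT c.region, SUM(o.total)
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region ORDER BY c.region
""").fetchall()

# Denormalized model: the same answer from a single table scan.
wide = conn.execute("""
    SELECT region, SUM(total) FROM orders_wide
    GROUP BY region ORDER BY region
""").fetchall()

# Consistency check: a customer must never appear with two different regions.
inconsistent = conn.execute("""
    SELECT customer_id FROM orders_wide
    GROUP BY customer_id HAVING COUNT(DISTINCT region) > 1
""").fetchall()

print(joined)        # [('East', 25.0), ('West', 7.5)]
print(wide)          # [('East', 25.0), ('West', 7.5)]
print(inconsistent)  # []
```

The last query is one possible way to test the "doesn't create inconsistency" condition: if it ever returns rows, the duplicated column has drifted from its source.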
Then...
...you would want?
1. A model that
a. aids performance
b. is flexible and easy to update
c. reflects the real world and is easy to understand
2. A tool that could leverage the warehouse directly
3. A language to abstract away low-level concerns
Detour!
[Figure: two timelines, 1940 to 2010]
Programming Language Development: Machine Code, Assembly, FORTRAN, COBOL, BASIC, C, C++, Objective-C, Python, Java, JavaScript, PHP, C#, Go, Ruby
Data Language Development: Data Written to Files, Roll Your Own b-tree, Codd, SQL, Oracle V2, IBM DB2, T-SQL, MySQL, PostgreSQL, ??????
What do we want from a data language?
1. Define relationships and definitions once
2. Retain agility
3. Translate business questions to data queries
4. Stay performant
5. Stop worrying about syntax
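A rough sketch of what "define once" and "stop worrying about syntax" could look like in practice. This is a hand-rolled toy, not LookML and not any real library: the `MODEL` dictionary, the `build_query` helper, and the `orders` column names are all hypothetical. Each dimension and measure is written down once, and every question is compiled to SQL from those shared definitions.

```python
# Hypothetical model: each definition appears exactly once.
MODEL = {
    "table": "orders",
    "dimensions": {
        "order_date": "DATE(orders.timestamp)",
        "status": "orders.status",
    },
    "measures": {
        "revenue": "SUM(orders.total)",
        "order_count": "COUNT(*)",
    },
}


def build_query(model, dimensions, measures, filters=None):
    """Compile a business question (names only) into a SQL string."""
    select = [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
    select += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model['table']}"
    if filters:
        # Values are inlined for readability only; real code should bind parameters.
        conds = [f"{model['dimensions'][d]} = '{v}'" for d, v in filters.items()]
        sql += " WHERE " + " AND ".join(conds)
    if dimensions:
        # Group by the positions of the selected dimensions (1-based).
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


print(build_query(MODEL, ["order_date"], ["revenue"], {"status": "completed"}))
print(build_query(MODEL, ["status"], ["order_count"]))
```

Changing the definition of `revenue` in `MODEL` changes every query that uses it, which is the property items 1 and 2 in the list ask for.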
“What about SQL? I love SQL!”
It’s proven, powerful and versatile
Except:
SELECT
  DATE(orders.timestamp),
  SUM(orders.total)
FROM
  orders
WHERE
  orders.status = 'completed'
GROUP BY 1
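To make the query concrete, here is a minimal sketch that runs it with Python's built-in `sqlite3` against a toy `orders` table. The rows and column types are invented for illustration, and an `ORDER BY 1` is added only so the output order is deterministic. The slide leaves its objection implicit; what the sketch does show is that the meaning of "completed" and the choice of day boundary live inside this one query text.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (timestamp TEXT, total REAL, status TEXT);
    INSERT INTO orders VALUES
        ('2017-05-01 09:30:00', 40.0, 'completed'),
        ('2017-05-01 18:00:00', 60.0, 'completed'),
        ('2017-05-02 12:00:00', 25.0, 'cancelled'),
        ('2017-05-02 13:15:00', 35.0, 'completed');
""")

# The slide's query, with ORDER BY 1 added for a stable row order.
rows = conn.execute("""
    SELECT
      DATE(orders.timestamp),
      SUM(orders.total)
    FROM
      orders
    WHERE
      orders.status = 'completed'
    GROUP BY 1
    ORDER BY 1
""").fetchall()

print(rows)  # [('2017-05-01', 100.0), ('2017-05-02', 35.0)]
```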
LookML is one solution: builds on SQL
● Reusable
● Collaborative
● Flexible
● Organized
● Version-controlled
Third wave
Why?
● Big Data Revolution
● Too many tools
● The Internet
Advantages
● Reliable answers
● Agility
● Best-in-class tools
● Full resolution
Disadvantages
● Shift in thinking
Need a powerful warehouse
Insight abundance
?

Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
