© 2019 Snowflake Inc. All Rights Reserved
DOES IT ONLY HAVE TO BE ML & AI?
MACHINE LEARNING INSIDE + OUTSIDE OF
SNOWFLAKE’S CLOUD DATA WAREHOUSE
HARALD ERB | harald.erb@snowflake.com
Nuremberg, 19 November 2019
AGENDA
> What is Snowflake? (briefly explained)
> Talk Intro
> Recap: What can a Database do for you?
> What about Implementation of In-Database ML Models?
> Or better use Auto-ML outside your Database?
> End-to-end ML Projects at Scale
WHAT IS SNOWFLAKE?_
SNOWFLAKE: A TEAM OF DATA EXPERTS
SNOWFLAKE TIMELINE
Founded in 2012 by industry veterans with over 120 database patents
~$1BN in venture capital funding from leading investors
~$4.5BN valuation
First customers 2014, general availability 2015
1,600+ employees
Over 3,000 customers today
Gartner and Forrester “Leader”
FUN FACTS
Queries processed in Snowflake per day: > 290+ million
Largest single table: > 68 trillion rows
Largest number of tables in a single DB: > 200,000
Single customer with most data: > 55PB
Single customer with most users: > 10,000
A NEW ARCHITECTURE FOR DATA WAREHOUSING
Multi-Cluster, Shared Data, in the Cloud
Traditional Architectures:
Shared-Disk (SMP): Cluster of nodes with a single shared disk. Throughput is constrained by either CPU, memory or disk access (DWs based on traditional RDBMS)
Shared-Nothing (MPP): Cluster of nodes, each of which has its own disk, with data distributed across the nodes. Not elastic because data must be redistributed when resizing the cluster (most MPP DWs, Hadoop)
Snowflake:
Multi-Cluster, Shared Data: Multiple clusters, shared data. Compute power and storage scale independently of each other
THE SNOWFLAKE ELASTIC DATA WAREHOUSE
Snowflake’s built-for-the-cloud architecture
provides many benefits. The key is separation
of storage, compute and metadata services.
● Unlimited storage scalability without
refactoring
● Multiple compute clusters can read/write
shared data
● Resize clusters instantly, with no downtime
● Centrally manage logical assets
(warehouse, database, etc.), not
technical assets (servers, buckets, etc.)
● Full transactional consistency (ACID)
across entire system
Services (diagram labels): Management, Optimisation, Security, Availability, Transactions, Metadata
THE SNOWFLAKE DIFFERENCE
ADMINISTRATION
Snowflake: Data warehouse as a service
Traditional DW (in the Cloud): Customer manages tuning, optimization and manual administration
Query Service: Complete black box
CONCURRENCY
Snowflake: Unlimited, automatic concurrency scaling
Traditional DW (in the Cloud): Limited concurrency
Query Service: Poor concurrency scaling
FLEXIBILITY
Snowflake: Native, optimized support for diverse data
Traditional DW (in the Cloud): Data transformation required
Query Service: Native support, limited optimization for diverse data
SCALING
Snowflake: Scale on the fly, in seconds to minutes
Traditional DW (in the Cloud): Manual, disruptive, slow scaling
Query Service: Scale on the fly
YOU MIGHT HAVE HEARD SOMETHING SIMILAR
What you hear | What it actually means | Difference with Snowflake
We support SQL | Some SQL… some BQL… | ANSI-Standard SQL, full DML
We scale elastically | Stop, resize, repartition, restart | Instant, automatic resize
We support semi-structured | Limited JSON | Avro, Parquet, XML, JSON
We’re fast | If you sort, index, and tune | Built fast with no tuning
We are great for concurrency | If it’s only 4-5 concurrent queries | Scale linearly for concurrency
We’re global | If you build replication and failover | Global HA, out of the box
TALK INTRO_
ML & AI – FROM HYPE TO PRODUCTIVITY
> Enterprises continue to establish a
data-driven culture
• Predictive analytics matures; “What is
likely to happen?” questions allow
organizations to become proactive
rather than rely on human experience and
intuition alone
• Machine Learning is producing some
quantifiable results today; integration with
operations (procedures and processes) is
still a challenge
• Automation: data preparation, insight
discovery, data science, ML Model
development + complex decisioning
→ still a future topic
> “Citizen Data Scientists”
• To fill the data scientist talent gap
• Modern analytics tools to guide business
users through the process and help to
extract advanced analytic insights from data
• “Real” data scientists to focus on more
difficult analytics work and insight.
Source: Gartner Hype Cycle for Midsize Enterprises 2019 → Link
FOCUSING ON THE RIGHT THINGS
> SQL vs. Machine Learning vs. Machine Learning
Applied to SQL
• SQL for BI Level Analysis: Business questions and many
prediction problems can be solved by well-crafted SQL –
and it offers explainability that deep ML generally does not
• Still true: “Garbage in, garbage out”: nothing of substance
can come from BI or ML without good data → Data
Collection + Engineering is a sophisticated discipline that
consumes a lot of time → but these activities are crucial for
making information available reliably at scale
• Machine Learning: helpful to spot complex patterns,
maybe less important for predictions. Applied ML for better
data preparation can be very beneficial
“Wow! ML”: “Sometimes needed”
“BI-level Analysis”: “Always needed, often enough for predictions”
“Data Engineering”: “Always needed”
Sources: A. Jhingran, Talk at VLDB 2019, Slides → Link;
C. Kozyrkov, Towards Data Science Blog, 2019 → Link
> Decision Intelligence
• A new engineering discipline that augments data science with
theory from social science, decision theory, and managerial
science
• Goal: Turning information into better actions at any scale.
• Provides a framework for best practices in organizational decision-
making and processes for applying machine learning at scale.
• […] Theory skipped for this talk, but here are some interesting questions
to think about
- “How should you set up decision criteria and design your metrics?”
- “What quality should you make this decision at and how much should
you pay for perfect information?” (Decision analysis)
- “How do emotions, heuristics, and biases play into decision-making?”
(Psychology)
> Fact-based decisions are not enough?
Enter → Data Science
• Use partial facts along with statistics, analytics, ML & AI to deal with
uncertainty.
• Remember: The goal (objective) is always the starting point!
FIXING DEPLOYMENT ISSUES
Source: D. Sculley, et al.: “Hidden technical debt in Machine learning systems”, 2015
FIXING DEPLOYMENT ISSUES – MAYBE WITH MLOPS
RECAP:
WHAT CAN A DATABASE DO FOR YOU?_
RECAP #1: FACT-BASED DECISIONING
TPC-DS Benchmark Query Q57:
Catalog Sales Call Center Outliers
“Find the item brands and categories for each call center and their monthly sales figures for a specified
year, where the monthly sales figure deviated more than 10% of the average monthly sales for the year,
sorted by deviation and call center. Report the sales deviation from the previous and following months.”
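The filter at the heart of this query can be sketched in a few lines. This is a minimal illustration only, assuming a single call center and one brand with twelve made-up monthly figures; the function name and the data are hypothetical. In SQL the same logic is typically written with window functions such as AVG(...) OVER, LAG and LEAD.

```python
from statistics import mean

def monthly_outliers(monthly_sales, threshold=0.10):
    """Return (month, sales, previous, following, deviation) for months whose
    sales deviate from the yearly average by more than `threshold` (relative)."""
    avg = mean(monthly_sales)
    rows = []
    for i, sales in enumerate(monthly_sales):
        deviation = sales - avg
        if avg > 0 and abs(deviation) / avg > threshold:
            # Neighbouring months, comparable to LAG / LEAD in SQL
            prev_sales = monthly_sales[i - 1] if i > 0 else None
            next_sales = monthly_sales[i + 1] if i < len(monthly_sales) - 1 else None
            rows.append((i + 1, sales, prev_sales, next_sales, deviation))
    # Sorted by deviation, as the query text asks
    return sorted(rows, key=lambda row: row[4])

# Hypothetical monthly sales for one call center and one brand
sales_2019 = [100, 102, 98, 101, 99, 100, 140, 100, 97, 103, 60, 100]
outliers = monthly_outliers(sales_2019)
for row in outliers:
    print(row)
```

Only months 7 and 11 pass the 10% filter here; all other months stay within the band around the yearly average of 100.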
> “BI-level Analysis”
• Mature Point & Click Tools based on a well-crafted
Semantic Layer on top of a (virtualized) Data Mart
enable lots of business users to answer even complex
questions
• Challenges: rapidly growing data volumes, resource
contention issues lead to restricted DW access (instead
of wider end user adoption)
Feature: Elastic compute. Snowflake separates
user workloads through multiple Virtual
Warehouses, which scale instantly to meet
required performance levels
and can auto-resize in the case of peak
workloads to eliminate concurrency issues
RECAP #2: FAST DATA EXPLORATION & AGGREGATION
Sessionized clickstream data:
Finding sessions that include at least
one "addtocart" event, but do NOT include
a transaction.
Feature: ANSI-compliant SQL with a comprehensive
set of aggregation, window, and pattern-matching
SQL functions
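One way to express the session filter above is conditional aggregation with GROUP BY and HAVING. The sketch below runs the query against an in-memory SQLite database purely as a local stand-in for Snowflake; the table name, column names and sample rows are assumptions, not taken from the slide.

```python
import sqlite3

# SQLite is only a local stand-in; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clickstream (session_id TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO clickstream VALUES (?, ?)",
    [
        ("s1", "view"), ("s1", "addtocart"), ("s1", "transaction"),
        ("s2", "view"), ("s2", "addtocart"),
        ("s3", "view"),
    ],
)

# Keep sessions with at least one "addtocart" event and no "transaction" event
QUERY = """
SELECT session_id
FROM clickstream
GROUP BY session_id
HAVING SUM(CASE WHEN event = 'addtocart'   THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN event = 'transaction' THEN 1 ELSE 0 END) = 0
ORDER BY session_id
"""

abandoned_sessions = [row[0] for row in conn.execute(QUERY)]
print(abandoned_sessions)
```

Session s1 is dropped because it ends in a transaction, s3 because nothing was added to the cart.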
WORKLOAD SEPARATION IN SNOWFLAKE
EXCURSUS
Prod DB: 3+ PB of raw data; 1.5 PB data stored in Database (8x compression ratio); 25M micro partitions
Continuous Loading (4TB/day), S3, <5min SLA: Virtual Warehouse, Medium
Batch ETL & Maintenance: Virtual Warehouse, Large
Virtual Warehouse, 2X-Large: Analytics (Segmentation)
Interactive Dashboards (50% < 1s, 85% < 2s, 95% < 5s): Virtual Warehouse, Auto Scale – X-Large x 5
RECAP #3: ANALYTICS ON (SEMI-)STRUCTURED DATA
Ingest external weather data (JSON format)
and make it instantly available for SQL queries
Use case: Blend historical city bike trip data
with semi-structured weather data to spot
new patterns in customer behavior
Feature: Because of Snowflake’s VARIANT
data type, semi-structured data can be
handled with performance similar
to that of structured
data. “Real world” SQL, including
Common Table Expressions (CTEs)
and User Defined Functions (UDFs),
is also supported.
Selecting directly from a JSON document stored in
a VARIANT column of a table
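To make the idea of path access into a JSON document concrete, here is a small sketch in plain Python. The weather document, its field names and the helper get_path are hypothetical. In Snowflake SQL the comparable expressions would use path notation on the VARIANT column, for example v:city.name::string or v:weather[0].main::string, where v is an assumed column name.

```python
import json

# Hypothetical weather document, standing in for one VARIANT value
raw = (
    '{"city": {"name": "New York", "coord": {"lat": 40.71, "lon": -74.01}},'
    ' "main": {"temp": 281.5}, "weather": [{"main": "Rain"}]}'
)

def get_path(doc, path):
    """Walk a parsed JSON document along keys and list indexes.
    Returns None when any step is missing, similar to a NULL result in SQL."""
    current = doc
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current

doc = json.loads(raw)
city = get_path(doc, ["city", "name"])
condition = get_path(doc, ["weather", 0, "main"])
missing = get_path(doc, ["wind", "speed"])
print(city, condition, missing)
```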
RECAP #4: FORECAST USING AGGREGATION FUNCTIONS
Calculation of Linear Regression Line + UNION ALL
of actual data and forecast data for a complete set of
sales data
Use case: Forecast price for the next hour interval
Simple linear regression model (chart legend: Actual, Predicted)
Simple linear regression model:
Aggregation Functions for
Linear Regression
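The regression line y = intercept + slope * x can be computed from a handful of sums, which is why it fits aggregate functions so well (Snowflake offers REGR_SLOPE and REGR_INTERCEPT for this). The sketch below reproduces the least-squares arithmetic in plain Python and appends the forecast row to the actual rows, mirroring the UNION ALL idea; the hourly prices are made up.

```python
def regr_slope_intercept(xs, ys):
    """Least-squares slope and intercept computed from plain sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

# Hypothetical hourly prices
hours = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = regr_slope_intercept(hours, prices)
next_hour = max(hours) + 1
forecast = intercept + slope * next_hour

# Actual rows followed by the predicted row (the UNION ALL on the slide)
series = [(h, p, "actual") for h, p in zip(hours, prices)]
series.append((next_hour, forecast, "predicted"))
print(slope, intercept, forecast)
```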
WHAT ABOUT IMPLEMENTATION OF
IN-DATABASE ML MODELS?_
IT CAN BE DONE IN A DATABASE…
Working (experimental) examples
in Snowflake:
> K-Means Clustering,
> Predictions using an ID3
Decision Tree algorithm, or even
> Hierarchical Temporal Memory
(HTM) approach
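For readers unfamiliar with the first example, the K-Means loop itself is short: assign every point to its nearest centroid, recompute the centroids, repeat. The sketch below is plain Python with a naive initialisation and made-up 2-D points; it is not the Snowflake implementation referred to on the slide, which would express the same two steps in SQL driven by a procedural loop.

```python
def kmeans(points, k, iterations=10):
    """Minimal K-Means for 2-D points.
    Assumption: the first k points serve as initial centroids."""
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters

# Hypothetical data with two visible groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```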
…USING SQL, UDF’S, STORED PROCEDURES & MATH…
Feature
Snowflake stored procedures are
implemented through JavaScript
and, optionally, SQL:
• JavaScript provides the control
structures (branching and
looping).
• SQL is executed within the
JavaScript by calling
functions in an API (SQL is
not required in a stored
procedure, but is typically
included)
Embedded SQL
Working with JSON
Result of Procedure Call
…IT MAKES SOME SENSE FROM A DEPLOYMENT PERSPECTIVE…
Line of
Governance
• Structured + semi-
structured data
• Raw data available for
discovery
• Self-Service sandbox
• Multiple toolsets / IDE’s
• Readable code!
• Same technology for
commercial exploitation
• Direct access via SQL
• Elastic compute
• Versioning
• Standardisation & governance
Model
…BUT PURE CODING IS NOT WORKING FOR EVERYONE!
Challenge: Fixing the Data
Scientists Talent Gap
> In any organization there
are many Business
Analysts, BI power users,
who are curious to explore
data science and predictive
algorithms for their
business case
> Enablement through basic
learning, literacy and the
right tools will turn these
individuals into
Citizen Data Scientists who
can test hypotheses and build
prototypes on their own
> Probably the only feasible
way today to democratize
advanced analytics in an
organisation
Potential large
user base
Citizen Data
Scientist
Potential
user impact
AUTO-ML OUTSIDE YOUR DATABASE?_
WOW! AUTOML
> Automated Machine Learning (AutoML)
• Is the process of automating the entire end-to-end process (or some
steps) of applying ML to real-world problems:
- Data pre-processing
- Feature engineering, extraction, and selection
- Algorithm selection & hyperparameter optimization
• Accuracy of ML solutions can be measured → automated systems
can fine-tune data, features, algorithms to generate accurate models
relying on established ML knowledge
> Benefits of AutoML
• Cost reductions: Increased productivity for data scientists and/or
Democratization of machine learning reduces demand for data
scientists
• Intelligence can be easily added to applications → increased
revenues and customer satisfaction
• Higher productivity: Roll out more models with increased accuracy
> The Data Scientist’s advantage
• Conformance to custom specifications, i.e. if a model needs to be
embedded in edge devices, or if Explainability is required
• Model performance: Humans are still beating models generated by
AutoML tools.
AMAZON FORECAST – DATA IMPORT
• Input file format has to be csv
• Data schema of new time series
dataset needs to be specified and
mapped to required input format
• Data import from AWS S3 buckets
only
• A Dataset can have multiple types:
- TARGET_TIME_SERIES - historical
time series data for each item
- RELATED_TIME_SERIES –
additional numeric data points, e.g.
price, webpage_hits, flags (1,0); the
more relevant information is available,
the more accurate the forecast tends to be
- ITEM_METADATA – additional
metadata (attributes), e.g. category,
color, brand
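A sketch of the preparation step described above: declare a schema for the target time series and write rows as CSV in the same column order. The attribute names (timestamp, item_id, target_value), the assumption that the file carries no header row, and the sample row are illustrative assumptions; check the Amazon Forecast documentation for the fields your dataset domain requires.

```python
import csv
import io

# Assumed column layout for a TARGET_TIME_SERIES dataset; adjust to your domain.
TARGET_SCHEMA = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]
}

def to_forecast_csv(rows, schema=TARGET_SCHEMA):
    """Serialize dict rows to CSV text, columns ordered as in the schema.
    Assumption: no header row is written."""
    columns = [attr["AttributeName"] for attr in schema["Attributes"]]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return buffer.getvalue()

# Hypothetical single observation
rows = [{"item_id": "bike_01", "timestamp": "2019-11-19 10:00:00", "target_value": 12.5}]
csv_text = to_forecast_csv(rows)
print(csv_text, end="")
```

Deriving the column order from the schema keeps the file and the declared mapping from drifting apart.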
AMAZON FORECAST – MODEL TRAINING
• Instead of AutoML, manual algorithm selection is also possible:
- Autoregressive Integrated Moving Average (ARIMA)
- DeepAR+ (incl. hyperparameter optimization)
- Exponential Smoothing (ETS)
- Non-Parametric Time Series (NPTS)
- Prophet Algorithm
• Additional configuration in non-AutoML mode:
• Predictor accuracy needs to be evaluated using related metadata + metrics,
e.g. RMSE. Training and featurization configurations are also available
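As a reminder of what the RMSE metric measures, here is the calculation in plain Python: the square root of the mean squared difference between actual and predicted values. The numbers are made up; a lower value means the predictor tracks the actuals more closely.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical actual vs. predicted values for three time steps
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 12.0]
error = rmse(actual, predicted)
print(round(error, 4))
```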
AMAZON FORECAST – FORECAST GENERATION + EXPORT
• Based on the evaluation metrics of
previously trained models (predictors),
a good performing predictor can be
used to generate a forecast for each
unique item in a given target time-
series dataset
• Retrieval of a forecast for a single
item → via query incl. filter (time
window)
• Export of the complete forecast into
an Amazon S3 bucket
INTEGRATING SNOWFLAKE WITH AMAZON FORECAST
Source: aws.amazon.com/forecast
Scenario with Snowflake
(AWS deployment)
• Prepare & retrieve time
series data in Snowflake
• Export data set into
Snowflake stage (= S3
bucket)
• Use AWS Forecast via
Console, CLI, or API’s
• Retrieve forecast results as
csv files or via API and
write them back to Snowflake
Feature: Snowflake Connector for Python
USING PYTHON TO ORCHESTRATE THE OVERALL PROCESS
> Using Amazon Forecast with Python
• For Python, AWS provides an SDK called “Boto 3” enabling
developers to create, configure, and manage AWS services, such as
EC2 and S3. Boto also provides low-level access to AWS services
like AWS Forecast → Documentation Link
• AWS Forecast API Reference provides all actions explained in the
previous slides → Link
• Jupyter Notebooks with detailed examples on Amazon Forecast
are available on GitHub → Link
> Snowflake Connector for Python
• Provides an interface for developing Python applications that can
connect to Snowflake and perform all standard operations:
- Connecting to Snowflake with the Default Authenticator, or with a
SAML 2.0-compliant identity provider
- Query data, create & set up new databases and tables, grant access,…
- Assign, resize a compute cluster (Virtual Warehouse)
- Load/unload data from/to Amazon S3 (or other Cloud Storage)
• End-to-end integration example is explained in Snowflake’s Blog →
Link, sample Python code can be downloaded from a GitHub repo
→ Link
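The overall process can be outlined as an ordered list of statements. The sketch below only builds the SQL text and is not the code from the blog post; warehouse, stage, table and path names are hypothetical. In a real script the Snowflake statements would be run through the Snowflake Connector for Python (snowflake.connector.connect(...) and cursor().execute(...)), and the AWS step through a Boto 3 client for the Forecast service.

```python
# Subset of warehouse sizes accepted by this sketch
ALLOWED_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}

def resize_statement(warehouse, size):
    """Build the statement that resizes a Virtual Warehouse."""
    size = size.upper()
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size in this sketch: {size}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}'"

def unload_statement(stage_path, select_sql):
    """Build a COPY INTO <stage> statement that unloads a query result as CSV."""
    return (
        f"COPY INTO @{stage_path} "
        f"FROM ({select_sql}) "
        "FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE) "
        "OVERWRITE = TRUE"
    )

def plan(warehouse, input_path, result_path, select_sql, result_table):
    """Ordered steps: scale up, unload, forecast on AWS, load results, scale down."""
    return [
        ("snowflake", resize_statement(warehouse, "LARGE")),
        ("snowflake", unload_statement(input_path, select_sql)),
        ("aws", "import CSV from S3, train predictor, generate and export forecast"),
        ("snowflake", f"COPY INTO {result_table} FROM @{result_path} FILE_FORMAT = (TYPE = CSV)"),
        ("snowflake", resize_statement(warehouse, "XSMALL")),
    ]

# Hypothetical names throughout
steps = plan(
    warehouse="ml_wh",
    input_path="forecast_stage/input",
    result_path="forecast_stage/output",
    select_sql="SELECT ts, item_id, price FROM prices",
    result_table="forecast_results",
)
for target, statement in steps:
    print(target, "|", statement)
```

Scaling the warehouse up only for the unload and load steps, and down again afterwards, follows the elastic-compute idea from earlier in the talk.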
END-TO-END ML PROJECTS AT SCALE_
KEY ROLES IN A DATA-POWERED ORGANIZATION…
Source: Dataiku.com
…(IDEAL) CROSS-COLLABORATION IN ML PROJECTS…
Source: Dataiku
…SUPPORTED BY A SCALABLE ANALYTICS ARCHITECTURE
SNOWFLAKE INTEGRATION WITH A DATA SCIENCE SUITE
End-to-end Data Flow
EXAMPLE
• Automatic bulk copy of datasets from S3 Data Lake to Snowflake
• Automatic table creation and data movement
• Run complex SQL directly in Snowflake utilising its built-in functions
• Visual data transformation operations (prepare, group by, filter, split…) automatically pushed down to Snowflake
• Use Python and coding recipes and execute them in Snowflake
• Train and use built-in ML models on Snowflake Data
• Interactive SQL notebooks for interactive analysis
• “In-database” charting to visualize entire datasets (stored in Snowflake)
Q&A_
THANK YOU
APPENDIX_
TRY SNOWFLAKE YOURSELF!
Snowflake Hands-on Lab
Guide → Download
Handbook here
SNOWFLAKE SIGMOD PAPER
Download:
www.snowflake.com/resource/sigmod-2016-paper-snowflake-elastic-data-warehouse
SELECTED SNOWFLAKE TECH ARTICLES
> SNOWFLAKE CHALLENGE: CONCURRENT LOAD AND QUERY
Benoit Dageville
> DON’T IGNORE ACID-COMPLIANT DATA PROCESSING IN THE CLOUD
Michael Nixon
> HOW TO LOAD TERABYTES INTO SNOWFLAKE – SPEEDS, FEEDS AND TECHNIQUES
Stuart Ozer
> SNOWFLAKE AND SPARK: PUSHING SPARK QUERY PROCESSING TO SNOWFLAKE
Edward Ma
> HOW WE BUILT SNOWFLAKE ON AZURE
Polita Paulus
> DATA MODELING IN THE AGE OF JSON AND SCHEMA-ON-READ
Kent Graziano
> HOW TO MANAGE GDPR COMPLIANCE WITH SNOWFLAKE’S TIME TRAVEL AND
DISASTER RECOVERY
Kent Graziano
> DATA ENCRYPTION WITH CUSTOMER-MANAGED KEYS
Martin Hentschel

 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Does it only have to be ML + AI?

  • 1. © 2019 Snowflake Inc. All Rights Reserved DOES IT ONLY HAVE TO BE ML & AI? MACHINE LEARNING INSIDE + OUTSIDE OF SNOWFLAKE’S CLOUD DATA WAREHOUSE HARALD ERB I harald.erb@snowflake.com Nürnberg, 19. November 2019
  • 2. AGENDA > What is Snowflake? (briefly explained) > Talk Intro > Recap: What can a Database do for you? > What about Implementation of In-Database ML Models? > Or better use Auto-ML outside your Database? > End-to-end ML Projects at Scale
  • 3. WHAT IS SNOWFLAKE?
  • 4. SNOWFLAKE: A TEAM OF DATA EXPERTS
  • 5. SNOWFLAKE TIMELINE: Founded in 2012 by industry veterans with over 120 database patents. ~$1BN in venture capital funding from leading investors; ~$4.5BN valuation. First customers 2014, general availability 2015. 1,600+ employees. Over 3,000 customers today. Gartner and Forrester “Leader”. FUN FACTS: Queries processed in Snowflake per day: > 290 million. Largest single table: > 68 trillion rows. Largest number of tables in a single DB: > 200,000. Single customer with most data: > 55 PB. Single customer with most users: > 10,000.
  • 6. A NEW ARCHITECTURE FOR DATA WAREHOUSING: Multi-Cluster, Shared Data, in the Cloud. Traditional architectures: (1) Shared-Disk (SMP): cluster of nodes with a single shared disk; throughput is constrained by CPU, memory or disk access (DWs based on traditional RDBMS). (2) Shared-Nothing (MPP): cluster of nodes, each of which has its own disk, with data distributed across the nodes; not elastic because data must be redistributed when the cluster is resized (most MPP DWs, Hadoop). Snowflake: Multi-Cluster, Shared Data: multiple clusters, shared data; compute power and storage scale independently of each other.
  • 7. THE SNOWFLAKE ELASTIC DATA WAREHOUSE: Snowflake’s built-for-the-cloud architecture provides many benefits. The key is separation of storage, compute and metadata services. ● Unlimited storage scalability without refactoring ● Multiple compute clusters can read/write shared data ● Resize clusters instantly, with no downtime ● Centrally manage logical assets (warehouse, database, etc.), not technical assets (servers, buckets, etc.) ● Full transactional consistency (ACID) across the entire system. Diagram labels: Management, Optimisation, Security, Availability, Transactions, Metadata
  • 8. THE SNOWFLAKE DIFFERENCE (Snowflake / Traditional DW in the Cloud / Query Service). ADMINISTRATION: Data warehouse as a service / Customer manages tuning, optimization and manual administration / Complete black box. CONCURRENCY: Unlimited, automatic concurrency scaling / Limited concurrency / Poor concurrency scaling. FLEXIBILITY: Native, optimized support for diverse data / Data transformation required / Native support, limited optimization for diverse data. SCALING: Scale on the fly, in seconds to minutes / Manual, disruptive, slow scaling / Scale on the fly.
  • 9. YOU MIGHT HAVE HEARD SOMETHING SIMILAR (What you hear / What it actually means / Difference with Snowflake). “We support SQL” / Some SQL… some BQL… / ANSI-Standard SQL, full DML. “We scale elastically” / Stop, resize, repartition, restart / Instant, automatic resize. “We support semi-structured” / Limited JSON / Avro, Parquet, XML, JSON. “We’re fast” / If you sort, index, and tune / Built fast with no tuning. “We are great for concurrency” / If it’s only 4-5 concurrent queries / Scale linearly for concurrency. “We’re global” / If you build replication and failover / Global HA, out of the box.
  • 10. TALK INTRO
  • 11. ML & AI – FROM HYPE TO PRODUCTIVITY > Enterprises continue to establish a data-driven culture • Predictive analytics matures; “What is likely to happen?” questions allow organizations to become proactive instead of relying on human experience and intuition alone • Machine Learning is producing some quantifiable results today; integration with operations (procedures and processes) is still a challenge • Automation of data preparation, insight discovery, data science, ML model development + complex decisioning → still a future topic > “Citizen Data Scientists” • Fill the data scientist talent gap • Modern analytics tools guide business users through the process and help them extract advanced analytic insights from data • “Real” data scientists can focus on more difficult analytics work and insight. Source: Gartner Hype Cycle for Midsize Enterprises 2019 → Link
  • 12. FOCUSING ON THE RIGHT THINGS > SQL vs. Machine Learning vs. Machine Learning Applied to SQL • SQL for BI-level analysis: business questions and many prediction problems can be solved by well-crafted SQL, and it offers explainability that deep ML generally does not • Still true: “Garbage in, garbage out”. Nothing of substance can come from BI or ML without good data → data collection + engineering is a sophisticated discipline and consumes a lot of time → but these activities are crucial for making information available reliably at scale • Machine Learning: helpful to spot complex patterns, maybe less important for predictions. Applied ML for better data preparation can be very beneficial. Diagram labels: “Wow! ML” (“Sometimes needed”), “BI-level Analysis” (“Always needed, often enough for predictions”), “Data Engineering” (“Always needed”). Sources: A. Jhingran, Talk at VLDB 2019, Slides → Link; C. Kozyrkov, Towards Data Science Blog, 2019 → Link > Decision Intelligence • A new engineering discipline that augments data science with theory from social science, decision theory, and managerial science • Goal: turning information into better actions at any scale • Provides a framework for best practices in organizational decision-making and processes for applying machine learning at scale • […] Theory skipped for this talk, but here are some interesting questions to think about: - “How should you set up decision criteria and design your metrics?” - “What quality should you make this decision at and how much should you pay for perfect information?” (Decision analysis) - “How do emotions, heuristics, and biases play into decision-making?” (Psychology) > Fact-based decisions are not enough? Enter → Data Science • Use partial facts along with statistics, analytics, ML & AI to deal with uncertainty • Remember: the goal (objective) is always the starting point!
  • 13. FIXING DEPLOYMENT ISSUES Source: D. Sculley, et al.: “Hidden technical debt in Machine learning systems”, 2015
  • 14. FIXING DEPLOYMENT ISSUES – MAYBE WITH MLOPS
  • 15. RECAP: WHAT CAN A DATABASE DO FOR YOU?
  • 16. RECAP #1: FACT-BASED DECISIONING TPC-DS Benchmark Query Q57: Catalog Sales Call Center Outliers: “Find the item brands and categories for each call center and their monthly sales figures for a specified year, where the monthly sales figure deviated more than 10% of the average monthly sales for the year, sorted by deviation and call center. Report the sales deviation from the previous and following months.” > “BI-level Analysis” • Mature point-and-click tools based on a well-crafted semantic layer on top of a (virtualized) data mart enable lots of business users to answer even complex questions • Challenges: rapidly growing data volumes and resource contention issues lead to restricted DW access (instead of wider end user adoption)
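The outlier rule quoted on slide 16 (a month counts as an outlier when its sales deviate by more than 10% from the yearly monthly average) can be sketched outside the database as follows. This is only an illustration of the rule, not the TPC-DS Q57 SQL itself; the row layout `(call_center, brand, month, sales)` and all sample values are assumptions made up for the example.

```python
from collections import defaultdict


def monthly_outliers(rows, threshold=0.10):
    """Return months whose sales deviate from the yearly average by more than `threshold`.

    rows: iterable of (call_center, brand, month, sales) tuples (hypothetical layout).
    Each result row: (call_center, brand, month, sales, avg, prev_month_sales, next_month_sales).
    """
    # Sum sales per (call_center, brand) and month.
    totals = defaultdict(dict)
    for call_center, brand, month, sales in rows:
        per_month = totals[(call_center, brand)]
        per_month[month] = per_month.get(month, 0.0) + sales

    result = []
    for (call_center, brand), per_month in totals.items():
        avg = sum(per_month.values()) / len(per_month)
        if avg == 0:
            continue  # relative deviation is undefined for a zero average
        for month in sorted(per_month):
            sales = per_month[month]
            if abs(sales - avg) / avg > threshold:
                result.append((call_center, brand, month, sales, avg,
                               per_month.get(month - 1),   # previous month, None if absent
                               per_month.get(month + 1)))  # following month, None if absent
    # Sort by deviation (sales minus average), then by call center.
    result.sort(key=lambda r: (r[3] - r[4], r[0]))
    return result


# Made-up sample: eleven flat months and one spike in December.
sample = [("CC1", "BrandA", m, 100.0) for m in range(1, 12)] + [("CC1", "BrandA", 12, 130.0)]
outliers = monthly_outliers(sample)
```

In SQL the same logic is typically expressed with a window average and `LAG`/`LEAD` over the month ordering; the Python version only makes the individual steps explicit.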
  • 17. Feature: Elastic compute. Snowflake separates user workloads through multiple Virtual Warehouses, which scale instantly to meet required performance levels and can auto-resize during peak workloads to eliminate concurrency issues
  • 18. RECAP #2: FAST DATA EXPLORATION & AGGREGATION Sessionized clickstream data: finding sessions that include at least one "addtocart" event but do NOT include a transaction. Feature: ANSI-compliant SQL with a comprehensive set of aggregation, window, and pattern matching SQL functions
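The clickstream question on slide 18 boils down to a set test per session. A minimal sketch, assuming events arrive as `(session_id, event_type)` pairs and that the purchase event is named `"transaction"` (both are assumptions; only `"addtocart"` appears on the slide):

```python
def abandoned_cart_sessions(events):
    """Return the sorted ids of sessions with an 'addtocart' event but no 'transaction'."""
    seen = {}
    for session_id, event_type in events:
        # Collect the distinct event types observed in each session.
        seen.setdefault(session_id, set()).add(event_type)
    return sorted(
        session_id
        for session_id, types in seen.items()
        if "addtocart" in types and "transaction" not in types
    )


# Made-up sample events for three sessions.
events = [
    ("s1", "view"), ("s1", "addtocart"),
    ("s2", "addtocart"), ("s2", "transaction"),
    ("s3", "view"),
]
abandoned = abandoned_cart_sessions(events)
```

In SQL the equivalent is usually a `GROUP BY session_id` with conditional counts in the `HAVING` clause.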
  • 19. EXCURSUS: WORKLOAD SEPARATION IN SNOWFLAKE. Prod DB: 3+ PB of raw data; 1.5 PB of data stored in the database (8x compression ratio); 25M micro partitions. Diagram labels: Continuous Loading (4TB/day), S3, <5min SLA; Virtual Warehouse Medium; Batch ETL & Maintenance; Virtual Warehouse Large; Virtual Warehouse 2X-Large; Analytics (Segmentation); Interactive Dashboards: 50% < 1s, 85% < 2s, 95% < 5s; Virtual Warehouse Auto Scale, X-Large x 5
  • 20. RECAP #3: ANALYTICS ON (SEMI-)STRUCTURED DATA Ingest external weather data (JSON format) and make it instantly available for SQL queries. Use case: blend historical city bike trip data with semi-structured weather data to spot new patterns in customer behavior
  • 21. Feature: Because of Snowflake’s VARIANT data type, semi-structured data can be handled with performance similar to structured data. “Real world” SQL, including Common Table Expressions (CTEs) and User Defined Functions (UDFs), is also supported. Example: selecting directly from a JSON document stored in a VARIANT column of a table
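Slide 21 shows attributes being selected directly out of a JSON document. The idea of addressing nested attributes by path can be sketched in plain Python; this only mimics path-style access and is not Snowflake's implementation. The weather document and its field names are invented for the example.

```python
import json


def get_path(document, path):
    """Follow a dot-separated path through nested dicts; return None if any key is missing."""
    current = document
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None  # comparable to a SQL NULL for a missing attribute
    return current


# Hypothetical weather record, loosely shaped like a public weather feed.
raw = '{"city": {"name": "New York"}, "main": {"temp": 281.5}, "time": 1546300800}'
doc = json.loads(raw)

city = get_path(doc, "city.name")
temp = get_path(doc, "main.temp")
wind = get_path(doc, "wind.speed")  # attribute absent in this record
```

Returning `None` for a missing attribute keeps one record with an unexpected shape from failing the whole query, which is the behaviour one usually wants for schema-on-read data.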
  • 22. RECAP #4: FORECAST USING AGGREGATION FUNCTIONS Use case: forecast the price for the next hour interval with a simple linear regression model. Calculation of the linear regression line + UNION ALL of actual data and forecast data for a complete set of sales data. Chart legend: Actual, Predicted
  • 23. Simple linear regression model: Aggregation Functions for Linear Regression
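The approach on slides 22 and 23 (fit a regression line with aggregates, then append the forecast row to the actual data) can be sketched as below. The slope and intercept are the ordinary least-squares values, which is what SQL regression aggregates such as `REGR_SLOPE` and `REGR_INTERCEPT` compute; the hourly prices are made-up sample data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up hourly prices; hour 5 is the interval to forecast.
hours = [1, 2, 3, 4]
prices = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(hours, prices)
forecast = slope * 5 + intercept

# Equivalent of the UNION ALL on the slide: actual rows followed by the predicted row.
series = [(h, p, "actual") for h, p in zip(hours, prices)] + [(5, forecast, "predicted")]
```

Because both coefficients are plain aggregates over the input rows, the whole forecast can be computed in a single SQL statement without moving data out of the warehouse.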
  • 24. WHAT ABOUT IMPLEMENTATION OF IN-DATABASE ML MODELS?
  • 25. IT CAN BE DONE IN A DATABASE… Working (experimental) examples in Snowflake: > K-Means Clustering, > Predictions using an ID3 Decision Tree algorithm, or even > Hierarchical Temporal Memory (HTM) approach
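To make the first example on slide 25 concrete, here is a minimal K-Means sketch in plain Python. It is not the Snowflake implementation referenced in the talk (which uses SQL and stored procedures); it only shows the assign-and-update loop such an implementation has to express. Seeding the centroids with the first `k` points is a simplification chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on tuples of floats; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for point in points:
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer change
        centroids = new_centroids
    return centroids, clusters


# Made-up 2-D sample with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

In a database the assignment step maps naturally to a join plus a minimum-distance aggregate, and the update step to a `GROUP BY` with `AVG`, with the loop driven by a stored procedure.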
  • 26. …USING SQL, UDF’S, STORED PROCEDURES & MATH… Feature: Snowflake stored procedures are implemented through JavaScript and, optionally, SQL: • JavaScript provides the control structures (branching and looping). • SQL is executed within the JavaScript by calling functions in an API (SQL is not required in a stored procedure, but is typically included). Code callout: Embedded SQL
  • 27. Working with JSON
  • 28. Result of Procedure Call
  • 29. …IT MAKES SOME SENSE FROM A DEPLOYMENT PERSPECTIVE… Line of Governance • Structured + semi-structured data • Raw data available for discovery • Self-Service sandbox • Multiple toolsets / IDE’s • Readable code! • Same technology for commercial exploitation • Direct access via SQL • Elastic compute • Versioning • Standardisation & governance Model
  • 30. …BUT PURE CODING IS NOT WORKING FOR EVERYONE! Challenge: Fixing the Data Scientist Talent Gap > In any organization there are many business analysts and BI power users who are curious to explore data science and predictive algorithms for their business case > Enablement through basic learning, literacy and the right tools will help these individuals become Citizen Data Scientists who develop hypotheses and prototypes on their own > Probably the only feasible way today to democratize advanced analytics in an organisation. Diagram labels: Potential large user base, Citizen Data Scientist, Potential user impact
  • 31. AUTO-ML OUTSIDE YOUR DATABASE?
  • 32. WOW! AUTOML > Automated Machine Learning (AutoML) • Is the process of automating the entire end-to-end process (or some steps) of applying ML to real-world problems: - Data pre-processing - Feature engineering, extraction, and selection - Algorithm selection & hyperparameter optimization • Accuracy of ML solutions can be measured → automated systems can fine-tune data, features, and algorithms to generate accurate models, relying on established ML knowledge > Benefits of AutoML • Cost reductions: increased productivity for data scientists and/or democratization of machine learning reduces demand for data scientists • Intelligence can easily be added to applications → increased revenues and customer satisfaction • Higher productivity: roll out more models with increased accuracy > The Data Scientist’s advantage • Conformance to custom specifications, e.g. if a model needs to be embedded in edge devices, or if explainability is required • Model performance: humans are still beating models generated by AutoML tools.
  • 33. AMAZON FORECAST – DATA IMPORT • Input file format has to be CSV • The data schema of a new time series dataset needs to be specified and mapped to the required input format • Data import from AWS S3 buckets only • A dataset can have multiple types: - TARGET_TIME_SERIES: historical time series data for each item - RELATED_TIME_SERIES: additional numeric data points, e.g. price, webpage_hits, flags (1,0); the more information available, the more accurate the forecast - ITEM_METADATA: additional metadata (attributes), e.g. category, color, brand
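Since slide 33 requires CSV input mapped to a declared schema, a small sketch of preparing a TARGET_TIME_SERIES file may help. The column order `item_id, timestamp, target_value` and the timestamp format `yyyy-MM-dd HH:mm:ss` are assumptions for this example; the actual order must match whatever schema is declared for the dataset, and whether a header row is accepted should be checked against the Amazon Forecast documentation. The item id and values are invented.

```python
import csv
import io
from datetime import datetime


def to_target_time_series_csv(rows):
    """Serialize (item_id, timestamp, value) rows to CSV text without a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for item_id, timestamp, value in rows:
        # Assumed column order: item_id, timestamp, target_value.
        writer.writerow([item_id, timestamp.strftime("%Y-%m-%d %H:%M:%S"), value])
    return buffer.getvalue()


# Hypothetical demand observations for one item.
rows = [
    ("bike_station_1", datetime(2019, 11, 19, 8, 0, 0), 12),
    ("bike_station_1", datetime(2019, 11, 19, 9, 0, 0), 17),
]
csv_text = to_target_time_series_csv(rows)
```

The resulting text would then be uploaded to an S3 bucket, which the slide names as the only supported import source.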
  • 34. AMAZON FORECAST – MODEL TRAINING • Instead of AutoML, manual algorithm selection is also possible: - Autoregressive Integrated Moving Average (ARIMA) - DeepAR+ (incl. hyperparameter optimization) - Exponential Smoothing (ETS) - Non-Parametric Time Series (NPTS) - Prophet Algorithm • Additional configuration in non-AutoML mode • Predictor accuracy needs to be evaluated using related metadata + metrics, e.g. RMSE. Training and featurization configurations are also available
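Slide 34 names RMSE as one metric for judging a predictor. For reference, a minimal sketch of the root-mean-square error between actual and forecast values; the numbers are made up and this is independent of how Amazon Forecast reports its own metrics.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up hold-out values and forecasts.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Lower values indicate a better fit, and because errors are squared before averaging, a few large misses weigh more heavily than many small ones.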
  • 35. AMAZON FORECAST – FORECAST GENERATION + EXPORT • Based on the evaluation metrics of previously trained models (predictors), a well-performing predictor can be used to generate a forecast for each unique item in a given target time-series dataset • Retrieval of a forecast for a single item → via query incl. filter (time window) • Export of the complete forecast into an Amazon S3 bucket
  • 36. INTEGRATING SNOWFLAKE WITH AMAZON FORECAST Source: aws.amazon.com/forecast Scenario with Snowflake (AWS deployment) • Prepare & retrieve time series data in Snowflake • Export the data set into a Snowflake stage (= S3 bucket) • Use AWS Forecast via Console, CLI, or APIs • Retrieve forecast results as CSV files or via API and write them back to Snowflake. Feature: Snowflake Connector for Python
  • 37. USING PYTHON TO ORCHESTRATE THE OVERALL PROCESS > Using Amazon Forecast with Python • For Python, AWS provides an SDK called “Boto 3” enabling developers to create, configure, and manage AWS services, such as EC2 and S3. Boto also provides low-level access to AWS services like AWS Forecast → Documentation Link • The AWS Forecast API Reference covers all actions explained in the previous slides → Link • Jupyter Notebooks with detailed examples on Amazon Forecast are available on GitHub → Link > Snowflake Connector for Python • Provides an interface for developing Python applications that can connect to Snowflake and perform all standard operations: - Connecting to Snowflake with the Default Authenticator, or with a SAML 2.0-compliant identity provider - Query data, create & set up new databases and tables, grant access,… - Assign and resize a compute cluster (Virtual Warehouse) - Load/unload data from/to Amazon S3 (or other Cloud Storage) • An end-to-end integration example is explained in Snowflake’s Blog → Link; sample Python code can be downloaded from a GitHub repo → Link
  • 38. END-TO-END ML PROJECTS AT SCALE
  • 39. KEY ROLES IN A DATA-POWERED ORGANIZATION… Source: Dataiku.com
  • 40. …(IDEAL) CROSS-COLLABORATION IN ML PROJECTS… Source: Dataiku
  • 41. …SUPPORTED BY A SCALABLE ANALYTICS ARCHITECTURE
  • 42. SNOWFLAKE INTEGRATION WITH A DATA SCIENCE SUITE (EXAMPLE): End-to-end Data Flow
  • 43. SNOWFLAKE INTEGRATION WITH A DATA SCIENCE SUITE (EXAMPLE): End-to-end Data Flow • Automatic bulk copy of datasets from S3 Data Lake to Snowflake • Automatic table creation and data movement • Run complex SQL directly in Snowflake utilising its built-in functions • Visual data transformation operations (prepare, group by, filter, split…) automatically pushed down to Snowflake • Use Python and coding recipes and execute them in Snowflake • Train and use built-in ML models on Snowflake data • Interactive SQL notebooks for interactive analysis • “In-database” charting to visualize entire datasets (stored in Snowflake)
  • 44. Q&A
  • 45. THANK YOU
  • 46. APPENDIX
  • 47. TRY SNOWFLAKE YOURSELF! Snowflake Hands-on Lab Guide → Download Handbook here
  • 48. SNOWFLAKE SIGMOD PAPER Download: www.snowflake.com/resource/sigmod-2016-paper-snowflake-elastic-data-warehouse
  • 49. SELECTED SNOWFLAKE TECH ARTICLES > SNOWFLAKE CHALLENGE: CONCURRENT LOAD AND QUERY, Benoit Dageville > DON’T IGNORE ACID-COMPLIANT DATA PROCESSING IN THE CLOUD, Michael Nixon > HOW TO LOAD TERABYTES INTO SNOWFLAKE – SPEEDS, FEEDS AND TECHNIQUES, Stuart Ozer > SNOWFLAKE AND SPARK: PUSHING SPARK QUERY PROCESSING TO SNOWFLAKE, Edward Ma > HOW WE BUILT SNOWFLAKE ON AZURE, Polita Paulus > DATA MODELING IN THE AGE OF JSON AND SCHEMA-ON-READ, Kent Graziano > HOW TO MANAGE GDPR COMPLIANCE WITH SNOWFLAKE’S TIME TRAVEL AND DISASTER RECOVERY, Kent Graziano > DATA ENCRYPTION WITH CUSTOMER-MANAGED KEYS, Martin Hentschel