GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google Cloud

Proprietary + Confidential
Data Science in Cloud
Quick Tour
Priyanka Vergadia
Staff Developer Advocate
Google Cloud
Twitter: @pvergadia
1. Google Cloud Orientation
2. Data Science
3. Data Analytics
4. MLOps Flow
5. Some Google Cloud Tools - BigQuery, BigQuery ML and Vertex AI
6. Wrap up
01
Data Science
Orientation
Things I don’t want to think about...
1. Provisioning hardware
2. Installing software
3. Upgrading operating systems
4. Security patching
5. System and network admin
6. Scaling up/down
7. Paying for stuff I don’t use
8. Dealing with failures
9. Managing clusters
Things I want to think about...
1. Solving my problem
Getting things done using someone else’s computers, especially where someone else worries about maintenance, provisioning, system administration, security, networking, failure recovery, etc.
02
Data Science
6 steps
03
Data Analysis
04
ML Ops
“The real problems with an ML system will be found while you are continuously operating it for the long term”
Launching is easy,
Operating is hard.
pixabay.com
Developing the model
is just the beginning...
Modeling Code
…a product requires so much more
Configuration
Data Collection
Data Verification
Feature Extraction
Process Management Tools
Analysis Tools
Machine Resource Management
Serving Infrastructure
Monitoring
ML Code
Why do things become harder in production?
(an incomplete list)
● data cleaning and processing is hard at scale
● scaling out training and serving; infrastructure issues
● tracking, monitoring, and reproducibility requirements
○ model or data drift
○ training/serving skew
● access control issues, security requirements
● (and lots more)
The level of automation defines the maturity of the ML process
Level 0: Build and deploy manually
Level 1: Automate the training phase
Level 2: Automate training, validation, and deployment
Production ML Experience
“On any given day there are thousands of TFX pipelines running, which are processing exabytes of data and producing tens of thousands of models, which in turn are performing hundreds of millions of inferences per second.”
05
BigQuery & BQML
Google BigQuery
Data warehouse with customers ranging from TB to 100+ PB
Insights for everyone
Cloud-scale enterprise
data warehouse
Unique
Serverless platform
Standard SQL (ANSI 2011) with DML support
Encrypted, durable,
highly available Unique
Built-in ML Unique
Real-time insights Unique
In ~15s:
● Read 2TB:
○ ~1k disks
● Run 50B regexps:
○ ~3k cores
Train and deploy ML models in
SQL
BigQuery ML
Execute ML workflows without
moving data from BigQuery
Automate common ML tasks
Built-in infrastructure
management, security &
compliance
BigQuery ML supported models and features
The data analyst’s onramp to AI and ML
Classification
● Logistic regression
● DNN classifier (TensorFlow)
● XGBoost
● AutoML Tables
● Wide and Deep NNs (Preview, GA H1’21)
Regression
● Linear regression
● DNN regressor (TensorFlow)
● XGBoost
● AutoML Tables
● Wide and Deep NNs (Preview, GA H1’21)
Other Models
● k-means clustering
● Time series forecasting
● Recommendation: Matrix factorization
● Time series anomaly detection (Preview Q2’21, GA H2’21)
● Autoencoders
Model ops and explainability
● Import/export TensorFlow models for batch and online prediction
● Hyperparameter tuning using Cloud AI Vizier (Preview H1’21, GA H2’21)
● Model explainability using Cloud AI (Preview H1’21, GA H2’21)
● Managed Kubernetes and TFX pipelines (Preview H2’21, GA 2022)
● List models for comparison and online deployment in Cloud AI (Preview H2’21, GA 2022)
● Model versioning, continuous monitoring (future)
06
Vertex AI
Vertex AI is a
managed ML platform
to speed the rate of
experimentation and accelerate
deployment of AI models.
The End-To-End ML Journey through Vertex AI
● Where can I find training data? Feature Store, Datasets
● Where do I start with model experiments? Workbench
● How can I track the results of experiments? Experiments
● How can I train at scale? Training
● How do I deploy? Endpoints
● And for production? Monitoring, Pipelines
07
Learning Resources
● Introduction to Data Science blog:
https://goo.gle/dsintro
● Getting started docs:
cloud.google.com/vertex-ai/docs
● Get started in Cloud Console:
console.cloud.google.com/ai/platform
● Best practices:
cloud.google.com/architecture/ml-on-gcp-best-practices
Learn more
goo.gle/bqml-use-cases
BQML design patterns
https://github.com/priyankavergadia/google-cloud-4-words
Thank you!
Twitter, LinkedIn: @pvergadia
What is Dataplex?
NDA
BigQuery
Dataplex
Data Lifecycle Mgmt
(Ingest, discover, prep, monitor, serve, archive)
Logical data organization
Unified Security and Governance
Unified Metadata with auto-discovery
Dataproc
AI Platform
Data Studio
Dataflow
Structured
Semi-Structured
Unstructured
Streaming Data*
GCP
On-premises*
Multi-Cloud*
Storage
Built for distributed data
Logically unify and organize your data without any data
movement.
Intelligent Data Management
Automatic data discovery, metadata harvesting,
lifecycle management, and data quality with built-in
AI-driven intelligence.
Centralized Security & Governance
Central policy management, monitoring and auditing for
data authorization, retention, and classification.
Data Classification and Data Quality
Data
Intelligence
Analytics
*future capabilities
Data Science On Google Cloud
A Guided Tour
Polong Lin & Marc Cohen
Developer Relations Engineers
Google Cloud
Slides: mco.fyi/ds
Lab
mco.fyi/mllab
or
mco.fyi/forecast
Feature Store: Data Model
Photo by Martin Olsen on
Complexity is a
barrier to adoption
HELLO    CSECT               The name of this program is 'HELLO'
*                            Register 15 points here on entry from OPSYS or caller.
         STM   14,12,12(13)  Save registers 14,15, and 0 thru 12 in caller's Save area
         LR    12,15         Set up base register with program's entry point address
         USING HELLO,12      Tell assembler which register we are using for pgm. base
         LA    15,SAVE       Now Point at our own save area
         ST    15,8(13)      Set forward chain
         ST    13,4(15)      Set back chain
         LR    13,15         Set R13 to address of new save area
*                            -end of housekeeping (similar for most programs) -
         WTO   'Hello World' Write To Operator (Operating System macro)
*
         L     13,4(13)      restore address to caller-provided save area
         XC    8(4,13),8(13) Clear forward chain
         LM    14,12,12(13)  Restore registers as on entry
         DROP  12            The opposite of 'USING'
         SR    15,15         Set register 15 to 0 so that the return code (R15) is Zero
         BR    14            Return to caller
*
SAVE     DS    18F           Define 18 fullwords to save calling program registers
         END   HELLO         This is the end of the program
class HelloWorld
{
    public static void main(String args[])
    {
        System.out.println("Hello, World");
    }
}
print('Hello World')
Continuous Training for Production ML in the TFX Platform. OpML (2019).
Slice Finder: Automated Data Slicing for Model Validation. ICDE (2019).
Data Validation for Machine Learning. SysML (2019).
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. KDD (2017).
Data Management Challenges in Production Machine Learning. SIGMOD (2017).
Rules of Machine Learning: Best Practices for ML Engineering. Google AI Web (2017).
Machine Learning: The High Interest Credit Card of Technical Debt. NeurIPS (2015).
Hidden Technical Debt in Machine Learning Systems. NIPS (2015).
Production ML Research
Serving and Monitoring
Continuous Training
Experimentation/Development
Code Repository
Training Pipeline CI/CD
Code and configurations
Artifact Repository
Pipeline artifacts
Model Registry
Model Deployment CI/CD
Serving Infrastructure
Trained model
Model deployment
ML Metadata
Logs
Serving logs
Putting it all together
End-to-end view
Code & Config
Training pipeline
Registered model
Deployed model
Serving logs
Focus of today
Vertex Feature Store
Vertex Training and Pipelines
Vertex Model Monitoring
Vertex Workbench
Cloud Build
Vertex ML Metadata
Vertex Endpoints and Prediction
Feature Store in one picture
Our Solution
Feature Store
Online Store
Feature Management API
Batch Ingestion API
Stream Ingestion API
Feature Discovery API
Online Serving API
Batch Serving API
Cache
Online Prediction
Model Training
Batch Feature Engineering
Streaming Feature Engineering
Data Lake (BQ, GCS)
Kafka/Pubsub
Point-in-time lookups
Registry
Feature Monitoring
Offline Store
How does the new SDK fit in the picture?
Vertex AI SDK
Data engineer
ML engineer
Data scientist
Scalable training and serving on Vertex AI
Personas: Data Analyst, ML Developer, Data Scientist
Train with: Vertex Training
Use when:
• Your problem doesn’t match the criteria listed below for BigQuery ML or AutoML.
• You’re already running training on-premises or on another cloud, and you need consistency across the platforms.
Train with: AutoML
Use when:
• Your problem fits into one of the types AutoML supports. Offers a point-and-click workflow.
• Natural Language or Video models are served from Google Cloud, while Vision and Tables support edge / downloadable models.
Train with: BigQuery ML
Use when:
• All your data is contained in BigQuery.
• Users are most comfortable with SQL.
• The set of models available in BigQuery ML matches the problem you’re trying to solve.
Serve with: Vertex Prediction
Model deployment &
management (MLOps)
Explainable AI
Model development and
data science
BigQuery ML Roadmap for 2021
H1’21
● TF Wide and Deep NNs: Preview
● Autoencoders: Preview
● PCA: Preview
● P-values for linear models: Preview
● Hyperparameter Tuning: Preview
● Anomaly Detection: Preview
● AutoML Tables: GA
H2’21
● TF Wide and Deep NNs: GA
● Autoencoders: GA
● PCA: GA
● P-values for linear models: GA
● Hyperparameter Tuning: GA
● Anomaly Detection: GA
● Multivariate Time Series (AutoML): Preview
● Model Registry: Preview
● Managed Pipelines: Preview
● Explainable AI: Preview
Preparing the training data
Mix of demographic & behavioural data
Each row is a different user
Goal is to create cluster labels (for example 3, 2, 3, 2, 1)
Developer Days
CREATE OR REPLACE MODEL
  mydataset.kmeans_3
OPTIONS(
  model_type = 'KMEANS',
  kmeans_init_method = 'KMEANS++',
  num_clusters = 3
) AS
-- Train on every column except the user identifier.
SELECT
  * EXCEPT(userId)
FROM
  mydataset.train
Build and train with
CREATE MODEL
ML.PREDICT results
Compute cluster labels
using ML.PREDICT
SELECT
*
FROM
ML.PREDICT(MODEL mydataset.kmeans_3,
(
SELECT
*
FROM
mydataset.train ))
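For intuition about what a cluster label is, the sketch below assigns each row to its nearest centroid in plain Python. It is only an illustration: the centroids, the two-feature user rows, and the use of unscaled Euclidean distance are all assumptions, and BigQuery ML's own preprocessing and distance options are not shown on the slide.

```python
import math


def nearest_centroid(point, centroids):
    """Return (1-based centroid id, distance) for the closest centroid."""
    best_id, best_dist = None, math.inf
    for idx, centroid in enumerate(centroids, start=1):
        dist = math.dist(point, centroid)  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_id, best_dist = idx, dist
    return best_id, best_dist


# Hypothetical centroids and user rows with two numeric features each.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
users = [(1.0, 1.0), (9.0, 11.0), (0.5, 9.0)]

labels = [nearest_centroid(user, centroids)[0] for user in users]
print(labels)
```

Each user ends up with the id of the closest centroid, which is the role the cluster label plays in the query output.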
Inspecting the clusters "Evaluation" tab on the BigQuery UI
Inspecting the clusters
Anomaly detection with k-means
Fraud detection
Each row is a transaction
Which rows are anomalies?
CREATE MODEL - k-means clustering
# Query for model training
CREATE MODEL demo.kmeans_model
OPTIONS(
  model_type = 'kmeans',
  num_clusters = 8,
  kmeans_init_method = 'kmeans++'
)
AS
# Exclude the timestamp and the fraud label (Class) so training is unsupervised.
SELECT * EXCEPT(Time, Class)
FROM
  `bigquery-public-data.ml_datasets.ulb_fraud_detection`;
ML.DETECT_ANOMALIES with k-means clustering
# Query for creating anomaly detection results
SELECT
  *
FROM
  ML.DETECT_ANOMALIES(
    MODEL demo.kmeans_model,
    STRUCT(0.005 AS contamination),
    TABLE `bigquery-public-data.ml_datasets.ulb_fraud_detection`
  );
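One way to read the contamination setting is as the fraction of rows to flag, taking the rows that sit farthest from every cluster centroid. The Python sketch below illustrates that idea under stated assumptions: the transactions, the centroids, and the unnormalized Euclidean distance are invented for the example, and the exact rule BigQuery ML applies may differ.

```python
import math


def flag_anomalies(points, centroids, contamination):
    """Flag roughly the `contamination` fraction of points farthest from any centroid."""
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    n_flagged = math.ceil(contamination * len(points))
    if n_flagged == 0:
        return [False] * len(points)
    # Distance of the n_flagged-th farthest point acts as the threshold;
    # ties at the threshold are all flagged.
    threshold = sorted(distances, reverse=True)[n_flagged - 1]
    return [d >= threshold for d in distances]


# Hypothetical transactions with two numeric features; the last one is far
# from both hypothetical centroids.
transactions = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9),
    (4.9, 5.2), (0.3, 0.0), (5.2, 5.1), (0.1, 0.1), (9.0, 0.0),
]
centroids = [(0.1, 0.1), (5.0, 5.0)]

flags = flag_anomalies(transactions, centroids, contamination=0.1)
print([i for i, is_anomaly in enumerate(flags) if is_anomaly])
```

With ten rows and a contamination of 0.1, a single row is flagged. The slide's value of 0.005 would flag about 0.5% of the rows passed to the function.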
Blogpost
https://cloud.google.com/blog/products/data-analytics/bigquery-ml-unsupervised-anomaly-detection
Docs
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-detect-anomalies
Automated HP tuning
Have BigQuery ML automatically
search for the optimal
hyperparameters
Preview
Select number of trials
1
Don't need to be an expert in HPs
Save time from manually training
models with different HPs
Easy to use
CREATE MODEL
mydataset.my_logreg_model
OPTIONS(
model_type="logistic_reg",
input_label_cols=["mylabel"],
num_trials=20
) AS
SELECT
*
FROM
mydataset.my_training_data
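To make the idea of a fixed trial budget concrete, here is a small Python sketch of a search that runs 20 trials and keeps the best one. It deliberately swaps in plain random search, which is simpler than the Vertex Vizier service the deck says BigQuery ML uses. The `validation_loss` function, the `l2_reg` hyperparameter, and its search range are hypothetical stand-ins for real training and evaluation.

```python
import math
import random


def validation_loss(l2_reg):
    """Toy stand-in for train-and-evaluate; its minimum is at l2_reg = 0.1."""
    return (math.log10(l2_reg) + 1.0) ** 2


def random_search(num_trials, seed=0):
    """Run `num_trials` random trials and return (all trials, best trial)."""
    rng = random.Random(seed)
    trials = []
    for trial_id in range(1, num_trials + 1):
        # Sample the hyperparameter log-uniformly between 1e-4 and 10.
        l2_reg = 10 ** rng.uniform(-4.0, 1.0)
        trials.append(
            {"trial_id": trial_id, "l2_reg": l2_reg, "loss": validation_loss(l2_reg)}
        )
    best = min(trials, key=lambda t: t["loss"])
    return trials, best


trials, best = random_search(num_trials=20)
print(f"best trial {best['trial_id']}: l2_reg={best['l2_reg']:.4g}, loss={best['loss']:.4g}")
```

The list of per-trial records is loosely analogous to a table of trial results: every trial keeps its hyperparameters and its score, and the best one is picked at the end.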
Hyperparameter tuning with BigQuery ML
Uses Vertex Vizier under-the-hood
Inspect the trials info
2
SELECT
*
FROM
ML.TRIAL_INFO(MODEL mydataset.my_logreg_model)
Even while it's
still training!
Evaluate your model
3
Predict!
4
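SELECT
  *
FROM
  ML.PREDICT(MODEL mydataset.my_logreg_model,
    -- ML.PREDICT also needs input rows; my_new_data is a placeholder table name.
    TABLE mydataset.my_new_data)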
How to import TensorFlow models to do
batch predictions in BigQuery
using BigQuery ML
Importing TensorFlow models into BigQuery
CREATE MODEL
PREDICT
https://towardsdatascience.com/how-to-do-batch-predictions-of-tensorflow-models-directly-in-bigquery-ffa843ebdba6
https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
TensorFlow Hub - tfhub.dev
Question:
Can we do text
similarity based on
embeddings?
# The following are example embedding outputs of 20 dimensions per sentence
# Embedding for: The quick brown fox jumps over the lazy dog.
# [0.0560572519898, 0.0534118898213, -0.0112254749984, ...]
# Embedding for: I am a sentence for which I would like to get its embedding.
# [-0.0343746766448, -0.0529498048127, 0.0469399243593, ...]
Text similarity using an imported Tensorflow model
https://towardsdatascience.com/how-to-do-text-similarity-search-and-document-clustering-in-bigquery-75eb8f45ab65
Goal:
I want to search for
comments similar to:
"power line down on a home"
Step 1: Save the TensorFlow model to GCS
Step 2: CREATE MODEL using the GCS folder path
CREATE OR REPLACE MODEL
  mydataset.swivel_text_embed
OPTIONS(
  model_type='tensorflow',
  model_path='gs://BUCKET/swivel/*')
Step 3: Use ML.PREDICT to get comment embeddings
SELECT
*
FROM
ML.PREDICT(MODEL mydataset.swivel_text_embed,
(SELECT
comments AS sentences
FROM
mydataset.mydata) );
Text converted into an
embedding of 20
floating points
Step 4: Calculate distance between embeddings to
compute text similarity
Input search term:
"power line down on a home"
Top 15 most similar comments to input
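The slide does not say which distance is used, so the following Python sketch assumes cosine similarity as one common choice for comparing embeddings. The three-dimensional vectors and the comment texts are made up for illustration (the deck's model emits 20 dimensions), and `top_k_similar` is a hypothetical helper, not part of BigQuery ML.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, comments, k):
    """Return the k (similarity, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in comments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical embeddings for the search term and three stored comments.
query = [1.0, 0.0, 0.0]
comments = [
    ("tree fell on power line", [0.9, 0.1, 0.0]),
    ("road flooded", [0.0, 1.0, 0.0]),
    ("power pole down near house", [0.8, 0.0, 0.2]),
]

results = top_k_similar(query, comments, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Ranking by similarity and keeping the first k rows mirrors the "top 15 most similar comments" result, with k set to 2 for this toy data.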
Exporting BQML models for use with Vertex
Model trained with BigQuery ML Vertex Pipelines
Export to Cloud Storage
https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/tree/master/retail/recommendation-system/bqml-mlops
What’s included in Vertex AI?
Data Labeling
AutoML
DL Environment (DL VM + DL Container)
Prediction
Feature Store
Training
Experiments
Data Readiness
Feature Engineering
Training/HP-Tuning
Model Monitoring
Model serving
Understanding/Tuning
Edge
Model Management
Notebooks
Pipelines (Orchestration)
Explainable AI
Hybrid AI
Continuous Monitoring
Metadata
Vision
Translation
Tables
Language
Video
AI Accelerators
Vizier Optimization
Datasets
Vertex Pipelines: Key capabilities
● Python SDKs: Data scientist friendly Python SDKs
● Serverless and scalable: Run as many pipelines on as much data as you want.
● Metadata and lineage: Store metadata for every artifact produced by the pipeline.
● Monitoring UIs and APIs: Track and debug pipeline executions.
● Security: Supports Cloud IAM, VPC-SC, and CMEK.
● Cost-effective: Only pay for the pipelines you run and the resources they use.
Conditional triggers
Logging metrics
Experimentation management with Vertex Pipelines
Iterative Experimentation
Data Prep
Development datasets / Features
Source Repository
Feature Eng
Model Training
Model Eval
Experiment Tracking
Training Pipeline Automation
Parameters, metrics, artifacts
Training Pipeline Source Code
Continuous Training with Vertex Pipelines
Orchestrated Training Pipeline
Data Extraction
Development datasets / Features
Model Registry & Artifact Store
Data Valid.
Data Prep.
Model Training
Training Pipeline Metadata
Trained Model
Model Eval.
Model Valid.
Training Pipeline CI/CD
Training Pipeline Source Code
Evaluate and Understand Models
Tabular Text
What-If Tool (WIT)
Visually probe the behavior of trained machine
learning models, with minimal coding
Language Interpretability Tool (LIT)
Open-source platform for visualization and
understanding of NLP models.
A canonical ML workflow
1. Experimentation
2. (Re) Training
3. Model Deployment
4. Continuous Model Monitoring
Training
Serving
EDA / Prototyping
Training pipeline dev
Pipeline CI/CD
Candidate Model generation
Model Serving
Canary & A/B Testing
Model performance monitoring
Retrain Triggers
Data Validation
Feature Engineering
Model Training
Model Evaluation
Model Registry
Model Cards & Reporting
Model Provenance
Compliance
Model Management & Governance
AutoML outperforms handcrafted models
[Chart: accuracy (precision @1) versus computational cost]
Learning Transferable Architectures for Scalable Image Recognition, Zoph et al. 2017, https://arxiv.org/abs/1707.07012
https://cloud.google.com/architecture/ml-on-gcp-best-practices
Three Modalities
of Google Cloud
1. Cloud Console
2. Command Line
3. APIs
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
Stepan Pushkarev1.7K visualizações
Solving enterprise challenges through scale out storage & big compute final por Avere Systems
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute final
Avere Systems578 visualizações

Mais de James Anderson

GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... por
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...James Anderson
142 visualizações32 slides
GDG SLK - Why should devs care about container security.pdf por
GDG SLK - Why should devs care about container security.pdfGDG SLK - Why should devs care about container security.pdf
GDG SLK - Why should devs care about container security.pdfJames Anderson
140 visualizações21 slides
GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ... por
 GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ... GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ...
GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ...James Anderson
223 visualizações17 slides
A3 - AR Code Planetarium CST.pdf por
A3 - AR Code Planetarium CST.pdfA3 - AR Code Planetarium CST.pdf
A3 - AR Code Planetarium CST.pdfJames Anderson
11 visualizações11 slides
GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V... por
GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V...GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V...
GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V...James Anderson
100 visualizações43 slides
GDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language Models por
GDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language ModelsGDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language Models
GDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language ModelsJames Anderson
137 visualizações22 slides

Mais de James Anderson(19)

GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... por James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson142 visualizações
GDG SLK - Why should devs care about container security.pdf por James Anderson
GDG SLK - Why should devs care about container security.pdfGDG SLK - Why should devs care about container security.pdf
GDG SLK - Why should devs care about container security.pdf
James Anderson140 visualizações
GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ... por James Anderson
 GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ... GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ...
GDG Cloud Southlake #25: Jacek Ostrowski & David Browne: Sabre's Journey to ...
James Anderson223 visualizações
A3 - AR Code Planetarium CST.pdf por James Anderson
A3 - AR Code Planetarium CST.pdfA3 - AR Code Planetarium CST.pdf
A3 - AR Code Planetarium CST.pdf
James Anderson11 visualizações
GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V... por James Anderson
GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V...GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V...
GDG Cloud Southlake #24: Arty Starr: Enabling Powerful Software Insights by V...
James Anderson100 visualizações
GDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language Models por James Anderson
GDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language ModelsGDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language Models
GDG Cloud Southlake #23:Ralph Lloren: Social Engineering Large Language Models
James Anderson137 visualizações
GDG Cloud Southlake #21:Alexander Snegovoy: Master Continuous Resiliency in C... por James Anderson
GDG Cloud Southlake #21:Alexander Snegovoy: Master Continuous Resiliency in C...GDG Cloud Southlake #21:Alexander Snegovoy: Master Continuous Resiliency in C...
GDG Cloud Southlake #21:Alexander Snegovoy: Master Continuous Resiliency in C...
James Anderson43 visualizações
GDG Cloud Southlake #20:Stefano Doni: Kubernetes performance tuning dilemma: ... por James Anderson
GDG Cloud Southlake #20:Stefano Doni: Kubernetes performance tuning dilemma: ...GDG Cloud Southlake #20:Stefano Doni: Kubernetes performance tuning dilemma: ...
GDG Cloud Southlake #20:Stefano Doni: Kubernetes performance tuning dilemma: ...
James Anderson279 visualizações
GDG Cloud Southlake #19: Sullivan and Schuh: Design Thinking Primer: How to B... por James Anderson
GDG Cloud Southlake #19: Sullivan and Schuh: Design Thinking Primer: How to B...GDG Cloud Southlake #19: Sullivan and Schuh: Design Thinking Primer: How to B...
GDG Cloud Southlake #19: Sullivan and Schuh: Design Thinking Primer: How to B...
James Anderson358 visualizações
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone por James Anderson
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneGDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
James Anderson268 visualizações
GDG Cloud Southlake #15: Mihir Mistry: Cybersecurity and Data Privacy in an A... por James Anderson
GDG Cloud Southlake #15: Mihir Mistry: Cybersecurity and Data Privacy in an A...GDG Cloud Southlake #15: Mihir Mistry: Cybersecurity and Data Privacy in an A...
GDG Cloud Southlake #15: Mihir Mistry: Cybersecurity and Data Privacy in an A...
James Anderson124 visualizações
GDG Cloud Southlake #14: Jonathan Schneider: OpenRewrite: Making your source ... por James Anderson
GDG Cloud Southlake #14: Jonathan Schneider: OpenRewrite: Making your source ...GDG Cloud Southlake #14: Jonathan Schneider: OpenRewrite: Making your source ...
GDG Cloud Southlake #14: Jonathan Schneider: OpenRewrite: Making your source ...
James Anderson240 visualizações
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries por James Anderson
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud BoundariesGDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
James Anderson101 visualizações
GDG Cloud Southlake #8 Steve Cravens: Infrastructure as-Code (IaC) in 2022: ... por James Anderson
GDG Cloud Southlake #8  Steve Cravens: Infrastructure as-Code (IaC) in 2022: ...GDG Cloud Southlake #8  Steve Cravens: Infrastructure as-Code (IaC) in 2022: ...
GDG Cloud Southlake #8 Steve Cravens: Infrastructure as-Code (IaC) in 2022: ...
James Anderson457 visualizações
GDG Cloud Southlake #6 Tammy Bryant Butow: Chaos Engineering The Road To Res... por James Anderson
 GDG Cloud Southlake #6 Tammy Bryant Butow: Chaos Engineering The Road To Res... GDG Cloud Southlake #6 Tammy Bryant Butow: Chaos Engineering The Road To Res...
GDG Cloud Southlake #6 Tammy Bryant Butow: Chaos Engineering The Road To Res...
James Anderson206 visualizações
GDG Cloud Southlake #5 Eric Harvieux: Site Reliability Engineering (SRE) in P... por James Anderson
GDG Cloud Southlake #5 Eric Harvieux: Site Reliability Engineering (SRE) in P...GDG Cloud Southlake #5 Eric Harvieux: Site Reliability Engineering (SRE) in P...
GDG Cloud Southlake #5 Eric Harvieux: Site Reliability Engineering (SRE) in P...
James Anderson198 visualizações
GDG Cloud Southlake #4 Biodun Awojobi and Wade Walters Security Programs and ... por James Anderson
GDG Cloud Southlake #4 Biodun Awojobi and Wade Walters Security Programs and ...GDG Cloud Southlake #4 Biodun Awojobi and Wade Walters Security Programs and ...
GDG Cloud Southlake #4 Biodun Awojobi and Wade Walters Security Programs and ...
James Anderson219 visualizações
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice por James Anderson
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in PracticeGDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
James Anderson931 visualizações
GDG Cloud Southlake #2 Jez Humble DevOps Transformation:Building & Scaling H... por James Anderson
 GDG Cloud Southlake #2 Jez Humble DevOps Transformation:Building & Scaling H... GDG Cloud Southlake #2 Jez Humble DevOps Transformation:Building & Scaling H...
GDG Cloud Southlake #2 Jez Humble DevOps Transformation:Building & Scaling H...
James Anderson362 visualizações

Último

VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue por
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueShapeBlue
134 visualizações54 slides
The Power of Heat Decarbonisation Plans in the Built Environment por
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built EnvironmentIES VE
67 visualizações20 slides
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... por
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...ShapeBlue
93 visualizações13 slides
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPool por
Extending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPoolExtending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPool
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPoolShapeBlue
56 visualizações10 slides
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue por
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueShapeBlue
149 visualizações7 slides
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... por
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...ShapeBlue
74 visualizações17 slides

Último(20)

VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue por ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
ShapeBlue134 visualizações
The Power of Heat Decarbonisation Plans in the Built Environment por IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE67 visualizações
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... por ShapeBlue
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
ShapeBlue93 visualizações
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPool por ShapeBlue
Extending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPoolExtending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPool
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPool
ShapeBlue56 visualizações
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue por ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue149 visualizações
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... por ShapeBlue
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
ShapeBlue74 visualizações
"Surviving highload with Node.js", Andrii Shumada por Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays49 visualizações
Microsoft Power Platform.pptx por Uni Systems S.M.S.A.
Microsoft Power Platform.pptxMicrosoft Power Platform.pptx
Microsoft Power Platform.pptx
Uni Systems S.M.S.A.74 visualizações
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti... por ShapeBlue
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
ShapeBlue69 visualizações
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... por Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker50 visualizações
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT por ShapeBlue
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITUpdates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
ShapeBlue138 visualizações
Uni Systems for Power Platform.pptx por Uni Systems S.M.S.A.
Uni Systems for Power Platform.pptxUni Systems for Power Platform.pptx
Uni Systems for Power Platform.pptx
Uni Systems S.M.S.A.60 visualizações
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... por ShapeBlue
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
ShapeBlue59 visualizações
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T por ShapeBlue
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
ShapeBlue81 visualizações
Digital Personal Data Protection (DPDP) Practical Approach For CISOs por Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash103 visualizações
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive por Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Network Automation Forum49 visualizações
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... por The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
The Digital Insurer40 visualizações
Business Analyst Series 2023 - Week 4 Session 7 por DianaGray10
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7
DianaGray10110 visualizações
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue por ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
ShapeBlue63 visualizações

GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google Cloud

  • 1. Proprietary + Confidential Data Science in Cloud Quick Tour Priyanka Vergadia Staff Developer Advocate Google Cloud Twitter: @pvergadia
  • 2. Flow: 1. Google Cloud Orientation; 2. Data Science; 3. Data Analytics; 4. MLOps; 5. Some Google Cloud Tools - BigQuery, BigQuery ML and Vertex AI; 6. Wrap up
  • 4. Things I don’t want to think about... 1. Provisioning hardware 2. Installing software 3. Upgrading operating systems 4. Security patching 5. System and network admin 6. Scaling up/down 7. Paying for stuff I don’t use 8. Dealing with failures 9. Managing clusters
  • 5. Things I want to think about... 1. Solving my problem
  • 6. Getting things done using someone else’s computers, especially where someone else worries about maintenance, provisioning, system administration, security, networking, failure recovery, etc.
  • 13. “The real problems with an ML system will be found while you are continuously operating it for the long term.” Launching is easy, operating is hard.
  • 14. Developing the model is just the beginning... Modeling Code
  • 15. …a product requires so much more: Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analysis Tools, Machine Resource Management, Serving Infrastructure, Monitoring, ML Code
  • 16. Why do things become harder in production? (an incomplete list) ● data cleaning and processing is hard at scale ● scaling out training and serving; infrastructure issues ● tracking, monitoring, and reproducibility requirements ○ model or data drift ○ training/serving skew ● access control issues, security requirements ● (and lots more)
  • 17. The level of automation defines the maturity of the ML process. Level 0: build and deploy manually. Level 1: automate the training phase. Level 2: automate training, validation, and deployment.
  • 19. “On any given day there are thousands of TFX pipelines running, which are processing exabytes of data and producing tens of thousands of models, which in turn are performing hundreds of millions of inferences per second.”
  • 22. Google BigQuery Data warehouse with customers ranging from TB to 100+ PB Insights for everyone Cloud-scale enterprise data warehouse Unique Serverless platform Standard SQL(ANSI 2011) with DML Support Encrypted, durable, highly available Unique Built-in ML Unique Real-time insights Unique
  • 23. In ~15s: ● Read 2TB: ○ ~1k disks ● Run 50B regexps: ○ ~3k cores
  • 24. Train and deploy ML models in SQL BigQuery ML Execute ML workflows without moving data from BigQuery Automate common ML tasks Built-in infrastructure management, security & compliance
  • 25. BigQuery ML supported models and features: the data analyst’s onramp to AI and ML. Classification: logistic regression, DNN classifier (TensorFlow), XGBoost, AutoML Tables. Regression: linear regression, DNN regressor (TensorFlow), XGBoost, AutoML Tables. Other models: k-means clustering, time series forecasting, recommendation (matrix factorization). Model ops and explainability: import/export TensorFlow models for batch and online prediction. NDA: time series anomaly detection (Preview Q2’21, GA H2’21); hyperparameter tuning using Cloud AI Vizier (Preview H1’21, GA H2’21); model explainability using Cloud AI (Preview H1’21, GA H2’21); managed Kubernetes and TFX pipelines (Preview H2’21, GA 2022); list models for comparison and online deployment in Cloud AI (Preview H2’21, GA 2022); model versioning and continuous monitoring (future); Wide and Deep NNs (Preview, GA H1’21); Autoencoders.
  • 27. Vertex AI is a managed ML platform to speed the rate of experimentation and accelerate deployment of AI models.
  • 28. The End-To-End ML Journey through Vertex AI. Where can I find training data? Feature Store, Datasets. Where do I start with model experiments? Workbench. How can I track the results of experiments? Experiments. How can I train at scale? Training. How do I deploy? Endpoints. And for production? Monitoring, Pipelines.
  • 30. Learn more: ● Introduction to Data Science blog: https://goo.gle/dsintro ● Getting started docs: cloud.google.com/vertex-ai/docs ● Get started in Cloud Console: console.cloud.google.com/ai/platform ● Best practices: cloud.google.com/architecture/ml-on-gcp-best-practices
  • 34. Proprietary + Confidential What is Dataplex? NDA BigQuery Dataplex Data Lifecycle Mgmt (Ingest, discover, prep, monitor, serve, archive) Logical data organization Unified Security and Governance Unified Metadata with auto-discovery Dataproc AI Platform Data Studio Structured Streaming Data* Semi-Structured Unstructured GCP On-premises* Multi-Cloud* Dataflow Storage Built for distributed data Logically unify and organize your data without any data movement. Intelligent Data Management Automatic data discovery, metadata harvesting, lifecycle management, and data quality with built-in AI-driven intelligence. Centralized Security & Governance Central policy management, monitoring and auditing for data authorization, retention, and classification. Data Classification and Data Quality Data Intelligence Analytics *future capabilities
  • 36. Proprietary + Confidential Data Science On Google Cloud A Guided Tour Polong Lin & Marc Cohen Developer Relations Engineers Google Cloud Slides: mco.fyi/ds
  • 40. Complexity is a barrier to adoption (photo by Martin Olsen)
  • 41.
      HELLO    CSECT               The name of this program is 'HELLO'
      *                            Register 15 points here on entry from OPSYS or caller.
               STM   14,12,12(13)  Save registers 14,15, and 0 thru 12 in caller's Save area
               LR    12,15         Set up base register with program's entry point address
               USING HELLO,12      Tell assembler which register we are using for pgm. base
               LA    15,SAVE       Now Point at our own save area
               ST    15,8(13)      Set forward chain
               ST    13,4(15)      Set back chain
               LR    13,15         Set R13 to address of new save area
      *                            -end of housekeeping (similar for most programs) -
               WTO   'Hello World' Write To Operator (Operating System macro)
      *
               L     13,4(13)      restore address to caller-provided save area
               XC    8(4,13),8(13) Clear forward chain
               LM    14,12,12(13)  Restore registers as on entry
               DROP  12            The opposite of 'USING'
               SR    15,15         Set register 15 to 0 so that the return code (R15) is Zero
               BR    14            Return to caller
      *
      SAVE     DS    18F           Define 18 fullwords to save calling program registers
               END   HELLO         This is the end of the program
  • 42.
      class HelloWorld {
          public static void main(String args[]) {
              System.out.println("Hello, World");
          }
      }
  • 45. Production ML Research: Continuous Training for Production ML in the TFX Platform, OpML (2019); Slice Finder: Automated Data Slicing for Model Validation, ICDE (2019); Data Validation for Machine Learning, SysML (2019); TFX: A TensorFlow-Based Production-Scale Machine Learning Platform, KDD (2017); Data Management Challenges in Production Machine Learning, SIGMOD (2017); Rules of Machine Learning: Best Practices for ML Engineering, Google AI Web (2017); Machine Learning: The High Interest Credit Card of Technical Debt, NeurIPS (2015); Hidden Technical Debt in Machine Learning Systems, NIPS (2015).
  • 46. Serving and Monitoring Continuous Training Experimentation/ Development Code Repository Training Pipeline CI/CD Code and configurations Artifact Repository Pipeline artifacts Model Registry Model Deployment CI/CD Serving Infrastructure Trained model Model deployment ML Metadata Logs Serving logs Putting it all together End-to-end view
  • 47. Code & Config Training pipeline Registered model Deployed model Serving logs Focus of today Vertex Feature Store Vertex Training and Pipelines Vertex Model Monitoring Vertex Workbench Cloud Build Vertex ML Metadata Vertex Endpoints and Prediction
  • 48. Feature Store in one picture Our Solution Feature Store Online Store Feature Management API Batch Ingestion API Stream Ingestion API Feature Discovery API Online Serving API Batch Serving API Cache Online Prediction Model Training Batch Feature Engineering Streaming Feature Engineering Data Lake (BQ, GCS) Kafka/Pubsub Point-in-time lookups Registry Feature Monitoring Offline Store
  • 49. How does the new SDK fit into the picture? Our Solution Feature Store Online Store Feature Management API Batch Ingestion API Stream Ingestion API Feature Discovery API Online Serving API Batch Serving API Cache Online Prediction Model Training Batch Feature Engineering Streaming Feature Engineering Data Lake (BQ, GCS) Kafka/Pubsub Point-in-time lookups Registry Feature Monitoring Offline Store Vertex AI SDK Data engineer ML engineer Data scientist
  • 51. Scalable training and serving on Vertex AI (for the Data Analyst, ML Developer, and Data Scientist). Train with Vertex Training; use when: your problem doesn’t match the criteria listed below for BigQuery ML or AutoML, or you’re already running training on-premises or on another cloud and need consistency across the platforms; serve with Vertex Prediction. Train with AutoML; use when: your problem fits into one of the types AutoML supports (offers a point-and-click workflow); serving: Natural Language or Video models are served from Google Cloud, while Vision and Tables support edge / downloadable models. Train with BigQuery ML; use when: all your data is contained in BigQuery, users are most comfortable with SQL, and the set of models available in BigQuery ML matches the problem you’re trying to solve.
  • 52. BigQuery ML Roadmap for 2021 (H1’21, H2’21; NDA). Areas: model development and data science; Explainable AI; model deployment & management (MLOps). Items, in slide order: TF Wide and Deep NNs (Preview); Autoencoders (Preview); PCA (Preview); P-values for linear models (Preview); Hyperparameter Tuning (Preview); Anomaly Detection (Preview); AutoML Tables (GA); TF Wide and Deep NNs (GA); Autoencoders (GA); PCA (GA); P-values for linear models (GA); Hyperparameter Tuning (GA); Anomaly Detection (GA); Multivariate Time Series (AutoML) (Preview); Model Registry (Preview); Managed Pipelines (Preview); Explainable AI (Preview).
  • 53. Preparing the training data Mix of demographic & behavioural data Each row is a different user
  • 54. Preparing the training data: each row is a different user; mix of demographic & behavioural data. Goal is to create cluster labels (e.g., 3, 2, 3, 2, 1).
  • 55. Developer Days: Build and train with CREATE MODEL. Training data query: SELECT * EXCEPT(userId) FROM mydataset.train
  • 56. Build and train with CREATE MODEL: CREATE OR REPLACE MODEL mydataset.kmeans_3 OPTIONS(model_type='KMEANS', kmeans_init_method='KMEANS++', num_clusters=3) AS SELECT * EXCEPT(userId) FROM mydataset.train
  • 57. ML.PREDICT results
  • 58. Compute cluster labels using ML.PREDICT: SELECT * FROM ML.PREDICT(MODEL mydataset.kmeans_3, (SELECT * FROM mydataset.train))
  • 59. Inspecting the clusters: "Evaluation" tab on the BigQuery UI
  • 60. Inspecting the clusters
  • 61. Anomaly detection with k-means Fraud detection Each row is a transaction Which rows are anomalies?
  • 62. CREATE MODEL - k-means clustering. Query for model training: CREATE MODEL demo.kmeans_model OPTIONS(model_type='kmeans', num_clusters=8, kmeans_init_method='kmeans++') AS SELECT * EXCEPT(Time, Class) FROM `bigquery-public-data.ml_datasets.ulb_fraud_detection`;
  • 63. ML.DETECT_ANOMALIES with k-means clustering. Query for creating anomaly detection results: SELECT * FROM ML.DETECT_ANOMALIES(MODEL demo.kmeans_model, STRUCT(0.005 AS contamination), TABLE `bigquery-public-data.ml_datasets.ulb_fraud_detection`);
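
The idea behind flagging anomalies with a k-means model and a contamination value can be sketched outside BigQuery. The Python below is a minimal illustration, not BigQuery ML's implementation: it assumes plain Euclidean distance to the nearest centroid as the anomaly score, and the centroids, points, and the `detect_anomalies` helper are hypothetical toy values and names chosen for this sketch.

```python
import math


def nearest_centroid_distance(point, centroids):
    """Distance from a point to its closest cluster centroid."""
    return min(math.dist(point, c) for c in centroids)


def detect_anomalies(points, centroids, contamination):
    """Flag the `contamination` fraction of points farthest from any centroid.

    Assumption: raw Euclidean distance is the anomaly score; a production
    system may normalize distances differently.
    """
    distances = [nearest_centroid_distance(p, centroids) for p in points]
    n_anomalies = math.ceil(contamination * len(points))
    # Rank row indices from most to least distant.
    ranked = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    flagged = set(ranked[:n_anomalies])
    return [i in flagged for i in range(len(points))]


# Toy data (hypothetical): two tight clusters and one point between them.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.1, -0.2), (9.8, 10.1), (5.0, 5.0), (10.2, 9.9)]
flags = detect_anomalies(points, centroids, contamination=0.25)
print(flags)
```

With a contamination of 0.25 and four rows, exactly one row is flagged: the point that sits far from both centroids.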
  • 66. Automated HP tuning Have BigQuery ML automatically search for the optimal hyperparameters Preview Select number of trials 1 Don't need to be an expert in HPs Save time from manually training models with different HPs Easy to use CREATE MODEL mydataset.my_logreg_model OPTIONS( model_type="logistic_reg", input_label_cols=["mylabel"], num_trials=20 ) AS SELECT * FROM mydataset.my_training_data Hyperparameter tuning with BigQuery ML
  • 67. Automated HP tuning Have BigQuery ML automatically search for the optimal hyperparameters Preview Select number of trials 1 Uses Vertex Vizier under-the-hood Save time from manually training models with different HPs Easy to use Inspect the trials info 2 SELECT * FROM ML.TRIAL_INFO(MODEL mydataset.my_logreg_model) Even while it's still training! Hyperparameter tuning with BigQuery ML
  • 68. Automated HP tuning Have BigQuery ML automatically search for the optimal hyperparameters Preview Select number of trials 1 Uses Vertex Vizier under-the-hood Save time from manually training models with different HPs Easy to use Inspect the trials info 2 Evaluate your model 3 Hyperparameter tuning with BigQuery ML
  • 69. Hyperparameter tuning with BigQuery ML. Automated HP tuning (Preview): have BigQuery ML automatically search for the optimal hyperparameters. Uses Vertex Vizier under the hood; save time otherwise spent manually training models with different HPs; easy to use. 1. Select number of trials. 2. Inspect the trials info. 3. Evaluate your model. 4. Predict! SELECT * FROM ML.PREDICT(MODEL mydataset.my_logreg_model) (abbreviated on the slide: ML.PREDICT also needs an input table or query as its second argument)
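To make "run N trials and keep the best" concrete, here is a minimal sketch that uses plain random search. Random search is a stand-in chosen for brevity; the slides say BigQuery ML uses Vertex Vizier under the hood, which this does not reproduce. The hyperparameter name `l2_reg`, its range, and `toy_loss` are all hypothetical.

```python
import random

def toy_loss(l2_reg):
    """Hypothetical stand-in for 'train a model and return its evaluation loss'."""
    return (l2_reg - 0.3) ** 2 + 0.05

def run_trials(num_trials, low, high, seed=0):
    """Sample one hyperparameter value per trial and record the resulting loss."""
    rng = random.Random(seed)
    trials = []
    for trial_id in range(1, num_trials + 1):
        l2_reg = rng.uniform(low, high)
        trials.append({"trial_id": trial_id, "l2_reg": l2_reg, "loss": toy_loss(l2_reg)})
    return trials

# 20 trials, mirroring num_trials=20 on the slide.
trials = run_trials(num_trials=20, low=0.0, high=1.0)

# The per-trial records play the role of the trial info table.
best = min(trials, key=lambda t: t["loss"])
print(best["trial_id"], round(best["l2_reg"], 3), round(best["loss"], 4))
```

The list of per-trial records is loosely analogous to what ML.TRIAL_INFO exposes, and picking the lowest-loss trial corresponds to selecting the model used for evaluation and prediction.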
  • 70. How to import TensorFlow models to do batch predictions in BigQuery using BigQuery ML
  • 71. Importing TensorFlow models into BigQuery: CREATE MODEL, then PREDICT. https://towardsdatascience.com/how-to-do-batch-predictions-of-tensorflow-models-directly-in-bigquery-ffa843ebdba6 https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models
  • 72. TensorFlow Hub - tfhub.dev
  • 73. Question: Can we do text similarity based on embeddings? # The following are example embedding outputs of 20 dimensions per sentence # Embedding for: The quick brown fox jumps over the lazy dog. # [0.0560572519898, 0.0534118898213, -0.0112254749984, ...] # Embedding for: I am a sentence for which I would like to get its embedding. # [-0.0343746766448, -0.0529498048127, 0.0469399243593, ...]
  • 74. Text similarity using an imported TensorFlow model. Goal: I want to search for comments similar to "power line down on a home". https://towardsdatascience.com/how-to-do-text-similarity-search-and-document-clustering-in-bigquery-75eb8f45ab65
  • 75. Step 1: Save the TensorFlow model to GCS. Step 2: CREATE MODEL using the GCS folder path: CREATE OR REPLACE MODEL mydataset.swivel_text_embed OPTIONS( model_type='tensorflow', model_path='gs://BUCKET/swivel/*')
  • 76. Step 3: Use ML.PREDICT to get comment embeddings SELECT * FROM ML.PREDICT(MODEL mydataset.swivel_text_embed, (SELECT comments AS sentences FROM mydataset.mydata) );
  • 77. Step 3: Use ML.PREDICT to get comment embeddings: SELECT * FROM ML.PREDICT(MODEL mydataset.swivel_text_embed, (SELECT comments AS sentences FROM mydataset.mydata) ); Each comment's text is converted into an embedding of 20 floating-point values.
  • 78. Step 4: Calculate distance between embeddings to compute text similarity. Input search term: "power line down on a home". Output: top 15 most similar comments to input.
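Step 4 can be sketched as ranking stored embeddings by their similarity to the query embedding. The slide does not say which distance is used, so cosine similarity here is an assumption. The 3-dimensional vectors and comment texts are made up for illustration (real embeddings in this deck have 20 dimensions), and `top_k_similar` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_similar(query_vec, corpus, k):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical comment embeddings (3 dimensions for readability).
corpus = {
    "tree fell on power line": [0.9, 0.1, 0.0],
    "pothole on main street": [0.0, 0.2, 0.9],
    "wires down near house": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of the search term.
query = [1.0, 0.2, 0.0]

top = top_k_similar(query, corpus, k=2)
print([text for _, text in top])
```

In the deck's setting the same ranking would be done over the embeddings returned by ML.PREDICT, keeping the top 15 rather than the top 2.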
  • 79. Exporting BQML models for use with Vertex. Diagram labels: Model trained with BigQuery ML; Vertex Pipelines; Export to Cloud Storage. https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/tree/master/retail/recommendation-system/bqml-mlops
  • 80. What’s included in Vertex AI? Data Labeling; AutoML; DL Environment (DL VM + DL Container); Prediction; Feature Store; Training; Experiments; Data Readiness; Feature Engineering; Training/HP-Tuning; Model Monitoring; Model serving; Understanding/Tuning; Edge; Model Management; Notebooks; Pipelines (Orchestration); Explainable AI; Hybrid AI; Continuous Monitoring; Metadata; Vision; Translation; Tables; Language; Video; AI Accelerators; Vizier Optimization; Datasets
  • 81. Vertex Pipelines: Key capabilities. Python SDKs: data-scientist-friendly Python SDKs. Serverless and scalable: run as many pipelines on as much data as you want. Metadata and lineage: store metadata for every artifact produced by the pipeline. Monitoring UIs and APIs: track and debug pipeline executions. Security: supports Cloud IAM, VPC-SC, and CMEK. Cost-effective: only pay for the pipelines you run and the resources they use.
  • 87. Experimentation management with Vertex Pipelines. Diagram labels: Iterative Experimentation; Data Prep; Development datasets / Features; Source Repository; Feature Eng; Model Training; Model Eval; Experiment Tracking; Training Pipeline Automation; Parameters, metrics, artifacts; Training Pipeline Source Code
  • 88. Continuous Training with Vertex Pipelines. Diagram labels: Orchestrated Training Pipeline; Data Extraction; Development datasets / Features; Model Registry & Artifact Store; Data Valid.; Data Prep.; Model Training; Training Pipeline Metadata; Trained Model; Model Eval.; Model Valid.; Training Pipeline CI/CD; Training Pipeline Source Code
  • 89. Evaluate and Understand Models. Tabular: What-If Tool (WIT), visually probe the behavior of trained machine learning models, with minimal coding. Text: Language Interpretability Tool (LIT), open-source platform for visualization and understanding of NLP models.
  • 90. A canonical ML workflow. Stages: 1. Experimentation; 2. (Re)Training; 3. Model Deployment; 4. Continuous Model Monitoring. Diagram labels: Training; Serving; EDA / Prototyping; Training pipeline dev; Pipeline CI/CD; Candidate Model generation; Model Serving; Canary & A/B Testing; Model performance monitoring; Retrain Triggers; Data Validation; Feature Engineering; Model Training; Model Evaluation; Model Registry; Model Cards & Reporting; Model Provenance; Compliance; Model Management & Governance
  • 91. AutoML outperforms handcrafted models (plot: accuracy, precision @1, versus computational cost). Source: Learning Transferable Architectures for Scalable Image Recognition, Zoph et al. 2017, https://arxiv.org/abs/1707.07012
  • 93. Three Modalities of Google Cloud