SlideShare uma empresa Scribd logo
1 de 88
Baixar para ler offline
Getting Started
With Google Cloud
<Cloud OnBoard>
</Cloud OnBoard>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard <!Content>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
{
}
(’Module 1’) Introducing Google Cloud Platform Page 1 - 11
(’Module 2’) Compute & Storage Fundamentals Page 11 - 21
(’Module 3’) Data Analysis�on the Cloud Page 22 - 35
(’Module 4’) Scaling Data Analysis Page 35 - 49
(’Module 5’) Machine Learning Page 50 - 70
(’Module 6’) Data Processing Architecture Page 70 - 79
Summary | Continue learning with Google Cloud Page 79 - 85
1
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Cloud OnBoard
Cloud OnBoard
Introducing
Google Cloud Platform:
Big Data and Machine Learning
Version #1.1
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
What is Google Cloud Platform
Google Cloud Big Data products
Agenda
2
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud computing is a continuation of a long-term
shift in how computing resources are managed
Now
2000
1980s
Next
First Wave
Server on-premises
You own everything.
It is yours to manage.
Second Wave
Data centers
You pay for the hardware
but rent the space.
Still yours to manage.
First Generation
Cloud Virtualized
data centers
You don’t rent hardware and
space, but still control
and configure virtual
machines. Pay for what
you provision.
Third Wave
Managed service
Completely elastic storage,
processing, and machine
learning so that you can
invest your energy in great
apps. Pay for what you use.
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
What is Google Cloud Platform
Google Cloud Big Data products
Agenda
3
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Google’s mission is to organize
the world’s information and make
it universally accessible and
useful
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
To organize the world’s
information,Google has been
building the most powerful
infrastructure on the planet
4
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Edge node locations >1000
Edge points of presence >100
Network
Network sea cable investments
In terms of hardware, Google Cloud has the largest cloud network, with
over 100 points of presence, and 100,000s of miles of fiber optic cable.
Unity (US, JP) 2010
Monet (US, BR) 2017
Tannat (BR, UY, AR) 2017
Junior (Rio, Santos) 2017
FASTER (US, JP, TW) 2016
PLCN (HK, LA) 2019
Indigo (SG, ID, AU) 2019
SJC (JP, HK, SG) 2013
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Future region and
number of zones
The network connects 15 regions,
with 3 more coming
Current region and
number of zones
S Carolina
N Virginia
Oregon
Iowa
Montreal
Los Angeles
3
4
3
3
3
3
Frankfurt
Belgium
London
FinlandNetherlands 3
33
2
3
São Paulo
3
Taiwan
Singapore
Mumbai
Sydney
Tokyo
3
3
2
3
3
3
HongKong 3
2
3
5
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
In terms of software, organizing the world’s information
has meant that Google needed to invent data processing methods
2002 2004 2006 2008 2010 2012 2014 2016
GFS
MapReduce TensorFlow
Bigtable
Dremel
Colossus
Flume
Megastore
Spanner
Millwheel
Pub/Sub
F1
http://research.google.com/pubs/papers.html
TPU
2018
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Google Cloud opens up that innovation and infrastructure to you
2002 2004 2006 2008 2010 2012 2014 2016
ML Engine
Pub/Sub
Dataflow
Datastore
Dataflow
Cloud Storage
BigQuery
Bigtable
Dataproc
Cloud Storage
2018
Auto ML
Cloud Spanner
6
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
A suite of products that can be put together for data processing
Foundation Databases
Data-handling
frameworks
Analytics and ML
Cloud
Storage
Cloud
Bigtable
BigQuery
Cloud Dataproc
Compute
Engine
Cloud SQL
Cloud
Datalab Cloud Dataflow
Cloud Pub/Sub
Cloud
Spanner
ML APIs
...
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Spotify illustrates the typical journey of companies that come to
Google Cloud: From lower costs to increased reliability to business
transformation
2
3
1
Spend less
No-ops, Pay
for use, Secure
Flexible
Complete
Innovative
Powerful
7
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
A suite of products that can be put together for data processing
Change how you computeChange where you compute
Improve scalability
and reliability
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Atomic Fiction lowered their costs with per-minute
(now per-second) billing
https://www.youtube.com/watch?v=mBY-RjE15WA
Change where you compute
8
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
FIS was able to improve reliability and scalability
on a massive data-processing challenge
6 BILLION
MARKET EVENTS
WRITTEN PER HOUR
1.7 GIGs
PER SECOND
PER HOUR
6 TBs
10 BN
WRITTEN
PER HOUR
BURSTS
1.7 GIGABYTES
PER SECOND
10 TERABYTES
PER HOUR
The Consolidated Audit Trail (CAT) is a data repository of all equities and options
orders, quotes, and events; FIS processed the CAT to organize 100 billion market events
into an “order lifecycle” in a 4-hour window using Cloud Bigtable.
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Rooms to Go transformed its business with data and machine learning
https://www.thinkwithgoogle.com/case-studies/rooms-to-go-improves-the-shopper-experience.html
BigQuery
Analyze
CRM
Customer Relationship Manager
customer demographics, past purchases
Rooms
to Go
Google Analytics
Premium
landing pages,
views
Combine
data
Collect
data
completely
designed room
packages
9
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
In summary, Google Cloud offers you ways to…
Become a truly
data-driven
company
Apply machine
learning broadly
and easily
Spend less
on ops and
administration
Incorporate real-
time data into
apps and
architectures
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Module Review
10
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Google Cloud Platform is:
(select all of the correct options)
Cloud OnBoard
Module review
Operated by Google on the same
infrastructure it uses
A set of modular services from
which you can compose cloud-based
applications
Most cost-effective if you pre-
purchase instances on a yearly
basis
A platform on which to host
scalable and fast distributed
applications
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Google Cloud Platform is:
(select all of the correct options)
Cloud OnBoard
Module review
Operated by Google on the same
infrastructure it uses
A set of modular services from
which you can compose cloud-based
applications
Most cost-effective if you pre-
purchase instances on a yearly
basis
A platform on which to host
scalable and fast distributed
applications
11
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Resources
Google Cloud Platform https://cloud.google.com/
Datacenters https://www.google.com/about/datacenters/
Google IT security https://cloud.google.com/files/Google-
CommonSecurity-WhitePaper-v1.4.pdf
Why Google Cloud
Platform?
https://cloud.google.com/why-google/
Pricing Philosophy https://cloud.google.com/pricing/philosophy/
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Cloud OnBoard
Cloud OnBoard
Compute & Storage
Fundamentals
Version #1.1
12
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
CPUs on demand + Demo
A global filesystem + Demo
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Google Cloud provides an earth-scale computer
Data storage
Compute power
Networking
13
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Custom/changeable machine types, preemptible machines,
and automatic discounts lead to simplicity and agility
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo:
Create a Compute Engine instance
14
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
In this demo, we will :
1. Create a Compute Engine instance
2. SSH into the instance
3. Install the software package git
(for source code version control)
Cloud OnBoard
Demo : Create a Compute Engine Instance
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
CPUs on demand + Demo
A global filesystem + Demo
Agenda
15
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Use Cloud Storage for persistent storage and as staging
ground for import to other Google Cloud products
31
Compute
Engine + Disk
Cloud SQL
BigQuery
Dataproc
Cloud Storage
Store/StageTransform
Raw data (any format)
4
Ingest/ Extract
Load
2
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Create a bucket and copy the data over using the Cloud
SDK; blobs are referenced through a gs://.../ URL
sales*.csv gs://acme-sales/data/
Copy
Google Cloud Platform Project
Bucket
Objects
Data and
metadata
Bucket
Objects
Data and
metadata
gsutil cp
16
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Cloud Storage gives you durability,
reliability, and global reach
Cloud
Storage
Cloud
SQL
Compute
Engine
Transfer Services
are useful for ingest
Ingest
Store
Import
Control access at project,
bucket and/or object level
Publish
Use Cloud Storage
as staging area
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Control latency and availability
with zones and regions
Choose the closest
zone/region so as
to to reduce latency.
Region: North America
Zone: us-central1-a
...
Region: Europe
Zone: europe-west1-b
...
Region: ...
Zone: ...
...
Distribute your apps and data across
regions for global availability.
Distribute your apps
and data across zones
to reduce service
disruptions.
17
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo:
Interact with
Cloud Storage
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
In this demo, we carry out the steps of an ingest-
transform-and-publish data pipeline manually
1. Ingest data into a Compute Engine instance
2. Transform data on the Compute Engine instance
3. Store the transformed data on Cloud Storage
4. Publish Cloud Storage data to the web
Cloud OnBoard
Demo : Interact with Cloud Storage
18
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Ingest-Transform-Publish
using core infrastructure
Cloud
Storage
Cloud
SQL
Compute
Engine
Step 1
Ingest/
Extract
Store
Import
Publish
Step 2 Step 3
Step 4
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Cloud Shell gives you an easy command-line
Cloud Shell comes pre-installed with the tools, libraries,
and so on you need to interact with Google Cloud Platform
Click
Do Now
19
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Module Review
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Compute nodes on GCP are:
(select the correct option)
❏ Allocated on demand, and you pay for the time that they are up.
❏ Expensive to create and teardown
❏ Pre-installed with all the software packages you might ever need.
❏ One of ~50 choices in terms of CPU and memory
Cloud OnBoard
Module review (1 of 2)
20
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Compute nodes on GCP are:
(select the correct option)
➔ Allocated on demand, and you pay for the time that they are up.
❏ Expensive to create and teardown
❏ Pre-installed with all the software packages you might ever need.
❏ One of ~50 choices in terms of CPU and memory
Cloud OnBoard
Module review answers (1 of 2)
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Google Cloud Storage is a good option for storing data that:
(select all of the correct options)
❏ Is ingested in real-time from sensors and other devices
❏ Will be frequently read/written from a compute node
❏ May be required to be read at some later time
❏ May be imported into a cluster for analysis
Cloud OnBoard
Module review (2 of 2)
21
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Google Cloud Storage is a good option for storing data that:
(select all of the correct options)
❏ Is ingested in real-time from sensors and other devices
❏ Will be frequently read/written from a compute node
➔ May be required to be read at some later time
➔ May be imported into a cluster for analysis
Cloud OnBoard
Module review (2 of 2)
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Resources
Google Cloud Platform https://cloud.google.com/compute/
Datacenters https://cloud.google.com/storage/
Pricing https://cloud.google.com/pricing/
Cloud Launcher https://cloud.google.com/launcher/
Pricing Philosophy https://cloud.google.com/pricing/philosophy/
22
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Cloud OnBoard
Cloud OnBoard
Data Analysis
on the Cloud
Version #1.1
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Stepping stones to transformation
Your SQL database in the cloud + Demo
Managed Hadoop in the cloud + Demo
Agenda
23
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Google Cloud Platform began in 2008, with App Engine,
a serverless way to run web applications
1
Develop
2
Upload
3
Autoscales Reliable
App Engine
Your code
http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html
http://googleappengine.blogspot.com/2013/05/the-google-app-engine-blog-is-moving.html
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Compute
Engine
Container
Engine
App Engine
Flex
App Engine
There [was] something fundamentally
wrong with what we were doing in 2008
… We didn't get the right stepping
stones into the cloud …
-- Eric Schmidt, Executive Chairman, Google
24
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
GCP now consists of a suite of products that together provide these
stepping stones in a business’ transformative journey
Change how you computeChange where you compute
Flexibility, scalability
and reliability
Cost effective virtual machines,
storage, Hadoop, and MySQL to
migrate your current workloads to
the public cloud.
Reliable, autoscaling messaging,
data processing, and storage.
Fully managed products for data
warehousing, data analysis,
streaming, and machine learning.
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Machine learning. This is the next
transformation … the programming
paradigm is changing. Instead of
programming a computer, you teach a
computer to learn something and it
does what you want.
Cloud OnBoard
Eric Schmidt,
Executive Chairman,
Google
25
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
“If you want to teach a neural network to
recognize a cat, for instance, you don’t
tell it to look for whiskers, ears, fur,
and eyes. You simply show it thousands
and thousands of photos of cats, and
eventually it works things out.”
WIRED’s headline
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Search
People who bought ...
Spam filtering
Suggest next video
Route planning
Smart Reply
Cloud OnBoard
Machine Learning is not new,
but it is now mainstream
What’s common to all of
these use cases of Machine
Learning?
?
26
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
There are three components in a recommendation system
RecommendingRating Training
Users rate a few houses
explicitly or implicitly
A machine learning model is
created to predict a user’s
rating of a house
For each user, the model is
applied to every unrated
house and the top 5 houses
for that user are saved.
What else is needed??
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
The ML algorithm essentially clusters users and items
How often do you need to compute
the predicted ratings?
Where would you save them?
2
3
Who is like this user? Is this a good house?
Predict rating
Is this house similar to houses that
people similar to this user like?
Predicted rating = user-preference *
item-quality
1
?
27
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
In addition to the ML algorithm, you also need
sophisticated data management
Data Collection
Data Analysis
Machine Learning
Serving
Scalable front end to collect customer actions
Data that is accessible and not silo-ed
(Re-)training and experimentation
Scalable, real-time system to serve
recommendations
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Stepping stones to transformation
Your SQL database in the cloud + Demo
Managed Hadoop in the cloud + Demo
Agenda
28
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Choose your storage solution based on your access pattern
Capacity
Access
metaphor
Read
Write
Update
granularity
Usage
Petabytes +
Like files in a
file system
Have to copy to
local disk
One file
An object
(a “file”)
Store blobs
Gigabytes
Relational
database
SELECT rows
INSERT row
Field
No-ops SQL
database on
the cloud
Terabytes
Persistent
Hashmap
Filter objects
on property
put object
Attribute
Structured
data from
AppEngine apps
Petabytes
Key-value(s),
HBase API
scan rows
put row
Row
No-ops, high
throughput,
scalable,
flattened data
Petabytes
Relational
SELECT rows
Batch/stream
Field
Interactive SQL*
querying fully
managed warehouse
Cloud
Storage
Cloud SQL Datastore Bigtable BigQuery
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud SQL is a fully managed database service
Cloud SQL
Google-managed
MySQL or Postgres
Flexible pricing
Familiar
Managed backups
Automatic replication
Fast connection from GCE & GAE
Connect from anywhere
Google Security
29
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo:
Set up rentals data
in Cloud SQL
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
In this demo, we populate rentals data in Cloud
SQL for the recommendation engine to use:
1. Create Cloud SQL instance
2. Create database tables by importing .sql
files from Cloud Storage
3. Populate the tables by importing .csv
files from Cloud Storage
4. Allow access to Cloud SQL
5. Explore the rentals data using SQL
statements from Cloud Shell
Demo: Setup rentals data in Cloud SQL
Cloud
Storage
Cloud SQL
External
machine
Import
30
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Stepping stones to transformation
Your SQL database in the cloud + Demo
Managed Hadoop in the cloud + Demo
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
There is a rich open-source ecosystem for big data
http://hadoop.apache.org/
http://pig.apache.org/
http://hive.apache.org/
http://spark.apache.org/
31
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Dataproc reduces the cost and complexity associated with
Spark and Hadoop clusters
Dataproc
Google-managed:
Hadoop
Pig
Hive
Spark
Image Versioning
Familiar
Resize in seconds
Automated cluster mgmt
Integrates with Google Cloud
Flexible VMs
Google Security
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo:
Recommendations ML
with Dataproc
32
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
In this demo, we implement
machine learning recommendations
using Cloud Dataproc:
1. Launch Dataproc
2. Train and apply ML model
written in PySpark to create
product recommendations
3. Explore inserted rows in
Cloud SQL
Dataproc
Cloud SQL
Train
model
Show
recommendations
1
Demo: Recommendations ML with Cloud Dataproc
2
3
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Module Review
33
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
❏ Transactional updates on relatively small datasets
Module review (1 of 2)
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
✓ Transactional updates on relatively small datasets
Module review (1 of 2)
34
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and
Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It’s the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
❏ Fully-managed versions of the software offer no-ops
❏ Running it on Google infrastructure offers reliability and cost savings
Module review (2 of 2)
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and
Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It’s the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
✓ Fully-managed versions of the software offer no-ops
✓ Running it on Google infrastructure offers reliability and cost savings
Module review (2 of 2)
35
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud SQL
Cloud Dataproc
Cloud Solutions
Resources
https://cloud.google.com/sql/
https://cloud.google.com/dataproc/
https://cloud.google.com/solutions/
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Cloud OnBoard
Cloud OnBoard
Scaling Data Analysis
Version #1.1
36
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Choosing where to store data on GCP
unstructured
Data analytics
workload
Transactional
workload
structuredNeed
MOBILE
SDKs
SQL
Horizontal
scalability
No-SQL Millisecond
Latency
Latency in
secondsCloud
Spanner
Cloud
SQL
Firebase
Storage
Cloud
Storage
Firebase
Realtime DB
Cloud
Datastore
Cloud
Bigtable
BigQuery
Need
MOBILE
SDKs
One
database
enough
37
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Use cloud spanner if you need globally consistent data or more
than one Cloud SQL instance
Source:
https://quizlet.com/blog/
quizlet-cloud-spanner
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud
Datastore
Bigtable
Cloud
Storage
Cloud SQL
Cloud
Spanner
BigQuery
Type NoSQL
document
NoSQL
wide column
Blobstore Relational
SQL for OLTP
Relational
SQL for OLTP
Relational
SQL for OLAP
Transactions Yes Single-row No Yes Yes No
Complex
queries
No No No Yes Yes Yes
Capacity Terabytes+ Petabytes+ Petabytes+ 500 GB Petabytes Petabytes+
Unit size 1 MB/entity ~10 MB/cell
~100 MB/row
5 TB/object Determined
by DB engine
10,240 MiB/
row
10 MB/row
Comparing storage options: technical details
38
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud
Datastore
Bigtable
Cloud
Storage
Cloud SQL
Cloud
Spanner
BigQuery
Type NoSQL
document
NoSQL
wide column
Blobstore Relational
SQL for OLTP
Relational
SQL for OLTP
Relational
SQL for OLAP
Best for Getting
started, App
Engine
applications
“Flat” data,
Heavy
read/write,
events,
analytical
data
Structured
and
unstructured
binary or
object data
Web
frameworks,
existing
applications
Large-scale
database
applications
(> ~2 TB)
Interactive
querying,
offline
analytics
Use cases Getting
started, App
Engine
applications
AdTech,
Financial
and IoT data
Images,
large media
files,
backups
User
credentials,
customer
orders
Whenever
high I/O,
global
consistency
is needed
Data
warehousing
Comparing storage options: use cases
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Bigtable is meant for high throughput data where access is primarily
for a range of Row Key prefixes
NASDAQ#1426535612045
...
MD:SYMBOL:
ZXZZT
...
Row Key Column data
MD:LASTSALE:
600.58
...
MD:LASTSIZE:
300
...
MD:TRADETIME:
1426535612045
...
MD:EXCHANGE:
NASDAQ
...
Tables should be tall and narrow
Store changes as new rows
Bigtable will automatically
compact the table
39
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Short meaningful column names reduce storage and RPC overhead
NASDAQ#1426535612045
MD:SYMBOL:
ZXZZT
Row Key Column data
MD:LASTSALE:
600.58
MD:LASTSIZE:
300
MD:TRADETIME:
1426535612045
MD:EXCHANGE:
NASDAQ
Design row key with most
common query in mind
Design row key to minimize hotspots
Column families is a quick
way to get some hierarchy
Use short column names
Designed for sparse tables
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Can work with Bigtable using the HBase API
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.*;
byte[] CF = Bytes.toBytes("MD"); // column family
Connection connection = ConnectionFactory.createConnection(...)
Table table = null;
try {
table = connection.getTable(TABLE_NAME);
Put p = new Put(Bytes.toBytes("NASDAQ#GOOG #1234561234561"));
p.addColumn(CF, Bytes.toBytes("SYMBOL"), Bytes.toBytes("GOOG"));
p.addColumn(CF, Bytes.toBytes("LASTSALE"), Bytes.toBytes(742.03d));
...
table.put(p);
} finally {
if (table != null) table.close();
}
40
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud
Datastore
Bigtable
Cloud
Storage
Cloud SQL
Cloud
Spanner
BigQuery
Type NoSQL
document
NoSQL
wide column
Blobstore Relational
SQL for OLTP
Relational
SQL for OLTP
Relational
SQL for OLAP
Transactions Yes Single-row No Yes Yes No
Complex
queries
No No No Yes Yes Yes
Capacity Terabytes+ Petabytes+ Petabytes+ 500 GB Petabytes Petabytes+
Unit size 1 MB/entity ~10 MB/cell
~100 MB/row
5 TB/object Determined
by DB engine
10,240 MiB/
row
10 MB/row
Comparing storage options: technical details
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud
Datastore
Bigtable
Cloud
Storage
Cloud SQL
Cloud
Spanner
BigQuery
Type NoSQL
document
NoSQL
wide column
Blobstore Relational
SQL for OLTP
Relational
SQL for OLTP
Relational
SQL for OLAP
Best for Getting
started, App
Engine
applications
“Flat” data,
Heavy
read/write,
events,
analytical
data
Structured
and
unstructured
binary or
object data
Web
frameworks,
existing
applications
Large-scale
database
applications
(> ~2 TB)
Interactive
querying,
offline
analytics
Use cases Getting
started, App
Engine
applications
AdTech,
Financial
and IoT data
Images,
large media
files,
backups
User
credentials,
customer
orders
Whenever
high I/O,
global
consistency
is needed
Data
warehousing
Comparing storage options: use cases
41
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
BigQuery is a fully managed data warehouse that lets you do ad-hoc
SQL queries on massive volumes of data
Project X
Dataset A Dataset B
Project Y
Dataset C Dataset D
Table 1
Table 2
Table 1
Table 2
Table 1
Table 2
Table 1
Table 2
BigQuery Service
42
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
A demo of BigQuery on a 10 billion-row dataset shows what it is
and what it can do
#standardsql
SELECT
language, SUM(views) as views
FROM `bigquery-samples.wikipedia_benchmark.Wiki10B`
WHERE
title like "%google%"
GROUP by language
ORDER by views DESC
Familiar, SQL 2011 query
language
Interactive ad-hoc analysis
of petabyte-scale databases
No need to provision
clusters
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Three ways of loading data into BigQuery
POST
Serverless
ETL
CSV
JSON
AVRO
Google
Sheets
Files on disk or Cloud
Storage
Stream Data Federated data source
43
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
With Federated data sources, you can directly query files on
Cloud Storage, without having to ingest them into BigQuery
Also: Google Drive, Bigtable
Also: JSON/Avro/Google Sheet
Can also pass in a schema
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud
Datastore
Bigtable
Cloud
Storage
Cloud SQL
Cloud
Spanner
BigQuery
Type NoSQL
document
NoSQL
wide column
Blobstore Relational
SQL for OLTP
Relational
SQL for OLTP
Relational
SQL for OLAP
Transactions Yes Single-row No Yes Yes No
Complex
queries
No No No Yes Yes Yes
Capacity Terabytes+ Petabytes+ Petabytes+ 500 GB Petabytes Petabytes+
Unit size 1 MB/entity ~10 MB/cell
~100 MB/row
5 TB/object Determined
by DB engine
10,240 MiB/
row
10 MB/row
Comparing storage options: technical details
44
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud
Datastore
Bigtable
Cloud
Storage
Cloud SQL
Cloud
Spanner
BigQuery
Type NoSQL
document
NoSQL
wide column
Blobstore Relational
SQL for OLTP
Relational
SQL for OLTP
Relational
SQL for OLAP
Best for Getting
started, App
Engine
applications
“Flat” data,
Heavy
read/write,
events,
analytical
data
Structured
and
unstructured
binary or
object data
Web
frameworks,
existing
applications
Large-scale
database
applications
(> ~2 TB)
Interactive
querying,
offline
analytics
Use cases Getting
started, App
Engine
applications
AdTech,
Financial
and IoT data
Images,
large media
files,
backups
User
credentials,
customer
orders
Whenever
high I/O,
global
consistency
is needed
Data
warehousing
Comparing storage options: use cases
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo
Agenda
45
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Increasingly, data analysis and machine learning are carried
out in self-descriptive, shareable, executable notebooks
Code
Output
Markup
Share
A typical notebook
contains code,
charts, and
explanations
Image Source:
Git Logo from
Wikipedia
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Datalab is an open-source notebook built on Jupyter (IPython)
Analyze data in BigQuery,
Compute Engine or Cloud Storage
Use existing
Python packages
Datalab is free—just pay
for Google Cloud resources
46
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Datalab notebooks are developed in an iterative, collaborative process
Development
Process in
Cloud Datalab
PHASE 1
Write code in
Python
PHASE 5
Share and
collaborate
PHASE 2
Run cell
(Shift+Enter)
PHASE 4
Write
commentary in
markdown
PHASE 3
Examine Output
1
3
4
5
2
5
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Datalab supports BigQuery
%%sql
To
Pandas
47
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo:
Create ML dataset
with BigQuery
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
In this demo, we use BigQuery to create a
dataset that we later use to build a taxi
demand forecast system using Machine Learning.
● What kinds of things affect taxi demand?
● What are some ways to measure “demand”?
Demo: Create ML dataset with
BigQuery
48
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Demo: Create ML dataset with BigQuery
In this demo, we use BigQuery to create a dataset that we later
use to build a taxi demand forecast system using Machine Learning.
1. Use BigQuery and Datalab to explore and visualize data
2. Build a Pandas dataframe that will be used as the training
dataset for machine learning using TensorFlow
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Module Review
49
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Module review
Match the use case on the left with the product on the right
Global consistency needed
High-throughput writes of wide-column data
Warehousing structured data
Develop Big Data algorithms interactively in Python
1. Datalab
2. BigTable
3. BigQuery
4. Spanner
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Module review
Match the use case on the left with the product on the right
Global consistency needed (4)
High-throughput writes of wide-column data (2)
Warehousing structured data (3)
Develop Big Data algorithms interactively in Python (1)
1. Datalab
2. BigTable
3. BigQuery
4. Spanner
50
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Cloud OnBoard
Cloud OnBoard
Machine Learning
Version #1.1
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Machine learning with TensorFlow + Demo
Pre-built machine learning models + Demo
Agenda
51
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
TensorFlow is an open source library that underlies many Google products
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Demo: Playing with neural networks to learn what they are
http://playground.tensorflow.org/
52
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Supervised machine learning requires features and labels
…
…
Neural Network
…
Input
features Prediction
Cost
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Machine Learning with TensorFlow involves four steps:
1
Gather
Data
Gather training data (input features and labels)
Create model
Train the model based on input data
Use the model on new data
2
Create
3
Train
4
Use
53
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Gather training data and select input features
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
Input features
targetdiscard
1
Gather
Data
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
All input features need to be numeric
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
1
Gather
Data
One-hot encodingUse as-is
54
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Create a neural network model, defining the number of feature columns
and hidden units
2
Create
npredictors
nhidden
noutputs
…
…
estimator = DNNRegressor(hidden_units=[5], feature_columns=[...])
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Train the model on the collected data
3
Train
estimator.fit(predictors, targets, steps=1000)
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
…
…
Predicted
value of
taxicab
demand
model
True value of
taxicab
demand
CostUpdate
model based
on Cost
npredictors
55
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Train the model on the collected data
4
Use
True value of
taxicab demand
rain
Max temp
…
…
Predicted value
of taxicab
demand
Update model
based on
Cost
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
model
Cost
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Train the model on the collected data
input = pd.DataFrame.from_dict(data =
{'dayofweek' : [4, 5, 6],
'mintemp' : [60, 15, 60],
'maxtemp' : [80, 80, 65],
'rain' : [0, 0.8, 0]})
# read trained model from /tmp/trained_model
estimator = DNNRegressor(model_dir='/tmp/trained_model',
hidden_units=[5])
pred = estimator.predict(input.values)
print pred
4
Use
56
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo 2 Part 2:
Carry out ML
with TensorFlow
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Demo 2, Part 2: Carry out ML with TensorFlow
Neural Network
Inputs Prediction
In this demo, we build a neural network to predict taxicab demand
on a day-by-day basis using TensorFlow.
57
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Machine learning with TensorFlow + Demo
Pre-built machine learning models + Demo
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
The accuracy of a ML problem is driven largely by the size and quality
of the dataset; this is why ML requires massive compute
Size of dataset
Scale of Compute Problem
Accuracy
https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
58
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CloudML Engine simplifies the use of Distributed TensorFlow
...
...
...
.
.
.
.
.
.Size of
dataset
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
ML APIs are pre-trained ML models (trained off Google’s data) for common
tasks; they are accessible through REST APIs
Use your own data to train models Machine Learning as an API
Cloud
Vision API
Cloud
Translation API
Cloud
Natural Language
API
Cloud
Speech API
Cloud Machine
Learning Engine
TensorFlow
Cloud Video
Intelligence
59
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Logo Detection
117
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Face detection
"detectionConfidence" : 0.93568963,
"joyLikelihood" : "VERY_LIKELY",
"panAngle" : 4.150538,
"sorrowLikelihood" : "VERY_UNLIKELY",
"tiltAngle" : -19.377356,
"underExposedLikelihood" : "VERY_UNLIKELY",
"blurredLikelihood" : "VERY_UNLIKELY"
"faceAnnotations" : [
{
"headwearLikelihood" : "VERY_UNLIKELY",
"surpriseLikelihood" : "VERY_UNLIKELY",
rollAngle" : -4.6490049,
"angerLikelihood" : "VERY_UNLIKELY",
"landmarks" : [
{
"type" : "LEFT_EYE",
"position" : {
"x" : 691.97974,
"y" : 373.11096,
"z" : 0.000037421443
}
},
...
],
"boundingPoly" : {
"vertices" : [
{
"x" : 743,
"y" : 449
},
...
60
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Web annotations
{
"entityId": "/m/016ms7",
"score": 1.44038,
"description": "Ford Anglia"
}
{
"entityId": "/m/0gff2yr",
"score": 5.92256,
"description": "ArtScience Museum"
}
{
"entityId": "/m/0h898pd",
"score": 7.4162,
"description": "Harry Potter (Literary Series)"
}
CC-BY 2.0 Rev Stan: https://www.flickr.com/photos/revstan/6865880240
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Try it in the browser with your own images
cloud.google.com/vision
61
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
The Translation API supports 100+ languages
https://cloud.google.com/translate/
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Wootric uses the Cloud Natural Language API (entity and sentiment) to
make sense of qualitative customer feedback
62
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Extracted entities are tied into a knowledge graph
Joanne "Jo" Rowling, pen names J. K. Rowling and Robert Galbraith,
is a British novelist, screenwriter and film producer best known as
the author of the Harry Potter fantasy series
{
"name": "Joanne 'Jo' Rowling",
"type": "PERSON",
"metadata": {
"mid": "/m/042xh",
"wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
}
{
"name": "British",
"type": "LOCATION",
"metadata": {
"mid": "/m/07ssc",
"wikipedia_url": "http://en.wikipedia.org/wiki/United_Kingdom"
}
{
"name": "Harry Potter",
"type": "PERSON",
"metadata": {
"mid": "/m/078ffw",
"wikipedia_url":
"http://en.wikipedia.org/wiki/Harry_Potter"
}
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
When you analyze sentiment, you get a score (positive/negative) as well
as a magnitude (how intense?)
{
"documentSentiment": {
"score": 0.8,
"magnitude": 0.8
}
}
The food was excellent, I would definitely go back!
63
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
The Cloud Speech API can be used to transcribe audio to text
http://cloud.google.com/speech
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Like the Vision API, the Video Intelligence API can identify labels in a
video, along with a timestamp
{
"description": "Bird's-eye view",
"language_code": "en-us",
"locations": {
"segment": {
"start_time_offset": 71905212,
"end_time_offset": 73740392
},
"confidence": 0.96653205
}
}
https://cloud.google.com/video-intelligence/
64
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo 2 Part 3:
Machine Learning APIs
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Use several of the Machine Learning
APIs (Vision, Translate, Natural
Language Processing, Speech) from
Python
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Demo 2, Part 3: Machine
Learning APIs
65
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
“How much is this car worth?”
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
“Hi Ocado,
I love your website. I have children so it’s
easier for me to do the shopping online.
Many thanks for saving my time!
Regards”
Feedback Customer is happy
Improves natural
language processing
of customer service
claims
“Thanks to the Google Cloud Platform, Ocado was able to use
the power of cloud computing and train our models in
parallel.”
66
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
50%
of enterprises will be
spending more per annum
on bots and chatbot
creation than traditional
mobile app development by
2021 – Gartner
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Use
Dialogflow to
create a new
shopping
experience
Custom image
model to
price cars
Build off NLP
API to route
customer
emails
Use Vision
API as-is to
find text in
memes
67
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Confidential & Proprietary
UPDATEDEPLOYEVALUATETUNE ML MODEL
PARAMETERS
ML MODEL DESIGN
DATA
PREPROCESSING
Introducing Cloud AutoML
A technology that can automatically create a Machine Learning Model
UPDATEDEPLOYEVALUATETUNE ML MODEL
PARAMETERS
ML MODEL
DESIGN
DATA
PREPROCESSING
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Handbag Shoe Hat
Cloud AutoML
Cloud AutoML Vision
Model is now trained and ready to make prediction.
This model can scale as needed to adapt to customer demands.
Upload and label
images
Train your model
in minutes or one day Evaluate
68
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo:
Module Review
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Match the use case on the left with the
product on the right
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Module review
Create, test new machine learning methods
No-ops, custom machine learning applications at scale
Automatically reject inappropriate image content
Build application to monitor Spanish twitter feed
Transcribe customer support calls
1. Vision API
2. TensorFlow
3. Speech API
4. Cloud ML
5. Translation API
69
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Match the use case on the left with the
product on the right
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Module review
Create, test new machine learning methods (2)
No-ops, custom machine learning applications at scale (4)
Automatically reject inappropriate image content (1)
Build application to monitor Spanish twitter feed (5)
Transcribe customer support calls (3)
1. Vision API
2. TensorFlow
3. Speech API
4. Cloud ML
5. Translation API
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Resources (1 of 2)
Cloud Spanner https://cloud.google.com/spanner/
Cloud Bigtable https://cloud.google.com/bigtable/
Google BigQuery https://cloud.google.com/bigquery/
Cloud Datalab https://cloud.google.com/datalab/
TensorFlow https://www.tensorflow.org/
70
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Resources (2 of 2)
Cloud Machine Learning https://cloud.google.com/ml/
Vision API https://cloud.google.com/vision/
Translation API https://cloud.google.com/translate/
Speech API https://cloud.google.com/speech/
Video Intelligence API
https://cloud.google.com/video-
intelligence
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Data Processing Architecture
Cloud OnBoard
Cloud OnBoard
71
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Message-oriented architectures
Serverless data pipelines
GCP Reference Architecture
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Asynchronous processing is useful for
long-lived tasks or to have loose
coupling between two systems
P1 P2 P3 Producers
C1 C2 C3 Consumers
Potential use cases:
1. Send an SMS
2. Train ML model
3. Process data from multiple sources
4. Weekly reports …
Message
Queue
72
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
For robust asynchronous processing, you need:
P1 P2 P3
C1 C2 C3
1. A global, highly available queue
2. Scale without over-provisioning
4. Reliable delivery of messages
3. Queue
must be
interoperable
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Pub/Sub provides a no-ops, serverless global message queue
73
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Message-oriented architectures
Serverless data pipelines
GCP Reference Architecture
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Dataflow offers NoOps data pipelines in Java and Python
p = beam.Pipeline(options=options)
p.run();
traffic = lines | beam.Map(parse_data).with_output_types(unicode)
| beam.GroupByKey() # (sensor, [speed])
output = traffic | beam.io.WriteToText(‘gs://...]’)
lines = p | beam.io.ReadFromText(‘gs://…’)
Each of these steps is run
in parallel and autoscaled
by execution framework
Open-source API (Apache
Beam) can be executed on
Flink, Spark, etc. also
Group
Transform 3
Transform 4
Write
Read
Transform 1
Input
Output
Transform 2 | beam.Map(get_speedbysensor) # (sensor, speed)
| beam.Map(avg_speed) # (sensor, avgspeed)
| beam.Map(lambda tup: '%s: %d' % tup))
Map
Group-By
Reduce
74
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Same code does real-time and batch
options = PipelineOptions(pipeline_args)
options.view_as(StandardOptions).streaming = True
p = beam.Pipeline(options=options)
lines = p | beam.io.ReadStringsFromPubSub(input_topic)
traffic = (lines
|
beam.Map(parse_data).with_output_types(unicode)
| beam.Map(get_speedbysensor) # (sensor,
speed)
| beam.WindowInto(window.FixedWindows(15, 0))
| beam.GroupByKey() # (sensor, [speed])
| beam.Map(avg_speed) # (sensor, avgspeed)
| beam.Map(lambda tup: '%s: %d' % tup))
traffic | beam.io.WriteStringsToPubSub(output_topic)
p.run()
Cloud
Pub/Sub
Cloud
Dataflow
BigQuery
Cloud
Storage
Cloud
Pub/Sub
Cloud
Storage
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Dataflow does ingest, transform, and load; consider using it
instead of Spark
75
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Message-oriented architectures
Serverless data pipelines
GCP Reference Architecture
Agenda
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Choosing where to store data on GCP
unstructured
Data analytics
workload
Transactional
workload
structured
SQL
Horizontal
scalability
No-SQL
Millisecond
Latency
Latency in
seconds
Cloud
Spanner
Cloud
SQL
Cloud
Storage
Cloud
Datastore
Cloud
Bigtable
BigQuery
One
database
enough
76
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Run Spark/Hadoop jobs on Cloud Dataproc
Cloud
Dataproc
Input and Output
Data Sources
Cloud
Storage
Cloud
Bigtable
BigQuery
Client
Direct
access
API
Applications on
cluster
Input and
output
connectors
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
On GCP, you can have the same data processing pipeline for
processing both batch and stream
Cloud
Storage
Raw logs,
files, assets,
Google
Analytics data,
and so on
Events,
metrics,
and so on Stream
Batch
Bigtable
B CA
Cloud ML
Engine
Applications
and Reports
Cloud
Datalab
Data Studio
Dashboards/BI
Co-workers
Cloud
Pub/Sub
Cloud
Dataflow
BigQuery
77
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Demo:
Module Review
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Match the use case on the left with the product on the right
Module review
A. Decoupling producers and consumers of data
in large organizations and complex systems
B. Scalable, fault-tolerant multi-step
processing of data
1. Cloud Dataflow
2. Cloud Pub/Sub
78
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Match the use case on the left with the product on the right
Module review
A. Decoupling producers and consumers of data
in large organizations and complex systems
B. Scalable, fault-tolerant multi-step
processing of data
1. Cloud Dataflow
2. Cloud Pub/Sub
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Resources (1 of 2)
Cloud Pub/Sub https://cloud.google.com/pubsub/
Cloud Dataflow https://cloud.google.com/dataflow/
Processing media using
Cloud Pub/Sub and
Compute Engine
https://cloud.google.com/solutions/me
dia-processing-pub-sub-compute-engine
79
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Resources (2 of 2)
Reverse Geocoding of
Geolocation Telemetry
in the Cloud Using the
Maps API
https://cloud.google.com/solutions/reverse-
geocoding-geolocation-telemetry-cloud-maps-
ap
Using Cloud Pub/Sub for
Long-running Tasks
https://cloud.google.com/solutions/us
ing-cloud-pub-sub-long-running-tasks
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
Cloud OnBoard
Cloud OnBoard
Summary
Version #1.1
80
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
An Evolving Cloud
Your kit, someone
else’s building.
Yours to manage.
1st Wave
Standard virtual
kit,for rent.
Still yours to manage.
2nd Wave
Invest your energy
in great apps
3rd Wave
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
Google Cloud provides a way to take advantage of Google’s
investments in infrastructure and data processing innovation
2002
Cloud
Storage
2004 2006 2008 2010 2012 2014 2016
DataProc Bigtable BigQuery
Cloud
Storage
DataFlow
DataStore
DataFlow
Pub/Sub
ML Engine
2018
Auto ML
Cloud
Spanner
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
81
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Typical Big Data Processing
Programming
Resource
provisioning
Performance
tuning
Monitoring
Reliability
Deployment &
configuration
Handling
growing scale
Utilization
improvements
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Big Data with Google: Focus on insight, not infrastructure.
Programming
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
In summary, GCP offers you ways to...
To get the most
out of data and
secure competitive
advantage.
Create citizen
data scientists
Transform your
organization into
a truly data driven
company. Putting
tools into hands of
domain experts.
Apply machine
learning broadly
and easily
We make it simple and
practical to
incorporate machine
learning models
within custom
applications.
We’ve “automated
out” the complexity
of building and
maintaining data
and analytics
systems.
Spend less on ops
and administration
Incorporate real-time
data into apps and
architectures
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Today
Google Cloud Platform
Fundamentals: Big Data
and Machine Learning
Next Steps on your Google Cloud learning journey
1 2 3
Tomorrow
Complete hands-on labs:
Baseline: Data, ML, AI quest
google.qwiklabs.com
Future
Find more training online
cloud.google.com/training
82
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Complete 10 hands-on labs free on Qwiklabs
by 30 April, and receive $200 in GCP credits
[Only for Cloud OnBoard Attendees]
1
2
3
Receive a follow up email after event
Username
Password
Create Qwiklabs account with the email
you used to register for Cloud OnBoard
Open your email and confirm account
4 Return to Qwiklabs and log in
5 Enroll in the Baseline: Data, ML, AI quest and
take your first lab!
6
Complete all 10 labs and we will send you an
email after 30 April with instructions to redeem
the $200 credits. Make sure you opt-in to receive
emails from Qwiklabs.
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
1
2
3
Go to
https://www.coursera.org/voucher/CloudOnBoardML
Activate voucher and sign
up for a free account
Enroll in Serverless Data
Analysis with Google BigQuery
and Cloud Dataflow for Free
-Limited period offer!
Explore other Courses at
Coursera.org/Googlecloud
To help you get started
Activate your voucher now for a free course worth $99!
83
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Make Google Cloud certification your goal!
cloud.google.com/certification
Find study guides, tips, practice
exams, and testing sites
Confidential & Proprietary
A special offer for Cloud Onboard Singapore attendees:
Visit https://goo.gl/bmJwwk before May 18th to enroll, and eligible
startups* receive $3,000 in Google Cloud and Firebase credits.
Google Cloud Startup Program
$3,000
g.co/cloudstartups
cloud.startups@google.com
Google Cloud is a perfect fit for launching
and scaling your early-stage startup. What’s an eligible
startup?
• Raised no more than a Series A
• Less than 5 years old
• Are located in our approved
countries
• Have not participated in
the Google Cloud Startup
program before
in credits
84
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Be part of the
GCP User Group SG Community!
or bit.ly/gcpusergroupsg
Learn from leads, users,
and tech expertsLearn
Network, share, learn -
all about Google CloudConnect
Gain access to the
Google Cloud team
and the latest
capabilities
Access
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud OnBoard
Resources
Big data and machine learning blog https://cloud.google.com/blog/big-data/
Google Cloud Platform blog https://cloudplatform.googleblog.com/
Google Cloud Platform curated articles https://medium.com/google-cloud
85
<MoreTraining>
https://cloud.google.com/training/
</MoreTraining>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

Mais conteúdo relacionado

Mais procurados

Google Cloud Connect Korea - Sep 2017
Google Cloud Connect Korea - Sep 2017Google Cloud Connect Korea - Sep 2017
Google Cloud Connect Korea - Sep 2017Google Cloud Korea
 
Google Cloud Technologies Overview
Google Cloud Technologies OverviewGoogle Cloud Technologies Overview
Google Cloud Technologies OverviewChris Schalk
 
Introduction to Google's Cloud Technologies
Introduction to Google's Cloud TechnologiesIntroduction to Google's Cloud Technologies
Introduction to Google's Cloud TechnologiesChris Schalk
 
MongoDB Days UK: Run MongoDB on Google Cloud Platform
MongoDB Days UK: Run MongoDB on Google Cloud PlatformMongoDB Days UK: Run MongoDB on Google Cloud Platform
MongoDB Days UK: Run MongoDB on Google Cloud PlatformMongoDB
 
Introduction to Google Compute Engine
Introduction to Google Compute EngineIntroduction to Google Compute Engine
Introduction to Google Compute EngineColin Su
 
Cloud-Native Roadshow Google Cloud Platform - Los Angeles
Cloud-Native Roadshow Google Cloud Platform - Los AngelesCloud-Native Roadshow Google Cloud Platform - Los Angeles
Cloud-Native Roadshow Google Cloud Platform - Los AngelesVMware Tanzu
 
Google Cloud Platform for Data Science teams
Google Cloud Platform for Data Science teamsGoogle Cloud Platform for Data Science teams
Google Cloud Platform for Data Science teamsBarton Rhodes
 
Google Cloud Platform Update
Google Cloud Platform UpdateGoogle Cloud Platform Update
Google Cloud Platform UpdateIdo Green
 
Google Cloud - Scale With A Smile (Dec 2014)
Google Cloud - Scale With A Smile (Dec 2014)Google Cloud - Scale With A Smile (Dec 2014)
Google Cloud - Scale With A Smile (Dec 2014)Ido Green
 
Introduction to Google Cloud Services / Platforms
Introduction to Google Cloud Services / PlatformsIntroduction to Google Cloud Services / Platforms
Introduction to Google Cloud Services / PlatformsNilanchal
 
Using Google Compute Engine
Using Google Compute EngineUsing Google Compute Engine
Using Google Compute EngineLynn Langit
 
Getting Started on Google Cloud Platform
Getting Started on Google Cloud PlatformGetting Started on Google Cloud Platform
Getting Started on Google Cloud PlatformAaron Taylor
 
Google Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your ProductGoogle Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your ProductSergey Smetanin
 
Google cloud platform introduction
Google cloud platform introductionGoogle cloud platform introduction
Google cloud platform introductionSimon Su
 
Google I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateGoogle I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateSimon Su
 
Cloud computing by Google Cloud Platform - Presentation
Cloud computing by Google Cloud Platform - PresentationCloud computing by Google Cloud Platform - Presentation
Cloud computing by Google Cloud Platform - PresentationTinarivosoaAbaniaina
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data UnlimitedAudrey Huvet
 
Google Compute Engine
Google Compute EngineGoogle Compute Engine
Google Compute EngineCsaba Toth
 
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Ido Green
 

Mais procurados (20)

Google Cloud Connect Korea - Sep 2017
Google Cloud Connect Korea - Sep 2017Google Cloud Connect Korea - Sep 2017
Google Cloud Connect Korea - Sep 2017
 
Google Cloud Technologies Overview
Google Cloud Technologies OverviewGoogle Cloud Technologies Overview
Google Cloud Technologies Overview
 
Introduction to Google's Cloud Technologies
Introduction to Google's Cloud TechnologiesIntroduction to Google's Cloud Technologies
Introduction to Google's Cloud Technologies
 
MongoDB Days UK: Run MongoDB on Google Cloud Platform
MongoDB Days UK: Run MongoDB on Google Cloud PlatformMongoDB Days UK: Run MongoDB on Google Cloud Platform
MongoDB Days UK: Run MongoDB on Google Cloud Platform
 
Introduction to Google Compute Engine
Introduction to Google Compute EngineIntroduction to Google Compute Engine
Introduction to Google Compute Engine
 
Cloud-Native Roadshow Google Cloud Platform - Los Angeles
Cloud-Native Roadshow Google Cloud Platform - Los AngelesCloud-Native Roadshow Google Cloud Platform - Los Angeles
Cloud-Native Roadshow Google Cloud Platform - Los Angeles
 
Google Cloud Platform for Data Science teams
Google Cloud Platform for Data Science teamsGoogle Cloud Platform for Data Science teams
Google Cloud Platform for Data Science teams
 
Google Cloud Platform Update
Google Cloud Platform UpdateGoogle Cloud Platform Update
Google Cloud Platform Update
 
Google Cloud - Scale With A Smile (Dec 2014)
Google Cloud - Scale With A Smile (Dec 2014)Google Cloud - Scale With A Smile (Dec 2014)
Google Cloud - Scale With A Smile (Dec 2014)
 
Introduction to Google Cloud Services / Platforms
Introduction to Google Cloud Services / PlatformsIntroduction to Google Cloud Services / Platforms
Introduction to Google Cloud Services / Platforms
 
Using Google Compute Engine
Using Google Compute EngineUsing Google Compute Engine
Using Google Compute Engine
 
Data Science on Google Cloud Platform
Data Science on Google Cloud PlatformData Science on Google Cloud Platform
Data Science on Google Cloud Platform
 
Getting Started on Google Cloud Platform
Getting Started on Google Cloud PlatformGetting Started on Google Cloud Platform
Getting Started on Google Cloud Platform
 
Google Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your ProductGoogle Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your Product
 
Google cloud platform introduction
Google cloud platform introductionGoogle cloud platform introduction
Google cloud platform introduction
 
Google I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateGoogle I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News Update
 
Cloud computing by Google Cloud Platform - Presentation
Cloud computing by Google Cloud Platform - PresentationCloud computing by Google Cloud Platform - Presentation
Cloud computing by Google Cloud Platform - Presentation
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited
 
Google Compute Engine
Google Compute EngineGoogle Compute Engine
Google Compute Engine
 
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
 

Semelhante a Getting started with Google Cloud Training Material - 2018

Eric Andersen Keynote
Eric Andersen KeynoteEric Andersen Keynote
Eric Andersen KeynoteData Con LA
 
Top Advantages of Using Google Cloud Platform
Top Advantages of Using Google Cloud PlatformTop Advantages of Using Google Cloud Platform
Top Advantages of Using Google Cloud PlatformKinsta WordPress Hosting
 
Google Cloud Networking Deep Dive
Google Cloud Networking Deep DiveGoogle Cloud Networking Deep Dive
Google Cloud Networking Deep DiveMichelle Holley
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute EngineArun Nagarajan
 
Google's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT ServicesGoogle's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT ServicesIntel® Software
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsData Driven Innovation
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalAvere Systems
 
Battling the disrupting Energy Markets utilizing PURE PLAY Cloud Computing
Battling the disrupting Energy Markets utilizing PURE PLAY Cloud ComputingBattling the disrupting Energy Markets utilizing PURE PLAY Cloud Computing
Battling the disrupting Energy Markets utilizing PURE PLAY Cloud ComputingEdwin Poot
 
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the EnterpriseMongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the EnterpriseMongoDB
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTJames Chittenden
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalSub Szabolcs Feczak
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Alluxio, Inc.
 
Google Cloud Platfrom
Google Cloud PlatfromGoogle Cloud Platfrom
Google Cloud PlatfromVirendra Bora
 
Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...
Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...
Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...Codemotion
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Dimension Data Saugatuk Webinar
Dimension Data Saugatuk WebinarDimension Data Saugatuk Webinar
Dimension Data Saugatuk WebinarKeao Caindec
 
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...
 "How overlay networks can make public clouds your global WAN" by Ryan Koop o... "How overlay networks can make public clouds your global WAN" by Ryan Koop o...
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...Cohesive Networks
 
Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)Karel Dumon
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...DataStax
 
Why You Need to Move Your Website to the Cloud
Why You Need to Move Your Website to the CloudWhy You Need to Move Your Website to the Cloud
Why You Need to Move Your Website to the CloudEktron
 

Semelhante a Getting started with Google Cloud Training Material - 2018 (20)

Eric Andersen Keynote
Eric Andersen KeynoteEric Andersen Keynote
Eric Andersen Keynote
 
Top Advantages of Using Google Cloud Platform
Top Advantages of Using Google Cloud PlatformTop Advantages of Using Google Cloud Platform
Top Advantages of Using Google Cloud Platform
 
Google Cloud Networking Deep Dive
Google Cloud Networking Deep DiveGoogle Cloud Networking Deep Dive
Google Cloud Networking Deep Dive
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute Engine
 
Google's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT ServicesGoogle's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT Services
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and Analytics
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute final
 
Battling the disrupting Energy Markets utilizing PURE PLAY Cloud Computing
Battling the disrupting Energy Markets utilizing PURE PLAY Cloud ComputingBattling the disrupting Energy Markets utilizing PURE PLAY Cloud Computing
Battling the disrupting Energy Markets utilizing PURE PLAY Cloud Computing
 
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the EnterpriseMongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
 
Google Cloud Platfrom
Google Cloud PlatfromGoogle Cloud Platfrom
Google Cloud Platfrom
 
Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...
Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...
Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ...
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Dimension Data Saugatuk Webinar
Dimension Data Saugatuk WebinarDimension Data Saugatuk Webinar
Dimension Data Saugatuk Webinar
 
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...
 "How overlay networks can make public clouds your global WAN" by Ryan Koop o... "How overlay networks can make public clouds your global WAN" by Ryan Koop o...
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...
 
Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
 
Why You Need to Move Your Website to the Cloud
Why You Need to Move Your Website to the CloudWhy You Need to Move Your Website to the Cloud
Why You Need to Move Your Website to the Cloud
 

Mais de JK Baseer

Future of Asia pacific cities report 2019 - Full of insights.
Future of Asia pacific cities report 2019 - Full of insights.Future of Asia pacific cities report 2019 - Full of insights.
Future of Asia pacific cities report 2019 - Full of insights.JK Baseer
 
Deliveroo Marketing Plan - Focused on Paid social media Strategy
Deliveroo Marketing Plan - Focused on Paid social media StrategyDeliveroo Marketing Plan - Focused on Paid social media Strategy
Deliveroo Marketing Plan - Focused on Paid social media StrategyJK Baseer
 
Tradly - Open source marketplace app
Tradly - Open source marketplace appTradly - Open source marketplace app
Tradly - Open source marketplace appJK Baseer
 
Why personalization important for fintech enterprises?
Why personalization important for fintech enterprises?  Why personalization important for fintech enterprises?
Why personalization important for fintech enterprises? JK Baseer
 
Giving week 2017 - CheatSheet of ideas for corporates
Giving week 2017 - CheatSheet of ideas for corporatesGiving week 2017 - CheatSheet of ideas for corporates
Giving week 2017 - CheatSheet of ideas for corporatesJK Baseer
 
Smart Nation Initiatives in singapore
Smart Nation Initiatives in singaporeSmart Nation Initiatives in singapore
Smart Nation Initiatives in singaporeJK Baseer
 
Be a S.U.R.E. Learner by NLB Singapore
Be a S.U.R.E. Learner by NLB SingaporeBe a S.U.R.E. Learner by NLB Singapore
Be a S.U.R.E. Learner by NLB SingaporeJK Baseer
 
Jk Baseer Visual resume
Jk Baseer Visual resumeJk Baseer Visual resume
Jk Baseer Visual resumeJK Baseer
 

Mais de JK Baseer (8)

Future of Asia pacific cities report 2019 - Full of insights.
Future of Asia pacific cities report 2019 - Full of insights.Future of Asia pacific cities report 2019 - Full of insights.
Future of Asia pacific cities report 2019 - Full of insights.
 
Deliveroo Marketing Plan - Focused on Paid social media Strategy
Deliveroo Marketing Plan - Focused on Paid social media StrategyDeliveroo Marketing Plan - Focused on Paid social media Strategy
Deliveroo Marketing Plan - Focused on Paid social media Strategy
 
Tradly - Open source marketplace app
Tradly - Open source marketplace appTradly - Open source marketplace app
Tradly - Open source marketplace app
 
Why personalization important for fintech enterprises?
Why personalization important for fintech enterprises?  Why personalization important for fintech enterprises?
Why personalization important for fintech enterprises?
 
Giving week 2017 - CheatSheet of ideas for corporates
Giving week 2017 - CheatSheet of ideas for corporatesGiving week 2017 - CheatSheet of ideas for corporates
Giving week 2017 - CheatSheet of ideas for corporates
 
Smart Nation Initiatives in singapore
Smart Nation Initiatives in singaporeSmart Nation Initiatives in singapore
Smart Nation Initiatives in singapore
 
Be a S.U.R.E. Learner by NLB Singapore
Be a S.U.R.E. Learner by NLB SingaporeBe a S.U.R.E. Learner by NLB Singapore
Be a S.U.R.E. Learner by NLB Singapore
 
Jk Baseer Visual resume
Jk Baseer Visual resumeJk Baseer Visual resume
Jk Baseer Visual resume
 

Último

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Último (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Getting started with Google Cloud Training Material - 2018

  • 1. Getting Started With Google Cloud <Cloud OnBoard> </Cloud OnBoard> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
  • 2. Cloud OnBoard <!Content> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 { } (’Module 1’) Introducing Google Cloud Platform Page 1 - 11 (’Module 2’) Compute & Storage Fundamentals Page 11 - 21 (’Module 3’) Data Analysis�on the Cloud Page 22 - 35 (’Module 4’) Scaling Data Analysis Page 35 - 49 (’Module 5’) Machine Learning Page 50 - 70 (’Module 6’) Data Processing Architecture Page 70 - 79 Summary | Continue learning with Google Cloud Page 79 - 85
  • 3. 1 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Cloud OnBoard Cloud OnBoard Introducing Google Cloud Platform: Big Data and Machine Learning Version #1.1 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning What is Google Cloud Platform Google Cloud Big Data products Agenda
  • 4. 2 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud computing is a continuation of a long-term shift in how computing resources are managed Now 2000 1980s Next First Wave Server on-premises You own everything. It is yours to manage. Second Wave Data centers You pay for the hardware but rent the space. Still yours to manage. First Generation Cloud Virtualized data centers You don’t rent hardware and space, but still control and configure virtual machines. Pay for what you provision. Third Wave Managed service Completely elastic storage, processing, and machine learning so that you can invest your energy in great apps. Pay for what you use. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning What is Google Cloud Platform Google Cloud Big Data products Agenda
  • 5. 3 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Google’s mission is to organize the world’s information and make it universally accessible and useful 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard To organize the world’s information,Google has been building the most powerful infrastructure on the planet
  • 6. 4 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Edge node locations >1000 Edge points of presence >100 Network Network sea cable investments In terms of hardware, Google Cloud has the largest cloud network, with over 100 points of presence, and 100,000s of miles of fiber optic cable. Unity (US, JP) 2010 Monet (US, BR) 2017 Tannat (BR, UY, AR) 2017 Junior (Rio, Santos) 2017 FASTER (US, JP, TW) 2016 PLCN (HK, LA) 2019 Indigo (SG, ID, AU) 2019 SJC (JP, HK, SG) 2013 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Future region and number of zones The network connects 15 regions, with 3 more coming Current region and number of zones S Carolina N Virginia Oregon Iowa Montreal Los Angeles 3 4 3 3 3 3 Frankfurt Belgium London FinlandNetherlands 3 33 2 3 São Paulo 3 Taiwan Singapore Mumbai Sydney Tokyo 3 3 2 3 3 3 HongKong 3 2 3
  • 7. 5 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 In terms of software, organizing the world’s information has meant that Google needed to invent data processing methods 2002 2004 2006 2008 2010 2012 2014 2016 GFS MapReduce TensorFlow Bigtable Dremel Colossus Flume Megastore Spanner Millwheel Pub/Sub F1 http://research.google.com/pubs/papers.html TPU 2018 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Google Cloud opens up that innovation and infrastructure to you 2002 2004 2006 2008 2010 2012 2014 2016 ML Engine Pub/Sub Dataflow Datastore Dataflow Cloud Storage BigQuery Bigtable Dataproc Cloud Storage 2018 Auto ML Cloud Spanner
  • 8. 6 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 A suite of products that can be put together for data processing Foundation Databases Data-handling frameworks Analytics and ML Cloud Storage Cloud Bigtable BigQuery Cloud Dataproc Compute Engine Cloud SQL Cloud Datalab Cloud Dataflow Cloud Pub/Sub Cloud Spanner ML APIs ... 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Spotify illustrates the typical journey of companies that come to Google Cloud: From lower costs to increased reliability to business transformation 2 3 1 Spend less No-ops, Pay for use, Secure Flexible Complete Innovative Powerful
  • 9. 7 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 A suite of products that can be put together for data processing Change how you computeChange where you compute Improve scalability and reliability 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Atomic Fiction lowered their costs with per-minute (now per-second) billing https://www.youtube.com/watch?v=mBY-RjE15WA Change where you compute
  • 10. 8 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 FIS was able to improve reliability and scalability on a massive data-processing challenge 6 BILLION MARKET EVENTS WRITTEN PER HOUR 1.7 GIGs PER SECOND PER HOUR 6 TBs 10 BN WRITTEN PER HOUR BURSTS 1.7 GIGABYTES PER SECOND 10 TERABYTES PER HOUR The Consolidated Audit Trail (CAT) is a data repository of all equities and options orders, quotes, and events; FIS processed the CAT to organize 100 billion market events into an “order lifecycle” in a 4-hour window using Cloud Bigtable. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Rooms to Go transformed its business with data and machine learning https://www.thinkwithgoogle.com/case-studies/rooms-to-go-improves-the-shopper-experience.html BigQuery Analyze CRM Customer Relationship Manager customer demographics, past purchases Rooms to Go Google Analytics Premium landing pages, views Combine data Collect data completely designed room packages
  • 11. 9 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 In summary, Google Cloud offers you ways to… Become a truly data-driven company Apply machine learning broadly and easily Spend less on ops and administration Incorporate real- time data into apps and architectures 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Module Review
  • 12. 10 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Google Cloud Platform is: (select all of the correct options) Cloud OnBoard Module review Operated by Google on the same infrastructure it uses A set of modular services from which you can compose cloud-based applications Most cost-effective if you pre- purchase instances on a yearly basis A platform on which to host scalable and fast distributed applications 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Google Cloud Platform is: (select all of the correct options) Cloud OnBoard Module review Operated by Google on the same infrastructure it uses A set of modular services from which you can compose cloud-based applications Most cost-effective if you pre- purchase instances on a yearly basis A platform on which to host scalable and fast distributed applications
  • 13. 11 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Resources Google Cloud Platform https://cloud.google.com/ Datacenters https://www.google.com/about/datacenters/ Google IT security https://cloud.google.com/files/Google- CommonSecurity-WhitePaper-v1.4.pdf Why Google Cloud Platform? https://cloud.google.com/why-google/ Pricing Philosophy https://cloud.google.com/pricing/philosophy/ 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Cloud OnBoard Cloud OnBoard Compute & Storage Fundamentals Version #1.1
  • 14. 12 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning CPUs on demand + Demo A global filesystem + Demo Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Google Cloud provides an earth-scale computer Data storage Compute power Networking
  • 15. 13 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Custom/changeable machine types, preemptible machines, and automatic discounts lead to simplicity and agility 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo: Create a Compute Engine instance
  • 16. 14 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 In this demo, we will : 1. Create a Compute Engine instance 2. SSH into the instance 3. Install the software package git (for source code version control) Cloud OnBoard Demo : Create a Compute Engine Instance 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning CPUs on demand + Demo A global filesystem + Demo Agenda
  • 17. 15 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Use Cloud Storage for persistent storage and as staging ground for import to other Google Cloud products 31 Compute Engine + Disk Cloud SQL BigQuery Dataproc Cloud Storage Store/StageTransform Raw data (any format) 4 Ingest/ Extract Load 2 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Create a bucket and copy the data over using the Cloud SDK; blobs are referenced through a gs://.../ URL sales*.csv gs://acme-sales/data/ Copy Google Cloud Platform Project Bucket Objects Data and metadata Bucket Objects Data and metadata gsutil cp
  • 18. 16 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Cloud Storage gives you durability, reliability, and global reach Cloud Storage Cloud SQL Compute Engine Transfer Services are useful for ingest Ingest Store Import Control access at project, bucket and/or object level Publish Use Cloud Storage as staging area 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Control latency and availability with zones and regions Choose the closest zone/region so as to to reduce latency. Region: North America Zone: us-central1-a ... Region: Europe Zone: europe-west1-b ... Region: ... Zone: ... ... Distribute your apps and data across regions for global availability. Distribute your apps and data across zones to reduce service disruptions.
  • 19. 17 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo: Interact with Cloud Storage 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 In this demo, we carry out the steps of an ingest- transform-and-publish data pipeline manually 1. Ingest data into a Compute Engine instance 2. Transform data on the Compute Engine instance 3. Store the transformed data on Cloud Storage 4. Publish Cloud Storage data to the web Cloud OnBoard Demo : Interact with Cloud Storage
  • 20. 18 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Ingest-Transform-Publish using core infrastructure Cloud Storage Cloud SQL Compute Engine Step 1 Ingest/ Extract Store Import Publish Step 2 Step 3 Step 4 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Cloud Shell gives you an easy command-line Cloud Shell comes pre-installed with the tools, libraries, and so on you need to interact with Google Cloud Platform Click Do Now
  • 21. 19 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Module Review 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Compute nodes on GCP are: (select the correct option) ❏ Allocated on demand, and you pay for the time that they are up. ❏ Expensive to create and teardown ❏ Pre-installed with all the software packages you might ever need. ❏ One of ~50 choices in terms of CPU and memory Cloud OnBoard Module review (1 of 2)
  • 22. 20 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Compute nodes on GCP are: (select the correct option) ➔ Allocated on demand, and you pay for the time that they are up. ❏ Expensive to create and teardown ❏ Pre-installed with all the software packages you might ever need. ❏ One of ~50 choices in terms of CPU and memory Cloud OnBoard Module review answers (1 of 2) 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Google Cloud Storage is a good option for storing data that: (select all of the correct options) ❏ Is ingested in real-time from sensors and other devices ❏ Will be frequently read/written from a compute node ❏ May be required to be read at some later time ❏ May be imported into a cluster for analysis Cloud OnBoard Module review (2 of 2)
  • 23. 21 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Google Cloud Storage is a good option for storing data that: (select all of the correct options) ❏ Is ingested in real-time from sensors and other devices ❏ Will be frequently read/written from a compute node ➔ May be required to be read at some later time ➔ May be imported into a cluster for analysis Cloud OnBoard Module review (2 of 2) 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Resources Google Cloud Platform https://cloud.google.com/compute/ Datacenters https://cloud.google.com/storage/ Pricing https://cloud.google.com/pricing/ Cloud Launcher https://cloud.google.com/launcher/ Pricing Philosophy https://cloud.google.com/pricing/philosophy/
  • 24. 22 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Cloud OnBoard Cloud OnBoard Data Analysis on the Cloud Version #1.1 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Stepping stones to transformation Your SQL database in the cloud + Demo Managed Hadoop in the cloud + Demo Agenda
  • 25. 23 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Google Cloud Platform began in 2008, with App Engine, a serverless way to run web applications 1 Develop 2 Upload 3 Autoscales Reliable App Engine Your code http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html http://googleappengine.blogspot.com/2013/05/the-google-app-engine-blog-is-moving.html 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Compute Engine Container Engine App Engine Flex App Engine There [was] something fundamentally wrong with what we were doing in 2008 … We didn't get the right stepping stones into the cloud … -- Eric Schmidt, Executive Chairman, Google
  • 26. 24 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 GCP now consists of a suite of products that together provide these stepping stones in a business’ transformative journey Change how you computeChange where you compute Flexibility, scalability and reliability Cost effective virtual machines, storage, Hadoop, and MySQL to migrate your current workloads to the public cloud. Reliable, autoscaling messaging, data processing, and storage. Fully managed products for data warehousing, data analysis, streaming, and machine learning. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Machine learning. This is the next transformation … the programming paradigm is changing. Instead of programming a computer, you teach a computer to learn something and it does what you want. Cloud OnBoard Eric Schmidt, Executive Chairman, Google
  • 27. 25 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard “If you want to teach a neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out.” WIRED’s headline 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Search People who bought ... Spam filtering Suggest next video Route planning Smart Reply Cloud OnBoard Machine Learning is not new, but it is now mainstream What’s common to all of these use cases of Machine Learning? ?
  • 28. 26 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 There are three components in a recommendation system RecommendingRating Training Users rate a few houses explicitly or implicitly A machine learning model is created to predict a user’s rating of a house For each user, the model is applied to every unrated house and the top 5 houses for that user are saved. What else is needed?? 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 The ML algorithm essentially clusters users and items How often do you need to compute the predicted ratings? Where would you save them? 2 3 Who is like this user? Is this a good house? Predict rating Is this house similar to houses that people similar to this user like? Predicted rating = user-preference * item-quality 1 ?
  • 29. 27 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 In addition to the ML algorithm, you also need sophisticated data management Data Collection Data Analysis Machine Learning Serving Scalable front end to collect customer actions Data that is accessible and not silo-ed (Re-)training and experimentation Scalable, real-time system to serve recommendations 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Stepping stones to transformation Your SQL database in the cloud + Demo Managed Hadoop in the cloud + Demo Agenda
  • 30. 28 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Choose your storage solution based on your access pattern Capacity Access metaphor Read Write Update granularity Usage Petabytes + Like files in a file system Have to copy to local disk One file An object (a “file”) Store blobs Gigabytes Relational database SELECT rows INSERT row Field No-ops SQL database on the cloud Terabytes Persistent Hashmap Filter objects on property put object Attribute Structured data from AppEngine apps Petabytes Key-value(s), HBase API scan rows put row Row No-ops, high throughput, scalable, flattened data Petabytes Relational SELECT rows Batch/stream Field Interactive SQL* querying fully managed warehouse Cloud Storage Cloud SQL Datastore Bigtable BigQuery 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud SQL is a fully managed database service Cloud SQL Google-managed MySQL or Postgres Flexible pricing Familiar Managed backups Automatic replication Fast connection from GCE & GAE Connect from anywhere Google Security
  • 31. 29 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo: Set up rentals data in Cloud SQL 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard In this demo, we populate rentals data in Cloud SQL for the recommendation engine to use: 1. Create Cloud SQL instance 2. Create database tables by importing .sql files from Cloud Storage 3. Populate the tables by importing .csv files from Cloud Storage 4. Allow access to Cloud SQL 5. Explore the rentals data using SQL statements from Cloud Shell Demo: Setup rentals data in Cloud SQL Cloud Storage Cloud SQL External machine Import
  • 32. 30 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Stepping stones to transformation Your SQL database in the cloud + Demo Managed Hadoop in the cloud + Demo Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard There is a rich open-source ecosystem for big data http://hadoop.apache.org/ http://pig.apache.org/ http://hive.apache.org/ http://spark.apache.org/
  • 33. 31 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Dataproc reduces the cost and complexity associated with Spark and Hadoop clusters Dataproc Google-managed: Hadoop Pig Hive Spark Image Versioning Familiar Resize in seconds Automated cluster mgmt Integrates with Google Cloud Flexible VMs Google Security 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo: Recommendations ML with Dataproc
  • 34. 32 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard In this demo, we implement machine learning recommendations using Cloud Dataproc: 1. Launch Dataproc 2. Train and apply ML model written in PySpark to create product recommendations 3. Explore inserted rows in Cloud SQL Dataproc Cloud SQL Train model Show recommendations 1 Demo: Recommendations ML with Cloud Dataproc 2 3 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Module Review
  • 35. 33 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Relational databases are a good choice when you need: (select all of the correct options) ❏ Streaming, high-throughput writes ❏ Fast queries on terabytes of data ❏ Aggregations on unstructured data ❏ Transactional updates on relatively small datasets Module review (1 of 2) 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Relational databases are a good choice when you need: (select all of the correct options) ❏ Streaming, high-throughput writes ❏ Fast queries on terabytes of data ❏ Aggregations on unstructured data ✓ Transactional updates on relatively small datasets Module review (1 of 2)
  • 36. 34 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform? (select all of the correct options) ❏ It’s the same API, but Google implements it better ❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on ❏ Fully-managed versions of the software offer no-ops ❏ Running it on Google infrastructure offers reliability and cost savings Module review (2 of 2) 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform? (select all of the correct options) ❏ It’s the same API, but Google implements it better ❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on ✓ Fully-managed versions of the software offer no-ops ✓ Running it on Google infrastructure offers reliability and cost savings Module review (2 of 2)
  • 37. 35 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud SQL Cloud Dataproc Cloud Solutions Resources https://cloud.google.com/sql/ https://cloud.google.com/dataproc/ https://cloud.google.com/solutions/ 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Cloud OnBoard Cloud OnBoard Scaling Data Analysis Version #1.1
  • 38. 36 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Fast random access Warehouse and interactively query petabytes Interactive, iterative development + Demo Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Choosing where to store data on GCP unstructured Data analytics workload Transactional workload structuredNeed MOBILE SDKs SQL Horizontal scalability No-SQL Millisecond Latency Latency in secondsCloud Spanner Cloud SQL Firebase Storage Cloud Storage Firebase Realtime DB Cloud Datastore Cloud Bigtable BigQuery Need MOBILE SDKs One database enough
  • 39. 37 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Use cloud spanner if you need globally consistent data or more than one Cloud SQL instance Source: https://quizlet.com/blog/ quizlet-cloud-spanner 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud Datastore Bigtable Cloud Storage Cloud SQL Cloud Spanner BigQuery Type NoSQL document NoSQL wide column Blobstore Relational SQL for OLTP Relational SQL for OLTP Relational SQL for OLAP Transactions Yes Single-row No Yes Yes No Complex queries No No No Yes Yes Yes Capacity Terabytes+ Petabytes+ Petabytes+ 500 GB Petabytes Petabytes+ Unit size 1 MB/entity ~10 MB/cell ~100 MB/row 5 TB/object Determined by DB engine 10,240 MiB/ row 10 MB/row Comparing storage options: technical details
  • 40. 38 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud Datastore Bigtable Cloud Storage Cloud SQL Cloud Spanner BigQuery Type NoSQL document NoSQL wide column Blobstore Relational SQL for OLTP Relational SQL for OLTP Relational SQL for OLAP Best for Getting started, App Engine applications “Flat” data, Heavy read/write, events, analytical data Structured and unstructured binary or object data Web frameworks, existing applications Large-scale database applications (> ~2 TB) Interactive querying, offline analytics Use cases Getting started, App Engine applications AdTech, Financial and IoT data Images, large media files, backups User credentials, customer orders Whenever high I/O, global consistency is needed Data warehousing Comparing storage options: use cases 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Bigtable is meant for high throughput data where access is primarily for a range of Row Key prefixes NASDAQ#1426535612045 ... MD:SYMBOL: ZXZZT ... Row Key Column data MD:LASTSALE: 600.58 ... MD:LASTSIZE: 300 ... MD:TRADETIME: 1426535612045 ... MD:EXCHANGE: NASDAQ ... Tables should be tall and narrow Store changes as new rows Bigtable will automatically compact the table
  • 41. 39 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Short meaningful column names reduce storage and RPC overhead NASDAQ#1426535612045 MD:SYMBOL: ZXZZT Row Key Column data MD:LASTSALE: 600.58 MD:LASTSIZE: 300 MD:TRADETIME: 1426535612045 MD:EXCHANGE: NASDAQ Design row key with most common query in mind Design row key to minimize hotspots Column families is a quick way to get some hierarchy Use short column names Designed for sparse tables 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Can work with Bigtable using the HBase API import org.apache.hadoop.hbase.*; import org.apache.hadoop.hbase.client.*; import org.apache.hadoop.hbase.util.*; byte[] CF = Bytes.toBytes("MD"); // column family Connection connection = ConnectionFactory.createConnection(...) Table table = null; try { table = connection.getTable(TABLE_NAME); Put p = new Put(Bytes.toBytes("NASDAQ#GOOG #1234561234561")); p.addColumn(CF, Bytes.toBytes("SYMBOL"), Bytes.toBytes("GOOG")); p.addColumn(CF, Bytes.toBytes("LASTSALE"), Bytes.toBytes(742.03d)); ... table.put(p); } finally { if (table != null) table.close(); }
  • 42. 40 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud Datastore Bigtable Cloud Storage Cloud SQL Cloud Spanner BigQuery Type NoSQL document NoSQL wide column Blobstore Relational SQL for OLTP Relational SQL for OLTP Relational SQL for OLAP Transactions Yes Single-row No Yes Yes No Complex queries No No No Yes Yes Yes Capacity Terabytes+ Petabytes+ Petabytes+ 500 GB Petabytes Petabytes+ Unit size 1 MB/entity ~10 MB/cell ~100 MB/row 5 TB/object Determined by DB engine 10,240 MiB/ row 10 MB/row Comparing storage options: technical details 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud Datastore Bigtable Cloud Storage Cloud SQL Cloud Spanner BigQuery Type NoSQL document NoSQL wide column Blobstore Relational SQL for OLTP Relational SQL for OLTP Relational SQL for OLAP Best for Getting started, App Engine applications “Flat” data, Heavy read/write, events, analytical data Structured and unstructured binary or object data Web frameworks, existing applications Large-scale database applications (> ~2 TB) Interactive querying, offline analytics Use cases Getting started, App Engine applications AdTech, Financial and IoT data Images, large media files, backups User credentials, customer orders Whenever high I/O, global consistency is needed Data warehousing Comparing storage options: use cases
  • 43. 41 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Fast random access Warehouse and interactively query petabytes Interactive, iterative development + Demo Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 BigQuery is a fully managed data warehouse that lets you do ad-hoc SQL queries on massive volumes of data Project X Dataset A Dataset B Project Y Dataset C Dataset D Table 1 Table 2 Table 1 Table 2 Table 1 Table 2 Table 1 Table 2 BigQuery Service
  • 44. 42 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 A demo of BigQuery on a 10 billion-row dataset shows what it is and what it can do #standardsql SELECT language, SUM(views) as views FROM `bigquery-samples.wikipedia_benchmark.Wiki10B` WHERE title like "%google%" GROUP by language ORDER by views DESC Familiar, SQL 2011 query language Interactive ad-hoc analysis of petabyte-scale databases No need to provision clusters 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Three ways of loading data into BigQuery POST Serverless ETL CSV JSON AVRO Google Sheets Files on disk or Cloud Storage Stream Data Federated data source
  • 45. 43 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 With Federated data sources, you can directly query files on Cloud Storage, without having to ingest them into BigQuery Also: Google Drive, Bigtable Also: JSON/Avro/Google Sheet Can also pass in a schema 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud Datastore Bigtable Cloud Storage Cloud SQL Cloud Spanner BigQuery Type NoSQL document NoSQL wide column Blobstore Relational SQL for OLTP Relational SQL for OLTP Relational SQL for OLAP Transactions Yes Single-row No Yes Yes No Complex queries No No No Yes Yes Yes Capacity Terabytes+ Petabytes+ Petabytes+ 500 GB Petabytes Petabytes+ Unit size 1 MB/entity ~10 MB/cell ~100 MB/row 5 TB/object Determined by DB engine 10,240 MiB/ row 10 MB/row Comparing storage options: technical details
  • 46. 44 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud Datastore Bigtable Cloud Storage Cloud SQL Cloud Spanner BigQuery Type NoSQL document NoSQL wide column Blobstore Relational SQL for OLTP Relational SQL for OLTP Relational SQL for OLAP Best for Getting started, App Engine applications “Flat” data, Heavy read/write, events, analytical data Structured and unstructured binary or object data Web frameworks, existing applications Large-scale database applications (> ~2 TB) Interactive querying, offline analytics Use cases Getting started, App Engine applications AdTech, Financial and IoT data Images, large media files, backups User credentials, customer orders Whenever high I/O, global consistency is needed Data warehousing Comparing storage options: use cases 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Fast random access Warehouse and interactively query petabytes Interactive, iterative development + Demo Agenda
  • 47. 45 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Increasingly, data analysis and machine learning are carried out in self-descriptive, shareable, executable notebooks Code Output Markup Share A typical notebook contains code, charts, and explanations Image Source: Git Logo from Wikipedia 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Datalab is an open-source notebook built on Jupyter (IPython) Analyze data in BigQuery, Compute Engine or Cloud Storage Use existing Python packages Datalab is free—just pay for Google Cloud resources
  • 48. 46 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Datalab notebooks are developed in an iterative, collaborative process Development Process in Cloud Datalab PHASE 1 Write code in Python PHASE 5 Share and collaborate PHASE 2 Run cell (Shift+Enter) PHASE 4 Write commentary in markdown PHASE 3 Examine Output 1 3 4 5 2 5 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Datalab supports BigQuery %%sql To Pandas
  • 49. 47 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo: Create ML dataset with BigQuery 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard In this demo, we use BigQuery to create a dataset that we later use to build a taxi demand forecast system using Machine Learning. ● What kinds of things affect taxi demand? ● What are some ways to measure “demand”? Demo: Create ML dataset with BigQuery
  • 50. 48 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Demo: Create ML dataset with BigQuery In this demo, we use BigQuery to create a dataset that we later use to build a taxi demand forecast system using Machine Learning. 1. Use BigQuery and Datalab to explore and visualize data 2. Build a Pandas dataframe that will be used as the training dataset for machine learning using TensorFlow 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Module Review
  • 51. 49 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Module review Match the use case on the left with the product on the right Global consistency needed High-throughput writes of wide-column data Warehousing structured data Develop Big Data algorithms interactively in Python 1. Datalab 2. BigTable 3. BigQuery 4. Spanner 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Module review Match the use case on the left with the product on the right Global consistency needed (4) High-throughput writes of wide-column data (2) Warehousing structured data (3) Develop Big Data algorithms interactively in Python (1) 1. Datalab 2. BigTable 3. BigQuery 4. Spanner
  • 52. 50 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Cloud OnBoard Cloud OnBoard Machine Learning Version #1.1 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Machine learning with TensorFlow + Demo Pre-built machine learning models + Demo Agenda
  • 53. 51 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 TensorFlow is an open source library that underlies many Google products 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Demo: Playing with neural networks to learn what they are http://playground.tensorflow.org/
  • 54. 52 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Supervised machine learning requires features and labels … … Neural Network … Input features Prediction Cost Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Machine Learning with TensorFlow involves four steps: 1 Gather Data Gather training data (input features and labels) Create model Train the model based on input data Use the model on new data 2 Create 3 Train 4 Use
  • 55. 53 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Gather training data and select input features Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons Input features targetdiscard 1 Gather Data 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 All input features need to be numeric Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons 1 Gather Data One-hot encodingUse as-is
  • 56. 54 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Create a neural network model, defining the number of feature columns and hidden units 2 Create npredictors nhidden noutputs … … estimator = DNNRegressor(hidden_units=[5], feature_columns=[...]) Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Train the model on the collected data 3 Train estimator.fit(predictors, targets, steps=1000) Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons … … Predicted value of taxicab demand model True value of taxicab demand CostUpdate model based on Cost npredictors
  • 57. 55 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Train the model on the collected data 4 Use True value of taxicab demand rain Max temp … … Predicted value of taxicab demand Update model based on Cost Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons model Cost 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Train the model on the collected data input = pd.DataFrame.from_dict(data = {'dayofweek' : [4, 5, 6], 'mintemp' : [60, 15, 60], 'maxtemp' : [80, 80, 65], 'rain' : [0, 0.8, 0]}) # read trained model from /tmp/trained_model estimator = DNNRegressor(model_dir='/tmp/trained_model', hidden_units=[5]) pred = estimator.predict(input.values) print pred 4 Use
  • 58. 56 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo 2 Part 2: Carry out ML with TensorFlow 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Demo 2, Part 2: Carry out ML with TensorFlow Neural Network Inputs Prediction In this demo, we build a neural network to predict taxicab demand on a day-by-day basis using TensorFlow.
  • 59. 57 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Machine learning with TensorFlow + Demo Pre-built machine learning models + Demo Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 The accuracy of a ML problem is driven largely by the size and quality of the dataset; this is why ML requires massive compute Size of dataset Scale of Compute Problem Accuracy https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
  • 60. 58 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 CloudML Engine simplifies the use of Distributed TensorFlow ... ... ... . . . . . .Size of dataset 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ML APIs are pre-trained ML models (trained off Google’s data) for common tasks; they are accessible through REST APIs Use your own data to train models Machine Learning as an API Cloud Vision API Cloud Translation API Cloud Natural Language API Cloud Speech API Cloud Machine Learning Engine TensorFlow Cloud Video Intelligence
  • 61. 59 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Logo Detection 117 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Face detection "detectionConfidence" : 0.93568963, "joyLikelihood" : "VERY_LIKELY", "panAngle" : 4.150538, "sorrowLikelihood" : "VERY_UNLIKELY", "tiltAngle" : -19.377356, "underExposedLikelihood" : "VERY_UNLIKELY", "blurredLikelihood" : "VERY_UNLIKELY" "faceAnnotations" : [ { "headwearLikelihood" : "VERY_UNLIKELY", "surpriseLikelihood" : "VERY_UNLIKELY", rollAngle" : -4.6490049, "angerLikelihood" : "VERY_UNLIKELY", "landmarks" : [ { "type" : "LEFT_EYE", "position" : { "x" : 691.97974, "y" : 373.11096, "z" : 0.000037421443 } }, ... ], "boundingPoly" : { "vertices" : [ { "x" : 743, "y" : 449 }, ...
  • 62. 60 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Web annotations { "entityId": "/m/016ms7", "score": 1.44038, "description": "Ford Anglia" } { "entityId": "/m/0gff2yr", "score": 5.92256, "description": "ArtScience Museum" } { "entityId": "/m/0h898pd", "score": 7.4162, "description": "Harry Potter (Literary Series)" } CC-BY 2.0 Rev Stan: https://www.flickr.com/photos/revstan/6865880240 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Try it in the browser with your own images cloud.google.com/vision
  • 63. 61 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 The Translation API supports 100+ languages https://cloud.google.com/translate/ 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Wootric uses the Cloud Natural Language API (entity and sentiment) to make sense of qualitative customer feedback
  • 64. 62 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Extracted entities are tied into a knowledge graph Joanne "Jo" Rowling, pen names J. K. Rowling and Robert Galbraith, is a British novelist, screenwriter and film producer best known as the author of the Harry Potter fantasy series { "name": "Joanne 'Jo' Rowling", "type": "PERSON", "metadata": { "mid": "/m/042xh", "wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling" } { "name": "British", "type": "LOCATION", "metadata": { "mid": "/m/07ssc", "wikipedia_url": "http://en.wikipedia.org/wiki/United_Kingdom" } { "name": "Harry Potter", "type": "PERSON", "metadata": { "mid": "/m/078ffw", "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter" } 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 When you analyze sentiment, you get a score (positive/negative) as well as a magnitude (how intense?) { "documentSentiment": { "score": 0.8, "magnitude": 0.8 } } The food was excellent, I would definitely go back!
  • 65. 63 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 The Cloud Speech API can be used to transcribe audio to text http://cloud.google.com/speech 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Like the Vision API, the Video Intelligence API can identify labels in a video, along with a timestamp { "description": "Bird's-eye view", "language_code": "en-us", "locations": { "segment": { "start_time_offset": 71905212, "end_time_offset": 73740392 }, "confidence": 0.96653205 } } https://cloud.google.com/video-intelligence/
  • 66. 64 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo 2 Part 3: Machine Learning APIs 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Use several of the Machine Learning APIs (Vision, Translate, Natural Language Processing, Speech) from Python Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Demo 2, Part 3: Machine Learning APIs
  • 67. 65 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning “How much is this car worth?” 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning “Hi Ocado, I love your website. I have children so it’s easier for me to do the shopping online. Many thanks for saving my time! Regards” Feedback Customer is happy Improves natural language processing of customer service claims “Thanks to the Google Cloud Platform, Ocado was able to use the power of cloud computing and train our models in parallel.”
  • 68. 66 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 50% of enterprises will be spending more per annum on bots and chatbot creation than traditional mobile app development by 2021 – Gartner 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Use Dialogflow to create a new shopping experience Custom image model to price cars Build off NLP API to route customer emails Use Vision API as-is to find text in memes
  • 69. 67 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Confidential & Proprietary UPDATEDEPLOYEVALUATETUNE ML MODEL PARAMETERS ML MODEL DESIGN DATA PREPROCESSING Introducing Cloud AutoML A technology that can automatically create a Machine Learning Model UPDATEDEPLOYEVALUATETUNE ML MODEL PARAMETERS ML MODEL DESIGN DATA PREPROCESSING 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Handbag Shoe Hat Cloud AutoML Cloud AutoML Vision Model is now trained and ready to make prediction. This model can scale as needed to adapt to customer demands. Upload and label images Train your model in minutes or one day Evaluate
  • 70. 68 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo: Module Review 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Match the use case on the left with the product on the right Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Module review Create, test new machine learning methods No-ops, custom machine learning applications at scale Automatically reject inappropriate image content Build application to monitor Spanish twitter feed Transcribe customer support calls 1. Vision API 2. TensorFlow 3. Speech API 4. Cloud ML 5. Translation API
  • 71. 69 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Match the use case on the left with the product on the right Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Module review Create, test new machine learning methods (2) No-ops, custom machine learning applications at scale (4) Automatically reject inappropriate image content (1) Build application to monitor Spanish twitter feed (5) Transcribe customer support calls (3) 1. Vision API 2. TensorFlow 3. Speech API 4. Cloud ML 5. Translation API 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Resources (1 of 2) Cloud Spanner https://cloud.google.com/spanner/ Cloud Bigtable https://cloud.google.com/bigtable/ Google BigQuery https://cloud.google.com/bigquery/ Cloud Datalab https://cloud.google.com/datalab/ TensorFlow https://www.tensorflow.org/
  • 72. 70 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Resources (2 of 2) Cloud Machine Learning https://cloud.google.com/ml/ Vision API https://cloud.google.com/vision/ Translation API https://cloud.google.com/translate/ Speech API https://cloud.google.com/speech/ Video Intelligence API https://cloud.google.com/video- intelligence 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Data Processing Architecture Cloud OnBoard Cloud OnBoard
  • 73. 71 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Message-oriented architectures Serverless data pipelines GCP Reference Architecture Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Asynchronous processing is useful for long-lived tasks or to have loose coupling between two systems P1 P2 P3 Producers C1 C2 C3 Consumers Potential use cases: 1. Send an SMS 2. Train ML model 3. Process data from multiple sources 4. Weekly reports … Message Queue
  • 74. 72 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 For robust asynchronous processing, you need: P1 P2 P3 C1 C2 C3 1. A global, highly available queue 2. Scale without over-provisioning 4. Reliable delivery of messages 3. Queue must be interoperable 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Pub/Sub provides a no-ops, serverless global message queue
  • 75. 73 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Message-oriented architectures Serverless data pipelines GCP Reference Architecture Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Dataflow offers NoOps data pipelines in Java and Python p = beam.Pipeline(options=options) p.run(); traffic = lines | beam.Map(parse_data).with_output_types(unicode) | beam.GroupByKey() # (sensor, [speed]) output = traffic | beam.io.WriteToText(‘gs://...]’) lines = p | beam.io.ReadFromText(‘gs://…’) Each of these steps is run in parallel and autoscaled by execution framework Open-source API (Apache Beam) can be executed on Flink, Spark, etc. also Group Transform 3 Transform 4 Write Read Transform 1 Input Output Transform 2 | beam.Map(get_speedbysensor) # (sensor, speed) | beam.Map(avg_speed) # (sensor, avgspeed) | beam.Map(lambda tup: '%s: %d' % tup)) Map Group-By Reduce
  • 76. 74 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Same code does real-time and batch options = PipelineOptions(pipeline_args) options.view_as(StandardOptions).streaming = True p = beam.Pipeline(options=options) lines = p | beam.io.ReadStringsFromPubSub(input_topic) traffic = (lines | beam.Map(parse_data).with_output_types(unicode) | beam.Map(get_speedbysensor) # (sensor, speed) | beam.WindowInto(window.FixedWindows(15, 0)) | beam.GroupByKey() # (sensor, [speed]) | beam.Map(avg_speed) # (sensor, avgspeed) | beam.Map(lambda tup: '%s: %d' % tup)) traffic | beam.io.WriteStringsToPubSub(output_topic) p.run() Cloud Pub/Sub Cloud Dataflow BigQuery Cloud Storage Cloud Pub/Sub Cloud Storage 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Dataflow does ingest, transform, and load; consider using it instead of Spark
  • 77. 75 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Message-oriented architectures Serverless data pipelines GCP Reference Architecture Agenda 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Choosing where to store data on GCP unstructured Data analytics workload Transactional workload structured SQL Horizontal scalability No-SQL Millisecond Latency Latency in seconds Cloud Spanner Cloud SQL Cloud Storage Cloud Datastore Cloud Bigtable BigQuery One database enough
  • 78. 76 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Run Spark/Hadoop jobs on Cloud Dataproc Cloud Dataproc Input and Output Data Sources Cloud Storage Cloud Bigtable BigQuery Client Direct access API Applications on cluster Input and output connectors 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 On GCP, you can have the same data processing pipeline for processing both batch and stream Cloud Storage Raw logs, files, assets, Google Analytics data, and so on Events, metrics, and so on Stream Batch Bigtable B CA Cloud ML Engine Applications and Reports Cloud Datalab Data Studio Dashboards/BI Co-workers Cloud Pub/Sub Cloud Dataflow BigQuery
  • 79. 77 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Demo: Module Review 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Match the use case on the left with the product on the right Module review A. Decoupling producers and consumers of data in large organizations and complex systems B. Scalable, fault-tolerant multi-step processing of data 1. Cloud Dataflow 2. Cloud Pub/Sub
  • 80. 78 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Match the use case on the left with the product on the right Module review A. Decoupling producers and consumers of data in large organizations and complex systems B. Scalable, fault-tolerant multi-step processing of data 1. Cloud Dataflow 2. Cloud Pub/Sub 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Resources (1 of 2) Cloud Pub/Sub https://cloud.google.com/pubsub/ Cloud Dataflow https://cloud.google.com/dataflow/ Processing media using Cloud Pub/Sub and Compute Engine https://cloud.google.com/solutions/me dia-processing-pub-sub-compute-engine
  • 81. 79 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Resources (2 of 2) Reverse Geocoding of Geolocation Telemetry in the Cloud Using the Maps API https://cloud.google.com/solutions/reverse- geocoding-geolocation-telemetry-cloud-maps- ap Using Cloud Pub/Sub for Long-running Tasks https://cloud.google.com/solutions/us ing-cloud-pub-sub-long-running-tasks 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 Cloud OnBoard Cloud OnBoard Summary Version #1.1
  • 82. 80 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard An Evolving Cloud Your kit, someone else’s building. Yours to manage. 1st Wave Standard virtual kit,for rent. Still yours to manage. 2nd Wave Invest your energy in great apps 3rd Wave 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Cloud OnBoard Google Cloud provides a way to take advantage of Google’s investments in infrastructure and data processing innovation 2002 Cloud Storage 2004 2006 2008 2010 2012 2014 2016 DataProc Bigtable BigQuery Cloud Storage DataFlow DataStore DataFlow Pub/Sub ML Engine 2018 Auto ML Cloud Spanner 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18
  • 83. 81 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Typical Big Data Processing Programming Resource provisioning Performance tuning Monitoring Reliability Deployment & configuration Handling growing scale Utilization improvements 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Big Data with Google: Focus on insight, not infrastructure. Programming
  • 84. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard In summary, GCP offers you ways to... To get the most out of data and secure competitive advantage. Create citizen data scientists Transform your organization into a truly data driven company. Putting tools into hands of domain experts. Apply machine learning broadly and easily We make it simple and practical to incorporate machine learning models within custom applications. We’ve “automated out” the complexity of building and maintaining data and analytics systems. Spend less on ops and administration Incorporate real-time data into apps and architectures 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Today Google Cloud Platform Fundamentals: Big Data and Machine Learning Next Steps on your Google Cloud learning journey 1 2 3 Tomorrow Complete hands-on labs: Baseline: Data, ML, AI quest google.qwiklabs.com Future Find more training online cloud.google.com/training 82
  • 85. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Complete 10 hands-on labs free on Qwiklabs by 30 April, and receive $200 in GCP credits [Only for Cloud OnBoard Attendees] 1 2 3 Receive a follow up email after event Username Password Create Qwiklabs account with the email you used to register for Cloud OnBoard Open your email and confirm account 4 Return to Qwiklabs and log in 5 Enroll in the Baseline: Data, ML, AI quest and take your first lab! 6 Complete all 10 labs and we will send you an email after 30 April with instructions to redeem the $200 credits. Make sure you opt-in to receive emails from Qwiklabs. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard 1 2 3 Go to https://www.coursera.org/voucher/CloudOnBoardML Activate voucher and sign up for a free account Enroll in Serverless Data Analysis with Google BigQuery and Cloud Dataflow for Free -Limited period offer! Explore other Courses at Coursera.org/Googlecloud To help you get started Activate your voucher now for a free course worth $99! 83
  • 86. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Make Google Cloud certification your goal! cloud.google.com/certification Find study guides, tips, practice exams, and testing sites Confidential & Proprietary A special offer for Cloud Onboard Singapore attendees: Visit https://goo.gl/bmJwwk before May 18th to enroll, and eligible startups* receive $3,000 in Google Cloud and Firebase credits. Google Cloud Startup Program $3,000 g.co/cloudstartups cloud.startups@google.com Google Cloud is a perfect fit for launching and scaling your early-stage startup. What’s an eligible startup? • Raised no more than a Series A • Less than 5 years old • Are located in our approved countries • Have not participated in the Google Cloud Startup program before in credits 84
  • 87. 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning Be part of the GCP User Group SG Community! or bit.ly/gcpusergroupsg Learn from leads, users, and tech expertsLearn Network, share, learn - all about Google CloudConnect Gain access to the Google Cloud team and the latest capabilities Access 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 Big Data & Machine Learning 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Cloud OnBoard Resources Big data and machine learning blog https://cloud.google.com/blog/big-data/ Google Cloud Platform blog https://cloudplatform.googleblog.com/ Google Cloud Platform curated articles https://medium.com/google-cloud 85