Explore and learn!
Note: This deck is shared to help people learn about Google Cloud solutions. Neither I nor the company I am associated with has any other intent.
Big Data & Machine Learning
Cloud OnBoard
Introducing Google Cloud Platform: Big Data and Machine Learning
Version #1.1
Agenda
What is Google Cloud Platform
Google Cloud Big Data products
Cloud computing is a continuation of a long-term shift in how computing resources are managed
Timeline: 1980s, 2000, Now, Next
First Wave: Server on-premises. You own everything. It is yours to manage.
Second Wave: Data centers. You pay for the hardware but rent the space. Still yours to manage.
First Generation Cloud: Virtualized data centers. You don’t rent hardware and space, but still control and configure virtual machines. Pay for what you provision.
Third Wave: Managed service. Completely elastic storage, processing, and machine learning so that you can invest your energy in great apps. Pay for what you use.
Google’s mission is to organize
the world’s information and make
it universally accessible and
useful
To organize the world’s information, Google has been building the most powerful infrastructure on the planet
Network
In terms of hardware, Google Cloud has the largest cloud network, with over 100 points of presence and hundreds of thousands of miles of fiber optic cable.
Edge points of presence: >100
Edge node locations: >1000
Network sea cable investments:
Unity (US, JP) 2010
Monet (US, BR) 2017
Tannat (BR, UY, AR) 2017
Junior (Rio, Santos) 2017
FASTER (US, JP, TW) 2016
PLCN (HK, LA) 2019
Indigo (SG, ID, AU) 2019
SJC (JP, HK, SG) 2013
The network connects 15 regions, with 3 more coming
[Map: current and future regions, each marked with its number of zones]
Regions shown: S Carolina, N Virginia, Oregon, Iowa, Montreal, Los Angeles, Frankfurt, Belgium, London, Finland, Netherlands, São Paulo, Taiwan, Singapore, Mumbai, Sydney, Tokyo, Hong Kong
In terms of software, organizing the world’s information has meant that Google needed to invent data processing methods
[Timeline, 2002 to 2018: GFS, MapReduce, Bigtable, Dremel, Colossus, Flume, Megastore, Spanner, Millwheel, Pub/Sub, F1, TensorFlow, TPU]
http://research.google.com/pubs/papers.html
Google Cloud opens up that innovation and infrastructure to you
[Timeline, 2002 to 2018: Cloud Storage, Datastore, Bigtable, BigQuery, Dataflow, Dataproc, Pub/Sub, ML Engine, Cloud Spanner, Auto ML]
A suite of products that can be put together for data processing
Foundation: Compute Engine, Cloud Storage
Databases: Cloud SQL, Cloud Spanner, Cloud Bigtable
Analytics and ML: BigQuery, Cloud Datalab, ML APIs
Data-handling frameworks: Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc
...
Spotify illustrates the typical journey of companies that come to Google Cloud: from lower costs to increased reliability to business transformation
[Diagram: a three-step journey with the labels: Spend less; No-ops, pay for use, secure; Flexible; Complete; Innovative; Powerful]
A suite of products that can be put together for data processing
Change where you compute
Improve scalability and reliability
Change how you compute
Atomic Fiction lowered their costs with per-minute
(now per-second) billing
https://www.youtube.com/watch?v=mBY-RjE15WA
Change where you compute
FIS was able to improve reliability and scalability on a massive data-processing challenge
[Infographic figures: 6 billion market events written per hour; 1.7 GB per second; 6 TB per hour; 10 billion written per hour; bursts: 1.7 gigabytes per second; 10 terabytes per hour]
The Consolidated Audit Trail (CAT) is a data repository of all equities and options orders, quotes, and events; FIS processed the CAT to organize 100 billion market events into an “order lifecycle” in a 4-hour window using Cloud Bigtable.
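The throughput figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB); the function names are illustrative and do not come from the deck.

```python
SECONDS_PER_HOUR = 3600


def gb_per_second_to_tb_per_hour(gb_per_second: float) -> float:
    """Convert a sustained rate in GB/s to TB/hour (decimal units)."""
    return gb_per_second * SECONDS_PER_HOUR / 1000


def events_per_second(events_in_window: float, window_hours: float) -> float:
    """Average event rate needed to finish a batch within a time window."""
    return events_in_window / (window_hours * SECONDS_PER_HOUR)


tb_per_hour = gb_per_second_to_tb_per_hour(1.7)
rate = events_per_second(100e9, 4)
print(f"1.7 GB/s is about {tb_per_hour:.1f} TB/hour")
print(f"100 billion events in 4 hours is about {rate / 1e6:.1f} million events/second")
```

A sustained 1.7 GB per second works out to roughly 6.1 TB per hour, which is in line with the 6 TB per hour figure, and organizing 100 billion events in a 4-hour window implies an average of about 6.9 million events per second.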
Rooms to Go transformed its business with data and machine learning
https://www.thinkwithgoogle.com/case-studies/rooms-to-go-improves-the-shopper-experience.html
Collect data: CRM (Customer Relationship Manager): customer demographics, past purchases; Google Analytics Premium: landing pages, views
Combine data
Analyze: BigQuery
Rooms to Go: completely designed room packages
In summary, Google Cloud offers you ways to…
Spend less on ops and administration
Incorporate real-time data into apps and architectures
Apply machine learning broadly and easily
Become a truly data-driven company
Module review
Google Cloud Platform is:
(select all of the correct options)
❏ Operated by Google on the same infrastructure it uses
❏ A set of modular services from which you can compose cloud-based applications
❏ Most cost-effective if you pre-purchase instances on a yearly basis
❏ A platform on which to host scalable and fast distributed applications
Resources
Google Cloud Platform: https://cloud.google.com/
Datacenters: https://www.google.com/about/datacenters/
Google IT security: https://cloud.google.com/files/Google-CommonSecurity-WhitePaper-v1.4.pdf
Why Google Cloud Platform?: https://cloud.google.com/why-google/
Pricing Philosophy: https://cloud.google.com/pricing/philosophy/
Cloud OnBoard
Compute & Storage Fundamentals
Version #1.1
Agenda
CPUs on demand + Demo
A global filesystem + Demo
Google Cloud provides an earth-scale computer
Data storage
Compute power
Networking
Demo: Create a Compute Engine Instance
In this demo, we will:
1. Create a Compute Engine instance
2. SSH into the instance
3. Install the software package git (for source code version control)
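The three demo steps can also be driven from the command line. The sketch below only builds and prints the `gcloud` commands rather than running them; the instance name `demo-vm`, the zone, and the use of `apt-get` (which assumes a Debian or Ubuntu image) are assumptions for illustration, not details from the deck.

```python
import shlex


def demo_commands(instance: str, zone: str) -> list[list[str]]:
    """Build, but do not run, gcloud commands for the three demo steps."""
    # Step 1: create the Compute Engine instance.
    create = ["gcloud", "compute", "instances", "create", instance, f"--zone={zone}"]
    # Steps 2 and 3 combined: SSH in and run the install command remotely.
    # apt-get assumes a Debian or Ubuntu image (an assumption, not from the deck).
    install = ["gcloud", "compute", "ssh", instance, f"--zone={zone}",
               "--command=sudo apt-get install -y git"]
    return [create, install]


commands = demo_commands("demo-vm", "us-central1-a")
for command in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; in the demo itself the same steps are carried out interactively.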
Use Cloud Storage for persistent storage and as staging ground for import to other Google Cloud products
[Diagram: raw data (any format) passes through four numbered steps, Ingest/Extract, Store/Stage, Transform, and Load, connecting Cloud Storage with Compute Engine + Disk, Cloud SQL, BigQuery, and Dataproc]
Create a bucket and copy the data over using the Cloud SDK; blobs are referenced through a gs://.../ URL
gsutil cp sales*.csv gs://acme-sales/data/
[Diagram: a Google Cloud Platform Project contains buckets; each bucket holds objects, which consist of data and metadata]
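To make the `gs://` naming concrete, here is a small sketch that splits a URL into bucket and object name and shows which destination objects a wildcard copy like the one above would produce. It does not call Cloud Storage; the local file names are hypothetical, and the destination is assumed to end with `/` so that it acts as a folder-like prefix.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def split_gs_url(url: str) -> tuple[str, str]:
    """Split gs://bucket/path into (bucket, object_name_or_prefix)."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def plan_copy(local_files: list[str], pattern: str, dest_url: str) -> dict[str, str]:
    """Map each local file matching the wildcard to its destination URL.

    Assumes dest_url ends with "/" (a prefix), as in the slide's example.
    """
    bucket, prefix = split_gs_url(dest_url)
    return {
        name: f"gs://{bucket}/{prefix}{name}"
        for name in sorted(local_files)
        if fnmatch(name, pattern)
    }


files = ["sales2016.csv", "sales2017.csv", "notes.txt"]  # hypothetical local files
plan = plan_copy(files, "sales*.csv", "gs://acme-sales/data/")
for src, dst in plan.items():
    print(src, "->", dst)
```

The bucket name is the part right after `gs://`, and everything after the next slash is the object name, so "folders" in a bucket are really just shared name prefixes.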
Cloud Storage gives you durability, reliability, and global reach
[Diagram: Ingest, Store, Import, and Publish flows connecting Cloud Storage, Cloud SQL, and Compute Engine]
Transfer Services are useful for ingest
Control access at project, bucket and/or object level
Use Cloud Storage as staging area
Control latency and availability with zones and regions
Choose the closest zone/region so as to reduce latency.
Distribute your apps and data across zones to reduce service disruptions.
Distribute your apps and data across regions for global availability.
Region: North America
Zone: us-central1-a
...
Region: Europe
Zone: europe-west1-b
...
Region: ...
Zone: ...
...
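Zone names such as `us-central1-a` consist of a region name plus a zone letter, which makes it easy to reason about placement in code. The sketch below is a minimal illustration of grouping zones by region and spreading replicas across zones; the zone `us-central1-b` and the helper names are added here for illustration only.

```python
from collections import defaultdict


def region_of(zone: str) -> str:
    """Return the region part of a zone name, e.g. us-central1-a -> us-central1."""
    region, _, suffix = zone.rpartition("-")
    if not region or not suffix:
        raise ValueError(f"unexpected zone name: {zone!r}")
    return region


def group_by_region(zones: list[str]) -> dict[str, list[str]]:
    """Group zone names by the region they belong to."""
    grouped = defaultdict(list)
    for zone in zones:
        grouped[region_of(zone)].append(zone)
    return dict(grouped)


def spread_across_zones(replicas: int, zones: list[str]) -> list[str]:
    """Assign replicas to zones round-robin so one zone outage leaves others running."""
    return [zones[i % len(zones)] for i in range(replicas)]


zones = ["us-central1-a", "us-central1-b", "europe-west1-b"]
print(group_by_region(zones))
placement = spread_across_zones(4, ["us-central1-a", "us-central1-b"])
print(placement)
```

Spreading replicas over zones within one region addresses service disruptions; repeating the same idea across regions addresses global availability and latency to distant users.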
Demo: Interact with Cloud Storage
In this demo, we carry out the steps of an ingest-transform-and-publish data pipeline manually:
1. Ingest data into a Compute Engine instance
2. Transform data on the Compute Engine instance
3. Store the transformed data on Cloud Storage
4. Publish Cloud Storage data to the web
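The first three steps of the pipeline can be sketched locally with the standard library. This is only an in-memory stand-in: the sample rows, column names, and filter threshold are made up, and the final copy to Cloud Storage and publishing to the web (step 4) are not modeled.

```python
import csv
import io

# Hypothetical raw data; in the demo this would be downloaded onto the VM.
RAW_CSV = """order_id,amount,region
1,19.99,US
2,,EU
3,250.00,EU
"""


def ingest(text: str) -> list[dict]:
    """Step 1: read raw CSV rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict], min_amount: float) -> list[dict]:
    """Step 2: drop rows with a missing amount and keep only the larger orders."""
    kept = []
    for row in rows:
        if row["amount"] and float(row["amount"]) >= min_amount:
            kept.append({"order_id": row["order_id"],
                         "amount": float(row["amount"]),
                         "region": row["region"]})
    return kept


def store(rows: list[dict]) -> str:
    """Step 3: serialize the result; the demo would copy this file to Cloud Storage."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["order_id", "amount", "region"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


result = store(transform(ingest(RAW_CSV), min_amount=100.0))
print(result)
```

Keeping each step as a separate function mirrors the manual demo, where each stage is run by hand before the next one starts.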
Ingest-Transform-Publish using core infrastructure
[Diagram: Steps 1 to 4 of the demo marked on the Ingest/Extract, Store, Import, and Publish flows between Compute Engine, Cloud Storage, and Cloud SQL]
Cloud Shell gives you an easy command-line
Cloud Shell comes pre-installed with the tools, libraries,
and so on you need to interact with Google Cloud Platform
Module review (1 of 2)
Compute nodes on GCP are:
(select the correct option)
❏ Allocated on demand, and you pay for the time that they are up.
❏ Expensive to create and teardown
❏ Pre-installed with all the software packages you might ever need.
❏ One of ~50 choices in terms of CPU and memory
Module review answers (1 of 2)
Compute nodes on GCP are:
(select the correct option)
➔ Allocated on demand, and you pay for the time that they are up.
❏ Expensive to create and teardown
❏ Pre-installed with all the software packages you might ever need.
❏ One of ~50 choices in terms of CPU and memory
Module review (2 of 2)
Google Cloud Storage is a good option for storing data that:
(select all of the correct options)
❏ Is ingested in real-time from sensors and other devices
❏ Will be frequently read/written from a compute node
❏ May be required to be read at some later time
❏ May be imported into a cluster for analysis
Module review answers (2 of 2)
Google Cloud Storage is a good option for storing data that:
(select all of the correct options)
❏ Is ingested in real-time from sensors and other devices
❏ Will be frequently read/written from a compute node
➔ May be required to be read at some later time
➔ May be imported into a cluster for analysis
Resources
Compute Engine: https://cloud.google.com/compute/
Cloud Storage: https://cloud.google.com/storage/
Pricing: https://cloud.google.com/pricing/
Cloud Launcher: https://cloud.google.com/launcher/
Pricing Philosophy: https://cloud.google.com/pricing/philosophy/
Cloud OnBoard
Data Analysis on the Cloud
Version #1.1
Agenda
Stepping stones to transformation
Your SQL database in the cloud + Demo
Managed Hadoop in the cloud + Demo
Google Cloud Platform began in 2008, with App Engine, a serverless way to run web applications
[Diagram: 1. Develop your code; 2. Upload it to App Engine; 3. App Engine autoscales and is reliable]
http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html
http://googleappengine.blogspot.com/2013/05/the-google-app-engine-blog-is-moving.html
There [was] something fundamentally wrong with what we were doing in 2008 … We didn't get the right stepping stones into the cloud …
-- Eric Schmidt, Executive Chairman, Google
[Diagram: stepping stones labeled Compute Engine, Container Engine, App Engine Flex, App Engine]
GCP now consists of a suite of products that together provide these stepping stones in a business’ transformative journey
Change where you compute: Cost effective virtual machines, storage, Hadoop, and MySQL to migrate your current workloads to the public cloud.
Flexibility, scalability and reliability: Reliable, autoscaling messaging, data processing, and storage.
Change how you compute: Fully managed products for data warehousing, data analysis, streaming, and machine learning.
Machine learning. This is the next transformation … the programming paradigm is changing. Instead of programming a computer, you teach a computer to learn something and it does what you want.
-- Eric Schmidt, Executive Chairman, Google
“If you want to teach a neural network to
recognize a cat, for instance, you don’t
tell it to look for whiskers, ears, fur,
and eyes. You simply show it thousands
and thousands of photos of cats, and
eventually it works things out.”
WIRED’s headline
Machine Learning is not new, but it is now mainstream
Search
People who bought ...
Spam filtering
Suggest next video
Route planning
Smart Reply
What’s common to all of these use cases of Machine Learning?
There are three components in a recommendation system
Rating: Users rate a few houses explicitly or implicitly
Training: A machine learning model is created to predict a user’s rating of a house
Recommending: For each user, the model is applied to every unrated house and the top 5 houses for that user are saved.
What else is needed?
The ML algorithm essentially clusters users and items
Who is like this user?
Is this a good house?
Is this house similar to houses that people similar to this user like?
Predict rating: Predicted rating = user-preference * item-quality
How often do you need to compute the predicted ratings?
Where would you save them?
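The formula "Predicted rating = user-preference * item-quality" can be illustrated with a small sketch. It assumes, as in matrix-factorization recommenders, that each user and each house is described by a short vector of learned factors and that the product is a dot product; the factor values, house names, and function names below are made up for illustration.

```python
def predicted_rating(user_pref: list[float], item_quality: list[float]) -> float:
    """Dot product of a user's preference factors and an item's quality factors."""
    return sum(u * q for u, q in zip(user_pref, item_quality))


def top_n(user_pref: list[float], items: dict[str, list[float]],
          already_rated: set[str], n: int = 5) -> list[str]:
    """Score every unrated item for one user and keep the n best."""
    scored = [
        (predicted_rating(user_pref, factors), item_id)
        for item_id, factors in items.items()
        if item_id not in already_rated
    ]
    # Highest score first; ties are broken by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:n]]


user = [1.0, 0.5]  # hypothetical learned preference factors
houses = {
    "house_a": [4.0, 1.0],
    "house_b": [2.0, 2.0],
    "house_c": [1.0, 5.0],
    "house_d": [3.0, 0.0],
}
recommended = top_n(user, houses, already_rated={"house_a"}, n=2)
print(recommended)
```

Because scoring every unrated house for every user is a batch job, the results are typically precomputed and saved, which is what the two questions on the slide are leading toward.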
In addition to the ML algorithm, you also need sophisticated data management
Data Collection: Scalable front end to collect customer actions
Data Analysis: Data that is accessible and not silo-ed
Machine Learning: (Re-)training and experimentation
Serving: Scalable, real-time system to serve recommendations
Choose your storage solution based on your access pattern
Cloud Storage. Capacity: Petabytes+. Access metaphor: like files in a file system. Read: have to copy to local disk. Write: one file. Update granularity: an object (a “file”). Usage: store blobs.
Cloud SQL. Capacity: Gigabytes. Access metaphor: relational database. Read: SELECT rows. Write: INSERT row. Update granularity: field. Usage: no-ops SQL database on the cloud.
Datastore. Capacity: Terabytes. Access metaphor: persistent hashmap. Read: filter objects on property. Write: put object. Update granularity: attribute. Usage: structured data from AppEngine apps.
Bigtable. Capacity: Petabytes. Access metaphor: key-value(s), HBase API. Read: scan rows. Write: put row. Update granularity: row. Usage: no-ops, high throughput, scalable, flattened data.
BigQuery. Capacity: Petabytes. Access metaphor: relational. Read: SELECT rows. Write: batch/stream. Update granularity: field. Usage: interactive SQL* querying, fully managed warehouse.
Cloud SQL is a fully managed database service
Google-managed MySQL or Postgres
Familiar
Flexible pricing
Managed backups
Automatic replication
Fast connection from GCE & GAE
Connect from anywhere
Google Security
Demo: Set up rentals data in Cloud SQL
In this demo, we populate rentals data in Cloud SQL for the recommendation engine to use:
1. Create Cloud SQL instance
2. Create database tables by importing .sql files from Cloud Storage
3. Populate the tables by importing .csv files from Cloud Storage
4. Allow access to Cloud SQL
5. Explore the rentals data using SQL statements from Cloud Shell
[Diagram: data is imported from Cloud Storage into Cloud SQL, which is then accessed from an external machine]
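Steps 2, 3, and 5 follow a pattern that can be tried locally: create tables from a SQL script, load CSV rows, then query. The sketch below uses Python's built-in `sqlite3` as a stand-in for Cloud SQL, so it is not the demo itself; the table names, columns, and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for the .sql file that defines the tables (hypothetical schema).
SCHEMA_SQL = """
CREATE TABLE accommodation (id TEXT PRIMARY KEY, title TEXT, price INTEGER);
CREATE TABLE rating (user_id TEXT, acc_id TEXT, rating INTEGER);
"""
# Stand-ins for the .csv files that populate the tables (made-up rows).
ACCOMMODATION_CSV = "1,Comfy Quiet Chalet,50\n2,Cozy Calm Hut,65\n"
RATING_CSV = "10,1,4\n10,2,2\n11,1,5\n"

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_SQL)  # step 2: create tables from the SQL script
with conn:                      # step 3: load CSV rows inside one transaction
    conn.executemany("INSERT INTO accommodation VALUES (?, ?, ?)",
                     csv.reader(io.StringIO(ACCOMMODATION_CSV)))
    conn.executemany("INSERT INTO rating VALUES (?, ?, ?)",
                     csv.reader(io.StringIO(RATING_CSV)))

# Step 5: explore the data with SQL (average rating per accommodation).
rows = conn.execute(
    "SELECT a.title, AVG(r.rating) FROM rating r "
    "JOIN accommodation a ON a.id = r.acc_id "
    "GROUP BY a.id, a.title ORDER BY a.id"
).fetchall()
print(rows)
conn.close()
```

With Cloud SQL the import is done by the service from files staged in Cloud Storage, but the resulting tables are queried with ordinary SQL in the same way.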
There is a rich open-source ecosystem for big data
http://hadoop.apache.org/
http://pig.apache.org/
http://hive.apache.org/
http://spark.apache.org/
Dataproc reduces the cost and complexity associated with Spark and Hadoop clusters
Google-managed: Hadoop, Pig, Hive, Spark
Familiar
Image Versioning
Resize in seconds
Automated cluster mgmt
Integrates with Google Cloud
Flexible VMs
Google Security
Demo: Recommendations ML with Cloud Dataproc
In this demo, we implement machine learning recommendations using Cloud Dataproc:
1. Launch Dataproc
2. Train and apply ML model written in PySpark to create product recommendations
3. Explore inserted rows in Cloud SQL
[Diagram: Dataproc trains the model; the recommendations are shown from Cloud SQL]
Module review (1 of 2)
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
❏ Transactional updates on relatively small datasets
Module review answers (1 of 2)
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
✓ Transactional updates on relatively small datasets
Module review (2 of 2)
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It’s the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
❏ Fully-managed versions of the software offer no-ops
❏ Running it on Google infrastructure offers reliability and cost savings
Module review answers (2 of 2)
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It’s the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
✓ Fully-managed versions of the software offer no-ops
✓ Running it on Google infrastructure offers reliability and cost savings
Agenda
Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo
Choosing where to store data on GCP
Unstructured data:
  Need mobile SDKs: Firebase Storage
  Otherwise: Cloud Storage
Structured data, data analytics workload:
  Millisecond latency: Cloud Bigtable
  Latency in seconds: BigQuery
Structured data, transactional workload:
  SQL, one database enough: Cloud SQL
  SQL, horizontal scalability: Cloud Spanner
  No-SQL, need mobile SDKs: Firebase Realtime DB
  No-SQL, otherwise: Cloud Datastore
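The decision chart on this slide can be written down as a small function. The branch layout below is one reading of the flattened chart labels, so treat it as a sketch rather than an official selection rule; the function and parameter names are illustrative.

```python
def choose_storage(structured: bool, analytics: bool = False, sql: bool = False,
                   mobile_sdks: bool = False, low_latency: bool = False,
                   horizontal_scale: bool = False) -> str:
    """Walk the storage decision chart and return a product name."""
    if not structured:
        # Unstructured data: the only question is whether mobile SDKs are needed.
        return "Firebase Storage" if mobile_sdks else "Cloud Storage"
    if analytics:
        # Analytics workload: millisecond latency versus latency in seconds.
        return "Cloud Bigtable" if low_latency else "BigQuery"
    if sql:
        # Transactional, relational: one database enough versus horizontal scale.
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    # Transactional, non-relational.
    return "Firebase Realtime DB" if mobile_sdks else "Cloud Datastore"


print(choose_storage(structured=False))
print(choose_storage(structured=True, analytics=True, low_latency=True))
print(choose_storage(structured=True, sql=True, horizontal_scale=True))
```

Encoding the chart this way makes the questions explicit: structure first, then workload type, then the single distinguishing requirement within each branch.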
Use Cloud Spanner if you need globally consistent data or more than one Cloud SQL instance
Source: https://quizlet.com/blog/quizlet-cloud-spanner
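Comparing storage options: technical details
Cloud Datastore. Type: NoSQL document. Transactions: Yes. Complex queries: No. Capacity: Terabytes+. Unit size: 1 MB/entity.
Bigtable. Type: NoSQL wide column. Transactions: Single-row. Complex queries: No. Capacity: Petabytes+. Unit size: ~10 MB/cell, ~100 MB/row.
Cloud Storage. Type: Blobstore. Transactions: No. Complex queries: No. Capacity: Petabytes+. Unit size: 5 TB/object.
Cloud SQL. Type: Relational SQL for OLTP. Transactions: Yes. Complex queries: Yes. Capacity: 500 GB. Unit size: Determined by DB engine.
Cloud Spanner. Type: Relational SQL for OLTP. Transactions: Yes. Complex queries: Yes. Capacity: Petabytes. Unit size: 10,240 MiB/row.
BigQuery. Type: Relational SQL for OLAP. Transactions: No. Complex queries: Yes. Capacity: Petabytes+. Unit size: 10 MB/row.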
Comparing storage options: use cases
Cloud Datastore. Type: NoSQL document. Best for: Getting started, App Engine applications. Use cases: Getting started, App Engine applications.
Bigtable. Type: NoSQL wide column. Best for: “Flat” data, heavy read/write, events, analytical data. Use cases: AdTech, Financial and IoT data.
Cloud Storage. Type: Blobstore. Best for: Structured and unstructured binary or object data. Use cases: Images, large media files, backups.
Cloud SQL. Type: Relational SQL for OLTP. Best for: Web frameworks, existing applications. Use cases: User credentials, customer orders.
Cloud Spanner. Type: Relational SQL for OLTP. Best for: Large-scale database applications (> ~2 TB). Use cases: Whenever high I/O, global consistency is needed.
BigQuery. Type: Relational SQL for OLAP. Best for: Interactive querying, offline analytics. Use cases: Data warehousing.
Bigtable is meant for high throughput data where access is primarily for a range of Row Key prefixes
Row Key: NASDAQ#1426535612045
Column data: MD:SYMBOL: ZXZZT; MD:LASTSALE: 600.58; MD:LASTSIZE: 300; MD:TRADETIME: 1426535612045; MD:EXCHANGE: NASDAQ
...
Tables should be tall and narrow
Store changes as new rows
Bigtable will automatically compact the table
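Access by row-key prefix works because Bigtable keeps rows sorted lexicographically by row key, so all rows sharing a prefix sit next to each other. The sketch below imitates that with a sorted Python list; it is not the Bigtable client API, and the `exchange#symbol#timestamp` key layout and sample keys are assumptions for illustration.

```python
from bisect import bisect_left


def row_key(exchange: str, symbol: str, timestamp_ms: int) -> str:
    """Compose a row key; equal-length timestamps keep string order equal to time order."""
    return f"{exchange}#{symbol}#{timestamp_ms}"


def prefix_scan(sorted_keys: list[str], prefix: str) -> list[str]:
    """Return all keys starting with prefix from a lexicographically sorted list."""
    start = bisect_left(sorted_keys, prefix)  # jump to the first candidate key
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys with the prefix are contiguous, so stop at the first miss
        matches.append(key)
    return matches


keys = sorted([
    row_key("NASDAQ", "GOOG", 1426535612045),
    row_key("NASDAQ", "GOOG", 1426535612100),
    row_key("NASDAQ", "ZXZZT", 1426535612045),
    row_key("NYSE", "IBM", 1426535612045),
])
goog_rows = prefix_scan(keys, "NASDAQ#GOOG#")
print(goog_rows)
```

This is also why changes are stored as new rows: appending a row with a newer timestamp keeps related rows adjacent, so a single prefix scan returns the full history for that key prefix.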
Short meaningful column names reduce storage and RPC overhead
Row Key: NASDAQ#1426535612045
Column data: MD:SYMBOL: ZXZZT; MD:LASTSALE: 600.58; MD:LASTSIZE: 300; MD:TRADETIME: 1426535612045; MD:EXCHANGE: NASDAQ
Design row key with most common query in mind
Design row key to minimize hotspots
Column families are a quick way to get some hierarchy
Use short column names
Designed for sparse tables
You can work with Bigtable using the HBase API
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.*;

byte[] CF = Bytes.toBytes("MD"); // column family
Connection connection = ConnectionFactory.createConnection(...);
Table table = null;
try {
    // TABLE_NAME is expected to be an org.apache.hadoop.hbase.TableName
    table = connection.getTable(TABLE_NAME);
    // The row key combines exchange, symbol, and timestamp
    Put p = new Put(Bytes.toBytes("NASDAQ#GOOG#1234561234561"));
    // Each cell is addressed by column family, column qualifier, and value
    p.addColumn(CF, Bytes.toBytes("SYMBOL"), Bytes.toBytes("GOOG"));
    p.addColumn(CF, Bytes.toBytes("LASTSALE"), Bytes.toBytes(742.03d));
    ...
    table.put(p);
} finally {
    if (table != null) table.close();
}
BigQuery is a fully managed data warehouse that lets you do ad-hoc SQL queries on massive volumes of data
[Diagram: the BigQuery service contains projects; Project X holds Dataset A and Dataset B, Project Y holds Dataset C and Dataset D; each dataset holds Table 1 and Table 2]
A demo of BigQuery on a 10 billion-row dataset shows what it is and what it can do
#standardsql
SELECT
  language, SUM(views) AS views
FROM `bigquery-samples.wikipedia_benchmark.Wiki10B`
WHERE
  title LIKE "%google%"
GROUP BY language
ORDER BY views DESC
Familiar, SQL 2011 query language
Interactive ad-hoc analysis of petabyte-scale databases
No need to provision clusters
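The shape of the demo query (filter, group, sum, order) can be tried locally on a tiny table. The sketch below uses Python's built-in `sqlite3` as a stand-in, not BigQuery; the table name `wiki` and the sample rows are made up, and the titles are kept lowercase so that differences in how `LIKE` treats letter case do not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (language TEXT, title TEXT, views INTEGER)")
conn.executemany("INSERT INTO wiki VALUES (?, ?, ?)", [
    ("en", "google", 500),
    ("en", "google maps", 300),
    ("de", "google", 200),
    ("en", "python", 900),           # does not match the filter
    ("fr", "recherche google", 50),
])

# Same structure as the slide's query, run against the small local table.
QUERY = """
SELECT language, SUM(views) AS views
FROM wiki
WHERE title LIKE '%google%'
GROUP BY language
ORDER BY views DESC
"""
result = conn.execute(QUERY).fetchall()
print(result)
conn.close()
```

The point of the BigQuery demo is that the same kind of query runs interactively over billions of rows without any cluster to provision; the local version only shows what the query computes.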
Three ways of loading data into BigQuery
Files on disk or Cloud Storage
Stream Data
Federated data source
[Diagram labels: POST, Serverless ETL, CSV, JSON, AVRO, Google Sheets]
With Federated data sources, you can directly query files on
Cloud Storage, without having to ingest them into BigQuery
Also: Google Drive, Bigtable
Also: JSON/Avro/Google Sheet
Can also pass in a schema
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Cloud
Datastore
Bigtable
Cloud
Storage
Cloud SQL
Cloud
Spanner
BigQuery
Type NoSQL
document
NoSQL
wide column
Blobstore Relational
SQL for OLTP
Relational
SQL for OLTP
Relational
SQL for OLAP
Transactions Yes Single-row No Yes Yes No
Complex
queries
No No No Yes Yes Yes
Capacity Terabytes+ Petabytes+ Petabytes+ 500 GB Petabytes Petabytes+
Unit size 1 MB/entity ~10 MB/cell
~100 MB/row
5 TB/object Determined
by DB engine
10,240 MiB/
row
10 MB/row
Comparing storage options: technical details
46. 44
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Comparing storage options: use cases
Product | Type | Best for | Use cases
Cloud Datastore | NoSQL document | Getting started, App Engine applications | Getting started, App Engine applications
Bigtable | NoSQL wide column | “Flat” data, heavy read/write, events, analytical data | AdTech, financial and IoT data
Cloud Storage | Blobstore | Structured and unstructured binary or object data | Images, large media files, backups
Cloud SQL | Relational SQL for OLTP | Web frameworks, existing applications | User credentials, customer orders
Cloud Spanner | Relational SQL for OLTP | Large-scale database applications (> ~2 TB) | Whenever high I/O, global consistency is needed
BigQuery | Relational SQL for OLAP | Interactive querying, offline analytics | Data warehousing
Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo
Agenda
Increasingly, data analysis and machine learning are carried out in self-descriptive, shareable, executable notebooks
A typical notebook contains code, charts, and explanations
Diagram labels: Code, Output, Markup, Share
Image source: Git logo from Wikipedia
Datalab is an open-source notebook built on Jupyter (IPython)
Analyze data in BigQuery,
Compute Engine or Cloud Storage
Use existing
Python packages
Datalab is free—just pay
for Google Cloud resources
Datalab notebooks are developed in an iterative, collaborative process
Development process in Cloud Datalab:
Phase 1: Write code in Python
Phase 2: Run cell (Shift+Enter)
Phase 3: Examine output
Phase 4: Write commentary in Markdown
Phase 5: Share and collaborate
Datalab supports BigQuery: write queries in %%sql cells and convert the results to Pandas
Demo:
Create ML dataset
with BigQuery
In this demo, we use BigQuery to create a dataset that we later use to build a taxi demand forecast system using Machine Learning.
● What kinds of things affect taxi demand?
● What are some ways to measure “demand”?
Demo: Create ML dataset with BigQuery
In this demo, we use BigQuery to create a dataset that we later
use to build a taxi demand forecast system using Machine Learning.
1. Use BigQuery and Datalab to explore and visualize data
2. Build a Pandas dataframe that will be used as the training
dataset for machine learning using TensorFlow
Module Review
Module review
Match the use case on the left with the product on the right
Global consistency needed
High-throughput writes of wide-column data
Warehousing structured data
Develop Big Data algorithms interactively in Python
1. Datalab
2. Bigtable
3. BigQuery
4. Spanner
Module review
Match the use case on the left with the product on the right
Global consistency needed (4)
High-throughput writes of wide-column data (2)
Warehousing structured data (3)
Develop Big Data algorithms interactively in Python (1)
1. Datalab
2. Bigtable
3. BigQuery
4. Spanner
TensorFlow is an open source library that underlies many Google products
Demo: Playing with neural networks to learn what they are
http://playground.tensorflow.org/
Supervised machine learning requires features and labels
Diagram labels: Input features, Neural Network, Prediction, Cost
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
Machine Learning with TensorFlow involves four steps:
1. Gather data: gather training data (input features and labels)
2. Create: create the model
3. Train: train the model based on input data
4. Use: use the model on new data
Step 1 (Gather data): Gather training data and select input features
Column labels: discard, input features, target
Step 1 (Gather data): All input features need to be numeric
Options shown: use as-is; one-hot encoding
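As an illustration of the two options, the sketch below passes numeric columns through unchanged and one-hot encodes a categorical column. The column names, the DAYS list, and the helper names are assumptions made for this example, not the demo's actual code.

```python
# Hypothetical category list for a day-of-week column.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(dayofweek, maxtemp, rain):
    """One-hot encode the categorical column; use the numeric columns as-is."""
    return one_hot(dayofweek, DAYS) + [float(maxtemp), float(rain)]


encoded = encode_row("Tue", 80, 0.8)
print(encoded)
```

Each category becomes its own 0/1 input, so the network never treats "Sun" as numerically larger than "Mon".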
Step 2 (Create): Create a neural network model, defining the number of feature columns and hidden units
Diagram labels: npredictors, nhidden, noutputs
estimator = DNNRegressor(hidden_units=[5], feature_columns=[...])
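To show what npredictors, nhidden, and noutputs mean in terms of shapes, here is a small pure-Python forward pass through one hidden layer. It is only a sketch: the weights are random and untrained, the ReLU activation is an assumption, and none of the helper names come from TensorFlow.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer: n_out rows of n_in weights, plus n_out biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return {"w": weights, "b": [0.0] * n_out}


def dense(layer, x, relu=False):
    """Compute w.x + b for every unit, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(layer["w"], layer["b"])]
    return [max(0.0, v) for v in out] if relu else out


npredictors, nhidden, noutputs = 4, 5, 1  # 4 inputs, hidden_units=[5], 1 output
rng = random.Random(0)
hidden = make_layer(npredictors, nhidden, rng)
output = make_layer(nhidden, noutputs, rng)


def predict(x):
    """Inputs -> hidden layer (ReLU) -> single regression output."""
    return dense(output, dense(hidden, x, relu=True))


pred = predict([4, 60, 80, 0.0])  # hypothetical dayofweek, mintemp, maxtemp, rain
print(pred)
```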
Step 3 (Train): Train the model on the collected data
estimator.fit(predictors, targets, steps=1000)
Training loop: the model maps the npredictors inputs to a predicted value of taxicab demand; the cost compares the predicted value with the true value of taxicab demand; the model is updated based on the cost
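The predict, compute cost, update cycle can be sketched without TensorFlow. The example below swaps the slide's DNNRegressor for a single linear unit trained by plain gradient descent on mean squared error; the data, learning rate, and function names are assumptions chosen so the loop is easy to follow.

```python
def predict(w, b, x):
    """Linear model: weighted sum of the input features plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def cost(w, b, predictors, targets):
    """Mean squared error between predictions and true values."""
    errors = [predict(w, b, x) - y for x, y in zip(predictors, targets)]
    return sum(e * e for e in errors) / len(errors)


def fit(predictors, targets, steps=1000, lr=0.1):
    """Repeat: predict, measure the cost gradient, update the model."""
    n = len(predictors[0])
    w, b = [0.0] * n, 0.0
    m = len(targets)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(predictors, targets):
            err = predict(w, b, x) - y
            for i in range(n):
                grad_w[i] += 2 * err * x[i] / m
            grad_b += 2 * err / m
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data following target = 2 * feature + 1.
predictors = [[0.0], [0.5], [1.0]]
targets = [1.0, 2.0, 3.0]

cost_before = cost([0.0], 0.0, predictors, targets)
w, b = fit(predictors, targets, steps=1000)
cost_after = cost(w, b, predictors, targets)
print(w, b, cost_before, cost_after)
```

A neural network follows the same loop; it only has more parameters to update at each step.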
Step 4 (Use): Use the model on new data
Diagram labels: inputs (rain, max temp, …), model, predicted value of taxicab demand, true value of taxicab demand, cost, update model based on cost
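Step 4 (Use): Use the model on new data
import pandas as pd
# Assumed import path for the TF 1.x contrib API used on the slides
from tensorflow.contrib.learn import DNNRegressor

# new days to predict for, one column per input feature
inputs = pd.DataFrame.from_dict(data =
    {'dayofweek' : [4, 5, 6],
     'mintemp' : [60, 15, 60],
     'maxtemp' : [80, 80, 65],
     'rain' : [0, 0.8, 0]})
# read trained model from /tmp/trained_model
estimator = DNNRegressor(model_dir='/tmp/trained_model',
                         hidden_units=[5])
pred = estimator.predict(inputs.values)
print(pred)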
Demo 2 Part 2:
Carry out ML
with TensorFlow
In this demo, we build a neural network to predict taxicab demand on a day-by-day basis using TensorFlow.
Diagram: Inputs -> Neural Network -> Prediction
Machine learning with TensorFlow + Demo
Pre-built machine learning models + Demo
Agenda
The accuracy of an ML model is driven largely by the size and quality of the dataset; this is why ML requires massive compute
Chart labels: size of dataset, scale of compute problem, accuracy
https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
Cloud ML Engine simplifies the use of distributed TensorFlow
Diagram label: Size of dataset
ML APIs are pre-trained ML models (trained on Google’s data) for common tasks; they are accessible through REST APIs
Use your own data to train models: TensorFlow, Cloud Machine Learning Engine
Machine Learning as an API: Cloud Vision API, Cloud Translation API, Cloud Natural Language API, Cloud Speech API, Cloud Video Intelligence
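Because the APIs are plain REST endpoints, a request is just a JSON document. The sketch below builds (but does not send) a request body for the Vision API's images:annotate method; the bucket path is a made-up placeholder, the helper name is hypothetical, and authentication is deliberately left out.

```python
import json

# Public REST endpoint of the Cloud Vision API (v1).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_uri, feature_type="WEB_DETECTION", max_results=5):
    """Build the JSON body for one image and one requested feature."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }


# Hypothetical image location, used only to show the shape of the request.
body = build_annotate_request("gs://my-bucket/photo.jpg")
payload = json.dumps(body, indent=2)
print("POST", VISION_ENDPOINT)
print(payload)
```

Sending the payload would additionally require an API key or OAuth credentials, which this sketch does not cover.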
Web annotations
{
"entityId": "/m/016ms7",
"score": 1.44038,
"description": "Ford Anglia"
}
{
"entityId": "/m/0gff2yr",
"score": 5.92256,
"description": "ArtScience Museum"
}
{
"entityId": "/m/0h898pd",
"score": 7.4162,
"description": "Harry Potter (Literary Series)"
}
CC-BY 2.0 Rev Stan: https://www.flickr.com/photos/revstan/6865880240
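A response like the one above is usually post-processed by ranking the entities by score. This sketch uses the three annotations from the slide; the helper name and the score threshold are assumptions for illustration.

```python
# The three web annotations shown on the slide.
web_entities = [
    {"entityId": "/m/016ms7", "score": 1.44038, "description": "Ford Anglia"},
    {"entityId": "/m/0gff2yr", "score": 5.92256, "description": "ArtScience Museum"},
    {"entityId": "/m/0h898pd", "score": 7.4162,
     "description": "Harry Potter (Literary Series)"},
]


def top_entities(entities, min_score=0.0):
    """Return descriptions sorted by descending score, dropping weak matches."""
    ranked = sorted(entities, key=lambda e: e["score"], reverse=True)
    return [e["description"] for e in ranked if e["score"] >= min_score]


ranked_all = top_entities(web_entities)
ranked_strong = top_entities(web_entities, min_score=5.0)
print(ranked_all)
print(ranked_strong)
```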
Try it in the browser with your own images
cloud.google.com/vision
The Translation API supports 100+ languages
https://cloud.google.com/translate/
Wootric uses the Cloud Natural Language API (entity and sentiment) to
make sense of qualitative customer feedback
Extracted entities are tied into a knowledge graph
Input text: Joanne "Jo" Rowling, pen names J. K. Rowling and Robert Galbraith, is a British novelist, screenwriter and film producer best known as the author of the Harry Potter fantasy series
{
"name": "Joanne 'Jo' Rowling",
"type": "PERSON",
"metadata": {
"mid": "/m/042xh",
"wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
}
}
{
"name": "British",
"type": "LOCATION",
"metadata": {
"mid": "/m/07ssc",
"wikipedia_url": "http://en.wikipedia.org/wiki/United_Kingdom"
}
}
{
"name": "Harry Potter",
"type": "PERSON",
"metadata": {
"mid": "/m/078ffw",
"wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
}
}
When you analyze sentiment, you get a score (positive/negative) as well
as a magnitude (how intense?)
{
"documentSentiment": {
"score": 0.8,
"magnitude": 0.8
}
}
The food was excellent, I would definitely go back!
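One way to read the two numbers together is sketched below. The score lies between -1 and 1 and the magnitude is non-negative; the cut-off values and the labels returned are assumptions for illustration, not thresholds defined by the API.

```python
def describe_sentiment(score, magnitude, neutral_band=0.25, strong=2.0):
    """Turn a (score, magnitude) pair into a short human-readable label.

    neutral_band and strong are hypothetical thresholds for this sketch.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    # A near-zero score with high magnitude suggests strong feelings
    # in both directions that cancel out.
    return "mixed" if magnitude >= strong else "neutral"


# Values from the slide's example response.
label = describe_sentiment(score=0.8, magnitude=0.8)
print(label)
```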
The Cloud Speech API can be used to transcribe audio to text
http://cloud.google.com/speech
Like the Vision API, the Video Intelligence API can identify labels in a
video, along with a timestamp
{
"description": "Bird's-eye view",
"language_code": "en-us",
"locations": {
"segment": {
"start_time_offset": 71905212,
"end_time_offset": 73740392
},
"confidence": 0.96653205
}
}
https://cloud.google.com/video-intelligence/
Demo 2 Part 3:
Machine Learning APIs
Use several of the Machine Learning
APIs (Vision, Translate, Natural
Language Processing, Speech) from
Python
“How much is this car worth?”
“Hi Ocado,
I love your website. I have children so it’s
easier for me to do the shopping online.
Many thanks for saving my time!
Regards”
Feedback: Customer is happy
Improves natural language processing of customer service claims
“Thanks to the Google Cloud Platform, Ocado was able to use
the power of cloud computing and train our models in
parallel.”
50% of enterprises will be spending more per annum on bots and chatbot creation than traditional mobile app development by 2021 (Gartner)
Use Dialogflow to create a new shopping experience
Custom image model to price cars
Build off NLP API to route customer emails
Use Vision API as-is to find text in memes
Introducing Cloud AutoML
A technology that can automatically create a machine learning model
Steps shown: data preprocessing, ML model design, tune ML model parameters, evaluate, deploy, update
Cloud AutoML Vision
Steps: upload and label images (for example handbag, shoe, hat); train your model in minutes or one day; evaluate
The model is now trained and ready to make predictions.
This model can scale as needed to adapt to customer demands.
Demo:
Module Review
Match the use case on the left with the
product on the right
Module review
Create, test new machine learning methods
No-ops, custom machine learning applications at scale
Automatically reject inappropriate image content
Build application to monitor Spanish twitter feed
Transcribe customer support calls
1. Vision API
2. TensorFlow
3. Speech API
4. Cloud ML
5. Translation API
Match the use case on the left with the
product on the right
Module review
Create, test new machine learning methods (2)
No-ops, custom machine learning applications at scale (4)
Automatically reject inappropriate image content (1)
Build application to monitor Spanish twitter feed (5)
Transcribe customer support calls (3)
1. Vision API
2. TensorFlow
3. Speech API
4. Cloud ML
5. Translation API
Resources (1 of 2)
Cloud Spanner https://cloud.google.com/spanner/
Cloud Bigtable https://cloud.google.com/bigtable/
Google BigQuery https://cloud.google.com/bigquery/
Cloud Datalab https://cloud.google.com/datalab/
TensorFlow https://www.tensorflow.org/
Resources (2 of 2)
Cloud Machine Learning https://cloud.google.com/ml/
Vision API https://cloud.google.com/vision/
Translation API https://cloud.google.com/translate/
Speech API https://cloud.google.com/speech/
Video Intelligence API https://cloud.google.com/video-intelligence
Data Processing Architecture
Message-oriented architectures
Serverless data pipelines
GCP Reference Architecture
Agenda
Asynchronous processing is useful for long-lived tasks or to have loose coupling between two systems
Diagram: Producers (P1, P2, P3) -> Message Queue -> Consumers (C1, C2, C3)
Potential use cases:
1. Send an SMS
2. Train ML model
3. Process data from multiple sources
4. Weekly reports …
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
Big Data & Machine Learning
Cloud OnBoard
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
For robust asynchronous processing, you need:
1. A global, highly available queue
2. Scale without over-provisioning
3. Queue must be interoperable
4. Reliable delivery of messages
Pub/Sub provides a no-ops, serverless global message queue
Dataflow offers NoOps data pipelines in Java and Python
import apache_beam as beam

# options, parse_data, get_speedbysensor and avg_speed are defined elsewhere
p = beam.Pipeline(options=options)
lines = p | beam.io.ReadFromText('gs://…')  # Read (input)
traffic = (lines
    | beam.Map(parse_data).with_output_types(str)  # Transform 1
    | beam.Map(get_speedbysensor)  # Transform 2 (Map): (sensor, speed)
    | beam.GroupByKey()  # Group (Group-By): (sensor, [speed])
    | beam.Map(avg_speed)  # Transform 3 (Reduce): (sensor, avgspeed)
    | beam.Map(lambda tup: '%s: %d' % tup))  # Transform 4: format as text
output = traffic | beam.io.WriteToText('gs://...')  # Write (output)
p.run()
Each of these steps is run in parallel and autoscaled by the execution framework
The open-source API (Apache Beam) can also be executed on Flink, Spark, etc.
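The Map, Group-By, Reduce shape of the pipeline can be followed step by step in plain Python. This is a single-process sketch, not Apache Beam: the sample lines and the bodies of parse_data and avg_speed are assumptions, since the slide does not show them.

```python
from collections import defaultdict

# Hypothetical input lines in "sensor,speed" form.
lines = ["sensorA,60", "sensorB,30", "sensorA,70", "sensorB,50"]


def parse_data(line):
    """Map: turn one text line into a (sensor, speed) pair."""
    sensor, speed = line.split(",")
    return sensor, float(speed)


def group_by_key(pairs):
    """Group-By: collect all speeds per sensor, giving (sensor, [speed])."""
    grouped = defaultdict(list)
    for sensor, speed in pairs:
        grouped[sensor].append(speed)
    return grouped


def avg_speed(item):
    """Reduce: collapse (sensor, [speed]) into (sensor, avgspeed)."""
    sensor, speeds = item
    return sensor, sum(speeds) / len(speeds)


pairs = map(parse_data, lines)
grouped = group_by_key(pairs)
output = sorted('%s: %d' % avg_speed(item) for item in grouped.items())
print(output)
```

In a distributed runner each of these stages is spread across workers, with the group step acting as the shuffle between the map and reduce sides.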
Choosing where to store data on GCP
- Unstructured data: Cloud Storage
- Structured data, data analytics workload: Cloud Bigtable (millisecond latency) or BigQuery (latency in seconds)
- Structured data, transactional workload, NoSQL: Cloud Datastore
- Structured data, transactional workload, SQL: Cloud SQL (one database enough) or Cloud Spanner (horizontal scalability)
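The decision flow can be written down as a small function. This is one reading of the diagram's branches, with hypothetical parameter names; real choices usually weigh cost, capacity, and existing tooling as well.

```python
def choose_storage(structured, analytics=False, millisecond_latency=False,
                   sql=False, horizontal_scalability=False):
    """Walk the decision flow and return a product name.

    All parameter names are assumptions made for this sketch.
    """
    if not structured:
        return "Cloud Storage"
    if analytics:
        # Analytics workload: pick by required latency.
        return "Cloud Bigtable" if millisecond_latency else "BigQuery"
    # Transactional workload from here on.
    if not sql:
        return "Cloud Datastore"
    return "Cloud Spanner" if horizontal_scalability else "Cloud SQL"


choice = choose_storage(structured=True, analytics=True)
print(choice)
```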
Run Spark/Hadoop jobs on Cloud Dataproc
Diagram: Client -> API or direct access -> applications on cluster (Cloud Dataproc) -> input and output connectors -> input and output data sources (Cloud Storage, Cloud Bigtable, BigQuery)
On GCP, you can have the same data processing pipeline for processing both batch and stream
Batch: raw logs, files, assets, Google Analytics data, and so on -> Cloud Storage -> Cloud Dataflow
Stream: events, metrics, and so on -> Cloud Pub/Sub -> Cloud Dataflow
Cloud Dataflow -> BigQuery, Bigtable
Downstream: Cloud ML Engine, Cloud Datalab, Data Studio, dashboards/BI, applications and reports, co-workers
Demo:
Module Review
Match the use case on the left with the product on the right
Module review
A. Decoupling producers and consumers of data
in large organizations and complex systems
B. Scalable, fault-tolerant multi-step
processing of data
1. Cloud Dataflow
2. Cloud Pub/Sub
Match the use case on the left with the product on the right
Module review
A. Decoupling producers and consumers of data in large organizations and complex systems (2)
B. Scalable, fault-tolerant multi-step processing of data (1)
1. Cloud Dataflow
2. Cloud Pub/Sub
Resources (1 of 2)
Cloud Pub/Sub https://cloud.google.com/pubsub/
Cloud Dataflow https://cloud.google.com/dataflow/
Processing media using Cloud Pub/Sub and Compute Engine https://cloud.google.com/solutions/media-processing-pub-sub-compute-engine
Resources (2 of 2)
Reverse Geocoding of Geolocation Telemetry in the Cloud Using the Maps API https://cloud.google.com/solutions/reverse-geocoding-geolocation-telemetry-cloud-maps-ap
Using Cloud Pub/Sub for Long-running Tasks https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks
Summary
Version #1.1
An Evolving Cloud
1st Wave: Your kit, someone else’s building. Yours to manage.
2nd Wave: Standard virtual kit, for rent. Still yours to manage.
3rd Wave: Invest your energy in great apps.
Google Cloud provides a way to take advantage of Google’s investments in infrastructure and data processing innovation
Products on the 2002 to 2018 timeline: Cloud Storage, Dataproc, Bigtable, BigQuery, Dataflow, Datastore, Pub/Sub, ML Engine, Cloud Spanner, AutoML
In summary, GCP offers you ways to...
Incorporate real-time data into apps and architectures: To get the most out of data and secure competitive advantage.
Create citizen data scientists: Transform your organization into a truly data-driven company by putting tools into the hands of domain experts.
Apply machine learning broadly and easily: We make it simple and practical to incorporate machine learning models within custom applications.
Spend less on ops and administration: We’ve “automated out” the complexity of building and maintaining data and analytics systems.
Next Steps on your Google Cloud learning journey
1. Today: Google Cloud Platform Fundamentals: Big Data and Machine Learning
2. Tomorrow: Complete hands-on labs: Baseline: Data, ML, AI quest (google.qwiklabs.com)
3. Future: Find more training online (cloud.google.com/training)
Complete 10 hands-on labs free on Qwiklabs by 30 April, and receive $200 in GCP credits [Only for Cloud OnBoard Attendees]
1. Receive a follow-up email after the event
2. Create a Qwiklabs account (username and password) with the email you used to register for Cloud OnBoard
3. Open your email and confirm your account
4. Return to Qwiklabs and log in
5. Enroll in the Baseline: Data, ML, AI quest and take your first lab!
6. Complete all 10 labs and we will send you an email after 30 April with instructions to redeem the $200 credits. Make sure you opt in to receive emails from Qwiklabs.
To help you get started
Activate your voucher now for a free course worth $99!
1. Go to https://www.coursera.org/voucher/CloudOnBoardML
2. Activate voucher and sign up for a free account
3. Enroll in Serverless Data Analysis with Google BigQuery and Cloud Dataflow for free (limited period offer!)
Explore other courses at Coursera.org/Googlecloud
Make Google Cloud certification your goal!
cloud.google.com/certification
Find study guides, tips, practice
exams, and testing sites
Google Cloud Startup Program
Google Cloud is a perfect fit for launching and scaling your early-stage startup.
A special offer for Cloud Onboard Singapore attendees:
Visit https://goo.gl/bmJwwk before May 18th to enroll, and eligible startups* receive $3,000 in Google Cloud and Firebase credits.
What’s an eligible startup?
• Has raised no more than a Series A
• Is less than 5 years old
• Is located in one of our approved countries
• Has not participated in the Google Cloud Startup program before
g.co/cloudstartups
cloud.startups@google.com
Be part of the GCP User Group SG Community!
bit.ly/gcpusergroupsg
Learn: Learn from leads, users, and tech experts
Connect: Network, share, learn, all about Google Cloud
Access: Gain access to the Google Cloud team and the latest capabilities
Resources
Big data and machine learning blog https://cloud.google.com/blog/big-data/
Google Cloud Platform blog https://cloudplatform.googleblog.com/
Google Cloud Platform curated articles https://medium.com/google-cloud