5. Why are customers adopting cloud computing?
• Variable expense: replace capital expenditure with variable expense
• Elastic capacity: no need to guess capacity requirements and over-provision
• Speed and agility: infrastructure in minutes, not weeks
• Global reach: go global in minutes and reach a global audience
6. AWS services (diagram)
Your Applications
• Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync
• Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
• Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS
• Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit
• Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory, KMS
• Analytics: Kinesis, Data Pipeline, Redshift, EMR, Machine Learning
• Database: DynamoDB, RDS, ElastiCache
• Compute: EC2, ELB, Auto Scaling, Lambda, ECS
• Storage: EBS, Glacier, CloudFront, EFS, S3
• Network: VPC, Direct Connect, Route 53
• Human Interaction: Support, Web Console
• API Interaction: Command Line; Libraries, SDKs
AWS Global Infrastructure: 11 Regions, 30 Availability Zones, 53 Edge Locations
15. JDBC/ODBC
ID Name
1 John Smith
2 Jane Jones
3 Peter Black
4 Pat Partridge
5 Sarah Cyan
6 Brian Snail
1 John Smith
4 Pat Partridge
2 Jane Jones
5 Sarah Cyan
3 Peter Black
6 Brian Snail
16. Dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• With row storage you do unnecessary I/O
• To get average Amount by State, you have to read everything
ID | Age | State | Amount
----+-----+-------+-------
123 | 20 | CA | 500
345 | 25 | WA | 250
678 | 40 | FL | 125
957 | 37 | WA | 375
17. Dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• With column storage, you only read the data you need
ID | Age | State | Amount
----+-----+-------+-------
123 | 20 | CA | 500
345 | 25 | WA | 250
678 | 40 | FL | 125
957 | 37 | WA | 375
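A minimal sketch of the row-versus-column point, using the four-row table from the slide. The per-field byte widths are hypothetical, chosen only to make the arithmetic visible; they are not Redshift's actual storage widths.

```python
# Rows from the slide: (ID, Age, State, Amount)
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
columns = ("ID", "Age", "State", "Amount")

# Hypothetical field widths in bytes (assumption, for illustration only).
FIELD_BYTES = {"ID": 4, "Age": 4, "State": 2, "Amount": 4}


def bytes_read_row_store(needed):
    """A row store reads every field of every row, whatever the query needs."""
    return len(rows) * sum(FIELD_BYTES.values())


def bytes_read_column_store(needed):
    """A column store reads only the columns the query references."""
    return len(rows) * sum(FIELD_BYTES[c] for c in needed)


def average_amount_by_state():
    """Average Amount by State, touching only the State and Amount columns."""
    state_col = [r[columns.index("State")] for r in rows]
    amount_col = [r[columns.index("Amount")] for r in rows]
    totals = {}
    for state, amount in zip(state_col, amount_col):
        total, count = totals.get(state, (0, 0))
        totals[state] = (total + amount, count + 1)
    return {state: total / count for state, (total, count) in totals.items()}


needed = ("State", "Amount")
print(bytes_read_row_store(needed))     # 4 rows * 14 bytes per row
print(bytes_read_column_store(needed))  # 4 rows * 6 bytes per row
print(average_amount_by_state())
```

With these assumed widths, the column layout reads 24 bytes instead of 56 for the same query; the gap widens as tables gain columns the query never references.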
18. Dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
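A simplified sketch of the ideas behind two of the encoding families named in the output above: delta encoding (store differences between consecutive values) and dictionary encoding (store small indexes into a table of distinct values). This illustrates the concepts only; it is not Redshift's on-disk format, and the sample values are hypothetical.

```python
def delta_encode(values):
    """Store the first value, then each value's difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values by running a cumulative sum."""
    decoded = []
    running = 0
    for i, item in enumerate(encoded):
        running = item if i == 0 else running + item
        decoded.append(running)
    return decoded


def dict_encode(values):
    """Replace each value with a small index into a dictionary of distinct values."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


list_ids = [1000, 1001, 1002, 1005, 1006]  # hypothetical listid values
num_tickets = [2, 4, 2, 2, 8, 4]           # hypothetical numtickets values

print(delta_encode(list_ids))
print(dict_encode(num_tickets))
```

Delta encoding suits columns whose neighbouring values are close together (IDs, timestamps); dictionary encoding suits columns with few distinct values.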
19. Dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
Block 2: 375 | 393 | 417 … 512 | 549 | 623 (min 375, max 623)
Block 3: 637 | 712 | 809 … 834 | 921 | 959 (min 637, max 959)
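A minimal sketch of block skipping with a zone map. The blocks hold only the values visible on the slide (the elided ones are left out), and `scan_range` is a hypothetical helper name, not a Redshift function.

```python
# Blocks of a sorted column, using only the values visible on the slide.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# Zone map: the (min, max) pair for each block.
zone_map = [(min(b), max(b)) for b in blocks]


def scan_range(low, high):
    """Return values in [low, high] and the indexes of the blocks actually read."""
    hits, blocks_read = [], []
    for i, (block_min, block_max) in enumerate(zone_map):
        if block_max < low or block_min > high:
            continue  # the block cannot contain a match, so skip it
        blocks_read.append(i)
        hits.extend(v for v in blocks[i] if low <= v <= high)
    return hits, blocks_read


print(zone_map)
print(scan_range(400, 600))
```

A query for values between 400 and 600 reads only the middle block, because the min/max pairs rule out the other two without touching their data.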
20. Q3. What’s good about it?
Performance, Scalability, Ease of Use, Cost
21. Performance Evaluation on 2B Rows
Aggregate by month: 02:08:35 | 00:35:46 | 00:00:12
Traditional SQL Database
Amazon Redshift
33. EMR
EMR Cluster, S3
1. Put the data into S3
2. Choose: Hadoop distribution, # of nodes, types of nodes, Hadoop apps like Hive/Pig/HBase
3. Launch the cluster using the EMR console, CLI, SDK, or APIs
4. Get the output from S3
47. Putting data into Kinesis
• Each shard
  • 1000 Tx Per Second
  • 1MB Per Second
  • 50KB Payload Per Tx
• Messages kept for 24 hours
• Simple PUT interface to store data in Kinesis
• A Partition Key is used to distribute the PUTs across Shards
• A unique Sequence # is created
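A rough sketch of two calculations behind the bullets above: sizing a stream from the per-shard write limits, and mapping a partition key to a shard. Kinesis hashes the partition key with MD5 into a 128-bit key space; the equal-width split of that space across shards, the 10**6-byte reading of "1MB", and both function names are assumptions made for illustration.

```python
import hashlib
import math

# Per-shard write limits quoted on the slide.
MAX_RECORDS_PER_SECOND = 1000
MAX_BYTES_PER_SECOND = 1_000_000  # 1 MB, taken here as 10**6 bytes (assumption)

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shards_needed(records_per_second, avg_record_bytes):
    """Smallest shard count that satisfies both per-shard write limits."""
    by_count = math.ceil(records_per_second / MAX_RECORDS_PER_SECOND)
    by_bytes = math.ceil(records_per_second * avg_record_bytes / MAX_BYTES_PER_SECOND)
    return max(1, by_count, by_bytes)


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal-width hash ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


print(shards_needed(2500, 200))     # limited by record count
print(shards_needed(500, 10_000))   # limited by bytes per second
print(shard_for_key("user-42", 4))  # always the same shard for the same key
```

Because the mapping depends only on the key, all records with one partition key land on the same shard, so a skewed key distribution can overload a single shard even when the stream as a whole has spare capacity.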
48. Getting data out of Kinesis
Kinesis Client Library (KCL):
• Abstracts code from individual shards
• Starts a Kinesis Worker for each shard
• Increases and decreases workers
• Tracks a Worker’s location in the stream
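A toy in-memory model of the behaviour listed above: one worker per shard, each recording how far it has read so that a replacement worker can resume from the same position. None of these class or function names come from the KCL; this is a sketch of the idea, not the library's API.

```python
class ShardWorker:
    """Toy stand-in for one worker that reads a single shard."""

    def __init__(self, shard_id, checkpoints):
        self.shard_id = shard_id
        self.checkpoints = checkpoints  # shared dict: shard_id -> next position

    def process(self, records, handler, batch_size=2):
        """Handle up to batch_size records, starting from the saved position."""
        start = self.checkpoints.get(self.shard_id, 0)
        batch = records[start:start + batch_size]
        for record in batch:
            handler(self.shard_id, record)
        # Checkpoint only after the batch has been handled.
        self.checkpoints[self.shard_id] = start + len(batch)
        return len(batch)


# Hypothetical in-memory "stream": shard id -> ordered records.
stream = {"shard-0": ["a", "b", "c"], "shard-1": ["x", "y"]}
checkpoints = {}
seen = []


def handler(shard_id, record):
    seen.append((shard_id, record))


# First pass: one worker per shard. Second pass: fresh workers resume
# from the stored checkpoints instead of re-reading from the start.
for shard_id, records in stream.items():
    ShardWorker(shard_id, checkpoints).process(records, handler)
for shard_id, records in stream.items():
    ShardWorker(shard_id, checkpoints).process(records, handler)

print(checkpoints)
print(seen)
```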
56. Three types of data-driven development
• Retrospective analysis and reporting: Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
• Here-and-now real-time processing and dashboards: Amazon Kinesis, Amazon EC2, AWS Lambda
• Predictions to enable smart applications
58. Machine learning and smart applications
• Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available
Your data + machine learning = smart applications
61. Smart applications by example
• Based on what you know about the user: Will they use your product?
• Based on what you know about an order: Is this order fraudulent?
• Based on what you know about a news article: What other articles are interesting?
62. Challenges to Building Smart Applications Today
• Expertise: limited supply of data scientists; expensive to hire or outsource
• Technology: many choices, few mainstays; difficult to use and scale
• Operationalization: complex and error-prone data workflows; custom platforms and APIs
64. Amazon Machine Learning
• Easy to use, managed machine learning service built for developers
• Robust, powerful machine learning technology based on Amazon’s internal systems
• Create models using your data already stored in the AWS cloud
• Deploy models to production in seconds
65. Easy to use and developer-friendly
• Use the intuitive, powerful service console to build and explore your initial models
  • Data retrieval
  • Model training, quality evaluation, fine-tuning
  • Deployment and management
• Automate model lifecycle with fully featured APIs and SDKs
  • Java, Python, .NET, JavaScript, Ruby, PHP
• Easily create smart iOS and Android applications with AWS Mobile SDK
66. Powerful machine learning technology
• Based on Amazon’s battle-hardened internal systems
• Not just the algorithms:
  • Smart data transformations
  • Input data and model quality alerts
  • Built-in industry best practices
• Grows with your needs
  • Train on up to 100 GB of data
  • Generate billions of predictions
  • Obtain predictions in batches or real-time
67. Integrated with AWS Data Ecosystem
• Access data that is stored in Amazon S3, Amazon Redshift, or MySQL databases in RDS
• Output predictions to Amazon S3 for easy integration with your data flows
• Use AWS Identity and Access Management (IAM) for fine-grained data-access permission policies
68. Fully-managed model and prediction services
• End-to-end service, with no servers to provision and manage
• One-click production model deployment
• Programmatically query model metadata to enable automatic retraining workflows
• Monitor prediction usage patterns with Amazon CloudWatch metrics
69. Pay-as-you-go and inexpensive
• Data analysis, model training, and evaluation: $0.42/instance hour
• Batch predictions: $0.10/1000
• Real-time predictions: $0.10/1000
  • + hourly capacity reservation charge
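A back-of-the-envelope sketch of the arithmetic these rates imply. The two rates are the ones quoted on the slide; the slide gives no figure for the hourly capacity reservation charge, so that rate is a required argument rather than a default. The function names are hypothetical.

```python
COMPUTE_RATE_PER_HOUR = 0.42     # data analysis, training, evaluation (from the slide)
PREDICTION_RATE_PER_1000 = 0.10  # batch and real-time predictions (from the slide)


def training_cost(instance_hours):
    """Cost of data analysis, model training, and evaluation."""
    return instance_hours * COMPUTE_RATE_PER_HOUR


def batch_prediction_cost(predictions):
    """Cost of a batch of predictions, billed per thousand."""
    return predictions / 1000 * PREDICTION_RATE_PER_1000


def realtime_prediction_cost(predictions, reserved_hours, reservation_rate_per_hour):
    """Per-prediction cost plus the reservation charge; the caller supplies its rate."""
    return (predictions / 1000 * PREDICTION_RATE_PER_1000
            + reserved_hours * reservation_rate_per_hour)


print(round(training_cost(10), 2))               # 10 instance hours
print(round(batch_prediction_cost(250_000), 2))  # 250,000 batch predictions
```

At these rates, ten instance hours of training cost $4.20 and a quarter of a million batch predictions cost $25.00.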
70. Three Supported Types of Predictions
• Binary classification
  • Predict the answer to a Yes/No question
• Multi-class classification
  • Predict the correct category from a list
• Regression
  • Predict the value of a numeric variable
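A small sketch of what each prediction type returns, assuming a model has already produced scores or weights. The three helper functions, the 0.5 cut-off, and the sample inputs are hypothetical illustrations of the output shapes; they do not describe how Amazon ML computes predictions internally.

```python
def binary_prediction(score, threshold=0.5):
    """Answer a Yes/No question by comparing a score in [0, 1] to a cut-off."""
    return score >= threshold


def multiclass_prediction(class_scores):
    """Pick the category with the highest score from a list of candidates."""
    return max(class_scores, key=class_scores.get)


def regression_prediction(features, weights, bias=0.0):
    """Predict a numeric value as a weighted sum of numeric features."""
    return bias + sum(weights[name] * value for name, value in features.items())


print(binary_prediction(0.83))
print(multiclass_prediction({"sports": 0.1, "politics": 0.7, "tech": 0.2}))
print(regression_prediction({"sqft": 100.0}, {"sqft": 2.0}, bias=50.0))
```

The fraud question from the earlier examples is binary, choosing a topic for an article is multi-class, and estimating an order amount is regression.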
71. How Do I Get Started Using Amazon Machine Learning?
72. Get Started Quickly
• Create, access, and manage all Amazon ML entities through the AWS Management Console
• Easily learn to build a model with the tutorial dataset provided
• Add prediction capabilities to your iOS and Android applications with AWS Mobile SDK
• Use Amazon ML APIs, CLIs, or SDKs
76. Train your model
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> # Train a regression model from an existing datasource.
>>> model = ml.create_ml_model(
...     ml_model_id='my_model',
...     ml_model_type='REGRESSION',
...     training_data_source_id='my_datasource')
82. Batch predictions
• Asynchronous, large-volume prediction generation
• Request through service console or API
• Best for applications that deal with batches of data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> # Results are written to the S3 location given in output_uri.
>>> batch_prediction = ml.create_batch_prediction(
...     batch_prediction_id='my_batch_prediction',
...     batch_prediction_data_source_id='my_datasource',
...     ml_model_id='my_model',
...     output_uri='s3://examplebucket/output/')
83. Real-time predictions
• Synchronous, low-latency, high-throughput prediction generation
• Request through service API or server or mobile SDKs
• Best for interactive applications that deal with individual data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> # Score a single record against the model's real-time endpoint.
>>> ml.predict(
...     ml_model_id='my_model',
...     predict_endpoint='example_endpoint',
...     record={'key1': 'value1', 'key2': 'value2'})
{
    'Prediction': {
        'predictedValue': 13.284348,
        'details': {
            'Algorithm': 'SGD',
            'PredictiveModelType': 'REGRESSION'
        }
    }
}
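A short sketch of how an application might pull the numeric result out of a regression response with the shape shown above. `predicted_value` and `StubClient` are hypothetical names; the stub simply returns the response from the slide so the example runs without AWS credentials.

```python
def predicted_value(client, ml_model_id, predict_endpoint, record):
    """Call predict() and return the numeric value from a regression response."""
    response = client.predict(
        ml_model_id=ml_model_id,
        predict_endpoint=predict_endpoint,
        record=record,
    )
    return response["Prediction"]["predictedValue"]


class StubClient:
    """Hypothetical stand-in that returns the response shown on the slide."""

    def predict(self, ml_model_id, record, predict_endpoint):
        return {
            "Prediction": {
                "predictedValue": 13.284348,
                "details": {"Algorithm": "SGD", "PredictiveModelType": "REGRESSION"},
            }
        }


print(predicted_value(StubClient(), "my_model", "example_endpoint", {"key1": "value1"}))
```

Passing the client in as an argument keeps the parsing logic testable; in production the same function would receive the connection returned by `boto.connect_machinelearning()`.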
85. Batch predictions with Amazon EMR
Raw data in Amazon S3 → Process data with Amazon EMR → Aggregated data in Amazon S3 → Query for predictions with Amazon ML batch API → Predictions in Amazon S3 → Your application
86. Batch predictions with Amazon Redshift
Structured data in Amazon Redshift → Query for predictions with Amazon ML batch API → Predictions in Amazon S3 → Load predictions into Amazon Redshift, or read prediction results directly from Amazon S3 → Your application
87. Real-time predictions for interactive applications
Your application
Query for predictions with Amazon ML real-time API