Chris Hampartsoumian
Technology Evangelist - ASEAN
End to End Data Flows on the Cloud
Structured, Unstructured & Streaming
July 2015
How is Cloud Computing important for Big Data Applications?
How did Amazon get into cloud computing?
11 Regions
30 Availability Zones
53 Edge locations
AWS Global Infrastructure
Why are customers adopting cloud computing?
Variable expense
Replace capital
expenditure with variable
expense
Elastic capacity
No need to guess
capacity requirements
and over-provision
Speed and agility
Infrastructure in minutes
not weeks
Global Reach
Go global in minutes and
reach a global audience
Your Applications
Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync
Network: VPC, Direct Connect, Route 53
Human Interaction: Support, Web Console
API Interaction: Command Line, Libraries, SDKs
Database: DynamoDB, RDS, ElastiCache
Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit
Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory, KMS
Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS
Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
Compute: EC2, ELB, Auto Scaling, Lambda, ECS
Analytics: Kinesis, Data Pipeline, Redshift, EMR, Machine Learning
Storage: EBS, Glacier, CloudFront, EFS, S3
AWS Global Infrastructure: 11 Regions, 30 Availability Zones, 53 Edge Locations
[Figure: data stores placed by structure (low to high) and size (small to large): Traditional Database, Hadoop, NoSQL, MPP Database]
Structured: MPP Databases (Amazon Redshift)
Unstructured: Hadoop (Amazon EMR)
Streaming: Real-time Analysis (Amazon Kinesis)
• Standard SQL
• Optimized for fast analysis
• Very scalable
Amazon Redshift
Q1. What is it?
MPP SQL Database
Optimised for Analytics
Gigabytes to Petabytes
Fully relational
Fully managed
Q2. How does it work?
JDBC/ODBC
ID Name
1 John Smith
2 Jane Jones
3 Peter Black
4 Pat Partridge
5 Sarah Cyan
6 Brian Snail
Rows split across three groups:
1 John Smith, 4 Pat Partridge
2 Jane Jones, 5 Sarah Cyan
3 Peter Black, 6 Brian Snail
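The grouping shown here (rows 1 and 4, 2 and 5, 3 and 6) is what a round-robin spread over three groups would produce. The following is a minimal Python sketch under that assumption; the function name `distribute_round_robin` and the idea of three "slices" are illustrative choices, not a description of Redshift's internal implementation.

```python
from collections import defaultdict

# The six example rows from the slide.
rows = [
    (1, "John Smith"),
    (2, "Jane Jones"),
    (3, "Peter Black"),
    (4, "Pat Partridge"),
    (5, "Sarah Cyan"),
    (6, "Brian Snail"),
]


def distribute_round_robin(rows, num_slices):
    """Assign rows to slices in turn (hypothetical helper, assumed policy)."""
    slices = defaultdict(list)
    for position, row in enumerate(rows):
        slices[position % num_slices].append(row)
    return dict(slices)


slices = distribute_round_robin(rows, num_slices=3)
for slice_id, slice_rows in sorted(slices.items()):
    print(slice_id, [row_id for row_id, _ in slice_rows])
```

Each slice can then scan its own share of the rows in parallel, which is the general idea behind an MPP database.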
Dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
Column storage
• With row storage you do unnecessary I/O
• To get average Amount by State, you have to read everything
• With column storage, you only read the data you need
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
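To make the I/O difference concrete, here is a small Python sketch that computes the average Amount by State from the four example rows, once counting the values touched in a row layout and once in a column layout. The in-memory dictionaries and the helper `average_amount_by_state` are illustrative assumptions, not how Redshift stores data on disk.

```python
from collections import defaultdict

# Row storage: each record is kept together, so a scan touches every field.
row_store = [
    {"ID": 123, "Age": 20, "State": "CA", "Amount": 500},
    {"ID": 345, "Age": 25, "State": "WA", "Amount": 250},
    {"ID": 678, "Age": 40, "State": "FL", "Amount": 125},
    {"ID": 957, "Age": 37, "State": "WA", "Amount": 375},
]

# Column storage: one list per column, so a query reads only the columns it needs.
column_store = {
    name: [row[name] for row in row_store]
    for name in ("ID", "Age", "State", "Amount")
}


def average_amount_by_state(states, amounts):
    """Average the amounts per state from two parallel column lists."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sum, count]
    for state, amount in zip(states, amounts):
        totals[state][0] += amount
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


values_read_row = sum(len(row) for row in row_store)
values_read_col = len(column_store["State"]) + len(column_store["Amount"])
averages = average_amount_by_state(column_store["State"], column_store["Amount"])

print(values_read_row, values_read_col)
print(averages)
```

With four columns, the column layout touches half as many values for this query; the gap widens as tables get wider.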
Data compression
analyze compression listing;
Table   | Column         | Encoding
--------+----------------+----------
listing | listid         | delta
listing | sellerid       | delta32k
listing | eventid        | delta32k
listing | dateid         | bytedict
listing | numtickets     | bytedict
listing | priceperticket | delta32k
listing | totalprice     | mostly32
listing | listtime       | raw
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
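Several of the encodings listed for the `listing` table are delta-based. As a rough illustration of the idea only (the slides do not describe Redshift's on-disk format, and the sample values below are made up), delta encoding keeps the first value and then stores small differences instead of full values:

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for previous, current in zip(values, values[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    values = []
    running = 0
    for delta in deltas:
        running += delta
        values.append(running)
    return values


# Hypothetical, slowly increasing IDs of the kind a listid column might hold.
list_ids = [100001, 100002, 100003, 100005, 100008]
encoded = delta_encode(list_ids)
print(encoded)
print(delta_decode(encoded) == list_ids)
```

Small differences need fewer bytes than the full values, which is why sorted or slowly changing columns tend to compress well this way.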
Zone maps
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)
• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
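A minimal Python sketch of the zone-map idea, using only the values visible on the slide (the elided values are left out). The function `scan_range` and the list-of-lists block layout are assumptions made for illustration.

```python
# Three blocks of sorted values, using only the numbers visible on the slide.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# The zone map: one (min, max) pair per block.
zone_map = [(min(block), max(block)) for block in blocks]


def scan_range(blocks, zone_map, low, high):
    """Return values in [low, high], reading only blocks whose range overlaps."""
    matches = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # the zone map proves this block has nothing relevant
        blocks_read += 1
        matches.extend(value for value in block if low <= value <= high)
    return matches, blocks_read


matches, blocks_read = scan_range(blocks, zone_map, low=400, high=600)
print(zone_map)
print(matches, blocks_read)
```

For a query on values between 400 and 600, only the second block is read; the other two are skipped on the strength of their min and max alone.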
Q3. What’s good about it?
Performance, Scalability, Ease of Use, Cost
Performance Evaluation on 2B Rows
Aggregate by month: 02:08:35 | 00:35:46 | 00:00:12
Columns named on the slide: Traditional SQL Database, Amazon Redshift
[Figure: scaling from a single 160 GB DW2.L node to a 2 PB cluster of 8XL nodes]
Q4. How do I integrate with Redshift?
Works with your existing analysis tools
[Diagram: analysis tools connect to Amazon Redshift over JDBC/ODBC]
Loading data
[Diagram: S3, DynamoDB, EMR, and Linux shown alongside Redshift]
[Diagram: Source Systems → ETL → Amazon Redshift]
Input File → Functions (Hadoop cluster) → Output
1. Very Flexible
2. Very Scalable
3. Often Transient
Amazon Elastic MapReduce (EMR)
Q1. What is it?
Managed Hadoop
Input File → Functions (EMR cluster of EC2 instances) → Output
Q2. How does it work?
1. Put the data into S3
2. Choose: Hadoop distribution, # of nodes, types of nodes, Hadoop apps like Hive/Pig/HBase
3. Launch the cluster using the EMR console, CLI, SDK, or APIs
4. Get the output from S3
• You can easily resize the cluster
• And launch parallel clusters using the same data
• Use Spot nodes to save time and money
• When processing is complete, you can terminate the cluster (and stop paying)
Q3. What’s good about it?
Scalability, Cost & Ease of Use
EMR with spot instances
Scenario #1: Cost without Spot
Duration: 14 Hours
4 instances * 14 hrs * $0.50 = $28
Scenario #2: Cost with Spot
Duration: 7 Hours
4 instances * 7 hrs * $0.50 = $14 +
5 instances * 7 hrs * $0.25 = $8.75
Total = $22.75
Time Savings: 50%
Cost Savings: ~19%
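The arithmetic for the two scenarios can be checked with a few lines of Python. The instance counts, durations, and hourly rates are the ones given on the slide; the helper name `cluster_cost` is an illustrative assumption.

```python
def cluster_cost(instances, hours, hourly_rate):
    """Cost of running a group of identical instances for a number of hours."""
    return instances * hours * hourly_rate


# Scenario 1: four instances at the full rate for 14 hours.
without_spot = cluster_cost(4, 14, 0.50)

# Scenario 2: the same four instances for 7 hours, plus five Spot instances
# at the lower rate for the same 7 hours.
with_spot = cluster_cost(4, 7, 0.50) + cluster_cost(5, 7, 0.25)

time_savings = (14 - 7) / 14
cost_savings = (without_spot - with_spot) / without_spot

print(f"without Spot: ${without_spot:.2f}")
print(f"with Spot:    ${with_spot:.2f}")
print(f"time savings: {time_savings:.0%}")
print(f"cost savings: {cost_savings:.2%}")
```

On these figures the job finishes in half the time and costs $5.25 less, a saving of 18.75% against the $28 baseline.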
EMR cluster: Master instance group, Core instance group, Task instance group
Storage: HDFS, Amazon S3
Great for Spot Instances
The Hadoop Ecosystem
Amazon Kinesis
Q1. What is it?
A fully managed service for real-time processing of high-volume, streaming data.
Q2. How does it work?
[Diagram: data sources put records into a Kinesis stream that spans three Availability Zones; downstream uses are logging, metrics, analysis, and machine learning, with S3, DynamoDB, Redshift, and EMR as destinations]
Putting data into Kinesis
• Each shard: 1000 Tx per second, 1 MB per second, 50 KB payload per Tx
• Messages kept for 24 hours
• Simple PUT interface to store data in Kinesis
• A Partition Key is used to distribute the PUTs across Shards
• A unique Sequence # is created
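The per-shard limits and the partition-key routing can be sketched in Python. Kinesis maps a partition key to a shard by taking an MD5 hash of the key and finding the shard whose hash-key range contains it; the sketch below assumes the shards split the 128-bit hash space into equal ranges and treats 1 MB as 1,000,000 bytes. The function names `shards_needed` and `shard_for_key` are hypothetical, and nothing here calls the Kinesis service.

```python
import hashlib
import math


def shards_needed(records_per_sec, bytes_per_sec,
                  max_records_per_shard=1000, max_bytes_per_shard=1_000_000):
    """Estimate the shard count from the per-shard write limits on the slide."""
    by_records = math.ceil(records_per_sec / max_records_per_shard)
    by_bytes = math.ceil(bytes_per_sec / max_bytes_per_shard)
    return max(1, by_records, by_bytes)


def shard_for_key(partition_key, num_shards):
    """Map a partition key to a shard index, assuming equal hash-key ranges."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    # The last shard absorbs any remainder of the hash space.
    return min(hash_value // range_size, num_shards - 1)


num_shards = shards_needed(records_per_sec=2500, bytes_per_sec=1_500_000)
print(num_shards)
for key in ("device-1", "device-2", "device-3"):
    print(key, shard_for_key(key, num_shards))
```

Because the shard is derived from the key alone, every record with the same partition key lands on the same shard, so a key with very high traffic can overload one shard while others sit idle.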
Getting data out of Kinesis
Kinesis Client Library (KCL):
• Abstracts code from individual shards
• Starts a Kinesis Worker for each shard
• Increases and decreases workers
• Tracks a Worker’s location in the stream
Q3. What’s good about it?
• Easy Administration
• Real-time Performance
• High Throughput. Elastic
• Integration: S3, Redshift, DynamoDB, Storm, ElasticSearch
• Build Real-time Applications
• Low Cost
Amazon Machine Learning
A Legacy of Machine
Learning at Amazon
“Customers who bought this
also bought…”
Why Did We Build Amazon Machine Learning?
Three types of data-driven development
• Retrospective analysis and reporting: Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
• Here-and-now real-time processing and dashboards: Amazon Kinesis, Amazon EC2, AWS Lambda
• Predictions to enable smart applications
Machine learning and smart applications
• Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available
Your data + machine learning = smart applications
Smart applications by example
• Based on what you know about the user: Will they use your product?
• Based on what you know about an order: Is this order fraudulent?
• Based on what you know about a news article: What other articles are interesting?
Challenges to Building Smart Applications Today
• Expertise: limited supply of data scientists; expensive to hire or outsource
• Technology: many choices, few mainstays; difficult to use and scale
• Operationalization: complex and error-prone data workflows; custom platforms and APIs
What is Amazon Machine Learning?
Amazon Machine Learning
• Easy to use, managed machine learning service
built for developers
• Robust, powerful machine learning technology
based on Amazon’s internal systems
• Create models using your data already stored in
the AWS cloud
• Deploy models to production in seconds
Easy to use and developer-friendly
• Use the intuitive, powerful service console to build and
explore your initial models
• Data retrieval
• Model training, quality evaluation, fine-tuning
• Deployment and management
• Automate model lifecycle with fully featured APIs and
SDKs
• Java, Python, .NET, JavaScript, Ruby, PHP
• Easily create smart iOS and Android applications with AWS
Mobile SDK
Powerful machine learning technology
• Based on Amazon’s battle-hardened internal systems
• Not just the algorithms:
• Smart data transformations
• Input data and model quality alerts
• Built-in industry best practices
• Grows with your needs
• Train on up to 100 GB of data
• Generate billions of predictions
• Obtain predictions in batches or real-time
Integrated with AWS Data Ecosystem
• Access data that is stored in Amazon S3, Amazon
Redshift, or MySQL databases in RDS
• Output predictions to Amazon S3 for easy integration
with your data flows
• Use AWS Identity and Access Management (IAM) for
fine-grained data-access permission policies
Fully-managed model and prediction services
• End-to-end service, with no servers to provision and
manage
• One-click production model deployment
• Programmatically query model metadata to enable
automatic retraining workflows
• Monitor prediction usage patterns with Amazon
CloudWatch metrics
Pay-as-you-go and inexpensive
• Data analysis, model training, and evaluation:
$0.42/instance hour
• Batch predictions: $0.10/1000
• Real-time predictions: $0.10/1000
• + hourly capacity reservation charge
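A back-of-the-envelope estimate from these prices can be sketched in Python. The rates are the ones listed on the slide; the hourly capacity reservation charge for real-time predictions is left out because the slide gives no figure for it. The function name `estimate_cost` and the sample workload are hypothetical.

```python
def estimate_cost(compute_hours, batch_predictions, realtime_predictions,
                  compute_rate=0.42, price_per_1000=0.10):
    """Estimate charges in dollars, excluding the real-time capacity reservation."""
    compute = compute_hours * compute_rate
    batch = batch_predictions / 1000 * price_per_1000
    realtime = realtime_predictions / 1000 * price_per_1000
    return round(compute + batch + realtime, 2)


# Hypothetical month: 10 instance hours of analysis, training, and evaluation,
# one million batch predictions, and fifty thousand real-time predictions.
monthly = estimate_cost(compute_hours=10,
                        batch_predictions=1_000_000,
                        realtime_predictions=50_000)
print(f"${monthly:.2f}")
```

In this example the prediction volume, not the training time, dominates the bill.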
Three Supported Types of Predictions
• Binary Classification
• Predict the answer to a Yes/No question
• Multi-class classification
• Predict the correct category from a list
• Regression
• Predict the value of a numeric variable
How Do I Get Started Using Amazon Machine Learning?
Get Started Quickly
• Create, access, and manage all Amazon
ML entities through the AWS
Management Console
• Easily learn to build a model with the
tutorial dataset provided
• Add prediction capabilities to your iOS
and Android applications with AWS
Mobile SDK
• Use Amazon ML APIs, CLIs, or SDKs
Building smart applications with Amazon ML
1. Train model
2. Evaluate and optimize
3. Retrieve predictions
Step 1: Train model
- Create a Datasource object pointing to your data
- Explore and understand your data
- Transform data and train your model
Explore and understand your data
Train your model
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> model = ml.create_ml_model(
...     ml_model_id='my_model',
...     ml_model_type='REGRESSION',
...     training_data_source_id='my_datasource')
Step 2: Evaluate and optimize
- Understand model quality
- Adjust model interpretation
Explore model quality
Fine-tune model interpretation
Step 3: Retrieve predictions
- Batch predictions
- Real-time predictions
Batch predictions
• Asynchronous, large-volume prediction generation
• Request through service console or API
• Best for applications that deal with batches of data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> batch_prediction = ml.create_batch_prediction(
...     batch_prediction_id='my_batch_prediction',
...     batch_prediction_data_source_id='my_datasource',
...     ml_model_id='my_model',
...     output_uri='s3://examplebucket/output/')
Real-time predictions
• Synchronous, low-latency, high-throughput prediction generation
• Request through service API or server or mobile SDKs
• Best for interactive applications that deal with individual data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> ml.predict(
...     ml_model_id='my_model',
...     predict_endpoint='example_endpoint',
...     record={'key1': 'value1', 'key2': 'value2'})
{
    'Prediction': {
        'predictedValue': 13.284348,
        'details': {
            'Algorithm': 'SGD',
            'PredictiveModelType': 'REGRESSION'
        }
    }
}
Architecture Patterns for Smart
Applications
Batch predictions with Amazon EMR
1. Raw data in Amazon S3
2. Process data with Amazon EMR
3. Aggregated data in Amazon S3
4. Query for predictions with Amazon ML batch API
5. Predictions in Amazon S3
6. Your application
Batch predictions with Amazon Redshift
1. Structured data in Amazon Redshift
2. Query for predictions with Amazon ML batch API
3. Predictions in Amazon S3
4. Load predictions into Amazon Redshift -or- read prediction results directly from Amazon S3
5. Your application
Real-time predictions for interactive applications
1. Your application
2. Query for predictions with Amazon ML real-time API
Thank You!
aws.amazon.com/big-data
@AWSCloudSEAsia
Chris Hampartsoumian
Technology Evangelist ASEAN


Building a Big Data & Analytics Platform using AWS

  • 1. End to End Data Flows on the Cloud: Structured, Unstructured & Streaming. Chris Hampartsoumian, Technology Evangelist - ASEAN. July 2015
  • 2. How is Cloud Computing important for Big Data Applications?
  • 3. How did Amazon get into cloud computing?
  • 4. 11 Regions 30 Availability Zones 53 Edge locations AWS Global Infrastructure
  • 5. Why are customers adopting cloud computing? Variable expense: Replace capital expenditure with variable expense. Elastic capacity: No need to guess capacity requirements and over-provision. Speed and agility: Infrastructure in minutes not weeks. Global Reach: Go global in minutes and reach a global audience.
  • 6. Your Applications. Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync. Network: VPC, Direct Connect, Route 53. Interaction (human and API): Support, Web Console, Command Line, Libraries, SDKs. Database: DynamoDB, RDS, ElastiCache. Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit. Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory, KMS. Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS. Enterprise Applications: WorkSpaces, WorkMail, WorkDocs. Compute: EC2, ELB, Auto Scaling, Lambda, ECS. Analytics: Kinesis, Data Pipeline, Redshift, EMR, Machine Learning. Storage: EBS, Glacier, CloudFront, EFS, S3. AWS Global Infrastructure: 11 Regions, 30 Availability Zones, 53 Edge Locations.
  • 8. Structured: MPP Databases (Amazon Redshift). Unstructured: Hadoop (Amazon EMR). Streaming: Real-time Analysis (Amazon Kinesis).
  • 9. • Standard SQL • Optimized for fast analysis • Very scalable
  • 12. Amazon Redshift: MPP SQL Database. Optimised for Analytics. Gigabytes to Petabytes. Fully relational. Fully managed.
  • 13. Q2. How does it work?
  • 15. JDBC/ODBC. Table (ID, Name): 1 John Smith; 2 Jane Jones; 3 Peter Black; 4 Pat Partridge; 5 Sarah Cyan; 6 Brian Snail. Rows as grouped on the slide: (1 John Smith, 4 Pat Partridge); (2 Jane Jones, 5 Sarah Cyan); (3 Peter Black, 6 Brian Snail)
  • 16. Dramatically reduces I/O: • Column storage • Data compression • Zone maps. With row storage you do unnecessary I/O: to get average Amount by State, you have to read everything. Example table (ID, Age, State, Amount): (123, 20, CA, 500); (345, 25, WA, 250); (678, 40, FL, 125); (957, 37, WA, 375)
  • 17. Dramatically reduces I/O: • Column storage • Data compression • Zone maps. With column storage, you only read the data you need. Example table (ID, Age, State, Amount): (123, 20, CA, 500); (345, 25, WA, 250); (678, 40, FL, 125); (957, 37, WA, 375)
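The row-versus-column point on slides 16 and 17 can be illustrated with a minimal Python sketch. It uses the four example rows from the slide; the in-memory lists, the function name `average_amount_by_state`, and the "values read" counters are illustrative assumptions, not how Redshift stores or scans data internally.

```python
from collections import defaultdict

# The four example rows from the slide: (ID, Age, State, Amount).
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# Column layout: one list per column (a toy stand-in for column storage).
columns = {
    "ID": [r[0] for r in rows],
    "Age": [r[1] for r in rows],
    "State": [r[2] for r in rows],
    "Amount": [r[3] for r in rows],
}


def average_amount_by_state(state_col, amount_col):
    """Average Amount per State, touching only the two columns passed in."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sum, count]
    for state, amount in zip(state_col, amount_col):
        totals[state][0] += amount
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


# A row store has to read every field of every row to answer the query.
values_read_row_store = sum(len(r) for r in rows)
# A column store reads only the State and Amount columns.
values_read_column_store = len(columns["State"]) + len(columns["Amount"])

result = average_amount_by_state(columns["State"], columns["Amount"])
```

On this toy table the column layout touches half as many values; the gap widens as tables gain columns that a query never references.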
  • 18. Dramatically reduces I/O: • Column storage • Data compression • Zone maps. COPY compresses automatically. You can analyze and override. More performance, less cost. analyze compression listing; returns (Table | Column | Encoding): listing | listid | delta; listing | sellerid | delta32k; listing | eventid | delta32k; listing | dateid | bytedict; listing | numtickets | bytedict; listing | priceperticket | delta32k; listing | totalprice | mostly32; listing | listtime | raw
  • 19. Dramatically reduces I/O: • Column storage • Data compression • Zone maps. Track the minimum and maximum value for each block. Skip over blocks that don’t contain relevant data. Example blocks with their min/max: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324); 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623); 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)
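The zone-map idea on slide 19 (track each block's minimum and maximum, then skip blocks that cannot match) can be sketched as follows. The blocks hold only the values visible on the slide, with the elided ones omitted; `scan_range` is a hypothetical helper, and the sketch illustrates the technique rather than Redshift's actual implementation.

```python
# Three sorted blocks using the values visible on the slide
# (values hidden behind the slide's "…" are simply left out).
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# Zone map: the (min, max) pair tracked for each block.
zone_map = [(min(block), max(block)) for block in blocks]


def scan_range(lo, hi):
    """Return values in [lo, hi] and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for (block_min, block_max), block in zip(zone_map, blocks):
        # Skip the block when its value range cannot overlap the query range.
        if block_max < lo or block_min > hi:
            continue
        blocks_read += 1
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, blocks_read


matches, blocks_read = scan_range(400, 600)
```

For a filter between 400 and 600, only the middle block overlaps the range, so the other two blocks are never read.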
  • 20. Q3. What’s good about it? Performance, Scalability, Ease of Use, Cost
  • 21. Performance Evaluation on 2B Rows: Aggregate by month. 02:08:35 00:35:46 00:00:12 Traditional SQL Database Amazon Redshift
  • 22. 160 GB (DW2.L) to 2 PB (cluster of 8XL nodes)
  • 23. Q4. How do I integrate with Redshift?
  • 24. Works with your existing analysis tools: JDBC/ODBC, Amazon Redshift
  • 27. Structured: MPP Databases (Amazon Redshift). Unstructured: Hadoop (Amazon EMR). Streaming: Real-time Analysis (Amazon Kinesis).
  • 28. Input File Hadoop cluster Functions Output 1. Very Flexible 2. Very Scalable 3. Often Transient
  • 30. Q1. What is it? Managed Hadoop
  • 32. Q2. How does it work?
  • 33. EMR, EMR Cluster, S3. 1. Put the data into S3 2. Choose: Hadoop distribution, # of nodes, types of nodes, Hadoop apps like Hive/Pig/HBase 3. Launch the cluster using the EMR console, CLI, SDK, or APIs 4. Get the output from S3
  • 34. EMR, EMR Cluster, S3. You can easily resize the cluster and launch parallel clusters using the same data
  • 35. EMR, EMR Cluster, S3. Use Spot nodes to save time and money
  • 36. EMR Cluster, S3. When processing is complete, you can terminate the cluster (and stop paying)
  • 37. Q3. What’s good about it? Scalability, Cost & Ease of Use
  • 38. EMR with spot instances. Scenario #1: Duration: 14 Hours. Scenario #2: Duration: 7 Hours. #1: Cost without Spot: 4 instances * 14 hrs * $0.50 = $28. #2: Cost with Spot: 4 instances * 7 hrs * $0.50 = $14 + 5 instances * 7 hrs * $0.25 = $8.75; Total = $22.75. Time Savings: 50%. Cost Savings: ~19%
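The arithmetic on slide 38 can be reproduced with a short Python sketch. The instance counts, durations, and hourly rates are the slide's example figures; the helper `cluster_cost` and the constant names are hypothetical. Computed against the Scenario #1 cost of $28, the $5.25 difference works out to 18.75%.

```python
ON_DEMAND_RATE = 0.50  # $/instance-hour, example figure from the slide
SPOT_RATE = 0.25       # $/instance-hour, example figure from the slide


def cluster_cost(groups):
    """Sum instances * hours * hourly rate over each instance group."""
    return sum(count * hours * rate for count, hours, rate in groups)


# Scenario #1: 4 on-demand instances for 14 hours.
cost_without_spot = cluster_cost([(4, 14, ON_DEMAND_RATE)])

# Scenario #2: the same 4 instances plus 5 Spot instances, done in 7 hours.
cost_with_spot = cluster_cost([(4, 7, ON_DEMAND_RATE), (5, 7, SPOT_RATE)])

time_savings = 1 - 7 / 14
cost_savings = 1 - cost_with_spot / cost_without_spot

print(f"without Spot: ${cost_without_spot:.2f}")
print(f"with Spot:    ${cost_with_spot:.2f}")
print(f"time saved:   {time_savings:.0%}, cost saved: {cost_savings:.2%}")
```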
  • 39. Master instance group EMR cluster Task instance group Core instance group HDFS HDFS Amazon S3 Great for Spot Instances
  • 41. Structured: MPP Databases (Amazon Redshift). Unstructured: Hadoop (Amazon EMR). Streaming: Real-time Analysis (Amazon Kinesis).
  • 44. Kinesis: A fully managed service for real-time processing of high-volume, streaming data.
  • 45. Q2. How does it work?
  • 47. Putting data into Kinesis • Each shard: 1000 Tx Per Second, 1MB Per Second, 50KB Payload Per Tx • Messages kept for 24 hours • Simple PUT interface to store data in Kinesis • A Partition Key is used to distribute the PUTs across Shards • A unique Sequence # is created
  • 48. Getting data out of Kinesis. Kinesis Client Library (KCL): • Abstracts code from individual shards • Starts a Kinesis Worker for each shard • Increases and decreases workers • Tracks a Worker’s location in the stream
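How a partition key spreads PUTs across shards (slide 47) can be sketched in Python. Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below assumes the shards split the space into equal ranges, which is a simplification; `shard_for_key` and the sample keys are hypothetical and not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the few hash keys past the last full range land on the last shard.
    return min(hash_key // range_size, shard_count - 1)


# Records with the same partition key always land on the same shard,
# while many distinct keys spread across all shards.
assignments = {f"user-{i}": shard_for_key(f"user-{i}", 4) for i in range(1000)}
```

Because the mapping depends only on the key, choosing a high-cardinality partition key (for example a user or device ID) is what keeps the load even across shards.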
  • 49. Q3. What’s good about it?
  • 50. Easy Administration. Real-time Performance. High Throughput, Elastic. Integration: S3, Redshift, DynamoDB, Storm, ElasticSearch. Build Real-time Applications. Low Cost.
  • 52. A Legacy of Machine Learning at Amazon: “Customers who bought this also bought…”
  • 53. Why Did We Build Amazon Machine Learning?
  • 54. Three types of data-driven development Retrospective analysis and reporting Amazon Redshift Amazon RDS Amazon S3 Amazon EMR
  • 55. Three types of data-driven development Retrospective analysis and reporting Here-and-now real-time processing and dashboards Amazon Kinesis Amazon EC2 AWS Lambda Amazon Redshift, Amazon RDS Amazon S3 Amazon EMR
  • 56. Three types of data-driven development Retrospective analysis and reporting Here-and-now real-time processing and dashboards Predictions to enable smart applications Amazon Kinesis Amazon EC2 AWS Lambda Amazon Redshift, Amazon RDS Amazon S3 Amazon EMR
  • 57. Machine learning and smart applications • Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available
  • 58. Machine learning and smart applications • Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available. Your data + machine learning = smart applications
  • 59. Smart applications by example. Based on what you know about the user: Will they use your product?
  • 60. Smart applications by example. Based on what you know about the user: Will they use your product? Based on what you know about an order: Is this order fraudulent?
  • 61. Smart applications by example. Based on what you know about the user: Will they use your product? Based on what you know about an order: Is this order fraudulent? Based on what you know about a news article: What other articles are interesting?
  • 62. Challenges to Building Smart Applications Today. Expertise: Limited supply of data scientists; Expensive to hire or outsource. Technology: Many choices, few mainstays; Difficult to use and scale. Operationalization: Complex and error-prone data workflows; Custom platforms and APIs.
  • 63. What is Amazon Machine Learning?
  • 64. Amazon Machine Learning • Easy to use, managed machine learning service built for developers • Robust, powerful machine learning technology based on Amazon’s internal systems • Create models using your data already stored in the AWS cloud • Deploy models to production in seconds
  • 65. Easy to use and developer-friendly • Use the intuitive, powerful service console to build and explore your initial models • Data retrieval • Model training, quality evaluation, fine-tuning • Deployment and management • Automate model lifecycle with fully featured APIs and SDKs • Java, Python, .NET, JavaScript, Ruby, PHP • Easily create smart iOS and Android applications with AWS Mobile SDK
  • 66. Powerful machine learning technology • Based on Amazon’s battle-hardened internal systems • Not just the algorithms: • Smart data transformations • Input data and model quality alerts • Built-in industry best practices • Grows with your needs • Train on up to 100 GB of data • Generate billions of predictions • Obtain predictions in batches or real-time
  • 67. Integrated with AWS Data Ecosystem • Access data that is stored in Amazon S3, Amazon Redshift, or MySQL databases in RDS • Output predictions to Amazon S3 for easy integration with your data flows • Use AWS Identity and Access Management (IAM) for fine-grained data-access permission policies
  • 68. Fully-managed model and prediction services • End-to-end service, with no servers to provision and manage • One-click production model deployment • Programmatically query model metadata to enable automatic retraining workflows • Monitor prediction usage patterns with Amazon CloudWatch metrics
  • 69. Pay-as-you-go and inexpensive • Data analysis, model training, and evaluation: $0.42/instance hour • Batch predictions: $0.10/1000 • Real-time predictions: $0.10/1000 • + hourly capacity reservation charge
  • 70. Three Supported Types of Predictions • Binary Classification • Predict the answer to a Yes/No question • Multi-class classification • Predict the correct category from a list • Regression • Predict the value of a numeric variable
  • 71. How Do I Get Started Using Amazon Machine Learning?
  • 72. Get Started Quickly • Create, access, and manage all Amazon ML entities through the AWS Management Console • Easily learn to build a model with the tutorial dataset provided • Add prediction capabilities to your iOS and Android applications with AWS Mobile SDK • Use Amazon ML APIs, CLIs, or SDKs
  • 73. Building smart applications with Amazon ML: 1. Build model 2. Evaluate and optimize 3. Retrieve predictions
  • 74. Building smart applications with Amazon ML: 1. Train model 2. Evaluate and optimize 3. Retrieve predictions. Train model: - Create a Datasource object pointing to your data - Explore and understand your data - Transform data and train your model
  • 76. Train your model >>> import boto >>> ml = boto.connect_machinelearning() >>> model = ml.create_ml_model( ml_model_id='my_model', ml_model_type='REGRESSION', training_data_source_id='my_datasource')
  • 77. Building smart applications with Amazon ML: 1. Train model 2. Evaluate and optimize 3. Retrieve predictions. Evaluate and optimize: - Understand model quality - Adjust model interpretation
  • 81. Building smart applications with Amazon ML: 1. Train model 2. Evaluate and optimize 3. Retrieve predictions. Retrieve predictions: - Batch predictions - Real-time predictions
  • 82. Batch predictions • Asynchronous, large-volume prediction generation • Request through service console or API • Best for applications that deal with batches of data records >>> import boto >>> ml = boto.connect_machinelearning() >>> model = ml.create_batch_prediction( batch_prediction_id='my_batch_prediction', batch_prediction_data_source_id='my_datasource', ml_model_id='my_model', output_uri='s3://examplebucket/output/')
  • 83. Real-time predictions • Synchronous, low-latency, high-throughput prediction generation • Request through service API or server or mobile SDKs • Best for interactive applications that deal with individual data records >>> import boto >>> ml = boto.connect_machinelearning() >>> ml.predict( ml_model_id='my_model', predict_endpoint='example_endpoint', record={'key1': 'value1', 'key2': 'value2'}) { 'Prediction': { 'predictedValue': 13.284348, 'details': { 'Algorithm': 'SGD', 'PredictiveModelType': 'REGRESSION' } } }
  • 84. Architecture Patterns for Smart Applications
  • 85. Batch predictions with Amazon EMR Query for predictions with Amazon ML batch API Process data with Amazon EMR Raw data in Amazon S3 Aggregated data in Amazon S3 Predictions in Amazon S3 Your application
  • 86. Batch predictions with Amazon Redshift Structured data In Amazon Redshift Load predictions into Amazon Redshift -or- Read prediction results directly from Amazon S3 Predictions in Amazon S3 Query for predictions with Amazon ML batch API Your application
  • 87. Real-time predictions for interactive applications Your application Query for predictions with Amazon ML real-time API