2. AWS Government, Education, and Nonprofit Symposium
Washington, DC | June 25-26, 2015
Agenda
• Goal
• Concept
• Design
• Data Flow
• Scaling
• Security
• Metrics
• Cost Optimization
**Let’s Build Something!!**
Goal
• Build a rapid prototype using native
features available in C2S today to ingest,
process, and analyze a large data set
Disclaimer
• The purpose of this prototype is to display the
ease and speed at which new capabilities
can be created using AWS
• The prototype displayed is not an AWS
concept product – this is a simple demo
• Some AWS partners offer products with
similar (and more extensive) capabilities;
these can be found in the AWS Marketplace
Concept
• Ingest a public data set from news media
outlets (e.g., the GDELT project)
• Perform some processing/analysis on the
data (e.g., a word cloud)
• Display the end product to the customer in an
easily consumable format (e.g., a map keyed to the
geographic coordinates of each article)
Design
• Your mileage may vary (YMMV)
• Internet gateway with public IP addresses for
access to AWS endpoints (e.g., Amazon S3,
Amazon SQS, Amazon SNS, etc.)
• Baked AMIs with boot scripts downloaded from S3
• Amazon EC2 instance roles for resource access
Design
[Architecture diagram: S3, SNS, SQS, and CloudWatch, with two Auto Scaling groups, Monitor (1:1) and Worker Fleet (0:n), across two Availability Zones]
Workflow
• Add GDELT article dump to S3 bucket
• SNS notification sent to SQS queue on new object addition
• Monitor instance parses each article into an SQS message
• SQS queue size triggers Auto Scaling group for worker fleet
• Workers begin polling SQS queue, downloading articles,
generating word clouds, writing to DynamoDB table
• Work completes, queue size decreases, and the Auto Scaling
group terminates the workers
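As a rough illustration of the second and third workflow steps, the sketch below pulls the bucket and object key out of an SQS message body that wraps an SNS notification carrying an S3 event. It assumes raw message delivery is not enabled on the subscription, so the S3 event arrives as a JSON string in the SNS envelope's "Message" field. The function name, bucket name, and key are hypothetical, and this is not the monitor code used in the demo.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an SQS body wrapping an SNS-delivered S3 event."""
    envelope = json.loads(sqs_body)              # SNS envelope as delivered to SQS
    s3_event = json.loads(envelope["Message"])   # the S3 event is itself a JSON string
    objects = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical sample, trimmed to the fields read above.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "Records": [{
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-gdelt-ingest"},
                "object": {"key": "dumps/2015-06-08+articles.csv"},
            },
        }]
    }),
})

print(extract_s3_objects(sample_body))
```

A monitor process could call a function like this on each message it receives, then fetch the named object and fan its articles out as individual SQS messages.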
Data Flow
[Data flow diagram: S3 bucket, SNS, SQS, Monitor (ASG 1:1), SQS, Worker Fleet (ASG 0:n), S3, DynamoDB]
Data Flow
• What does an SQS message look like?
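The message shown on this slide is not captured in the transcript. Purely as an assumption, a per-article work item might carry the article URL, its coordinates, and a date as a small JSON document. Every field name below is hypothetical and chosen only to match the concept slide (article plus geographic coordinates).

```python
import json


def build_article_message(url, latitude, longitude, published):
    """Serialize one article's work item as an SQS message body (a JSON string)."""
    return json.dumps({
        "url": url,              # where a worker would download the article from
        "lat": latitude,         # coordinates used later for the map display
        "lon": longitude,
        "published": published,  # article date as an ISO-style string
    })


def parse_article_message(body):
    """Decode a body produced by build_article_message."""
    item = json.loads(body)
    return item["url"], (item["lat"], item["lon"]), item["published"]


# Hypothetical values for illustration only.
body = build_article_message("http://example.com/article", 38.9, -77.04, "2015-06-08")
print(parse_article_message(body))
```

Keeping the message to a pointer plus metadata, rather than the article text itself, keeps each message small and leaves the download to the workers.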
Scaling
• Use of AWS-managed services
– S3, SQS, & SNS are distributed across
multiple Availability Zones out of the box
– No infrastructure to maintain
• Use of Auto Scaling groups
– Scale on metrics, hands-off
Scaling
• Two Auto Scaling groups
– Min 1, Max 1 ensures that our monitor instance is
replaced after an instance loss or AZ outage, so it
stays available to process event notifications when
new data appears in our S3 bucket
– Our worker fleet scales up/down based on the
number of messages waiting to be processed in
our SQS queue
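The relationship between queue depth and the 0:n worker fleet can be sketched as a simple function. In practice the scaling is driven by CloudWatch alarms and Auto Scaling policies rather than by code like this; the function and the messages_per_worker and max_workers parameters are assumptions used only to show the shape of the behavior.

```python
import math


def desired_workers(queue_depth, messages_per_worker, max_workers):
    """Illustrative target fleet size for a given number of waiting messages."""
    if queue_depth <= 0:
        return 0  # empty queue: the 0:n group can scale in to zero
    # One worker per messages_per_worker waiting messages, capped at the group max.
    return min(max_workers, math.ceil(queue_depth / messages_per_worker))


# Hypothetical parameters: 500 messages per worker, at most 100 workers.
for depth in (0, 1, 1200, 183_064):
    print(depth, desired_workers(depth, 500, 100))
```

The monitor group needs no such calculation, since its minimum and maximum are both fixed at 1.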
Security
• Amazon EC2 instance roles
• Amazon S3 bucket policies
• Amazon SQS queue access policies
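As one example of the third bullet, a queue access policy that lets a single SNS topic deliver messages to a queue might look like the sketch below. The ARNs and account number are placeholders, the helper name is hypothetical, and the service principal and ARN formats may differ by partition or region, so treat this as a starting point rather than the policy used in the demo.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Build a queue policy allowing one SNS topic to send messages to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowTopicToSendMessages",
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restrict delivery to the one topic that carries the S3 notifications.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }


# Placeholder ARNs for illustration only.
policy = sns_to_sqs_policy(
    "arn:aws:sqs:us-west-2:123456789012:example-ingest-queue",
    "arn:aws:sns:us-west-2:123456789012:example-new-object-topic",
)
print(json.dumps(policy, indent=2))
```

The resulting JSON string is what would be set as the queue's Policy attribute; instance roles and bucket policies follow the same statement structure with different principals, actions, and resources.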
Demo
• Let’s go to the videotape
Some Metrics
Attribute                                Short Run     Long Run
Date range of articles                   2015/06/08    2015/06/04-09
Total number of objects processed        183,064       867,393
Cumulative size of downloaded articles   15.1 GB       80.2 GB
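A quick back-of-the-envelope check on these figures gives the average article size per run. The sketch assumes decimal units (1 GB = 1,000,000 KB); the slide does not say which convention was used.

```python
# Figures from the metrics table: (objects processed, cumulative size in GB).
RUNS = {
    "Short Run": (183_064, 15.1),
    "Long Run": (867_393, 80.2),
}


def average_object_kb(objects, total_gb):
    """Average object size in KB, assuming decimal units (1 GB = 1,000,000 KB)."""
    return total_gb * 1_000_000 / objects


for name, (objects, total_gb) in RUNS.items():
    print(f"{name}: {average_object_kb(objects, total_gb):.1f} KB per object")
```

Under that assumption the averages come out to roughly 82 KB and 92 KB per article, so the workload consists of many small downloads rather than a few large ones.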
Cost Optimization
• Generate an AWS CloudFormation template
to set up and execute test runs
• Use a combination of resource tags and
detailed billing reports (DBRs) to capture
costs per run
• Use Amazon CloudWatch metrics to ensure
adequate resource utilization
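To show how tags and a billing report combine into a cost per run, the sketch below sums the cost column of a CSV by a tag column. The layout is a simplified stand-in for a detailed billing report: the column names "user:Run" and "Cost", the tag values, and all amounts are made up for illustration, and a real report has many more columns.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def cost_per_run(report_csv, tag_column="user:Run", cost_column="Cost"):
    """Sum line-item costs by the value of a resource tag column."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(report_csv)):
        run = row.get(tag_column) or "(untagged)"
        totals[run] += Decimal(row[cost_column])
    return dict(totals)


# Made-up sample rows; not real billing data.
SAMPLE_REPORT = """ProductName,user:Run,Cost
Amazon Elastic Compute Cloud,run-a,1.00
Amazon Simple Queue Service,run-a,0.25
Amazon Elastic Compute Cloud,run-b,2.50
Amazon Simple Storage Service,,0.10
"""

for run, total in sorted(cost_per_run(SAMPLE_REPORT).items()):
    print(run, total)
```

Rows without the tag are grouped separately, which makes it easy to spot resources that a template forgot to tag.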
Cost Optimization
• Establish 4 lanes using different instance
types: r3.8xlarge, m3.xlarge, m3.medium,
and t2.micro.
• Aim for 80-100% average CPU utilization
across instance types
• Look for bottlenecks as you move things
around
Data Flow – one lane
[Data flow diagram: S3 bucket, SNS, SQS, Monitor (ASG 1:1), SQS, Worker Fleet (ASG 0:n), S3, DynamoDB]
SQS Queue & Instances – t2.micro
CPU Utilization – t2.micro
Data Flow – all lanes
[Data flow diagram: four lanes, each with a Monitor ASG (1:1) and a Worker Fleet ASG (0:n)]
SQS Queue & Instances – all lanes
CPU Utilization – all lanes
Cost Comparison – Short Run
Instance Type   Total Instances   Time to Complete   Amazon EC2 Cost
t2.micro        100               0:36               $1.30
m3.medium       100               0:59               $7.00
m3.xl           25                0:41               $7.00
r3.8xl          1                 1:32               $5.60
Prices based on us-west-2 region
Cost Comparison – Long Run
Instance Type   Total Instances   Time to Complete   Amazon EC2 Cost
t2.micro        100               22:53              $29.90
m3.medium       100               17:14              $120.60
m3.xl           25                16:29              $113.90
r3.8xl          1                 19:53              $56.00
Prices based on us-west-2 region
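One way to read the two cost tables is to back out the hourly rate each row implies. The sketch below assumes that "Time to Complete" is in H:MM and that each instance is charged for whole instance-hours with partial hours rounded up, which was how EC2 On-Demand usage was billed at the time. Whether the presenters computed their figures this way is an assumption.

```python
# (instance type, instances, time to complete as H:MM, EC2 cost in USD), from the tables.
SHORT_RUN = [("t2.micro", 100, "0:36", 1.30), ("m3.medium", 100, "0:59", 7.00),
             ("m3.xl", 25, "0:41", 7.00), ("r3.8xl", 1, "1:32", 5.60)]
LONG_RUN = [("t2.micro", 100, "22:53", 29.90), ("m3.medium", 100, "17:14", 120.60),
            ("m3.xl", 25, "16:29", 113.90), ("r3.8xl", 1, "19:53", 56.00)]


def billed_hours(elapsed):
    """Whole hours charged for an H:MM duration, rounding any partial hour up."""
    hours, minutes = (int(part) for part in elapsed.split(":"))
    return hours + (1 if minutes else 0)


def implied_hourly_rate(instances, elapsed, cost):
    """Cost per instance-hour implied by one table row."""
    return cost / (instances * billed_hours(elapsed))


for label, rows in (("Short Run", SHORT_RUN), ("Long Run", LONG_RUN)):
    for instance_type, instances, elapsed, cost in rows:
        rate = implied_hourly_rate(instances, elapsed, cost)
        print(f"{label} {instance_type}: ${rate:.3f} per instance-hour")
```

Under these assumptions the t2.micro rows work out to about $0.013 per instance-hour in both runs. The short-run costs are therefore dominated by the one-hour minimum charge per instance, not by the minutes actually used.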
Other Cost Factors
• Amazon DynamoDB
• Amazon SQS
• Storage
• Bandwidth
Final Thoughts
• The core of this demo took one person-
week to implement
• Using native features of C2S services to
implement applications will save time and
money
References
• The GDELT Project
http://gdeltproject.org/
• A simple Python word cloud
https://github.com/amueller/word_cloud
• Keyhole Markup Language (KML)
http://www.opengeospatial.org/standards/kml
Thank You.
This presentation will be loaded to SlideShare the week following the Symposium.
http://www.slideshare.net/AmazonWebServices
Editor's Notes
Thank you, everyone, for coming. This session is on Tech Tips for C2S. My name is [self] and I am a Solutions Architect with Amazon Web Services, and I am part of a larger team of Solutions Architects focused on helping customers build and deploy or migrate solutions to C2S. Call out Troy regarding the demo/coding.
For those of you who’ve attended other Tech Tips sessions we’ve had, this one is a little different. Usually we focus on one service and provide examples of how to take advantage of key features of that service. With this session we chose to build something on AWS. We are going to walk through our experience rapidly building, deploying, and optimizing a prototype on the AWS cloud using only services and service features that are deployed in C2S today.
Our goal for this prototype was to use as many native features of C2S services as possible to rapidly build a working, functional prototype capable of ingesting, processing and analyzing large data sets. You’ll see as we move through the presentation that our goal was to focus on scalability by using the elasticity of AWS services.
showcase AWS services and features you can use to deploy services quickly and easily
this prototype is not a concept product of AWS
if you are in search of mature products with these types of capabilities, I encourage you to check out our AWS Marketplace
Simple concept: ingest a large data set, perform some processing/analysis on the data, and display the results of that analysis in an easily consumable user interface
First we needed to find a large data set. While AWS has a number of publicly hosted datasets, we wanted to demo a system that was pulling data from a wide variety of Internet sources as opposed to just data that already resided on AWS infrastructure. For that, we chose to use the GDELT project.
Second we needed a way to analyze this data. We chose a word cloud.
Finally we wanted an easily consumable format to display the data, and for that we chose a geographic display.
When you are using services with a broad selection of features like AWS provides, there are many implementation options available. The choices we made do not represent the only options available, merely our implementation decisions.
For instance: we chose to deploy with an Internet gateway and public IP addresses, we chose to bake custom AMIs for our Auto Scaling groups, and we chose to use EC2 instance roles to access resources. For each of these decisions, there are multiple alternatives.
Two Auto Scaling groups: 1:1 and 0:n
1:1 is for our monitoring instance.
0:n provides on-demand processing/cost-savings/elasticity/scalability
This is for the short run using t2.micros.