This document discusses strategies for logging at scale. It notes that logging presents challenges around temporary storage, data capture, permanent storage, and visualization. The document recommends starting with SQL databases and using NoSQL stores like Elasticsearch for very large datasets or fast data ingest. It presents Amazon Kinesis, Kinesis Firehose, and Amazon Elasticsearch Service as tools to help with data capture, transport, and search. Visualization can be done with Kibana or by loading data into Redshift for use with existing BI tools. The key lessons are to reuse existing technologies when possible and to choose the right tool for each part of the logging pipeline.
7. Stealing Content…
‘Your First 10m Users’
ARC301 – re:Invent 2015
http://bitly.com/2015arc301
Joel Williams, AWS Solutions Architect
8. >1 User
• Amazon Route 53 for DNS
• A single Elastic IP
• A single Amazon EC2 instance
• With full stack on this host
• Web app
• Database
• Management
• And so on…
[Diagram: User → Amazon Route 53 → Elastic IP → Amazon EC2 instance]
9. >1 User
• A single place to read logs from
[Diagram: User → Amazon Route 53 → Elastic IP → Amazon EC2 instance]
15. Users >1000
[Diagram: User → Amazon Route 53 → Elastic Load Balancer → Web Instances in two Availability Zones → RDS DB Instance, Active/Standby (Multi-AZ)]
32. Why start with SQL?
• Established and well-worn technology.
• Lots of existing code, communities, books, and tools.
• You aren’t going to break SQL DBs in your first 10 million users. No, really, you won’t.*
• Clear patterns for scalability (especially in analytics)
*Unless you are doing something SUPER peculiar with the data or you have MASSIVE amounts of it… but even then SQL will have a place in your stack.
34. Why might you need NoSQL?
• Super low-latency applications
• Metadata-driven datasets
• Highly nonrelational data
• Need schema-less data constructs*
• Massive amounts of data (again, in the TB range)
• Rapid ingest of data (thousands of records/sec)
*Need != “it’s easier to do dev without schemas”
37. Three Problems of Persistence
• Somewhere to stage
• Somewhere to live
• Somewhere to search
38. Log Dispatcher Architecture Revisited
[Diagram: App Servers → Kinesis Firehose → Log Index (Elasticsearch) → Visualisation; Firehose also delivers JSON to Amazon S3]
39. Amazon S3
• Simple Storage Service
• Canonical logging target for ELB, CloudFront, etc.
• Virtually unlimited amounts of storage
• Support for Lambda operations
• Very fast – ideal for feeding other services (Redshift, EMR/Hadoop)
• Data can be automatically pushed here from Amazon Kinesis Firehose
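The “support for Lambda operations” bullet means S3 can invoke a Lambda function whenever a log object lands. A minimal sketch of such a handler, assuming the standard S3 event-notification shape; the bucket and key names are hypothetical:

```python
# AWS Lambda handler for S3 object-created events: extract the location of
# each newly written log object. Keys arrive URL-encoded in the event.
from urllib.parse import unquote_plus

def handler(event, context=None):
    """Return (bucket, key) pairs for every S3 record in the event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])  # decode '+' and %xx escapes
        objects.append((bucket, key))
    return objects

# Example invocation with a hand-built event (no AWS needed):
fake_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-log-bucket"},
                "object": {"key": "logs/2015/app+server.json"}}}
    ]
}
print(handler(fake_event))  # [('my-log-bucket', 'logs/2015/app server.json')]
```

From here the function could fan the object out to Firehose, Elasticsearch, or a Redshift staging prefix; the handler itself stays a thin, testable shim over the event format.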
40. Three Problems of Persistence
• Somewhere to stage
• Somewhere to live (long tail)
• Somewhere to search
41. Redshift
• PostgreSQL-based MPP database
• Petabyte-scale data warehousing
• Choice of nodes:
• Dense compute
• Dense storage
• Already compatible with your existing BI tools
[Diagram: Amazon Redshift cluster of dense compute / dense storage nodes – up to 128 nodes at 2 PB, ~256 PB/cluster]
42. Three Problems of Persistence
• Somewhere to stage
• Somewhere to live
• Somewhere to search (streaming data)
43. Amazon Elasticsearch Service
• Elasticsearch
• Popular / open source
• Commonly used for log and clickstream data
• Managed solution
• We prepackage Kibana
• Integrated with IAM, Firehose, etc.
[Diagram: Amazon Kinesis Firehose → Amazon Elasticsearch Service]
50. Logging Architecture
[Diagram: App Servers → Log Aggregators (Kafka/Kinesis/MQ) → Log Index/Persist (Elasticsearch, etc.) → Visualisation]
51. Logging Architecture
[Diagram: App Servers → Log Aggregators (Kafka/Kinesis/MQ) → Elasticsearch → Visualisation]
52. Amazon Kinesis
• Firstly, a massively scalable, low-cost way to send JSON objects to a ‘stream’ hosted by AWS
• Users can write applications (using the KCL) to take data from the stream and parse/evaluate it
• Apps can be written in Java, Lambda (Node.js, Python, Java), etc.
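The “take data from the stream and parse/evaluate” step can be sketched without any AWS dependency. A minimal, hedged example, assuming the consumer (a KCL record processor or a Lambda function) hands you a list of record payloads as JSON bytes; the log fields and level names are hypothetical:

```python
# Toy "evaluate" step for a stream consumer: parse each JSON record and
# count log levels. In a real KCL app or Lambda consumer, `payloads` would
# come from the stream (base64-decoded record data).
import json

def evaluate_records(payloads):
    """Parse each JSON payload and tally log levels."""
    counts = {}
    for raw in payloads:
        event = json.loads(raw)
        level = event.get("level", "UNKNOWN")
        counts[level] = counts.get(level, 0) + 1
    return counts

payloads = [
    b'{"level": "INFO", "msg": "user signed up"}',
    b'{"level": "ERROR", "msg": "db timeout"}',
    b'{"level": "INFO", "msg": "page view"}',
]
print(evaluate_records(payloads))  # {'INFO': 2, 'ERROR': 1}
```

The same function works unchanged whether the payloads arrive from Kinesis, Kafka, or a test fixture, which keeps the evaluation logic easy to unit-test.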
53. Amazon Kinesis: New Features (re:Invent 2015)
Kinesis Streams
• What was previously “Kinesis”
• Still very customisable, for innovative stream workloads
• Users still write an app to parse data from the stream
Kinesis Firehose
• Fully managed data ingest service
• Provision end point
• Send data to end point
• ???
• Data!
• Outputs to S3, Redshift, or Elasticsearch Service
• (And can do two at once)
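In practice the “send data to end point” step happens in batches. A sketch of client-side batching, assuming Firehose’s `PutRecordBatch` limits of 500 records and 4 MiB per call (check current quotas; the limits are parameters here, not hard-coded facts):

```python
# Group raw record payloads into batches that respect per-call limits,
# as you would before each PutRecordBatch call to Firehose.
def batch_records(records, max_count=500, max_bytes=4 * 1024 * 1024):
    batches, current, size = [], [], 0
    for rec in records:
        # Flush the current batch if adding this record would exceed a limit.
        if current and (len(current) >= max_count or size + len(rec) > max_bytes):
            batches.append(current)
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        batches.append(current)
    return batches

# 1200 small records split cleanly on the 500-record limit:
recs = [b"x" * 100] * 1200
print([len(b) for b in batch_records(recs)])  # [500, 500, 200]
```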
54. Amazon Kinesis: New Features (Apr 2016)
Amazon Kinesis Agent
• Standalone Java application from AWS
• Collect and send logs to Kinesis Firehose
• Built-in:
• File rotation
• Failure retries
• Checkpoints
• Integrated with CloudWatch for alerting
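The agent is driven by a small JSON config file (`agent.json`). A hedged sketch of generating one: the `flows` / `filePattern` / `deliveryStream` keys follow the agent’s documented format, but the log path and stream name are hypothetical placeholders:

```python
# Generate an agent.json for the Amazon Kinesis Agent. The file pattern and
# delivery stream name below are examples only - adjust for your environment.
import json

config = {
    "cloudwatch.emitMetrics": True,  # ties into the CloudWatch alerting bullet
    "flows": [
        {
            "filePattern": "/var/log/app/*.log",   # files the agent tails
            "deliveryStream": "my-log-firehose",   # target Firehose delivery stream
        }
    ],
}
agent_json = json.dumps(config, indent=2)
print(agent_json)
```

The agent handles the rotation, retry, and checkpoint bullets above on its own once pointed at the files; the config only declares what to tail and where to send it.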
60. Kibana
• Prepackaged with Amazon Elasticsearch Service
• Easy to manage with freeform data
• Dashboards!
61. Your existing BI tools
• As before – your data exists on S3 (JSON)
• S3 → Redshift:
• Commission a Redshift cluster with IAM roles
• Write a manifest of the files to load (JSON)
• Issue a load (COPY)
• Redshift is PgSQL-compatible
• Drivers exist for many tools
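The steps above can be sketched end to end: build a manifest of the JSON log files, then issue a COPY that references it. The manifest shape (`entries` with `url`/`mandatory`) matches Redshift’s documented format, but the bucket, table, and IAM role names here are hypothetical:

```python
# Build a Redshift load manifest for JSON log files on S3, and the COPY
# statement that would consume it. All S3 paths and ARNs are placeholders.
import json

files = [
    "s3://my-log-bucket/logs/2015/11/part-0000.json",
    "s3://my-log-bucket/logs/2015/11/part-0001.json",
]
manifest = {"entries": [{"url": url, "mandatory": True} for url in files]}

# Upload the manifest to e.g. s3://my-log-bucket/manifests/load.manifest,
# then run a COPY like this from any PgSQL-compatible client:
copy_sql = """
COPY logs
FROM 's3://my-log-bucket/manifests/load.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS JSON 'auto'
MANIFEST;
"""
print(json.dumps(manifest, indent=2))
```

Because Redshift speaks the PostgreSQL wire protocol, the COPY can be issued through the same drivers your BI tools already use.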
64. Recap / Lessons / Next
• Logging is really hard.
• Use tools like Kinesis Firehose, the Kinesis Agent, and Amazon Elasticsearch Service to make it easier
• Reuse data, tools, and people where possible