Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build Your Own Log Analytics Solutions on AWS
Pranav Nambiar
Senior Manager (PM)
AWS
ANT323
Tommy Li
Senior Software Architect
Autodesk
There seems to be a problem. Do I have the logs?
Let’s log everything—Open the flood gates
Too much data. Are we in a data glut?
Data overload
• External systems and applications: gaming, IoT sensors, devices, web content
• Internal systems and applications: databases, servers, networking, storage
Logs, logs, and more logs …
Am I operating efficiently?
The problem with log files
Traditional approach: more time, less accuracy, negative impact on the business
The solution
Agenda
Overview of Amazon Elasticsearch Service
Factors to consider while building log analytics solutions
Insights from Autodesk
What is Amazon Elasticsearch Service?
Amazon Elasticsearch Service is a fully managed service that makes it easy to deploy, manage, and scale Elasticsearch and Kibana.
Benefits of Amazon Elasticsearch Service
• Supports open source APIs and tools: drop-in replacement with no need to learn new APIs or skills
• Easy to use: deploy a production-ready Elasticsearch cluster in minutes
• Scalable: resize your cluster with a few clicks or a single API call
• Secure: deploy into your VPC and restrict access using security groups and IAM
• Highly available: replicate across Availability Zones, with monitoring and automated self-healing
• Integrated with other AWS services: seamless data ingestion, security, auditing, and orchestration
Leading Amazon Elasticsearch Service use cases
• Application monitoring & root-cause analysis
• Security Information and Event Management (SIEM)
• IoT & mobile
• Business & clickstream analytics
Amazon Elasticsearch Service architecture
[Diagram: an Amazon Elasticsearch Service domain made up of data nodes and master nodes]
Factors to consider while building log analytics solutions
Building a log analytics solution
Log analytics—Decentralized
Log analytics—Centralized
Key factors to consider
Ingesting your data
Ingestion pipeline tasks
Data source → Collect → Transform → Deliver, with a Buffer stage
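The four tasks can be illustrated with a small in-process sketch. It is not Logstash, Kafka, or any AWS service. A bounded `queue.Queue` stands in for the buffer and sits between collection and transformation, which is one common arrangement. The `LEVEL message` line format is an assumption, and a plain list stands in for the destination. `collect`, `transform`, `deliver`, and `run_pipeline` are hypothetical names.

```python
import queue
import threading


def collect(lines, buffer):
    """Collect: push raw log lines from a source into the bounded buffer."""
    for line in lines:
        buffer.put(line)   # blocks while the buffer is full (back-pressure)
    buffer.put(None)       # sentinel: the source is exhausted


def transform(line):
    """Transform: parse a 'LEVEL message' line into a structured document."""
    level, _, message = line.partition(" ")
    return {"level": level, "message": message}


def deliver(buffer, sink, batch_size=2):
    """Drain the buffer, transform each line, and deliver documents in batches."""
    batch = []
    while True:
        line = buffer.get()
        if line is None:
            break
        batch.append(transform(line))
        if len(batch) == batch_size:
            sink.append(batch)
            batch = []
    if batch:              # flush the final partial batch
        sink.append(batch)


def run_pipeline(lines, buffer_size=10, batch_size=2):
    buffer = queue.Queue(maxsize=buffer_size)
    sink = []              # stands in for the destination, such as a search domain
    producer = threading.Thread(target=collect, args=(lines, buffer))
    producer.start()
    deliver(buffer, sink, batch_size)
    producer.join()
    return sink
```

Because the buffer is bounded, a slow consumer makes `collect` block instead of dropping data. Dedicated buffers such as Kafka or RabbitMQ play the same role across machines.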
Collect: Beats, Fluentd, Logstash, Amazon CloudWatch Logs agent (on the application's worker nodes)
Buffer: Kafka, RabbitMQ, Logstash
Transform: Logstash
Deliver: Logstash
Using Amazon Simple Storage Service (Amazon S3) as a data lake
• Use S3 object-created events to trigger a Lambda function
• The Lambda function transforms and delivers the data
• S3 offers highly durable storage for your single source of truth
• Easy to set up, robust, and highly scalable
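Here is a minimal sketch of the Lambda side. It assumes newline-delimited log files and a simple `{"message": ...}` document shape, and both are assumptions. The S3 event fields (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follow the standard notification format. `process_event`, `read_lines`, and the `logs` index name are hypothetical. Two steps are left out so the sketch runs on its own: reading the object (for example with boto3 `get_object`) and sending each body to the domain's `_bulk` endpoint as a SigV4-signed request. The action line omits `_type`, which Elasticsearch 6.x domains may still require.

```python
import json
import urllib.parse


def objects_from_s3_event(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded in the event (for example, a space becomes '+').
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        yield s3["bucket"]["name"], key


def to_bulk_body(lines, index):
    """Turn raw log lines into an Elasticsearch _bulk request body (NDJSON)."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts.append(json.dumps({"index": {"_index": index}}))  # action line
        parts.append(json.dumps({"message": line}))             # document line
    # The _bulk API expects every line, including the last, to end with a newline.
    return ("\n".join(parts) + "\n") if parts else ""


def process_event(event, read_lines, index="logs"):
    """Build one _bulk body per new object; read_lines(bucket, key) is injected."""
    return [to_bulk_body(read_lines(bucket, key), index)
            for bucket, key in objects_from_s3_event(event)]
```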
Amazon Kinesis Data Firehose for robust ingest
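[Diagram: a data source running the Kinesis agent sends source records to Kinesis Data Firehose, which uses AWS Lambda for data transformation and delivers the transformed records; transformation failures and delivery failures are handled on separate paths]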
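The transformation step is a Lambda function that follows Firehose's record contract. The input is a `records` list, each with a `recordId` and base64 `data`. Every `recordId` must be returned with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, plus base64 `data`. The sketch below assumes the source records are JSON objects. The `ingest_path` enrichment field is hypothetical. Records returned as `ProcessingFailed` follow the transformation-failure path instead of reaching the domain.

```python
import base64
import json


def handler(event, context=None):
    """Kinesis Data Firehose transformation: return one result per input recordId."""
    output = []
    for record in event["records"]:
        try:
            doc = json.loads(base64.b64decode(record["data"]).decode("utf-8"))
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
        except ValueError:  # bad base64, bad UTF-8, and bad JSON all subclass ValueError
            # Pass the original bytes back so the failure can be inspected later.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        doc["ingest_path"] = "firehose"  # hypothetical enrichment field
        payload = (json.dumps(doc) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(payload).decode("ascii")})
    return {"records": output}
```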
Optimizing your Amazon Elasticsearch Service domain
Tenancy—Single index vs. multiple indices
Factors to consider:
• Access patterns
• Performance
• Scaling
• Boundaries of access control
• Data retention
• Granularity of data backup/restore
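One way to act on the retention and access-control factors is to create one index per service per day. The naming scheme `logs-<service>-<yyyy.mm.dd>` and both helpers are hypothetical. Deleting a whole expired index is far cheaper than deleting individual documents from a shared index. Per-service indices also give access policies a natural boundary.

```python
from datetime import date, timedelta


def index_name(service: str, day: date) -> str:
    """One index per service per day (hypothetical naming scheme)."""
    return f"logs-{service}-{day:%Y.%m.%d}"


def expired_indices(names, today: date, retention_days: int):
    """Return the index names whose date suffix falls before the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        # Service names may contain '-', so split the date off from the right.
        suffix = name.rsplit("-", 1)[-1]
        year, month, day = (int(part) for part in suffix.split("."))
        if date(year, month, day) < cutoff:
            expired.append(name)
    return expired
```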
Sharding
[Chart: documents loaded vs. number of shards (0 to 16), docs indexed on an M4.2xlarge (8 vCPU)]
• Try to make the number of active shards per instance equal to the number of vCPUs per instance
• Uniform shard size drives better performance
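The vCPU rule of thumb turns into a shard count with simple arithmetic. This sketch makes two interpretive assumptions, neither of which is a formula from the slide. First, only one index is actively written at a time (for example, today's daily index). Second, replica copies count against the same budget, because replicas are indexed too. `primary_shards_for_index` is a hypothetical helper.

```python
def primary_shards_for_index(data_nodes: int, vcpus_per_node: int, replicas: int = 1) -> int:
    """Primary shard count so that active shard copies per node ~= vCPUs per node."""
    active_copy_budget = data_nodes * vcpus_per_node
    # Each primary is written together with its replicas, so divide the budget.
    primaries = active_copy_budget // (1 + replicas)
    return max(primaries, 1)
```

Take two M4.2xlarge data nodes (8 vCPUs each) with one replica. The budget of 16 active shard copies allows 8 primaries, and those 16 copies can split evenly as 8 per node, which also helps keep shard sizes uniform.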
Securing your Amazon Elasticsearch Service domain
Security and monitoring
• Authentication via AWS Identity and Access Management (IAM): role-based authentication
• Index-level access control: granular control on a per-index basis
• Auditing via AWS CloudTrail: ability to audit your calls
• Monitoring and alerting via Amazon CloudWatch: monitor health and usage metrics, and set up alarms to react to events
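Index-level access control is expressed through the resource ARN in an access policy. The sketch below builds a resource-based policy that lets one IAM role read a single index. The region, account ID, domain, index, and role ARN are placeholders, and `index_read_policy` is a hypothetical helper. `es:ESHttpGet` and `es:ESHttpHead` are the Amazon ES actions for HTTP GET and HEAD requests.

```python
import json


def index_read_policy(region, account_id, domain, index, principal_arn):
    """Resource-based access policy granting read-only HTTP access to one index."""
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain}/{index}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["es:ESHttpGet", "es:ESHttpHead"],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    policy = index_read_policy("us-east-1", "123456789012", "logs-domain",
                               "logs-viewer", "arn:aws:iam::123456789012:role/viewer-team")
    print(json.dumps(policy, indent=2))
```

Clients that send searches as POST requests also need `es:ESHttpPost`. That action additionally permits POST-based writes to the same index, so weigh the trade-off before you grant it.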
Virtual Private Cloud (VPC)
• Access Amazon Elasticsearch Service directly from the customer’s VPC; data is transferred within the Amazon network
• VPC security groups for network-level access control
• Specify a subnet to select the Availability Zone
• No additional cost
[Diagram: a VPC with a subnet in each of Availability Zone 1 and Availability Zone 2; each subnet has a security group containing data and master nodes]
Use Amazon Cognito for Kibana sign-in
• Works for public and private endpoints
• Create users and roles within Amazon Cognito to control access
• Supports federated identities
• Access control is per-domain
[Diagram: Amazon Cognito authentication for Kibana, with a role granting permissions]
Scaling your Amazon Elasticsearch Service domain
Understand your bottleneck
• Disk vs. CPU vs. memory
CloudWatch alarms (metric: statistic and threshold, periods):
• CPUUtilization / MasterCPUUtilization: Average >= 80%, 3 periods
• JVMMemoryPressure / MasterJVMMemoryPressure: Maximum >= 80%, 3 periods
• FreeStorageSpace: Minimum <= 25% of available space, 1 period
• Master node capacity
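The alarm thresholds can be expressed as data. This is a local sketch of the alarm rule: an alarm fires when each of the last N datapoints breaches the threshold. It is useful for checking exported metrics in tests, and it does not call CloudWatch, where you would create the alarms themselves (for example with `PutMetricAlarm`). `AlarmRule` and `rules_for` are hypothetical. `total_storage_mb` assumes storage is expressed in megabytes, the unit Amazon ES uses for `FreeStorageSpace`. The master-node metrics would use the same CPU and JVM rules.

```python
import operator
from dataclasses import dataclass


@dataclass
class AlarmRule:
    metric: str
    statistic: str    # CloudWatch statistic the datapoints represent (informational here)
    comparison: str   # ">=" or "<="
    threshold: float
    periods: int      # consecutive breaching periods required

    def in_alarm(self, datapoints):
        """True if each of the most recent `periods` datapoints breaches the threshold."""
        compare = operator.ge if self.comparison == ">=" else operator.le
        recent = datapoints[-self.periods:]
        return len(recent) == self.periods and all(compare(v, self.threshold) for v in recent)


def rules_for(total_storage_mb):
    """Rules from the alarm list; total_storage_mb is the domain's storage in megabytes."""
    return [
        AlarmRule("CPUUtilization", "Average", ">=", 80.0, 3),
        AlarmRule("JVMMemoryPressure", "Maximum", ">=", 80.0, 3),
        AlarmRule("FreeStorageSpace", "Minimum", "<=", 0.25 * total_storage_mb, 1),
    ]
```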
Which instance type?
• T2 (max storage 3.5 TB): you want to do dev/QA
• M3, M4 (max storage 150 TB): your data and queries are “average”
• R3, R4 (max storage 150 TB): you have higher request volumes, larger documents, or are using aggregations heavily
• C4 (max storage 150 TB): you need to support high concurrency
• I2, I3 (max storage 1.5 PB): you have high IOPS and XL storage requirements
Summing it all up
Key takeaways
• Understand your workload requirements well to get your architecture right
• Use Amazon S3 with AWS Lambda, and Amazon Kinesis Data Firehose, for a robust ingestion pipeline
• Multi-tenancy can drive the highest utilization, but there are trade-offs
• Use IAM and VPC to secure your data
• Understand your bottlenecks, scale smartly
Agenda
Instrumentation project drivers
Architecture
Amazon Elasticsearch Service
Lessons learned
What do we do?
Autodesk gives you the power to make anything.
Make anything
How are we doing?
Autodesk Forge platform
Customer workflow
Support scenario
“Hey Autodesk support, I am not sure why I cannot see the model I uploaded in the viewer yet? I need to share this for review with my contractors, and this is holding up the project.”
Customer experience really matters
“Why can't I see my model yet?”
“Failures in today’s complex, distributed and interconnected systems are not the exception. They are normal cases, not predictable, and not avoidable”
Uwe Friedrichsen
https://www.slideshare.net/ufried/patterns-of-resilience
Why instrumentation?
• Need a consistent way to collect and measure metrics of Forge services,
so that ...
• Monitoring
• Real-time operational problem detection and notification
• (MTTD improvement)
• Forensic
• Incident management
• (MTTR improvement)
• Analytics
• Derive insights to drive features, resiliency
• (MTBF improvement)
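The three improvement targets can be tracked from incident timestamps. The sketch assumes three common definitions. MTTD (mean time to detect) runs from failure start to detection. MTTR (mean time to repair) runs from failure start to resolution. MTBF (mean time between failures) is the gap between the starts of consecutive failures. Other definitions exist (for example, measuring MTTR from detection). The incident records and `reliability_metrics` are hypothetical.

```python
from datetime import datetime
from statistics import mean


def _minutes(delta):
    return delta.total_seconds() / 60


def reliability_metrics(incidents):
    """Return (MTTD, MTTR, MTBF) in minutes; MTBF is None for fewer than two incidents."""
    mttd = mean(_minutes(i["detected"] - i["started"]) for i in incidents)
    mttr = mean(_minutes(i["resolved"] - i["started"]) for i in incidents)
    starts = sorted(i["started"] for i in incidents)
    gaps = [_minutes(later - earlier) for earlier, later in zip(starts, starts[1:])]
    mtbf = mean(gaps) if gaps else None
    return mttd, mttr, mtbf


if __name__ == "__main__":
    incidents = [
        {"started": datetime(2018, 11, 26, 10, 0), "detected": datetime(2018, 11, 26, 10, 10),
         "resolved": datetime(2018, 11, 26, 11, 0)},
        {"started": datetime(2018, 11, 26, 14, 0), "detected": datetime(2018, 11, 26, 14, 20),
         "resolved": datetime(2018, 11, 26, 14, 30)},
    ]
    print(reliability_metrics(incidents))
```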
Pain points
• Lack of distributed tracing capabilities across multiple services
• Log data quality issues
• Timeliness, reliability, completeness, consistent format
• Scalability and cost
• Tool-centric
• Data lock-in, data access problems, efficiency
• Analytics quality issues
• Timeliness, completeness, accuracy
• Multiple pipelines, integration frameworks
• Incompatibility, integration complexity
Architecture approach
• It is a platform!
• Separation of concerns, well-defined interfaces
• Lets us choose best-in-class solutions
• Use managed services
• Manage less to gain more value
• Simplification
• One solution choice per interface
• Metrics standardization
• Standard metrics
• Forge service specific metrics (extensions)
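A check at ingestion can enforce standard metrics plus service-specific extensions. The core field set and the `<service>.<name>` extension convention below are hypothetical and only illustrate the idea. They are not Autodesk's actual schema.

```python
# Hypothetical core metric fields shared by every service.
STANDARD_FIELDS = {"timestamp", "service", "operation", "duration_ms", "status"}


def validate_metric(record):
    """Return a list of problems with a metric record; an empty list means it conforms."""
    keys = set(record)
    problems = [f"missing standard field: {name}" for name in sorted(STANDARD_FIELDS - keys)]
    service = record.get("service")
    for name in sorted(keys - STANDARD_FIELDS):
        # Service-specific extensions must be namespaced as '<service>.<metric>'.
        if not (service and name.startswith(f"{service}.")):
            problems.append(f"unnamespaced extension field: {name}")
    return problems
```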
System architecture
Grafana
Unified logging
• Problem statement: log data in various formats
• Cross-service tracing is impossible
• Added complexity for monitoring, forensics, and analytics
• Solution
• Standardize the log data model
• Annotate log records with distributed tracing state
• Adopt OpenTracing http://opentracing.io
• Provide SDKs supporting major languages
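Annotating log records with tracing state means every record carries three IDs. The trace ID is shared by the whole request. The span ID identifies the record's own span, and the parent span ID identifies the span that called it. The sketch below is a hypothetical data model in plain Python. It is not the OpenTracing API or Autodesk's SDK, and the field names are assumptions.

```python
import json
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanContext:
    """Hypothetical tracing state carried with every log record."""
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_span_id: Optional[str] = None

    @classmethod
    def new_trace(cls):
        return cls(trace_id=uuid.uuid4().hex)

    def child(self):
        # Same trace, new span, linked to this span as its parent.
        return SpanContext(trace_id=self.trace_id, parent_span_id=self.span_id)


def log_record(ctx: SpanContext, service: str, level: str, message: str) -> str:
    """Emit one log line in a shared (hypothetical) JSON data model."""
    return json.dumps({
        "timestamp": time.time(),
        "service": service,
        "level": level,
        "message": message,
        "trace_id": ctx.trace_id,
        "span_id": ctx.span_id,
        "parent_span_id": ctx.parent_span_id,
    })
```

Every service writes the same field names. A single query on `trace_id` in Kibana can therefore return the records for an entire request across services.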
Unified logging—Example
Unified logging: end-to-end tracing (AWS X-Ray)
Forge service onboarding—Instrumentation
Why Amazon Elasticsearch Service?
• Fully managed
• Highly available
• Simple provisioning and scaling process
• Choice of instance types
• Seamless integration with Amazon Kinesis Data Firehose
• Amazon CloudWatch metrics for monitoring
• Kibana built-in
• Cost-effective
Amazon Elasticsearch Service—Sizing
Kibana widely adopted
Optimizing your Amazon Elasticsearch Service setup
• Before: a single index for all services
• Elasticsearch queries were slow
• Failed to onboard more services
• Exceeded the 1000-field limit of an index
Options considered
• Option one: Index per service
• Each index gets its own 1000-field limit
• Better query throughput: search on specific index
• Increased provisioning and manageability complexity
• Option two: Keep single index, set max fields = 2000
• Performance implication (ES anti-pattern)
• Can’t scale
• Option three: Keep single index, core fields + N custom fields
• Operationally simple
• Horrible usability
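The choice between one shared index and an index per service comes down to how field names accumulate. This sketch counts distinct field paths, including objects as well as leaves, across sample documents. `FIELD_LIMIT` mirrors the default of Elasticsearch's `index.mapping.total_fields.limit` setting. `field_paths`, `field_counts`, and the sample shapes are hypothetical. The count only approximates Elasticsearch's accounting. Elasticsearch also counts multi-fields, such as the `keyword` sub-field that default dynamic mapping adds to strings, so real counts are likely higher.

```python
FIELD_LIMIT = 1000  # default value of index.mapping.total_fields.limit


def field_paths(doc, prefix=""):
    """Dotted paths of every field, counting object fields as well as leaves."""
    paths = set()
    for key, value in doc.items():
        path = prefix + key
        paths.add(path)
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")
    return paths


def field_counts(docs_by_service):
    """Return (fields in one shared index, {service: fields in its own index})."""
    per_service = {}
    shared = set()
    for service, docs in docs_by_service.items():
        paths = set()
        for doc in docs:
            paths |= field_paths(doc)
        per_service[service] = len(paths)
        shared |= paths
    return len(shared), per_service
```

Because the shared index holds the union of every service's fields, it approaches `FIELD_LIMIT` as services are onboarded, while each per-service index grows only with its own fields.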
Other considerations
• Most services < 50GB daily ingestion
• Guideline followed: 1 CPU per active shard
• Decision: one shard (and one replica) per index
• Performance load testing (very important). What we did:
• Simulate production write traffic
• Simulate distributed tracing queries at 500 queries per second
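A quick storage and shard estimate helps before load testing. The storage formula follows AWS's published Amazon ES sizing guidance: source data × (1 + replicas) × 1.45, where the factor covers indexing overhead plus reserved and service space. Treat the factor as an assumption to verify against current documentation. The `max_shard_gb=50` default is a hypothetical parameter chosen to match the one-shard-per-index decision for services under 50 GB per day. It also assumes daily indices.

```python
import math


def storage_needed_gb(daily_gb, retention_days, replicas=1, overhead=1.45):
    """Minimum storage = source data x (1 + replicas) x overhead factor."""
    source_gb = daily_gb * retention_days
    return source_gb * (1 + replicas) * overhead


def primary_shards(daily_gb, max_shard_gb=50):
    """Primary shards for one daily index so that no shard exceeds max_shard_gb."""
    return max(1, math.ceil(daily_gb / max_shard_gb))
```

For a service at the 50 GB/day ceiling, kept for a hypothetical 7 days with one replica, the estimate is about 50 × 7 × 2 × 1.45 = 1,015 GB, and a single primary shard per daily index suffices.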
Amazon Elasticsearch Service—Current state
Summary and lessons learned
• Start-up model
• Inception → Incubation → Full funding → Adoption
• Full business value depends on adoption velocity
• Company mandate helps in adoption
• Sizing exercise: involve finance partners early
• Be agile
• Don’t overdo it
• Prioritize risk, divide, and conquer
• Be determined to rip things out and start over
• Use managed services where possible
• Great partnership with AWS product teams and solution architects
Time: 15 minutes after this session
Location: Speaker Lounge (ARIA East, Level 1, Willow Lounge)
Duration: 30 min.