© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build Your Own Log Analytics
Solutions on AWS
Pranav Nambiar
Senior Manager (PM)
AWS
ANT323
Tommy Li
Senior Software Architect
Autodesk
There seems to be a problem. Do I have the logs?
Let’s log everything—Open the flood gates
Too much data. Are we in a data glut?
Data overload
• External systems and applications: gaming, IoT sensors, devices, web content
• Internal systems and applications: databases, servers, networking, storage
• Logs, logs, and more logs …
Am I operating efficiently?
The problem with log files
Traditional approach = more time, less accurate, negative impact to business
The solution
Agenda
Overview of Amazon Elasticsearch Service
Factors to consider while building log analytics solutions
Insights from Autodesk
What is Amazon Elasticsearch Service?
Amazon Elasticsearch Service is a fully managed service that makes it easy to deploy, manage, and scale Elasticsearch and Kibana.
Benefits of Amazon Elasticsearch Service
• Supports open source APIs and tools: drop-in replacement with no need to learn new APIs or skills
• Easy to use: deploy a production-ready Elasticsearch cluster in minutes
• Scalable: resize your cluster with a few clicks or a single API call
• Secure: deploy into your VPC and restrict access using security groups and IAM
• Highly available: replicate across Availability Zones, with monitoring and automated self-healing
• Integrated with other AWS services: seamless data ingestion, security, auditing, and orchestration
Leading Amazon Elasticsearch Service use cases
• Application monitoring & root-cause analysis
• Security Information and Event Management (SIEM)
• IoT & mobile
• Business & clickstream analytics
Amazon Elasticsearch Service architecture
Amazon Elasticsearch Service domain: data nodes and master nodes
Factors to consider while building log analytics solutions
Building a log analytics solution
Log analytics: decentralized
Log analytics: centralized
Key factors to consider
Ingesting your data
Ingestion pipeline tasks
Data source → Collect → Transform → Deliver, with a Buffer stage in the pipeline
Ingestion tools by task
• Collect: Beats, Fluentd, Logstash, Amazon CloudWatch Logs agent
• Buffer: Kafka, RabbitMQ, Logstash
• Transform: Logstash
• Deliver: Logstash
Data sources: applications, worker nodes
Using Amazon Simple Storage Service (Amazon S3) as a data lake
• Use S3 object-created events to trigger a Lambda function
• The Lambda function transforms and delivers the data
• S3 offers highly durable storage for your single source of truth
• Easy to set up, robust, and high scale
Diagram labels: Files, S3 events
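The S3-to-Lambda flow above can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: it assumes each S3 object holds one JSON log document per line, and the helper names and the index name are hypothetical. Reading the object and sending the bulk request are left out so the sketch stays self-contained; older Elasticsearch versions may also expect a `_type` in the bulk action line.

```python
import json
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Return (bucket, key) pairs for each record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def to_bulk_body(lines, index_name):
    """Turn raw JSON log lines into an Elasticsearch _bulk request body."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        out.append(json.dumps({"index": {"_index": index_name}}))
        out.append(json.dumps(doc))
    # The bulk API expects newline-delimited JSON terminated by a newline.
    return "\n".join(out) + "\n" if out else ""
```

A Lambda handler would call `objects_from_s3_event(event)`, read each object, and post the result of `to_bulk_body(...)` to the domain's `_bulk` endpoint.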
Amazon Kinesis Data Firehose for robust ingest
• Data source with Kinesis agent sends source records
• Data transformation using AWS Lambda: source records in, transformed records out
• Failure paths: transformation failure, delivery failure
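A Firehose data-transformation Lambda receives base64-encoded records and returns one result per `recordId`, marked `Ok`, `Dropped`, or `ProcessingFailed`. The sketch below shows that contract under the assumption that each record is a JSON object; the `pipeline` enrichment field is purely illustrative.

```python
import base64
import json


def handler(event, context):
    """Firehose transformation handler: returns one output record per input record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical enrichment: tag each document with the pipeline name.
            payload["pipeline"] = "firehose"
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, TypeError):
            # Hand the original data back so Firehose can treat the record
            # as a transformation failure.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Records that cannot be parsed are returned unchanged with `ProcessingFailed`, which keeps the bad input available for later inspection instead of silently dropping it.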
Optimizing your Amazon Elasticsearch Service domain
Tenancy—Single index vs. multiple indices
Factors to consider:
• Access patterns
• Performance
• Scaling
• Boundaries of access control
• Data retention
• Granularity of data backup/restore
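One way the retention and access-boundary factors play out with multiple indices is to give each tenant a daily index, so that expiring data means deleting whole indices. The sketch below assumes a hypothetical `logs-<tenant>-YYYY.MM.DD` naming scheme; the slides do not prescribe one.

```python
from datetime import date, timedelta


def index_name(tenant, day):
    """One index per tenant per day, e.g. 'logs-billing-2018.11.28'."""
    return f"logs-{tenant}-{day:%Y.%m.%d}"


def expired_indices(names, tenant, today, retention_days):
    """Return this tenant's daily indices that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    prefix = f"logs-{tenant}-"
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            y, m, d = (int(part) for part in name[len(prefix):].split("."))
            index_day = date(y, m, d)
        except ValueError:
            continue  # not a daily index for this tenant
        if index_day < cutoff:
            expired.append(name)
    return expired
```

With a single shared index, the same retention policy would need per-document deletes, which is one of the tradeoffs the list above points at.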
Sharding
Chart: documents loaded (y-axis, 38,000,000 to 45,000,000) vs. shards (x-axis, 0 to 16); series: docs indexed, M4.2xlarge (8 vCPU)
• Try to have # active shards per instance = # vCPUs per instance
• Uniform shard size drives better performance
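The shard guideline above reduces to simple arithmetic. The sketch below assumes that "active" means shards currently being written to and that replica copies count as active too; both function names are hypothetical.

```python
def target_active_shards(instance_count, vcpus_per_instance):
    """Total active shards so that each instance hosts one active shard per vCPU."""
    return instance_count * vcpus_per_instance


def primaries_for(active_shards, replicas_per_primary):
    """Primary shard count when each primary and its replicas are all active."""
    copies = 1 + replicas_per_primary
    return max(1, active_shards // copies)
```

For example, four 8-vCPU data nodes give a target of 32 active shards, which with one replica per primary corresponds to 16 primaries.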
Securing your Amazon Elasticsearch Service domain
Security and monitoring
Feature | Benefit
Authentication via AWS Identity and Access Management (IAM) | Role-based authentication
Index-level access control | Granular control on a per-index basis
Auditing via AWS CloudTrail | Ability to audit your calls
Monitoring and alerting via Amazon CloudWatch | Monitor health and usage metrics, set up alarms to react to events
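Index-level access control can be expressed in an IAM policy by scoping the resource to a path under the domain ARN. The sketch below builds such a policy document as a Python dictionary; the ARNs and the index name passed in are placeholders, and the helper name is hypothetical.

```python
import json


def index_read_policy(principal_arn, domain_arn, index):
    """Policy document allowing one principal to search a single index."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": [principal_arn]},
            "Action": ["es:ESHttpGet"],
            # Scope the permission to the _search endpoint of one index.
            "Resource": f"{domain_arn}/{index}/_search",
        }],
    }


policy = index_read_policy(
    "arn:aws:iam::123456789012:user/test-user",             # placeholder principal
    "arn:aws:es:us-west-1:987654321098:domain/test-domain",  # placeholder domain
    "test-index",
)
policy_json = json.dumps(policy, indent=2)
```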
Virtual Private Cloud (VPC)
• Access Amazon Elasticsearch Service directly from the customer’s VPC; data is transferred within the Amazon network
• VPC security groups for network-level access control
• Specify a subnet to select the Availability Zone
• No additional cost
Diagram: a VPC with a subnet and security group in each of Availability Zone 1 and Availability Zone 2, each containing data and master nodes
Use Amazon Cognito for Kibana sign-in
• Works for public and private endpoints
• Create users and roles within Amazon Cognito to control access
• Supports federated identities
• Access control is per-domain
Diagram labels: Authentication, Kibana, Role, Permissions
Scaling your Amazon Elasticsearch Service domain
Understand your bottleneck
• Disk vs. CPU vs. memory
• Master node capacity
Metric | Statistic | Threshold | Periods
CPUUtilization / MasterCPUUtilization | Average | >= 80% | 3
JVMMemoryPressure / MasterJVMMemoryPressure | Maximum | >= 80% | 3
FreeStorageSpace | Minimum | <= 25% of available space | 1
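The thresholds above follow a "breach for N consecutive periods" pattern. The sketch below shows that evaluation logic in plain Python so the table is easier to reason about; it is an illustration of the rule, not the CloudWatch API, and it assumes the datapoints are already aggregated per period and expressed in the same unit as the threshold.

```python
def breaches(datapoints, threshold, periods, comparison=">="):
    """True if the most recent `periods` datapoints all breach the threshold."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if len(datapoints) < periods:
        return False  # not enough data to decide
    recent = datapoints[-periods:]
    if comparison == ">=":
        return all(value >= threshold for value in recent)
    if comparison == "<=":
        return all(value <= threshold for value in recent)
    raise ValueError(f"unsupported comparison: {comparison}")
```

For CPU and JVM memory pressure the call would use `threshold=80, periods=3`; for free storage the 25% figure first has to be converted into the unit in which the metric is reported.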
Which instance type?
Instance | Max storage | Workload
T2 | 3.5 TB | You want to do dev/QA
M3, M4 | 150 TB | Your data and queries are “average”
R3, R4 | 150 TB | You have higher request volumes, larger documents, or are using aggregations heavily
C4 | 150 TB | You need to support high concurrency
I2, I3 | 1.5 PB | You have high IOPS and XL storage requirements
Summing it all up
Key takeaways
• Understand your workload requirements well to get your architecture right
• Use Amazon S3 with AWS Lambda, and Amazon Kinesis Data Firehose, for a robust ingestion pipeline
• Multi-tenancy can drive the highest utilization, but there are tradeoffs
• Use IAM and VPC to secure your data
• Understand your bottlenecks and scale smartly
Agenda
Instrumentation project drivers
Architecture
Amazon Elasticsearch Service
Lessons learned
What we do
Autodesk gives you the power to make anything.
Make anything
How are we doing?
Autodesk Forge platform
Customer workflow
Support scenario
“Hey Autodesk support, I am not sure why I cannot see the model I uploaded in the viewer yet? I need to share this for review with my contractors, and this is holding up the project.”
Customer experience really matters
“Why can't I see my model yet?”
“Failures in today's complex, distributed and interconnected systems are not the exception. They are normal cases, not predictable, and not avoidable”
Uwe Friedrichsen, https://www.slideshare.net/ufried/patterns-of-resilience
Why instrumentation?
• Need a consistent way to collect and measure metrics of Forge services, to support:
• Monitoring: real-time operational problem detection and notification (MTTD improvement)
• Forensics: incident management (MTTR improvement)
• Analytics: derive insights to drive features and resiliency (MTBF improvement)
Pain points
• Lack of distributed tracing capabilities across multiple services
• Log data quality issues: timeliness, reliability, completeness, consistent format
• Scalability and cost
• Tool-centric: data lock-in, data access problems, efficiency
• Analytics quality issues: timeliness, completeness, accuracy
• Multiple pipelines and integration frameworks: incompatibility, integration complexity
Architecture approach
• It is a platform: separation of concerns and well-defined interfaces, which makes it possible to pick best-in-class solutions
• Use managed services: manage less to gain more value
• Simplification: one solution choice per interface
• Metrics standardization: standard metrics, plus Forge service-specific metrics (extensions)
System architecture
Grafana
Unified logging
• Problem statement: log data arrives in various formats, which makes cross-service tracing impossible and complicates monitoring, forensics, and analytics
• Solution: standardize the log data model
• Annotate log records with distributed tracing state
• Adopt OpenTracing (http://opentracing.io)
• Provide an SDK supporting major languages
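To make the idea of a standardized log record annotated with tracing state concrete, here is a minimal sketch. The field names (`trace_id`, `span_id`, `parent_span_id`, and so on) and both helpers are hypothetical; they are not the Forge SDK or the OpenTracing API, only an illustration of the shape such a record might take.

```python
import json
import uuid
from datetime import datetime, timezone


def new_span(parent=None):
    """Create tracing state; child spans share the parent's trace_id."""
    return {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent["span_id"] if parent else None,
    }


def log_record(service, level, message, span, **fields):
    """Build one JSON log line in a single shape shared by all services."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "level": level,
        "message": message,
        **span,            # tracing annotation
        "fields": fields,  # service-specific extensions
    }
    return json.dumps(record)
```

Because every service emits the same top-level fields, a query on one `trace_id` can follow a request across services.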
Unified logging: example
Unified logging → End-to-end tracing (X-Ray)
Forge service onboarding: instrumentation
Why Amazon Elasticsearch Service?
• Fully managed
• Highly available
• Simple provisioning and scaling process
• Instance types selection
• Seamless integration with Amazon Kinesis Data Firehose
• Amazon CloudWatch metrics for monitoring
• Kibana built-in
• Cost-effective
Amazon Elasticsearch Service—Sizing
Kibana widely adopted
Optimizing your Amazon Elasticsearch Service setup
• Before: single index for all services
• Elasticsearch queries were slow
• Failed to onboard more services
• Exceeded 1000 fields in an index
Options considered
• Option one: index per service. Each index can have up to 1000 fields; better query throughput (search on a specific index); increased provisioning and manageability complexity
• Option two: keep a single index, set max fields = 2000. Performance implications (ES anti-pattern); can’t scale
• Option three: keep a single index, core fields + N custom fields. Operationally simple; horrible usability
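The 1000-field ceiling corresponds to the Elasticsearch index setting `index.mapping.total_fields.limit`. The sketch below counts field and object entries in a mapping's `properties` to estimate how close an index is to that limit, and shows the settings body option two would imply. The helper names are hypothetical, and the count is an approximation of what Elasticsearch itself tallies.

```python
def count_fields(properties):
    """Count field and object entries in a mapping 'properties' dict, recursively."""
    total = 0
    for spec in properties.values():
        total += 1
        total += count_fields(spec.get("properties", {}))
    return total


def raised_limit_settings(limit):
    """Settings body that raises the per-index field limit (option two)."""
    return {"index.mapping.total_fields.limit": limit}
```

Running `count_fields` over each service's mapping before onboarding gives an early warning when a shared index is about to run out of headroom.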
Other considerations
• Most services ingest < 50 GB daily
• Guideline followed: 1 CPU per active shard
• Decision: one shard (and one replica) per index
• Performance load testing (very important). What we did:
• Simulated production write traffic
• Simulated distributed tracing queries at 500 queries per second
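The one-shard, one-replica decision maps onto the standard index settings `number_of_shards` and `number_of_replicas`. The sketch below builds a create-index request for a per-service index; the `forge-<service>` naming and the use of the slide's 50 GB figure as a check are assumptions made for illustration.

```python
def create_index_request(service, shards=1, replicas=1):
    """Return (method, path, body) for creating a per-service index."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"/forge-{service}", body


def fits_single_shard(daily_gb, max_daily_gb=50):
    """Assumed rule of thumb: a service under the daily volume stays on one shard."""
    return daily_gb < max_daily_gb
```

Services that fail the `fits_single_shard` check would be candidates for more primaries, which is worth confirming with a load test like the one described above.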
Amazon Elasticsearch Service—Current state
Summary and lessons learned
• Start-up model: Inception → Incubation → Full funding → Adoption
• Full business value depends on adoption velocity
• Company mandate helps in adoption
• Sizing exercise: involve finance partners early
• Be agile: don’t overdo it; prioritize risk, divide, and conquer; be determined to rip things out and start over
• Use managed services where possible
• Great partnership with AWS product teams and solution architects
Thank you!
Time: 15 minutes after this session
Location: Speaker Lounge (ARIA East, Level 1, Willow Lounge)
Duration: 30 min.

Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build Your Own Log Analytics Solutions on AWS Pranav Nambiar Senior Manager (PM) AWS A N T 3 2 3 Tommy Li Senior Software Architect Autodesk
  • 3. There seems to be a problem. Do I have the logs?
  • 4. Let’s log everything: open the flood gates
  • 5. Too much data. Are we in a data glut?
  • 6. Data overload: logs, logs, and more logs …
      • External systems and applications: gaming, IoT sensors, devices, web content
      • Internal systems and applications: databases, servers, networking, storage
  • 7. Am I operating efficiently?
  • 8. The problem with log files. Traditional approach = more time, less accurate, negative impact to business
  • 9. The solution
  • 10. Agenda
      • Overview of Amazon Elasticsearch Service
      • Factors to consider while building log analytics solutions
      • Insights from Autodesk
  • 11. What is Amazon Elasticsearch Service?
  • 12. Amazon Elasticsearch Service is a fully managed service that makes it easy to deploy, manage, and scale Amazon ES and Kibana
  • 13. Benefits of Amazon Elasticsearch Service
      • Supports open source APIs and tools: drop-in replacement with no need to learn new APIs or skills
      • Easy to use: deploy a production-ready Elasticsearch cluster in minutes
      • Scalable: resize your cluster with a few clicks or a single API call
      • Secure: deploy into your VPC and restrict access using security groups and IAM
      • Highly available: replicate across Availability Zones, with monitoring and automated self-healing
      • Integrated with other AWS services: seamless data ingestion, security, auditing, and orchestration
  • 14. Leading Amazon Elasticsearch Service use cases
      • Application monitoring & root-cause analysis
      • Security Information and Event Management (SIEM)
      • IoT & mobile
      • Business & clickstream analytics
  • 15. Amazon Elasticsearch Service architecture [Diagram: an Amazon Elasticsearch Service domain containing data nodes and master nodes]
  • 16. Factors to consider while building log analytics solutions
  • 17. Building a log analytics solution
  • 18. Building a log analytics solution
  • 19. Log analytics: decentralized
  • 20. Log analytics: centralized
  • 21. Key factors to consider
  • 22. Ingesting your data
  • 23. Ingestion pipeline tasks: data source, collect, transform, buffer, deliver
  • 24. [Diagram: ingestion tools by pipeline task. Task labels: Collect, Transform, Buffer, Deliver. Tools shown: Beats, Fluentd, Logstash, Kafka, RabbitMQ, Amazon CloudWatch Logs agent. Sources: application, worker nodes]
  • 25. Using Amazon Simple Storage Service (Amazon S3) as a data lake
      • Use S3 create events to trigger a Lambda function
      • The Lambda function transforms and delivers the data
      • S3 offers highly durable storage for your single source of truth
      • Easy to set up, robust, and high scale
      • [Diagram labels: files, S3 events]
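The S3-to-Lambda flow on slide 25 could be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the index name `logs`, the `message` field, and the `read_object` and `deliver` hooks are all assumptions made for the example, and the bulk action line omits `_type`, which older Elasticsearch versions would require.

```python
import json
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def to_bulk_body(lines, index):
    """Turn raw log lines into an Elasticsearch _bulk body (NDJSON)."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: no "_type" in the action line (fine on newer versions).
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(json.dumps({"message": line}))
    return "\n".join(out) + "\n" if out else ""


def handler(event, context, read_object=None, deliver=print):
    """Lambda entry point; read_object and deliver are hypothetical hooks."""
    if read_object is None:
        import boto3  # imported lazily so the pure helpers stay testable

        s3 = boto3.client("s3")

        def read_object(bucket, key):
            body = s3.get_object(Bucket=bucket, Key=key)["Body"]
            return body.read().decode("utf-8")

    delivered = 0
    for bucket, key in s3_objects(event):
        body = to_bulk_body(read_object(bucket, key).splitlines(), index="logs")
        if body:
            deliver(body)  # e.g. a signed POST to the domain's _bulk endpoint
            delivered += 1
    return {"delivered_objects": delivered}
```

Keeping the transform in pure functions means it can be exercised locally with a hand-built event before it is wired to a real bucket.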
  • 26. Amazon Kinesis Data Firehose for robust ingest [Diagram labels: data source with Kinesis agent, source records, data transformation using AWS Lambda, transformed records, transformation failure, delivery failure]
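As a sketch of the Lambda transformation step on slide 26: a Firehose transformation function receives base64-encoded records and returns each one with its `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the re-encoded `data`. The enrichment field `pipeline` below is a made-up example, and treating non-JSON input as a failure is an assumption about the workload.

```python
import base64
import json


def handler(event, context):
    """Parse each Firehose record as JSON, enrich it, and report the outcome."""
    output = []
    for record in event["records"]:
        try:
            doc = json.loads(base64.b64decode(record["data"]))
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
            doc["pipeline"] = "firehose"  # hypothetical enrichment field
            data = base64.b64encode(json.dumps(doc).encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except ValueError:
            # Unparseable input is flagged as a transformation failure and
            # returned unchanged so it can be inspected later.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```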
  • 27. Optimizing your Amazon Elasticsearch Service domain
  • 28. Tenancy: single index vs. multiple indices. Factors to consider:
      • Access patterns
      • Performance
      • Scaling
      • Boundaries of access control
      • Data retention
      • Granularity of data backup/restore
  • 29. Sharding
      • Try to have # active shards per instance = # vCPUs per instance
      • Uniform shard size drives better performance
      • [Chart: documents loaded vs. number of shards (0 to 16), docs indexed on M4.2xlarge (8 vCPU)]
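The sharding rule of thumb on slide 29 can be turned into a small calculation. This is only a sketch under one assumption the slide does not state: that replica shards count as active shards alongside primaries. The function name is invented for the example.

```python
def primary_shards_for(vcpus_per_instance, instances, replicas=1):
    """Suggest a primary shard count so active shards roughly match vCPUs.

    Assumption: each primary and each replica copy counts as one active shard.
    """
    if vcpus_per_instance < 1 or instances < 1 or replicas < 0:
        raise ValueError("vCPUs and instances must be >= 1, replicas >= 0")
    total_active = vcpus_per_instance * instances
    return max(1, total_active // (1 + replicas))


# Two M4.2xlarge data nodes (8 vCPU each) with one replica per primary.
print(primary_shards_for(8, 2, replicas=1))
```

Treat the result as a starting point for load testing rather than a final setting.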
  • 30. Securing your Amazon Elasticsearch Service domain
  • 31. Security and monitoring (feature: benefit)
      • Authentication via AWS Identity and Access Management (IAM): role-based authentication
      • Index-level access control: granular control on a per-index basis
      • Auditing via AWS CloudTrail: ability to audit your calls
      • Monitoring and alerting via Amazon CloudWatch: monitor health & usage metrics, set up alarms to react to events
  • 32. Virtual Private Cloud (VPC)
      • Access Amazon Elasticsearch Service directly from the customer’s VPC; data is transferred within the Amazon network
      • VPC security groups for network-level access control
      • Specify subnet to select Availability Zone
      • No additional cost
      • [Diagram: a VPC spanning Availability Zone 1 and Availability Zone 2, each with a subnet and security group around data and master nodes]
  • 33. Use Amazon Cognito for Kibana sign-in
      • Works for public and private endpoints
      • Create users and roles within Amazon Cognito to control access
      • Supports federated identities
      • Access control is per-domain
      • [Diagram labels: authentication, Kibana, role, permissions]
  • 34. Scaling your Amazon Elasticsearch Service domain
  • 35. Understand your bottleneck
      • Disk vs. CPU vs. memory
      • Master node capacity
      • Alarms (name | metric | threshold | periods):
          • CPUUtilization / MasterCPUUtilization | Average | >= 80% | 3
          • JVMMemoryPressure / MasterJVMMemoryPressure | Maximum | >= 80% | 3
          • FreeStorageSpace | Minimum | <= (25% of avail space) | 1
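To make the alarm table on slide 35 concrete, here is a small sketch that checks per-period datapoints against those thresholds. It assumes the caller has already aggregated each period with the listed statistic and has converted free storage into a percentage of available space; the `ALARMS` table and `in_alarm` function are names invented for the example, not CloudWatch APIs.

```python
# (statistic, comparison, threshold, consecutive periods), copied from the slide.
ALARMS = {
    "CPUUtilization": ("Average", ">=", 80.0, 3),
    "MasterCPUUtilization": ("Average", ">=", 80.0, 3),
    "JVMMemoryPressure": ("Maximum", ">=", 80.0, 3),
    "MasterJVMMemoryPressure": ("Maximum", ">=", 80.0, 3),
    # Assumption: value is free space as a percentage of available space.
    "FreeStorageSpace": ("Minimum", "<=", 25.0, 1),
}


def in_alarm(name, datapoints):
    """Return True if the most recent periods all breach the threshold."""
    _statistic, comparison, threshold, periods = ALARMS[name]
    recent = datapoints[-periods:]
    if len(recent) < periods:
        return False  # not enough data to decide
    if comparison == ">=":
        return all(value >= threshold for value in recent)
    return all(value <= threshold for value in recent)


print(in_alarm("CPUUtilization", [50.0, 85.0, 90.0, 81.0]))
print(in_alarm("FreeStorageSpace", [40.0, 20.0]))
```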
  • 36. Which instance type? (instance | max storage | workload)
      • T2 | 3.5 TB | You want to do dev/QA
      • M3, M4 | 150 TB | Your data and queries are “average”
      • R3, R4 | 150 TB | You have higher request volumes, larger documents, or are using aggregations heavily
      • C4 | 150 TB | You need to support high concurrency
      • I2, I3 | 1.5 PB | You have high IOPS and XL storage requirements
  • 37. Summing it all up
  • 38. Key takeaways
      • Understand your workload requirements well to get your architecture right
      • Use Amazon S3 with AWS Lambda, and Amazon Kinesis Firehose, for a robust ingestion pipeline
      • Multi-tenancy can drive highest utilization but there are tradeoffs
      • Use IAM and VPC to secure your data
      • Understand your bottlenecks, scale smartly
  • 39.
  • 40. Agenda
      • Instrumentation project drivers
      • Architecture
      • Amazon Elasticsearch Service
      • Lessons learned
  • 41. What we do? Autodesk gives you the power to make anything.
  • 42. Make anything
  • 43. How we are doing?
  • 44. Autodesk Forge platform
  • 45. Customer workflow
  • 46. Support scenario: customer experience really matters
      • “Hey Autodesk support, I am not sure why I cannot see the model I uploaded in the viewer yet? I need to share this for review with my contractors, and this is holding up the project.”
      • “Why can't I see my model yet?”
  • 47. “Failures in today's complex, distributed and interconnected systems are not the exception. They are normal cases, not predictable, and not avoidable” (Uwe Friedrichsen, https://www.slideshare.net/ufried/patterns-of-resilience)
  • 48. Why instrumentation? Need a consistent way to collect and measure metrics of Forge services, so that ...
      • Monitoring
      • Real-time operational problem detection and notification
      • (MTTD improvement)
      • Forensic
      • Incident management
      • (MTTR improvement)
      • Analytics
      • Derive insights to drive features, resiliency
      • (MTBF improvement)
  • 49. Pain points
      • Lack of distributed tracing capabilities across multiple services
      • Log data quality issue
      • Timeliness, reliability, completeness, consistent format
      • Scalability and cost
      • Tool centric
      • Data lock in, data access problem, efficiency
      • Analytics quality issue
      • Timeliness, completeness, accuracy
      • Multiple pipelines, integration framework
      • Incompatibility, integration complexity
  • 50. Architecture approach
      • It is a platform!
      • Separation of concerns, well-defined interfaces
      • Enabling to draft best in class solution
      • Use managed services
      • Manage less to gain more value
      • Simplification
      • One solution choice per interface
      • Metrics standardization
      • Standard metrics
      • Forge service specific metrics (extensions)
  • 51. System architecture
  • 52. Grafana
  • 53. Unified logging
      • Problem statement: log data in various formats
      • Cross service tracing impossible
      • Complexity for monitoring, forensics, analytics
      • Solution
      • Standardize log data model
      • Annotate log records with distributed tracing states
      • Adopt OpenTracing http://opentracing.io
      • Provide SDK supporting major languages
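The idea on slide 53 of annotating standardized log records with distributed tracing state might look something like the sketch below. It uses only the Python standard library rather than an OpenTracing client, and every field name (`trace_id`, `span_id`, `parent_span_id`, `service`) and the service names are assumptions for illustration; the actual Autodesk log model is not shown in the slides.

```python
import json
import time
import uuid


def new_span(parent=None):
    """Create tracing state; child spans inherit the parent's trace id."""
    return {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent["span_id"] if parent else None,
    }


def log_record(service, message, span, level="INFO"):
    """Build one log line in a single shared format, tagged with tracing state."""
    record = {
        "timestamp": time.time(),
        "service": service,
        "level": level,
        "message": message,
    }
    record.update(span)
    return json.dumps(record, sort_keys=True)


# Two hypothetical services handling the same customer request.
root = new_span()
child = new_span(parent=root)
print(log_record("upload-service", "model received", root))
print(log_record("viewer-service", "model translated", child))
```

Because both records share one `trace_id`, a single query on that field can follow a request across services.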
  • 54. Unified logging: example
  • 55. Unified logging → End to end tracing (X-Ray)
  • 56. Forge service onboarding: instrumentation
  • 57. Why Amazon Elasticsearch Service?
      • Fully managed
      • Highly available
      • Simple provisioning and scaling process
      • Instance types selection
      • Seamless integration with Amazon Kinesis Data Firehose
      • Amazon CloudWatch metrics for monitoring
      • Kibana built-in
      • Cost-effective
  • 58. Amazon Elasticsearch Service: sizing
  • 59. Kibana widely adopted
  • 60. Optimizing your Amazon Elasticsearch Service setup
      • Before: single index for all services
      • Elasticsearch queries were slow
      • Failed to onboard more services
      • Exceeded 1000 fields in an index
  • 61. Options considered
      • Option one: index per service
      • Each index can have up to 1000 fields
      • Better query throughput: search on specific index
      • Increased provisioning and manageability complexity
      • Option two: keep single index, set max fields = 2000
      • Performance implication (ES anti-pattern)
      • Can’t scale
      • Option three: keep single index, core fields + N custom fields
      • Operationally simple
      • Horrible usability
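For option one on slide 61 (an index per service), one way to keep writes and searches scoped to a single service is a predictable naming scheme. The sketch below is an assumption, not the scheme Autodesk used: the `logs-<service>-<date>` pattern, the daily rollover, and the service names are all invented for illustration.

```python
import re
from datetime import date


def service_slug(service):
    """Lowercase the service name and replace characters unsafe in index names."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", service.lower()).strip("-")
    if not slug:
        raise ValueError("service name has no usable characters")
    return slug


def index_name(service, day):
    """Name of the daily index that one service writes to."""
    return f"logs-{service_slug(service)}-{day:%Y.%m.%d}"


def search_pattern(service):
    """Wildcard pattern that restricts a search to one service's indices."""
    return f"logs-{service_slug(service)}-*"


print(index_name("Model Viewer", date(2018, 11, 28)))
print(search_pattern("Model Viewer"))
```

Daily indices also make retention simple, since old data can be removed by deleting whole indices.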
  • 62. Other considerations
      • Most services < 50GB daily ingestion
      • Guideline following: 1 CPU per active shard
      • Decision: one shard (and one replica) per index
      • Performance load test (very important), what we have done:
      • Simulate production traffic writing
      • Simulate distributed tracing query: 500 queries per second
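The decision on slide 62 (one shard and one replica per index) could be captured in an index template body along these lines. This is a sketch: the `logs-<service>-*` pattern is a hypothetical naming scheme, and the body assumes it would be sent to the legacy `_template` endpoint of an Elasticsearch version that accepts `index_patterns`.

```python
import json


def index_template(service_slug, total_fields_limit=1000):
    """Template body applying one primary shard and one replica per index."""
    return {
        # Hypothetical naming scheme: one family of indices per service.
        "index_patterns": [f"logs-{service_slug}-*"],
        "settings": {
            "index.number_of_shards": 1,
            "index.number_of_replicas": 1,
            # 1000 is the Elasticsearch default field limit per index.
            "index.mapping.total_fields.limit": total_fields_limit,
        },
    }


print(json.dumps(index_template("viewer"), indent=2))
```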
  • 63. Amazon Elasticsearch Service: current state
  • 64. Summary and lessons learned
      • Start-up model
      • Inception → Incubation → Full funding → Adoption
      • Full business value depends on adoption velocity
      • Company mandate helps in adoption
      • Sizing exercise: involve finance partners early
      • Be agile
      • Don’t overdo it
      • Prioritize risk, divide, and conquer
      • Be determined to rip things out and start over
      • Use managed services where possible
      • Great partnership with AWS product teams and solution architects
  • 65. Thank you!
  • 66. Time: 15 minutes after this session; Location: Speaker Lounge (ARIA East, Level 1, Willow Lounge); Duration: 30 min.
  • 67.