Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution.
5. Amazon Elasticsearch Service
• Open-source distributed index
• Managed service using Elasticsearch and Kibana
• Fully managed; zero admin
• Highly available and reliable
• RESTful API for easy integration
6. Amazon Elasticsearch Service Leading Use Cases

Log analytics & operational monitoring
• Monitor the performance of applications, web servers, and hardware
• Easy-to-use, powerful data visualization tools to detect issues quickly
• Dig into logs in an intuitive, fine-grained way
• Kibana provides fast, easy visualization

Search
• Application or website provides search capabilities over diverse documents
• Tasked with making this knowledge base searchable and accessible
• Text matching, faceting, filtering, fuzzy search, autocomplete, highlighting, and other search features
• Query API to support application search
7. Leading enterprises trust Amazon Elasticsearch Service for their search and analytics applications
(Customer logos grouped by segment: Media & Entertainment, Online Services, Technology, Other)
8. Adobe Developer Platform (Adobe I/O)

Problem
• Cost-effective monitoring for a very large volume of log data
• Over 200,000 API calls per second at peak - destinations, response times, bandwidth
• Integrate seamlessly with other components of the AWS ecosystem

Solution
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using the Amazon ES Kibana endpoint
• The Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges

Benefits
• Management and operational simplicity
• Flexibility to try out different cluster configurations during dev and test

(Architecture diagram: data sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service)
10. An index is a collection of documents, divided into shards
(Diagram: documents with IDs flow through indexing and compression into an index of four shards: Shard 1, Shard 2, Shard 3, Shard 4)
11. Deployment of indices to a cluster
• Index 1
– Shard 1
– Shard 2
– Shard 3
• Index 2
– Shard 1
– Shard 2
– Shard 3
(Diagram: an Amazon ES cluster of three instances, with Instance 1 as master; each index's primary and replica shards are distributed across the instances)
12. How many instances?
The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica
Size based on storage requirements
• Either local storage or up to 512 GB of Amazon Elastic Block Store (EBS) per instance
• Example: a 2 TB corpus (4 TB with one replica) will need 8 instances, assuming EBS storage
• With i2.2xlarge nodes using 1.6 TB of ephemeral storage each, 4 nodes would be enough
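The sizing rule of thumb above can be sketched in a few lines. This is a minimal sketch of the slide's arithmetic only (index size ≈ corpus size, doubled per replica, divided by per-node storage); it ignores operating-system and Elasticsearch overhead, so round up further for headroom in practice.

```python
import math

def data_nodes_needed(corpus_gb, replicas=1, storage_per_node_gb=512):
    """Estimate data node count from the slide's rule of thumb:
    index size is about the corpus size, and each replica doubles it."""
    total_gb = corpus_gb * (1 + replicas)
    return math.ceil(total_gb / storage_per_node_gb)

# The slide's example: a 2 TB corpus with one replica on 512 GB EBS nodes.
print(data_nodes_needed(2048))  # 8 instances
```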
13. Cluster with no dedicated masters
(Diagram: three instances hold all primary and replica shards; Instance 1 also acts as the master)
14. Cluster with dedicated masters
(Diagram: dedicated master nodes manage the cluster, while the data nodes - Instances 1 through 3 - hold the shards and handle queries and updates)
15. Cluster with zone awareness
(Diagram: four instances split across Availability Zone 1 and Availability Zone 2, with each shard's primary and replica placed in different zones)
16. Best practices
• Data nodes = storage needed / storage per node
• Use gp2 EBS volumes
• Use 3 dedicated master nodes for production deployments
• Enable zone awareness
• Set indices.fielddata.cache.size = 40 (percent of the JVM heap)
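These best practices can be expressed as a domain configuration for the `boto3` `es` client. This is a sketch under assumptions: the domain name, instance types, and counts are illustrative, not from the deck; only the dedicated-master count, zone awareness, gp2 volumes, and fielddata setting come from the slide. On Amazon ES, `indices.fielddata.cache.size` is applied through advanced options.

```python
# Hypothetical domain configuration reflecting the slide's best practices.
# Names and instance types are assumptions for illustration.
domain_config = {
    "DomainName": "apache-logs",
    "ElasticsearchClusterConfig": {
        "InstanceType": "m4.large.elasticsearch",
        "InstanceCount": 4,
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": "m4.large.elasticsearch",
        "DedicatedMasterCount": 3,      # 3 dedicated masters for production
        "ZoneAwarenessEnabled": True,   # spread replicas across AZs
    },
    "EBSOptions": {"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 512},
    "AdvancedOptions": {"indices.fielddata.cache.size": "40"},  # 40% of heap
}
# The domain would then be created with:
#   boto3.client('es').create_elasticsearch_domain(**domain_config)
```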
17. Amazon Elasticsearch Service benefits
• Easy to use
• Open-source compatible
• Secure
• Highly available
• AWS integrated
• Scalable
19. Kinesis Firehose overview
• Delivery stream: the underlying AWS resource
• Destination: Amazon ES, Amazon Redshift, or Amazon S3
• Record: put records into streams to deliver them to destinations
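Records are typically sent to a delivery stream in batches. A `PutRecordBatch` call is capped at 500 records and 4 MiB, so a minimal batching helper looks like the sketch below; the function name and the stream name `apache-logs` are our assumptions, not part of the Firehose API.

```python
# Chunk encoded log records into PutRecordBatch-sized groups,
# staying under the API limits of 500 records and 4 MiB per call.
MAX_RECORDS = 500
MAX_BYTES = 4 * 1024 * 1024

def batch_records(records):
    batches, current, size = [], [], 0
    for data in records:
        if current and (len(current) >= MAX_RECORDS or size + len(data) > MAX_BYTES):
            batches.append(current)
            current, size = [], 0
        current.append(data)
        size += len(data)
    if current:
        batches.append(current)
    return batches

# Each batch would then be sent with:
#   boto3.client('firehose').put_record_batch(
#       DeliveryStreamName='apache-logs',   # assumed stream name
#       Records=[{'Data': d} for d in batch])
```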
21. Firehose delivery architecture with transformations
(Diagram: a data source sends source records to a Firehose delivery stream, which delivers transformed records to Amazon Elasticsearch Service; source records, transformation failures, and delivery failures are preserved in S3, via a backup S3 bucket and an intermediate Amazon S3 bucket)
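The transformation step in this architecture is a Lambda function that receives base64-encoded source records and returns each one marked `Ok`, `Dropped`, or `ProcessingFailed`. Below is a sketch for the deck's Apache web log scenario; the regex and field names are our assumptions about the log format, not part of Firehose.

```python
import base64
import json
import re

# Apache common log format: host, timestamp, method, path, status, bytes.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def handler(event, context):
    """Firehose transformation Lambda: decode each record, parse the log
    line to JSON, and return it with the required recordId and result."""
    output = []
    for record in event['records']:
        line = base64.b64decode(record['data']).decode('utf-8')
        match = LOG_RE.match(line)
        if match:
            doc = match.groupdict()
            doc['status'] = int(doc['status'])
            payload = (json.dumps(doc) + '\n').encode('utf-8')
            output.append({
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(payload).decode('utf-8'),
            })
        else:
            # Unparseable lines are handed back as failures so Firehose
            # writes them to the S3 error output instead of Amazon ES.
            output.append({
                'recordId': record['recordId'],
                'result': 'ProcessingFailed',
                'data': record['data'],
            })
    return {'records': output}
```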
23. Best practices
• Use smaller buffer sizes to increase throughput, but be careful of concurrency
• Use index rotation based on sizing
• Default stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
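With index rotation enabled, Firehose appends a UTC timestamp suffix to the destination index name, so each period's data lands in its own index. A sketch of the resulting names (the base name `apache-logs` is an assumption):

```python
from datetime import datetime, timezone

def rotated_index(base, ts, period='OneDay'):
    """Sketch of the index name Firehose writes with rotation enabled:
    OneDay appends the UTC date, OneMonth the year and month."""
    if period == 'OneDay':
        return f"{base}-{ts:%Y-%m-%d}"
    if period == 'OneMonth':
        return f"{base}-{ts:%Y-%m}"
    return base  # NoRotation

print(rotated_index('apache-logs', datetime(2017, 2, 5, tzinfo=timezone.utc)))
# apache-logs-2017-02-05
```

Rotating by day (or month, depending on volume) keeps each index near the sizing target and lets you drop old data by deleting whole indices.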
24. Number of shards = index size / 30 GB
• Define the number of shards when you create the index
• Less is more
• Writes occupy 1 shard; reads occupy all shards
(Diagram: an Amazon ES cluster of three instances, with Instance 1 as master, holding the distributed primary and replica shards)
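The shard-count rule of thumb above is a one-liner; the 30 GB target per shard is the slide's figure, and rounding up keeps every shard at or below it.

```python
import math

def shard_count(index_size_gb, target_shard_gb=30):
    """The slide's rule of thumb: shards = index size / 30 GB.
    Shard count is fixed at index creation, so size it up front."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

print(shard_count(2048))  # a 2 TB index -> 69 shards
print(shard_count(10))    # small indexes still need at least 1 shard
```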
25. Mapping controls how data is indexed
• not_analyzed text is best for Kibana visualizations
• Define a _template to apply to all new indexes
• The template also defines the number of shards
(Diagram: the index writer builds an inverted index mapping terms to document IDs, e.g. delete → 1,3,5; get → 2,3,4,6; head → 1,7,9; post → 2,8; put → 24)
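An index template tying these points together might look like the sketch below, in the Elasticsearch 2.x syntax current when this deck was written. The template name pattern `apache-logs-*` and the shard count are assumptions for illustration; the dynamic template maps every string field as `not_analyzed` so Kibana can bucket on raw values.

```python
import json

# Hypothetical index template: applies to all new indexes matching the
# pattern, sets the shard count, and makes string fields not_analyzed.
template = {
    "template": "apache-logs-*",   # assumed index name pattern
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {"strings": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"}
                }}
            ]
        }
    }
}
# Registered with the cluster via: PUT _template/apache-logs
print(json.dumps(template, indent=2))
```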
28. Best practices
• Use a template for settings
• Set the number of shards based on 30 GB per shard
• Best case: 1 active shard per node
• For analysis use cases, set not_analyzed on all string fields
30. Amazon ES aggregations
• Buckets - a collection of documents meeting some criterion
• Metrics - calculations on the content of buckets
• Example - bucket: time; metric: count
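The time-bucket/count example can be written as a query body. This is a sketch under assumptions: the field names `@timestamp` and `status` are illustrative for an Apache-log index, and the `interval` syntax is the Elasticsearch 2.x form used in this era.

```python
import json

# A date_histogram bucket per hour (the x-axis), with a sub-aggregation
# counting server errors in each bucket. Field names are assumptions.
query = {
    "size": 0,   # aggregations only, no document hits
    "aggs": {
        "per_hour": {                    # bucket: time
            "date_histogram": {"field": "@timestamp", "interval": "hour"},
            "aggs": {
                "errors": {              # metric-style sub-aggregation
                    "filter": {"range": {"status": {"gte": 500}}}
                }
            }
        }
    }
}
# Sent to the cluster as the body of: POST apache-logs-*/_search
print(json.dumps(query))
```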
31. Best practices
• Make sure that your fields are not_analyzed
• Visualizations are based on buckets and metrics
• Use a histogram on the x-axis first, then sub-aggregate
32. Run Elasticsearch in the AWS cloud with Amazon Elasticsearch Service
• Use Kinesis Firehose to ingest data simply
• Kibana for monitoring, Elasticsearch queries for deeper analysis
33. Find out more:
https://aws.amazon.com/elasticsearch-service/
AWS Centralized Logging:
https://aws.amazon.com/answers/logging/centralized-logging/
Elasticsearch at the AWS Database Blog:
https://aws.amazon.com/blogs/database/category/elasticsearch/
Or ask your Solutions Architect!