OpenSearch
(Just About) Everything You Need to Know About Its Architecture
Seth Muthukaruppan
Consultant, Search Technologies
Instaclustr By NetApp
© Instaclustr Pty Limited, 2021
Data Con LA 2022
Agenda
● OpenSearch Overview
● Use Cases
● Apache Lucene
● OpenSearch Clustering
● Data Organization
● Document Indexing
● Document Searching
● Aggregations
● OpenSearch is a search and analytics engine built with the Apache
Lucene search library
● Extends Lucene to provide a distributed, horizontally scalable, and
highly available search and analytics platform
● OpenSearch is derived from Elasticsearch 7.10.2, and OpenSearch Dashboards from Kibana
7.10.2 (both originally from Elastic)
● OpenSearch is 100% open source and Apache 2.0 licensed: free
to view, use, change, and distribute the code
● Community-driven: developed and maintained by the open-source community,
with backing from industry leaders such as Amazon and Red Hat
● Enterprise-Grade: Same core features with advanced add-ons
● 100% Open Source: Apache 2.0; free to view, use, change, and distribute the code
● Community-Driven: Developed and maintained by the open-source community
OpenSearch: Core Features
● Search: Extremely fast, powerful text search, natural language, built-in analyzers, fuzzy match, auto completion
● Scalable: Distributed architecture, horizontally scalable, thousands of nodes, petabytes of data
● Analytics: Faceting, aggregations, built-in reporting, anomaly detection
● Highly Available: Data replication, zone awareness, snapshots, cross-cluster replication
● Ecosystem: OpenSearch Dashboards, Logstash, Beats, REST clients
OpenSearch: Building Blocks
● OpenSearch: Based on Elasticsearch 7.10.2, built with Apache Lucene, wire compatible with Elasticsearch 7.10.2
● OpenSearch Dashboards: Based on Kibana 7.10.2
● Clients: Compatible with Logstash, Beats, and REST clients for Elasticsearch 7.10
● Upgrade Path: Rolling upgrade from Elasticsearch 7.x (7.10); restart upgrade from Elasticsearch 6.x
OpenSearch: Use Cases
Log Analysis
● Search for patterns
● Normalize logs
● Correlate logs
● Filter logs
Document Store
● Natural language search
● Search with auto-correction
● Search as you type
● Synonym search
Analytics
● Historical data
● Trend analysis
● Forecasting
e-Commerce
● Product search
● Product recommendations
● Auto completion
● Stock on hand
● Sales by category
OpenSearch: More Use Cases
Monitoring
● Network
● Hosts
● Sensors
SIEM
● Threat analysis
● Integrity monitoring
● Anomaly detection
● Compliance
APM
● Real-time performance
● Latency, load
● Failures
Time Series
● Machine learning
● Anomaly detection
OpenSearch: Apache Lucene
Apache Lucene: Overview
● Lucene is an open-source, high-performance search library written in Java
● It is used by popular search engines such as Apache Solr, Apache
Nutch, OpenSearch, and Elasticsearch
● Lucene uses an inverted index to return search results very quickly
● The inverted index maps each term to the documents that contain it
● Lucene supports storing several types of information, such as numbers,
strings, and text fields
● Lucene has a rich search interface with support for natural language,
wildcard, fuzzy, and proximity searches
Apache Lucene: Inverted Index
Document 1: OpenSearch is a search and analytics suite
Document 2: OpenSearch is ALv2 Licensed
Document 3: OpenSearch includes OpenSearch and OpenSearch Dashboards

Term          Frequency (per document)   Documents
opensearch    1,1,3                      1,2,3
search        1                          1
analytics     1                          1
suite         1                          1
alv           1                          2
licensed      1                          2
includes      1                          3
dashboards    1                          3
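The table above can be reproduced with a small sketch of how an inverted index is built. This is a toy model, not Lucene's data structure: the analyzer simply lowercases, splits on whitespace, and drops a hypothetical stop-word list (`is`, `a`, `and`) chosen so that the output resembles the table; the helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical stop-word list, assumed so that the output resembles the table above.
STOP_WORDS = {"is", "a", "and"}


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def search(index, term):
    """Return the sorted IDs of documents containing the (lowercased) term."""
    return sorted(index.get(term.lower(), {}))


docs = {
    1: "OpenSearch is a search and analytics suite",
    2: "OpenSearch is ALv2 Licensed",
    3: "OpenSearch includes OpenSearch and OpenSearch Dashboards",
}

index = build_inverted_index(docs)
for term, postings in index.items():
    print(term, postings)
print(search(index, "OpenSearch"))  # [1, 2, 3]
```

A lookup touches only the postings for one term instead of scanning every document. With this toy analyzer the term from document 2 comes out as `alv2`.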
OpenSearch: Cluster
OpenSearch Cluster: Basics
● Lucene is a search library, not a scalable search engine
● OpenSearch uses Lucene at its core for search but adds capabilities
that make it a full-featured search and analytics engine
● An OpenSearch cluster is a distributed collection of nodes, each of which performs
one or more cluster operations
● The cluster is horizontally scalable: adding nodes increases cluster
capacity, ideally close to linearly, while maintaining similar performance
● By replicating data across nodes in the cluster,
OpenSearch can handle node failures without data loss or downtime
● Nodes are differentiated by the specific functions they perform,
although a single node can perform any or all cluster operations
OpenSearch Cluster: Composition
[Diagram: a master node, two master-eligible nodes, three data nodes, and two client nodes]
OpenSearch Cluster: Node Types
● Master
○ Responsible for maintaining the health and state of the cluster
○ Coordinates creating, deleting, and managing indices and shards
● Master-eligible nodes
○ Candidates for the master role; only one node acts as master at any given time
○ An odd number of master-eligible nodes (typically three) is recommended so that a majority can always be formed in elections
● Data nodes
○ Hold the actual data and handle ingestion, search, and aggregation
○ Run CPU- and memory-intensive operations
● Client nodes (coordinating-only nodes)
○ Act as a gateway and help load-balance incoming requests
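The advice about an odd number of master-eligible nodes comes down to majority arithmetic. The sketch below is plain Python, not OpenSearch's election code (which also manages a voting configuration automatically); it shows that a fourth master-eligible node adds no fault tolerance over three.

```python
def quorum(master_eligible):
    """Smallest majority of master-eligible nodes needed to elect a master."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in range(1, 7):
    print(f"{n} master-eligible nodes: quorum={quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any extra failure, which is why odd counts are the usual choice.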
OpenSearch: Data Organization
Data Organization: Overview
Data Organization: Indices
● An index is the basic unit by which end users manage their data
○ Similar to a collection in a NoSQL database
● An index contains one or more documents, each of which could be
○ a paragraph from a book
○ a log line
○ a tweet
○ weather data for a city
● Similar documents are typically grouped into the same index
● Internally, an index is broken down into one or more sub-indices called shards
● Each shard is a Lucene index
Data Organization: Documents
● Documents are JSON structures that hold a collection of fields and their values
○ Fields are the key-value pairs that make up a document
○ Fields can be of several types, such as numbers, text,
keywords, geo points, etc.
● Documents in an index typically have similar content
Data Organization: Shards
● An index is broken up into one or more smaller units called shards
● Each shard maps to an underlying Lucene index
○ In other words, each OpenSearch index is made up of one or more Lucene indices,
i.e., its shards
● The number of shards per index is configurable and has major
implications for cluster performance
● Search operations are performed at the shard level, so spreading an index across
multiple shards lets a search run in parallel and can increase search speed
● Increasing the number of shards also increases the cluster state information,
so more resources are needed to manage the cluster
○ A common practice is to keep shard sizes between 30 and 50 GB
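The 30 to 50 GB guideline translates into simple arithmetic for choosing a primary shard count. This is a minimal sketch: the 40 GB default target is an assumed midpoint of that range, and the function names are hypothetical.

```python
import math


def suggested_primary_shards(index_size_gb, target_shard_gb=40.0):
    """Estimate a primary shard count so each shard lands near the target size.

    target_shard_gb defaults to 40 GB, an assumed midpoint of the 30-50 GB guideline.
    """
    if index_size_gb <= 0:
        return 1
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def shard_size_gb(index_size_gb, primary_shards):
    """Average primary shard size for a given shard count."""
    return index_size_gb / primary_shards


for size in (10, 120, 400, 1000):
    n = suggested_primary_shards(size)
    print(f"{size} GB -> {n} primaries of ~{shard_size_gb(size, n):.1f} GB each")
```

The primary shard count (`number_of_shards`) is fixed when an index is created; changing it later means reindexing or using the split/shrink APIs, so it is worth sizing for expected growth rather than today's data volume.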
Data Organization: Primary/Replica Shards
● To guard against data loss, OpenSearch allows configuring replicas
● Because an index is stored as shards, configuring replicas causes replica copies of
each primary shard to be created and stored
● OpenSearch allocates replica shards only to nodes other than the one where the
primary shard resides; if no such node is available, the replica stays unassigned
● The number of replicas is an index-level setting and can be changed at any time
● With replicas, a node failure doesn't lead to data loss or data unavailability
○ Data can still be served from the replica copies
● Replica shards come at a price
○ Storing replicas requires additional storage space
○ Indexing can slow down, as data needs to be indexed into both primary
and replica shards
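The allocation rule above can be illustrated with a toy allocator that never places two copies of the same shard on one node. It is a hypothetical sketch: OpenSearch's real allocator also weighs disk watermarks, awareness attributes, and rebalancing, none of which is modeled here.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: place every copy of a shard on a different node.

    Returns (assignments, unassigned). assignments maps node -> list of
    (shard_id, role) tuples; copies that cannot be placed without sharing a
    node with another copy of the same shard are reported as unassigned.
    """
    assignments = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        used = set()
        for role in ["primary"] + ["replica"] * num_replicas:
            candidates = [n for n in nodes if n not in used]
            if not candidates:
                unassigned.append((shard, role))
                continue
            # Prefer the least-loaded node that does not already hold this shard.
            node = min(candidates, key=lambda n: len(assignments[n]))
            assignments[node].append((shard, role))
            used.add(node)
    return assignments, unassigned


def total_storage_gb(primary_data_gb, num_replicas):
    """Each replica stores a full extra copy of the primary data."""
    return primary_data_gb * (1 + num_replicas)


assignments, unassigned = allocate(3, 1, ["node-1", "node-2", "node-3"])
for node, shards in assignments.items():
    print(node, shards)
print("unassigned:", unassigned)

# With a single node, replicas have nowhere to go.
print(allocate(2, 1, ["node-1"])[1])  # [(0, 'replica'), (1, 'replica')]
print(total_storage_gb(100, 1))  # 200
```

The single-node case mirrors what happens on a one-node cluster with replicas configured: the primaries are served, but the replicas stay unassigned.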
OpenSearch: Document Indexing
Document Indexing: Basics
● Input data to Elasticsearch/OpenSearch is analyzed and tokenized
before it is stored
○ OpenSearch also stores the original document in a special field
called _source
● Analyzers (for text fields) and normalizers (for keyword fields) convert the input
into terms, which are then stored in the Lucene inverted index
● Pre-built analyzers and normalizers are available for common use cases
○ The standard analyzer breaks text into grammar-based tokens
○ The whitespace analyzer breaks text into terms based on whitespace
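The difference between the two built-in analyzers mentioned above can be approximated in a few lines. This is a rough sketch: the real standard tokenizer follows Unicode text segmentation rules, while the regex used here only keeps runs of word characters; both function names are hypothetical.

```python
import re


def whitespace_analyzer(text):
    """Like the whitespace analyzer: split on whitespace, keep case and punctuation."""
    return text.split()


def standard_like_analyzer(text):
    """Rough approximation of the standard analyzer: word tokens, lowercased.

    The real standard tokenizer follows Unicode text segmentation rules;
    this regex only keeps runs of word characters.
    """
    return [tok.lower() for tok in re.findall(r"\w+", text)]


sample = "The Quick-Brown fox jumped!"
print(whitespace_analyzer(sample))     # ['The', 'Quick-Brown', 'fox', 'jumped!']
print(standard_like_analyzer(sample))  # ['the', 'quick', 'brown', 'fox', 'jumped']
```

With the whitespace analyzer, a search for `fox` would not match the stored term `fox!`-style tokens with punctuation attached, which is one reason the standard analyzer is the usual default for text.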
Document Indexing: Analyzers
● An analyzer is a combination of character filters, a tokenizer, and token
filters
● Custom analyzers can be built from the appropriate set of character filters,
tokenizer, and token filters
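A custom analyzer is declared in the index settings by naming its components. The sketch below builds such a settings body as a Python dict; `my_custom_analyzer` is a hypothetical name, while `html_strip`, `standard`, `lowercase`, and `asciifolding` are built-in components. Sending the request is left out.

```python
import json

# Index settings defining a custom analyzer named "my_custom_analyzer"
# (a hypothetical name) from built-in components.
settings_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],   # character filters, applied in order
                    "tokenizer": "standard",         # exactly one tokenizer
                    "filter": ["lowercase", "asciifolding"],  # token filters, in order
                }
            }
        }
    }
}

# This JSON would be sent as the body of a create-index request (PUT /<index>).
print(json.dumps(settings_body, indent=2))
```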
Document Indexing: Character Filters
● Character filters pre-process the input text before passing it to the
tokenizer
● They work by adding, removing, or changing characters in the input text
● For example, the built-in HTML strip character filter strips HTML
elements and decodes HTML entities
● Multiple character filters can be specified; they are applied in
order
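The idea of ordered character filters can be sketched with two hypothetical stand-ins: an approximation of HTML stripping and a simple mapping filter. The real `html_strip` filter is more thorough; for instance, it may replace block-level tags such as `<p>` with newlines rather than deleting them.

```python
import html
import re


def html_strip(text):
    """Approximate html_strip: drop tags, then decode entities such as &amp;."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    return html.unescape(without_tags)


def map_chars(mapping):
    """Build a simple mapping character filter from a {source: replacement} dict."""
    def char_filter(text):
        for source, replacement in mapping.items():
            text = text.replace(source, replacement)
        return text
    return char_filter


def apply_char_filters(text, char_filters):
    """Apply character filters in the order given, each seeing the previous output."""
    for char_filter in char_filters:
        text = char_filter(text)
    return text


raw = "<p>Fish &amp; chips :)</p>"
filters = [html_strip, map_chars({":)": "_happy_"})]
print(apply_char_filters(raw, filters))  # Fish & chips _happy_
```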
Document Indexing: Tokenizers and Token Filters
● Tokenizers convert the input stream of characters into tokens based on
certain criteria
○ For instance, the standard tokenizer breaks text into tokens at
word boundaries and removes most punctuation
○ The whitespace tokenizer breaks text into tokens at whitespace
● Token filters post-process the tokens from the tokenizer
○ Tokens can be added, removed, or modified
○ For example, the ASCII folding filter converts Unicode characters
to their closest ASCII equivalents
○ Stemming token filters apply stemming rules to reduce words
to their root form
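Putting the pieces together, an analyzer runs its character filters, then exactly one tokenizer, then its token filters in order. The sketch below models that flow with hypothetical helper functions; its ASCII folding is a crude Unicode-decomposition approximation of the real `asciifolding` filter, and stemming is not modeled.

```python
import unicodedata


def whitespace_tokenizer(text):
    return text.split()


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def ascii_folding_filter(tokens):
    """Approximate ASCII folding: decompose accented characters, drop the accents.

    Characters with no ASCII decomposition are dropped, and tokens left empty
    are removed, which is cruder than the real asciifolding filter.
    """
    folded = (
        unicodedata.normalize("NFKD", t).encode("ascii", "ignore").decode("ascii")
        for t in tokens
    )
    return [t for t in folded if t]


def stop_filter(stop_words):
    """Token filter factory that removes the given (hypothetical) stop words."""
    def token_filter(tokens):
        return [t for t in tokens if t not in stop_words]
    return token_filter


def analyze(text, char_filters, tokenizer, token_filters):
    """Character filters -> exactly one tokenizer -> token filters, in order."""
    for cf in char_filters:
        text = cf(text)
    tokens = tokenizer(text)
    for tf in token_filters:
        tokens = tf(tokens)
    return tokens


print(analyze("Café au Lait", [], whitespace_tokenizer,
              [lowercase_filter, ascii_folding_filter, stop_filter({"au"})]))
# ['cafe', 'lait']
```

Because token filters run in sequence, their order matters: here lowercasing runs before the stop-word filter, so `Au` would also be removed.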
Document Indexing: Field Data Types
● Text Type
○ Primarily used to index human-generated text such as tweets, social
media posts, book contents, and product descriptions
○ Text fields are particularly useful for full-text queries such as phrase
queries, fuzzy queries, etc.
● Keyword Type
○ Typically used for indexing structured content such as names, IDs,
ISBNs, categories, etc.
○ Keyword fields are particularly useful for sorting, aggregations, and
scripts, because they are stored in a columnar format (doc values)
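The text/keyword split is expressed in an index mapping. The sketch below shows a hypothetical product mapping as a Python dict: `description` is full-text searchable, `category` supports exact matching and aggregations, and `name` is indexed both ways through a multi-field (`name.raw`). The field names are illustrative.

```python
import json

# Hypothetical index mapping for a product catalog.
mapping_body = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},  # full-text search
            "category": {"type": "keyword"},  # exact match, sorting, aggregations
            "name": {
                "type": "text",
                # Multi-field: name.raw holds the untokenized value as a keyword.
                "fields": {"raw": {"type": "keyword"}},
            },
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```

The multi-field pattern avoids choosing between the two types: queries match on `name`, while sorting and aggregations use `name.raw`.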
Document Indexing: More Field Data Types
● Numeric Type
○ Used for numeric data such as integers, unsigned integers, and floats
○ When choosing an integer type, choose the smallest type that fits the
input range; this makes indexing and searching more efficient (storage is optimized based on the actual values, so the type choice has little effect on disk usage)
○ Numeric fields are stored as BKD trees
● Geo Point Type
○ Geo points represent latitude and longitude pairs
○ Geo points enable queries based on location and distance
○ Geo points are also stored in BKD trees
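Picking "the smallest type that fits the input range" is a range check against the signed integer field types (`byte`, `short`, `integer`, `long`). The helper below is a hypothetical sketch of that decision; it does not cover `unsigned_long` or floating-point types.

```python
# Signed integer field types and their value ranges.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the narrowest signed integer field type covering [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a signed 64-bit long")


print(smallest_integer_type(0, 100))            # byte
print(smallest_integer_type(-1, 40_000))        # integer
print(smallest_integer_type(0, 3_000_000_000))  # long
```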
OpenSearch: Document Searching
Document Searching: Basics
● OpenSearch uses a distributed search algorithm to match documents
against the input query
● Search can be exact-match based, such as keyword searches, or
relevancy based, such as full-text searches
● For some operations, such as approximate aggregations, OpenSearch favors speed over
accuracy; the required level of accuracy is typically configurable
● OpenSearch provides near real-time search: all documents are available
for search except the most recently indexed ones that have not yet been
refreshed. By default, indices are refreshed every second
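Near real-time visibility can be modeled as a write buffer that only becomes searchable on refresh. The class below is a toy model with hypothetical names; in OpenSearch the refresh happens periodically, controlled by the `index.refresh_interval` setting (1s by default), whereas here `refresh()` is called explicitly.

```python
class ToyNearRealTimeIndex:
    """Toy model of near real-time search: writes become visible only after refresh."""

    def __init__(self):
        self._buffer = []      # indexed but not yet searchable
        self._searchable = []  # visible to search after a refresh

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Move buffered documents into the searchable set.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [d for d in self._searchable if term in d.lower().split()]


idx = ToyNearRealTimeIndex()
idx.index("OpenSearch is fast")
print(idx.search("opensearch"))  # [] - not refreshed yet
idx.refresh()
print(idx.search("opensearch"))  # ['OpenSearch is fast']
```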
Document Searching: Search Phases
● The OpenSearch distributed search algorithm uses a query phase and a fetch phase
● Query phase
○ The query is sent to one copy of every shard of the index; each copy can
be a primary or a replica
○ Each shard runs the search locally and returns its results
○ The results contain only document IDs, scores, and other relevant
metadata, not the actual documents
● Fetch phase
○ The results from all shards are merged and sorted to form the final set of
results
○ A fetch is then performed to retrieve the actual documents from the shards that hold them
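The two phases can be sketched as a scatter-gather in a single process. This is an illustration under stated assumptions: the shards are plain dicts, scoring is a placeholder term count rather than BM25, and real OpenSearch sends these requests over the network.

```python
import heapq


def query_phase(shard_id, shard, query_terms, k):
    """Each shard scores its documents locally and returns only lightweight hits."""
    hits = []
    for doc_id, text in shard.items():
        tokens = text.lower().split()
        score = sum(tokens.count(t) for t in query_terms)  # placeholder scoring
        if score > 0:
            hits.append((score, doc_id, shard_id))
    return heapq.nlargest(k, hits)


def search(shards, query, k=2):
    terms = query.lower().split()
    # Query phase: every shard returns its local top k as (score, doc_id, shard_id).
    per_shard = [query_phase(i, shard, terms, k) for i, shard in enumerate(shards)]
    # The coordinating node merges the per-shard results into the global top k.
    top = heapq.nlargest(k, (hit for hits in per_shard for hit in hits))
    # Fetch phase: load full documents only for the winning IDs.
    return [(doc_id, score, shards[shard_id][doc_id]) for score, doc_id, shard_id in top]


shards = [
    {"a": "opensearch search engine", "b": "lucene library"},
    {"c": "search search search", "d": "dashboards"},
]
print(search(shards, "search"))
```

Each shard returns at most `k` hits, so the coordinator only merges `k` results per shard, and full documents are fetched for just the final `k`.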
Document Searching: Scoring Algorithm
● Relevancy searches, such as searching for words in text fields,
involve scoring to determine which documents are the closest match
● Document scoring is handled by the similarity module. The default
similarity is BM25
Score ≈ TF * IDF * Norm (a conceptual summary; BM25 combines these factors non-linearly)
TF: How frequently the given term appears in the field. The more often the term appears in the
document, the more likely the document is to be relevant.
IDF: How frequently the term appears across all documents in the index. The more common the term is,
the less relevant it is.
Norm: Length normalization. For the same term frequency, a match in a shorter field is more relevant than one in a longer field.
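The three factors can be seen in a compact implementation of the classic BM25 formula, using the default parameters k1 = 1.2 and b = 0.75. This is a sketch: it tokenizes by lowercasing and splitting on whitespace, and Lucene's implementation may differ in details (such as how field lengths are encoded), so scores will not match OpenSearch exactly.

```python
import math

K1 = 1.2  # term-frequency saturation (default)
B = 0.75  # strength of length normalization (default)


def bm25_scores(docs, query):
    """Score each document against the query with the classic BM25 formula."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    scores = {}
    for doc_id, tokens in tokenized.items():
        score = 0.0
        for term in query.lower().split():
            tf = tokens.count(term)
            if tf == 0:
                continue
            df = sum(1 for t in tokenized.values() if term in t)
            # IDF: terms that are rare across the index weigh more.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Norm: longer-than-average fields are penalized.
            norm = 1 - B + B * len(tokens) / avg_len
            # TF with saturation: extra occurrences help less and less.
            score += idf * tf * (K1 + 1) / (tf + K1 * norm)
        scores[doc_id] = score
    return scores


docs = {
    1: "OpenSearch is a search and analytics suite",
    2: "OpenSearch is ALv2 Licensed",
    3: "OpenSearch includes OpenSearch and OpenSearch Dashboards",
}
for doc_id, score in sorted(bm25_scores(docs, "opensearch").items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 4))
```

Document 3 ranks first (higher term frequency), and document 2 beats document 1 because it is shorter for the same frequency, matching the TF and Norm descriptions above.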
OpenSearch: Aggregations
Aggregations: Basics
● OpenSearch is not only a search engine; it also has built-in,
advanced analytics capabilities
● Aggregations allow filtering and categorizing documents, calculating
metrics, and building aggregation pipelines by combining multiple
aggregations
● OpenSearch supports many aggregation types, which can be
classified as
○ Metric aggregations
○ Bucket aggregations
○ Pipeline aggregations
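All three aggregation families can appear in one search request. The sketch below builds such a request body as a Python dict; the `sales` index and its `date` and `price` fields are hypothetical, while `date_histogram`, `sum`, and `max_bucket` are built-in aggregation types.

```python
import json

# Hypothetical "sales" index with a "date" field and a numeric "price" field.
aggregation_request = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        # Bucket aggregation: one bucket per calendar month.
        "sales_per_month": {
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            # Metric sub-aggregation: total sales inside each monthly bucket.
            "aggs": {"sales": {"sum": {"field": "price"}}},
        },
        # Sibling pipeline aggregation: the month(s) with the highest total.
        "max_monthly_sales": {
            "max_bucket": {"buckets_path": "sales_per_month>sales"}
        },
    },
}

# Body of a search request such as GET /sales/_search (index name hypothetical).
print(json.dumps(aggregation_request, indent=2))
```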
Aggregations: Different Types
Aggregations: Bucket Aggregations
● Bucket aggregations categorize the matching set of documents into
buckets based on a bucketing criterion
● The criterion could be based on unique values of a field (terms
aggregation), date ranges (date histogram aggregation), etc.
● Bucket aggregations can be used to
○ paginate through all buckets (composite aggregation)
○ provide faceting
○ act as inputs to metric aggregations
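What a terms aggregation computes can be emulated over an in-memory list. This is a hypothetical sketch over made-up product documents, not a client call; it returns buckets in the `key`/`doc_count` shape that terms aggregation responses use, ordered by descending document count.

```python
from collections import Counter

products = [  # hypothetical matching documents
    {"name": "laptop", "category": "electronics", "price": 900.0},
    {"name": "phone", "category": "electronics", "price": 600.0},
    {"name": "desk", "category": "furniture", "price": 250.0},
    {"name": "chair", "category": "furniture", "price": 120.0},
    {"name": "lamp", "category": "furniture", "price": 40.0},
]


def terms_aggregation(docs, field, size=10):
    """One bucket per unique field value, sorted by doc count (descending)."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


print(terms_aggregation(products, "category"))
# [{'key': 'furniture', 'doc_count': 3}, {'key': 'electronics', 'doc_count': 2}]
```

These bucket counts are exactly what a faceted navigation panel (e.g., "Furniture (3)") displays.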
Aggregations: Metric Aggregations
● Metric aggregations calculate metrics on values derived from the
documents. The values can come from specific fields in the documents being
aggregated or be generated dynamically through scripts
● They can be included as sub-aggregations of bucket aggregations to
produce metrics per bucket
● Numeric metric aggregations produce numeric metrics such as max,
min, sum, and average values
● Some metric aggregations produce non-numeric output. A
good example is the top hits aggregation: used as a sub-aggregation, it
returns the top matching documents per bucket.
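Metric sub-aggregations compute one set of metrics per bucket. The emulation below (hypothetical data and function name) adds min, max, sum, and average per category, plus a non-numeric "top hit" that loosely mirrors what `top_hits` returns: the highest-priced document name in each bucket.

```python
from collections import defaultdict

products = [  # hypothetical matching documents
    {"name": "laptop", "category": "electronics", "price": 900.0},
    {"name": "phone", "category": "electronics", "price": 600.0},
    {"name": "desk", "category": "furniture", "price": 250.0},
    {"name": "chair", "category": "furniture", "price": 120.0},
    {"name": "lamp", "category": "furniture", "price": 40.0},
]


def terms_with_metrics(docs, field, metric_field):
    """Terms buckets, each with numeric metrics and a top-hit style document."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    result = []
    for key, bucket_docs in buckets.items():
        values = [d[metric_field] for d in bucket_docs]
        result.append({
            "key": key,
            "doc_count": len(bucket_docs),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
            # Non-numeric metric: name of the highest-valued document in the bucket.
            "top_hit": max(bucket_docs, key=lambda d: d[metric_field])["name"],
        })
    return sorted(result, key=lambda b: -b["doc_count"])


for bucket in terms_with_metrics(products, "category", "price"):
    print(bucket)
```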
Aggregations: Pipeline Aggregations
● Pipeline aggregations compute metrics from the output of other aggregations
rather than from documents, making it possible to build a chain of
aggregations
● Pipeline aggregations can be further categorized as parent and sibling
pipeline aggregations
○ Parent pipeline aggregations compute new aggregations based on
the output of their parent aggregation
○ Sibling pipeline aggregations compute new aggregations based on
the output of one or more sibling aggregations
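The parent/sibling distinction can be sketched on the output of a monthly bucket aggregation. The bucket data is hypothetical, and the two functions only imitate the behavior of the built-in `cumulative_sum` (parent) and `max_bucket` (sibling) pipeline aggregations: one adds a value inside each parent bucket, the other produces a single result next to the buckets.

```python
# Hypothetical output of a date_histogram bucket aggregation with a sum sub-aggregation.
monthly_buckets = [
    {"key": "2022-01", "sales": 100.0},
    {"key": "2022-02", "sales": 250.0},
    {"key": "2022-03", "sales": 175.0},
]


def cumulative_sum(buckets, metric):
    """Parent-style pipeline: adds a running total to each bucket of the parent."""
    running = 0.0
    out = []
    for bucket in buckets:
        running += bucket[metric]
        out.append({**bucket, "cumulative_" + metric: running})
    return out


def max_bucket(buckets, metric):
    """Sibling-style pipeline: one result computed across a sibling's buckets."""
    value = max(b[metric] for b in buckets)
    return {"keys": [b["key"] for b in buckets if b[metric] == value], "value": value}


print(cumulative_sum(monthly_buckets, "sales"))
print(max_bucket(monthly_buckets, "sales"))  # {'keys': ['2022-02'], 'value': 250.0}
```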
OpenSearch: Ecosystem
Seth Muthukaruppan
Seth.Muthukaruppan@netapp.com
linkedin.com/Seth.Muthukaruppan
Questions & Comments

Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Último

What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 

Último (20)

What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 

Data Con LA 2022 - Pre-Recorded - OpenSearch: Everything You Need to Know About Its Architecture

  • 1. OpenSearch (Just About) Everything You Need to Know About its Architecture Seth Muthukaruppan Consultant, Search Technologies Instaclustr By NetApp © Instaclustr Pty Limited, 2021 Data Con LA 2022
  • 2. Agenda ● OpenSearch Overview ● Use Cases ● Apache Lucene ● OpenSearch Clustering ● Data Organization ● Document Indexing ● Document Searching ● Aggregations © Instaclustr Pty Limited, 2021
  • 3. ● OpenSearch is a search and analytics engine built with the Apache Lucene search library ● Extends Lucene to provide a distributed, horizontally scalable, and highly available search and analytics platform ● OpenSearch is derived from Elasticsearch 7.10.2 and Kibana 7.10.2 from Elastic Co ● OpenSearch is 100% open-source and Apache 2.0 licensed - Free to view, use, change and distribute the code ● Community driven and maintained by the open-source community with backing from industry leaders such as Amazon, Red Hat
  • 4. Enterprise-Grade Same core features with advanced add-ons 100% Open Source Apache 2.0 Free to view, use, change and distribute the code Community-Driven Developed and maintained by the open source community © Instaclustr Pty Limited, 2021
  • 5. © Instaclustr Pty Limited, 2021 Search Extremely fast Powerful text search Natural language Built in analyzers Fuzzy match Auto completion Scalable Distributed architecture Horizontally scalable Thousands of nodes Petabytes of data Analytics Faceting Aggregations Built-in reporting Anomaly detection Highly Available Data replication Zone awareness Snapshots Cross cluster replication Ecosystem Dashboard Logstash Beats REST Clients OpenSearch: Core Features
  • 6. © Instaclustr Pty Limited, 2021 OpenSearch: Building Blocks OpenSearch Based on Elasticsearch 7.10.2 Built with Apache Lucene Elasticsearch wire compatible (7.10.2) OpenSearch Dashboards Based on Kibana 7.10.2 Clients Compatible with Logstash, Beats, and REST clients for ES 7.10 Upgrade Path Rolling upgrade from ES 7.x (7.10) Restart upgrade from ES 6.x
  • 7. © Instaclustr Pty Limited, 2021 OpenSearch: Use Cases
  • 8. © Instaclustr Pty Limited, 2021 OpenSearch: Use Cases Log Analysis ● Search for patterns ● Normalize logs ● Correlate logs ● Filter logs Document Store ● Natural language search ● Search with auto-correction ● Search as you type ● Synonym search Analytics ● Historical data ● Trend analysis ● Forecasting e-Commerce ● Product search ● Product recommendations ● Auto completion ● Stock on hand ● Sales by category
  • 9. © Instaclustr Pty Limited, 2021 OpenSearch: More Use Cases Monitoring ● Network ● Hosts ● Sensors SIEM ● Threat analysis ● Integrity monitoring ● Anomaly detection ● Compliance APM ● Real-time performance ● Latency, load ● Failures Time Series ● Machine learning ● Anomaly detection
  • 10. © Instaclustr Pty Limited, 2021 OpenSearch: Apache Lucene
  • 11. ● Lucene is an open-source, high-performance search library written in Java ● It is used by several popular search engines, such as Apache Solr, Apache Nutch, OpenSearch, and Elasticsearch ● Lucene uses an inverted search index to return search results very quickly ● The inverted index maps each term to the documents that contain it ● Lucene supports storing several types of information, such as numbers, strings, and text fields ● Lucene has a rich search interface with support for natural language, wildcard, fuzzy, and proximity searches Apache Lucene: Overview
  • 12. Apache Lucene: Inverted Index
Documents:
1: OpenSearch is a search and analytics suite
2: OpenSearch is ALv2 Licensed
3: OpenSearch includes OpenSearch and OpenSearch Dashboards
| Term | Frequency | Document |
| --- | --- | --- |
| opensearch | 1, 1, 3 | 1, 2, 3 |
| search | 1 | 1 |
| analytics | 1 | 1 |
| suite | 1 | 1 |
| alv | 1 | 2 |
| licensed | 1 | 2 |
| includes | 1 | 3 |
| dashboards | 1 | 3 |
© Instaclustr Pty Limited, 2021
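As a rough illustration of the inverted index on this slide, the following Python sketch builds a term-to-documents map for the same three documents. The tokenizer (lowercasing plus splitting on non-alphanumeric characters) and the small stop-word list are assumptions standing in for a real Lucene analyzer; unlike the slide, this tokenizer keeps "alv2" as a single term.

```python
import re
from collections import Counter, defaultdict

# The three example documents from the slide
DOCS = {
    1: "OpenSearch is a search and analytics suite",
    2: "OpenSearch is ALv2 Licensed",
    3: "OpenSearch includes OpenSearch and OpenSearch Dashboards",
}

# Assumption: the slide's table omits common words, so we drop them here too
STOP_WORDS = {"is", "a", "and"}


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a rough analyzer stand-in)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to {doc_id: frequency of the term in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return dict(index)


index = build_inverted_index(DOCS)
print(index["opensearch"])  # {1: 1, 2: 1, 3: 3}
```

A query for a term then becomes a dictionary lookup rather than a scan over every document, which is the property that makes inverted-index search fast.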
  • 13. © Instaclustr Pty Limited, 2021 OpenSearch: Cluster
  • 14. ● Lucene is a search library, not a scalable search engine ● OpenSearch uses Lucene at its core for search and adds the capabilities that make it a full-featured search and analytics engine ● An OpenSearch cluster is a distributed collection of nodes, each performing one or more cluster operations ● The cluster is horizontally scalable: adding nodes increases cluster capacity roughly linearly while maintaining similar performance ● By replicating data across nodes in the cluster, OpenSearch can survive node failures without data loss or downtime ● Nodes are differentiated by the specific functions they perform, although a single node can perform any or all cluster operations OpenSearch Cluster: Basics
  • 15. OpenSearch Cluster: Composition © Instaclustr Pty Limited, 2021 [Diagram: an example cluster with one master node, two additional master-eligible nodes, three data nodes, and two client nodes]
  • 16. ● Master ○ Responsible for maintaining the health and state of the cluster ○ Coordinates creating, deleting, and managing indices and shards ● Master-eligible nodes ○ Candidate master nodes; only one is elected master at any given time ○ An odd number of master-eligible nodes is recommended so that elections cannot end in a tie ● Data nodes ○ Hold the actual data and handle ingestion, search, and aggregation ○ Run CPU- and memory-intensive operations ● Client nodes ○ Act as a gateway and help load-balance incoming requests OpenSearch Cluster: Node Types
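The odd-number guidance follows from simple majority arithmetic. This minimal sketch assumes a master can be elected only by a strict majority of master-eligible nodes; OpenSearch's actual voting-configuration logic is more involved, and the function names are illustrative.

```python
def quorum_size(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1


def failures_tolerated(master_eligible: int) -> int:
    """How many master-eligible nodes can fail while a majority still remains."""
    return master_eligible - quorum_size(master_eligible)


for n in range(1, 6):
    print(n, quorum_size(n), failures_tolerated(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

With four master-eligible nodes the cluster still tolerates only one failure, the same as with three, so an even-numbered extra node adds cost without adding resilience.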
  • 17. © Instaclustr Pty Limited, 2021 OpenSearch: Data Organization
  • 19. Data Organization: Indices ● An index is the basic unit by which end users manage their data ○ Similar to a collection in a NoSQL database ● An index contains one or more documents, which can be ○ a paragraph from a book ○ a log line ○ a tweet ○ weather data for a city ● Similar documents are typically grouped into the same index ● Indices are internally broken down into multiple sub-indices called shards ● Each shard maps directly to a Lucene index
  • 20. Data Organization: Documents ● Documents are JSON structures that hold a collection of fields and values ○ Fields are the key-value pairs that make up a document ○ Fields can be of several types, such as numbers, text, keywords, and geo points ● Documents in the same index typically have similar content
  • 21. Data Organization: Shards ● An index is broken up into one or more smaller units called shards ● Each shard maps to an underlying Lucene index ○ In other words, each OpenSearch index is made up of one or more Lucene indices, aka shards ● The number of primary shards per index is configurable (it is set when the index is created) and has major implications for cluster performance ● Search operations run at the shard level, so multiple shards can be searched in parallel, which helps increase search speed ● However, every additional shard adds to the cluster state, which takes more resources to manage ○ General practice is to keep each shard between 30 and 50 GB
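Because the primary shard count is fixed when an index is created (changing it later requires split, shrink, or reindex operations), it helps to estimate it up front. This is a back-of-the-envelope sketch, assuming the expected index size is known in advance and applying the 30 to 50 GB guideline from the slide; the function name and the 40 GB default are illustrative choices, not an OpenSearch API.

```python
import math


def suggest_primary_shards(expected_index_gb: float, target_shard_gb: float = 40.0) -> int:
    """Rough primary-shard count so that each shard lands near the target size.

    Assumption: 40 GB is simply the midpoint of the 30-50 GB guideline.
    """
    if expected_index_gb <= 0:
        raise ValueError("expected_index_gb must be positive")
    if not 30.0 <= target_shard_gb <= 50.0:
        raise ValueError("target_shard_gb should fall within the 30-50 GB guideline")
    # Round up so that no shard exceeds the target size
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


print(suggest_primary_shards(500))  # 13 (about 38 GB per shard)
```

Small indices still get one shard, which keeps the cluster-state overhead that the slide warns about to a minimum.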
  • 22. Data Organization: Primary/Replica Shards ● To guard against data loss, OpenSearch allows configuring replicas ● Because an index is stored in shards, configuring replicas causes replica shards to be created and stored ● OpenSearch never allocates a replica shard to the same node as its primary shard ● The number of replicas is an index-level setting and can be changed at any time ● With replicas, a node failure doesn't lead to data loss or data unavailability ○ Data can still be served from the replica copies ● Replica shards come at a price ○ Storing replicas requires additional storage space ○ Indexing can slow down, because data must be indexed into both primary and replica shards
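The placement rule and the storage cost of replicas can be illustrated with a toy allocator. This is a hypothetical round-robin sketch, not OpenSearch's allocation algorithm; it only demonstrates the invariant that no two copies of the same shard share a node. Where there are too few nodes, real OpenSearch leaves the extra replicas unassigned (yellow cluster health) instead of raising an error as this sketch does.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: place each shard's primary and replicas on distinct nodes.

    Returns {(shard_id, copy): node}, where copy 0 is the primary.
    """
    if num_replicas + 1 > len(nodes):
        raise ValueError("not enough nodes to keep every copy of a shard on a different node")
    placement = {}
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            # Offsetting by shard id spreads primaries across nodes; consecutive
            # copies of one shard land on consecutive, and therefore distinct, nodes.
            placement[(shard, copy)] = nodes[(shard + copy) % len(nodes)]
    return placement


def storage_needed_gb(primary_data_gb, num_replicas):
    """Each replica is a full copy of the primary data."""
    return primary_data_gb * (1 + num_replicas)


print(allocate(2, 1, ["node-a", "node-b", "node-c"]))
# {(0, 0): 'node-a', (0, 1): 'node-b', (1, 0): 'node-b', (1, 1): 'node-c'}
print(storage_needed_gb(100, 2))  # 300
```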
  • 23. © Instaclustr Pty Limited, 2021 OpenSearch: Document Indexing
  • 24. ● Input data to Elasticsearch/OpenSearch is analyzed and tokenized before it is stored ○ OpenSearch also stores the original document in a special field called _source ● Analyzers and normalizers convert the input fields into a sequence of terms, which is then stored in the Lucene inverted index ● Pre-built analyzers and normalizers are available for common use cases ○ The standard analyzer breaks text into grammar-based tokens ○ The whitespace analyzer breaks text into terms at whitespace Document Indexing: Basics
  • 25. ● An analyzer is a combination of character filters, tokenizers, and token filters ● Custom analyzers can be built using the appropriate set of filters and tokenizers. Document Indexing: Analyzers
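A custom analyzer is declared in the index settings by naming its parts. The sketch below builds such a create-index request body as a Python dictionary; html_strip, standard, lowercase, and asciifolding are built-in components, while the analyzer name html_text_analyzer, the field description, and the index name products are hypothetical.

```python
import json

index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Hypothetical custom analyzer name
                "html_text_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],            # character filters run first, in order
                    "tokenizer": "standard",                  # exactly one tokenizer
                    "filter": ["lowercase", "asciifolding"],  # token filters run last, in order
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # Hypothetical text field that uses the custom analyzer
            "description": {"type": "text", "analyzer": "html_text_analyzer"}
        }
    },
}

# Body to send with a create-index request such as PUT /products (index name is an assumption)
print(json.dumps(index_settings, indent=2))
```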
  • 26. ● Character filters pre-process the input text before forwarding it to the tokenizer ● They work by adding, removing, or changing characters in the input text ● For example, the built-in HTML strip character filter strips HTML elements and decodes HTML entities ● Multiple character filters can be specified and they will be applied in order Document Indexing: Character Filters
  • 27. ● Tokenizers convert the input stream of characters into tokens based on certain criteria ○ For instance, the standard tokenizer breaks text into tokens at word boundaries and removes most punctuation ○ The whitespace tokenizer breaks text into tokens at whitespace ● Token filters post-process the tokens from the tokenizer ○ Tokens can be added, removed, or modified ○ For example, the ASCII folding filter converts Unicode characters to their closest ASCII equivalents ○ A stemming token filter applies stemming rules to reduce words to their root form Document Indexing: Tokenizers and Token Filters
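To make the order of the three stages concrete, this pure-Python toy chains a character filter, a tokenizer, and token filters. It is not Lucene code: HTML stripping uses a simple regular expression plus html.unescape, the tokenizer splits on whitespace instead of applying the standard tokenizer's word-boundary rules, and ASCII folding is approximated by removing Unicode combining marks, so characters with no decomposition (such as "ß") are left unchanged.

```python
import html
import re
import unicodedata


def html_strip(text: str) -> str:
    """Character filter: drop HTML tags, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def whitespace_tokenize(text: str) -> list:
    """Tokenizer: split on runs of whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def ascii_fold(tokens):
    """Token filter: approximate ASCII folding by stripping combining accents."""
    folded = []
    for t in tokens:
        decomposed = unicodedata.normalize("NFKD", t)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return folded


def analyze(text, char_filters, tokenizer, token_filters):
    for cf in char_filters:   # 1. character filters, applied in order
        text = cf(text)
    tokens = tokenizer(text)  # 2. exactly one tokenizer
    for tf in token_filters:  # 3. token filters, applied in order
        tokens = tf(tokens)
    return tokens


print(analyze("<p>Café &amp; Crème Brûlée</p>", [html_strip], whitespace_tokenize, [lowercase, ascii_fold]))
# ['cafe', '&', 'creme', 'brulee']
```

The lone "&" token shows why the standard tokenizer, which discards most punctuation, is usually preferred over whitespace splitting for prose.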
  • 28. ● Text Type ○ Primarily used to index human-generated text such as tweets, social media posts, book contents, product descriptions ○ Text fields are particularly useful for performing phrase queries, fuzzy queries, etc ● Keyword Type ○ Typically used for indexing structured content such as names, ids, ISBN, categories, etc ○ Keyword fields are particularly useful for sorting, aggregations and running scripts as they are stored in a columnar format Document Indexing: Field Data Types
  • 29. ● Numeric Type ○ Used for numeric data such as integers, unsigned integers, and floats ○ When choosing a numeric type, pick the smallest type that fits the input range to conserve storage space ○ Numeric fields are indexed using BKD trees ● Geo Point Type ○ Geo points represent latitude and longitude data ○ With geo points, location- and distance-based queries can be performed ○ Geo points are also indexed using BKD trees Document Indexing: More Field Data Types
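The "smallest type that fits" advice can be automated for integer fields. This sketch covers only the signed integer types byte, short, integer, and long, whose ranges match Java's primitives; the mapping and its field names are hypothetical and simply combine the text, keyword, numeric, and geo_point types discussed on these slides.

```python
# Value ranges of the signed integer field types (same as Java's byte/short/int/long)
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Pick the narrowest signed integer type whose range covers [min_value, max_value]."""
    for name, lo, hi in INTEGER_TYPES:
        if lo <= min_value and max_value <= hi:
            return name
    raise ValueError("range does not fit in a signed 64-bit long")


# Hypothetical product mapping combining the field types from these slides
mapping = {
    "properties": {
        "description": {"type": "text"},       # analyzed human-written text
        "category": {"type": "keyword"},       # exact values for sorting and aggregations
        "stock_on_hand": {"type": smallest_integer_type(0, 20_000)},
        "price": {"type": "float"},
        "warehouse_location": {"type": "geo_point"},
    }
}

print(mapping["properties"]["stock_on_hand"])  # {'type': 'short'}
```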
  • 30. © Instaclustr Pty Limited, 2021 OpenSearch: Document Searching
  • 31. ● OpenSearch uses a distributed search algorithm to match documents against the input ● Search can be exact-match based, such as keyword searches, or relevancy based, such as text searches ● OpenSearch favors search speed over absolute accuracy; the required level of accuracy is typically configurable ● OpenSearch provides near real-time search: all documents are available for search except the most recently indexed ones that have not yet been refreshed. By default, documents are refreshed every second Document Searching: Basics
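A toy model of the near real-time behavior described on this slide. The class is hypothetical; in OpenSearch a refresh makes newly indexed data searchable by opening a new Lucene segment, and how often it happens is controlled by the index.refresh_interval setting.

```python
class NearRealTimeShard:
    """Toy model: newly indexed documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed but not yet searchable
        self._searchable = []  # visible to searches

    def index(self, doc: str) -> None:
        self._buffer.append(doc)

    def refresh(self) -> None:
        # Stand-in for opening a new segment: make buffered documents visible
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term: str) -> list:
        return [d for d in self._searchable if term in d.split()]


shard = NearRealTimeShard()
shard.index("opensearch is fast")
print(shard.search("opensearch"))  # [] (not refreshed yet)
shard.refresh()                    # OpenSearch refreshes periodically (every second by default)
print(shard.search("opensearch"))  # ['opensearch is fast']
```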
  • 32. ● The OpenSearch distributed search algorithm has a query phase and a fetch phase ● Query phase ○ The query is sent to a copy of every shard of the index; each copy can be a primary or a replica ○ Each shard runs the search locally and returns its results ○ The results contain only document IDs, scores, and other relevant metadata, not the actual documents ● Fetch phase ○ The results from all shards are merged and sorted to form the final result set ○ A fetch is performed to retrieve the actual documents from the shards that hold them Document Searching: Search Phases
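The two phases can be simulated in a few lines. In this sketch each "shard" is an in-memory dict with a precomputed score per document (hypothetical data): only lightweight (score, shard, id) tuples travel during the query phase, and full documents are looked up only for the global winners during the fetch phase.

```python
import heapq

# Hypothetical shard contents: doc_id -> (score for the current query, stored _source)
SHARDS = [
    {"d1": (2.1, {"title": "OpenSearch basics"}), "d2": (0.4, {"title": "Kafka intro"})},
    {"d3": (1.7, {"title": "OpenSearch shards"}), "d4": (3.0, {"title": "OpenSearch search phases"})},
]


def query_phase(shard_id, shard, size):
    """Runs on each shard: return (score, shard_id, doc_id) for local top hits, no documents."""
    hits = [(score, shard_id, doc_id) for doc_id, (score, _source) in shard.items()]
    return heapq.nlargest(size, hits)


def search(shards, size):
    # Query phase: collect each shard's local top `size` hits
    candidates = []
    for shard_id, shard in enumerate(shards):
        candidates.extend(query_phase(shard_id, shard, size))
    # The coordinating node merges them and keeps the global top `size`
    top = heapq.nlargest(size, candidates)
    # Fetch phase: retrieve full documents only for the winning hits
    return [(doc_id, score, shards[shard_id][doc_id][1]) for score, shard_id, doc_id in top]


for doc_id, score, source in search(SHARDS, size=2):
    print(doc_id, score, source["title"])
# d4 3.0 OpenSearch search phases
# d1 2.1 OpenSearch basics
```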
  • 33. ● Relevancy searches, such as searching for words in text fields, involve scoring to determine which documents are the closest match ● Document scoring is handled by the similarity module. The default similarity is BM25 Document Searching: Scoring Algorithm Score ≈ TF * IDF * Norm (simplified) TF: How frequently the given term appears in the field. The more often the term appears in the document, the more likely the document is to be relevant. IDF: How frequently the term appears across all documents in the index. The more common the term, the less relevant it is. Norm: Length normalization. For the same term frequency, a shorter field is more relevant than a longer field. BM25 refines this scheme by saturating TF and tuning length normalization through its k1 and b parameters.
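A sketch of the textbook BM25 term score, showing how it refines TF, IDF, and Norm. The defaults k1 = 1.2 and b = 0.75 match Lucene's defaults, but Lucene's implementation differs in details (for example, field lengths are stored approximately), so absolute scores will not match OpenSearch's; only the relative behavior is illustrated.

```python
import math


def bm25_idf(num_docs: int, docs_with_term: int) -> float:
    """IDF: rarer terms get larger weights; the 1 + ... form keeps it positive."""
    return math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_term_score(tf, field_len, avg_field_len, num_docs, docs_with_term, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term to one document's score.

    k1 controls how quickly repeated occurrences saturate; b controls how
    strongly fields longer than average are penalized (length normalization).
    """
    norm = 1 - b + b * (field_len / avg_field_len)
    tf_part = tf * (k1 + 1) / (tf + k1 * norm)
    return bm25_idf(num_docs, docs_with_term) * tf_part


# Same term frequency, different field lengths: the shorter field scores higher
short = bm25_term_score(tf=1, field_len=5, avg_field_len=10, num_docs=100, docs_with_term=10)
long_ = bm25_term_score(tf=1, field_len=20, avg_field_len=10, num_docs=100, docs_with_term=10)
print(short > long_)  # True
```

Unlike a raw TF * IDF product, ten occurrences of a term score well under ten times one occurrence, which limits the payoff of keyword stuffing.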
  • 34. © Instaclustr Pty Limited, 2021 OpenSearch: Aggregations
  • 35. ● OpenSearch is not only a search engine but also has built-in, advanced analytics capabilities ● Aggregations allow filtering and categorizing documents, calculating metrics, and building aggregation pipelines by combining multiple aggregations ● OpenSearch supports many aggregation types, which can be classified as ○ Metric aggregations ○ Bucket aggregations ○ Pipeline aggregations Aggregations: Basics
  • 37. ● Bucket aggregations categorize the matching set of documents into buckets based on a bucketing criterion ● The bucketing criterion could be the unique values of a field (terms aggregation), date intervals (date histogram aggregation), etc. ● Bucket aggregations can be used to ○ paginate through all buckets (composite aggregation) ○ provide faceting ○ act as inputs to metric aggregations Aggregations: Bucket Aggregations
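A local emulation of the simplest bucketing criterion, one bucket per unique field value, which is what a terms aggregation does on a keyword field. The documents and field names are hypothetical, and ties in doc_count are broken here by first appearance, which may not match OpenSearch's ordering.

```python
from collections import Counter

# Hypothetical documents already matched by a query
DOCS = [
    {"category": "books", "price": 12.0},
    {"category": "games", "price": 60.0},
    {"category": "books", "price": 8.0},
    {"category": "music", "price": 10.0},
    {"category": "books", "price": 20.0},
]


def terms_buckets(docs, field, size=10):
    """One bucket per unique value of `field`, largest doc_count first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


print(terms_buckets(DOCS, "category"))
# [{'key': 'books', 'doc_count': 3}, {'key': 'games', 'doc_count': 1}, {'key': 'music', 'doc_count': 1}]
```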
  • 38. ● Metric aggregations calculate metrics on values extracted from the documents being aggregated. The values can come from specific document fields or be generated dynamically through scripts ● They can be included as sub-aggregations of bucket aggregations to produce metrics per bucket ● Numeric metric aggregations produce numeric metrics such as max, min, sum, and average values ● Some metric aggregations produce non-numeric output. A good example is the top hits aggregation: used as a sub-aggregation, it returns the top matching documents per bucket. Aggregations: Metric Aggregations
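Nesting a metric under a bucket aggregation yields one metric value per bucket. The sketch shows the request shape (an avg sub-aggregation inside a terms aggregation) and a local emulation of its result; the documents, the aggregation names by_category and avg_price, and the fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents already matched by a query
DOCS = [
    {"category": "books", "price": 12.0},
    {"category": "games", "price": 60.0},
    {"category": "books", "price": 8.0},
    {"category": "music", "price": 10.0},
    {"category": "books", "price": 20.0},
]

# Request shape: an avg metric nested under a terms bucket aggregation
request = {
    "size": 0,  # skip hits, return aggregations only
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def avg_per_bucket(docs, bucket_field, metric_field):
    """Local emulation: group documents by bucket key, then average the metric per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


print(avg_per_bucket(DOCS, "category", "price"))
```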
  • 39. ● Pipeline aggregations can be used to compute metrics and they act on the output of other aggregations making it possible to build a chain of aggregations ● Pipeline aggregations can be further categorized as parent and sibling pipeline aggregations ○ Parent pipeline aggregations compute new aggregations based on the output of the parent aggregation ○ Sibling pipeline aggregations compute new aggregations based on output from one or more sibling aggregations Aggregations: Pipeline Aggregations
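As an example of a sibling pipeline, max_bucket reads a metric from every bucket of a sibling aggregation through a buckets_path and reports the largest one (parent pipelines such as derivative or cumulative_sum instead sit inside a histogram-style parent). The request names and per-bucket values are hypothetical, and the function is only a local emulation of the result.

```python
# Request shape: max_bucket is a sibling of by_category and reads its avg_price sub-aggregation
request = {
    "size": 0,
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        },
        "priciest_category": {
            "max_bucket": {"buckets_path": "by_category>avg_price"}
        },
    },
}


def max_bucket(per_bucket_values):
    """Local emulation of max_bucket: the largest per-bucket value and the key(s) holding it."""
    best = max(per_bucket_values.values())
    keys = [key for key, value in per_bucket_values.items() if value == best]
    return {"value": best, "keys": keys}


# Hypothetical per-bucket averages, as an avg sub-aggregation might produce them
print(max_bucket({"books": 13.3, "games": 60.0, "music": 10.0}))
# {'value': 60.0, 'keys': ['games']}
```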
  • 40. © Instaclustr Pty Limited, 2021 OpenSearch: Ecosystem
  • 41. © Instaclustr Pty Limited, 2021 Seth Muthukaruppan Seth.Muthukaruppan@netapp.com linkedin.com/Seth.Muthukaruppan Questions & Comments

Speaker Notes

  1. Seth Muthukaruppan