SlideShare uma empresa Scribd logo
1 de 38
Baixar para ler offline
Introduction to
By Melvyn Peignon
What will I cover?
- Company and products presentation
- Elasticsearch architecture
- Presentation of Kibana
- Presentation of the search API
- Analyzer
- TF/IDF and relevance
- Elasticsearch use case
- Conclusion
Elastic
Founded in 2012
- Is behind:
- Kibana
- Elasticsearch
- Logstash
- Beats
What is elasticsearch?
- Full text search engine
- Based on Lucene
- Highly available
- Distributed
- Scalable
- RESTful
- Open Source
Shay
Bannon
Trending between search-engine (ES is blue)
How do they make money?
CRUD
CREATE
READ
UPDATE
DELETE
Some concepts to know
- Near real time (NRT)
- Cluster
- Node
- Index
- Document
- Shards and Replicas
Documents, Types, indexes
- An index is a collection of documents that share similar
properties.
- A document is the basic piece of information that can be
indexed.
- A type is a logical partition of the data in your index
Cluster, Nodes, Shards and Replicas
Cluster
Node 1
S1 S2
S3 S4
Cluster, Nodes, Shards and Replicas
Cluster
Node 1 Node 2
S3 S4S1 S2
Cluster, Nodes, Shards and Replicas
Cluster
Node 1 Node 2
S3 S4S1 S2
Cluster, Nodes, Shards and Replicas
Cluster
Node 1 Node 2 Node 3 Node 4
S1 S2 S3 S4R2 R1 R4 R3
Cluster, Nodes, Shards and Replicas
Cluster
Node 1 Node 2 Node 3 Node 4
S1 S2 S3 S4R2 R1 R4 R3
Cluster, Nodes, Shards and Replicas
Cluster
Node 1 Node 2 Node 3 Node 4
S1 S2 S3 S4R2 R1 R4 R3
Ping
PongPing
Cluster, Nodes, Shards and Replicas
Cluster
Node 1 Node 2 Node 3 Node 4
S1 S2 S3 S4R2 R1 R4 R3
Cluster, Nodes, Shards and Replicas
Cluster
Node 1 Node 2 Node 3 Node 4
S1 S2 S3 S4R2 R1 R4 R3
Responsibilities of the master
- Cluster health
- All the creation of index
- Repartition of the Shards
- Repartition of the Replicas
Cluster recommendation
- Your servers in the same data center
- Your machines on different Rack
- Keeping at least 3 eligible master node (Quorum of 2 is 2)
What’s Kibana?
- Another elastic product
- A tool allowing you to communicate in a more “human”
way to your elasticsearch
- A product that allow you to do dashboard and data
visualization
Let’s go for a demonstration
Demonstration done on Kibana
Query can be found on Github:
The analyzer
{“a”: [id_0], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0]}
Standard Analyzer
The analyzer
{“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0], “the”: [id_0],
“wood”: [id_0], “probability”:[id_1], “complete”:[id_1],
“guide”:[id_1]}
Standard Analyzer
The analyzer
{“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0],
“the”: [id_0], “wood”: [id_0], “probability”:[id_1],
“complete”:[id_1], “guide”:[id_1]}
[id_0, id_1]
The analyzer
{“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0],
“the”: [id_0], “wood”: [id_0],
“probability”:[id_1], “complete”:[id_1],
“guide”:[id_1]}
[]
The english analyzer
English Analyzer
{“walk”: [id_0], “wood”: [id_0]}
The english analyzer
{ “walk”: [id_0], “wood”: [id_0]}
[]
What is relevance?
Two theories to know:
- Boolean model
- Space vector model
Boolean model
O0 = “Eric is ... always feeding”
O1 = “Jherez is ... with the friends”
….
O6 = “Manage Idea… to Melvyn)”
QT= {“lab”, “manager”} QO = “OR”
T = {t1:”lab”, t2:”manager”, t3:”Idea”, …, “t4”:
feeding}
D = {D0, D1, …, D6}
D0 = {Eric, is, …, feeding}
D1 = {Jherez, is, …, friends}
D6 = {Manage, idea, …,
Melvyn}
S1 = {D0, D1, D6}
S2 = {D0, D6}
SF = S1 ∪ S2 = S1
Space vector model
S1 = {D0, D1, D6}
T0 = D0 ∩ QT (“lab”, “manager”) ⇒ V0 = (L0, M0)
T1 = D1 ∩ QT (“lab”) ⇒ V1 = (L1, 0)
T6 = D6 ∩ QT (“lab”, “manager”) ⇒ V6 = (L6, M6)
Weight of a token in a document
- Term frequency
TF = √Frequency
- Inverse Document Frequency
IDF = 1 + log(1/ (docFrequency + 1))
- Field length
FL = 1 / √TokenInField
Weight = TF x IDF x FL
Relevance
Vq = [1, 1.47]
V0 = [0.81, 0.85]
V1 = [0.37, 0]
V6 = [0.8, 1.2]
Relevance(Vq, Vx) = cos(Vq, Vx) =
(Vq . Vx) / (॥Vq॥.॥Vx॥)
Let’s Kaggle with elasticsearch
https://www.kaggle.com/c/whats-cooking
Results of our “Classifier”
Explanation of the methodology:
http://melvyn.pythonanywhere.com/posts/1/
Last advices?
- Mapping (I highly recommend having a mapping. You cannot update the type
defined in a field in the mapping)
- Elasticsearch as a database (I prefer having both, easier for reindexation,
having a back up, do my search and analytics on ES and use my database for
identification, etc ...)
- Elasticsearch as a NOSQL database (I wouldn’t do it on a serious project, but
nice to have if you wanna do a quick implementation for a POC)
Hope you enjoyed the presentation!
Thank you for your attention!
Questions?

Mais conteúdo relacionado

Mais procurados

Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
Fadel Chafai
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to Kibana
Vineet .
 

Mais procurados (20)

Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Elastic search Walkthrough
Elastic search WalkthroughElastic search Walkthrough
Elastic search Walkthrough
 
Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
 
Kibana overview
Kibana overviewKibana overview
Kibana overview
 
quick intro to elastic search
quick intro to elastic search quick intro to elastic search
quick intro to elastic search
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
 
Elasticsearch for Data Analytics
Elasticsearch for Data AnalyticsElasticsearch for Data Analytics
Elasticsearch for Data Analytics
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to Kibana
 

Semelhante a Introduction to elasticsearch

Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
MongoDB
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
琛琳 饶
 

Semelhante a Introduction to elasticsearch (20)

Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
JavaCro'15 - Elasticsearch as a search alternative to a relational database -...
JavaCro'15 - Elasticsearch as a search alternative to a relational database -...JavaCro'15 - Elasticsearch as a search alternative to a relational database -...
JavaCro'15 - Elasticsearch as a search alternative to a relational database -...
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache Spark
 
Gpu programming with java
Gpu programming with javaGpu programming with java
Gpu programming with java
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
Meetup ml spark_ppt
Meetup ml spark_pptMeetup ml spark_ppt
Meetup ml spark_ppt
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
 
Libsys 7 to koha
Libsys 7 to kohaLibsys 7 to koha
Libsys 7 to koha
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
The ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldThe ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years Old
 
Scaling Dropbox
Scaling DropboxScaling Dropbox
Scaling Dropbox
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
ElasticSearch Basics
ElasticSearch BasicsElasticSearch Basics
ElasticSearch Basics
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Introduction to elasticsearch

  • 2. What will I cover? - Company and products presentation - Elasticsearch architecture - Presentation of Kibana - Presentation of the search API - Analyzer - TF/IDF and relevance - Elasticsearch use case - Conclusion
  • 3. Elastic Founded in 2012 - Is behind: - Kibana - Elasticsearch - Logstash - Beats
  • 4. What is elasticsearch? - Full text search engine - Based on Lucene - Highly available - Distributed - Scalable - RESTful - Open Source Shay Bannon
  • 6. How do they make money?
  • 8. Some concepts to know - Near real time (NRT) - Cluster - Node - Index - Document - Shards and Replicas
  • 9. Documents, Types, indexes - An index is a collection of documents that share similar properties. - A document is the basic piece of information that can be indexed. - A type is a logical partition of the data in your index
  • 10. Cluster, Nodes, Shards and Replicas Cluster Node 1 S1 S2 S3 S4
  • 11. Cluster, Nodes, Shards and Replicas Cluster Node 1 Node 2 S3 S4S1 S2
  • 12. Cluster, Nodes, Shards and Replicas Cluster Node 1 Node 2 S3 S4S1 S2
  • 13. Cluster, Nodes, Shards and Replicas Cluster Node 1 Node 2 Node 3 Node 4 S1 S2 S3 S4R2 R1 R4 R3
  • 14. Cluster, Nodes, Shards and Replicas Cluster Node 1 Node 2 Node 3 Node 4 S1 S2 S3 S4R2 R1 R4 R3
  • 15. Cluster, Nodes, Shards and Replicas Cluster Node 1 Node 2 Node 3 Node 4 S1 S2 S3 S4R2 R1 R4 R3 Ping PongPing
  • 16. Cluster, Nodes, Shards and Replicas Cluster Node 1 Node 2 Node 3 Node 4 S1 S2 S3 S4R2 R1 R4 R3
  • 17. Cluster, Nodes, Shards and Replicas Cluster Node 1 Node 2 Node 3 Node 4 S1 S2 S3 S4R2 R1 R4 R3
  • 18. Responsibilities of the master - Cluster health - All the creation of index - Repartition of the Shards - Repartition of the Replicas
  • 19. Cluster recommendation - Your servers in the same data center - Your machines on different Rack - Keeping at least 3 eligible master node (Quorum of 2 is 2)
  • 20. What’s Kibana? - Another elastic product - A tool allowing you to communicate in a more “human” way to your elasticsearch - A product that allow you to do dashboard and data visualization
  • 21.
  • 22. Let’s go for a demonstration
  • 23. Demonstration done on Kibana Query can be found on Github:
  • 24. The analyzer {“a”: [id_0], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0]} Standard Analyzer
  • 25. The analyzer {“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0], “probability”:[id_1], “complete”:[id_1], “guide”:[id_1]} Standard Analyzer
  • 26. The analyzer {“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0], “probability”:[id_1], “complete”:[id_1], “guide”:[id_1]} [id_0, id_1]
  • 27. The analyzer {“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0], “probability”:[id_1], “complete”:[id_1], “guide”:[id_1]} []
  • 28. The english analyzer English Analyzer {“walk”: [id_0], “wood”: [id_0]}
  • 29. The english analyzer { “walk”: [id_0], “wood”: [id_0]} []
  • 30. What is relevance? Two theories to know: - Boolean model - Space vector model
  • 31. Boolean model O0 = “Eric is ... always feeding” O1 = “Jherez is ... with the friends” …. O6 = “Manage Idea… to Melvyn)” QT= {“lab”, “manager”} QO = “OR” T = {t1:”lab”, t2:”manager”, t3:”Idea”, …, “t4”: feeding} D = {D0, D1, …, D6} D0 = {Eric, is, …, feeding} D1 = {Jherez, is, …, friends} D6 = {Manage, idea, …, Melvyn} S1 = {D0, D1, D6} S2 = {D0, D6} SF = S1 ∪ S2 = S1
  • 32. Space vector model S1 = {D0, D1, D6} T0 = D0 ∩ QT (“lab”, “manager”) ⇒ V0 = (L0, M0) T1 = D1 ∩ QT (“lab”) ⇒ V1 = (L1, 0) T6 = D6 ∩ QT (“lab”, “manager”) ⇒ V6 = (L6, M6)
  • 33. Weight of a token in a document - Term frequency TF = √Frequency - Inverse Document Frequency IDF = 1 + log(1/ (docFrequency + 1)) - Field length FL = 1 / √TokenInField Weight = TF x IDF x FL
  • 34. Relevance Vq = [1, 1.47] V0 = [0.81, 0.85] V1 = [0.37, 0] V6 = [0.8, 1.2] Relevance(Vq, Vx) = cos(Vq, Vx) = (Vq . Vx) / (॥Vq॥.॥Vx॥)
  • 35. Let’s Kaggle with elasticsearch https://www.kaggle.com/c/whats-cooking
  • 36. Results of our “Classifier” Explanation of the methodology: http://melvyn.pythonanywhere.com/posts/1/
  • 37. Last advices? - Mapping (I highly recommend having a mapping. You cannot update the type defined in a field in the mapping) - Elasticsearch as a database (I prefer having both, easier for reindexation, having a back up, do my search and analytics on ES and use my database for identification, etc ...) - Elasticsearch as a NOSQL database (I wouldn’t do it on a serious project, but nice to have if you wanna do a quick implementation for a POC)
  • 38. Hope you enjoyed the presentation! Thank you for your attention! Questions?