elasto mania
@about_andrefs
2014
.
.
.
what is it?
...
.
Elasticsearch is a flexible and powerful
open source, distributed, real-time search
and analytics engine.
elasticsearch.org/overview/
.
.
.
talk disclaimers
• introduction to ES (sorry, no heavy stuff)
• focused on Elasticsearch itself (not so much
on integration with Kibana, Logstash, etc)
• heavily based on Andrew Cholakian’s book
Exploring Elasticsearch
• Tiririca method
• not all disclaimers have necessarily been
disclaimed
.
.
.
getting
started
.
.
.
buzzword driven slide
• real time analytics
• conflict management
• per-operation
persistence
• document oriented
• built on top of
Apache Lucene™
• Apache 2 Open
Source License
• real time data
• distributed
• multi-tenancy
• RESTful API
• schema free
• full text search
• high availability
.
.
.
use cases
• search a large number of product descriptions for
a specific phrase and return the best results
• search for words that sound like a given word
• auto-complete a search box with previously issued
searches, allowing for misspellings
• store large quantities of semi-structured (JSON)
data in a distributed fashion, with redundancy
.
.
.
don’t use cases
• calculate how many items are left in an inventory
• figure out the sum of all items in a given month’s
invoices
• execute operations transactionally with rollback
support
• guarantee item uniqueness across multiple fields
.
.
.
history
2004: Shay Banon creates Compass (Java search engine framework)
2009: big parts of Compass would need to be rewritten to release a third version focused on scalability
Feb 2010: Elasticsearch 0.4.0
Mar 2012: Elasticsearch 0.19.0
Apr 2013: Elasticsearch 0.90.0
Feb 2014: Elasticsearch 1.0.0
Mar 2014: Elasticsearch 1.1.0
.
.
.
the basics
.
.
.
JSON over HTTP
• primary data format for ES is JSON
• main protocol consists of HTTP requests with
JSON payload
• _id is unique, and generated automatically if
unassigned
• internally, JSON is converted into flat fields for
Lucene’s key/value API
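To make the last bullet more concrete, here is a minimal sketch of flattening a nested JSON document into dotted field names. The `flatten` helper is hypothetical and written for this illustration only; it is not an Elasticsearch API, and real indexing also involves analysis and type handling.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted keys; every key maps to a list of values."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # nested object: recurse and prefix the inner field names
            flat.update(flatten(value, prefix=f"{path}."))
        elif isinstance(value, list):
            # arrays simply become multi-valued fields
            flat[path] = list(value)
        else:
            flat[path] = [value]
    return flat


song = {
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
    "genres": ["alternative rock", "piano rock"],
}
flat_song = flatten(song)
print(flat_song)
```

The nested `album` object disappears as a container and survives only as the `album.title` and `album.year` keys.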
.
.
.
mnemonic
relational DB        Elasticsearch
database             index
table                type
schema definition    mapping
column               field
row                  document
elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html
.
.
.
documents
• like a row in a table in an RDB
• JSON objects
• each is stored in an index, has a type and an
id
• each contains zero or more fields
.
.
.
sample document
.
PUT /music/songs/1
{
  "_id" : 1,
  "title" : "The Vampyre of Time and Memory",
  "author" : "Queens of the Stone Age",
  "album" : {
    "title" : "...Like Clockwork",
    "year" : 2013,
    "track" : 3
  },
  "genres" : ["alternative rock", "piano rock"]
}
.
.
.
fields
• key-value pairs
• value can be a scalar or a nested structure
• each field has a type, defined in a mapping
.
.
.
types
type       definition
string     text
integer    32-bit integers
long       64-bit integers
float      IEEE floats
double     double precision floats
boolean    true or false
date       UTC date/time
geo_point  latitude/longitude
null       the value null
array      any field
object     type omitted, properties field
nested     separate document
.
.
.
mapping
• defines the types of a document’s fields
• and the way they are indexed
• scopes _ids (documents with different types
may have identical _ids)
• defines a bunch of index-wide settings
• can be defined explicitly or automatically
when a document is indexed
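A rough sketch of what "defined automatically" can look like: guessing a mapping from the values of a first document. The `guess_type` and `guess_mapping` helpers are hypothetical, and the rules are a simplification that assumes whole numbers become `long`, decimals become `double`, and no date detection takes place.

```python
def guess_type(value):
    """Guess a field type from a JSON value (simplified; no date detection)."""
    # bool must be tested before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # an array takes the type of its first element; empty arrays tell us nothing
        return guess_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    return None


def guess_mapping(doc):
    """Build a properties structure by guessing the type of every field."""
    props = {}
    for name, value in doc.items():
        field_type = guess_type(value)
        if field_type == "object":
            props[name] = guess_mapping(value)
        elif field_type is not None:
            props[name] = {"type": field_type}
    return {"properties": props}


guessed = guess_mapping({
    "title": "The Vampyre of Time and Memory",
    "album": {"year": 2013},
    "genres": ["alternative rock", "piano rock"],
})
print(guessed)
```

An explicit mapping is still the safer choice when a field needs a specific type or analyzer, since a guess is only as good as the first document seen.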
.
.
.
sample mapping
.
PUT /music/songs/_mapping
{
  "songs" : {
    "properties" : {
      "title" : { "type" : "string" },
      "author" : { "type" : "string" },
      "album" : {
        "properties" : {
          "title" : { "type" : "string" },
          "year" : { "type" : "integer" },
          "track" : { "type" : "integer" }
        }
      },
      "genres" : { "type" : "string" }
    }
  }
}
.
.
.
indexes
• like a database in an RDB
• has a mapping which defines types
• logical namespace
• maps to one or more primary shards
• can have zero or more replica shards
.
.
.
CRUD I
.
PUT /music

PUT /music/songs/_mapping
{
  "songs" : {
    "properties" : {
      ...
    }
  }
}
.
.
.
CRUD II
.
PUT /music/songs/1
{
  "title" : "The Vampyre of Time and Memory",
  ...
}

GET /music/songs/1

POST /music/songs/1/_update
{ "doc" : { "year" : 2014 } }

DELETE /music/songs/1
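The same four calls can be prepared from Python with nothing but the standard library. This is a sketch under assumptions: `BASE_URL` assumes a local node on the default HTTP port, `build_request` is a hypothetical helper, and the requests are only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default HTTP port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON payload."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


crud_requests = [
    build_request("PUT", "/music/songs/1",
                  {"title": "The Vampyre of Time and Memory"}),
    build_request("GET", "/music/songs/1"),
    build_request("POST", "/music/songs/1/_update", {"doc": {"year": 2014}}),
    build_request("DELETE", "/music/songs/1"),
]

for req in crud_requests:
    print(req.get_method(), req.full_url, req.data)
    # to actually send it (needs a running node): urllib.request.urlopen(req)
```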
.
.
.
search
.
.
.
search fundamentals
1. boolean search
2. scoring
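Both fundamentals fit in a toy inverted index: boolean search decides which documents match, scoring decides their order. In this sketch a plain term-frequency sum stands in for Lucene's real scoring, and the sample texts and the `search` helper are made up for illustration.

```python
from collections import defaultdict

docs = {
    1: "laser beam hits the spaceship",
    2: "the spaceship lands",
    3: "spaceship spaceship everywhere",
}

# inverted index: term -> {doc_id: term frequency}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1


def search(terms):
    """Boolean AND over all terms, ranked by summed term frequency."""
    postings = [index.get(term, {}) for term in terms]
    if not postings:
        return []
    # 1. boolean search: keep only documents that contain every term
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)
    # 2. scoring: higher summed term frequency first, doc id breaks ties
    scored = [(sum(p[doc_id] for p in postings), doc_id) for doc_id in matching]
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


print(search(["spaceship"]))
print(search(["laser", "spaceship"]))
```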
.
.
.
ES Search API
Includes:
• Query DSL
• Filter API
• Facet API
• Sort API
• …
...
.
• /index/_search
• /index/type/_search
.
.
.
filters
filtered queries: nested in the query field; affect both query results and facet counts
top-level filters: specified at the root of the search request; affect only the returned hits, not facet counts
facet-level filters: pre-filter the data before it is aggregated; affect only one specific facet
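The difference between the first two kinds of filter is easiest to see on the facet counts. The sketch below imitates both over an in-memory list; the sample people, the `in_braga` predicate and the helper names are all invented for this illustration.

```python
from collections import Counter

people = [
    {"handle": "ana", "city": "braga", "hobbies": ["coding", "chess"]},
    {"handle": "rui", "city": "porto", "hobbies": ["coding"]},
    {"handle": "eva", "city": "braga", "hobbies": ["surfing"]},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of a multi-valued field."""
    return Counter(value for doc in docs for value in doc[field])


def in_braga(doc):
    return doc["city"] == "braga"


def filtered_query(docs, keep):
    # the filter is part of the query: hits and facets both see the reduced set
    hits = [doc for doc in docs if keep(doc)]
    return hits, facet_counts(hits, "hobbies")


def top_level_filter(docs, keep):
    # facets are computed over the unfiltered results; only the hits are narrowed
    facets = facet_counts(docs, "hobbies")
    hits = [doc for doc in docs if keep(doc)]
    return hits, facets


print(filtered_query(people, in_braga)[1])
print(top_level_filter(people, in_braga)[1])
```

Both calls return the same two hits, but only the filtered query drops the "coding" count from 2 to 1.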
.
.
.
search sample I
.
POST /music/_search
{
  "query" : {
    "fuzzy" : { "title" : "vampires" }
  }
}
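Fuzzy matching is commonly explained through edit distance: how many single-character changes separate two terms. The sketch below computes plain Levenshtein distance; it is an illustration of the idea, not the code path Elasticsearch or Lucene actually runs.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("vampires", "vampyre"))
```

The searched word "vampires" is two edits away from the indexed term "vampyre" (swap `i` for `y`, drop the `s`), which is the kind of gap a fuzzy query is meant to bridge.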
.
.
.
search sample II
.
POST /planet/_search
{
  "from" : 0,
  "size" : 15,
  "query" : { "match_all" : {} },
  "sort" : { "handle" : "desc" },
  "filter" : { "term" : { "_all" : "coding" } },
  "facets" : {
    "hobbies" : {
      "terms" : { "field" : "hobbies" }
    }
  }
}
.
.
.
analysis
• performed when documents are added
• manipulates data to ensure better indexing
• 3 steps:
1. character filtering
2. tokenization
3. token filtering
• distinct analyzers for each field
• multiple analyzers for each field
• custom analyzers
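The three steps can be sketched as a small pipeline of plain functions. This is a toy analyzer written for illustration: the HTML-stripping character filter, the regex tokenizer and the stopword list are choices made for this sketch, not the behaviour of any particular built-in analyzer.

```python
import re

STOPWORDS = {"the", "of", "and"}  # illustrative list, chosen for this sketch


def strip_html(text):
    # 1. character filtering: runs on the raw text before tokenization
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    # 2. tokenization: split the text into word tokens
    return re.findall(r"\w+", text)


def filter_tokens(tokens):
    # 3. token filtering: lowercase every token, then drop stopwords
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    return filter_tokens(tokenize(strip_html(text)))


tokens = analyze("<b>The Vampyre</b> of Time and Memory")
print(tokens)
```

Swapping any one of the three functions gives a different analyzer, which is the idea behind having distinct or custom analyzers per field.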
.
.
.
analyzers
.
PUT /music/songs/_mapping
{
  "songs" : {
    "properties" : {
      "title" : {
        "type" : "string",
        "fields" : {
          "title_exact" : { "type" : "string", "index" : "not_analyzed" },
          "title_simple" : { "type" : "string", "analyzer" : "simple" },
          "title_snow" : { "type" : "string", "analyzer" : "snowball" }
        }
      },
      ...
    }
  }
}
.
.
.
highlighting
.
POST /publications/books/_search
{
  "query" : {
    "match" : { "text" : "spaceship" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}
.
.
.
search phrases
.
POST /publications/books/_search
{
  "query" : {
    "match_phrase" : { "text" : "laser beam" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}
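What sets a phrase search apart from a plain term match is that the words must appear next to each other, in order. A minimal sketch of that difference, using whitespace tokens and two hypothetical helpers (real phrase queries work on analyzed tokens and their positions in the index):

```python
def matches_any_term(text, query):
    """Plain match: at least one query word occurs somewhere in the text."""
    tokens = set(text.lower().split())
    return any(word in tokens for word in query.lower().split())


def matches_phrase(text, phrase):
    """Phrase match: the query words occur consecutively and in order."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


text = "the beam of a laser is not a laser beam weapon"
print(matches_phrase(text, "laser beam"))
print(matches_phrase(text, "beam laser"))
print(matches_any_term(text, "beam laser"))
```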
.
.
.
going wild
.
.
.
aggregations
Unit of work that builds analytic information over a set of documents
bucketing: documents are evaluated and placed into buckets according to previously defined criteria
metric: keeps track of metrics that are computed over a set of documents
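The two kinds of aggregation compose naturally: bucket first, then compute a metric per bucket. A minimal in-memory sketch, with made-up sample songs and hypothetical helper names, assuming a terms-style bucketing and an average metric:

```python
from collections import defaultdict

songs = [
    {"title": "A", "genre": "rock", "length": 200},
    {"title": "B", "genre": "rock", "length": 300},
    {"title": "C", "genre": "jazz", "length": 240},
]


def terms_buckets(docs, field):
    """Bucketing: place each document in the bucket named after its field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def avg_metric(docs, field):
    """Metric: a single number computed over a set of documents."""
    return sum(doc[field] for doc in docs) / len(docs)


avg_length_by_genre = {
    genre: avg_metric(bucket, "length")
    for genre, bucket in terms_buckets(songs, "genre").items()
}
print(avg_length_by_genre)
```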
.
.
.
percolations
.
.
.
more stuff
• routing
• uri search
• suggesters
• count API
• validate API
• explain API
• more like this API
• …
.
.
.
scalability
.
.
.
tools
.
.
.
Logstash
.
.
.
Kibana
.
.
.
Marvel
.
.
.
what about
now
.
.
.
new features
2014
Apr 3rd: count
Mar 6th: Tribe nodes
Jan 17th: the cat API
Jan 29th: Marvel
Jan 21st: snapshot & restore
2013
Sep 24th: official Elasticsearch clients for Ruby, Python, PHP and Perl
Nov 28th: Lucene 4.x doc values
…
.
.
.
go read a book
• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide,
Clinton Gormley, Zachary Tong
.
.
.
getting in touch
• https://github.com/elasticsearch
• @elasticsearch
• irc.freenode.org #elasticsearch
• irc.perl.org #elasticsearch
• http://www.elasticsearch.org/blog/
• Elasticsearch User mailing list
.
.
.
references
• Elastic Search Mega Manual
• http://solr-vs-elasticsearch.com/
• Elastic Search in Production
• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide,
Clinton Gormley, Zachary Tong
.
.
.
job’s done
questions?

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Elasto Mania

  • 3. what is it?
    Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. (elasticsearch.org/overview/)
  • 4. talk disclaimers
    • introduction to ES (sorry, no heavy stuff)
    • focused on Elasticsearch itself (not so much on integration with Kibana, Logstash, etc.)
    • heavily based on Andrew Cholakian’s book Exploring Elasticsearch
    • Tiririca method
    • not all disclaimers have necessarily been disclaimed
  • 6. buzzword driven slide
    • real time analytics
    • conflict management
    • per-operation persistence
    • document oriented
    • built on top of Apache Lucene™
    • Apache 2 Open Source License
    • real time data
    • distributed
    • multi-tenancy
    • RESTful API
    • schema free
    • full text search
    • high availability
  • 7-10. use cases
    • search a large number of product descriptions for a specific phrase and return the best results
    • search for words that sound like a given word
    • auto-complete a search box based on previously issued searches, allowing for misspellings
    • store large quantities of semi-structured (JSON) data in a distributed fashion, with redundancy
  • 11-14. don’t use cases
    • calculate how many items are left in an inventory
    • figure out the sum of all items in a given month’s invoices
    • execute operations transactionally with rollback support
    • guarantee item uniqueness across multiple fields
  • 15. history
    • 2004: Shay Banon creates Compass (Java search engine framework)
    • 2009: big parts of Compass would need to be rewritten to release a third version focused on scalability
    • Feb 2010: Elasticsearch 0.4.0
    • Mar 2012: Elasticsearch 0.19.0
    • Apr 2013: Elasticsearch 0.90.0
    • Feb 2014: Elasticsearch 1.0.0
    • Mar 2014: Elasticsearch 1.1.0
  • 17. JSON over HTTP
    • primary data format for ES is JSON
    • main protocol consists of HTTP requests with a JSON payload
    • _id is unique, and generated automatically if unassigned
    • internally, JSON is converted to flat fields for Lucene’s key/value API
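
The last point on slide 17 (nested JSON becoming flat fields) can be pictured with a small sketch. This is only an illustration, not how Elasticsearch or Lucene implement it: the `flatten` helper and the dotted-name convention are assumptions made for the example, and the document is the one from the sample-document slide.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted field names (illustrative only)."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            # Scalars and arrays are kept as the field's value(s).
            flat[path] = value
    return flat


song = {
    "title": "The Vampyre of Time and Memory",
    "author": "Queens of the Stone Age",
    "album": {"title": "...Like Clockwork", "year": 2013, "track": 3},
    "genres": ["alternative rock", "piano rock"],
}

flat_song = flatten(song)
for name, value in flat_song.items():
    print(name, "=", value)
```

The nested `album` object disappears as a key of its own and shows up as `album.title`, `album.year` and `album.track`.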
  • 18. mnemonic
    relational DB        Elasticsearch
    database             index
    table                type
    schema definition    mapping
    column               field
    row                  document
    (elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html)
  • 19. documents
    • like a row in a table in an RDB
    • JSON objects
    • each is stored in an index, has a type and an id
    • each contains zero or more fields
  • 20. sample document
    PUT /music/songs/1
    {
      "_id" : 1,
      "title" : "The Vampyre of Time and Memory",
      "author" : "Queens of the Stone Age",
      "album" : {
        "title" : "...Like Clockwork",
        "year" : 2013,
        "track" : 3
      },
      "genres" : ["alternative rock", "piano rock"]
    }
  • 21. fields
    • key-value pairs
    • value can be a scalar or a nested structure
    • each field has a type, defined in a mapping
  • 22. types
    type         definition
    string       text
    integer      32-bit integers
    long         64-bit integers
    float        IEEE floats
    double       double precision floats
    boolean      true or false
    date         UTC Date/Time
    geo_point    latitude/longitude
    null         the value null
    array        any field
    object       type omitted, properties field
    nested       separate document
  • 23. mapping
    • defines the types of a document’s fields
    • and the way they are indexed
    • scopes _ids (documents with different types may have identical _ids)
    • defines a bunch of index-wide settings
    • can be defined explicitly or automatically when a document is indexed
  • 24. sample mapping
    PUT /music/songs/_mapping
    {
      "song" : {
        "properties" : {
          "title" : { "type" : "string" },
          "author" : { "type" : "string" },
          "album" : {
            "properties" : {
              "title" : { "type" : "string" },
              "year" : { "type" : "integer" },
              "number" : { "type" : "integer" }
            }
          },
          "genres" : { "type" : "string" }
        }
      }
    }
  • 25. indexes
    • like a database in an RDB
    • has a mapping which defines types
    • logical namespace
    • maps to one or more primary shards
    • can have zero or more replica shards
  • 26. CRUD I
    PUT /music
    PUT /music/songs/_mapping
    { "song" : { "properties" : { ... } } }
  • 27. CRUD II
    PUT /music/songs/1
    { "title" : "The Vampyre of Time and Memory", ... }
    GET /music/songs/1
    POST /music/songs/1/_update
    { "doc" : { "year" : 2014 } }
    DELETE /music/songs/1
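
The CRUD calls on slide 27 are plain HTTP requests, so they can be assembled with Python's standard library alone. The sketch below only builds the requests and prints them; nothing is sent. The base URL `http://localhost:9200` and the `es_request` helper are assumptions for illustration, not part of the talk.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON payload."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


crud_requests = [
    es_request("PUT", "/music/songs/1", {"title": "The Vampyre of Time and Memory"}),
    es_request("GET", "/music/songs/1"),
    es_request("POST", "/music/songs/1/_update", {"doc": {"year": 2014}}),
    es_request("DELETE", "/music/songs/1"),
]

for req in crud_requests:
    print(req.get_method(), req.full_url, req.data)
```

Passing one of these objects to `urllib.request.urlopen` would actually send it, which requires a running node.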
  • 30. ES Search API
    Includes:
    • Query DSL
    • Filter API
    • Facet API
    • Sort API
    • …
    Endpoints:
    • /index/_search
    • /index/type/_search
  • 31. filters
    • filtered queries: nested in the query field; affect both query results and facet counts
    • top-level filters: specified at the root of the search; affect only query results, not facet counts
    • facet-level filters: pre-filter data before it is aggregated; affect only one specific facet
  • 32. search sample I
    POST /music/_search
    {
      "query" : { "fuzzy" : { "title" : "vampires" } }
    }
  • 33. search sample II
    POST /planet/_search
    {
      "from" : 0,
      "size" : 15,
      "query" : { "match_all" : {} },
      "sort" : { "handle" : "desc" },
      "filter" : { "term" : { "_all" : "coding" } },
      "facets" : {
        "hobbies" : { "terms" : { "field" : "hobbies" } }
      }
    }
  • 34. analysis
    • performed when documents are added
    • manipulates data to ensure better indexing
    • 3 steps:
      1. character filtering
      2. tokenization
      3. token filtering
    • distinct analyzers for each field
    • multiple analyzers for each field
    • custom analyzers
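
The three analysis steps on slide 34 can be sketched as a toy pipeline. This is a rough illustration in plain Python, not one of Elasticsearch's built-in analyzers: the tag-stripping character filter, the regex tokenizer and the tiny `STOPWORDS` set are all assumptions chosen for the example.

```python
import re

STOPWORDS = {"the", "of", "and"}  # hypothetical stopword list


def char_filter(text):
    # Step 1, character filtering: replace HTML-like tags with a space.
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    # Step 2, tokenization: split into runs of letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filter(tokens):
    # Step 3, token filtering: lowercase each token, then drop stopwords.
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


tokens = analyze("<b>The Vampyre</b> of Time and Memory")
print(tokens)  # ['vampyre', 'time', 'memory']
```

Swapping any one of the three functions changes what ends up in the index, which is the idea behind having distinct or custom analyzers per field.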
  • 35. analyzers
    PUT /music/songs/_mapping
    {
      "song" : {
        "properties" : {
          "title" : {
            "type" : "string",
            "fields" : {
              "title_exact" : { "type" : "string", "index" : "not_analyzed" },
              "title_simple" : { "type" : "string", "analyzer" : "simple" },
              "title_snow" : { "type" : "string", "analyzer" : "snowball" }
            }
          },
          ...
        }
      }
    }
  • 36. highlighting
    POST /publications/books/_search
    {
      "query" : { "match" : { "text" : "spaceship" } },
      "fields" : ["title", "isbn"],
      "highlight" : {
        "fields" : { "text" : { "number_of_fragments" : 3 } }
      }
    }
  • 37. search phrases
    POST /publications/books/_search
    {
      "query" : { "match_phrase" : { "text" : "laser beam" } },
      "fields" : ["title", "isbn"],
      "highlight" : {
        "fields" : { "text" : { "number_of_fragments" : 3 } }
      }
    }
  • 39. aggregations
    A unit of work that builds analytic information over a set of documents.
    • bucketing: documents are evaluated and placed into buckets according to previously defined criteria
    • metric: keeps track of metrics which are computed over a set of documents
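
The two aggregation families on slide 39 can be mimicked in memory to show the difference between them. This is a conceptual sketch only: the `songs` data and the `bucket_by` and `average` helpers are made up for illustration and are not the Elasticsearch aggregations API.

```python
from collections import defaultdict

# Hypothetical documents, invented for this example.
songs = [
    {"title": "A", "genre": "rock", "year": 2013},
    {"title": "B", "genre": "rock", "year": 2011},
    {"title": "C", "genre": "jazz", "year": 1959},
]


def bucket_by(docs, field):
    """Bucketing: place each document into the bucket for its field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def average(docs, field):
    """Metric: compute a single number over a set of documents."""
    return sum(doc[field] for doc in docs) / len(docs)


buckets = bucket_by(songs, "genre")
doc_counts = {genre: len(docs) for genre, docs in buckets.items()}
avg_year = {genre: average(docs, "year") for genre, docs in buckets.items()}

print(doc_counts)  # {'rock': 2, 'jazz': 1}
print(avg_year)    # {'rock': 2012.0, 'jazz': 1959.0}
```

Here the metric is computed inside each bucket, which mirrors how a metric can be nested under a bucketing step.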
  • 41. more stuff
    • routing
    • uri search
    • suggesters
    • count API
    • validate API
    • explain API
    • more like this API
    • …
  • 49. new features
    2014
    • Apr 3rd: count
    • Mar 6th: Tribe nodes
    • Jan 17th: the cat API
    • Jan 29th: Marvel
    • Jan 21st: snapshot & restore
    2013
    • Sep 24th: official Elasticsearch clients for Ruby, Python, PHP and Perl
    • Nov 28th: Lucene 4.x doc values
    • …
  • 50. go read a book
    • Exploring Elasticsearch, Andrew Cholakian
    • Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong
  • 51. getting in touch
    • https://github.com/elasticsearch
    • @elasticsearch
    • irc.freenode.org #elasticsearch
    • irc.perl.org #elasticsearch
    • http://www.elasticsearch.org/blog/
    • Elasticsearch User mailing list
  • 52. references
    • Elastic Search Mega Manual
    • http://solr-vs-elasticsearch.com/
    • Elastic Search in Production
    • Exploring Elasticsearch, Andrew Cholakian
    • Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong