The slightly verbose slides accompanying my introductory ElasticSearch talk at Utrecht.rb
These slides are outdated, see https://speakerdeck.com/timonv/youre-not-using-elasticsearch
2. About me
• CTO @ Tolq
• Freelance hacker and consultant
• Used ElasticSearch as a solution for Translation Memory, text author recommendation, and spam filtering
3. What is it?
• Full-text search engine based on Apache Lucene
• Built with the cloud in mind, built for speed
• Easy-to-use JSON API
• A *lot* of room for custom solutions
4. Why use it (vs Solr)?
• Cloud-based setup out of the box, and setting it up is super easy. That means automagic replication, sharding and, as a bonus, mapreduce.
• Indexing takes a few seconds (vs minutes in Solr)
• Again, an easy JSON API
• NoSQL: mappings can be generated on the fly
• Highly scriptable and customisable for fancy aggregation or dynamic analysis
5. OMG WHAT IS LUCENE
• Java library by Apache for full-text search
• Just a library, by no means a solution in itself (although you can use it as one)
• More in depth: search works via a term index (an inverted index mapping terms to documents). Scoring is based not only on how often a term occurs, but also on how unique it is
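The idea of a term index with scoring on occurrence and uniqueness can be sketched in a few lines of Python. This is a toy TF-IDF ranking, assuming whitespace tokenisation and a simplified weighting formula; it is not Lucene's actual scoring code, and the function names and sample documents are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def score(query, docs, index):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    n_docs = len(docs)
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, set())
        if not postings:
            continue
        # Rare terms (few postings) weigh more than common ones.
        idf = math.log(n_docs / len(postings)) + 1.0
        for doc_id in postings:
            # Term frequency: how often the term occurs in this document.
            tf = docs[doc_id].lower().split().count(term)
            scores[doc_id] += tf * idf
    return scores.most_common()

# Hypothetical sample documents.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps over the lazy fox",
}
index = build_inverted_index(docs)
ranking = score("quick jumps", docs, index)
print(ranking)
```

Document 3 ranks first because it contains both query terms, and "jumps" counts for more than "quick" because it appears in fewer documents.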
6. Making a query
• You send JSON to a _search endpoint, either on an index or on an index and a type
This is a basic full-text search with a response:
GET localhost:9200/example/peanuts/_search
{ "query": { "text": { "my_field": "many search terms" }}}

{ "took": 5, "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "hits": [
      { "_index": "example",
        "_type": "peanuts",
        "_score": 0.9,
        "_source": { …data }
      }
    ]
  }
}
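As a rough sketch, the same request can be assembled and its response unpacked from Python using only the standard library. The helper names and the sample response are hypothetical, the code assumes matching documents arrive nested under hits.hits, and it keeps the `text` query from the slide (later ElasticSearch releases renamed it to `match`). Nothing is sent over the network here.

```python
import json

def build_search_request(index, doc_type, field, terms, host="localhost:9200"):
    """Return the (url, body) pair for a basic full-text search."""
    url = f"http://{host}/{index}/{doc_type}/_search"
    # Mirrors the slide; newer versions would use "match" instead of "text".
    body = {"query": {"text": {field: terms}}}
    return url, json.dumps(body)

def extract_hits(raw_response):
    """Pull (score, source) pairs out of a search response."""
    response = json.loads(raw_response)
    return [(hit["_score"], hit["_source"]) for hit in response["hits"]["hits"]]

# Hypothetical response, shaped like the one on the slide.
sample_response = json.dumps({
    "took": 5,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "hits": [
            {
                "_index": "example",
                "_type": "peanuts",
                "_score": 0.9,
                "_source": {"title": "hypothetical"},
            }
        ]
    },
})

url, body = build_search_request("example", "peanuts", "my_field", "many search terms")
print(url)
print(body)
print(extract_hits(sample_response))
```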
7. Other types of queries
• Terms, full text, boolean, fuzzy, geolocation and lots more variants
• You can also apply filters to narrow down results
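One way to combine a scored query with a filter is sketched below. It assumes the `bool` query with a `filter` clause found in more recent ElasticSearch versions; releases contemporary with this talk expressed the same idea with a `filtered` query. The function name and the field names are hypothetical.

```python
import json

def filtered_search(field, terms, filter_field, filter_value):
    """Build a request body: full-text match, narrowed by an exact-value filter."""
    return {
        "query": {
            "bool": {
                # Contributes to the relevance score.
                "must": {"match": {field: terms}},
                # Only includes or excludes documents; does not affect the score.
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }

body = filtered_search("my_field", "many search terms", "language", "nl")
print(json.dumps(body, indent=2))
```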
8. Analysing
• Before data is indexed or a query is run, it's analysed
• At this point, terms are scored
• But before that, the data is normalised. This usually includes stemming and stop word removal, with support for over 30 languages. Woot!
• You can create your own analysers.
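To make the normalisation step concrete, here is a toy analyser in Python: lowercase, tokenise, drop stop words, then strip a few suffixes. The stop word list and the suffix-stripping "stemmer" are deliberately crude assumptions for illustration and bear no resemblance to the real language analysers that ship with Lucene.

```python
import re

# Tiny, hypothetical stop word list and suffix list.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in"}
SUFFIXES = ("ing", "ed", "es", "s")

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyse(text):
    """Lowercase, tokenise, remove stop words, then stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]

indexed_terms = analyse("Searching the indexed documents")
query_terms = analyse("search index")
print(indexed_terms)
print(query_terms)
```

Because the same analysis runs at index time and at query time, the query "search index" produces terms that match the document even though the surface forms differ.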
9. Facet me awesome
• Facets allow you to do aggregation over your search results.
{ "query": …,
  "facets": {
    "my_facets": {
      "terms": { "field": "name" }
    }
  }
}
Facets will be deprecated in 1.0 in favour of ‘aggregations’
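A small sketch of what a terms facet (or its successor, a terms aggregation) reports: a count per distinct field value across the matching documents. The first helper builds an `aggs` request body; the second emulates the counting locally on hypothetical hits, so nothing here talks to a server. Both function names are made up for illustration.

```python
from collections import Counter

def terms_aggregation_body(name, field):
    """Build the request fragment for a terms aggregation on one field."""
    return {"aggs": {name: {"terms": {"field": field}}}}

def count_terms(hits, field):
    """Locally emulate the result: (value, count) pairs, most frequent first."""
    return Counter(hit[field] for hit in hits if field in hit).most_common()

# Hypothetical documents matching a query.
hits = [
    {"name": "utrechtrb"},
    {"name": "utrechtrb"},
    {"name": "amsterdamrb"},
]

body = terms_aggregation_body("my_facets", "name")
buckets = count_terms(hits, "name")
print(body)
print(buckets)
```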
11. How Tolq uses ElasticSearch
• We use ElasticSearch as a Translation Memory. In our case, that means we suggest other relevant translations while a translator is translating.
• By storing original texts and dynamically adjusting the analyser to the correct language, we can suggest similar translations.
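Adjusting the analyser per language could look roughly like the sketch below, which picks one of ElasticSearch's built-in language analysers at query time through the `analyzer` parameter of a `match` query. This is an assumption about one possible approach, not Tolq's actual implementation; the field name `original_text`, the language mapping, and the function name are hypothetical.

```python
# Hypothetical mapping from language codes to built-in analyser names.
LANGUAGE_ANALYSERS = {
    "en": "english",
    "nl": "dutch",
    "de": "german",
    "fr": "french",
}

def suggestion_query(source_text, language, field="original_text"):
    """Build a match query analysed with the analyser for the given language."""
    # Fall back to the standard analyser for languages not in the mapping.
    analyser = LANGUAGE_ANALYSERS.get(language, "standard")
    return {
        "query": {
            "match": {
                field: {"query": source_text, "analyzer": analyser}
            }
        }
    }

dutch_query = suggestion_query("de snelle bruine vos", "nl")
fallback_query = suggestion_query("some text", "xx")
print(dutch_query)
```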