08. ElasticSearch : Sorting and Relevance

Published in: Data & Analytics

  1. ElasticSearch: Sorting and Relevance (http://elastic.openthinklabs.com/)
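  2. Sorting

     Filter only (no scoring query, so every matching document gets a _score of 0):

         GET /_search
         {
             "query" : {
                 "bool" : {
                     "filter" : { "term" : { "user_id" : 1 } }
                 }
             }
         }

     The same filter wrapped in constant_score (every matching document gets the same constant _score, 1 by default):

         GET /_search
         {
             "query" : {
                 "constant_score" : {
                     "filter" : { "term" : { "user_id" : 1 } }
                 }
             }
         }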
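  3. Sorting: Sorting by Field Values

         GET /_search
         {
             "query" : {
                 "bool" : {
                     "filter" : { "term" : { "user_id" : 1 } }
                 }
             },
             "sort" : { "date" : { "order" : "desc" } }
         }

     Response (excerpt). _score and max_score are null because the score is not calculated when it is not used for sorting; the sort element holds the date as milliseconds since the epoch:

         "hits" : {
             "total" : 6,
             "max_score" : null,
             "hits" : [ {
                 "_index" : "us",
                 "_type" : "tweet",
                 "_id" : "14",
                 "_score" : null,
                 "_source" : {
                     "date" : "2014-09-24",
                     ...
                 },
                 "sort" : [ 1411516800000 ]
             },
             ...
         }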
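The sort value 1411516800000 in the response above is the date 2014-09-24 expressed as milliseconds since the Unix epoch. A minimal Python sketch of that conversion, assuming the date is interpreted as midnight UTC (the helper name `to_epoch_millis` is illustrative and not part of any Elasticsearch client):

```python
from datetime import datetime, timezone


def to_epoch_millis(date_string: str) -> int:
    """Convert a YYYY-MM-DD date, taken as midnight UTC, to epoch milliseconds."""
    dt = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


print(to_epoch_millis("2014-09-24"))  # 1411516800000
```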
  4. Sorting: Multilevel Sorting

         GET /_search
         {
             "query" : {
                 "bool" : {
                     "must" : { "match" : { "tweet" : "manage text search" } },
                     "filter" : { "term" : { "user_id" : 2 } }
                 }
             },
             "sort" : [
                 { "date" : { "order" : "desc" } },
                 { "_score" : { "order" : "desc" } }
             ]
         }
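The effect of the two sort levels can be imitated outside Elasticsearch: results are ordered by the first criterion, and the second is consulted only when the first is tied. A small sketch, assuming made-up hits (the ids, dates, and scores below are purely illustrative):

```python
# Hypothetical hits; the values are invented for illustration only.
hits = [
    {"_id": "a", "date": "2014-09-22", "_score": 0.3},
    {"_id": "b", "date": "2014-09-24", "_score": 0.1},
    {"_id": "c", "date": "2014-09-22", "_score": 0.9},
    {"_id": "d", "date": "2014-09-24", "_score": 0.5},
]


def multilevel_sort(docs):
    """Sort by date descending, then by _score descending within equal dates."""
    # ISO-8601 date strings compare correctly as plain strings, and
    # reverse=True applies the descending order to both parts of the key.
    return sorted(docs, key=lambda d: (d["date"], d["_score"]), reverse=True)


ordered_ids = [d["_id"] for d in multilevel_sort(hits)]
print(ordered_ids)  # ['d', 'b', 'c', 'a']
```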
  6. Sorting: Sorting on Multivalue Fields

         "sort" : {
             "dates" : {
                 "order" : "asc",
                 "mode" : "min"
             }
         }
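When a field holds several values, the `mode` setting reduces them to a single value per document before sorting. The sketch below imitates only the `min` and `max` modes on date strings (Elasticsearch also offers modes such as `avg` and `sum` for numeric fields); the documents and the helper `sort_docs` are hypothetical:

```python
# Reducers for the two modes covered by this sketch.
MODES = {"min": min, "max": max}

# Hypothetical documents, each with a multivalue "dates" field.
docs = [
    {"_id": "x", "dates": ["2014-09-20", "2014-09-25"]},
    {"_id": "y", "dates": ["2014-09-22"]},
    {"_id": "z", "dates": ["2014-09-18", "2014-09-30"]},
]


def sort_docs(documents, field, order="asc", mode="min"):
    """Reduce each document's values with the chosen mode, then sort on the result."""
    reducer = MODES[mode]
    return sorted(
        documents,
        key=lambda d: reducer(d[field]),
        reverse=(order == "desc"),
    )


asc_by_min = [d["_id"] for d in sort_docs(docs, "dates", order="asc", mode="min")]
desc_by_max = [d["_id"] for d in sort_docs(docs, "dates", order="desc", mode="max")]
print(asc_by_min)   # ['z', 'x', 'y']
print(desc_by_max)  # ['z', 'x', 'y']
```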
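  7. String Sorting and Multifields

     Original mapping, with the field analyzed for full-text search:

         "tweet" : {
             "type" : "string",
             "analyzer" : "english"
         }

     Multifield mapping, which adds a not_analyzed raw subfield for sorting:

         "tweet" : {
             "type" : "string",
             "analyzer" : "english",
             "fields" : {
                 "raw" : {
                     "type" : "string",
                     "index" : "not_analyzed"
                 }
             }
         }

     Search on tweet, sort on tweet.raw:

         GET /_search
         {
             "query" : {
                 "match" : { "tweet" : "elasticsearch" }
             },
             "sort" : "tweet.raw"
         }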
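The reason for the extra `raw` subfield can be seen in a toy model: an analyzed field is stored as several separate terms, so there is no single value to sort on, while a not_analyzed field keeps the whole string as one term. In this sketch `crude_analyze` is a deliberately rough stand-in (lowercase and whitespace split), not the real `english` analyzer, and the tweets are invented:

```python
def crude_analyze(text):
    """Rough stand-in for an analyzer: lowercase the text and split on whitespace."""
    return text.lower().split()


# Hypothetical tweet texts.
tweets = [
    "Elasticsearch surely is great",
    "elasticsearch and Lucene",
    "Bonsai Elasticsearch hosting",
]

# Analyzed view: each document becomes a list of terms, none of which
# represents the document as a whole for sorting purposes.
analyzed = {tweet: crude_analyze(tweet) for tweet in tweets}

# Raw view: each document is a single untouched string, so ordering is
# well defined (plain string comparison, uppercase before lowercase).
sorted_raw = sorted(tweets)

for tweet in sorted_raw:
    print(tweet, "->", analyzed[tweet])
```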
  8. What Is Relevance?

     ● The standard similarity algorithm used in Elasticsearch is term frequency/inverse document frequency (TF/IDF), which takes the following factors into account:
         ● Term frequency: How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
         ● Inverse document frequency: How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more-uncommon terms.
         ● Field-length norm: How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
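The three factors above can be written as small functions. The formulas below follow Lucene's classic TF/IDF similarity as I understand it; treat them as a sketch rather than the exact scoring code, since the real implementation stores the field-length norm in a lossy compressed form and newer Elasticsearch releases default to a different similarity (BM25). The function names are illustrative:

```python
import math


def term_frequency(freq):
    """tf: grows with the number of times the term appears in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """idf: shrinks as more documents in the index contain the term."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """norm: shrinks as the field gets longer."""
    return 1 / math.sqrt(num_terms)


def term_weight(freq, num_docs, doc_freq, num_terms):
    """Weight of a single term in a single field: tf * idf * norm."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )


# A term that appears once in a 4-term field and in 1 of 1,000 documents
# outweighs the same term in a 100-term field.
print(term_weight(freq=1, num_docs=1000, doc_freq=1, num_terms=4))
print(term_weight(freq=1, num_docs=1000, doc_freq=1, num_terms=100))
```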
  9. What Is Relevance?: Understanding the Score

         GET /_search?explain
         {
             "query" : {
                 "match" : { "tweet" : "honeymoon" }
             }
         }
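With `explain` enabled, each hit carries an explanation tree whose nodes have a `value`, a `description`, and optional nested `details`. A short sketch of walking such a tree and printing it as an indented outline; the sample tree below is hand-written with invented numbers and shortened descriptions, not real Elasticsearch output:

```python
# Hand-written sample in the shape of an explanation node; the numbers and
# descriptions are illustrative assumptions, not actual output.
sample_explanation = {
    "value": 0.125,
    "description": "weight(tweet:honeymoon), product of:",
    "details": [
        {"value": 1.0, "description": "tf (term frequency)"},
        {"value": 0.5, "description": "idf (inverse document frequency)"},
        {"value": 0.25, "description": "fieldNorm (field-length norm)"},
    ],
}


def outline(node, depth=0):
    """Return the explanation tree as a list of indented 'value  description' lines."""
    lines = ["  " * depth + f"{node['value']}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(outline(child, depth + 1))
    return lines


explanation_lines = outline(sample_explanation)
print("\n".join(explanation_lines))
```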
  10. What Is Relevance?: Understanding Why a Document Matched

         GET /us/tweet/12/_explain
         {
             "query" : {
                 "bool" : {
                     "filter" : { "term" : { "user_id" : 2 } },
                     "must" : { "match" : { "tweet" : "honeymoon" } }
                 }
             }
         }

     Excerpt from the response, explaining why the document did not match:

         "failure to match filter: cache(user_id:[2 TO 2])"
  11. Doc Values Intro

     ● Doc values are used in several places in Elasticsearch:
         ● Sorting on a field
         ● Aggregations on a field
         ● Certain filters (for example, geolocation filters)
         ● Scripts that refer to fields
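All the uses listed above need to go from a document to its field values, which is the opposite direction from the inverted index used for search (term to documents). Doc values can be pictured as that "uninverted" mapping. The sketch below builds both structures in memory for a few invented documents; it is a conceptual model only and says nothing about how doc values are actually laid out on disk:

```python
from collections import defaultdict

# Hypothetical documents: id -> terms of one field.
docs = {
    "doc_1": ["brown", "fox"],
    "doc_2": ["brown", "dog"],
    "doc_3": ["lazy", "dog"],
}


def build_inverted_index(documents):
    """Search structure: term -> sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        for term in documents[doc_id]:
            index[term].append(doc_id)
    return dict(index)


def uninvert(inverted_index):
    """Doc-values-like structure: document id -> sorted list of its terms."""
    doc_values = defaultdict(list)
    for term in sorted(inverted_index):
        for doc_id in inverted_index[term]:
            doc_values[doc_id].append(term)
    return dict(doc_values)


inverted = build_inverted_index(docs)
doc_values = uninvert(inverted)

print(inverted["dog"])      # ['doc_2', 'doc_3']
print(doc_values["doc_2"])  # ['brown', 'dog']
```

Sorting or aggregating then becomes a direct lookup per document in `doc_values`, instead of a scan over every term in the inverted index.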
  12. References

     ● Clinton Gormley & Zachary Tong, Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine, O'Reilly
