Query DSL is Elasticsearch's rich, flexible language for querying a cluster.
Queries are defined in JSON. This presentation covers the types of Query DSL queries and how they are used.
2. Agenda
● Overview of Elasticsearch
● What is Query DSL?
● Queries VS Filters
● Types of queries
● Demo
3. Overview of Elasticsearch
Elasticsearch: a real-time search & analytics engine
● open source
● distributed
● multi-tenancy
● scales massively
● high availability
● schema free
● RESTful API
● JSON over HTTP
● Lucene based
● fault tolerance
4. What is Query DSL?
➢ It is a rich, flexible query language.
➢ Elasticsearch provides a full Query DSL based on JSON to define queries.
➢ We can think of the Query DSL as an AST of queries, consisting of two types of clauses:
Leaf query clauses: These look for a particular value in a particular field, such as the match, term or range queries.
Compound query clauses: These wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion.
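The leaf/compound structure described above can be sketched in a few lines of Python. This is only an illustration: the field names are borrowed from later slides, and the leaf_clauses helper is hypothetical (it is not part of Elasticsearch or any client library) and handles only the bool compound query.

```python
import json

# Leaf clauses: each one targets a single field (illustrative field names).
match_body = {"match": {"body": "starbucks"}}
range_friends = {"range": {"actor.friendsCount": {"gte": 10, "lte": 500}}}
term_verb = {"term": {"verb": "post"}}

# Compound clause: a bool query that wraps the leaf clauses.
query = {
    "query": {
        "bool": {
            "must": [match_body],
            "filter": [range_friends, term_verb],
        }
    }
}

COMPOUND = {"bool"}  # assumption: only bool is treated as compound in this sketch


def leaf_clauses(clause):
    """Walk the query tree and return the names of its leaf clauses."""
    (name, body), = clause.items()
    if name not in COMPOUND:
        return [name]
    leaves = []
    for children in body.values():
        for child in children:
            leaves.extend(leaf_clauses(child))
    return leaves


print(json.dumps(query, indent=2))
print(leaf_clauses(query["query"]))
```

Walking the dictionary this way shows why the Query DSL can be read as a tree: compound clauses are inner nodes and leaf clauses are the leaves.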
5. Queries VS Filters
Queries
full text search
relevance scoring
heavier
not cacheable
Filters
exact matching
binary yes / no
fast
cacheable
6. Types of queries
➢ Match All Query
➢ Full text queries
➢ Term level queries
➢ Compound queries
7. Match All Query
The simplest query, which matches all documents, giving them all a _score of 1.0.
Example:
"query": {
  "match_all": {}
}
8. Full text queries
The high-level full text queries are usually used to search full text fields, such as the body of an email.
These are the full text queries:
match query
multi_match query
common_terms query
query_string query
simple_query_string query
9. Full text queries (continued)
match query: A family of match queries that accept text, numerics or dates, analyze them, and construct a query.
{
  "query": {
    "match": {
      "body": {
        "query": "i spent at starbucks",
        "operator": "and"
      }
    }
  }
}
multi_match query: The multi_match query builds on the match query to allow multi-field queries.
{
  "query": {
    "multi_match": {
      "query": "share post",
      "fields": [
        "verb"
      ]
    }
  }
}
10. Full text queries (continued)
common_terms query: The common terms query is a modern alternative to stopwords which improves the precision and recall of search results (by taking stopwords into account), without sacrificing performance.
"common": {
  "body": {
    "query": "i am spent at starbucks",
    "cutoff_frequency": 0.001,
    "low_freq_operator": "and"
  }
}
query_string query: A query that uses a query parser in order to parse its content.
"query": {
  "query_string": {
    "query": "(verb:post) AND (body:i am today OR body:came to starbucks)"
  }
}
11. Full text queries (continued)
simple_query_string query: A query that uses the SimpleQueryParser to parse its content. The simple_query_string query will never throw an exception, and discards invalid parts of the query.
"query": {
  "simple_query_string": {
    "query": "\"at starbucks\" | today -starbucks",
    "fields": [
      "body"
    ],
    "flags": "OR|NOT|PHRASE"
  }
}
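The query text above contains a quoted phrase, and double quotes must be escaped inside a JSON string. A minimal sketch, assuming the request body is built in Python with the standard json module, which does the escaping for us (the variable names are illustrative):

```python
import json

# The phrase "at starbucks" is quoted inside the query text itself.
query_text = '"at starbucks" | today -starbucks'

body = {
    "query": {
        "simple_query_string": {
            "query": query_text,
            "fields": ["body"],
            "flags": "OR|NOT|PHRASE",
        }
    }
}

# json.dumps escapes the inner double quotes as \" so the result is valid JSON.
encoded = json.dumps(body, indent=2)
print(encoded)
```

Building the body as a dictionary and serializing it avoids hand-written escaping mistakes in the JSON.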
12. Term level queries
The term-level queries operate on the exact terms that are stored in the inverted index. These queries are usually used for structured data like numbers, dates, and enums, rather than full text fields.
These are the term level queries:
term query
terms query
range query
exists query
prefix query
wildcard query
regexp query
fuzzy query
type query
ids query
13. Term level queries (continued)
term query: The term query finds documents that contain the exact term specified in the inverted index.
"term": {
  "actor.postedTime": "2010-11-17T03:55:57.000Z"
}
terms query: Filters documents that have fields that match any of the provided terms.
"terms": {
  "verb": [
    "share",
    "post"
  ]
}
14. Term level queries (continued)
range query: Matches documents with fields that have terms within a certain range.
"range": {
  "actor.friendsCount": {
    "gte": 10,
    "lte": 500
  }
}
exists query: Returns documents that have at least one non-null value in the original field.
"exists": {
  "field": "actor.links.href"
}
15. Term level queries (continued)
prefix query: Matches documents that have fields containing terms with a specified prefix.
"prefix": {
  "body": "rt"
}
wildcard query: Matches documents that have fields matching a wildcard expression.
"wildcard": {
  "actor.preferredUsername": "ba*"
}
regexp query: The regexp query allows you to use regular expression term queries.
"regexp": {
  "actor.preferredUsername": "ba.*lan"
}
16. Compound Queries
Compound query: Compound queries wrap other compound or leaf
queries, either to combine their results and scores, to change their
behaviour, or to switch from query to filter context.
The queries in this group are:
constant_score query
bool query
dis_max query
function_score query
boosting query
indices query
and, or, not
filtered query
limit query
17. Compound queries (continued)
dis_max query: A query that generates the union of documents produced by its subqueries.
"dis_max": {
  "queries": [
    { "term": { "verb": "share" } },
    { "term": { "verb": "post" } }
  ]
}
boosting query: The boosting query can be used to effectively demote results that match a given query.
"boosting": {
  "positive": { "term": { "verb": "post" } },
  "negative": {
    "range": {
      "actor.friendsCount": { "from": 10, "to": 500 }
    }
  },
  "negative_boost": 0.5
}
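The demoting effect of negative_boost can be pictured with a toy calculation. This is a simplified sketch, not Elasticsearch's implementation: the boosting_score function is hypothetical, and it assumes that a document matching the negative query keeps its positive score multiplied by negative_boost.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Toy model: demote, but do not exclude, documents matching the negative query."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# A post from an actor with 10..500 friends is demoted; other posts keep their score.
print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))
print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))
```

The document still appears in the results either way; only its position in the ranking changes.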
19. Compound queries (continued)
constant_score query: A query which wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.
"constant_score": {
  "filter": {
    "range": {
      "actor.friendsCount": {
        "from": 10,
        "to": 500
      }
    }
  }
}
20. Other DSL Queries
Joining queries: Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive, so Elasticsearch offers join-like queries instead. Examples: nested query, has_parent query, has_child query.
Geo queries: These queries operate on geo_point and geo_shape fields.
Specialized queries: These queries do not fit into any other group. They serve specific requirements, for example the template query and the script query.
Span queries: These are typically used to implement very specific queries on legal documents or patents.
P1. Distributed: Elasticsearch distributes our data across a cluster using shards. Example: laptop.
P2. Scales massively: we can grow an ES cluster smoothly from a small cluster to a large one, and we can scale both horizontally and vertically.
P3. High availability: Elasticsearch duplicates data through replica shards, so when a node goes down, replicas held by other nodes in the cluster take over for that node's primary shards and your data is not lost.
P3. RESTful API: ES provides a RESTful API, so we can easily interact with an ES cluster and perform ES operations.
P4. JSON over HTTP: ES is JSON in and JSON out. We write ES queries in JSON and ES returns results in JSON as well.
P5. Schema free: if you have not defined a mapping for your type, ES automatically infers and generates one.
P6. Multi-tenancy: multiple applications can access the same index without any modification, unlike an RDBMS-based application. We can easily share an ES index across applications. Ex: Kibana, Logstash.
P1. From a performance perspective, apply the filter first and then run the query over the filtered data.
P1. There are two types of context: query context and filter context.
P2. The behaviour of a query clause depends on whether it is used in query context or in filter context.
P4. Query context
A query clause used in query context answers the question “How well does this document match this query clause?”. It calculates a score for each document.
P5. Filter context
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No; no scores are calculated.
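The difference between the two contexts can be sketched with a toy model in plain Python. Nothing here is Elasticsearch code: the documents, the function names and the occurrence-count "score" are all assumptions made for illustration. The filter answers yes or no, the query returns a score, and filtering first means fewer documents need scoring.

```python
docs = [
    {"id": 1, "verb": "post", "body": "spent the day at starbucks"},
    {"id": 2, "verb": "share", "body": "starbucks starbucks again"},
    {"id": 3, "verb": "post", "body": "stayed home today"},
]


def filter_verb(doc, verb):
    """Filter context: a plain yes/no answer, no score."""
    return doc["verb"] == verb


def score_body(doc, word):
    """Query context: a toy relevance score (occurrence count, not real scoring)."""
    return float(doc["body"].split().count(word))


def search(docs, verb, word):
    # Filter first, so only the surviving documents are scored.
    candidates = [d for d in docs if filter_verb(d, verb)]
    scored = [(score_body(d, word), d["id"]) for d in candidates]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    return sorted(hits, reverse=True)


print(search(docs, "post", "starbucks"))
```

Document 2 mentions the word most often, but it never reaches the scoring step because the filter on verb excludes it.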
P1. It is the default query when you do not specify one, for example: index/type/_search.
P1. They understand how the field being queried is analyzed and will apply each field’s analyzer (or search_analyzer) to the query string before executing.
P2. First of all, I want to differentiate a full text value from an exact value.
P1. The common terms query divides the query terms into two groups: more important (i.e. low frequency terms) and less important (i.e. high frequency terms which would previously have been stopwords).
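The grouping described in this note can be sketched as follows. This is a toy model under stated assumptions: the split_terms function and the document frequencies are made up for illustration, and cutoff_frequency is treated as a fraction of all documents, so a term counts as high frequency when it appears in more than that fraction.

```python
def split_terms(terms, doc_freq, total_docs, cutoff_frequency):
    """Partition query terms into (low_frequency, high_frequency) groups."""
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        if ratio > cutoff_frequency:
            high.append(term)  # common term, behaves like a stopword
        else:
            low.append(term)   # rare term, carries most of the meaning
    return low, high


# Hypothetical document frequencies in an index of 10,000 documents.
doc_freq = {"i": 9000, "am": 8000, "spent": 8, "at": 9500, "starbucks": 5}
low, high = split_terms(
    ["i", "am", "spent", "at", "starbucks"], doc_freq, 10_000, 0.001
)
print(low)
print(high)
```

With these numbers, "spent" and "starbucks" fall into the more important group, while "i", "am" and "at" are treated as less important.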
P2. query_string is useful when we pass complex queries as URL parameters and want to perform boolean operations in them.
P1. It supports multiple fields, so one query can run on several fields at the same time.
P2. simple_query_string supports multiple flags that specify which parsing features should be enabled. They are given as a |-delimited string in the flags parameter, which helps us write a more optimized simple query.
P3. The missing query is deprecated, so use an exists query inside the must_not clause of a bool query instead.
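A minimal sketch of that replacement, building the request body as a Python dictionary. The missing_field_query helper is hypothetical, and the field name is borrowed from the exists example on the slides.

```python
import json


def missing_field_query(field):
    """Build a bool query that matches documents with no value in `field`."""
    return {
        "query": {
            "bool": {
                "must_not": {
                    "exists": {"field": field}
                }
            }
        }
    }


body = missing_field_query("actor.links.href")
print(json.dumps(body, indent=2))
```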
P1. The term query looks for the exact term in the field’s inverted index; it doesn’t know anything about the field’s analyzer. This makes it useful for looking up values in not_analyzed string fields, or in numeric or date fields.
P2. When we write a term query in filter context, it generates a bitset such as [1,0,1,0], which describes which documents match the query.
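The bitset idea can be illustrated with a toy inverted index in plain Python. This is a sketch, not how Lucene stores its data: the function names and the four sample documents are assumptions chosen so the result has one bit per document.

```python
def build_inverted_index(values):
    """Map each exact term to the list of document ids that contain it."""
    index = {}
    for doc_id, value in enumerate(values):
        index.setdefault(value, []).append(doc_id)
    return index


def term_bitset(index, term, num_docs):
    """One bit per document: 1 if the document contains the term, else 0."""
    bits = [0] * num_docs
    for doc_id in index.get(term, []):
        bits[doc_id] = 1
    return bits


# The "verb" field of four hypothetical documents.
verbs = ["share", "post", "share", "post"]
index = build_inverted_index(verbs)
print(term_bitset(index, "share", len(verbs)))
```

Because the answer is only yes or no per document, a bitset like this can be kept and reused, which is why filters are described as cacheable.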
P1. Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. Note that this query can be slow, as it needs to iterate over many terms. To prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?.
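The semantics of the two wildcards can be sketched by translating a pattern into a regular expression. This only models how * and ? match a single term; the helper names are hypothetical, and it is not how Elasticsearch executes the query.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into their regex equivalents, escaping everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any character sequence, including the empty one
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("ba*", "balan"))
print(wildcard_match("b?lan", "blan"))
```

A pattern such as "ba*" fixes the first two characters, so only terms starting with "ba" need checking. A pattern that starts with * or ? gives no such prefix, and every term has to be examined.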
P2. The performance of a regexp query depends heavily on the regular expression chosen. Where possible, give the expression a longer literal prefix to optimize the regexp query.
P3. Regular expressions are dangerous because it’s easy to accidentally create an innocuous looking one that requires an exponential number of internal determinized automaton states (and corresponding RAM and CPU) for Lucene to execute.
P1. The filtered query was deprecated in version 2.0.0.
P1. The dis_max query scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
P2. It accepts multiple queries and returns any documents which match any of the query clauses. While the bool query combines the scores from all matching queries, the dis_max query uses the score of the single best-matching query clause.
P3. Unlike the "NOT" clause in the bool query, the boosting query still selects documents that contain undesirable terms, but reduces their overall score.
P1. The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
P2. The must and should clauses have their scores combined (the more matching clauses, the better), while the must_not and filter clauses are executed in filter context.
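The two scoring approaches in these notes, adding clause scores together for bool and taking the best clause plus a tie-breaking increment for dis_max, can be compared with a toy calculation. This is a simplified sketch with hypothetical function names; real scoring involves further factors that are left out here.

```python
def bool_score(clause_scores):
    """bool: more matches is better, so matching clause scores are added."""
    return sum(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """dis_max: best clause score, plus tie_breaker times the other scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Scores of the clauses that matched one document (made-up numbers).
scores = [2.0, 0.5]
print(bool_score(scores))
print(dis_max_score(scores))
print(dis_max_score(scores, tie_breaker=0.5))
```

With a tie_breaker of 0 only the best clause counts. Raising it lets the other matching clauses contribute a little, which moves dis_max closer to the bool behaviour.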
P1. constant_score query: wraps another query but executes it in filter context. All matching documents are given the same “constant” _score.