Elasticsearch has always been fast, but required structuring and indexing your data up front. We're changing that with the introduction of runtime fields, which enable you to extract, calculate, and transform fields at query time. They can be defined after data is indexed or provided with your query, enabling new cost/storage/performance tradeoffs, and letting analysts gradually define fields over time.
3. Runtime fields in a nutshell
Schema on read
• Empowering all users to generate fields upon need
• Flexibility vs. performance at query time
A runtime field is a field associated with instructions (e.g. a script) for calculating its value at query time. Runtime fields can be defined in the mapping or introduced in a query. Other than that, runtime fields behave like any other field in Elasticsearch.
4. Agenda
(1) What are runtime fields?
(2) Why are runtime fields useful?
(3) How will runtime fields be implemented?
5. Schema on write
query performance
Extract, Transform, Index
Readiness for immediate query/agg
Advantages:
● Immediate response time
● Flexibility for new docs
6. Schema on read
flexibility, cost, ingest pace
Load almost raw
Prep per query upon need
Advantages:
● Flexibility for ingested docs
● Start without data/use knowledge
● Improved ingest rate
7. Runtime Fields
Elastic’s schema on read
• Instructions for calculating the field upon need (e.g. script)
• Defined in the mappings or introduced in a query
• Smaller index and faster ingest
• Lower query performance
• Other than that, like any other field
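To make the trade-off concrete, here is a minimal Python sketch of the two approaches. It is a toy in-memory model, not Elasticsearch: `ToyIndex`, `status_from_message`, and every method name are hypothetical and exist only for illustration. Schema on write computes the field once at ingest and stores it; schema on read stores the raw document and computes the field when a query asks for it.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]
FieldFn = Callable[[Doc], object]


def status_from_message(doc: Doc) -> int:
    """Second-to-last space-separated token of the log line, as an int."""
    return int(str(doc["message"]).split(" ")[-2])


class ToyIndex:
    """Hypothetical stand-in for an index; not an Elasticsearch API."""

    def __init__(self, ingest_fields: Optional[Dict[str, FieldFn]] = None) -> None:
        self.ingest_fields: Dict[str, FieldFn] = dict(ingest_fields or {})
        self.runtime_fields: Dict[str, FieldFn] = {}
        self.docs: List[Doc] = []

    def index(self, doc: Doc) -> None:
        stored = dict(doc)
        # Schema on write: compute once at ingest and store the result.
        for name, fn in self.ingest_fields.items():
            stored[name] = fn(doc)
        self.docs.append(stored)

    def add_runtime_field(self, name: str, fn: FieldFn) -> None:
        # Schema on read: only the instructions are stored; no document changes.
        self.runtime_fields[name] = fn

    def value(self, doc: Doc, name: str) -> object:
        # In this toy, a runtime field takes precedence over a stored field.
        if name in self.runtime_fields:
            return self.runtime_fields[name](doc)
        return doc.get(name)

    def search(self, name: str, wanted: object) -> List[Doc]:
        return [d for d in self.docs if self.value(d, name) == wanted]


raw = {
    "message": '40.135.0.0 - - [1998-04-30T14:30:17-05:00] '
               '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
}

on_write = ToyIndex({"status": status_from_message})
on_write.index(raw)

on_read = ToyIndex()
on_read.index(raw)                       # ingested almost raw
on_read.add_runtime_field("status", status_from_message)  # defined afterwards
```

Both indices answer `search("status", 200)` with the same document, but only `on_write` stores a `status` value per document; `on_read` pays the extraction cost on every query instead.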
8. Add to mapping
PUT /test
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_second"
      },
      "message": {
        "type": "wildcard"
      },
      "status": {
        "type": "runtime",
        "runtime_type": "long",
        "script": "String m = doc[\"message\"].value; int end = m.lastIndexOf(\" \"); int start = m.lastIndexOf(\" \", end - 1) + 1; emit(Long.parseLong(m.substring(start, end)));"
      }
    }
  }
}
POST /test/_doc?refresh
{
  "@timestamp": "1998-04-30T14:30:17-05:00",
  "message": "40.135.0.0 - - [1998-04-30T14:30:17-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
}
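The script in the mapping above takes the second-to-last space-separated token of the log line and emits it as a long. As a rough illustration of that logic outside Elasticsearch, here is a Python sketch; the function name `extract_status` is made up for this example, and `rfind` stands in for Java's `lastIndexOf`.

```python
def extract_status(message: str) -> int:
    """Mirror the slide's script: parse the token between the last two spaces."""
    end = message.rfind(" ")                 # space before the byte count
    start = message.rfind(" ", 0, end) + 1   # first character after the previous space
    return int(message[start:end])


line = ('40.135.0.0 - - [1998-04-30T14:30:17-05:00] '
        '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736')
status = extract_status(line)
```

For the sample document indexed on this slide, the extracted value is the HTTP status code. A line with fewer than two spaces, or a non-numeric token, raises an error here, just as `Long.parseLong` would fail in the script.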
9. ...and use it like any other field
POST /_async_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "status": "200" } },
        {
          "range": {
            "@timestamp": { "gte": "1998-05-01T00:00:00Z", "lt": "1998-05-02T00:00:00Z" }
          }
        }
      ]
    }
  }
}
11. Future enhancements
Options for defining the function that yields the value in the field
• Painless script
• Grok patterns
• Query time enrichment
• Source field
13. Schema on read
Extract, transform and index data *only* upon need
Benefits:
– Flexibility in defining the data
– No index footprint (lower TCO)
– Improved ingest pace
Beneficial, but we do have better mechanisms to help deal with these
Letting analysts define their schema in retrospect
14. A new field lifecycle
(1) Index only @timestamp; keep the rest as a log entry in _source
(2) Extract more data with runtime fields
(3) Turn frequently used runtime fields into indexed fields
Benefits:
● Save time and effort
● Add fields if and when required, without knowing everything in advance
● Only index what you need: save on index size, and with it on performance and hardware cost
15. Fix mapping errors
• Index: index data for optimal performance
• Retrospective fix: identify an error in the ingest instructions and override the indexed field with a runtime field for already-indexed documents
• Index: index new documents with the revised mapping
Benefits:
• Fix immediately, without reindexing
• Queries and schema don’t change (performance is impacted)
16. Field per context
Query, visualization, or completely ad-hoc
"runtime_mappings": {
  "ip": {
    "type": "runtime",
    "runtime_type": "ip",
    "script": "String m = doc[\"message\"].value; emit(m.substring(0, m.indexOf(\" \")));"
  }
}
Benefits:
• Avoid polluting everyone’s schema with fields that answer a need only for a subset of the users
• Analyze more efficiently with fields designed to answer a specific need
– “What’s the average size of an article in my index? I need to know for relevance ranking tuning.”
– “Please don’t add it to everyone’s articles index… You’re the only one interested in it, and even you just look at it once a month.”
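The point of a per-request definition is that the field exists only for the request that carries it. The following Python sketch illustrates that idea with a toy `search` function; `search`, `ip_from_message`, and the dictionary passed as `runtime_mappings` are hypothetical stand-ins, not Elasticsearch APIs. The extraction mirrors the slide's script, which takes everything before the first space.

```python
import ipaddress
from typing import Callable, Dict, List

Doc = Dict[str, object]


def ip_from_message(doc: Doc):
    """Everything before the first space, parsed as an IP address."""
    message = str(doc["message"])
    return ipaddress.ip_address(message[: message.index(" ")])


def search(docs: List[Doc],
           runtime_mappings: Dict[str, Callable[[Doc], object]],
           field: str,
           wanted: object) -> List[Doc]:
    """Toy search: fields named in runtime_mappings are computed per request."""
    hits = []
    for doc in docs:
        if field in runtime_mappings:
            value = runtime_mappings[field](doc)   # computed, never stored
        else:
            value = doc.get(field)                 # whatever was ingested
        if value == wanted:
            hits.append(doc)
    return hits


docs = [{
    "message": '40.135.0.0 - - [1998-04-30T14:30:17-05:00] '
               '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
}]
target = ipaddress.ip_address("40.135.0.0")

hits = search(docs, {"ip": ip_from_message}, "ip", target)   # field defined in this request
no_hits = search(docs, {}, "ip", target)                     # same data, no such field
```

The second call finds nothing because the shared documents were never changed, which is the "no pollution" property the slide describes.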
17. Autonomy
Anyone is free to create new fields
Adding a runtime field (not indexed):
• No collateral impact
• Low permission barrier
Benefits:
● Administrators avoid spending time on creating schema for specific needs
● Employees that are permitted to define their own data structure can achieve more with fewer resources
19. The complex parts are things we already have
Putting pre-existing mechanisms together
• Calculate a field value per document and do that quickly
– Preferred Painless scripts over adapting ingest processors
• Index to rely on for the heavy lifting
• Logic to minimize the cases in which the calculation is performed
• Async search to deal with slow queries
22. Efficient calculation at query time
• Calculate only upon need
– Aggregations
– Filter only after filtering by indexed fields
– Display fields for top documents per query
• Initial performance tests show the importance of an indexed timestamp
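A small Python sketch of the "filter by indexed fields first" idea follows. It is a toy model under stated assumptions: the list comprehension over `@timestamp` stands in for an index lookup (a real index would not scan every document), the sample log lines are made up, and `search`, `status_of`, and `ts` are hypothetical names. The function reports how many documents the runtime calculation actually touched.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Doc = Dict[str, object]


def status_of(doc: Doc) -> int:
    """The 'runtime field': second-to-last token of the log line."""
    return int(str(doc["message"]).split(" ")[-2])


def search(docs: List[Doc], start: datetime, end: datetime,
           status: int) -> Tuple[List[Doc], int]:
    # Step 1: narrow by the indexed field (stand-in for an index lookup).
    candidates = [d for d in docs if start <= d["@timestamp"] < end]
    # Step 2: evaluate the runtime field only on the surviving documents.
    hits = [d for d in candidates if status_of(d) == status]
    return hits, len(candidates)


def ts(day: int, hour: int) -> datetime:
    return datetime(1998, 5, day, hour, tzinfo=timezone.utc)


docs = [
    {"@timestamp": ts(1, 9),  "message": '40.135.0.0 - - "GET /a.jpg HTTP/1.0" 200 24736'},
    {"@timestamp": ts(1, 10), "message": '40.135.0.1 - - "GET /b.jpg HTTP/1.0" 404 0'},
    {"@timestamp": ts(3, 9),  "message": '40.135.0.2 - - "GET /c.jpg HTTP/1.0" 200 512'},
]

hits, evaluated = search(docs, ts(1, 0), ts(2, 0), 200)
```

Here the script runs on two of the three documents rather than all of them; the narrower the indexed filter, the fewer runtime calculations are needed.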
23. Matching is done by the query
Only extract and transform are done with a script
26. Summary
• Runtime fields: schema on read in Elasticsearch
• Gains in flexibility, index size and ingest pace, at a cost to query performance
• Leveraging existing mechanisms, e.g. index, async search, Painless, query optimization
• Facilitating new workflows:
– Field per context (query, visualization, schema, etc.)
– Fixing ingest errors in retrospect
– New field creation and ingest workflow: start working and gradually create the schema
Runtime fields: coming soon to an Elasticsearch cluster near you