Everyone wants their Elasticsearch cluster to index and search faster, but optimizing both and finding the balance between the two can be tricky. At Kenna Security, we use Elasticsearch to store over 3 billion vulnerabilities for our clients. All that data needs to be quickly accessible so clients can assess their cyber security risk. At the same time, the data is constantly changing. On average, we update more than 200 million documents a day, which means indexing speed is also a top priority.
In the early days, our cluster could barely keep up. Nodes would fall over constantly, indexing queues would get backed up for days, and searches timed out about 50% of the time. Fixing all of these issues did not happen overnight. However, with a lot of testing, tweaking, and a few “OH crap!” moments, we were able to build a stable, 21-node cluster that now meets all of our indexing and searching demands. In this talk, I will share the insights we gained and the strategies we used to scale our cluster. Hopefully that advice will save others some time and frustration as they grow their own.
19. When bulk processing your data...
● Start with batches of 100 and double the size from there until indexing time plateaus
● Requests that are too large can put memory pressure on Elasticsearch, so keep each one under a couple tens of megabytes
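The doubling procedure on this slide can be sketched roughly as follows. This is a minimal illustration, not the code used in the talk: `find_batch_size`, the `index_batch` callable, the 10% `min_gain` threshold, and the 20 MB `max_bytes` cap are all assumptions made for the example, and "plateaus" is read here as "documents per second stop improving noticeably".

```python
import json
import time
from typing import Callable, Sequence


def find_batch_size(
    docs: Sequence[dict],
    index_batch: Callable[[Sequence[dict]], None],  # hypothetical: sends one bulk request
    start: int = 100,
    min_gain: float = 0.10,             # assumed: under 10% improvement counts as a plateau
    max_bytes: int = 20 * 1024 * 1024,  # assumed reading of "a couple tens of megabytes"
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Double the batch size until throughput stops improving or a batch gets too big."""
    best_size, best_rate = start, 0.0
    size = start
    while size <= len(docs):
        batch = docs[:size]
        # Rough payload estimate; stop before sending an oversized request.
        if sum(len(json.dumps(doc)) for doc in batch) > max_bytes:
            break
        began = clock()
        index_batch(batch)
        elapsed = max(clock() - began, 1e-9)
        rate = size / elapsed  # documents per second for this batch size
        if best_rate and rate < best_rate * (1 + min_gain):
            break  # doubling no longer pays off: throughput has plateaued
        best_size, best_rate = size, rate
        size *= 2
    return best_size
```

In practice `index_batch` would wrap whatever bulk call your client library provides, and the measurement is worth repeating a few times per size, since a single timing can be noisy.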
20. Bulk process data
Speed up Indexing
  1. Toggle the refresh interval
  2. Bulk process data
Speed Up Searching
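One common way to toggle the refresh interval is to disable refreshes while a large indexing job runs and restore them afterwards. The sketch below assumes that pattern; the slides do not show the exact values used in the talk. `index.refresh_interval` is a real index setting (`"-1"` disables refresh, `"1s"` is the default), while `refresh_paused` and the `put_settings` callable are hypothetical names standing in for your client's update-settings call.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


def refresh_settings(interval: str) -> dict:
    """Build the index settings body that sets the refresh interval."""
    return {"index": {"refresh_interval": interval}}


@contextmanager
def refresh_paused(
    put_settings: Callable[[dict], None],  # hypothetical: applies settings to the index
    restore_to: str = "1s",                # Elasticsearch's default interval
) -> Iterator[None]:
    """Disable refresh for the duration of the block, then restore it."""
    put_settings(refresh_settings("-1"))  # "-1" turns periodic refresh off
    try:
        yield
    finally:
        # Always restore, even if the indexing job raised an error.
        put_settings(refresh_settings(restore_to))
```

A bulk job would then run inside `with refresh_paused(apply_settings): ...`, where `apply_settings` is whatever function sends the settings body to the cluster. Documents indexed while refresh is off do not become searchable until it is turned back on.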
33. Speed Up Searching
Speed up Indexing
  1. Toggle the refresh interval
  2. Bulk process data
  3. Route your documents
Speed Up Searching
  4. Group your data
34. Queries vs. filters
● Queries: score documents based on how well they match the search criteria.
● Filters: do NOT score documents and only care whether the document matches the search criteria or not.
[Figure: two scales, each running from Easy/Fast to Hard/Slow]
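To make the distinction concrete, here is a small sketch of a search body that puts exact-match conditions in filter context and reserves query context for the part that needs relevance scoring. The `bool` query with `filter` and `must` clauses is standard Elasticsearch query DSL; the `build_search` helper and the field names (`client_id`, `status`, `description`) are made up for illustration.

```python
from typing import Any, Dict, Optional


def build_search(
    exact: Dict[str, Any],
    text_field: Optional[str] = None,
    text: Optional[str] = None,
) -> Dict[str, Any]:
    """Put exact-match conditions in filter context; score only the text part."""
    bool_query: Dict[str, Any] = {
        # Filter context: yes/no matching, no relevance score is computed.
        "filter": [{"term": {field: value}} for field, value in exact.items()]
    }
    if text_field and text:
        # Query context: this clause contributes to the relevance score.
        bool_query["must"] = [{"match": {text_field: text}}]
    return {"query": {"bool": bool_query}}


# Hypothetical fields: two yes/no conditions plus one scored full-text clause.
body = build_search({"client_id": 42, "status": "open"}, "description", "buffer overflow")
```

When no scoring is needed at all, calling `build_search` with only the exact-match conditions yields a body made entirely of filter clauses.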
42. Use filters whenever possible
Speed up Indexing
  1. Toggle the refresh interval
  2. Bulk process data
  3. Route your documents
Speed Up Searching
  4. Group your data
  5. Use filters whenever possible
48. Store IDs as keywords
Speed up Indexing
  1. Toggle the refresh interval
  2. Bulk process data
  3. Route your documents
Speed Up Searching
  4. Group your data
  5. Use filters whenever possible
  6. Store IDs as keywords
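As a rough illustration of storing IDs as keywords, the sketch below builds a mapping in which identifier fields use the `keyword` type instead of a numeric type, plus a `term` lookup against one of them. A commonly cited reason is that `keyword` fields are optimized for exact `term` lookups, whereas numeric fields are optimized for range queries. The helper names and the `asset_id` and `client_id` fields are hypothetical, and the typeless mapping layout assumes Elasticsearch 7 or later.

```python
from typing import Any, Dict, Iterable


def id_mapping(id_fields: Iterable[str]) -> Dict[str, Any]:
    """Map each identifier field as `keyword` rather than integer or long."""
    return {
        "mappings": {
            "properties": {field: {"type": "keyword"} for field in id_fields}
        }
    }


def id_lookup(field: str, value: Any) -> Dict[str, Any]:
    """Exact-match lookup on an ID field, run in filter context (no scoring)."""
    # IDs are sent as strings to match how keyword fields are indexed.
    return {"query": {"bool": {"filter": [{"term": {field: str(value)}}]}}}


# Hypothetical identifier fields for illustration only.
mapping = id_mapping(["asset_id", "client_id"])
lookup = id_lookup("asset_id", 1234)
```

If range queries on an identifier are ever needed, the field can be mapped both ways with a multi-field, at the cost of extra index size.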
54. Don’t let your users slow you down
Speed up Indexing
  1. Toggle the refresh interval
  2. Bulk process data
  3. Route your documents
Speed Up Searching
  4. Group your data
  5. Use filters whenever possible
  6. Store IDs as keywords
  7. Don’t let your users slow you down