Geospatial Indexing & Search
at Scale with Lucene
Nick Knize, PhD
Lucene PMC Member + Committer
@nknize
2
Let's take a trip...
...through Geo advancements
1 Lucene Geo in Elasticsearch, an Introduction
2 Lucene Geo Search, Then & Now: Prefix Trees vs. BKD Trees
3 Improving Shape Search: BKD + Tessellation
Mappings
Geo Field Types
4
define
geo_point mapping
PUT crime/incidents/_mapping
{
  "properties" : {
    "location" : {
      "type" : "geo_point",
      "ignore_malformed" : true
    }
  }
}
5
insert
geo_point mapping
POST crime/incidents
{
  "location" : { "lat" : 41.12, "lon" : -71.34 }
}
POST crime/incidents
{
  "location" : "41.12, -71.34"
}
POST crime/incidents
{
  "location" : [[-71.34, 41.12], [-71.32, 41.21]]
}
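The three insert formats above differ in coordinate order: the object and string forms read latitude first, while the array form follows GeoJSON order (longitude first). A minimal Python sketch of that normalization, assuming a hypothetical helper named `parse_geo_point` (not an Elasticsearch API):

```python
from typing import List, Tuple


def parse_geo_point(value) -> List[Tuple[float, float]]:
    """Normalize one `location` value into a list of (lat, lon) tuples.

    Hypothetical helper for illustration only; it mirrors the three
    formats shown on the slide, not Elasticsearch's own parser.
    """
    if isinstance(value, dict):
        # Object form: explicit "lat" and "lon" keys.
        return [(float(value["lat"]), float(value["lon"]))]
    if isinstance(value, str):
        # String form: "lat, lon".
        lat, lon = (float(part) for part in value.split(","))
        return [(lat, lon)]
    if isinstance(value, list):
        if value and isinstance(value[0], (int, float)):
            # Array form: [lon, lat] (GeoJSON order).
            lon, lat = value
            return [(float(lat), float(lon))]
        # Array of points: flatten each entry.
        points: List[Tuple[float, float]] = []
        for item in value:
            points.extend(parse_geo_point(item))
        return points
    raise TypeError(f"unsupported geo_point value: {value!r}")


as_object = parse_geo_point({"lat": 41.12, "lon": -71.34})
as_string = parse_geo_point("41.12, -71.34")
as_array = parse_geo_point([[-71.34, 41.12], [-71.32, 41.21]])
```

All three calls describe the same first point, which is the main thing to watch for when mixing formats in one index.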
6
define
geo_shape mapping
PUT police/precincts/_mapping
{
  "properties" : {
    "coverage" : {
      "type" : "geo_shape",
      "ignore_malformed" : false,
      "tree" : "quadtree",
      "precision" : "5m",
      "distance_error_pct" : 0.025,
      "orientation" : "ccw",
      "points_only" : false
    }
  }
}
7
insert
geo_shape mapping
POST police/precincts/
{
  "coverage" : {
    "type" : "polygon",
    "coordinates" : [[
      [-73.9762134, 40.7538588],
      [-73.9742356, 40.7526327],
      [-73.9656733, 40.7516774],
      [-73.9763236, 40.7521246],
      [-73.9723788, 40.7516733],
      [-73.9732423, 40.7523556],
      [-73.9762134, 40.7538588]
    ]]
  }
}
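A polygon ring has to be closed (first position equals last), and the `orientation` mapping option refers to the winding order of its vertices. A minimal Python sketch of both checks using the shoelace formula; the function names are illustrative and this is not the parser Elasticsearch uses:

```python
from typing import List, Tuple

Position = Tuple[float, float]  # (lon, lat), as in the "coordinates" array


def is_closed(ring: List[Position]) -> bool:
    """A linear ring needs at least 4 positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def signed_area(ring: List[Position]) -> float:
    """Shoelace formula; positive means counter-clockwise with x=lon, y=lat."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orientation(ring: List[Position]) -> str:
    return "ccw" if signed_area(ring) > 0 else "cw"


# Ring copied from the precinct document on the slide.
precinct = [
    (-73.9762134, 40.7538588), (-73.9742356, 40.7526327),
    (-73.9656733, 40.7516774), (-73.9763236, 40.7521246),
    (-73.9723788, 40.7516733), (-73.9732423, 40.7523556),
    (-73.9762134, 40.7538588),
]

# A unit square listed counter-clockwise, used as a known reference.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

The signed area is only meaningful for simple (non self-intersecting) rings, so the sketch makes no claim about the winding of the slide's example.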
8
insert
geo_shape mapping
• Shapes are parsed using OGC and ISO standards definitions
‒ OGC Simple Feature Access
‒ ISO Geographic information - Spatial Schema (19107:2003)
• Supports the following geo_shape types
‒ Point, MultiPoint
‒ LineString, MultiLineString
‒ Polygon (with holes), MultiPolygon (with holes)
‒ Envelope (bbox)
Indexing
Under the hood
10
A long, long time ago, in a galaxy far far away...
...the Inverted [text] Index
terms dictionary (terms)   postings list (doc ids)
Fast                       2
The                        1
brown                      1
dog                        1
fox                        1
jumping                    2
jumps                      1
lazy                       1
over                       1
quick                      1
spiders                    2
the                        1
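The table above can be rebuilt with a few lines of Python. This is a minimal sketch, not Lucene's implementation. The two document texts are an assumption inferred from the terms on the slide, and tokenization is a plain whitespace split with no lowercasing.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed source documents; the slide only shows the resulting terms.
docs = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Fast jumping spiders",
}


def build_inverted_index(documents: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of doc ids that contain it."""
    postings = defaultdict(list)
    for doc_id in sorted(documents):
        for term in documents[doc_id].split():
            if doc_id not in postings[term]:
                postings[term].append(doc_id)
    # Sort the terms dictionary so lookups can use binary search.
    return dict(sorted(postings.items()))


index = build_inverted_index(docs)
```

A search for a term is then a dictionary lookup followed by a walk over its postings list, for example `index["jumping"]`.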
11
And the open source community went bananas!
we ♥ Lucene text search!
12
But what about 1D numbers...?
Prefix Trees; precision
terms dictionary (terms)   postings list (doc ids)
1                          1, 2, 3, 4, 5
10                         2, 3, 4
11                         1, 5
100                        2
101                        3, 4
111                        1, 5
1000                       2
1010                       4
1011                       3
1110                       1
1111                       5

doc id   myINT        binary
1        3881553173   1110 0111 0101 1011 1100 1101 0001 0101
2        2405166357   1000 1111 0101 1011 1110 1101 0001 0101
3        3205335297   1011 1111 0000 1101 1000 1001 0000 0001
4        2835679237   1010 1001 0000 0101 0000 1000 0000 0101
5        4177856517   1111 1001 0000 0101 0000 1000 0000 0101
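The prefix terms above come from indexing each 32-bit value at several precisions: its first bit, first two bits, and so on. A minimal Python sketch that rebuilds the same postings from the five `myINT` values; the function names and the 4-bit cutoff are assumptions made to match the slide, not Lucene's legacy numeric encoding:

```python
from collections import defaultdict
from typing import Dict, List

# doc id -> myINT, copied from the slide.
values = {
    1: 3881553173,
    2: 2405166357,
    3: 3205335297,
    4: 2835679237,
    5: 4177856517,
}


def prefix_terms(value: int, max_bits: int = 4, width: int = 32) -> List[str]:
    """Return the 1..max_bits leading-bit prefixes of an unsigned integer."""
    bits = format(value, f"0{width}b")
    return [bits[:n] for n in range(1, max_bits + 1)]


def build_prefix_index(vals: Dict[int, int], max_bits: int = 4) -> Dict[str, List[int]]:
    """Index every prefix of every value as its own term."""
    index = defaultdict(list)
    for doc_id in sorted(vals):
        for term in prefix_terms(vals[doc_id], max_bits):
            index[term].append(doc_id)
    return dict(index)


index = build_prefix_index(values)

# A range that lines up with a prefix becomes a single term lookup:
# every value whose top three bits are 101.
docs_in_range = index["101"]
```

Coarse prefixes answer wide ranges with one term, at the cost of storing each value several times.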
14
And the Lucene community went...bananas...again...
we ♥ Lucene NUMERIC search!
15
But, but, what about Geo?
…we’re cool too!
16
How about Geo...terms?
Prefix Trees; round peg, square hole
terms dictionary (terms)   postings list (doc ids)
1                          1, 2, 3, 4, 5
10                         2, 3, 4
11                         1, 5
100                        2
101                        3, 4
111                        1, 5
1000                       2
1010                       4
1011                       3
1110                       1
1111                       5
17
Searching for round pegs in square holes...
Traversing the Prefix Tree...
terms dictionary (terms)   postings list (doc ids)
1                          1, 2, 3, 4, 5
10                         2, 3, 4
11                         1, 5
100                        2
101                        3, 4
111                        1, 5
1000                       2
1010                       4
1011                       3
1110                       1
1111                       5
[Figure labels: doc ids 2, 4, 3, 1, 5]
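For geo points, the prefix terms are cell paths: each level splits the current cell in half along longitude and latitude, and a point is indexed under every cell on its path. A minimal Python sketch under stated assumptions: two bits per level (longitude bit, then latitude bit) and hypothetical function names. Lucene's own prefix-tree encoding differs in detail.

```python
from typing import Dict, List, Tuple


def quad_cell(lon: float, lat: float, levels: int) -> str:
    """Return the quad cell path of a point as a bit string, 2 bits per level."""
    min_lon, max_lon, min_lat, max_lat = -180.0, 180.0, -90.0, 90.0
    bits = []
    for _ in range(levels):
        mid_lon = (min_lon + max_lon) / 2
        mid_lat = (min_lat + max_lat) / 2
        if lon >= mid_lon:
            bits.append("1")
            min_lon = mid_lon
        else:
            bits.append("0")
            max_lon = mid_lon
        if lat >= mid_lat:
            bits.append("1")
            min_lat = mid_lat
        else:
            bits.append("0")
            max_lat = mid_lat
    return "".join(bits)


def cell_terms(lon: float, lat: float, levels: int) -> List[str]:
    """One term per tree level: the cell containing the point at that level."""
    path = quad_cell(lon, lat, levels)
    return [path[: 2 * n] for n in range(1, levels + 1)]


def build_cell_index(points: Dict[int, Tuple[float, float]], levels: int) -> Dict[str, List[int]]:
    index: Dict[str, List[int]] = {}
    for doc_id, (lon, lat) in sorted(points.items()):
        for term in cell_terms(lon, lat, levels):
            index.setdefault(term, []).append(doc_id)
    return index


# (lon, lat) pairs; the first two come from the geo_point insert example.
points = {1: (-71.34, 41.12), 2: (-71.32, 41.21), 3: (-96.82, 32.95)}
index = build_cell_index(points, levels=2)
```

A search shape is answered by walking the tree: cells fully inside the shape contribute their whole postings list, and only cells on the boundary need to be refined further.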
22
Searching for round pegs in square holes...
Traversing the Prefix Tree...of death?
23
Okay, cool! But what about GeoShapes?!
Quad Trees!
24
Okay, cool! But what about GeoShapes?!
Use the Inverted [geo] Index...
terms dictionary (terms)   postings list (doc ids)
1                          1, 2, 3, 4, 5
10                         1, 2, 4
11                         3, 5
100                        1
101                        2, 4
111                        3, 5
1000                       2
1010                       4
1011                       3
1110                       3
1111                       5
DocID: 6
25
Okay, cool! But what about GeoShapes?!
Use the Inverted [geo] Index...
terms dictionary (terms)   postings list (doc ids)
1                          1, 2, 3, 4, 5, 6
10                         1, 2, 4
11                         3, 5, 6
100                        1
101                        2, 4
111                        3, 5, 6
1000                       2
1010                       4
1011                       3
1110                       3, 6
1111                       5, 6
DocID: 6
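Indexing a shape means adding its doc id to every cell it covers, plus each ancestor of those cells. A minimal Python sketch that starts from the postings before doc 6 and adds a shape covering leaf cells `1110` and `1111`. The covered cells are an assumption read off the two tables, and `add_shape` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Postings before doc 6 is indexed, copied from the slide.
before = {
    "1": [1, 2, 3, 4, 5], "10": [1, 2, 4], "11": [3, 5],
    "100": [1], "101": [2, 4], "111": [3, 5],
    "1000": [2], "1010": [4], "1011": [3], "1110": [3], "1111": [5],
}


def add_shape(index: Dict[str, List[int]], doc_id: int,
              covered_cells: Iterable[str]) -> Dict[str, List[int]]:
    """Return a copy of the index with doc_id added to each covered cell
    and to every ancestor (shorter prefix) of those cells."""
    terms = set()
    for cell in covered_cells:
        for n in range(1, len(cell) + 1):
            terms.add(cell[:n])
    updated = {term: list(postings) for term, postings in index.items()}
    for term in sorted(terms):
        postings = updated.setdefault(term, [])
        if doc_id not in postings:
            postings.append(doc_id)
    return updated


after = add_shape(before, 6, ["1110", "1111"])
```

One shape turns into many terms, which is why large or high-precision shapes inflate a prefix-tree index so quickly.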
26
Okay, cool! But what about GeoShapes?!
Melt with the Inverted [geo] Index...
• Max tree_levels == 32 (2 bits / cell)
• distance_error_pct
‒ "slop" factor to manage transient memory usage
‒ % of the diagonal distance (degrees) of the shape
‒ Default == 0 if precision set (2.0)
• points_only
‒ optimization for points only shape index
‒ short-circuits recursion
28
geo search?
pssh… it’s simple!
¯_(ツ)_/¯
29
30
For duty and humanity!
31
We’re doing it wrong
32
33
34
Block K Dimensional Trees (Bkd) to the rescue!
...the right tool!
Miny Minx Maxy Maxx
m: internal node
Yval Xval
leaf node
multi-way… perfectly balanced… FAIR…
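The idea behind the tree can be sketched in Python: internal nodes store only a bounding box (min/max per dimension), leaves store small blocks of points, and a search skips every subtree whose box misses the query. This is a simplified binary sketch with hypothetical names. Lucene's BKD implementation is multi-way, lives on disk, and picks its split dimension differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, int]  # (x, y, doc_id)


@dataclass
class Node:
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    points: Optional[List[Point]] = None  # set on leaf blocks only


def build(points: List[Point], leaf_size: int = 2, depth: int = 0) -> Node:
    """Split at the median so both halves hold the same number of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    node = Node(min(xs), min(ys), max(xs), max(ys))
    if len(points) <= leaf_size:
        node.points = list(points)
        return node
    axis = depth % 2  # alternate x and y; an assumption of this sketch
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node.left = build(ordered[:mid], leaf_size, depth + 1)
    node.right = build(ordered[mid:], leaf_size, depth + 1)
    return node


def search(node: Node, min_x: float, min_y: float,
           max_x: float, max_y: float) -> List[int]:
    """Return doc ids of points inside the query box."""
    if (node.max_x < min_x or node.min_x > max_x
            or node.max_y < min_y or node.min_y > max_y):
        return []  # bounding box misses the query: prune the subtree
    if node.points is not None:
        return [doc for x, y, doc in node.points
                if min_x <= x <= max_x and min_y <= y <= max_y]
    return (search(node.left, min_x, min_y, max_x, max_y)
            + search(node.right, min_x, min_y, max_x, max_y))


pts = [(-71.34, 41.12, 1), (-71.32, 41.21, 2), (-73.97, 40.75, 3),
       (-96.82, 32.95, 4), (10.40, 57.64, 5)]
tree = build(pts)
hits = search(tree, -72.0, 41.0, -71.0, 42.0)
```

Unlike the prefix tree, each point is stored once, and the tree depth depends on the number of points rather than on the requested precision.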
41
Okay, cool! But what about GeoShapes?!
Tessellation!
42
Quad Tree Decomposition
Simple polygon example
• 8 vertex polygon
• 1° x 1° coverage area
• 3m quad cell resolution
• 1,105,889 terms
43
Tessellation Decomposition
Simple polygon example
• 8 vertex polygon
• 1° x 1° coverage area
• Quad tree: 3m quad cell resolution, 1,105,889 terms
• Tessellation: 1.11 cm resolution, 8 terms
• 138,236 : 1 term ratio
• \(•◡•)/ smaller, faster index!
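Tessellation replaces grid cells with triangles, so the number of indexed terms depends on the vertex count rather than on the cell resolution. A minimal ear-clipping sketch in Python for a simple polygon without holes; the polygon is a made-up 8-vertex example and this is not Lucene's tessellator. The sketch does not try to reproduce the term counts on the slide.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]
Triangle = Tuple[Vertex, Vertex, Vertex]


def cross(o: Vertex, a: Vertex, b: Vertex) -> float:
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_triangle(p: Vertex, a: Vertex, b: Vertex, c: Vertex) -> bool:
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def ring_area(ring: List[Vertex]) -> float:
    """Signed shoelace area of an open ring; positive if counter-clockwise."""
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])) / 2.0


def tessellate(ring: List[Vertex]) -> List[Triangle]:
    """Ear clipping for a simple polygon given as an open vertex list."""
    verts = list(ring)
    if ring_area(verts) < 0:
        verts.reverse()  # work in counter-clockwise order
    triangles: List[Triangle] = []
    while len(verts) > 3:
        n = len(verts)
        for i in range(n):
            prev, cur, nxt = verts[i - 1], verts[i], verts[(i + 1) % n]
            if cross(prev, cur, nxt) <= 0:
                continue  # reflex or collinear corner: not an ear
            others = (v for v in verts if v not in (prev, cur, nxt))
            if any(point_in_triangle(v, prev, cur, nxt) for v in others):
                continue  # another vertex lies inside: not an ear
            triangles.append((prev, cur, nxt))
            del verts[i]  # clip the ear and start over
            break
        else:
            raise ValueError("no ear found; polygon is probably not simple")
    triangles.append((verts[0], verts[1], verts[2]))
    return triangles


# Hypothetical 8-vertex "C" shaped polygon (area 12).
polygon = [(0, 0), (4, 0), (4, 1), (2, 1), (2, 3), (4, 3), (4, 4), (0, 4)]
triangles = tessellate(polygon)
total_area = sum(abs(cross(a, b, c)) / 2.0 for a, b, c in triangles)
```

Each triangle then becomes one indexed value, regardless of how large an area it covers.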
44
Tessellation + BKD
a match made in… Lucene 7.4
Miny Minx Maxy Maxx
m: internal node
???
leaf node
45
Tessellation + BKD
a seven-dimension match made in… Lucene 7.4
Index Dimensions Data Dimensions
47
Smaller, faster, stronger...
Searching triangle intersections
48
XYShape for Spatial...in Lucene 8.2+
Tessellating & Indexing Virtual Worlds
49
Smaller, faster, stronger… at scale
Smaller index...
50
Smaller, faster, stronger… at scale
Faster indexing - Nearly as fast as points!!
51
Smaller, faster, stronger… at scale
Positive feedback...
@imotov: "...test that was crashing a shard on a
node with 16GB heap after running for 4.5 hours
now finishes in less than 1sec"
@esri: "results show that the insert operation is 25
to 30 times faster in Elasticsearch 6.6 compare to
Elasticsearch 6.5.0/6.4.2. ...along with better
performance we also get accurate results for spatial
queries"
52
geo search?
pssh… it’s simple!
¯_(ツ)_/¯
53
So, what’s next??
• Non-geo use cases (`XYShape` type)
‒ CAD Drawings (Design, County / City Planning)
‒ Venue Mapping (Conferences, Hotels, Theme Parks, Schools)
‒ Sports analysis (Chicago Cubs, Baseball Advanced Media)
‒ Virtual Gaming & Mapping (Blizzard, Cyber Security / SIEM)
• Other coordinate systems (datum / projection support)
‒ Non terrestrial planet models
‒ Localized projections
‒ Custom projections
• Beyond 2D
‒ Space - Time (OGC Moving Features)
‒ Elevation Modelling (DEM, DTM)
‒ LiDAR (3D Modelling)
Use case application goodies...
Available 8.2
In Work
Aggregations
Geo & Analytics
55
GeoDistance Agg
{
  "aggs" : {
    "sf_rings" : {
      "geo_distance" : {
        "field" : "location",
        "origin" : [-96.82, 32.95],
        "ranges" : [
          { "to" : 50 },
          { "from" : 50, "to" : 100 },
          { "from" : 100, "to" : 300 }
        ]
      }
    }
  }
}
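What the aggregation computes can be sketched in Python: measure each point's distance from the origin and count it in the ring whose range contains it. Assumptions in this sketch: the origin is latitude 32.95, longitude -96.82, distances are in meters, the haversine formula with a mean Earth radius stands in for Elasticsearch's distance calculation, and the bucket keys only imitate the response format.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; an assumption of this sketch

Range = Tuple[Optional[float], Optional[float]]
RANGES: List[Range] = [(None, 50), (50, 100), (100, 300)]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bucket_key(lo: Optional[float], hi: Optional[float]) -> str:
    left = "*" if lo is None else str(float(lo))
    right = "*" if hi is None else str(float(hi))
    return f"{left}-{right}"


def geo_distance_buckets(origin: Tuple[float, float],
                         points: List[Tuple[float, float]],
                         ranges: List[Range] = RANGES) -> Dict[str, int]:
    """Count points per distance ring; 'from' is inclusive, 'to' exclusive."""
    counts = {bucket_key(lo, hi): 0 for lo, hi in ranges}
    for lat, lon in points:
        d = haversine_m(origin[0], origin[1], lat, lon)
        for lo, hi in ranges:
            if (lo is None or d >= lo) and (hi is None or d < hi):
                counts[bucket_key(lo, hi)] += 1
    return counts


origin = (32.95, -96.82)
sample_points = [
    (32.95, -96.82),    # at the origin
    (32.9507, -96.82),  # roughly 78 m north
    (32.952, -96.82),   # roughly 222 m north
    (33.95, -96.82),    # roughly 111 km north, outside every ring
]
counts = geo_distance_buckets(origin, sample_points)
```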
57
GeoGrid Agg
{
  "aggs" : {
    "crime_cells" : {
      "geohash_grid" : {
        "field" : "location",
        "precision" : 8
      }
    }
  }
}
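Each `geohash_grid` bucket is keyed by a geohash string, and `precision` is the number of characters in that string. A minimal Python sketch of the standard geohash encoding (longitude and latitude bits interleaved, five bits per base-32 character); the function name is illustrative:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a point as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


cell = geohash(57.64911, 10.40744, precision=8)
```

Points that share a geohash prefix fall in the same cell, so counting documents per cell is a group-by on the encoded string.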
59
geotile_grid Aggregation
Finally, a grid designed for maps!
7.0 introduces the geotile_grid aggregation
- matches the tiling scheme of well-known tile maps in the Web Mercator projection (EPSG:3857)
- on Web Mercator maps, grid cells
  - are actually square
  - preserve an identical aspect ratio at all scales and latitudes
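The tiling scheme is the familiar zoom/x/y layout of web maps. A minimal Python sketch of the usual Web Mercator tile formula, returning a key in `zoom/x/y` form; the function name and the clamping of out-of-range latitudes are assumptions of this sketch:

```python
import math


def geotile(lat: float, lon: float, zoom: int) -> str:
    """Return the zoom/x/y key of the Web Mercator tile containing a point."""
    n = 1 << zoom  # number of tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge or beyond the projection's
    # latitude limit still land in a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


world = geotile(41.12, -71.34, 0)
equator = geotile(0.0, 0.0, 1)
```

Because every zoom level splits each tile into four equal squares in projected space, buckets line up exactly with map tiles at any scale.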
60
GeoCentroid Agg
{
  "query" : {
    "match" : { "crime" : "burglary" }
  },
  "aggs" : {
    "towns" : {
      "terms" : { "field" : "town" },
      "aggs" : {
        "centroid" : {
          "geo_centroid" : { "field" : "location" }
        }
      }
    }
  }
}
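The request above filters to burglaries, groups them by town, and computes one centroid per group. A rough Python sketch of that pipeline, assuming the centroid of point data is the plain average of latitudes and longitudes; the town names and documents are made up for illustration:

```python
from collections import defaultdict
from typing import Dict, List


def centroids_by_town(docs: List[dict], crime: str) -> Dict[str, dict]:
    """Filter by crime type, bucket by town, then average lat/lon per bucket."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # town -> [lat sum, lon sum, count]
    for doc in docs:
        if doc["crime"] != crime:
            continue
        acc = sums[doc["town"]]
        acc[0] += doc["location"]["lat"]
        acc[1] += doc["location"]["lon"]
        acc[2] += 1
    return {
        town: {"lat": lat / count, "lon": lon / count, "count": count}
        for town, (lat, lon, count) in sums.items()
    }


# Hypothetical documents shaped like the crime/incidents examples.
incidents = [
    {"town": "Easton", "crime": "burglary", "location": {"lat": 40.0, "lon": -70.0}},
    {"town": "Easton", "crime": "burglary", "location": {"lat": 42.0, "lon": -72.0}},
    {"town": "Easton", "crime": "arson", "location": {"lat": 10.0, "lon": 10.0}},
    {"town": "Weston", "crime": "burglary", "location": {"lat": 41.5, "lon": -71.5}},
]
result = centroids_by_town(incidents, "burglary")
```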
63
matrix_stats Agg
{
"aggs": {
"statistics": {
"matrix_stats": {
"fields": ["poverty", "income"]
}
}
}
}
64
matrix_stats Agg
"aggregations": {
"statistics": {
"doc_count": 50,
"fields": [{
"name": "income",
"count": 50,
"mean": 51985.1,
"variance": 7.383377037755103E7,
"skewness": 0.5595114003506483,
"kurtosis": 2.5692365287787124,
"covariance": {
"income": 7.383377037755103E7,
"poverty": -21093.65836734694
},
"correlation": {
"income": 1.0,
"poverty": -0.8352655256272504
}
}, {
"name": "poverty",
"count": 50,
"mean": 12.732000000000001,
"variance": 8.637730612244896,
"skewness": 0.4516049811903419,
"kurtosis": 2.8615929677997767,
"covariance": {
"income": -21093.65836734694,
"poverty": 8.637730612244896
},
"correlation": {
"income": -0.8352655256272504,
"poverty": 1.0
}
}]
}
}
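The numbers in this response are internally consistent: each field's covariance with itself equals its variance, and the correlation is the covariance divided by the product of the two standard deviations. A short Python check using the values copied from the response; the variable names are illustrative:

```python
import math


def correlation(cov_xy: float, var_x: float, var_y: float) -> float:
    """Pearson correlation from a covariance and the two variances."""
    return cov_xy / math.sqrt(var_x * var_y)


# Values copied from the matrix_stats response.
income_variance = 7.383377037755103e7
poverty_variance = 8.637730612244896
income_poverty_covariance = -21093.65836734694

r = correlation(income_poverty_covariance, income_variance, poverty_variance)
```

The result is strongly negative, matching the reported `correlation` of about -0.835 between `income` and `poverty`.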
65
Geo Aggregations
more available, and coming soon...
• pca - ML foundation plugin
‒ dimensionality reduction
‒ image analysis
‒ classification / recognition
• geo_stats - In work...
‒ Moran’s I - measuring spatial auto-correlation
‒ Getis-Ord - spatial hot spot analysis
66
Now GA in Elastic 7.3….
THANK YOU
Follow for updates:
@nknize
apache/lucene-solr

How To Use Server-Side Rendering with Nuxt.js
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

Geospatial Indexing and Search at Scale with Apache Lucene

  • 1. Geospatial Indexing & Search at Scale with Lucene. Nick Knize, PhD, Lucene PMC Member + Committer, @nknize
  • 2. Let's take a trip... ...through Geo advancements: (1) Lucene Geo in Elasticsearch, an Introduction; (2) Lucene Geo Search, Then & Now: Prefix Trees vs. BKD Trees; (3) Improving Shape Search: BKD + Tessellation
  • 4. Define geo_point mapping: PUT crime/incidents/_mapping { "properties" : { "location" : { "type" : "geo_point", "ignore_malformed" : true } } }
  • 5. Insert (geo_point mapping), object form: POST crime/incidents { "location" : { "lat" : 41.12, "lon" : -71.34 } }; "lat, lon" string form: POST crime/incidents { "location" : "41.12, -71.34" }; array of [lon, lat] pairs: POST crime/incidents { "location" : [[-71.34, 41.12], [-71.32, 41.21]] }
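The three input styles on slide 5 differ in coordinate order: the object and string forms are latitude first, while the array form is longitude first. A minimal Python sketch of normalizing them to (lat, lon) pairs follows. The function name `normalize_geo_point` is hypothetical, and other accepted inputs such as geohash strings are deliberately ignored here.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon)


def normalize_geo_point(value) -> List[Point]:
    """Return (lat, lon) pairs for the object, string, and array styles.

    Hypothetical helper for illustration; it is not part of any client library.
    """
    if isinstance(value, dict):
        # Object form: explicit "lat" and "lon" keys.
        return [(float(value["lat"]), float(value["lon"]))]
    if isinstance(value, str):
        # String form: "lat, lon".
        lat, lon = (float(part) for part in value.split(","))
        return [(lat, lon)]
    if isinstance(value, (list, tuple)):
        if value and isinstance(value[0], (list, tuple)):
            # Array of arrays: several points in one field.
            points: List[Point] = []
            for item in value:
                points.extend(normalize_geo_point(item))
            return points
        # Array form: [lon, lat] (GeoJSON order).
        lon, lat = value
        return [(float(lat), float(lon))]
    raise ValueError(f"unsupported geo_point value: {value!r}")


print(normalize_geo_point({"lat": 41.12, "lon": -71.34}))
print(normalize_geo_point("41.12, -71.34"))
print(normalize_geo_point([[-71.34, 41.12], [-71.32, 41.21]]))
```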
  • 6. Define geo_shape mapping: PUT police/precincts/_mapping { "properties" : { "coverage" : { "type" : "geo_shape", "ignore_malformed" : false, "tree" : "quadtree", "precision" : "5m", "distance_error_pct" : 0.025, "orientation" : "ccw", "points_only" : false } } }
  • 7. Insert (geo_shape mapping): POST police/precincts/ { "coverage" : { "type" : "polygon", "coordinates" : [[ [-73.9762134, 40.7538588], [-73.9742356, 40.7526327], [-73.9656733, 40.7516774], [-73.9763236, 40.7521246], [-73.9723788, 40.7516733], [-73.9732423, 40.7523556], [-73.9762134, 40.7538588] ]] } }
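The mapping on slide 6 sets "orientation" to "ccw", and the polygon on slide 7 repeats its first vertex to close the ring. Below is a minimal sketch of checking both properties with the shoelace formula. It treats lon/lat as planar x/y, which is an assumption of this sketch, and the helper names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula on (x, y) pairs; positive when the ring is counterclockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_closed(ring):
    """A linear ring needs at least four positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def orientation(ring):
    """Return "ccw" or "cw" for a closed ring (planar approximation)."""
    return "ccw" if signed_area(ring) > 0 else "cw"


# Outer ring of the precinct polygon from slide 7, as [lon, lat] pairs.
precinct = [
    (-73.9762134, 40.7538588), (-73.9742356, 40.7526327),
    (-73.9656733, 40.7516774), (-73.9763236, 40.7521246),
    (-73.9723788, 40.7516733), (-73.9732423, 40.7523556),
    (-73.9762134, 40.7538588),
]
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]

print(is_closed(precinct), orientation(unit_square))
```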
  • 8. Insert (geo_shape mapping): Shapes are parsed using OGC and ISO standards definitions: OGC Simple Feature Access; ISO Geographic information - Spatial Schema (19107:2003). Supports the following geo_shape types: Point, MultiPoint; LineString, MultiLineString; Polygon (with holes), MultiPolygon (with holes); Envelope (bbox).
  • 10. A long, long time ago, in a galaxy far far away... ...the Inverted [text] Index. Terms dictionary (term → postings list of doc ids): Fast → 2; The → 1; brown → 1; dog → 1; fox → 1; jumping → 2; jumps → 1; lazy → 1; over → 1; quick → 1; spiders → 2; the → 1
  • 11. And the open source community went bananas! We ♥ Lucene text search!
  • 12. But what about 1D numbers...? Prefix Trees; precision. Terms dictionary (term → postings list of doc ids): 1 → 1, 2, 3, 4, 5; 10 → 2, 3, 4; 11 → 1, 5; 100 → 2; 101 → 3, 4; 111 → 1, 5; 1000 → 2; 1010 → 4; 1011 → 3; 1110 → 1; 1111 → 5. Documents: doc 1 myINT: 3881553173 (1110 0111 0101 1011 1100 1101 0001 0101); doc 2 myINT: 2405166357 (1000 1111 0101 1011 1110 1101 0001 0101); doc 3 myINT: 3205335297 (1011 1111 0000 1101 1000 1001 0000 0001); doc 4 myINT: 2835679237 (1010 1001 0000 0101 0000 1000 0000 0101); doc 5 myINT: 4177856517 (1111 1001 0000 1101 0000 1000 0000 0101)
  • 13. But what about numbers...? Prefix Trees; precision. Terms dictionary (term → postings list of doc ids): 1 → 1, 2, 3, 4, 5; 10 → 2, 3, 4; 11 → 1, 5; 100 → 2; 101 → 3, 4; 111 → 1, 5; 1000 → 2; 1010 → 4; 1011 → 3; 1110 → 1; 1111 → 5. Documents: doc 1 myINT: 3881553173 (1110 0111 0101 1011 1100 1101 0001 0101); doc 2 myINT: 2405166357 (1000 1111 0101 1011 1110 1101 0001 0101); doc 3 myINT: 3205335297 (1011 1111 0000 1101 1000 1001 0000 0001); doc 4 myINT: 2835679237 (1010 1001 0000 0101 0000 1000 0000 0101); doc 5 myINT: 4177856517 (1111 1001 0000 1101 0000 1000 0000 0101)
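The terms dictionary on these two slides can be reproduced by indexing the leading 1 to 4 bits of each 32-bit value as separate terms. The Python sketch below does exactly that for the five myINT values. It is a simplified illustration of the prefix idea, not the trie encoding Lucene actually shipped, and `prefix_terms` and `build_index` are hypothetical names.

```python
from collections import defaultdict


def prefix_terms(value, bits=32, max_prefix=4):
    """Return the leading 1..max_prefix bits of value as binary strings."""
    binary = format(value, f"0{bits}b")
    return [binary[:n] for n in range(1, max_prefix + 1)]


def build_index(docs, max_prefix=4):
    """Map each prefix term to the sorted list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in prefix_terms(docs[doc_id], max_prefix=max_prefix):
            index[term].append(doc_id)
    return dict(index)


# The five myINT values from the slide.
docs = {1: 3881553173, 2: 2405166357, 3: 3205335297, 4: 2835679237, 5: 4177856517}
index = build_index(docs)

for term in sorted(index, key=lambda t: (len(t), t)):
    print(term, index[term])
```

A short prefix such as `10` answers a coarse range in one postings lookup, while the longer terms refine the edges of the range; that trade between term count and precision is what the slide title refers to.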
  • 14. And the Lucene community went...bananas...again... We ♥ Lucene NUMERIC search!
  • 15. But, but, what about Geo? …we’re cool too!
  • 16. How about Geo...terms? Prefix Trees; round peg, square hole. Terms dictionary (term → postings list of doc ids): 1 → 1, 2, 3, 4, 5; 10 → 2, 3, 4; 11 → 1, 5; 100 → 2; 101 → 3, 4; 111 → 1, 5; 1000 → 2; 1010 → 4; 1011 → 3; 1110 → 1; 1111 → 5
  • 17. Searching for round pegs in square holes... Traversing the Prefix Tree... Terms dictionary (term → postings list of doc ids): 1 → 1, 2, 3, 4, 5; 10 → 2, 3, 4; 11 → 1, 5; 100 → 2; 101 → 3, 4; 111 → 1, 5; 1000 → 2; 1010 → 4; 1011 → 3; 1110 → 1; 1111 → 5
  • 22. Searching for round pegs in square holes... Traversing the Prefix Tree...of death? Terms dictionary (term → postings list of doc ids): 1 → 1, 2, 3, 4, 5; 10 → 2, 3, 4; 11 → 1, 5; 100 → 2; 101 → 3, 4; 111 → 1, 5; 1000 → 2; 1010 → 4; 1011 → 3; 1110 → 1; 1111 → 5
  • 23. Okay, cool! But what about GeoShapes?! Quad Trees!
  • 24. Okay, cool! But what about GeoShapes?! Use the Inverted [geo] Index... Terms dictionary (term → postings list of doc ids): 1 → 1, 2, 3, 4, 5; 10 → 1, 2, 4; 11 → 3, 5; 100 → 1; 101 → 2, 4; 111 → 3, 5; 1000 → 2; 1010 → 4; 1011 → 3; 1110 → 3; 1111 → 5. New shape: DocID 6
  • 25. Okay, cool! But what about GeoShapes?! Use the Inverted [geo] Index... Terms dictionary (term → postings list of doc ids): 1 → 1, 2, 3, 4, 5, 6; 10 → 1, 2, 4; 11 → 3, 5, 6; 100 → 1; 101 → 2, 4; 111 → 3, 5, 6; 1000 → 2; 1010 → 4; 1011 → 3; 1110 → 3, 6; 1111 → 5, 6. New shape: DocID 6
  • 26. Okay, cool! But what about GeoShapes?! Melt with the Inverted [geo] Index...
  • 27. Okay, cool! But what about GeoShapes?! Melt with the Inverted [geo] Index... Max tree_levels == 32 (2 bits / cell). distance_error_pct: "slop" factor to manage transient memory usage; % of the diagonal distance (degrees) of the shape; default == 0 if precision set (2.0). points_only: optimization for points only shape index; short-circuits recursion.
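To make "2 bits / cell" concrete, here is a rough Python sketch of a quadtree cell path for a point, plus an estimate of how many levels a target precision needs. It assumes cell width at level L is the equatorial circumference divided by 2^L; Elasticsearch's own precision-to-levels conversion may differ, and both function names are hypothetical.

```python
import math

# Assumption for this sketch: WGS84 equatorial circumference in meters.
EARTH_CIRCUMFERENCE_M = 40_075_016.686


def levels_for_precision(precision_m):
    """Smallest level whose cell width at the equator is <= precision_m."""
    return max(1, math.ceil(math.log2(EARTH_CIRCUMFERENCE_M / precision_m)))


def quad_path(lat, lon, levels):
    """Return 2 bits per level (lon bit, then lat bit) locating the point's cell."""
    min_lat, max_lat, min_lon, max_lon = -90.0, 90.0, -180.0, 180.0
    bits = []
    for _ in range(levels):
        mid_lon = (min_lon + max_lon) / 2.0
        mid_lat = (min_lat + max_lat) / 2.0
        if lon >= mid_lon:
            bits.append("1")
            min_lon = mid_lon
        else:
            bits.append("0")
            max_lon = mid_lon
        if lat >= mid_lat:
            bits.append("1")
            min_lat = mid_lat
        else:
            bits.append("0")
            max_lat = mid_lat
    return "".join(bits)


levels = levels_for_precision(5.0)  # the "5m" precision from the mapping slide
print(levels, quad_path(41.12, -71.34, levels))
```

Every prefix of the path names an ancestor cell, so a shape that covers many small cells produces many terms; this is the growth that distance_error_pct is meant to keep in check.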
  • 28. geo search? pssh… it’s simple! ¯_(ツ)_/¯
  • 30. For duty and humanity!
  • 34. Block K Dimensional Trees (Bkd) to the rescue! ...the right tool!
  • 35. Block K Dimensional Trees (Bkd) to the rescue! ...the right tool! Internal node m: MinY, MinX, MaxY, MaxX
  • 36. Block K Dimensional Trees (Bkd) to the rescue! ...the right tool! Leaf node: Yval, Xval
  • 37. Block K Dimensional Trees (Bkd) to the rescue! multi-way...
  • 38. Block K Dimensional Trees (Bkd) to the rescue! multi-way… perfectly balanced...
  • 39. Block K Dimensional Trees (Bkd) to the rescue! multi-way… perfectly balanced...FAIR...
  • 40. Block K Dimensional Trees (Bkd) to the rescue! ...the right tool!
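The node layout on slides 35 and 36 (internal nodes carrying MinY/MinX/MaxY/MaxX, leaf nodes carrying point values) can be sketched as a small in-memory tree. The Python below is a binary, median-split simplification with leaf blocks; Lucene's BKD tree is multi-way and block-oriented on disk, so treat the names and structure here as assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, int]  # (x, y, doc_id)


@dataclass
class Node:
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    points: Optional[List[Point]] = None  # set on leaf nodes only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, leaf_size=2):
    """Split at the median of the widest dimension until blocks fit in a leaf."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    node = Node(min(xs), min(ys), max(xs), max(ys))
    if len(points) <= leaf_size:
        node.points = list(points)
        return node
    dim = 0 if (node.max_x - node.min_x) >= (node.max_y - node.min_y) else 1
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2  # equal halves keep the tree balanced
    node.left = build(ordered[:mid], leaf_size)
    node.right = build(ordered[mid:], leaf_size)
    return node


def search(node, qmin_x, qmin_y, qmax_x, qmax_y, hits=None):
    """Collect doc ids of points inside the query box (bounds inclusive)."""
    if hits is None:
        hits = []
    if (node.max_x < qmin_x or node.min_x > qmax_x
            or node.max_y < qmin_y or node.min_y > qmax_y):
        return hits  # node box is disjoint from the query: skip the subtree
    if node.points is not None:
        for x, y, doc_id in node.points:
            if qmin_x <= x <= qmax_x and qmin_y <= y <= qmax_y:
                hits.append(doc_id)
        return hits
    search(node.left, qmin_x, qmin_y, qmax_x, qmax_y, hits)
    search(node.right, qmin_x, qmin_y, qmax_x, qmax_y, hits)
    return hits


points = [(1, 1, 1), (2, 5, 2), (4, 3, 3), (6, 7, 4), (8, 2, 5), (9, 9, 6)]
tree = build(points)
print(sorted(search(tree, 0, 0, 5, 5)))
```

The bounding box on each internal node is what lets a query discard whole subtrees without touching their points, in contrast to walking a prefix tree term by term.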
  • 41. Okay, cool! But what about GeoShapes?! Tessellation!
  • 42. Quad Tree Decomposition. Simple polygon example: 8 vertex polygon; 1° x 1° coverage area; 3m quad cell resolution; 1,105,889 terms
  • 43. Tessellation Decomposition. Simple polygon example: 8 vertex polygon; 1° x 1° coverage area. Quad tree: 3m quad cell resolution; 1,105,889 terms. Tessellation: 1.11 cm resolution; 8 terms; 138,236 : 1 term ratio; (•◡•) / smaller, faster index!
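For a feel of how a polygon becomes a handful of triangles rather than millions of grid cells, here is a minimal ear-clipping sketch in Python. It assumes a simple polygon with no holes, vertices in counterclockwise order, and no repeated closing vertex; Lucene's tessellator is considerably more elaborate, and the function names are hypothetical. The last lines redo the term-ratio arithmetic from slide 43.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_triangle(p, a, b, c):
    """True if p lies inside or on the edge of counterclockwise triangle a, b, c."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def ear_clip(polygon):
    """Triangulate a simple CCW polygon given as (x, y) vertices."""
    verts = list(polygon)
    triangles = []
    while len(verts) > 3:
        n = len(verts)
        for i in range(n):
            prev, cur, nxt = verts[i - 1], verts[i], verts[(i + 1) % n]
            if cross(prev, cur, nxt) <= 0:
                continue  # reflex or collinear corner cannot be an ear
            others = [v for v in verts if v not in (prev, cur, nxt)]
            if any(point_in_triangle(v, prev, cur, nxt) for v in others):
                continue  # another vertex lies inside the candidate ear
            triangles.append((prev, cur, nxt))
            del verts[i]  # clip the ear and rescan the smaller polygon
            break
        else:
            raise ValueError("no ear found; polygon may be non-simple or not CCW")
    triangles.append(tuple(verts))
    return triangles


l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
triangles = ear_clip(l_shape)
total_area = sum(abs(cross(a, b, c)) / 2.0 for a, b, c in triangles)
print(len(triangles), total_area)

# Term ratio from the slide: quad tree terms divided by tessellation terms.
term_ratio = 1_105_889 / 8
print(round(term_ratio))
```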
  • 44. Tessellation + BKD a made in… Lucene 7.4. Internal node m: MinY, MinX, MaxY, MaxX; leaf node: ???
  • 45. Tessellation + BKD a seven dimension made in… Lucene 7.4. Index Dimensions; Data Dimensions
  • 46. Tessellation + BKD a made in… Lucene 7.4. Internal node m: MinY, MinX, MaxY, MaxX; leaf node: ???
  • 48. XYShape for Spatial...in Lucene 8.2+: Tessellating & Indexing Virtual Worlds
  • 49. Smaller, faster, stronger… at scale: Smaller index...
  • 50. Smaller, faster, stronger… at scale: Faster indexing - Nearly as fast as points!!
  • 51. Smaller, faster, stronger… at scale: Positive feedback... @imotov: "...test that was crashing a shard on a node with 16GB heap after running for 4.5 hours now finishes in less than 1sec" @esri: "results show that the insert operation is 25 to 30 times faster in Elasticsearch 6.6 compare to Elasticsearch 6.5.0/6.4.2. ...along with better performance we also get accurate results for spatial queries"
  • 52. geo search? pssh… it’s simple! ¯_(ツ)_/¯
  • 53. So, what’s next?? Use case application goodies... Non-geo use cases (`XYShape` type): CAD Drawings (Design, County / City Planning); Venue Mapping (Conferences, Hotels, Theme Parks, Schools); Sports analysis (Chicago Cubs, Baseball Advanced Media); Virtual Gaming & Mapping (Blizzard, Cyber Security / SIEM). Other coordinate systems (datum / projection support): Non terrestrial planet models; Localized projections; Custom projections. Beyond 2D: Space - Time (OGC Moving Features); Elevation Modelling (DEM, DTM); LiDAR (3D Modelling). Legend: Available 8.2; In Work
  • 55. GeoDistance Agg { "aggs" : { "sf_rings" : { "geo_distance" : { "field" : "location", "origin" : [-96.82, 32.95], "ranges" : [ { "to" : 50 }, { "from" : 50, "to" : 100 }, { "from" : 100, "to" : 300 } ] } } } }
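Conceptually, the geo_distance aggregation measures each document's distance from the origin and counts it in the matching ring. The Python sketch below mimics that with the haversine formula. The mean Earth radius, the meter units, and the "from" inclusive / "to" exclusive bounds are stated assumptions of this sketch, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ring_counts(origin, points, ranges):
    """Count points per ring; origin and points are (lat, lon) pairs."""
    counts = [0] * len(ranges)
    for lat, lon in points:
        d = haversine_m(origin[0], origin[1], lat, lon)
        for i, r in enumerate(ranges):
            if r.get("from", 0.0) <= d < r.get("to", math.inf):
                counts[i] += 1
    return counts


ranges = [{"to": 50}, {"from": 50, "to": 100}, {"from": 100, "to": 300}]
origin = (32.95, -96.82)  # (lat, lon)
deg_per_m = 1.0 / haversine_m(0.0, 0.0, 1.0, 0.0)  # degrees of latitude per meter
# Made-up sample points 0 m, 75 m, 200 m and 500 m north of the origin.
samples = [(origin[0] + m * deg_per_m, origin[1]) for m in (0, 75, 200, 500)]
print(ring_counts(origin, samples, ranges))
```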
  • 57. GeoGrid Agg { "aggs" : { "crime_cells" : { "geohash_grid" : { "field" : "location", "precision" : 8 } } } }
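The "precision" value in geohash_grid is the length of the geohash string used as the bucket key. A minimal sketch of the standard geohash encoding in Python shows where those characters come from; the function name is hypothetical and the sample coordinates are made up for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=8):
    """Encode a point by alternately halving the lon and lat ranges, 5 bits per character."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


cell = geohash(57.64911, 10.40744, precision=8)
print(cell)
```

Because each character only narrows the previous cell, a precision-4 key is always a prefix of the precision-8 key for the same point.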
  • 59. geotile_grid Aggregation: finally, a grid designed for maps! 7.0 introduces the geotile_grid aggregation: it matches the tiling scheme of well known tile maps in the Web Mercator Projection (EPSG:3857); on web mercator maps, grid cells are actually square and preserve an identical aspect ratio at all scales and latitudes
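The tiling scheme referred to here is the usual zoom/x/y web map grid. As a hedged sketch, the standard Web Mercator tile formula in Python looks like this; the function name is hypothetical, and the clamping of out-of-range values is an assumption of this sketch rather than documented Elasticsearch behavior.

```python
import math


def geotile(lat, lon, zoom):
    """Return the "zoom/x/y" key of the Web Mercator tile containing the point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so edge cases (lon == 180, extreme latitudes) stay on the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


print(geotile(0.0, 0.0, 1))
print(geotile(41.12, -71.34, 8))
```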
  • 60. GeoCentroid Agg { "query" : { "match" : { "crime" : "burglary" } }, "aggs" : { "towns" : { "terms" : { "field" : "town" }, "aggs" : { "centroid" : { "geo_centroid" : { "field" : "location" } } } } } }
  • 63. matrix_stats Agg { "aggs": { "statistics": { "matrix_stats": { "fields": ["poverty", "income"] } } } }
  • 64. matrix_stats Agg (response) { "aggregations": { "statistics": { "doc_count": 50, "fields": [{ "name": "income", "count": 50, "mean": 51985.1, "variance": 7.383377037755103E7, "skewness": 0.5595114003506483, "kurtosis": 2.5692365287787124, "covariance": { "income": 7.383377037755103E7, "poverty": -21093.65836734694 }, "correlation": { "income": 1.0, "poverty": -0.8352655256272504 } }, { "name": "poverty", "count": 50, "mean": 12.732000000000001, "variance": 8.637730612244896, "skewness": 0.4516049811903419, "kurtosis": 2.8615929677997767, "covariance": { "income": -21093.65836734694, "poverty": 8.637730612244896 }, "correlation": { "income": -0.8352655256272504, "poverty": 1.0 } }] } } }
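The correlation reported in the response is consistent with its own variance and covariance figures: Pearson correlation is the covariance divided by the product of the two standard deviations. A small Python check, using only numbers from the response above (the function name is hypothetical):

```python
import math


def pearson_from_moments(covariance, variance_a, variance_b):
    """Pearson correlation from a covariance and the two variances."""
    return covariance / math.sqrt(variance_a * variance_b)


income_variance = 7.383377037755103e7
poverty_variance = 8.637730612244896
income_poverty_covariance = -21093.65836734694

correlation = pearson_from_moments(
    income_poverty_covariance, income_variance, poverty_variance
)
print(correlation)  # close to the reported -0.8352655256272504
```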
  • 65. Geo Aggregations: more available, and coming soon... pca (ML foundation plugin): dimensionality reduction; image analysis; classification / recognition. geo_stats (in work...): Moran’s I (measuring spatial auto-correlation); Getis-Ord (spatial hot spot analysis)
  • 66. Now GA in Elastic 7.3…
  • 67. THANK YOU. Follow for updates: @nknize, apache/lucene-solr