LUCENE / SOLR 4 SPATIAL DEEP DIVE
David Smiley
Software Systems Engineer, Lead
© 2013 The MITRE Corporation. All rights reserved.
2013 Lucene Revolution
Presented by David Smiley, MITRE
About David Smiley
• Working at MITRE, for 13 years
• web development, Java, search
• 3 Solr apps, 1 Endeca
• Published 1st book on Solr; then 2nd edition (2009, 2011)
• Apache Lucene / Solr committer/PMC member (2012)
• Specializing in spatial
• Presented at Lucene Revolution (2010) & Basis O.S. Search Conference (2011, 2012)
• Taught Solr classes at MITRE (2010, 2011, 2012)
• Solr search consultant within MITRE and its sponsors, and privately
Agenda
• Background, overview
• Spatial4j
• Lucene spatial
• PrefixTree / Trie / Grid
• Solr spatial
• Demo
• Interesting use-cases
BACKGROUND &
OVERVIEW
What is Spatial Search?
Popular features:
• Spatial filter query
• Spatial distance sorting
• Spatial distance relevancy (i.e. spatial query score)
NOT “geocoding” (resolving “Boston” to its latitude and longitude)
Typical use-case:
1. Index a location for each Lucene document given a latitude & longitude
2. Then search for matching documents by a circle (point-radius) or bounding box
3. Then sort results by distance
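The three steps above can be illustrated with a brute-force, in-memory sketch: index points, filter by a point-radius circle, then sort by distance. This only shows the semantics; Lucene and Solr replace the linear scan with an index. The `Doc` class and the haversine helper are hypothetical illustrations, and the Earth radius constant is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TypicalSpatialSearch {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714; // assumed mean Earth radius

    /** Hypothetical document with one indexed point (step 1). */
    static final class Doc {
        final String id;
        final double lat, lon;
        Doc(String id, double lat, double lon) { this.id = id; this.lat = lat; this.lon = lon; }
    }

    /** Great-circle distance in km (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Steps 2 and 3: keep docs inside the circle, nearest first. */
    static List<String> searchCircle(List<Doc> docs, double lat, double lon, double radiusKm) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (haversineKm(lat, lon, d.lat, d.lon) <= radiusKm) hits.add(d);
        }
        hits.sort(Comparator.comparingDouble(d -> haversineKm(lat, lon, d.lat, d.lon)));
        List<String> ids = new ArrayList<>();
        for (Doc d : hits) ids.add(d.id);
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("nyc", 40.7128, -74.0060));
        docs.add(new Doc("cambridge", 42.3736, -71.1097));
        docs.add(new Doc("boston", 42.3601, -71.0589));
        System.out.println(searchCircle(docs, 42.36, -71.06, 10)); // prints [boston, cambridge]
    }
}
```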
History of Spatial for Lucene & Solr
• 2007: Local-Lucene
• by Patrick O’Leary (AOL)
• 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0
• Local-Lucene graduates to an official Lucene contrib module
• 2009-12: Spatial Search Plugin (SSP) for Solr
• by Chris Male (JTeam -> Orange11, ElasticSearch)
• 2010-10: SOLR-2155 a geohash prefix tree filter
• by David Smiley (MITRE)
• 2011-01: Lucene Spatial Playground (LSP)
• by Ryan McKinley (Voyager GIS), David, and Chris
• 2011-03: Solr 3.1 new spatial features
• by Grant Ingersoll and Yonik Seeley (LucidWorks)
• 2012-03: LSP -> Lucene 4 spatial module + Spatial4j + SSP
• replaces former Lucene spatial contrib module
Lucene Spatial Committers
• David Smiley
• Works for MITRE
• Boston area
• Ryan McKinley
• Works for Voyager GIS
• Silicon Valley
• Chris Male
• Formerly at ElasticSearch
• New Zealand
Spatial decomposed
• Spatial4j
• Shapes, WKT, Distance calculations, JTS adapter
• Lucene spatial
• Strategies: PrefixTree (TermQuery & Recursive impl.), BBox, PointVector
• Solr adapters
• Misc: Spatial Solr Sandbox
• LSE
• JtsGeoStrategy
• Spatial-Demo (web app)
Lines of Code for Spatial Components
• Spatial4j: 43%
• Lucene spatial: 35%
• Solr adapters: 6%
• Misc: 16%
Total: 4,781 Non-Comment Source Statements (without javadocs or tests) as of 2012-09
CarrotSearch Labs’ RandomizedTesting
• http://labs.carrotsearch.com/randomizedtesting.html
• Provides plumbing for repeatable randomized JUnit tests
• All the spatial test code uses it extensively
Randomized testing, more generally, is a particular philosophy / approach to testing
• A typical hard-coded test will only catch some regressions
• A randomized test will eventually catch just about anything, especially nasty edge cases
• However, these tests are harder to read / write / maintain
• Randomized testing helped find bugs related to…
• Computing the bounding box of a circle
• Computing the relationship of a circle to a rectangle that has all 4 of its corners inside it
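A minimal sketch of a repeatable randomized test, assuming only `java.util.Random` with a logged seed rather than the RandomizedTesting framework itself. The property checked (every random point inside a flat-plane circle lies inside the circle's bounding box) is an illustrative assumption; re-running with the same seed replays the exact same cases.

```java
import java.util.Random;

public class RandomizedBBoxCheck {
    /** Bounding box {minX, minY, maxX, maxY} of a circle on a flat plane. */
    static double[] circleBBox(double cx, double cy, double r) {
        return new double[] {cx - r, cy - r, cx + r, cy + r};
    }

    /** Runs random trials; returns how many violated the property. */
    static int run(long seed, int iterations) {
        Random rnd = new Random(seed); // same seed => same sequence => reproducible failure
        int violations = 0;
        for (int i = 0; i < iterations; i++) {
            double cx = rnd.nextDouble() * 200 - 100;
            double cy = rnd.nextDouble() * 200 - 100;
            double r = rnd.nextDouble() * 50;
            // Uniform random point inside the circle via polar coordinates.
            double angle = rnd.nextDouble() * 2 * Math.PI;
            double dist = r * Math.sqrt(rnd.nextDouble());
            double px = cx + dist * Math.cos(angle);
            double py = cy + dist * Math.sin(angle);
            double[] b = circleBBox(cx, cy, r);
            if (px < b[0] || py < b[1] || px > b[2] || py > b[3]) {
                violations++;
                System.err.println("seed=" + seed + " iteration=" + i + " failed");
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed=" + seed + " violations=" + run(seed, 10_000));
    }
}
```

Logging the seed is the key design choice: a failure found once can be replayed exactly by passing that seed back in.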
SPATIAL4J
It’s all about the shapes
Spatial4j: It’s all about the shapes
https://github.com/spatial4j/spatial4j (spatial4j.com redirect)
• Shapes
• A “Shape” abstraction with multiple implementations
• Geodetic (sphere) & Cartesian/2D implementations
• Computes intersection relationship with other shapes
• Also…
• Distance and area math utilities, Geohash utilities
• Parsing Well Known Text (WKT) formatted shapes
• ASL licensed project independent of Apache on GitHub
• Requires JTS (LGPL licensed) for polygons & WKT*
• JTS is “JTS Topology Suite”
• * WKT parsing soon to be implemented directly by Spatial4j
• Ported to .NET as Spatial4n and used by RavenDB
• by Itamar Syn-Hershko
The case for Spatial4j’s existence
• Just for shapes? How much code could there be?
• You’d be surprised. Determining the relationship between a lat-lon
rectangle and a geodetic circle (Within, Contains, Intersects, Disjoint)
is non-trivial, and that’s just one shape.
• Lots of non-trivial test code goes with it.
• Why isn’t it a part of Lucene spatial?
• Parts of Spatial4j depend on JTS, an LGPL licensed library. The
Lucene PMC voted not to introduce this compile-time dependency.
• Spatial4j is independently useful.
• Is this duplication of other open-source that could be used?
• Spatial4j needs to be ASL licensed to be a dependency of Lucene.
• Still… I haven’t found existing code that does what Spatial4j does.
• Couldn’t just the JTS-dependent parts live outside Lucene?
The Shape interface
(may become an abstract class in the next version)
interface Shape {
  Point getCenter();
  Rectangle getBoundingBox();
  boolean hasArea();
  double getArea();
  SpatialRelation relate(Shape other); // must support Point & Rectangle
}
enum SpatialRelation { DISJOINT, INTERSECTS, WITHIN, CONTAINS }
• Note: a simpler set than the “DE-9IM” spatial standard
• no “equals” or “touches”
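To make the four `SpatialRelation` values concrete, here is a flat-plane (Cartesian) sketch of relating a rectangle to a circle from the rectangle's point of view, like `r.relate(c)`. It is not Spatial4j's implementation: the enum is a local stand-in, the argument order mimics `makeRectangle(minX, maxX, minY, maxY)`, and the geodetic case is considerably harder.

```java
public class RectCircleRelation {
    /** Local stand-in for Spatial4j's enum of the same name. */
    enum SpatialRelation { DISJOINT, INTERSECTS, WITHIN, CONTAINS }

    static SpatialRelation relate(double minX, double maxX, double minY, double maxY,
                                  double cx, double cy, double r) {
        // Closest point of the rectangle to the circle center.
        double nx = Math.max(minX, Math.min(cx, maxX));
        double ny = Math.max(minY, Math.min(cy, maxY));
        double dx = nx - cx, dy = ny - cy;
        if (dx * dx + dy * dy > r * r) return SpatialRelation.DISJOINT;
        // Rectangle WITHIN circle: its farthest corner is inside the circle.
        double fx = Math.max(Math.abs(minX - cx), Math.abs(maxX - cx));
        double fy = Math.max(Math.abs(minY - cy), Math.abs(maxY - cy));
        if (fx * fx + fy * fy <= r * r) return SpatialRelation.WITHIN;
        // Rectangle CONTAINS circle: the circle's bounding box fits inside it.
        if (cx - r >= minX && cx + r <= maxX && cy - r >= minY && cy + r <= maxY) {
            return SpatialRelation.CONTAINS;
        }
        return SpatialRelation.INTERSECTS;
    }
}
```

Checking the farthest corner is the "all 4 corners inside the circle" case that randomized testing exercised; on a flat plane, the farthest corner alone decides it.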
Spatial4j shapes
Shape | Cartesian | Cartesian with dateline wrap | Geodetic
Point | Y | Y | Y
Line & LineString (w/ buffer) | Y | N | N
Rectangle | Y | Y | Y
Circle | Y | N | Y
ShapeCollection | Y | Y | Y
JTS Geometry (incl. polygons) | Y | Y | N
• Cartesian (AKA Euclidean): a flat plane
• Dateline wrap assumes the plane circles back on itself
• Geodetic: a spherical mathematical model
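A small sketch of what "dateline wrap" means for a rectangle, assuming the common convention that a rectangle whose minX is greater than its maxX crosses the dateline (x in [-180, 180]). The class and method names are hypothetical.

```java
public class DatelineRect {
    /** Assumed convention: minX > maxX means the rectangle crosses the dateline. */
    static boolean crossesDateline(double minX, double maxX) {
        return minX > maxX;
    }

    static boolean contains(double minX, double maxX, double minY, double maxY,
                            double x, double y) {
        if (y < minY || y > maxY) return false;
        if (crossesDateline(minX, maxX)) {
            // e.g. minX=170, maxX=-170 covers [170, 180] plus [-180, -170]
            return x >= minX || x <= maxX;
        }
        return x >= minX && x <= maxX;
    }
}
```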
Well Known Text (WKT)
(see Wikipedia)
• A popular standard for representing shapes as strings
• Requires JTS’s WKT parser, but Spatial4j has its own in progress
• Extensions are TBD for Rectangles and Circles
• Limited support for EMPTY and “Z” and “M” dimensions (future)
• Some examples:
• POINT (3 -2)
• LINESTRING(30 10, 10 30, …
• POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))
• MULTIPOLYGON (((…
• …
• Deprecated (may move to Solr):
• -90, -180
• -180 -90 180 90
• CIRCLE(4.56,1.23 d=0.071)
• TBD / Pending:
• ENVELOPE(-180,180,90,-90)
• BOX2D(-180 -90, 180 90)
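Spatial4j code sample
import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.context.jts.JtsSpatialContext;
import com.spatial4j.core.shape.Circle;
import com.spatial4j.core.shape.Rectangle;
import com.spatial4j.core.shape.Shape;
import com.spatial4j.core.shape.SpatialRelation;

SpatialContext ctx = SpatialContext.GEO;
Rectangle r = ctx.makeRectangle(-71, -70, 42, 43); // minX, maxX, minY, maxY
Circle c = ctx.makeCircle(-72, 42, 1);              // x (lon), y (lat), radius in degrees
SpatialRelation rel = r.relate(c);
System.out.println(rel);
boolean intersects = rel.intersects();
ctx = JtsSpatialContext.GEO; // JTS-backed context, needed for polygons
Shape s = ctx.readShape("POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))");
double distanceDegrees = ctx.getDistCalc().distance(
    ctx.makePoint(2, 2), ctx.makePoint(3, 3));
Distances (including circle radius) are in “degrees”, not radians or km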
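Since distances are expressed in degrees of arc, converting to and from kilometers is a recurring chore. The sketch below assumes the conversion is simply arc angle = distance / sphere radius (which is what `DistanceUtils.dist2Degrees(dist, radius)` is assumed to compute) and assumes a mean Earth radius of 6371.0087714 km; the class and method names are hypothetical.

```java
public class DegreesConversion {
    // Assumed mean Earth radius in km (believed to match Spatial4j's EARTH_MEAN_RADIUS_KM).
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    /** Surface distance -> degrees of arc on a sphere of the given radius. */
    static double dist2Degrees(double dist, double radius) {
        return Math.toDegrees(dist / radius);
    }

    /** Degrees of arc -> surface distance on a sphere of the given radius. */
    static double degrees2Dist(double degrees, double radius) {
        return Math.toRadians(degrees) * radius;
    }

    public static void main(String[] args) {
        System.out.println(dist2Degrees(200, EARTH_MEAN_RADIUS_KM));   // ~1.7986 degrees
        System.out.println(dist2Degrees(0.001, EARTH_MEAN_RADIUS_KM)); // 1 m ~ 9.0e-6 degrees
    }
}
```

The 1 m result is consistent with the `maxDistErr="0.000009"` degrees value used in the Solr field type configuration.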
Spatial4j Future
• Built-in WKT support (no JTS dependency)
• Extensible to user-defined shapes
• API improvements
• Shape argument validation via WKT but not via ctx.makeShape(…)
• ShapeCollection visitor design pattern
• Refactor to remove need for isGeo()
• LineString dateline & geodetic support
• Projection / Datum support
LUCENE SPATIAL
Spatial index information retrieval
Lucene 4 Spatial Module
• There isn’t one best way to implement spatial indexing for all use-cases
• Index just points, or other shapes too? Which?
• Multiple shapes per field?
• Query by Intersection? Contains? Within? Equals? Disjoint? …
• Distance sorting? Query boost by distance?
• Or more exotic shape relevancy like overlap percentage?
• Trade off shape precision for speed?
• Multiple SpatialStrategy implementations:
• RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy
• PointVectorStrategy
• BBoxStrategy (currently in trunk, not 4x)
• JtsGeoStrategy (in Spatial Solr Sandbox)
Strategy: PointVector
• Similar to Solr’s PointType / LatLonType
• X & Y trie double fields; caching via FieldCache
• Characteristics
• Indexes points (only)
• Single-valued field (no multi)
• Query by rectangle or circle (only)
• Circle uses FieldCache (requires memory)
• Circle does bbox pre-filter for performance
• Relations: Intersects, Within (only)
• Exact precision for x & y coordinates and query shape
• Distance sort
• Uses FieldCache (requires memory)
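The "bbox pre-filter" for a circle query relies on computing the bounding box of a circle on a sphere, which cheap range checks on the X and Y fields can then use before the exact distance test. A sketch under stated assumptions: the longitude half-width uses the standard spherical result asin(sin(r) / cos(lat)); pole handling is simplified and dateline wrapping of the result is deliberately omitted. Names are hypothetical.

```java
public class CircleBBox {
    /** Returns {minLon, maxLon, minLat, maxLat} for a circle whose radius is in degrees of arc. */
    static double[] bbox(double lat, double lon, double radiusDeg) {
        double minLat = lat - radiusDeg, maxLat = lat + radiusDeg;
        if (maxLat >= 90 || minLat <= -90) {
            // The circle reaches a pole, so every longitude is covered.
            return new double[] {-180, 180, Math.max(minLat, -90), Math.min(maxLat, 90)};
        }
        double r = Math.toRadians(radiusDeg);
        // Safe: |lat| + r < 90 here, so sin(r) < cos(lat) and asin's argument is < 1.
        double dLon = Math.toDegrees(Math.asin(Math.sin(r) / Math.cos(Math.toRadians(lat))));
        // Note: lon +/- dLon may leave [-180, 180] near the dateline; real code would wrap.
        return new double[] {lon - dLon, lon + dLon, minLat, maxLat};
    }
}
```

The box widens in longitude as latitude increases (about 2 degrees of longitude for a 1-degree circle at 60°N), which is why a naive "lon ± r" box would wrongly exclude matches.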
Strategy: BBox
• Implemented with 4 doubles & 1 boolean
• Ported from ESRI GeoPortal (Open Source)
• Characteristics:
• Indexes rectangles (only)
• Single-valued field (no multi)
• Query by rectangle (only)
• Supports all relations: Intersects, Within, Contains, …
• Distance sort from box center
• Uses FieldCache (requires memory)
• Area overlap sorting
• Sort results by percentage overlap between query and indexed boxes
• Uses FieldCache (requires memory)
• Note: FieldCache needs are somewhat high
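To illustrate "area overlap sorting", here is a sketch that scores an indexed box by how much it overlaps the query box. Assumption: the score is intersection area divided by union area (a Jaccard-style stand-in); this is not BBoxStrategy's actual scoring formula. Boxes are Cartesian, use `{minX, maxX, minY, maxY}` order, and do not cross the dateline.

```java
public class BoxOverlap {
    /** Area of a box {minX, maxX, minY, maxY}. */
    static double area(double[] b) {
        return Math.max(0, b[1] - b[0]) * Math.max(0, b[3] - b[2]);
    }

    /** 0 = no overlap, 1 = identical boxes. */
    static double overlapScore(double[] query, double[] target) {
        double w = Math.min(query[1], target[1]) - Math.max(query[0], target[0]);
        double h = Math.min(query[3], target[3]) - Math.max(query[2], target[2]);
        if (w <= 0 || h <= 0) return 0; // disjoint or merely touching
        double inter = w * h;
        return inter / (area(query) + area(target) - inter);
    }
}
```

Sorting results by this score descending ranks boxes that closely match the query's footprint first; the FieldCache note above applies because all four coordinates must be read for every candidate.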
Strategy: JtsGeoStrategy
• Stores a JTS geometry in Lucene 4’s DocValues
• Stores WKB (Well Known Binary, the binary counterpart of WKT)
• Full vector geometry is retained for search
• DocValues is mostly a better FieldCache
• Faster loading into memory
• Can be disk resident or memory
• Multi-valued
• Characteristics:
• Indexes any shape, including Multi… varieties
• Query by any shape
• Uses DocValues (memory use optional)
• Supports all relations: intersect, within, contains, …
• Could easily also support JTS’s exotic DE-9IM based relations
• Exact precision to the vector geometry
• No sorting
• Experimental / immature status
More of a proof-of-concept for now
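For a feel of what WKB bytes look like, here is a sketch of the OGC Well Known Binary layout for a single 2D point: a 1-byte byte-order flag (1 = little-endian), a 4-byte geometry type (1 = Point), then X and Y as 8-byte doubles, 21 bytes in total. This hand-rolls the format with `java.nio.ByteBuffer` for illustration; JtsGeoStrategy itself relies on JTS for this.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WkbPoint {
    static byte[] encode(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1); // byte order flag: 1 = little-endian (NDR), 0 = big-endian (XDR)
        buf.putInt(1);     // geometry type 1 = Point
        buf.putDouble(x);
        buf.putDouble(y);
        return buf.array();
    }

    static double[] decode(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        buf.order(wkb[0] == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        buf.get(); // skip the byte-order flag
        int type = buf.getInt();
        if (type != 1) throw new IllegalArgumentException("not a WKB Point: type " + type);
        return new double[] {buf.getDouble(), buf.getDouble()};
    }
}
```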
PREFIXTREE STRATEGY
Spatial grid indexing
Strategy: RecursivePrefixTree
• Grid / Tile / Trie / Prefix-Tree based
• With recursive descent algorithms
• Or TermQueryPrefixTree alternative
• Choose Geohash (geo only) or Quad tree
• The most mature strategy to date
• Highly tested
• The current evolution of SOLR-2155
Strategy: RecursivePrefixTree
• Characteristics:
• Indexes all shapes
• Variable precision of shape edges
• Highly precise shapes other than Point won’t scale
• LineString possibly not precise enough for your needs
• Multi-valued field support
• Query by any shape
• Variable precision for query shape
• Highest precision usually scales
• All Relations: Intersects, Within, Contains, Disjoint
• Distance sort (w/ multi-value support)
• Warning: immature, won’t scale
• Uses significant amounts of memory
• Fast scalable spatial filtering; no caches needed
new in Lucene 4.3
How many search / NoSQL systems have these capabilities?
Geohashes
• What is a Geohash?
• A lat/lon geocode system
• Has a hierarchical spatial structure
• Gradual precision degradation
• In the public domain
http://en.wikipedia.org/wiki/Geohash
• Example: (Boston) DRT2Y
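The geohash algorithm itself is short: alternately bisect the longitude and latitude ranges (longitude first), emit one bit per step, and pack every 5 bits into a base-32 character. Each additional character refines the previous cell, which is the hierarchical property described above. A sketch (class name hypothetical), checked against the well-known example "ezs42" for latitude 42.6, longitude -5.6:

```java
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90, 90};
        double[] lonRange = {-180, 180};
        StringBuilder sb = new StringBuilder(length);
        boolean evenBit = true; // even bits refine longitude, odd bits refine latitude
        int bit = 0, ch = 0;
        while (sb.length() < length) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            if (value >= mid) {
                ch = (ch << 1) | 1; // upper half
                range[0] = mid;
            } else {
                ch = ch << 1;       // lower half
                range[1] = mid;
            }
            evenBit = !evenBit;
            if (++bit == 5) { // 5 bits per base-32 character
                sb.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return sb.toString();
    }
}
```

Because shorter geohashes are prefixes of longer ones for the same point, indexing every prefix lets a single term lookup answer "is anything in this cell?" at any resolution.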
Demo
http://openlocation.org/geohash/geohash-js/
Zooming In: D
Zooming In: DR
Zooming In: DRT
Zooming In: DRT2
Zooming In: DRT2Y
Geohash Grids
(figures) Internal coordinates of an odd-length geohash (DRT2Y)…
…and of an even-length geohash (DRT2)
Demo
• Spatial Solr Playground
• Demo KML grid generation from geometries
• A sample point with quad tree indexes to these tokens:
• A, AD, ADB, ADBA
• A sample circle with quad tree indexes to these tokens:
• A, AB, ABA, ABAB+, ABAC+, ABAD+, ABB, ABBA+,
ABBB+, ABBC+, ABBD+, ABC, ABCA+, ABCB+, ABCC+,
ABCD+, ABD+, AD, ADA, ADAA+, ADAB+, ADAC+, ADAD+,
ADB+, ADC, ADCA+, ADCB+, ADCD+, ADD, ADDA+,
ADDB+, ADDC+, ADDD+, B, BA, BAA, BAAC+, BAAD+,
BAC, BACA+, BACB+, BACC+, BACD+, BC, BCA, BCAA+,
BCAB+, BCAC+, BCC, BCCA+, BCCC+, C, CB, CBB,
CBBA+
• Tokens with a ‘+’ are actually indexed with and without the ‘+’
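The token lists above can be reproduced in spirit with a small quad tree sketch: a point yields one token per level, each a prefix of the next, while a shape is decomposed by recursive descent, pruning disjoint cells and emitting a leaf ('+') when a cell is fully within the shape or the maximum level is reached. Assumptions: the quadrant-to-letter mapping (A = upper-left, B = upper-right, C = lower-left, D = lower-right) is hypothetical and may not match Lucene's QuadPrefixTree; only rectangles are handled; and leaves are emitted only in their '+' form, whereas Lucene 4 also indexed the bare form.

```java
import java.util.ArrayList;
import java.util.List;

public class QuadTreeTokens {
    // Hypothetical labels: A=upper-left, B=upper-right, C=lower-left, D=lower-right.
    private static final char[] LABELS = {'A', 'B', 'C', 'D'};

    /** Tokens for a point: one cell per level, each a prefix of the next. */
    static List<String> pointTokens(double x, double y, int maxLevels) {
        List<String> tokens = new ArrayList<>();
        double minX = -180, maxX = 180, minY = -90, maxY = 90;
        StringBuilder token = new StringBuilder();
        for (int level = 0; level < maxLevels; level++) {
            double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
            boolean right = x >= midX, upper = y >= midY;
            token.append(LABELS[(upper ? 0 : 2) + (right ? 1 : 0)]);
            tokens.add(token.toString());
            if (right) minX = midX; else maxX = midX;
            if (upper) minY = midY; else maxY = midY;
        }
        return tokens;
    }

    /** Tokens for a rectangle {minX, maxX, minY, maxY}; '+' marks leaf cells. */
    static List<String> rectTokens(double qMinX, double qMaxX, double qMinY, double qMaxY, int maxLevels) {
        List<String> out = new ArrayList<>();
        double[] q = {qMinX, qMaxX, qMinY, qMaxY};
        recurse("", -180, 180, -90, 90, 0, q, maxLevels, out);
        return out;
    }

    private static void recurse(String token, double minX, double maxX, double minY, double maxY,
                                int level, double[] q, int maxLevels, List<String> out) {
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        for (int i = 0; i < 4; i++) {
            boolean upper = i < 2, right = (i % 2) == 1;
            double cMinX = right ? midX : minX, cMaxX = right ? maxX : midX;
            double cMinY = upper ? midY : minY, cMaxY = upper ? maxY : midY;
            boolean intersects = cMinX < q[1] && cMaxX > q[0] && cMinY < q[3] && cMaxY > q[2];
            if (!intersects) continue; // DISJOINT: prune this branch
            String child = token + LABELS[i];
            boolean within = cMinX >= q[0] && cMaxX <= q[1] && cMinY >= q[2] && cMaxY <= q[3];
            if (within || level + 1 == maxLevels) {
                out.add(child + "+"); // leaf: fully covered, or precision limit reached
            } else {
                out.add(child);       // partially covered: descend one level
                recurse(child, cMinX, cMaxX, cMinY, cMaxY, level + 1, q, maxLevels, out);
            }
        }
    }
}
```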
PrefixTreeStrategy Architecture
• Shape: calc rect relationship
• SpatialPrefixTree & Cell: byte string to/from Cell (rect)
• PrefixTreeStrategy: index & search algorithms
• Lucene: TermsEnum
• Filters: IntersectsPrefixTreeFilter, ContainsPrefixTreeFilter, WithinPrefixTreeFilter
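Lucene Spatial example code
import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.distance.DistanceUtils;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Filter;
import org.apache.lucene.spatial.SpatialStrategy;
import org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree;
import org.apache.lucene.spatial.query.SpatialArgs;
import org.apache.lucene.spatial.query.SpatialOperation;

SpatialContext ctx = SpatialContext.GEO;
SpatialStrategy strategy = new RecursivePrefixTreeStrategy(
    new GeohashPrefixTree(ctx, 11), "myGeoField");
… // make indexWriter and a Document
for (Field f : strategy.createIndexableFields(shape))
  doc.add(f);
indexWriter.addDocument(doc);
…
// Circle of radius 200 km around lon -80, lat 33; the radius is converted to degrees
Filter filter = strategy.makeFilter(
    new SpatialArgs(SpatialOperation.Intersects,
        ctx.makeCircle(-80.0, 33.0,
            DistanceUtils.dist2Degrees(200, DistanceUtils.EARTH_MEAN_RADIUS_KM))));
indexSearcher.search(userKeywordQuery, filter, 10);
See SpatialExample.java in Lucene spatial tests for more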
Future
• Possible de-emphasis of SpatialStrategy abstraction
• Better options for distance sorting of PrefixTree strategies
• Better PrefixTree encoding than both geohash & quad tree
• Google Summer of Code 2013 -- TBD
• Performance improvements to the spatial Intersects RecursivePrefixTree Filter
• Remove the need to double-index leaf-nodes (with and without ‘+’)
• Exact geometry search by blending benefits of PrefixTree
and JtsGeoStrategy
• A Single-dimensional PrefixTree (for numeric range index)
SOLR SPATIAL
Adapters to Lucene 4 spatial
Solr 3 Spatial: LatLonType & friends
• Solr 3 was Solr’s first release to include spatial support
• Not based on Lucene’s old spatial contrib module
• Similar to TwoDoublesStrategy but more optimized
• Single-valued only, fast distance sorting, can choose floats (save memory)
• Fields:
• LatLonType (Geodetic)
• PointType (Cartesian)
• Query parsers (spatial filters):
• {!geofilt} (circle) “pt” and “sfield” and “d” params
• {!bbox} (bounding box of a circle)
• Distance function:
• geodist() and some esoteric others
NOT completely superseded by Solr 4 spatial fields
Solr 4 Spatial
• See http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
<fieldType name="location_rpt"
    class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    distErrPct="0.025"
    maxDistErr="0.000009"
    units="degrees" />
• spatialContextFactory: if you don’t need JTS (polygons), don’t set this
• distErrPct: non-point shapes are approximated to the grid, up to 2.5% of the radius
• maxDistErr: max precision (1m) as measured in degrees
Indexing
• Point: Latitude, Longitude (i.e. Y, X)
<field name="geo">43.17614, -90.57341</field>
• Point: X Y
<field name="geo">-90.57341 43.17614</field>
• Rect: minX minY maxX maxY
<field name="geo">-74.093 41.042 -69.347 44.558</field>
• Circle: point then d=radius (in degrees)
• will be deprecated
<field name="geo">Circle(4.56,1.23 d=0.0710)</field>
• WKT (preferred; it’s a standard)
<field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field>
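The point and rectangle formats above differ mainly in axis order: the comma form is "lat, lon" (Y, X) while the space-separated forms are X first. A hypothetical parser sketch (not Solr's actual parsing code) that normalizes both point forms to `{x, y}` and returns rectangles as `{minX, minY, maxX, maxY}`; Circle and WKT inputs are deliberately rejected.

```java
public class SolrShapeStringSketch {
    /** Returns {x, y} for a point or {minX, minY, maxX, maxY} for a rectangle. */
    static double[] parse(String s) {
        String t = s.trim();
        if (t.contains(",")) {
            // "lat, lon" (Y, X): swapped relative to the space-separated "x y" form
            String[] parts = t.split(",");
            if (parts.length != 2) throw new IllegalArgumentException("expected 'lat, lon': " + s);
            double lat = Double.parseDouble(parts[0].trim());
            double lon = Double.parseDouble(parts[1].trim());
            return new double[] {lon, lat};
        }
        String[] parts = t.split("\\s+");
        double[] nums = new double[parts.length];
        for (int i = 0; i < parts.length; i++) nums[i] = Double.parseDouble(parts[i]);
        if (nums.length == 2 || nums.length == 4) return nums; // "x y" or "minX minY maxX maxY"
        throw new IllegalArgumentException("unsupported format (Circle/WKT not handled): " + s);
    }
}
```

Mixing up the two point forms silently moves a point to a different place on the globe, which is one reason the slide prefers WKT.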
Filter (search)
• Using Solr 3’s bbox or geofilt query parsers
• Distance radius ‘d’ is interpreted as kilometers, just like LatLonType
• Limited to a circle (geofilt) or the bounding box of a circle (bbox)
fq={!geofilt}&sfield=geo&pt=45.15,-93.85&d=5
• Range query style (bounding box)
• Handles dateline wrap
fq=geo:[-90,-180 TO 90,180]
• Field query style
• Unique to Lucene 4 spatial; see SpatialArgsParser
fq=geo:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))) distErrPct=0"
• Predicates: Intersects, IsDisjointTo, IsWithin, Contains, …
• distErrPct (& distErr) optional; override field type’s default
SOLR-4242: A better spatial query parser
Distance Sort & Relevancy Boost
• geodist() is for Solr 3 LatLonType only
sort=geodist(lltField,45.15,-93.85) asc
• Solr 4 spatial queries can return the distance as the score
q={!geofilt sfield=geo pt=45.15,-93.85 d=5 score=distance}&sort=score asc&fl=*,score
• Without a filter
sort=query($sortsq) asc&sortsq={!geofilt filter=false score=distance sfield=geo pt=45.15,-93.85 d=0}
• Relevancy boost
defType=edismax&boost=query($mysq)&mysq={!geofilt filter=false score=recipDistance pt=45.15,-93.85 d=5}
Distance Faceting
• sfield=geo (the field)
• pt=45.15,-93.85 (point of reference)
• Within 10km
• facet.query={!geofilt d=10}
• Within 50km
• facet.query={!geofilt d=50}
• Within 100km
• facet.query={!geofilt d=100}
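Each `facet.query` above counts every document within its radius, so the buckets are cumulative (a point within 10 km is also counted in the 50 km and 100 km buckets). A brute-force sketch of those semantics, assuming a haversine distance and a mean Earth radius of 6371.0087714 km; names are hypothetical.

```java
public class DistanceFacets {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714; // assumed mean Earth radius

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** points are {lat, lon}; counts[i] = number of points within radiiKm[i] of (lat, lon). */
    static int[] facetCounts(double[][] points, double lat, double lon, double[] radiiKm) {
        int[] counts = new int[radiiKm.length];
        for (double[] p : points) {
            double d = haversineKm(lat, lon, p[0], p[1]);
            for (int i = 0; i < radiiKm.length; i++) {
                if (d <= radiiKm[i]) counts[i]++; // buckets overlap, like separate facet.query params
            }
        }
        return counts;
    }
}
```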
Future
• A more Solr-friendly spatial query parser SOLR-4242
• Retrofit geodist() to support the SpatialStrategies?
• Expose more tunables
• A grid based heat-map faceting component
• Idea: a multi-strategy spatial field encompassing
• A PrefixTree field for points
• A PrefixTree field for non-points
• A TwoDoubles field for good distance sorting / relevancy
• Knows whether it’s single- vs. multi-valued
• A FieldType for multi-value numeric ranges
DEMO
INTERESTING USE CASES
Heatmap / Grid faceting
1. Geohash each point to multiple lengths and index each length into its own field
• geohash_1:D, geohash_2:DR, geohash_3:DRT, geohash_4:DRT2
2. Search with a rectangle (bbox) filter, and…
3. Facet on the geohash field with the desired resolution
• facet.field=geohash_4&facet.limit=10000
• Lots of tuning / customization options
• Projected / quad tree
• facet.prefix may help
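What the geohash facet returns is essentially a count of points per grid cell at the chosen resolution. This sketch computes the same thing in memory, assuming standard geohash encoding (lowercase base-32, longitude bit first) and a hypothetical `facet` method standing in for `facet.field=geohash_N`; in Solr the bbox filter would restrict which points are counted.

```java
import java.util.Map;
import java.util.TreeMap;

public class GeohashGridFacet {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash encoding, truncated to `length` characters. */
    static String geohash(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder sb = new StringBuilder();
        boolean lonBit = true; // longitude is bisected first
        int bits = 0, ch = 0;
        while (sb.length() < length) {
            if (lonBit) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; } else { ch <<= 1; lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; } else { ch <<= 1; latHi = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { sb.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return sb.toString();
    }

    /** Points are {lat, lon}; returns cell -> count at the given geohash length. */
    static Map<String, Integer> facet(double[][] latLons, int length) {
        Map<String, Integer> counts = new TreeMap<>();
        for (double[] p : latLons) {
            counts.merge(geohash(p[0], p[1], length), 1, Integer::sum);
        }
        return counts;
    }
}
```

The number of distinct cells, not the number of documents, bounds the response size, which is what makes this attractive for heatmaps.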
Plotting many points on a map
• Why not ask Solr for rows=1000 ?
• It’s slow
• With a variable number of points per doc, 1000 rows could yield 1 distinct point or 1M
• Instead facet on a geohash with facet.limit=1000
• Fast
• Guaranteed <= 1000 points
• But might need lots of memory
• Or result-grouping on a geohash
But do you really want to plot 1000+ points on a map?
Filter by indexed distance constraints
• Imagine a dating site where both potential parties have a
maximum distance they’re willing to travel
• Q: For the current user, who is not “too far” for you but is
also not “too far” for them?
• A: Index each user’s location as a point in one field and
as a circle in another. Query by the current user’s circle to
the indexed point field as well as the current user’s point
to the indexed circle field.
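The two-sided query reduces to a simple condition per candidate: the distance must be within my radius (my circle against their indexed point) and within their radius (my point against their indexed circle). A brute-force sketch of that logic; the `User` class, haversine helper, and Earth radius constant are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MutualDistanceMatch {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714; // assumed mean Earth radius

    /** Hypothetical user: a location plus the maximum distance they will travel. */
    static final class User {
        final String id;
        final double lat, lon, maxKm;
        User(String id, double lat, double lon, double maxKm) {
            this.id = id; this.lat = lat; this.lon = lon; this.maxKm = maxKm;
        }
    }

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    static List<String> matches(User me, List<User> candidates) {
        List<String> ids = new ArrayList<>();
        for (User other : candidates) {
            double d = haversineKm(me.lat, me.lon, other.lat, other.lon);
            boolean inMyCircle = d <= me.maxKm;       // my circle vs. their indexed point
            boolean inTheirCircle = d <= other.maxKm; // my point vs. their indexed circle
            if (inMyCircle && inTheirCircle) ids.add(other.id);
        }
        return ids;
    }
}
```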
Multi-valued durations
• What if your documents needed a variable number of time (or other numerical value) durations?
• This approach won’t work:
<field name="start" type="tdate" multiValued="true"/>
<field name="end" type="tdate" multiValued="true"/>
• Solr (without Solr 4 spatial fields) can’t do it!
• You need to think differently to solve this…
http://wiki.apache.org/solr/SpatialForTimeDurations
• Example use-cases
• Searching for hotel-room vacancies
• Searching for movie show-times
• (next slides) Each document is a person with a variable number of “shifts” that they are working…
… model durations as points
… queries become rectangles
… some config & search details
• Configuration
<fieldType name="days_of_year"
class="solr.SpatialRecursivePrefixTreeFieldType"
geo="false" units="degrees"
worldBounds="0 0 365 365"
distErrPct="0" maxDistErr="1"/>
• Sample search: find shifts that overlap any of the 19th through 23rd days
daysOfYear:Intersects(0 18.5 23.5 365)
• Caveat: won’t scale to the full precision of a Java long (timestamp)
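The rectangle in the sample search follows from the overlap condition: a duration [start, end] indexed as the point (x=start, y=end) overlaps days 19..23 exactly when start <= 23 and end >= 19, which is the rectangle x in [0, 23.5], y in [18.5, 365]. A sketch of that mapping; the half-day padding mirrors the sample query, and the method names are hypothetical.

```java
public class DurationsAsPoints {
    /** Query rectangle {minX, minY, maxX, maxY} for durations overlapping days firstDay..lastDay. */
    static double[] overlapQuery(int firstDay, int lastDay, double worldMax) {
        // x = start <= lastDay, y = end >= firstDay; 0.5 padding keeps integer days safely inside
        return new double[] {0, firstDay - 0.5, lastDay + 0.5, worldMax};
    }

    static boolean contains(double[] rect, double x, double y) {
        return x >= rect[0] && y >= rect[1] && x <= rect[2] && y <= rect[3];
    }

    /** A multi-valued doc (one point per shift) matches if ANY shift point is in the rectangle. */
    static boolean docMatches(int[][] shifts, double[] rect) {
        for (int[] s : shifts) {
            if (contains(rect, s[0], s[1])) return true;
        }
        return false;
    }
}
```

Multi-valued support is the crux: two separate multi-valued start/end fields lose the pairing between each start and its end, while one point per shift keeps it.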
Thank you!
• References
• Lucene 4 spatial javadocs
• https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/
• Spatial4j at GitHub
• https://github.com/spatial4j/spatial4j (spatial4j.com redirect)
• http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com
• Solr
• http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
• Spatial Solr Sandbox
• https://github.com/ryantxu/spatial-solr-sandbox
• Contact me:
• David Smiley dsmiley@mitre.org dsmiley@apache.org

 
DSL's with Groovy
DSL's with GroovyDSL's with Groovy
DSL's with Groovypaulbowler
 
NGSI: Geoqueries & Carto integration
NGSI: Geoqueries & Carto integrationNGSI: Geoqueries & Carto integration
NGSI: Geoqueries & Carto integrationFIWARE
 
5 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 20185 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 2018Matthew Groves
 
Geospatial for Java
Geospatial for JavaGeospatial for Java
Geospatial for JavaJody Garnett
 
"SOLID" Object Oriented Design Principles
"SOLID" Object Oriented Design Principles"SOLID" Object Oriented Design Principles
"SOLID" Object Oriented Design PrinciplesSerhiy Oplakanets
 
Saving Money with Open Source GIS
Saving Money with Open Source GISSaving Money with Open Source GIS
Saving Money with Open Source GISbryanluman
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyRobert Viseur
 
5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft Platform5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft PlatformAll Things Open
 
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...Matthew Groves
 
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 20185 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018Matthew Groves
 

Similar to Lucene solr 4 spatial extended deep dive (20)

2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update2014 11 lucene spatial temporal update
2014 11 lucene spatial temporal update
 
The Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David SmileyThe Latest in Spatial & Temporal Search: Presented by David Smiley
The Latest in Spatial & Temporal Search: Presented by David Smiley
 
2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup2016-01 Lucene Solr spatial in 2015, NYC Meetup
2016-01 Lucene Solr spatial in 2015, NYC Meetup
 
State of JTS 2017
State of JTS 2017State of JTS 2017
State of JTS 2017
 
Spatial Data in SQL Server
Spatial Data in SQL ServerSpatial Data in SQL Server
Spatial Data in SQL Server
 
Spatial search with geohashes
Spatial search with geohashesSpatial search with geohashes
Spatial search with geohashes
 
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram SriharshaMagellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
 
KEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road mapKEYNOTE: Lucene / Solr road map
KEYNOTE: Lucene / Solr road map
 
DSL's with Groovy
DSL's with GroovyDSL's with Groovy
DSL's with Groovy
 
NGSI: Geoqueries & Carto integration
NGSI: Geoqueries & Carto integrationNGSI: Geoqueries & Carto integration
NGSI: Geoqueries & Carto integration
 
5 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 20185 NoSQL Options - Toronto - May 2018
5 NoSQL Options - Toronto - May 2018
 
Geospatial for Java
Geospatial for JavaGeospatial for Java
Geospatial for Java
 
"SOLID" Object Oriented Design Principles
"SOLID" Object Oriented Design Principles"SOLID" Object Oriented Design Principles
"SOLID" Object Oriented Design Principles
 
Openstreetmap
OpenstreetmapOpenstreetmap
Openstreetmap
 
Saving Money with Open Source GIS
Saving Money with Open Source GISSaving Money with Open Source GIS
Saving Money with Open Source GIS
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft Platform5 Popular Choices for NoSQL on a Microsoft Platform
5 Popular Choices for NoSQL on a Microsoft Platform
 
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
5 Popular Choices for NoSQL on a Microsoft Platform - All Things Open - Octob...
 
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 20185 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
5 Popular Choices for NoSQL on a Microsoft Platform - Tulsa - July 2018
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 

Recently uploaded (20)

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 

Lucene solr 4 spatial extended deep dive

  • 8. Lucene Spatial Committers
    • David Smiley
      • Works for MITRE
      • Boston area
    • Ryan McKinley
      • Works for Voyager GIS
      • Silicon Valley
    • Chris Male
      • Formerly at ElasticSearch
      • New Zealand
  • 9. Spatial decomposed
    • Spatial4j
      • Shapes, WKT, distance calculations, JTS adapter
    • Lucene spatial
      • Strategies: PrefixTree (TermQuery & Recursive impl.), BBox, PointVector
    • Solr adapters
    • Misc: Spatial Solr Sandbox
      • LSE
      • JtsGeoStrategy
      • Spatial-Demo (web app)
  • 10. Lines of Code for Spatial Components
    • Spatial4j: 43%
    • Lucene spatial: 35%
    • Solr adapters: 6%
    • Misc: 16%
    • Total: 4,781 non-comment source statements (excluding javadocs and tests), as of 2012-09
  • 11. CarrotSearch Labs’ RandomizedTesting
    • http://labs.carrotsearch.com/randomizedtesting.html
    • Provides plumbing for repeatable randomized JUnit tests
    • All the spatial test code uses it extensively
    • Randomized testing, more generally, is a philosophy / approach to testing
      • A typical hard-coded test catches only some regressions
      • A randomized test will eventually catch just about anything, especially nasty edge cases
      • However, these tests are hard to read, write, and maintain
    • Randomized testing helped find bugs related to…
      • Computing the bounding box of a circle
      • Computing the relationship of a circle to a rectangle whose 4 corners all lie inside the circle
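The randomized-testing idea can be illustrated without the RandomizedTesting library. The class below is a hypothetical plain-Java sketch (not CarrotSearch's API, and not the actual Lucene spatial tests): it draws random circles on a flat Cartesian plane from a fixed seed and checks one property, namely that a random point inside a circle also lies inside that circle's computed bounding box. Re-running with a reported seed reproduces a failure exactly, which is the "repeatable" part.

```java
import java.util.Random;

// Hypothetical plain-Java illustration of a repeatable randomized test;
// it does not use CarrotSearch's RandomizedTesting API.
class RandomizedBBoxCheck {

    // Bounding box of a Cartesian (flat-plane) circle: {minX, maxX, minY, maxY}.
    static double[] circleBBox(double cx, double cy, double r) {
        return new double[] {cx - r, cx + r, cy - r, cy + r};
    }

    // Runs `iterations` random trials from `seed`; returns the number of failures.
    static int run(long seed, int iterations) {
        Random rnd = new Random(seed); // same seed => same sequence => reproducible failure
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            double cx = rnd.nextDouble() * 200 - 100;
            double cy = rnd.nextDouble() * 200 - 100;
            double r = rnd.nextDouble() * 50;
            // Random point inside the circle, from a random angle and distance.
            double angle = rnd.nextDouble() * 2 * Math.PI;
            double dist = rnd.nextDouble() * r;
            double px = cx + dist * Math.cos(angle);
            double py = cy + dist * Math.sin(angle);
            double[] box = circleBBox(cx, cy, r);
            boolean inside = px >= box[0] && px <= box[1] && py >= box[2] && py <= box[3];
            if (!inside) {
                failures++;
                System.out.println("Failure at iteration " + i + " (seed " + seed + ")");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        long seed = System.nanoTime(); // print the seed so a failing run can be replayed
        System.out.println("seed=" + seed + " failures=" + run(seed, 10_000));
    }
}
```

A geodetic circle's bounding box (near the poles or the dateline) is where such a property check earns its keep; this flat-plane version only shows the mechanics.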
  • 13. Spatial4j: It’s all about the shapes
    • https://github.com/spatial4j/spatial4j (spatial4j.com redirects here)
    • Shapes
      • A “Shape” abstraction with multiple implementations
      • Geodetic (sphere) & Cartesian/2D implementations
      • Computes intersection relationship with other shapes
    • Also…
      • Distance and area math utilities, Geohash utilities
      • Parsing Well Known Text (WKT) formatted shapes
    • ASL-licensed project, independent of Apache, on GitHub
    • Requires JTS (LGPL licensed) for polygons & WKT*
      • JTS is the “JTS Topology Suite”
      • * WKT parsing is soon to be implemented directly by Spatial4j
    • Ported to .NET as Spatial4n and used by RavenDB
      • by Itamar Syn-Hershko
  • 14. The case for Spatial4j’s existence
    • Just for shapes? How much code could there be?
      • You’d be surprised. Determining the relationship between a lat-lon rectangle and a geodetic circle (Within, Contains, Intersects, Disjoint) is non-trivial, and that’s just one shape.
      • Lots of non-trivial test code goes with it.
    • Why isn’t it a part of Lucene spatial?
      • Parts of Spatial4j depend on JTS, an LGPL-licensed library. The Lucene PMC voted not to introduce this compile-time dependency.
      • Spatial4j is independently useful.
    • Is this duplication of other open-source code that could be used?
      • Spatial4j needs to be ASL licensed to be a dependency of Lucene.
      • Still… I haven’t found existing code that does what Spatial4j does.
    • Can’t only the JTS-dependent parts be external to Lucene?
  • 15. The Shape interface (may become an abstract class in the next version)
    interface Shape {
      Point getCenter();
      Rectangle getBoundingBox();
      boolean hasArea();
      double getArea();
      SpatialRelation relate(Shape other); // must support Point & Rectangle
    }
    enum SpatialRelation { DISJOINT, INTERSECTS, WITHIN, CONTAINS }
    • Note: a simpler set than the “DE-9IM” spatial standard: no “equals” or “touches”
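The four-valued relation can be illustrated for the simplest pair of shapes, two axis-aligned Cartesian rectangles. This is a hypothetical sketch, not Spatial4j's implementation: the `Rect` and `Relation` names are invented, there is no dateline or geodetic handling, and the reading of `a.relate(b)` (WITHIN means `a` lies inside `b`) is an assumption stated in the comments.

```java
// Hypothetical illustration of the four-valued relation for axis-aligned
// Cartesian rectangles only (no dateline wrap, no geodetic math).
enum Relation { DISJOINT, INTERSECTS, WITHIN, CONTAINS }

class Rect {
    final double minX, maxX, minY, maxY;

    Rect(double minX, double maxX, double minY, double maxY) {
        this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
    }

    // Relation of this rectangle to `other`: WITHIN means this lies inside other.
    Relation relate(Rect other) {
        // No overlap on either axis => disjoint (touching edges count as intersecting).
        if (other.minX > maxX || other.maxX < minX || other.minY > maxY || other.maxY < minY) {
            return Relation.DISJOINT;
        }
        // Checked first, so identical rectangles report CONTAINS.
        if (other.minX >= minX && other.maxX <= maxX && other.minY >= minY && other.maxY <= maxY) {
            return Relation.CONTAINS;
        }
        if (minX >= other.minX && maxX <= other.maxX && minY >= other.minY && maxY <= other.maxY) {
            return Relation.WITHIN;
        }
        return Relation.INTERSECTS;
    }
}
```

Rectangle-to-rectangle is the easy case; the slides' point is that pairs such as a geodetic circle against a lat-lon rectangle need far more careful math.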
  • 16. Spatial4j shapes
    Shape                            Cartesian   Cartesian w/ dateline wrap   Geodetic
    Point                            Y           Y                            Y
    Line & LineString (w/ buffer)    Y           N                            N
    Rectangle                        Y           Y                            Y
    Circle                           Y           N                            Y
    ShapeCollection                  Y           Y                            Y
    JTS Geometry (incl. polygons)    Y           Y                            N
    • Cartesian (AKA Euclidean): a flat plane
    • Dateline wrap assumes the plane circles back on itself
    • Geodetic: a spherical mathematical model
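"Dateline wrap" means a rectangle may span the +/-180 longitude line. A common way to represent this, assumed here (and, as far as we know, the convention Spatial4j rectangles follow), is to allow `minX > maxX`. The class below is a hypothetical sketch of point containment and width under that convention, not library code.

```java
// Hypothetical sketch: a longitude range on a plane that wraps at +/-180.
// Assumed convention: minX > maxX means the range crosses the dateline.
class WrapRect {
    final double minX, maxX, minY, maxY;

    WrapRect(double minX, double maxX, double minY, double maxY) {
        this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
    }

    boolean crossesDateline() {
        return minX > maxX;
    }

    // Width in degrees, counting the part on each side of the dateline.
    double width() {
        return crossesDateline() ? (180 - minX) + (maxX + 180) : maxX - minX;
    }

    boolean containsPoint(double x, double y) {
        if (y < minY || y > maxY) return false;
        if (crossesDateline()) {
            // Either east of minX (up to 180) or west of maxX (from -180).
            return x >= minX || x <= maxX;
        }
        return x >= minX && x <= maxX;
    }
}
```

For example, `new WrapRect(170, -170, -10, 10)` is 20 degrees wide and contains longitudes 175 and -175 but not 0.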
  • 17. Well Known Text (WKT) (see Wikipedia)
    • A popular standard for representing shapes as strings
    • Requires JTS’s WKT parser, though Spatial4j has its own parser in progress
    • Extensions for Rectangles and Circles are TBD
    • Limited support for EMPTY and the “Z” and “M” dimensions (future)
    • Some examples:
      • POINT (3 -2)
      • LINESTRING(30 10, 10 30, …
      • POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))
      • MULTIPOLYGON (((…
      • …
    • Deprecated (may move to Solr):
      • -90, -180
      • -180 -90 180 90
      • CIRCLE(4.56,1.23 d=0.071)
    • TBD / Pending:
      • ENVELOPE(-180,180,90,-90)
      • BOX2D(-180 -90, 180 90)
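To make the WKT syntax concrete: a point's coordinates are separated by whitespace, while commas separate vertices. The parser below is a hypothetical, deliberately tiny sketch that handles only the 2D `POINT` form; it is not JTS's or Spatial4j's parser and ignores EMPTY, Z/M, and every other geometry type.

```java
import java.util.Locale;

// Hypothetical minimal parser for the 2D WKT POINT form only, e.g. "POINT (3 -2)".
// Not JTS's or Spatial4j's parser: no EMPTY, Z/M, or other geometry types.
class WktPoint {

    // Returns {x, y}; throws IllegalArgumentException on anything else.
    static double[] parsePoint(String wkt) {
        String s = wkt.trim();
        if (!s.toUpperCase(Locale.ROOT).startsWith("POINT")) {
            throw new IllegalArgumentException("not a POINT: " + wkt);
        }
        int open = s.indexOf('(');
        int close = s.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("missing parentheses: " + wkt);
        }
        // WKT separates a point's coordinates with whitespace, not a comma.
        String[] parts = s.substring(open + 1, close).trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'x y': " + wkt);
        }
        // NumberFormatException (a subclass of IllegalArgumentException) rejects "3,".
        return new double[] {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])};
    }
}
```

`parsePoint("POINT (3 -2)")` yields x = 3, y = -2, while the comma form `POINT (3, -2)` is rejected.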
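  • 18. Spatial4j code sample
    import com.spatial4j.core.context.SpatialContext;
    import com.spatial4j.core.context.jts.JtsSpatialContext;
    import com.spatial4j.core.shape.Circle;
    import com.spatial4j.core.shape.Rectangle;
    import com.spatial4j.core.shape.Shape;
    import com.spatial4j.core.shape.SpatialRelation;

    SpatialContext ctx = SpatialContext.GEO;
    Rectangle r = ctx.makeRectangle(-71, -70, 42, 43); // minX, maxX, minY, maxY
    Circle c = ctx.makeCircle(-72, 42, 1);             // x, y, radius in degrees
    SpatialRelation rel = r.relate(c);
    System.out.println(rel);
    rel.intersects(); // boolean
    ctx = JtsSpatialContext.GEO;
    Shape s = ctx.readShape("POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))");
    double distanceDegrees = ctx.getDistCalc().distance(
        ctx.makePoint(2, 2), ctx.makePoint(3, 3));
    • Distances (including circle radius) are in “degrees”, not radians or km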
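Because distances are expressed in degrees of arc, converting to kilometres means multiplying the arc (in radians) by an Earth radius. The sketch below is hypothetical and does not use Spatial4j's `DistanceCalculator`: it computes a haversine great-circle distance in degrees and converts it with an assumed mean Earth radius of 6371.0087714 km (check the library's own constant before relying on it).

```java
// Hypothetical sketch (not Spatial4j's DistanceCalculator): great-circle distance
// returned in degrees of arc, using the haversine formula on a sphere.
class DegreesDistance {

    // Assumed mean Earth radius in km; verify against the library you use.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    // Points are (x = longitude, y = latitude) in degrees, matching makePoint(x, y).
    static double distanceDegrees(double x1, double y1, double x2, double y2) {
        double lat1 = Math.toRadians(y1), lat2 = Math.toRadians(y2);
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(x2 - x1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(lat1) * Math.cos(lat2) * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        double centralAngle = 2 * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h)); // radians
        return Math.toDegrees(centralAngle);
    }

    // One degree of arc equals (pi / 180) * R kilometres on the sphere.
    static double degreesToKm(double degrees) {
        return Math.toRadians(degrees) * EARTH_MEAN_RADIUS_KM;
    }
}
```

Under this radius, one degree is roughly 111.2 km, so a 10 km circle would need a radius of about 10 / 111.2 ≈ 0.09 degrees.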
  • 19. Spatial4j Future
    • Built-in WKT support (no JTS dependency)
    • Extensible to user-defined shapes
    • API improvements
      • Shape argument validation via WKT but not via ctx.makeShape(…)
      • ShapeCollection visitor design pattern
      • Refactor to remove need for isGeo()
    • LineString dateline & geodetic support
    • Projection / Datum support
  • 20. LUCENE SPATIAL
    Spatial index information retrieval
  • 21. Lucene 4 Spatial Module • There isn’t one best way to implement spatial indexing for all use-cases • Index just points, or other shapes too? Which? • Multiple shapes per field? • Query by Intersection? Contains? Within? Equals? Disjoint? … • Distance sorting? Query boost by distance? • Or more exotic shape relevancy like overlap percentage? • Trade off shape precision for speed? • Multiple SpatialStrategy implementations: • RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy • PointVectorStrategy • BBoxStrategy (currently in trunk, not 4x) • JtsGeoStrategy (in Spatial Solr Sandbox)
  • 22. Strategy: PointVector • Similar to Solr’s PointType / LatLonType • X & Y trie double fields; caching via FieldCache • Characteristics • Indexes points (only) • Single-valued field (no multi) • Query by rectangle or circle (only) • Circle uses FieldCache (requires memory) • Circle does bbox pre-filter for performance • Relations: Intersects, Within (only) • Exact precision for x & y coordinates and query shape • Distance sort • Uses FieldCache (requires memory)
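The circle query described above can be sketched as a cheap bounding-box rejection followed by an exact great-circle test. This is an illustration under assumptions, not Lucene's implementation: CircleFilter is hypothetical, points are (lat, lon) in degrees, the Earth is a sphere, and the dateline is ignored. The longitude half-width uses asin(sin r / cos lat), which bounds a spherical circle exactly; the simpler r / cos lat is slightly too narrow and could wrongly reject edge points.

```java
/** Hypothetical circle filter: bounding-box pre-filter, then exact haversine distance. */
final class CircleFilter {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    /** Great-circle distance in km between two (lat, lon) points, via the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if (lat, lon) lies within radiusKm of (centerLat, centerLon). */
    static boolean matches(double lat, double lon,
                           double centerLat, double centerLon, double radiusKm) {
        double r = radiusKm / EARTH_MEAN_RADIUS_KM; // angular radius in radians
        double latDelta = Math.toDegrees(r);
        double s = Math.sin(r) / Math.cos(Math.toRadians(centerLat));
        // Longitude half-width of the circle's bounding box; a pole inside the circle allows any longitude.
        double lonDelta = s >= 1 ? 180 : Math.toDegrees(Math.asin(s));
        if (Math.abs(lat - centerLat) > latDelta || Math.abs(lon - centerLon) > lonDelta) {
            return false; // cheap bounding-box rejection, no trigonometry per candidate
        }
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm; // exact check
    }
}
```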
  • 23. Strategy: BBox • Implemented with 4 doubles & 1 boolean • Ported from ESRI GeoPortal (Open Source) • Characteristics: • Indexes rectangles (only) • Single-valued field (no multi) • Query by rectangle (only) • Supports all relations: Intersects, Within, Contains, … • Distance sort from box center • Uses FieldCache (requires memory) • Area overlap sorting • Sort results by percentage overlap between query and indexed boxes • Uses FieldCache (requires memory) • Note: FieldCache needs are somewhat high
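As a rough illustration of sorting by area overlap, here is a simplified overlap score. It is not the GeoPortal formula that BBoxStrategy ported: BoxOverlap is hypothetical, works on a flat plane, and simply averages the fraction of the query box covered with the fraction of the indexed box covered. Boxes are {minX, minY, maxX, maxY}.

```java
/** Hypothetical, simplified overlap score between a query box and an indexed box. */
final class BoxOverlap {
    static double area(double[] b) {
        return Math.max(0, b[2] - b[0]) * Math.max(0, b[3] - b[1]);
    }

    static double intersectionArea(double[] a, double[] b) {
        double w = Math.min(a[2], b[2]) - Math.max(a[0], b[0]);
        double h = Math.min(a[3], b[3]) - Math.max(a[1], b[1]);
        return (w <= 0 || h <= 0) ? 0 : w * h;
    }

    /** Score in [0, 1]: average of the query's covered fraction and the indexed box's covered fraction. */
    static double score(double[] query, double[] indexed) {
        double overlap = intersectionArea(query, indexed);
        if (overlap == 0) {
            return 0; // disjoint or zero-area overlap; also avoids dividing by a zero area
        }
        return 0.5 * (overlap / area(query) + overlap / area(indexed));
    }
}
```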
  • 24. Strategy: JtsGeoStrategy • Stores a JTS geometry in Lucene 4’s DocValues • Stores WKB (Well-Known Binary, the binary counterpart of WKT) • Full vector geometry is retained for search • DocValues is mostly a better FieldCache • Faster loading into memory • Can be disk resident or memory resident • Multi-valued • Characteristics: • Indexes any shape, including Multi… varieties • Query by any shape • Uses DocValues (memory use optional) • Supports all relations: intersect, within, contains, … • Could easily also support JTS’s exotic DE-9IM based relations • Exact precision to the vector geometry • No sorting • Experimental / immature status: more of a proof-of-concept for now
  • 26. Strategy: RecursivePrefixTree • Grid / Tile / Trie / Prefix-Tree based • With recursive descent algorithms • Or TermQueryPrefixTree alternative • Choose Geohash (geo only) or Quad tree • The most mature strategy to date • Highly tested • The current evolution of SOLR-2155
  • 27. Strategy: RecursivePrefixTree • Characteristics: • Indexes all shapes • Variable precision of shape edges • Highly precise shapes other than Point won’t scale • LineString possibly not precise enough for your needs • Multi-valued field support • Query by any shape • Variable precision for query shape • Highest precision usually scales • All Relations: Intersects, Within, Contains, Disjoint • Distance sort (w/ multi-value support) • Warning: immature, won’t scale • Uses significant amounts of memory • Fast, scalable spatial filtering; no caches needed • (new in Lucene 4.3) • How many search / NoSQL systems have these capabilities?
  • 28. Geohashes • What is a Geohash? • A lat/lon geocode system • Has a hierarchical spatial structure • Gradual precision degradation • In the public domain http://en.wikipedia.org/wiki/Geohash • Example: (Boston) DRT2Y
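The hierarchical, gradual-precision property comes from how a geohash is built: the longitude and latitude ranges are bisected alternately, one bit per step, and every 5 bits become one base-32 character, so a shorter hash is a prefix naming a larger enclosing cell. A minimal encoder sketch; Geohash is a hypothetical class name, and the standard alphabet is lowercase, so the slide's DRT2Y corresponds to drt2y.

```java
/** Minimal geohash encoder sketch: alternate longitude/latitude bisection, 5 bits per character. */
final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // a geohash starts with a longitude bit
        int bits = 0, value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; } // upper half -> 1
                else { value <<= 1; maxLon = mid; }                         // lower half -> 0
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value <<= 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

For a point in Boston such as (42.34, -71.08), encode(42.34, -71.08, 5) returns "drt2y", and encoding to 4 characters gives the enclosing cell "drt2".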
  • 35. Geohash Grids [figure: internal coordinates of an odd-length geohash (DRT2Y) and an even-length geohash (DRT2)]
  • 36. Demo • Spatial Solr Playground • Demo: KML grid generation from geometries • With a quad tree, a sample point indexes to these tokens: • A, AD, ADB, ADBA • With a quad tree, a sample circle indexes to these tokens: • A, AB, ABA, ABAB+, ABAC+, ABAD+, ABB, ABBA+, ABBB+, ABBC+, ABBD+, ABC, ABCA+, ABCB+, ABCC+, ABCD+, ABD+, AD, ADA, ADAA+, ADAB+, ADAC+, ADAD+, ADB+, ADC, ADCA+, ADCB+, ADCD+, ADD, ADDA+, ADDB+, ADDC+, ADDD+, B, BA, BAA, BAAC+, BAAD+, BAC, BACA+, BACB+, BACC+, BACD+, BC, BCA, BCAA+, BCAB+, BCAC+, BCC, BCCA+, BCCC+, C, CB, CBB, CBBA+ • Tokens with a ‘+’ are actually indexed both with and without the ‘+’
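Token lists like these come from recursively descending the quad tree: cells disjoint from the shape are pruned, cells fully inside it (or at the maximum depth) become leaves, and partially covered cells emit their prefix and recurse. A minimal sketch under stated assumptions: QuadCover is hypothetical, the world is [-180,180] x [-90,90], children are lettered A = top-left, B = top-right, C = bottom-left, D = bottom-right (an assumed ordering), and leaves are emitted only in their '+' form, including max-depth cells, which may differ from the real encoding.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical quad-tree cover of a rectangle {minX, minY, maxX, maxY}; not Lucene's QuadPrefixTree. */
final class QuadCover {
    static List<String> cover(double[] shape, int maxLevels) {
        List<String> tokens = new ArrayList<>();
        recurse(shape, -180, -90, 180, 90, "", maxLevels, tokens);
        return tokens;
    }

    private static void recurse(double[] s, double minX, double minY, double maxX, double maxY,
                                String prefix, int levelsLeft, List<String> out) {
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        double[][] kids = {
            {minX, midY, midX, maxY}, // A: top-left (assumed ordering)
            {midX, midY, maxX, maxY}, // B: top-right
            {minX, minY, midX, midY}, // C: bottom-left
            {midX, minY, maxX, midY}, // D: bottom-right
        };
        for (int i = 0; i < 4; i++) {
            double[] c = kids[i];
            String token = prefix + (char) ('A' + i);
            boolean disjoint = s[2] < c[0] || s[0] > c[2] || s[3] < c[1] || s[1] > c[3];
            if (disjoint) {
                continue; // prune: no descendant of this cell can touch the shape
            }
            boolean within = c[0] >= s[0] && c[2] <= s[2] && c[1] >= s[1] && c[3] <= s[3];
            if (within || levelsLeft == 1) {
                out.add(token + "+"); // leaf: fully covered, or no finer level allowed
            } else {
                out.add(token);       // partially covered: emit the prefix, then descend
                recurse(s, c[0], c[1], c[2], c[3], token, levelsLeft - 1, out);
            }
        }
    }
}
```

With these assumptions, the point (-100, 40) given as a degenerate rectangle at 4 levels yields A, AC, ACB, ACBB+.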
  • 37. PrefixTreeStrategy Architecture • Shape: calc rect relationship • SpatialPrefixTree & Cell: byte string to/from Cell (rect) • PrefixTreeStrategy: index & search algorithms • Lucene TermsEnum • Filters: IntersectsPrefixTreeFilter, ContainsPrefixTreeFilter, WithinPrefixTreeFilter
  • 38. Lucene Spatial example code
      SpatialContext ctx = SpatialContext.GEO;
      SpatialStrategy strategy = new RecursivePrefixTreeStrategy(
          new GeohashPrefixTree(ctx, 11), "myGeoField");
      … // make indexWriter and a Document
      for (Field f : strategy.createIndexableFields(shape))
        doc.add(f);
      indexWriter.addDocument(doc);
      …
      Filter filter = strategy.makeFilter(new SpatialArgs(SpatialOperation.Intersects,
          ctx.makeCircle(-80.0, 33.0, // x (lon), y (lat), radius in degrees:
              DistanceUtils.dist2Degrees(200, DistanceUtils.EARTH_MEAN_RADIUS_KM)))); // 200 km
      indexSearcher.search(userKeywordQuery, filter, 10);
    • See SpatialExample.java in Lucene spatial tests for more
  • 39. Future • Possible de-emphasis of SpatialStrategy abstraction • Better options for distance sorting of PrefixTree strategies • Better PrefixTree encoding than both geohash & quad tree • Google Summer of Code 2013 -- TBD • Performance improvements to spatial Intersects RecursivePrefixTree Filter • Remove the need to double-index leaf-nodes (with and without ‘+’) • Exact geometry search by blending benefits of PrefixTree and JtsGeoStrategy • A Single-dimensional PrefixTree (for numeric range index)
  • 40. SOLR SPATIAL Adapters to Lucene 4 spatial
  • 41. Solr 3 Spatial: LatLonType & friends • Solr 3 was Solr’s first release to include spatial support • Not based on Lucene’s old spatial contrib module • Similar to TwoDoublesStrategy but more optimized • Single-valued only, fast distance sorting, can choose floats (saves memory) • Fields: • LatLonType (Geodetic) • PointType (Cartesian) • Query parsers (spatial filters): • {!geofilt} (circle), with “pt”, “sfield” and “d” params • {!bbox} (bounding box of a circle) • Distance function: • geodist() and some esoteric others • NOT completely superseded by Solr 4 spatial fields
  • 42. Solr 4 Spatial • See http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 • <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory" distErrPct="0.025" maxDistErr="0.000009" units="degrees" /> • spatialContextFactory: if you don’t need JTS (polygons), don’t set this • distErrPct: non-point shapes are approximated to the grid, up to 2.5% of their radius • maxDistErr: max precision (1 m), as measured in degrees
  • 43. Indexing • Point: Latitude, Longitude (i.e. Y, X) <field name="geo">43.17614, -90.57341</field> • Point: X Y <field name="geo">-90.57341 43.17614</field> • Rect: minX minY maxX maxY <field name="geo">-74.093 41.042 -69.347 44.558</field> • Circle: point then d=radius (in degrees) • will be deprecated <field name="geo">Circle(4.56,1.23 d=0.0710)</field> • WKT (preferred; it’s a standard) <field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field>
  • 44. Filter (search) • Using Solr 3’s bbox or geofilt query parsers • Distance radius ‘d’ is interpreted as kilometers, just like LatLonType • Limited to bbox and bbox of a circle • e.g. fq={!geofilt}&sfield=geo&pt=45.15,-93.85&d=5 • Range query style (bounding box) • Handles dateline wrap • e.g. fq=geo:[-90,-180 TO 90,180] • Field query style • Unique to Lucene 4 spatial; see SpatialArgsParser • e.g. fq=geo:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))) distErrPct=0" • Predicates: Intersects, IsDisjointTo, IsWithin, Contains, … • distErrPct (& distErr) are optional; they override the field type’s default • See SOLR-4242: a better spatial query parser
  • 45. Distance Sort & Relevancy Boost • geodist() is for Solr 3 LatLonType only sort=geodist(lltField,45.15,-93.85) desc • Solr 4 spatial queries can return the distance as the score q={!geofilt sfield=geo pt=45.15,-93.85 d=5 score=distance}&sort=score asc&fl=*,score • Without a filter sort=query($sortsq) asc&sortsq={!geofilt filter=false score=distance sfield=geo pt=45.15,-93.85 d=0} • Relevancy boost defType=edismax&boost=query($mysq)&mysq={!geofilt filter=false score=recipDistance pt=45.15,-98.85 d=5}
  • 46. Distance Faceting • sfield=geo (the field) • pt=45.15,-93.85 (point of reference) • Within 10km • facet.query={!geofilt d=10} • Within 50km • facet.query={!geofilt d=50} • Within 100km • facet.query={!geofilt d=100}
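What these three facet queries compute can be sketched directly: for each radius, count the matching documents within that many kilometers of pt. DistanceFacets is hypothetical, assumes one (lat, lon) point per document and a spherical Earth, and only stands in for Solr's own counting.

```java
/** Hypothetical sketch of the counting behind several facet.query={!geofilt d=...} requests. */
final class DistanceFacets {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    /** Great-circle distance in km between two (lat, lon) points, via the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** One count per radius (same order as radiiKm): documents within that many km of the point. */
    static int[] counts(double ptLat, double ptLon, double[][] docPoints, double[] radiiKm) {
        int[] counts = new int[radiiKm.length];
        for (double[] p : docPoints) { // p = {lat, lon}
            double d = haversineKm(ptLat, ptLon, p[0], p[1]);
            for (int i = 0; i < radiiKm.length; i++) {
                if (d <= radiiKm[i]) {
                    counts[i]++; // the ranges are nested, so one document can count in several buckets
                }
            }
        }
        return counts;
    }
}
```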
  • 47. Future • A more Solr-friendly spatial query parser SOLR-4242 • Retrofit geodist() to support the SpatialStrategies? • Expose more tunables • A grid based heat-map faceting component • Idea: a multi-strategy spatial field encompassing • A PrefixTree field for points • A PrefixTree field for non-points • A TwoDoubles field for good distance sorting / relevancy • Knows whether it’s single- or multi-valued • A FieldType for multi-value numeric ranges
  • 48. DEMO
  • 50. Heatmap / Grid faceting • 1. Geohash each point to multiple lengths and index each length into its own field • geohash_1:D, geohash_2:DR, geohash_3:DRT, geohash_4:DRT2 • 2. Search with a rectangle (bbox) filter, and… • 3. Facet on the geohash field with the desired resolution • facet.field=geohash_4&facet.limit=10000 • Lots of tuning / customization options • Projected / quad tree • facet.prefix may help
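Steps 1 and 3 can be sketched without Solr: derive the geohash_1 through geohash_N field values for a document, then count documents per cell at one resolution, which is the kind of result facet.field=geohash_4 returns. GeohashGrid is hypothetical and assumes the geohash strings were already computed.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of multi-length geohash fields and per-cell counting. */
final class GeohashGrid {
    /** Field name -> value, e.g. geohash_1=d, geohash_2=dr, geohash_3=drt, geohash_4=drt2. */
    static Map<String, String> prefixFields(String geohash, int maxLength) {
        Map<String, String> fields = new TreeMap<>();
        for (int len = 1; len <= Math.min(maxLength, geohash.length()); len++) {
            fields.put("geohash_" + len, geohash.substring(0, len));
        }
        return fields;
    }

    /** Cell -> document count at one resolution; empty cells are simply absent. */
    static Map<String, Integer> facet(Iterable<String> geohashes, int length) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String gh : geohashes) {
            if (gh.length() >= length) {
                counts.merge(gh.substring(0, length), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```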
  • 51. Plotting many points on a map • Why not ask Solr for rows=1000? • It’s slow • If documents have a variable number of points, those 1000 rows could yield anywhere from 1 distinct point to 1M • Instead, facet on a geohash with facet.limit=1000 • Fast • Guaranteed <= 1000 points • But might need lots of memory • Or use result grouping on a geohash • But do you really want to plot 1000+ points on a map?
  • 52. Filter by indexed distance constraints • Imagine a dating site where both potential parties have a maximum distance they’re willing to travel • Q: For the current user, who is not “too far” for you but is also not “too far” for them? • A: Index each user’s location as a point in one field and as a circle in another. Query by the current user’s circle to the indexed point field as well as the current user’s point to the indexed circle field.
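Because great-circle distance is symmetric, the two filters together amount to requiring the distance between the two users to be within both users' limits. A minimal sketch of that check outside Solr; Person and MutualRange are hypothetical, each user has one (lat, lon) point and a maximum travel distance in km, and the Earth is treated as a sphere.

```java
/** Hypothetical user: a home location plus the farthest distance (km) they are willing to travel. */
final class Person {
    final double lat, lon, maxKm;

    Person(double lat, double lon, double maxKm) {
        this.lat = lat;
        this.lon = lon;
        this.maxKm = maxKm;
    }
}

final class MutualRange {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;

    /** Great-circle distance in km between two (lat, lon) points, via the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True only when each person's point lies inside the other person's circle. */
    static boolean mutualMatch(Person a, Person b) {
        double d = haversineKm(a.lat, a.lon, b.lat, b.lon);
        boolean bInsideCircleOfA = d <= a.maxKm; // a's circle vs. b's indexed point
        boolean aInsideCircleOfB = d <= b.maxKm; // a's point vs. b's indexed circle
        return bInsideCircleOfA && aInsideCircleOfB;
    }
}
```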
  • 53. Multi-valued durations • What if your documents needed a variable number of time (or other numeric) durations? • This approach won’t work: <field name="start" type="tdate" multiValued="true"/> <field name="end" type="tdate" multiValued="true"/> • The two multi-valued fields lose the pairing between each start and its end • Solr (without Solr 4 spatial fields) can’t do it! • You need to think differently to solve this… http://wiki.apache.org/solr/SpatialForTimeDurations • Example use-cases • Searching for hotel-room vacancies • Searching for movie show-times • (next slides) Each document is a person with a variable number of “shifts” that they are working…
  • 54. … model durations as points
  • 55. … queries become rectangles
  • 56. … some config & search details • Configuration <fieldType name="days_of_year" class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" units="degrees" worldBounds="0 0 365 365" distErrPct="0" maxDistErr="1"/> • Sample search: find shifts that have any overlap with days 19 through 23: daysOfYear:Intersects(0 18.5 23.5 365) • Caveat: won’t scale to the full precision of a Java long (timestamp)
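The mapping behind these slides can be checked directly: a shift [start, end] becomes the point (start, end), and "overlaps days qStart..qEnd" becomes the rectangle minX=0, minY=qStart-0.5, maxX=qEnd+0.5, maxY=365, i.e. start <= qEnd and end >= qStart for whole-day values. A minimal sketch; DurationsAsPoints is hypothetical and works on whole day numbers.

```java
/** Hypothetical sketch: durations indexed as points, overlap queries expressed as rectangles. */
final class DurationsAsPoints {
    /** Query rectangle {minX, minY, maxX, maxY} for "any overlap with days qStart..qEnd". */
    static double[] overlapQuery(int qStart, int qEnd, int maxDay) {
        // x = start must be <= qEnd; y = end must be >= qStart (0.5 padding for whole days).
        return new double[] {0, qStart - 0.5, qEnd + 0.5, maxDay};
    }

    /** Does the point (x = start, y = end) for one shift fall inside the rectangle? */
    static boolean intersects(double[] rect, int start, int end) {
        double x = start, y = end;
        return x >= rect[0] && x <= rect[2] && y >= rect[1] && y <= rect[3];
    }

    /** A person (document) matches if any of their shifts matches, as with a multi-valued field. */
    static boolean anyShiftMatches(double[] rect, int[][] shifts) {
        for (int[] s : shifts) {
            if (intersects(rect, s[0], s[1])) {
                return true;
            }
        }
        return false;
    }
}
```

For example, overlapQuery(19, 23, 365) produces {0, 18.5, 23.5, 365}, the same rectangle as the sample query.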
  • 57. Thank you! • References • Lucene 4 spatial javadocs • https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/ • Spatial4j at GitHub • https://github.com/spatial4j/spatial4j ( spatial4j.com redirect) • http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com • Solr • http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 • Spatial Solr Sandbox • https://github.com/ryantxu/spatial-solr-sandbox • Contact me: • David Smiley dsmiley@mitre.org dsmiley@apache.org