Lucene 4 spatial


  1. LUCENE 4 SPATIAL • 2012 Basis Technology Open Source Search Conference • Presented by David Smiley, MITRE • © 2012 The MITRE Corporation. All rights reserved.
  2. About David Smiley • Working at MITRE for 12 years • web development, Java, search • 3 Solr apps, 1 Endeca • Published 1st book on Solr; then 2nd edition (2009, 2011) • Apache Lucene / Solr committer (2012) • Specializing in spatial • Presented at Lucene Revolution (2010) & Basis O.S. Search Conference (2011) • Taught Solr classes at MITRE (2010, 2011, 2012) • Solr search consultant within MITRE and its sponsors, and privately via OpenSource Connections
  3. What is Spatial Search? Primary features: • Spatial filter query • Spatial distance sorting • Spatial distance relevancy (i.e. spatial query score) NOT “geocoding” – resolve “Boston” to its latitude and longitude Typical use-case: 1. Index a location for each Lucene document given a latitude & longitude 2. Then search for matching documents by a circle (point-radius) or bounding box 3. Then sort results by distance
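The typical use-case on slide 3 can be sketched in plain Java. This is a minimal sketch that assumes a spherical Earth and uses an in-memory list in place of an index. `PointRadiusSketch`, `Doc`, `haversineKm`, and `search` are hypothetical names for illustration, not part of the Lucene spatial API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative stand-in for an index; none of these names come from Lucene.
final class PointRadiusSketch {
    static final double EARTH_RADIUS_KM = 6371.0088; // mean radius, spherical model assumed

    /** A "document" carrying one indexed location (step 1 of the use-case). */
    static final class Doc {
        final String id;
        final double lat;
        final double lon;

        Doc(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in kilometres using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Steps 2 and 3: keep documents inside the circle, then order them nearest first. */
    static List<Doc> search(List<Doc> docs, double lat, double lon, double radiusKm) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (haversineKm(lat, lon, d.lat, d.lon) <= radiusKm) {
                hits.add(d);
            }
        }
        hits.sort(Comparator.comparingDouble(d -> haversineKm(lat, lon, d.lat, d.lon)));
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("new-york", 40.7128, -74.0060),
                new Doc("cambridge", 42.3736, -71.1097),
                new Doc("boston", 42.3601, -71.0589));
        // Query: 10 km circle centred on Boston.
        for (Doc d : search(docs, 42.3601, -71.0589, 10.0)) {
            System.out.println(d.id);
        }
    }
}
```

A real index avoids the linear scan; the sketch only shows the filter-then-sort shape of the request.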
  4. History of Spatial for Lucene & Solr • 2007: Local-Lucene • by Patrick O’Leary (AOL) • 2009-09: LL -> Lucene spatial contrib in Lucene 2.9.0 • Local-Lucene graduates to an official Lucene contrib module • 2009-12: Spatial Search Plugin (SSP) for Solr • by Chris Male (JTeam -> Orange11, ElasticSearch) • 2010-10: SOLR-2155 a geohash prefix tree filter • by David Smiley (MITRE) • 2011-01: Lucene Spatial Playground (LSP) • by Ryan McKinley (Voyager GIS), David, and Chris • 2011-03: Solr 3.1 new spatial features • by Grant Ingersoll and Yonik Seeley (LucidWorks) • 2012-03: LSP -> Lucene 4 spatial module + Spatial4j • replaces former Lucene spatial contrib module
  5. Lucene Spatial Committers • David Smiley, MITRE • Bedford, MA • Chris Male, Elastic Search • New Zealand • Ryan McKinley, Voyager GIS • Oakland, CA
  6. Breakdown of Spatial Components (pie chart) • Spatial4j 43% • Lucene spatial 35% • Misc 16% • Solr adapters 6% • Total: 4,781 Non-Comment Source Statements (without javadocs or tests)
  7. Spatial4j: It’s all about the shapes • Shapes • Types: Point, Rectangle, Circle, Polygon • Geospatial & Euclidean/2D implementations • Intersection: within, contains, intersects, disjoint • Distance and area math utilities • Input/Output serialization to Well Known Text (WKT) • Ex: POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10)) • ASL licensed project independent of Apache on GitHub • Requires JTS (3rd party LGPL) for polygon & WKT support • Ported to .NET as Spatial4n and used by RavenDB • by Itamar Syn-Hershko
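The four relations named on slide 7 (within, contains, intersects, disjoint) can be illustrated for the simplest case, two axis-aligned rectangles in a Euclidean plane. This is a sketch under that assumption only: it ignores dateline wrap and geodetic shapes, and `RectRelationSketch`, `Rect`, `Relation`, and `relate` are hypothetical names rather than the Spatial4j classes.

```java
// Euclidean-only illustration; not the Spatial4j implementation.
final class RectRelationSketch {
    enum Relation { WITHIN, CONTAINS, INTERSECTS, DISJOINT }

    /** Axis-aligned rectangle with minX <= maxX and minY <= maxY. */
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** True when every point of this rectangle lies inside other. */
        boolean inside(Rect other) {
            return minX >= other.minX && maxX <= other.maxX
                    && minY >= other.minY && maxY <= other.maxY;
        }
    }

    /** Relation of a to b. Two equal rectangles are reported as WITHIN. */
    static Relation relate(Rect a, Rect b) {
        if (a.maxX < b.minX || a.minX > b.maxX || a.maxY < b.minY || a.minY > b.maxY) {
            return Relation.DISJOINT;
        }
        if (a.inside(b)) {
            return Relation.WITHIN;
        }
        if (b.inside(a)) {
            return Relation.CONTAINS;
        }
        return Relation.INTERSECTS; // partial overlap
    }

    public static void main(String[] args) {
        Rect big = new Rect(0, 0, 10, 10);
        System.out.println(relate(big, new Rect(2, 2, 3, 3)));
        System.out.println(relate(big, new Rect(5, 5, 15, 15)));
        System.out.println(relate(big, new Rect(20, 20, 30, 30)));
    }
}
```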
  8. Lucene 4 Spatial Module • There isn’t one best way to implement spatial indexing for all use-cases • Index just points, or other shapes too? Which? • Multiple shapes per field? • Query by Intersection? Contains? Within? Equals? Disjoint? … • Distance sorting? Query boost by distance? • Or more exotic shape relevancy like overlap percentage? • Tradeoff shape precision for speed? • Multiple SpatialStrategy implementations: • RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy • PointVectorStrategy • BBoxStrategy (currently in trunk, not 4x) • JtsGeoStrategy (in Spatial4j/LSP) Names subject to change!
  9. Strategy: PointVector • Similar to Solr’s PointType / LatLonType • X & Y trie double fields; caching via FieldCache • Characteristics • Indexes points (only) • Single-valued field (no multi) • Query by rectangle or circle (only) • Circle uses FieldCache (requires memory) • Circle does bbox pre-filter for performance • Relations: Intersects, Within (only) • Exact precision for x & y coordinates and query shape • Distance sort • Uses FieldCache (requires memory)
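The "bbox pre-filter" idea on slide 9 can be pictured as a cheap rectangle test that runs before the exact distance test. The sketch below assumes Euclidean coordinates and two per-document arrays standing in for cached x and y values. `BBoxPrefilterSketch` and `withinCircle` are hypothetical names, and the actual strategy may evaluate the rectangle part quite differently (for example against the index rather than in a loop).

```java
import java.util.ArrayList;
import java.util.List;

// Euclidean illustration of "rectangle first, exact circle second".
final class BBoxPrefilterSketch {
    /**
     * Returns the ids (array positions) of documents inside the circle.
     * xs[i] and ys[i] hold the single point of document i.
     */
    static List<Integer> withinCircle(double[] xs, double[] ys, double cx, double cy, double r) {
        List<Integer> hits = new ArrayList<>();
        // Bounding box of the query circle.
        double minX = cx - r, maxX = cx + r, minY = cy - r, maxY = cy + r;
        for (int doc = 0; doc < xs.length; doc++) {
            double x = xs[doc];
            double y = ys[doc];
            // Cheap rejection: anything outside the box cannot be in the circle.
            if (x < minX || x > maxX || y < minY || y > maxY) {
                continue;
            }
            // Exact check only for the survivors (squared distance avoids a sqrt).
            double dx = x - cx;
            double dy = y - cy;
            if (dx * dx + dy * dy <= r * r) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 3.0, 4.5, 10.0};
        double[] ys = {0.0, 4.0, 4.5, 0.0};
        // Document 2 passes the box test but fails the circle test.
        System.out.println(withinCircle(xs, ys, 0.0, 0.0, 5.0));
    }
}
```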
  10. Strategy: RecursivePrefixTree • Grid / Tile / Trie / Prefix-Tree based • With recursive descent algorithm • Or TermQueryPrefixTree alternative • Choose Geohash (geo only) or Quad tree • The most mature strategy to date • The current evolution of SOLR-2155 • Callout: potential rename to GridFilterSpatialStrategy
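Geohash suits a prefix tree because a cell's hash is a prefix of the hash of every cell nested inside it. The following is a minimal sketch of the standard, publicly documented geohash encoding (alternating longitude and latitude bisection, 5 bits per base-32 character). `GeohashSketch` and `encode` are hypothetical names, and this is not the code used by the Lucene spatial module.

```java
// Standard geohash encoding, written out for illustration only.
final class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair to a geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) {
                    value = (value << 1) | 1;
                    minLon = mid;
                } else {
                    value <<= 1;
                    maxLon = mid;
                }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) {
                    value = (value << 1) | 1;
                    minLat = mid;
                } else {
                    value <<= 1;
                    maxLat = mid;
                }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        String fine = encode(57.64911, 10.40744, 5);
        String coarse = encode(57.64911, 10.40744, 3);
        System.out.println(fine + " starts with " + coarse + ": " + fine.startsWith(coarse));
    }
}
```

Each extra character shrinks the cell, so the hash length acts as the precision knob.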
  11. Strategy: RecursivePrefixTree • Characteristics: • Indexes all shapes • Variable precision of shape edges • Highly precise shapes other than point won’t scale • LineStrings possibly not precise enough for your needs • Multi-valued field support • Query by any shape • Variable precision for query shape • Highest precision usually scales • Relations: Intersects (only) • Distance sort (w/ multi-value support) • Warning: immature, won’t scale • Uses significant amounts of memory • Fast spatial filtering; no cache needed
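One way to picture "variable precision" with a quad tree is to cover a rectangle with grid cells by recursive descent: cells fully inside the shape are emitted at a coarse level, and only cells straddling the edge are split further, down to a maximum level. This sketch assumes a unit-square world and quadrant labels A to D. `QuadCoverSketch`, `cover`, and the labelling are illustrative assumptions, not the module's actual cell scheme.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative quad-tree covering of a rectangle; world is [0,1] x [0,1].
final class QuadCoverSketch {
    /** q is the query rectangle as {minX, minY, maxX, maxY}. */
    static List<String> cover(double[] q, int maxLevel) {
        List<String> cells = new ArrayList<>();
        descend("", 0, 0.0, 0.0, 1.0, 1.0, q, maxLevel, cells);
        return cells;
    }

    private static void descend(String token, int level,
                                double minX, double minY, double maxX, double maxY,
                                double[] q, int maxLevel, List<String> out) {
        // Prune cells that do not overlap the query (touching edges do not count).
        if (maxX <= q[0] || minX >= q[2] || maxY <= q[1] || minY >= q[3]) {
            return;
        }
        boolean within = minX >= q[0] && maxX <= q[2] && minY >= q[1] && maxY <= q[3];
        // Stop early for fully covered cells; stop at maxLevel for edge cells.
        if (within || level == maxLevel) {
            out.add(token);
            return;
        }
        double midX = (minX + maxX) / 2;
        double midY = (minY + maxY) / 2;
        descend(token + "A", level + 1, minX, midY, midX, maxY, q, maxLevel, out); // upper-left
        descend(token + "B", level + 1, midX, midY, maxX, maxY, q, maxLevel, out); // upper-right
        descend(token + "C", level + 1, minX, minY, midX, midY, q, maxLevel, out); // lower-left
        descend(token + "D", level + 1, midX, minY, maxX, midY, q, maxLevel, out); // lower-right
    }

    public static void main(String[] args) {
        // One coarse cell plus two finer cells along the right-hand edge.
        System.out.println(cover(new double[] {0.0, 0.0, 0.75, 0.5}, 2));
    }
}
```

Raising `maxLevel` tightens the approximation at the shape's edge at the cost of more cells, which is the precision-versus-size tradeoff the slide describes.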
  12. Strategy: BBox • Implemented with 4 doubles & 1 boolean • Ported from ESRI Open Source GeoPortal • Characteristics: • Indexes rectangles (only) • Single-valued field (no multi) • Query by rectangle (only) • Supports all relations: Intersects, Within, Contains, … • Distance sort from box center • Uses FieldCache (requires memory) • Area overlap sorting • Sort results by percentage overlap between query and indexed boxes • Uses FieldCache (requires memory) • Note: FieldCache needs are somewhat high
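Area overlap sorting needs a score for how much two boxes overlap. The slide does not give the formula, so the sketch below swaps in a simple and common measure, intersection area divided by union area, purely as an assumption. `BoxOverlapSketch` and `overlapRatio` are hypothetical names, and the ported ESRI scoring may well differ.

```java
// Assumed measure: intersection-over-union of two axis-aligned boxes.
final class BoxOverlapSketch {
    /** Area of a box given as {minX, minY, maxX, maxY}. */
    static double area(double[] box) {
        return (box[2] - box[0]) * (box[3] - box[1]);
    }

    /** Returns a value in [0, 1]: 0 for no overlap, 1 for identical boxes. */
    static double overlapRatio(double[] a, double[] b) {
        double width = Math.min(a[2], b[2]) - Math.max(a[0], b[0]);
        double height = Math.min(a[3], b[3]) - Math.max(a[1], b[1]);
        if (width <= 0 || height <= 0) {
            return 0.0; // boxes are disjoint or only touch
        }
        double intersection = width * height;
        double union = area(a) + area(b) - intersection;
        return intersection / union;
    }

    public static void main(String[] args) {
        double[] query = {0, 0, 2, 2};
        System.out.println(overlapRatio(query, new double[] {0, 0, 2, 2}));
        System.out.println(overlapRatio(query, new double[] {1, 1, 3, 3}));
        System.out.println(overlapRatio(query, new double[] {5, 5, 6, 6}));
    }
}
```

Sorting results by this ratio in descending order puts the indexed boxes that best match the query box first.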
  13. Strategy: JtsGeoStrategy • Stores any JTS geometry in Lucene 4’s DocValues • Stores WKB -- WKT in binary format • Full vector geometry is retained for search • DocValues is mostly a better FieldCache • Faster loading into memory • Can be disk resident or memory • Characteristics: • Indexes any shape • Single valued field but can be MultiPoint, MultiPolygon, etc. • Query by any shape • Uses DocValues (memory use optional) • Supports all relations: intersect, within, contains, … • No sorting • Experimental / immature status
  14. Solr Adapters • Configuration: <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory" distErrPct="0.025" maxDistErr="0.000009" /> <field name="geo" type="geo" indexed="true" stored="true" multiValued="true" /> • Adding data: <field name="geo">43.17614,-90.57341</field> <field name="geo">POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))</field> • Search Filter fq=geo:"Intersects(Circle(54.729696,-98.525391 d=10))" • Distance Sort sort=query($sortsq) asc&sortsq={! score=distance v=$sq}&sq=store:"Intersects(Circle(54.729696,-98.525391 d=10))"
  15. Future Possibilities • Solr: • Filter out points in multi-valued field from search results not matching filter • Heatmap/grid faceting spatial summarization • Spatial-Temporal search • 3d (x,y,t) point shapes, and “track” shape queries • Support any query shape for all Strategies • PrefixTreeStrategy: • More efficient binary grid encoding; use Hilbert Curve order • Better multi-value point caches • Cache-less sort of top-N results • More query relations: Contains, Within • Configurable DocValues vs. FieldCache choice • Choose floats or configurable bits instead of forcing doubles • CircleStrategy
  16. Thank you! • References • Lucene 4 spatial javadocs • https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/spatial/ • Spatial4j at GitHub • https://github.com/spatial4j/spatial4j ( spatial4j.com redirect) • http://spatial4j.16575.n6.nabble.com -- dev@lists.spatial4j.com • Solr • http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 • Contact me: • David Smiley dsmiley@mitre.org dsmiley@apache.org

Editor's Notes

  • Distance sorting & relevancy end up being a single underlying technical requirement for the implementation.
  • Misc is a demo web application and a Lucene spatial strategy called “JtsSpatialStrategy” that cannot be included in Lucene spatial due to licensing.
  • Polygons support dateline wrap. Well tested. Key differentiators: ASL licensed, Geospatial support, Circles & Polygons.
  • In time there will be additional unique capabilities of different implementations. TermQueryPrefixTreeStrategy too. SpatialStrategies can be combined, just as people index text different ways simultaneously. See SpatialExample.java for some code samples.
  • This is a simple strategy. I’d like to see it extended to support choosing floats or other more compact means of holding the coordinates in memory for a desired precision level.
  • Recommend pairing with TwoDoublesStrategy for single-value distance sort
  • Would like to see this made customizable to use floats or other compact representations.
