WHY WE CHOSE MONGODB TO PUT BIG-DATA ‘ON THE MAP’
JUNE 2012

@nknize
+Nicholas Knize
“The 3D UDOP allows near real time visibility of all SOUTHCOM Directorates information in one
location…this capability allows for unprecedented situational awareness and information sharing”
                                                                -Gen. Doug Frasier




  TST PRODUCTS
  ACCOMPLISHING THE IMPOSSIBLE
• Expose enterprise data in a geo-temporal user defined
  environment
• Provide a flexible and scalable spatial indexing framework
  for heterogeneous data
• Visualize spatially referenced data on 3D globe & 2D maps
• Manage real-time data feeds and mobile messaging
• View data over geo-rectified imagery with 3D terrain
• Support mission planning and simulation
• Provide real-time collaboration and sharing
  ISPATIAL OVERVIEW
Desired Data Store Characteristic for ‘Big Data’

• Horizontally scalable – Large volume / elastic

• Vertically scalable – Heterogeneous data types (“Data Stack”)

• Smartly Distributed – Reduce the distance bits must travel

• Fault Tolerant – Replication Strategy and Consistency model

• High Availability – Node recovery

• Fast – Reads or writes (can’t always have both)
   BIG DATA STORAGE CHARACTERISTICS
Subset of Evaluated NoSQL Options
           • Cassandra
                 –   Nice Bring Your Own Index (BYOI) design
                 –   … but Java, Java, Java… Memory management can be an issue
                 –   Adding new nodes can be a pain (Token Changes, nodetool)
                 –   Key-Value store…good for simple data models

           • HBase
                 – Nice BigTable model
                 – Theory grounded heavily in the CAP theorem, inflexible trade-offs
                 – Complicated setup and maintenance

           • CouchDB
                 – Provides some GeoSpatial functionality (Currently being rewritten)
                 – HEAVILY dependent on Map-Reduce model (complicated design)
                 – Erlang based – poor multi-threaded heap management

NOSQL OPTIONS
Why MongoDB for Thermopylae?
• Documents based on JSON – A GEOJSON match made in heaven!

• C++ - No Garbage Collection Overhead! Efficient memory management
  design reduces disk swapping and paging

• Disk storage is memory mapped, enabling fast swapping when necessary

• Built in auto-failover with replica sets and fast recovery with journaling

• Tunable Consistency – Consistency defined at application layer

• Schema Flexible – friendly properties of SQL enable easy port

• Provided initial spatial indexing support – point-based only, so limited!
  WHY TST LIKES MONGODB
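To make the JSON/GeoJSON fit concrete, here is a minimal sketch of a polygon report held as a GeoJSON Feature. The property names (`report_id`, `source`) and the coordinates are hypothetical, chosen only for illustration; the sketch uses Python's standard `json` module and does not touch MongoDB itself.

```python
import json

# A GeoJSON Feature is already a JSON document, so it can be stored as-is
# in a document database. The property names below are hypothetical.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # One linear ring; GeoJSON requires the first and last positions to match.
        "coordinates": [[
            [-77.05, 38.87], [-77.00, 38.87], [-77.00, 38.91],
            [-77.05, 38.91], [-77.05, 38.87],
        ]],
    },
    "properties": {"report_id": 1, "source": "example"},
}

encoded = json.dumps(feature)   # what would be sent to the data store
decoded = json.loads(encoded)   # what a client would read back
ring = decoded["geometry"]["coordinates"][0]
print(decoded["geometry"]["type"], len(ring))
```

The round trip is lossless, which is the practical meaning of the "match" claimed above: no object-relational mapping layer sits between the geometry and the stored document.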
... The Spatial Indexer wasn’t quite right
• MongoDB (like nearly all relational DBs) uses a b-Tree
     – Data structure for storing sorted data in log time
     – Great for indexing numerical and text documents (1D attribute data)
     – Cannot store multi-dimension (>2D) data – NOT COMPLEX GEOMETRY
       FRIENDLY




 MONGODB SPATIAL INDEXER
How does MongoDB solve the dimensionality problem?

• Space Filling (Z) Curve
     – A continuous line that
       intersects every point in a
       two-dimensional plane


• Use Geohash to
  represent lat/lon values
     – Interleave the bits of a
       lat/long pair
     – Base32 encode the result




 DIMENSIONALITY REDUCTION
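The two Geohash steps above (interleave the bits, then Base32-encode) can be sketched as follows. This is an illustrative stand-alone encoder, not MongoDB's internal implementation; the function name `geohash_encode` and the `precision` parameter are assumptions made for the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a lat/lon pair as a Geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon

    # Pack each group of five interleaved bits into one Base32 character.
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

Because the result is a plain string, it sorts lexicographically and fits in an ordinary b-Tree, which is what makes the approach attractive for a 1D index.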
Issues with the Geohash b-Tree approach

• Neighbors aren’t so
  close!
     – Neighboring points on the
       Geoid may end up on
       opposite ends of the
       plane
     – Impacts search efficiency


• What about Geometry?
     – Doesn’t support > 2D
     – Mongo uses Multi-
       Location documents
       which really just indexes
       multiple points that link
       back to a single document

 GEOHASH BTREE ISSUES
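The "neighbors aren't so close" problem can be reproduced on a small integer grid. The sketch below is a simplified Z-order (Morton) encoding, not MongoDB's code; the function name `morton_2d` and the grid cells are hypothetical.

```python
def morton_2d(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


# Two cells that touch diagonally, on either side of a quadrant boundary.
near_a = morton_2d(3, 3)
near_b = morton_2d(4, 4)

# Two cells that touch inside the same quadrant.
same_a = morton_2d(0, 0)
same_b = morton_2d(1, 0)

print(near_a, near_b, near_b - near_a)  # 15 48 33
print(same_a, same_b, same_b - same_a)  # 0 1 1
```

Cells (3, 3) and (4, 4) are spatial neighbors, yet their codes are 33 positions apart, so a range scan over the 1D index has to cover many unrelated cells to find both.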
Mongo Multi-location Document Clipping Issues
                         ($within search doesn’t always work w/ multi-location)
     Case 1: Success!                                  Case 3: Fail!
     Case 2: Success!                                  Case 4: Fail!
     [Figure legend: Multi-Location Document (a.k.a. Polygon) vs. Search Polygon]
MULTI-LOCATION CLIPPING
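One way a point-based index can miss a polygon: if only the polygon's vertices are indexed, a search region that lies inside the polygon contains none of the indexed points. The sketch below assumes axis-aligned search boxes and hypothetical coordinates; it illustrates the general failure mode and is not a reconstruction of the four cases in the figure.

```python
def point_in_box(point, box):
    """True if (x, y) lies inside box = (min_x, min_y, max_x, max_y)."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def vertex_only_match(vertices, search_box):
    """Multi-location style test: match if any stored vertex is in the box."""
    return any(point_in_box(v, search_box) for v in vertices)


def mbr(vertices):
    """Minimum bounding rectangle of a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]  # stored as four points
search_box = (4, 4, 6, 6)                       # sits inside the polygon

print(vertex_only_match(polygon, search_box))       # False: polygon is missed
print(boxes_intersect(mbr(polygon), search_box))    # True: an MBR test finds it
```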
Potential Solutions

 • Constrain the system to single point searches
       – Multi-dimension support will be exponentially complex (won’t scale)



 • Interpolate points along the edge of the shape
       – Multi-dimension support will be exponentially complex (won’t scale)



 • Customize the spatial indexer
       – Selected approach



SOLUTIONS TO GEOHASH PROBLEM
Thermopylae Custom Tuned MongoDB for Geo

TST leverages Guttman’s 1984 research on R-Trees and the later R*-Tree variant
• R-Trees organize any-dimensional data by representing
  the data as a minimum bounding box.
• Each node bounds its children. A node can have many
  objects in it (max: m, min: ceil(m/2))
• Splits and merges are optimized by minimizing overlaps
• The leaves point to the actual objects (probably stored
  on disk)
• Height balanced – tree height is O(log n); search cost also depends on
  how much the MBRs overlap


 CUSTOM TUNED SPATIAL INDEXER
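Two of the properties above, "each node bounds its children" and the fill limits (max: m, min: ceil(m/2)), can be sketched in a few lines. The helper names `bounding_box` and `fill_limits` are hypothetical and the rectangles are made up; this is not TST's indexer code.

```python
import math


def bounding_box(rects):
    """MBR of a list of (min_x, min_y, max_x, max_y) rectangles."""
    min_xs, min_ys, max_xs, max_ys = zip(*rects)
    return (min(min_xs), min(min_ys), max(max_xs), max(max_ys))


def fill_limits(m):
    """Return (minimum, maximum) entries per node for capacity m."""
    return math.ceil(m / 2), m


children = [(0, 0, 2, 2), (1, 1, 5, 3), (4, 0, 6, 1)]
parent_mbr = bounding_box(children)   # the parent entry must cover all children
low, high = fill_limits(4)

print(parent_mbr)   # (0, 0, 6, 3)
print(low, high)    # 2 4
```

A node that grows past `high` entries is split; one that drops below `low` is merged or has its entries reinserted, which keeps every leaf at the same depth.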
Spatial Indexing at Scale with R-Trees

Spatial data represented as minimum bounding rectangles (2 dimensions),
cubes (3 dimensions), or hyperrectangles (4 dimensions)




Index represented as: <I, DiskLoc> where:

    I = (I0, I1, …, In-1), where n = number of dimensions
    Each Ii is a closed interval [min, max] describing the MBR range along dimension i




  RTREE THEORY
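The <I, DiskLoc> entry generalizes to any number of dimensions because intersection is tested one interval at a time. A minimal sketch, assuming an integer stands in for MongoDB's DiskLoc and that the class name `IndexEntry` is hypothetical:

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # closed interval [min, max] on one dimension


@dataclass(frozen=True)
class IndexEntry:
    intervals: Tuple[Interval, ...]  # I: one interval per dimension
    disk_loc: int                    # stand-in for DiskLoc (an assumption)

    def intersects(self, other: "IndexEntry") -> bool:
        """Two MBRs intersect only if they overlap on every dimension."""
        if len(self.intervals) != len(other.intervals):
            raise ValueError("entries must have the same dimensionality")
        return all(
            lo_a <= hi_b and lo_b <= hi_a
            for (lo_a, hi_a), (lo_b, hi_b) in zip(self.intervals, other.intervals)
        )


# 3D example: x, y, and altitude intervals (values are made up).
flight = IndexEntry(((0, 10), (0, 10), (3000, 9000)), disk_loc=1)
tower = IndexEntry(((5, 6), (5, 6), (0, 300)), disk_loc=2)
query = IndexEntry(((4, 7), (4, 7), (0, 5000)), disk_loc=0)

print(query.intersects(flight))   # True: overlaps on all three dimensions
print(flight.intersects(tower))   # False: altitude ranges do not overlap
```

Adding time as a fourth dimension is just one more interval in the tuple; the test itself does not change.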
    R*-Tree Spatial Index Example
• Sample insertion result for 4th order
  tree
  [Figure: root entries m, n, o, p over leaf entries a b c d | e f | g h i | j k l]
• Objectives:
    1.   Minimize area
    2.   Minimize overlaps
    3.   Minimize margins
    4.   Maximize inner node utilization




   R*-TREE INDEX OBJECTIVES
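The first three objectives are simple geometric quantities on 2D MBRs. The sketch below shows one common way to compute them, taking "margin" to mean the rectangle's perimeter; the function names and rectangles are hypothetical.

```python
def area(r):
    """Area of r = (min_x, min_y, max_x, max_y)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    """Perimeter of the rectangle; smaller margins mean squarer MBRs."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    """Area shared by two rectangles, or 0 if they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0


m_box = (0, 0, 4, 2)
n_box = (3, 1, 6, 5)

print(area(m_box), margin(m_box))   # 8 12
print(overlap(m_box, n_box))        # 1
```

A split or insertion heuristic compares candidate arrangements on these numbers and keeps the one that scores lowest.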
Insert
 • Similar to insertion into B+-tree but may insert
   into any leaf; leaf splits in case capacity exceeded.
       – Which leaf to insert into?
       – How to split a node?




R*-TREE INSERT EXAMPLE
Insert—Leaf Selection
• Follow a path from root to leaf.
• At each node move into subtree whose MBR area
  increases least with addition of new rectangle.
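The least-enlargement rule can be sketched as follows. The MBR coordinates assigned to m, n, o and p are hypothetical, and ties are broken by smaller area, which is an assumption added for the example.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(node_mbr, new_rect):
    """How much node_mbr's area grows if new_rect is added to it."""
    return area(union(node_mbr, new_rect)) - area(node_mbr)


def choose_subtree(children, new_rect):
    """Pick the child whose MBR grows least; break ties by smaller area."""
    return min(
        children,
        key=lambda name: (enlargement(children[name], new_rect), area(children[name])),
    )


children = {
    "m": (0, 5, 4, 9),
    "n": (6, 6, 10, 10),
    "o": (5, 0, 8, 3),
    "p": (9, 0, 12, 4),
}

print(choose_subtree(children, (7, 7, 8, 8)))      # n: already covers the rectangle
print(choose_subtree(children, (4.5, 6, 5, 7)))    # m: grows by 4, less than the others
```

The same function is applied at every level on the way down, so one insert touches a single root-to-leaf path.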
[Figure: MBRs m, n, o, p]
Candidate insertions shown on successive slides:
• Insert into m.
• Insert into n.
• Insert into o.
• Insert into p.
Query
  • Start at root
  • Find all overlapping MBRs
  • Search subtrees recursively
  • Search m.
[Figure: query over root MBRs m, n, o, p and leaf entries a to l]
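The three query steps translate directly into a recursive function. This is a minimal in-memory sketch; the `Node` class, the payload strings and the coordinates are hypothetical and much simpler than an on-disk index.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    payload: Optional[str] = None  # set only on leaf entries


def search(node: Node, query: Rect) -> List[str]:
    """Return payloads of all leaf entries whose MBR overlaps the query."""
    if not intersects(node.mbr, query):
        return []                      # prune this whole subtree
    if node.payload is not None:
        return [node.payload]          # leaf entry: report it
    hits: List[str] = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits


m = Node((0, 0, 4, 4), [Node((0, 0, 1, 1), payload="a"),
                        Node((3, 3, 4, 4), payload="b")])
n = Node((6, 6, 10, 10), [Node((7, 7, 8, 8), payload="e")])
root = Node((0, 0, 10, 10), [m, n])

print(search(root, (2.5, 2.5, 3.5, 3.5)))   # ['b']
```

Subtree n is never descended for this query because its MBR does not overlap the query rectangle; that pruning is where the index saves work.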
R*-Tree Leverages B-Tree Base Data Structures (buckets)




 R*-TREE MONGODB IMPLEMENTATION
Geo-Sharding – (in work)
     Scalable Distributed R* Tree (SD-r*Tree)
“Balanced” binary tree, with
nodes distributed on a set of
servers:
• Each internal node has
  exactly two children

• Each leaf node stores a
  subset of the indexed
  dataset

• At each node, the height
  of the subtrees differ by
  at most one

• mongos “routing” node
  maintains binary tree

   GEO-SHARDING
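The structural rules listed above (exactly two children per internal node, subtree heights differing by at most one) can be checked with a short recursive function. This sketch covers only the tree-shape invariants, not the distribution across servers; the `TreeNode` class is hypothetical, and the node names follow the di / ri convention used in the deck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    name: str
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def balanced_height(node: TreeNode) -> int:
    """Return the subtree height, or raise ValueError if a rule is broken."""
    if node.left is None and node.right is None:
        return 0  # data node (leaf)
    if node.left is None or node.right is None:
        raise ValueError(f"{node.name}: internal node needs exactly two children")
    left_h = balanced_height(node.left)
    right_h = balanced_height(node.right)
    if abs(left_h - right_h) > 1:
        raise ValueError(f"{node.name}: subtree heights differ by more than one")
    return 1 + max(left_h, right_h)


# r1 covers d0 and r2; r2 covers d1 and d2.
tree = TreeNode("r1",
                TreeNode("d0"),
                TreeNode("r2", TreeNode("d1"), TreeNode("d2")))

print(balanced_height(tree))   # 2
```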
SD-r*Tree Data Structure Illustration

[Figure: SD-r*Tree growth in three stages: (1) a single data node d0; (2) coverage node r1 over d0 and d1; (3) r1 over d0 and r2, with r2 over d1 and d2]


           • di = Data Node (Chunk)
           • ri = Coverage Node
Leveraged work from Litwin, Mouza, Rigaux 2007


           SD-r*Tree DATA STRUCTURE
SD-r*Tree Structure Distribution

[Figure: the mongos router holds coverage nodes r1 and r2; data nodes d0, d1 and d2 reside on GeoShard 1, GeoShard 2 and GeoShard 3 respectively]




SD-r*TREE STRUCTURE DISTRIBUTION
GeoSharding Alternative – 3D / 4D Hilbert Scanning Order




  GEO-SHARDING ALTERNATIVE
Next Steps: Beyond 4-Dimensions - X-Tree
                                  (Berchtold, Keim, Kriegel – 1996)




[Figure: X-Tree with normal internal nodes, supernodes, and data nodes]


• Avoid MBR overlaps

• Avoid node splits (main cause for high overlap)

• Introduce new node structure: Supernodes – Large Directory nodes of variable size

 BEYOND 4-DIMENSIONS
X-Tree Performance Results
                               (Berchtold, Keim, Kriegel – 1996)




X-TREE PERFORMANCE
T-Sciences Custom Tuned Spatial Indexer
• Optimized Spatial Search – Finds intersecting MBRs and recurses into
  those nodes

• Optimized Spatial Inserts – Uses the Hilbert Value of MBR centroid to
  guide search
   – 28% reduction in number of nodes touched

• Optimized Deletes – Leverages R* split/merge approach for rebalancing
  tree when nodes become over/under-full

• Low maintenance – Leverages MongoDB’s automatic data compaction
  and partitioning

  CONCLUSION
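The "Hilbert Value of MBR centroid" can be sketched in 2D as follows. The curve function uses the widely published iterative xy-to-distance conversion; the grid size, the `extent` parameter and the function names are assumptions for illustration, and the deck's own indexer works in 3D/4D, which this sketch does not attempt.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve.

    n is the grid width (a power of two) and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def centroid_hilbert(mbr, extent, n=16):
    """Hilbert value of an MBR's centroid on an n x n grid over `extent`."""
    min_x, min_y, max_x, max_y = mbr
    ex0, ey0, ex1, ey1 = extent
    cx = (min_x + max_x) / 2
    cy = (min_y + max_y) / 2
    gx = min(n - 1, int((cx - ex0) / (ex1 - ex0) * n))
    gy = min(n - 1, int((cy - ey0) / (ey1 - ey0) * n))
    return hilbert_index(n, gx, gy)


world = (0.0, 0.0, 100.0, 100.0)
mbrs = [(90, 90, 100, 100), (0, 0, 10, 10), (40, 60, 50, 70)]
ordered = sorted(mbrs, key=lambda r: centroid_hilbert(r, world))
print(ordered[0])   # (0, 0, 10, 10)
```

Unlike the Z-curve, consecutive Hilbert values always belong to adjacent cells, so entries with similar values tend to be spatially close, which is what makes the value useful for guiding inserts.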
Example Use Case – OSINT (Foursquare Data)

• Sample Foursquare
  data set mashed with
  Government Intel
  Data (poly reports)

• 100 million Geo
  Document test (3D
  points and polys)

• 4 server replica set

• ~350ms query
  response

• ~300%
  improvement over
  PostGIS

   EXAMPLE
Community Support

• Thermopylae contributes fixes to the codebase
      – http://github.com/mongodb


• TST will work with 10gen to fold these changes into the baseline

• Active developer collaboration
      – IRC: #mongodb freenode.net




FIND US
THANK YOU
                                 Questions?

                                   Nicholas Knize
                               nknize@t-sciences.com

Backup
Thermopylae Sciences & Technology – Who are we?
• Advanced technology w/ 160+ employees
• Core customers in national security, venues and
  events, military and police, and city planning
• Partnered with Google and imagery providers
• Long term relationship focused – TS/SCI Staff
        TST + 10gen + Google = Game-changing approach


ENTERPRISE
 PARTNER




WHO ARE THESE GUYS?
Key Customers - Government
        • US Dept of State Bureau of Diplomatic Security
              – Build and support 30 TB Google Earth Globe with multi-
                terabytes of individual globes sent to embassies throughout
                the world. Integrated Google Earth and iSpatial framework.
        • US Army Intelligence Security Command
              – Provide expertise in managing technology integration –
                prime contractor providing operations, intelligence, and IT
                support worldwide. Partners include IBM, Lockheed Martin,
                Google, MIT, Carnegie Mellon. Integrated Google Earth and
                iSpatial framework.
        • US Southern Command
              – Coordinate Intelligence management systems spatial data
                collection, indexing, and distribution. Integrated Google
                Earth, iSpatial, and iHarvest.
              – Index large volume imagery and expose it for different
                services (Air Force, Navy, Army, Marines, Coast Guard)
GOVERNMENT CUSTOMERS
Key Customers - Commercial




• Cleveland Cavaliers
• USGIF
• Las Vegas Motor Speedway
• Baltimore Grand Prix


iSpatial framework serves thousands of mobile devices
COMMERCIAL CUSTOMERS

 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Christopher Logan Kennedy
 

Último (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

RTree Spatial Indexing with MongoDB - MongoDC

  • 1. WHY WE CHOSE MONGODB TO PUT BIG-DATA ‘ON THE MAP’ JUNE 2012 @nknize +Nicholas Knize
  • 2. “The 3D UDOP allows near real time visibility of all SOUTHCOM Directorates information in one location…this capability allows for unprecedented situational awareness and information sharing” -Gen. Doug Frasier TST PRODUCTS ACCOMPLISHING THE IMPOSSIBLE
  • 3. • Expose enterprise data in a geo-temporal user defined environment • Provide a flexible and scalable spatial indexing framework for heterogeneous data • Visualize spatially referenced data on 3D globe & 2D maps • Manage real-time data feeds and mobile messaging • View data over geo-rectified imagery with 3D terrain • Support mission planning and simulation • Provide real-time collaboration and sharing ISPATIAL OVERVIEW ACCOMPLISHING THE IMPOSSIBLE
  • 4. Desired Data Store Characteristic for ‘Big Data’ • Horizontally scalable – Large volume / elastic • Vertically scalable – Heterogeneous data types (“Data Stack”) • Smartly Distributed – Reduce the distance bits must travel • Fault Tolerant – Replication Strategy and Consistency model • High Availability – Node recovery • Fast – Reads or writes (can’t always have both) BIG DATA STORAGE CHARACTERISTICS ACCOMPLISHING THE IMPOSSIBLE
  • 5. Subset of Evaluated NoSQL Options • Cassandra – Nice Bring Your Own Index (BYOI) design – … but Java, Java, Java… Memory management can be an issue – Adding new nodes can be a pain (Token Changes, nodetool) – Key-Value store…good for simple data models • Hbase – Nice BigTable model – Theory grounded heavily in C.A.P, inflexible trade-offs – Complicated setup and maintenance • CouchDB – Provides some GeoSpatial functionality (Currently being rewritten) – HEAVILY dependent on Map-Reduce model (complicated design) – Erlang based – poor multi-threaded heap management NOSQL OPTIONS ACCOMPLISHING THE IMPOSSIBLE
  • 6. Why MongoDB for Thermopylae? • Documents based on JSON – A GEOJSON match made in heaven! • C++ - No Garbage Collection Overhead! Efficient memory management design reduces disk swapping and paging • Disk storage is memory mapped, enabling fast swapping when necessary • Built in auto-failover with replica sets and fast recovery with journaling • Tunable Consistency – Consistency defined at application layer • Schema Flexible – friendly properties of SQL enable easy port • Provided initial spatial indexing support – Point based limited! WHY TST LIKES MONGODB ACCOMPLISHING THE IMPOSSIBLE
  • 7. ... The Spatial Indexer wasn’t quite right • MongoDB (like nearly all relational DBs) uses a b-Tree – Data structure for storing sorted data in log time – Great for indexing numerical and text documents (1D attribute data) – Cannot store multi-dimension (>2D) data – NOT COMPLEX GEOMETRY FRIENDLY MONGODB SPATIAL INDEXER ACCOMPLISHING THE IMPOSSIBLE
  • 8. How does MongoDB solve the dimensionality problem? • Space Filling (Z) Curve – A continuous line that intersects every point in a two-dimensional plane • Use Geohash to represent lat/lon values – Interleave the bits of a lat/long pair – Base32 encode the result DIMENSIONALITY REDUCTION ACCOMPLISHING THE IMPOSSIBLE
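The two Geohash steps on slide 8 (interleave the bits of the lat/lon pair, then Base32 encode) can be sketched as follows. This is a minimal illustration of the public Geohash scheme, not MongoDB's internal code; the function name `geohash_encode` and the `precision` parameter are assumptions made for the example.

```python
# Geohash alphabet: digits plus lowercase letters, omitting a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Return a Geohash string of `precision` characters for (lat, lon)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # Geohash interleaving starts with a longitude bit

    # Step 1: interleave bits by repeatedly halving the lon and lat ranges.
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon

    # Step 2: Base32 encode, five bits per character.
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))  # prints "u4pru"
```

Because the result is a single string, it sorts like any other 1D key, which is what lets a B-tree index it.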
  • 9. Issues with the Geohash b-Tree approach • Neighbors aren’t so close! – Neighboring points on the Geoid may end up on opposite ends of the plane – Impacts search efficiency • What about Geometry? – Doesn’t support > 2D – Mongo uses Multi-Location documents, which really just index multiple points that link back to a single document GEOHASH BTREE ISSUES ACCOMPLISHING THE IMPOSSIBLE
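The "neighbors aren't so close" issue on slide 9 can be seen on a small grid. The sketch below uses a plain Z-order (Morton) code on a 16 x 16 integer grid rather than real lat/lon values; the `morton` helper and the grid size are assumptions chosen to keep the arithmetic easy to follow.

```python
def morton(x, y, bits=4):
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Two horizontally adjacent cells that straddle the middle of the grid.
left, right = morton(7, 0), morton(8, 0)
print(left, right)  # prints "21 64"

# Two diagonally adjacent cells around the grid centre.
print(morton(7, 7), morton(8, 8))  # prints "63 192"
```

Cells (7, 0) and (8, 0) touch on the grid, yet their keys are 43 apart, so a range scan over the 1D key has to skip many unrelated cells to reach a true neighbor.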
  • 10. Mongo Multi-location Document Clipping Issues ($within search doesn’t always work w/ multi-location) Case 1: Success! Case 3: Fail! Case 2: Success! Case 4: Fail! Multi-Location Document (aka. Polygon) Search Polygon MULTI-LOCATION CLIPPING ACCOMPLISHING THE IMPOSSIBLE
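The failing cases on slide 10 can be modelled in a few lines. This is a simplified model of the problem, not MongoDB's query code: a polygon is stored as a list of vertices (the "multi-location" document), and the search tests only whether some vertex falls inside the search box. All function names here are hypothetical.

```python
def vertex_in_box(vertex, box):
    """box is (xmin, ymin, xmax, ymax)."""
    x, y = vertex
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def matches_by_vertices(polygon, box):
    """Point-only test: true if any stored vertex lies in the box."""
    return any(vertex_in_box(v, box) for v in polygon)


def mbr(polygon):
    """Minimum bounding rectangle of a vertex list."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# A large polygon that completely covers a small search box.
big_polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]
search_box = (4, 4, 6, 6)

print(matches_by_vertices(big_polygon, search_box))    # prints "False"
print(boxes_intersect(mbr(big_polygon), search_box))   # prints "True"
```

The polygon clearly overlaps the search area, but no vertex lies inside it, so a point-based index misses the document. A bounding-box test catches it, which motivates the R-tree approach on the following slides.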
  • 11. Potential Solutions • Constrain the system to single point searches – Multi-dimension support will be exponentially complex (won’t scale) • Interpolate points along the edge of the shape – Multi-dimension support will be exponentially complex (won’t scale) • Customize the spatial indexer – Selected approach SOLUTIONS TO GEOHASH PROBLEM ACCOMPLISHING THE IMPOSSIBLE
  • 12. Thermopylae Custom Tuned MongoDB for Geo TST Leverages Guttman’s 1984 Research in R/R* Trees • R-Trees organize any-dimensional data by representing the data as a minimum bounding box. • Each node bounds its children. A node can have many objects in it (max: m min: ceil(m/2) ) • Splits and merges optimized by minimizing overlaps • The leaves point to the actual objects (stored on disk probably) • Height balanced – search is always O(log n) CUSTOM TUNED SPATIAL INDEXER ACCOMPLISHING THE IMPOSSIBLE
  • 13. Spatial Indexing at Scale with R-Trees Spatial data represented as minimum bounding rectangles (2-dimension), cubes (3-dimension), hexadecant (4-dimension) Index represented as: <I, DiskLoc> where: I = (I0, I1, … In) : n = number of dimensions Each I is a set in the form of [min,max] describing MBR range along a dimension RTREE THEORY ACCOMPLISHING THE IMPOSSIBLE
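The `<I, DiskLoc>` entry from slide 13 can be sketched as a small data type: one `[min, max]` interval per dimension plus a pointer to the stored document. The class and field names below are hypothetical, and `disk_loc` is just an integer stand-in for MongoDB's DiskLoc.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # [min, max] along one dimension


@dataclass(frozen=True)
class IndexEntry:
    intervals: Tuple[Interval, ...]  # I = (I0, I1, ..., In-1)
    disk_loc: int                    # hypothetical stand-in for DiskLoc


def mbr_of(points: Sequence[Sequence[float]]) -> Tuple[Interval, ...]:
    """Minimum bounding box of n-dimensional points, one interval per axis."""
    dims = len(points[0])
    return tuple(
        (min(p[d] for p in points), max(p[d] for p in points))
        for d in range(dims)
    )


def overlaps(a: Sequence[Interval], b: Sequence[Interval]) -> bool:
    """Two boxes overlap only if their intervals overlap in every dimension."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


# A 3D example: two points become one bounding cube.
entry = IndexEntry(mbr_of([(0, 0, 5), (2, 3, 1)]), disk_loc=42)
print(entry.intervals)  # prints "((0, 2), (0, 3), (1, 5))"
```

The same `overlaps` test works unchanged for 2, 3, or 4 dimensions, which is the property the slide relies on.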
  • 14. R*-Tree Spatial Index Example • Sample insertion result for 4th order tree (figure: nodes m, n, o, p over leaf entries a to l) • Objectives: 1. Minimize area 2. Minimize overlaps 3. Minimize margins 4. Maximize inner node utilization R*-TREE INDEX OBJECTIVES ACCOMPLISHING THE IMPOSSIBLE
  • 15. Insert • Similar to insertion into B+-tree but may insert into any leaf; leaf splits in case capacity exceeded. – Which leaf to insert into? – How to split a node? R*-TREE INSERT EXAMPLE ACCOMPLISHING THE IMPOSSIBLE
  • 16. Insert: Leaf Selection • Follow a path from root to leaf. • At each node move into subtree whose MBR area increases least with addition of new rectangle. (figure: nodes m, n, o, p)
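The leaf-selection rule on slide 16 (descend into the subtree whose MBR area grows least) can be sketched as below. The tie-break on smallest existing area follows Guttman's original ChooseLeaf description; the function names and the tuple-of-intervals box layout are assumptions for this example.

```python
def area(box):
    """Area (or volume) of a box given as ((lo, hi), (lo, hi), ...)."""
    result = 1.0
    for lo, hi in box:
        result *= hi - lo
    return result


def union(a, b):
    """Smallest box that covers both a and b."""
    return tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                 for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def enlargement(node_box, new_box):
    """How much node_box must grow to also cover new_box."""
    return area(union(node_box, new_box)) - area(node_box)


def choose_subtree(child_boxes, new_box):
    """Index of the child needing least enlargement; ties go to smaller area."""
    return min(
        range(len(child_boxes)),
        key=lambda i: (enlargement(child_boxes[i], new_box),
                       area(child_boxes[i])),
    )


children = [((0, 4), (0, 4)), ((5, 10), (5, 10))]
new_rect = ((3, 5), (3, 5))
print(choose_subtree(children, new_rect))  # prints "0"
```

Here the first child grows by 9 units of area and the second by 24, so the insert descends into the first child.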
  • 21. Query • Start at root • Find all overlapping MBRs • Search subtrees recursively (figure: R*-tree with nodes m, n, o, p over leaf entries a to l)
  • 22. Query • Search m. (figure: R*-tree with nodes m, n, o, p over leaf entries a to l)
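The query procedure on slides 21 and 22 (start at the root, find overlapping MBRs, recurse into those subtrees) can be sketched with a tiny in-memory tree. The `Node` class is hypothetical and far simpler than an on-disk index; node names `m` and `n` echo the slide figure.

```python
def overlaps(a, b):
    """Boxes are ((lo, hi), ...); they overlap if every axis overlaps."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


class Node:
    def __init__(self, box, children=(), entries=()):
        self.box = box                  # MBR covering everything below
        self.children = list(children)  # inner nodes only
        self.entries = list(entries)    # leaves only: (box, doc_id) pairs


def search(node, query):
    """Return doc ids whose MBR overlaps the query box."""
    if not overlaps(node.box, query):
        return []  # prune this whole subtree
    hits = [doc_id for box, doc_id in node.entries if overlaps(box, query)]
    for child in node.children:
        hits.extend(search(child, query))
    return hits


leaf_m = Node(((0, 4), (0, 4)),
              entries=[(((0, 1), (0, 1)), "a"), (((3, 4), (3, 4)), "b")])
leaf_n = Node(((6, 9), (6, 9)),
              entries=[(((6, 7), (6, 7)), "c")])
root = Node(((0, 9), (0, 9)), children=[leaf_m, leaf_n])

print(search(root, ((2.5, 3.5), (2.5, 3.5))))  # prints "['b']"
```

The query box overlaps `m` but not `n`, so `n` is never visited. That pruning is where the search saving comes from.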
  • 23. R*-Tree Leverages B-Tree Base Data Structures (buckets) R*-TREE MONGODB IMPLEMENTATION ACCOMPLISHING THE IMPOSSIBLE
  • 24. Geo-Sharding – (in work) Scalable Distributed R* Tree (SD-r*Tree) “Balanced” binary tree, with nodes distributed on a set of servers: • Each internal node has exactly two children • Each leaf node stores a subset of the indexed dataset • At each node, the height of the subtrees differ by at most one • mongos “routing” node maintains binary tree GEO-SHARDING ACCOMPLISHING THE IMPOSSIBLE
  • 25. SD-r*Tree Data Structure Illustration (figure: data nodes and their spatial coverage) • di = Data Node (Chunk) • ri = Coverage Node • Leveraged work from Litwin, Mouza, Rigaux 2007 SD-r*Tree DATA STRUCTURE ACCOMPLISHING THE IMPOSSIBLE
  • 26. SD-r*Tree Structure Distribution (figure: routing nodes r1, r2 on mongos; data nodes d0, d1, d2 spread across GeoShard 1, GeoShard 2, GeoShard 3) SD-r*TREE STRUCTURE DISTRIBUTION ACCOMPLISHING THE IMPOSSIBLE
  • 27. GeoSharding Alternative – 3D / 4D Hilbert Scanning Order GEO-SHARDING ALTERNATIVE ACCOMPLISHING THE IMPOSSIBLE
  • 28. Next Steps: Beyond 4-Dimensions - X-Tree (Berchtold, Keim, Kriegel – 1996) Normal Internal Nodes Supernodes Data Nodes • Avoid MBR overlaps • Avoid node splits (main cause for high overlap) • Introduce new node structure: Supernodes – Large Directory nodes of variable size BEYOND 4-DIMENSIONS ACCOMPLISHING THE IMPOSSIBLE
  • 29. X-Tree Performance Results (Berchtold, Keim, Kriegel – 1996) X-TREE PERFORMANCE ACCOMPLISHING THE IMPOSSIBLE
  • 30. T-Sciences Custom Tuned Spatial Indexer • Optimized Spatial Search – Finds intersecting MBR and recurses into those nodes • Optimized Spatial Inserts – Uses the Hilbert Value of MBR centroid to guide search – 28% reduction in number of nodes touched • Optimize Deletes – Leverages R* split/merge approach for rebalancing tree when nodes become over/under-full • Low maintenance – Leverages MongoDB’s automatic data compaction and partitioning CONCLUSION ACCOMPLISHING THE IMPOSSIBLE
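Slide 30 mentions using the Hilbert value of an MBR's centroid to guide inserts. The sketch below shows one way such a value could be computed in 2D, using the well-known iterative grid-to-Hilbert-index conversion. How T-Sciences actually computes or applies the value is not stated in the slides; the function names, the `world` extent, and the `order` parameter are assumptions.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_value_of_mbr(box, world, order=16):
    """Hilbert index of the centroid of a 2D box ((xlo, xhi), (ylo, yhi))."""
    n = 1 << order
    cells = []
    for (lo, hi), (w_lo, w_hi) in zip(box, world):
        centre = (lo + hi) / 2
        cell = int((centre - w_lo) / (w_hi - w_lo) * n)
        cells.append(min(max(cell, 0), n - 1))  # clamp onto the grid
    return hilbert_index(n, cells[0], cells[1])


# Visiting order of the four cells of a 2 x 2 grid.
print([hilbert_index(2, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])
# prints "[0, 1, 2, 3]"

world = ((-180.0, 180.0), (-90.0, 90.0))
value = hilbert_value_of_mbr(((10.0, 11.0), (57.0, 58.0)), world)
```

Unlike the Z-order curve, consecutive Hilbert indices always belong to adjacent grid cells, so sorting MBRs by this value tends to keep spatially close objects in the same or neighboring nodes.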
  • 31. Example Use Case – OSINT (Foursquare Data) • Sample Foursquare data set mashed with Government Intel Data (poly reports) • 100 million Geo Document test (3D points and polys) • 4 server replica set • ~350ms query response • ~300% improvement over PostGIS EXAMPLE ACCOMPLISHING THE IMPOSSIBLE
  • 32. Community Support • Thermopylae contributes fixes to the codebase – http://github.com/mongodb • TST will work with 10gen to fold into the baseline • Active developer collaboration – IRC: #mongodb freenode.net FIND US ACCOMPLISHING THE IMPOSSIBLE
  • 33. THANK YOU Questions? Nicholas Knize nknize@t-sciences.com THANK YOU ACCOMPLISHING THE IMPOSSIBLE
  • 35. Thermopylae Sciences & Technology – Who are we? • Advanced technology w/ 160+ employees • Core customers in national security, venues and events, military and police, and city planning • Partnered with Google and imagery providers • Long term relationship focused – TS/SCI Staff TST + 10gen + Google = Game-changing approach ENTERPRISE PARTNER WHO ARE THESE GUYS? ACCOMPLISHING THE IMPOSSIBLE
  • 36. Key Customers - Government • US Dept of State Bureau of Diplomatic Security – Build and support 30 TB Google Earth Globe with multi-terabytes of individual globes sent to embassies throughout the world. Integrated Google Earth and iSpatial framework. • US Army Intelligence Security Command – Provide expertise in managing technology integration – prime contractor providing operations, intelligence, and IT support worldwide. Partners include IBM, Lockheed Martin, Google, MIT, Carnegie Mellon. Integrated Google Earth and iSpatial framework. • US Southern Command – Coordinate Intelligence management systems spatial data collection, indexing, and distribution. Integrated Google Earth, iSpatial, and iHarvest. – Index large volume imagery and expose it for different services (Air Force, Navy, Army, Marines, Coast Guard) GOVERNMENT CUSTOMERS ACCOMPLISHING THE IMPOSSIBLE
  • 37. Key Customers - Commercial • Cleveland Cavaliers • USGIF • Las Vegas Motor Speedway • Baltimore Grand Prix • iSpatial framework serves thousands of mobile devices COMMERCIAL CUSTOMERS ACCOMPLISHING THE IMPOSSIBLE

Editor's Notes

  1. Screen shot of UDOP…blow-out of key features (sharing, presentation builder, etc)