Presentation at Where 2.0 2008 where we discuss our rationale for building a NoSQL data store after reaching the limitations of the spatial SQL solutions that were available at the time.
Semantic Web Investigation within Big Data Context
Where 2.0 NoSQL Presentation 2008 - GeoIQ
1. From Data Chaos to Actionable Intelligence: How the Convergence of the GeoWeb and Semantic Web is Revolutionizing the Way We Process Information Where 2.0 2008 Sean Gorman FortiusOne Inc.
23. Node “Y” is connected to Node “X” by the link “User Action”: a user adds structured dataset “X” to their 3rd party URL string “Y”. Structured Dataset “X” Third Party URL “Y”
24. Structured Dataset “X” Third Party URL “Y” Third Party URL “Z”
Data Set | Degree
X | 2
Y | 1
Z | 1
25. Creates the Ability to Intelligently Serve Content. A user searches on a term and gets a result based on tags and a full-text query, then weighted by degree and user ratings. The system can now recommend data URLs “Y” and “Z” as data that may be useful context for dataset “X”, even though they may have no tags in common. The graph can be encapsulated in code and communicated to the outside world. Structured Dataset “X” Third Party URL “Y” Third Party URL “Z”
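The degree table and recommendation logic in the slides above can be sketched in a few lines of Python. This is an illustrative toy, not the actual GeoCommons implementation; the node names and the tie-breaking rule are assumptions.

```python
from collections import defaultdict

# Hypothetical mini-graph: each edge is a user action linking a
# structured dataset to a third-party URL (names are illustrative).
edges = [("X", "Y"), ("X", "Z")]

# Degree = number of links touching each node, as in the slide's table.
degree = defaultdict(int)
neighbors = defaultdict(set)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
    neighbors[a].add(b)
    neighbors[b].add(a)

def recommend(dataset):
    """Recommend neighboring content, highest-degree first
    (ties broken alphabetically), even when the items share
    no tags with the query dataset."""
    return sorted(neighbors[dataset], key=lambda n: (-degree[n], n))

print(degree["X"])     # 2, matching the Data Set / Degree table
print(recommend("X"))  # ['Y', 'Z']
```

A production system would combine this degree weighting with the tag and full-text scores mentioned in the slide; here only the graph component is shown.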
26. 1. Bring a new class of content/data to the Web 2. Intelligently mix that content with the rest of the Web 3. Enable the content to answer meaningful questions for users
27. If you would like to check out the Finder! beta, see us for a card or go here: http://finder.geocommons.com/ If you would like to talk about federating geodata, email me at: [email_address]
Editor's Notes
Our story started with trying to GeoHack while avoiding Johnny Law
Specifically, we started off researching the geography of the web: what is the structure of the plumbing of the Internet, the autonomous system graph, the router graph, IP addresses. It was a great collaboration with colleagues at UCL, Berkeley, NYU, and GMU.
Our specialty was really big data sets like the router and AS graphs. We’d use statistical mechanics and graph theory to model and analyze the data, then map those logical results to their physical realities.
This resulted in discovering little tidbits like how to take down the NYSE.
Resulting in the guys in dark suits showing up
Fortunately the folks at IQT – the intelligence communities venture fund – rescued us.
IQT has been part of several key technologies driving the GeoWeb. One of their early investments was MetaCarta, the folks behind OpenLayers, which many of us in the community benefit from. They were also the funding behind Keyhole, now commonly known as Google Earth, the iconic application of the GeoWeb. They continued to fund the leading edge with technologies like SketchUp. IQT and the government at large have been a big impetus and source of funding pushing GeoWeb innovation.
While we started by researching the “Geography of the Web”, it was “Geography on the Web” that had captured the world’s attention: the creation of mirror worlds on the web with amazing satellite imagery and 3D visualizations.
But… the majority of the data populating these mirror worlds was small and largely descriptive text and photos.
Leveraging our love for big data sets, we developed GeoCommons, a crowdsourced repository of large structured datasets with quantitative capabilities.
Yes, last year on this stage we launched GeoCommons. So what happened? Where did it go in one year?
We got lots of people contributing data; so much data, in fact, that we killed our database.
This chart shows why. The problem with big datasets is that they fill up databases very rapidly. 95% of the datasets in GeoCommons had over 100 features, over half had more than 5,000 features, and several had more than 100,000 features.
The problem was that we were trying to store and query heterogeneous data at scale. To do this we had to normalize the data, causing the tables to get huge. Then we had to optimize not only to manage the table size, but to do that for all query types and functions. It was a Sisyphean cycle: everything helped but nothing solved the problem, and the rock rolled to the bottom of the hill each time. So, you can keep fighting the battle…
Leveraging a lightweight object database construct, we were able to rapidly store, mine, access, and translate data; we were able to fit well over a billion features into 15 gigs of storage. The best illustration is to see a demo of the platform in action.
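To make the contrast with normalized SQL tables concrete, here is a minimal sketch of the object-database idea: each heterogeneous feature is serialized and compressed as one blob keyed by id, so differing schemas never force wide, sparse tables. This is an illustrative assumption about the approach, not the actual GeoCommons storage engine.

```python
import json
import zlib

class ObjectStore:
    """Toy lightweight object store: one compressed blob per feature,
    no shared table layout to normalize or optimize."""

    def __init__(self):
        self._blobs = {}

    def put(self, feature_id, feature):
        # Serialize the whole feature, whatever its attributes,
        # and compress it; heterogeneous schemas coexist freely.
        self._blobs[feature_id] = zlib.compress(json.dumps(feature).encode())

    def get(self, feature_id):
        return json.loads(zlib.decompress(self._blobs[feature_id]))

store = ObjectStore()
# Two features with completely different attribute sets:
store.put(1, {"lat": 38.9, "lon": -77.0, "crime_rate": 4.2})
store.put(2, {"lat": 40.7, "lon": -74.0, "tags": ["housing"]})
print(store.get(1)["crime_rate"])  # 4.2
```

The design trade-off: reads and writes of whole objects are cheap and compact, while ad-hoc relational queries across attributes require separate indexing, which is the bargain NoSQL-style stores accept.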
Introducing Finder!
We are going to leverage the GeoCommons platform to next build Maker, a cartographically empowered map creation tool, and Atlas, a collaborative application that allows users to tell multimedia stories around maps.
We are not the only folks bringing geo-content to the web. There are many clever people doing innovative work to bring more rich data to the masses, but how can we begin to interconnect it all?
Can we federate our data?
We’ve been having some early discussions on how we can create a network of data repositories to form a graph of content. Data stored securely and linked together – where the cloud and the data are owned by everyone.
Allowing a user to grab a URL from one source via web services and map it with a structured dataset from another source via object federation. By consciously mapping these two datasets together, the user has created semantic meaning between them, going beyond just the syntax of tags. For instance, a user may map crime rates from EveryBlock with housing prices from Zillow to solve the problem of where to buy a house. The two datasets could have no tags or syntax relating them, but now they have a far more meaningful semantic relationship.
These relations form a graph relating content semantically across the web. Whether you implement it with RDF, Atom, or any other practical approach, you have a highly efficient computational construct for relating content.
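The EveryBlock/Zillow example above can be sketched as RDF-style triples: each user mapping action emits one subject-predicate-object statement, and recommendations fall out of a simple graph walk. The dataset identifiers and the `geo:mappedWith` predicate are invented for illustration, not a real vocabulary.

```python
# Each user "mapping" action links two datasets with a triple.
# Identifiers and predicate are hypothetical placeholders.
triples = [
    ("everyblock:crime_rates", "geo:mappedWith", "zillow:housing_prices"),
]

def related(dataset):
    """Datasets semantically linked to `dataset`, in either
    direction, regardless of any shared tags."""
    linked = set()
    for subj, pred, obj in triples:
        if subj == dataset:
            linked.add(obj)
        elif obj == dataset:
            linked.add(subj)
    return linked

print(related("zillow:housing_prices"))  # {'everyblock:crime_rates'}
```

A real deployment would serialize these statements as RDF or Atom link elements, as the note suggests; the in-memory triple list just shows how little machinery the core idea needs.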
This allows you to make relevant recommendations to the user to solve meaningful problems.
Our goal is to deliver technology that accomplishes three big objectives. I believe that an ecosystem of companies striving for similar objectives will be what constitutes the next evolution of the Web.