The Black Art of Geocoding: Finding that elusive lat/lon
John Fagan, Program Manager, Microsoft Corporation, @johnbfagan
We have been making maps for thousands of years
Well known and established standards/principles
Lots of experience in building software to create bitmaps from vector and raster data
Data availability & Simple data model
Mapping easy to scale
...and so is routing: thousands of years' experience in wayfinding; over 50 years' experience in routing algorithms; Dijkstra's shortest path algorithm (1959)
Data availability & Simple data model
Routing, easy to scale
Geocoding is not so easy: 20 years' experience; 10 years of global geocoding; 5 years exposing geocoding to the mass consumer; no standard algorithms; very few purpose-built databases (maybe GNAF); very hard to scale
Geocoding is fundamental: can't get a map without a geocode; can't get a route without a geocode; can't view your data without a geocode. 80% of all information contains a geographic element.
It used to be easier
Now it's hard
User expectations change with unstructured input. Example queries: 67 hill veiw road, s61 2bn in the 1850's 1.5 hours from Nice exact directions from Bangkok Patana School to Suvanapumi Airport in Bangkok. 10 mile radius from se20 7ua how long would it take me to walk around cancun how to get to m13 gb from g83 9le by car do bearded dragons bite?
But... geocoding is NOT about search
52.19157,-1.70415
"The reason it's called 'I'm Feeling Lucky,' is of course that's a pretty damn ambitious goal. I mean to get the exact right one thing without even giving you a list of choices, and so you have to feel a little bit lucky if you're going to try that with one go," tried to explain Sergey Brin.
Why is it hard (2 reasons)
Parsing: Hard to understand unstructured input
Finding Stratford-upon-Avon: stratford; stratford upon avon; Stratford upon haven; StratfordUponAvon; Stratford-Upon-Avon; stratford on avon; stratford-on-avon; stratford 0n avon; stratford - upon-avon; stratford on avaon; stratfordaponavon; stratford upon aavon; stratfordupponavon
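One way to cope with variants like these is to normalise both the query and the reference names, then pick the closest reference name by string similarity. The sketch below is only an illustration using Python's standard `difflib`; the `GAZETTEER` list and the `best_match` helper are hypothetical and are not taken from the talk or from any production geocoder.

```python
import difflib
import re

# Hypothetical reference names; a real gazetteer would be far larger.
GAZETTEER = ["Stratford-upon-Avon", "Stratford", "Stafford", "Stretford"]


def normalise(name):
    """Lower-case the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def best_match(query, names=GAZETTEER):
    """Return (name, similarity) for the reference name closest to the query."""
    key = normalise(query)
    scored = [
        (difflib.SequenceMatcher(None, key, normalise(name)).ratio(), name)
        for name in names
    ]
    score, name = max(scored)
    return name, score


for variant in ["stratfordaponavon", "stratford 0n avon", "Stratford upon haven"]:
    print(variant, "->", best_match(variant)[0])
```

Note that the bare query "stratford" matches the entry "Stratford" exactly, so string similarity alone cannot tell which Stratford the user meant.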
Finding Stratford-upon-Avon
Parsing: In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. http://en.wikipedia.org/wiki/Parsing
Old way of parsing – rules-based: a rules-based approach (mainly done with regular expressions)
Probabilistic approach: machine learned; requires you to “train” the engine; requires truth sets of training data. http://en.wikipedia.org/wiki/Hidden_Markov_model
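As a rough illustration of the rules-based style, a single regular expression can pull the fields out of an address that happens to follow the expected order. The pattern, the short list of street types and the `parse_address` helper below are assumptions made for this sketch, not rules used by any real geocoder.

```python
import re

# Simplified UK postcode shape, e.g. "EC4A 2DY"; not a full validator.
POSTCODE = r"(?P<postcode>[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2})"

# One hand-written rule: <number> <street ... street type> <city> <postcode>
RULE = re.compile(
    r"^\s*(?P<number>\d+)\s+"
    r"(?P<street>.+?\b(?:street|road|lane|avenue))\s+"
    r"(?P<city>.+?)\s+" + POSTCODE + r"\s*$",
    re.IGNORECASE,
)


def parse_address(text):
    """Return the named fields if the rule matches, otherwise None."""
    match = RULE.match(text)
    return match.groupdict() if match else None


print(parse_address("165 fleet street london EC4A 2DY"))
print(parse_address("165, EC4A 2DY"))  # no rule anticipates this shape
```

The second call returns `None`: every input shape needs its own rule, which is why rule sets grow quickly and stay brittle.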
Probabilistic approach: Hidden Markov Model
input --> 165 fleet street london EC4A 2DY
output -->
	address {
		street number : 165
		street : fleet street
		city : london
		postcode : EC4A 2DY
	}
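To make the idea concrete, here is a minimal sketch of tagging address tokens with a Hidden Markov Model and Viterbi decoding. Every probability below is hand-picked for illustration; in the approach described above they would be estimated from truth sets of training data. The token features, state names and helper functions are likewise assumptions of this sketch.

```python
import math

STATES = ["street number", "street", "city", "postcode"]

# Hand-picked (NOT trained) probabilities, purely illustrative.
START = {"street number": 0.7, "street": 0.2, "city": 0.05, "postcode": 0.05}
TRANS = {
    "street number": {"street number": 0.03, "street": 0.9, "city": 0.04, "postcode": 0.03},
    "street": {"street number": 0.02, "street": 0.5, "city": 0.4, "postcode": 0.08},
    "city": {"street number": 0.05, "street": 0.05, "city": 0.3, "postcode": 0.6},
    "postcode": {"street number": 0.05, "street": 0.05, "city": 0.05, "postcode": 0.85},
}
EMIT = {
    "street number": {"digits": 0.9, "word": 0.03, "street_type": 0.02, "alnum": 0.05},
    "street": {"digits": 0.05, "word": 0.5, "street_type": 0.4, "alnum": 0.05},
    "city": {"digits": 0.02, "word": 0.9, "street_type": 0.03, "alnum": 0.05},
    "postcode": {"digits": 0.05, "word": 0.02, "street_type": 0.01, "alnum": 0.92},
}
STREET_TYPES = {"street", "road", "lane", "avenue"}


def feature(token):
    """Map a token to the coarse observation symbol the model emits."""
    if token.isdigit():
        return "digits"
    if token.lower() in STREET_TYPES:
        return "street_type"
    if token.isalpha():
        return "word"
    return "alnum"


def viterbi(tokens):
    """Return the most probable state sequence for the tokens (log space)."""
    obs = [feature(t) for t in tokens]
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            pointers[s] = best
        back.append(pointers)
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def parse(text):
    """Group tokens by their decoded state into address fields."""
    tokens = text.split()
    if not tokens:
        return {}
    fields = {}
    for token, state in zip(tokens, viterbi(tokens)):
        fields[state] = (fields.get(state, "") + " " + token).strip()
    return fields


print(parse("165 fleet street london EC4A 2DY"))
```

Unlike a fixed rule, the decoder weighs every possible labelling and keeps the most probable one, so the quality of the result depends on how well the probabilities reflect real addresses.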
Multimap stats
Parsing has its limitations. Parsing failures: Multimap/Bing Maps (standrewsscotland); Google (uk near Boston, MA, USA); all fail on house number plus postcode (165, EC4A 2DY)
Parsing using a Spatial Engine: http://research.microsoft.com/en-us/people/josephj/acm_gis_2007_robust_location_search.pdf
Why is it hard (Data)
Hard to match input with reference database
[OSM-talk] Baghdad maps: I am informed that any road may have up to 4 names (which may be the same or different): the pre-Saddam name; the Saddam-era name; the "public" name - what the people who live there call it; the "official" name - what the new Government calls it. This situation is further complicated by language and social issues. Language: the roads are named in Arabic. There is no fixed translation between the Arabic and Latin alphabets. Social issues: 1) Sunnis tend to use the Saddam-era names. 2) Shia tend to rename streets and won't acknowledge Saddam-era names. 3) Ethnic cleansing is changing the neighbourhoods and hence the names. 4) Names (such as 14th July Bridge) will change later. My translator's opinion is that street names are going to take at least 2-3 years to settle down. http://lists.openstreetmap.org/pipermail/talk/2007-February/011273.html
Don't throw away your data: Multimap have always kept old postcodes; 10% of Multimap’s postcode database is of “dead” postcodes. This might not work for routing and mapping, but it is very valuable for geocoding
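A minimal sketch of why retired records are worth keeping: the lookup tries current postcodes first and falls back to dead ones, so an address written years ago still resolves. The two tables, the helper names and the coordinates are placeholders invented for this sketch; they are not real reference data or surveyed values.

```python
# Placeholder tables with made-up coordinates, for illustration only.
LIVE = {"EC4A 2DY": (51.514, -0.108)}
RETIRED = {"EC4A 1HE": (51.515, -0.109)}


def normalise(postcode):
    """Upper-case and collapse whitespace so 'ec4a  1he' equals 'EC4A 1HE'."""
    return " ".join(postcode.upper().split())


def geocode_postcode(postcode):
    """Return ((lat, lon), status), checking live postcodes before retired ones."""
    key = normalise(postcode)
    if key in LIVE:
        return LIVE[key], "live"
    if key in RETIRED:
        return RETIRED[key], "retired"
    return None, "unknown"


print(geocode_postcode("ec4a 2dy"))
print(geocode_postcode("ec4a  1he"))  # resolves only because old data was kept
```

Returning the status alongside the coordinates lets a caller decide whether a retired match is good enough, for example for display but not for routing.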
EC4A 1HE – Postcode of vintage 2002
Lash data and enrich Stratford-upon-Avon
Future = Real time Geocoding?
Summary: Mapping and Routing – FIXED; Geocoding – Must Try Harder (Parsing, Data)
Thanks. John Fagan, ubergeo.com, @johnbfagan
