1. NCES STATS-DC 2012 – July 13, 2012
Presented by Tai Phan & Adrienne Allegretti
NCES, Blue Raster
GEOCODING OUR NATION’S SCHOOLS
2. PRESENTATION OUTLINE
Introduction
Overview of Geocoding
Data Preparation
Selecting a Geocode Solution
Live Demonstration
Overview of SDDS Updates
Summary/Q & A
3. WHAT IS GEOCODING?
The process of finding associated geographic
coordinates (latitude and longitude) from other
geographic data, such as street addresses,
or zip codes (postal codes).
8. WHY DOES NCES GEOCODE SCHOOL
ADDRESSES?
To support spatial analysis and provide
geographic context to a school’s location
Geocoded schools can be overlaid with other
geographic layers such as:
School Districts
Locale Codes
Counties
Congressional Districts
And more…
10. HOW TO PREPARE DATA
Invest in your address data quality and check for
errors before you begin:
Spelling
Numeric address ranges
Missing information
11. IMPLICATIONS OF GEOCODING ERRORS
Accuracy of geocoding dictates quality of
research
Schools geocoded incorrectly can mean
lost funding
Unable to respond to an emergency
appropriately
12. HOW TO PREPARE DATA
Format it in a style that the geocoder can handle
Use the locational address, not the PO Box or
Administrative address of the school
13. HOW TO PREPARE DATA
Bad: FMLY CHLD CTR 1411 LNCLNWY W, MISHAWAKA, IN,
46544
14. HOW TO PREPARE DATA
Good: 1411 LINCOLN WAY WEST, MISHAWAKA, IN, 46544
15. SELECTING A GEOCODE SOLUTION
Things to consider:
Budget
Number of Records
Infrastructure
Number of Individuals Needing Access
Frequency of Geocoding
Sensitivity of Data
16. 1ST APPROACH: LOCAL/INTERNAL
In House
Economy of Scale
Datacenter and/or Cloud Services
Millions of addresses
Sensitive Data
17. GEOCODING ON THE DESKTOP
Prerequisites
ArcGIS Desktop Basic
Prepared Address Table
Reference Data (NAVTEQ, TOM TOM, TIGER)
Address Locator (Composite)
18. ON THE DESKTOP – COMPOSITE LOCATOR
Multiple locators used in a cascading fashion
Address Match • Best Match
Street Centroid Match • Decent Match
Zip Code Match • Acceptable Match
22. ON THE DESKTOP SUMMARY
Great way to go if software and reference
data is already in-house
Can geocode more than 350,000 addresses
per hour
Data maintenance and support
No custom development required
23. 2ND APPROACH: GEOCODING WEBSITES
Geocoding Websites
Excel and Web Browser
Points on a Map
Hundreds of addresses at a time
Publicly available data
29. 3RD APPROACH: WEB APIS
Web APIs
Publicly Accessible Data
Hundreds to Thousands of addresses at a time
Custom Development Required
30. GEOCODING APIS AVAILABLE FOR
CUSTOM DEVELOPMENT
ArcGIS Online
Yahoo
Bing
Google
MapQuest
Geonames
Be sure to read the terms of service and know the
limitations. For example, when using Google, you
must display your results on a Google map. Google
limits requests to 2,500 addresses, etc.
31. GEOCODE RESULTS
The Hit Rate is the number of addresses that are
geocodable
The Match Score tells you the level of accuracy at
which a particular address was geocoded.
- Country
- Region (state, province, prefecture, etc.)
- Sub-region (county, municipality, etc.)
- Town (city, village)
- Post code (zip code)
- Street
- Intersection
- Address
- Premise or Roof Top
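A minimal sketch of how the hit rate and match levels above might be tallied for a batch of results. It assumes each result is either one of the level names from this slide or None for an unmatched address. The function name and the choice to express the hit rate as a share of all addresses are illustrative assumptions, not part of any particular geocoder.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Match levels from the slide, ordered from coarsest to finest.
LEVELS = [
    "Country", "Region", "Sub-region", "Town", "Post code",
    "Street", "Intersection", "Address", "Premise or Roof Top",
]

def summarize(results: List[Optional[str]]) -> Tuple[float, Counter]:
    """Return (hit rate, count per match level) for a batch.

    `results` holds one entry per input address: a level name from
    LEVELS, or None when the geocoder found no match (assumed format).
    """
    matched = [level for level in results if level is not None]
    hit_rate = len(matched) / len(results) if results else 0.0
    return hit_rate, Counter(matched)

# Example: four addresses, one of which could not be geocoded.
rate, per_level = summarize(["Address", "Post code", None, "Address"])
print(f"hit rate: {rate:.0%}")
for level in LEVELS:
    if per_level[level]:
        print(level, per_level[level])
```

Looking at the per-level counts alongside the hit rate shows whether a high hit rate is hiding many coarse matches, such as zip code centroids.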
41. WHAT ARE LOCALE CODES?
Read more: http://nces.ed.gov/ccd/rural_locales.asp
50. SCHOOL BOUNDARY PROJECT
Project Background
350 largest districts already collected
1000+ collected by the end of the year
That’s over 50% of students
52. WHAT’S NEW IN SDDS MAPVIEWER
Map Viewer Standard
Census 2010
ACS 5yr Estimate 2006-2010
Public School Boundaries
Middle
Elementary
High
Promise Neighborhood
Schools
Map Viewer Mobile
Migration to GEOCLOUD
http://nces.ed.gov/surveys/sdds/index.aspx
54. FOR MORE INFORMATION:
Tai Phan
Tai.Phan@ed.gov
202-502-7431
Adrienne Allegretti
aallegretti@blueraster.com
703-842-0171
www.blueraster.com
blog.blueraster.com
Editor’s Notes
My name is Adrienne Allegretti and I’m the GIS Project Manager at Blue Raster. We contract with NCES to build and maintain web map applications and tools such as the SDDS map viewer. I’m here today to talk about geocoding and NCES’s initiatives in this area.
The process of finding associated geographic coordinates (latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes). Addresses are great for the postal service but not for doing statistical analysis.
This method makes use of data from a street GIS where the street network is already mapped within the geographic coordinate space. Each street segment is attributed with address ranges (e.g. house numbers from one segment to the next). Geocoding takes an address, matches it to a street and specific segment (such as a block). Geocoding then interpolates the position of the address, within the range along the segment.
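The interpolation step described above can be sketched as follows. This is a simplified illustration that assumes a straight segment and evenly spaced house numbers. The `Segment` structure and the coordinates in the example are hypothetical and not taken from any reference dataset.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Segment:
    """One street segment with its address range (hypothetical structure)."""
    from_number: int            # house number at the start of the segment
    to_number: int              # house number at the end of the segment
    start: Tuple[float, float]  # (lat, lon) at the start of the segment
    end: Tuple[float, float]    # (lat, lon) at the end of the segment

def interpolate(segment: Segment, house_number: int) -> Tuple[float, float]:
    """Estimate a point for house_number by linear interpolation."""
    low, high = sorted((segment.from_number, segment.to_number))
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's range")
    span = segment.to_number - segment.from_number
    # A one-number range has no length to interpolate over; use the midpoint.
    fraction = 0.5 if span == 0 else (house_number - segment.from_number) / span
    lat = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    lon = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    return lat, lon

# Example with made-up coordinates: 1450 sits halfway along a 1400-1500 block.
block = Segment(1400, 1500, (41.0, -86.0), (41.0, -86.2))
print(interpolate(block, 1450))
```

Real geocoders also account for which side of the street odd and even numbers fall on and for curved segments, which this sketch leaves out.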
Other Techniques include locating a point at the centroid of a land parcel, using a GPS to map a location, or using a street intersection or midpoint along a street centerline.
THEN: Geocoding millions of addresses could take Days or Weeks
Now it just takes hours. In addition, the technology for parsing addresses has greatly improved, and systems have become much smarter about delineating the address from the city and state.
To support spatial analysis and provide geographic context to a school’s location, in particular relative to Census geographies. Geocoded schools can be overlaid with other geographic layers such as: School Districts, Locales, Counties, Congressional Districts, and more.
Where schools are matters, so that we can better our children’s education and so that there is adequate response in the event of an emergency. For instance, NCES collaborates with Homeland Security in determining where daytime populations are in order to aid disaster relief and response. They also work with EPA and FEMA for similar reasons. The Dept of Ed is looked at as the authoritative source for the Federal government in terms of knowing where schools are. So, knowing where the administrative office of a school is located is fine, but at the end of the day we need to know where the students really are.
The accuracy of geocoded data has a large bearing on the quality of research that can be done using that data. A school geocoded to the wrong location could have implications for that school, such as a loss of funding opportunities. It could also have an impact on the planning of appropriate emergency response.
In addition to investing in your data quality, you want to make sure you format the data in a style the geocoder can handle, and be sure that you are using locational addresses and not the PO Box or the administrative address of a school.
Here is an example of a bad address: when it is entered into ArcGIS Online, we only get the centroid of the zip code because the street address can’t be found.
If we remove the place name and add in the vowels to the road address, we get an address that the geocoder can find.
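As a rough sketch of that cleanup, the snippet below drops any text before the first digit (treated here as the place name) and expands abbreviated tokens. The `EXPANSIONS` table is a hypothetical, hand-built lookup containing only the two entries needed for this slide's example. A real workflow would need a much larger table or a dedicated address standardization tool.

```python
import re

# Hypothetical lookup of abbreviations to full street words.
EXPANSIONS = {
    "LNCLNWY": "LINCOLN WAY",
    "W": "WEST",
}

def clean_street(raw: str) -> str:
    """Strip a leading place name and expand known abbreviations."""
    # Assume the street address starts at the first digit (the house number).
    first_digit = re.search(r"\d", raw)
    street = raw[first_digit.start():] if first_digit else raw
    tokens = street.upper().split()
    return " ".join(EXPANSIONS.get(token, token) for token in tokens)

print(clean_street("FMLY CHLD CTR 1411 LNCLNWY W"))
```

The first-digit rule fails for place names that themselves contain numbers, so results should be spot-checked rather than trusted blindly.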
So now you’re ready to geocode. How do you go about choosing an appropriate geocoding solution? Well, you need to take a few things into consideration:
1) Do you have a small budget, or a large budget with the ability to purchase your own reference data?
2) Do you have a single record, hundreds or thousands of records, or millions?
3) Are you just bringing an Excel file to a web browser, or do you have an IT department with enterprise servers and cloud deployment? Or do you have ArcGIS for Desktop, or developers who can create your own web service with your data?
4) Are you the only one who needs to access the geocoder, or is there a whole team of individuals?
5) And lastly, how publicly accessible is your data? You might need to run the geocoding behind a firewall if it’s related to financial loan information.
No matter your situation, there is an approach for you.
So, if your budget allows you to make the initial investment in the data, you expect to be geocoding millions of addresses, and your address lists are private and need to be secure behind a firewall, you should consider setting up an in-house solution. For millions of addresses, you will want to geocode on the desktop (ArcMap) rather than over the web, because it will be much faster, but there are some prerequisites you need for this.
ArcGIS Desktop software is the best option and offers geocoding capabilities without custom development. Geocoding on the desktop also assumes you have prepared your address data, but there are tools to help with that, including a “Standardize Addresses” tool. Reference data is the crucial part. These are the actual road network and address information used to determine the lat/longs of addresses. We are using StreetMap Premium data, which gives you pre-built networks optimized for geocoding and routing. If you cannot access this data, you can build your own networks from Census TIGER/Line data, but this can be a significant effort, especially for national-level geocoding. With reference data in hand, you now must build your own address locator.
One of the most powerful built-in features of ArcGIS Desktop is the use of composite locators. A composite locator is made from multiple locators used in a cascading fashion. While you may not be able to get the best match possible for every address, it does ensure that some match will occur, provided the address has information for the last locator (zip code). This can also help in identifying problem areas in the data.
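The cascading idea can be illustrated outside ArcGIS with a small sketch: try the most precise locator first and fall back to coarser ones until something matches. Everything here is hypothetical, including the three toy locators, the lookup tables, and the placeholder coordinates. It shows the general pattern only, not how ArcGIS implements composite locators.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Locator = Callable[[Dict[str, str]], Optional[Point]]

# Placeholder lookup tables standing in for real reference data.
# The coordinates are made up for illustration.
ADDRESS_POINTS = {("1411 LINCOLN WAY WEST", "46544"): (10.0, 20.0)}
STREET_CENTROIDS = {("LINCOLN WAY WEST", "46544"): (10.1, 20.1)}
ZIP_CENTROIDS = {"46544": (10.5, 20.5)}

def address_locator(addr: Dict[str, str]) -> Optional[Point]:
    return ADDRESS_POINTS.get((addr["street"], addr["zip"]))

def street_locator(addr: Dict[str, str]) -> Optional[Point]:
    # Drop the leading house number and look up the street name alone.
    name = addr["street"].split(" ", 1)[-1]
    return STREET_CENTROIDS.get((name, addr["zip"]))

def zip_locator(addr: Dict[str, str]) -> Optional[Point]:
    return ZIP_CENTROIDS.get(addr["zip"])

# Ordered from best match to merely acceptable match.
CASCADE: List[Tuple[str, Locator]] = [
    ("address", address_locator),
    ("street centroid", street_locator),
    ("zip code", zip_locator),
]

def composite_geocode(addr: Dict[str, str]) -> Optional[Tuple[str, Point]]:
    """Return (match level, point) from the first locator that matches."""
    for level, locate in CASCADE:
        point = locate(addr)
        if point is not None:
            return level, point
    return None  # even the zip code was unusable

print(composite_geocode({"street": "FMLY CHLD CTR 1411 LNCLNWY W", "zip": "46544"}))
```

Returning the match level together with the point makes it possible to flag records that fell through to the zip code locator as candidates for data cleanup.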
Here we see the ArcGIS Desktop User Interface with Geocoding toolbar enabled. I have selected a composite locator, and on the bottom of the screen we see the address table I want to code.
Here we see the configuration screen for the “Geocode Addresses” tool. We select the names of the fields in our table that correspond to the street, city, state, and zip of the addresses. We have additional options to tweak spelling sensitivity, the minimum score of an address against reference data to signify a “match”, and what data fields will be added to the output table. We generally just go with the default options. Tweaking these would be more for the advanced user.
Here is what the results look like: they provide the % matched and the ability to make edits to your data and retry if needed. When you do this with millions of records, you can’t possibly sit here and interactively match. So you have to determine your threshold and what data you’re trying to match against. Over time, investing in our data quality will make the number of unmatched addresses much smaller.
If you’re doing this often, it may be worthwhile to get your own reference data, but you will need to maintain that data. Support: it is good to be able to call someplace and get help. A benefit is not having to do custom scripting and development, so it is accessible to non-programmers. But not everyone is geocoding millions of records and has the budget for this type of arrangement, so what are their options?
Millions: fastest to geocode on your desktop, or on your own server infrastructure or web service, to meet your own needs. Thousands: web services. For millions of addresses, it is faster to geocode via desktop (ArcMap) than over the web. Are your address lists private and in need of security behind a firewall? ArcGIS Desktop, along with either StreetMap Premium, ArcGIS Data Appliance, or Address Coder, is the best option. Want to deploy a public-facing web application or manage small- to medium-sized databases in which address security is not a main concern? Use ArcGIS Online geocoding services, along with the World Street Map service, or other geocoding APIs like Google, Bing, Yahoo or MapQuest.
To do small batches and single addresses, or if you don’t have the ability to do it on the desktop, here are a few web services that are available.
Using ArcGIS Online as an example of how it works: the limitation is 250 records at a time for the free service, and you are really just given the visualization tools for seeing your points on the map, not a returned dataset with the addresses’ lats and longs. But here is a demo regardless, so you can see how single-address and multiple-address geocoding works in action in a public web service.
Maximum of 250 w/o subscription
Want to deploy a public-facing web application or manage small- to medium-sized databases in which address security is not a main concern? Use ArcGIS Online geocoding services, along with the World Street Map service, or other geocoding APIs like Google, Bing, Yahoo or MapQuest.
ArcGIS Online, Yahoo, Bing, Google, MapQuest, Geonames. Be sure to read the terms of service and know the limitations. For example, when using Google, you must display your results on a Google map, and Google limits requests to 2,500 addresses. Be sure to read and follow the terms of service and be aware of the licensing agreements. Also, most don’t allow you to persist the lat/long either.
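When a service caps how many addresses can be submitted, one way to plan the work is to split the address list into groups no larger than that cap. The helper below is a generic sketch that makes no calls to any geocoding API. The 2,500 figure is the Google limit quoted above, and how a batch may be submitted (per request, per day, and so on) depends on each provider's current terms of service.

```python
from typing import Iterator, List

def batches(addresses: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `addresses` with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(addresses), size):
        yield addresses[start:start + size]

# Example: 6,000 placeholder addresses split under a 2,500-address limit.
addresses = [f"address {n}" for n in range(6000)]
for number, batch in enumerate(batches(addresses, 2500), start=1):
    print(f"batch {number}: {len(batch)} addresses")
```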
The goal is to always get the best match which is the Premise or Roof Top of the address.
Using a composite locator as an example, we ran some tests to show how the level of accuracy can work. In the first example, the geocoder apparently could not find the address, the street name, or the zip. There are many reasons why the composite can fail. In both situations the zip code is wrong. This is your extra homework for investing in data quality.
If you’re just aggregating data at the state level, perhaps this is an OK match for you, but if you’re trying to determine emergency shelters, this isn’t going to do it for you.
This is a little better than the zip centroid but probably only as useful
Interpolated – conventional geocoder as we discussed in the beginning
Available from NAVTEQ; these points reflect the actual physical location. This is a somewhat recent advancement and is mostly available in urban areas where businesses have invested more heavily in the data.
Placefinders provide the location of places within a certain vicinity, such as a search for Starbucks when zoomed in to downtown DC in Google.
Reverse Geocoding finds the address of a lat/long
We’re working on a prototype tool to help increase the ability of NCES to geocode schools and assign locale codes, which I’d like to demo now.
Zones developed by the Census Bureau for NCES that describe how urban or rural an area is. It matters where schools are because the locale codes can drive funding.
The urban-centric locale code system classifies territory into four major types: city, suburban, town, and rural. Each type has three subcategories. For city and suburb, these are gradations of size – large, midsize, and small. Towns and rural areas are further distinguished by their distance from an urbanized area. They can be characterized as fringe, distant, or remote.
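The four-by-three structure described above can be written as a small lookup. This sketch assumes the two-digit numbering used in NCES's published urban-centric locale codes, where the first digit is the major type (1 to 4) and the second is the subcategory (1 to 3). That numbering is not stated in these slides, so check it against the NCES locale page linked from the slide before relying on it.

```python
# First digit: major type. Second digit: subcategory (assumed numbering).
MAJOR_TYPES = {1: "City", 2: "Suburb", 3: "Town", 4: "Rural"}
SIZES = {1: "Large", 2: "Midsize", 3: "Small"}        # for City and Suburb
DISTANCES = {1: "Fringe", 2: "Distant", 3: "Remote"}  # for Town and Rural

def describe_locale(code: int) -> str:
    """Turn a two-digit locale code such as 11 or 43 into a label."""
    major, minor = divmod(code, 10)
    if major not in MAJOR_TYPES or minor not in SIZES:
        raise ValueError(f"not a recognized locale code: {code}")
    subcategory = SIZES[minor] if major in (1, 2) else DISTANCES[minor]
    return f"{MAJOR_TYPES[major]}: {subcategory}"

for code in (11, 23, 32, 43):
    print(code, describe_locale(code))
```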
The School Attendance Boundary Project seeks to make school boundaries readily available and allow for linkages of demographic data of populations living within those boundaries. So far we have collected the 350 largest districts, with 1000+ to be collected by the end of the year. That’s over 50% of students.