FME Data Transformation for
the Geographic Support
System Initiative
Jay E. Spurlin
Software Architect and Development Manager for the
GSS-I Feature Source Evaluation software system

                                                     April 8, 2013
U.S. Census Bureau
• The Census Bureau serves as the leading
  source of quality data about the nation's
  people and economy. We honor privacy,
  protect confidentiality, share our expertise
  globally, and conduct our work openly. We
  are guided on this mission by our strong and
  capable workforce, our readiness to innovate,
  and our abiding commitment to our
  customers.

                                              2
Geography Division
• The Geography Division plans, coordinates, and
  administers all geographic and cartographic activities
  needed to facilitate the Census Bureau's statistical
  programs throughout the US and its territories. We
  manage the Census Bureau's programs to
  continuously update features, boundaries and
  geographic entities in TIGER and the Master Address
  File (MAF). We also conduct research into geographic
  concepts, methods, and standards needed to facilitate
  the Census Bureau's data collection and dissemination
  programs.

                                                       3
GSS-I
• In support of the 2020 Decennial Census, the Census Bureau
  is evaluating which areas should be targeted for a traditional,
  on-the-ground address canvassing operation and in which
  areas a traditional canvassing operation is not necessary.
• The Census Bureau's task is to determine how to decide which
  areas should be considered for targeting
   – GEO has evaluated the MAF/TIGER database and assigned
     quality indicators to each of the census tracts
   – A Targeted Address Canvassing strategy has been developed
     that contains an inventory of criteria for evaluation




                                                                   4
GSS-I
• The Geographic Partnership program is now underway.
    – GEO is receiving both address and spatial data from invited partners
        • This data is at the state, county, and local level.
        • The data is being evaluated and integrated with the MAF/TIGER database.
        • The next step is to determine what level of feedback we can give to the partners
          about their data.
• GEO is also working with statisticians on predictive modeling to help
  determine where to target.
• The combination of the evaluation of the current MAF/TIGER
  database, the partner data, and the predictive modeling will
  contribute to the recommendation on which areas of the country
  should be considered for targeting.


                                                                                             5
The Geographic Partnership Program
•   A partner provides a set of source files
•   The source files are moved inside the Census firewall via a secure web-exchange module
•   The content inventory of the files undergoes initial verification
•   The files are preserved, as supplied, for later reference
•   A more detailed content assessment is done, including verification that the files meet the
    minimum guidelines for content and metadata
•   The files are prepared for automated processing, including re-projection and mapping to a
    standardized schema
•   A series of (mostly) automated checks is run, which provides metrics about the data in the
    files
•   An interactive review is conducted, in which the files and their associated metrics are
    reviewed and a decision is made on how to capture any new data
•   Any data that are not useful for updating the MAF/TIGER database are removed from the
    files
•   Features or addresses are added or modified, using an automated conflate and review
    process – or – an interactive update process



                                                                                                 6
Feature Source Evaluation Software
•   A number of MAF/TIGER spatial layers will be extracted for the extent of the partner
    entity
•   An analyst will use the supplied data and metadata to map the provided source
    schema to a standardized schema, and the supplied road centerline file will be
    converted to an ArcSDE layer, re-projected, and the name and MTFCC mappings
    applied
•   The feature names in the source file will be standardized to the parsed MAF/TIGER
    naming conventions
•   The standardized feature names will be checked to see if any contain illegal
    characters or are prohibited or generic names
•   A topological check will be run, to gauge the topological stability of the source file
•   A completeness / change detection check will be run to attempt to identify areas in
    the source file that contain features not found in MAF/TIGER
•   A comparison will be run between the universe of feature names in the source file
    and the universe of feature names found in MAF/TIGER within the extent of the entity
•   All intersections that meet the requirements for CE95 assessment will be identified
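The name checks and the name-universe comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the GSS-I implementation (which runs in FME workspaces): it assumes names have already been standardized to upper-case strings, and the allowed-character pattern and the prohibited/generic name list are hypothetical placeholders, since the slides do not give the actual rules.

```python
import re

# Hypothetical rules: the real GSS-I lists of illegal characters and of
# prohibited or generic names are not given in the slides.
ALLOWED_CHARS = re.compile(r"[A-Z0-9 '&./\-]+")
PROHIBITED_OR_GENERIC = {"UNNAMED", "UNKNOWN", "NO NAME", "PRIVATE", "ROAD", "DRIVEWAY"}


def check_name(name: str) -> list:
    """Return the problems found in one standardized feature name."""
    problems = []
    if not ALLOWED_CHARS.fullmatch(name):
        problems.append("illegal characters")
    if name in PROHIBITED_OR_GENERIC:
        problems.append("prohibited or generic name")
    return problems


def compare_name_universes(source_names, tiger_names):
    """Compare the distinct names in the source file with those in MAF/TIGER."""
    source, tiger = set(source_names), set(tiger_names)
    return {
        "in_both": source & tiger,
        "source_only": source - tiger,  # candidate new or misspelled names
        "tiger_only": tiger - source,   # names the partner file may be missing
    }


if __name__ == "__main__":
    for name in ["MAIN ST", "ELM ST#2", "UNNAMED"]:
        print(name, check_name(name))
    print(compare_name_universes(["MAIN ST", "OAK AVE"], ["MAIN ST", "PINE RD"]))
```

Comparing whole-name sets keeps the metric cheap; a production check would more likely compare the parsed name components separately.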


                                                                                         7
Previous FME Technology Architecture
• FME Workspaces were developed using FME Workbench 2012 on
  desktop workstations, running 32-bit Windows XP Service Pack 3
• FME Server 2012 (FME Engine only), on batch servers running
  Red Hat Enterprise Linux 5, connected to a SAN (Storage Area
  Network)
  [Diagram: a Linux Batch Server running the Cronacle job-queueing system, Perl and
  shell scripts, FME Server (command-line invocation of FME Engines), and the Oracle
  Run-Time Client, connected to MAF/TIGER (Oracle Database) and to Shapefiles on SAN]




                                                                         8
New FME Technology Architecture
•     FME Workspaces are developed using FME Workbench 2012 SP3 on
      desktop workstations, running 32-bit Windows XP Service Pack 3
•     FME Server 2012 SP3 (FME Server Console), on batch servers running
      Red Hat Enterprise Linux 5
•     FME Server 2012 SP3, on Windows server, with SAN (Storage Area
      Network) disk(s) mounted via Samba
  [Diagram: a Linux Batch Server running the Cronacle job-queueing system, Perl and
  shell scripts, and FME Server Console (remote job submission to FME Server); a
  Windows Web Server running ArcGIS for Server, FME Server (full installation), and
  the Oracle Run-Time Client; Shapefiles on SAN; MAF/TIGER (Oracle Database); and an
  ArcSDE Geodatabase]


                                                                                                  9
Cross-walking (Transmogrification)
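The cross-walk maps a partner's source schema onto the standardized schema and applies the name and MTFCC mappings. As a rough sketch of one cross-walk step, assume a partner file whose attribute names (STREET_NM, RD_CLASS, SEG_ID) and road-class values are hypothetical; the target field names are likewise illustrative. The MTFCC values used (S1100 primary road, S1200 secondary road, S1400 local neighborhood road) are real MAF/TIGER Feature Class Codes, but the mapping from partner classes to them is an assumption an analyst would set per partner. Re-projection is not shown.

```python
# Hypothetical partner schema: source field -> standardized field.
FIELD_MAP = {"STREET_NM": "FULLNAME", "RD_CLASS": "SOURCE_CLASS", "SEG_ID": "SOURCE_ID"}

# Hypothetical partner road classes -> MTFCC (S1100 primary, S1200 secondary,
# S1400 local neighborhood road). An analyst would set this per partner.
CLASS_TO_MTFCC = {
    "INTERSTATE": "S1100",
    "ARTERIAL": "S1200",
    "COLLECTOR": "S1200",
    "LOCAL": "S1400",
}


def crosswalk(record: dict) -> dict:
    """Rename fields to the standardized schema and attach an MTFCC."""
    out = {target: record.get(source) for source, target in FIELD_MAP.items()}
    out["FULLNAME"] = (out["FULLNAME"] or "").strip().upper()
    # None means the class has no mapping yet and needs analyst review.
    out["MTFCC"] = CLASS_TO_MTFCC.get(out["SOURCE_CLASS"])
    return out


def unmapped_classes(records):
    """List partner road-class values that the crosswalk cannot translate."""
    values = {r.get("RD_CLASS") for r in records if r.get("RD_CLASS") not in CLASS_TO_MTFCC}
    return sorted(values, key=str)  # key=str tolerates missing (None) values
```

Keeping the mappings as data rather than code mirrors the slide's point that an analyst, not a developer, defines the cross-walk for each partner.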




                                     10
Topology Check
• The Topology Check workspace compiles a number of topology- and
  tolerance-based metrics:
   – Gaps – endpoints within 5 meters of any line segment
   – Overshoots – line segments extending less than 5 meters beyond an
     intersection
   – Tiny Features – features with a total length less than 5 meters
   – Floating Features – features or connected sets of features that are not
     connected to the rest of the road network
   – Exact Duplicates – features whose geometry and name are identical to
     another feature
   – Coincident – features whose geometry overlaps with another feature
   – Crossing – features that cross but do not intersect at a node
   – Multi-part – features that consist of multiple geometry parts
   – Cutbacks – features containing angles less than 25 degrees
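Several of these metrics are simple per-feature geometry tests. The Python sketch below illustrates three of them (tiny features, cutbacks, and exact duplicates) using the 5-meter and 25-degree thresholds listed above. It assumes coordinates in a projected CRS measured in meters, represents each feature as a list of (x, y) vertices, and interprets a cutback angle as the angle at a vertex (180 degrees meaning a straight line), which is an assumption. The network-based metrics (gaps, overshoots, floating, coincident, and crossing features) need a spatial index and node topology and are not shown; this is an illustration, not the FME workspace itself.

```python
import math

# Assumes a projected CRS in meters; features are lists of (x, y) vertices.
TINY_LENGTH_M = 5.0
CUTBACK_ANGLE_DEG = 25.0


def length(line):
    """Total length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def is_tiny(line):
    return length(line) < TINY_LENGTH_M


def vertex_angles(line):
    """Yield the angle in degrees at each interior vertex (180 = straight)."""
    for a, b, c in zip(line, line[1:], line[2:]):
        v1 = (a[0] - b[0], a[1] - b[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue  # skip repeated vertices
        cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        yield math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def has_cutback(line):
    return any(angle < CUTBACK_ANGLE_DEG for angle in vertex_angles(line))


def exact_duplicates(features):
    """Group ids of features whose name and vertex sequence are identical."""
    groups = {}
    for fid, name, line in features:
        groups.setdefault((name, tuple(map(tuple, line))), []).append(fid)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Clamping the cosine to [-1, 1] guards `math.acos` against floating-point values slightly outside its domain.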

                                                                           11
Completeness / Change Detection Check
• The MAF/TIGER road centerline features and the
  feature source file road centerline features will be
  compared using an FME workspace.
• The MAF/TIGER features will be buffered to a
  distance of 15 meters, then overlaid with the
  source file features.
• Any source file feature parts that fall outside of the
  buffer areas will be chained together, and the total
  length of difference (and of each part) will be
  reported as an evaluation metric.
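The buffer-and-overlay test can be approximated without a GIS library by sampling each source line at short intervals and asking whether each sample lies within 15 meters of any MAF/TIGER segment. The sketch below assumes projected coordinates in meters and a hypothetical 1-meter sampling step; an exact implementation would intersect the lines with true buffer polygons, as the FME workspace does. Consecutive outside stretches, even across vertices, are chained into one part.

```python
import math

BUFFER_M = 15.0
STEP_M = 1.0  # hypothetical sampling interval; smaller is more exact but slower


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def inside_buffer(p, tiger_lines):
    return any(
        point_segment_distance(p, a, b) <= BUFFER_M
        for line in tiger_lines
        for a, b in zip(line, line[1:])
    )


def outside_parts(source_line, tiger_lines):
    """Lengths of contiguous stretches of source_line outside the buffer."""
    parts, current = [], 0.0
    for a, b in zip(source_line, source_line[1:]):
        seg_len = math.dist(a, b)
        n = max(1, math.ceil(seg_len / STEP_M))
        for i in range(n):
            t = (i + 0.5) / n  # classify each sub-step by its midpoint
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            if inside_buffer(mid, tiger_lines):
                if current > 0:
                    parts.append(current)
                    current = 0.0
            else:
                current += seg_len / n
    if current > 0:
        parts.append(current)
    return parts
```

Summing the returned list gives the total length of difference. The nested loop tests every MAF/TIGER segment for every sample, so county-sized data would need a spatial index.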

                                                       12
CE95 Qualifying Intersection Identification

• Qualifying intersections must meet the
  following criteria:
  – Must consist of three roads (a “T” intersection)
    or four roads (an “X” intersection)
  – Must consist of only secondary roads or local
    roads
  – Must meet at 90- or 180-degree angles, with a
    tolerance of plus or minus 15 degrees
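The three criteria above can be expressed as a single predicate. In this sketch, each road leaving an intersection node is represented by its MTFCC and the next vertex along the road, and "meet at 90 or 180 degree angles" is interpreted (an assumption) as: the angle between each pair of neighboring roads is within 15 degrees of 90 or 180. S1200 and S1400 are the MTFCC codes for secondary and local neighborhood roads; the function names are illustrative.

```python
import math

QUALIFYING_MTFCC = {"S1200", "S1400"}  # secondary roads, local neighborhood roads
TOLERANCE_DEG = 15.0


def direction_deg(node, vertex):
    """Direction from the node toward a vertex, in degrees within [0, 360)."""
    return math.degrees(math.atan2(vertex[1] - node[1], vertex[0] - node[0])) % 360.0


def qualifies_for_ce95(node, legs):
    """legs: one (mtfcc, next_vertex) pair per road leaving the node."""
    if len(legs) not in (3, 4):  # "T" or "X" intersection
        return False
    if any(mtfcc not in QUALIFYING_MTFCC for mtfcc, _ in legs):
        return False
    dirs = sorted(direction_deg(node, v) for _, v in legs)
    # Angle between each pair of neighboring legs, wrapping past 360 degrees.
    gaps = [(dirs[(i + 1) % len(dirs)] - dirs[i]) % 360.0 for i in range(len(dirs))]
    return all(min(abs(g - 90.0), abs(g - 180.0)) <= TOLERANCE_DEG for g in gaps)
```

A "T" produces neighbor angles of roughly 90, 90, and 180 degrees and an "X" four angles near 90, so both pass, while a "Y" with three angles near 120 degrees is rejected.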

                                                   13
Thank You!


 Questions?

 For more information:
   Jay E. Spurlin
      jay.e.spurlin@census.gov
   U.S. Census Bureau

   http://www.census.gov/geo/www/gss/

Mais conteúdo relacionado

Destaque

Destaque (6)

How to Process Real-Time Data with FME
How to Process Real-Time Data with FMEHow to Process Real-Time Data with FME
How to Process Real-Time Data with FME
 
Leveraging Autodesk Products with FME: AutoCAD to GIS is Only the Beginning
Leveraging Autodesk Products with FME: AutoCAD to GIS is Only the BeginningLeveraging Autodesk Products with FME: AutoCAD to GIS is Only the Beginning
Leveraging Autodesk Products with FME: AutoCAD to GIS is Only the Beginning
 
How to Exchange Data between CAD and GIS
How to Exchange Data between CAD and GISHow to Exchange Data between CAD and GIS
How to Exchange Data between CAD and GIS
 
Deep Dive into FME Desktop 2017
Deep Dive into FME Desktop 2017Deep Dive into FME Desktop 2017
Deep Dive into FME Desktop 2017
 
Task modeling: Understanding what people want and how to design for them.
Task modeling: Understanding what people want and how to design for them.Task modeling: Understanding what people want and how to design for them.
Task modeling: Understanding what people want and how to design for them.
 
BIM Workflows: How to Build from CAD & GIS for Infrastructure
BIM Workflows: How to Build from CAD & GIS for InfrastructureBIM Workflows: How to Build from CAD & GIS for Infrastructure
BIM Workflows: How to Build from CAD & GIS for Infrastructure
 

Semelhante a FME Data Transformation for the Geographic Support System Initiative

ODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data Management
ODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data ManagementODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data Management
ODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data Management
Francisco Amores
 
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Safe Software
 

Semelhante a FME Data Transformation for the Geographic Support System Initiative (20)

Using FME to Convert TIGER Spatial Data From Oracle Spatial To ESRI Shapefiles
Using FME to Convert TIGER Spatial Data From Oracle Spatial To ESRI ShapefilesUsing FME to Convert TIGER Spatial Data From Oracle Spatial To ESRI Shapefiles
Using FME to Convert TIGER Spatial Data From Oracle Spatial To ESRI Shapefiles
 
ODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data Management
ODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data ManagementODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data Management
ODTUG KSCOPE 2017 - Black Belt Techniques for FDMEE and Cloud Data Management
 
Geospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & EsriGeospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & Esri
 
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
 
Sumo Logic QuickStart Webinar Sep 2016
Sumo Logic QuickStart Webinar Sep 2016Sumo Logic QuickStart Webinar Sep 2016
Sumo Logic QuickStart Webinar Sep 2016
 
A sdn based application aware and network provisioning
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioning
 
Hadoop
HadoopHadoop
Hadoop
 
Sumo Logic Quick Start - Feb 2016
Sumo Logic Quick Start - Feb 2016Sumo Logic Quick Start - Feb 2016
Sumo Logic Quick Start - Feb 2016
 
Sumo Logic quickStart Webinar June 2016
Sumo Logic quickStart Webinar June 2016Sumo Logic quickStart Webinar June 2016
Sumo Logic quickStart Webinar June 2016
 
Sumo Logic QuickStart - May 2016
Sumo Logic QuickStart - May 2016Sumo Logic QuickStart - May 2016
Sumo Logic QuickStart - May 2016
 
Sumo Logic QuickStart
Sumo Logic QuickStartSumo Logic QuickStart
Sumo Logic QuickStart
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Alfresco tuning part1
Alfresco tuning part1Alfresco tuning part1
Alfresco tuning part1
 
Alfresco tuning part1
Alfresco tuning part1Alfresco tuning part1
Alfresco tuning part1
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
 
SmartMet Server OSGeo
SmartMet Server OSGeoSmartMet Server OSGeo
SmartMet Server OSGeo
 
Accelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim ArchitectAccelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim Architect
 
Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?
 
Spark 1.0
Spark 1.0Spark 1.0
Spark 1.0
 

Mais de Safe Software

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Safe Software
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Safe Software
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Safe Software
 
Taking Off with FME: Elevating Airport Operations to New Heights
Taking Off with FME: Elevating Airport Operations to New HeightsTaking Off with FME: Elevating Airport Operations to New Heights
Taking Off with FME: Elevating Airport Operations to New Heights
Safe Software
 
Initiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
Safe Software
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software
 

Mais de Safe Software (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action:  Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action:  Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
The Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data EcosystemThe Critical Role of Spatial Data in Today's Data Ecosystem
The Critical Role of Spatial Data in Today's Data Ecosystem
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
 
Breaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologyBreaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI Technology
 
Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
New Year's Fireside Chat with Safe Software’s Founders
New Year's Fireside Chat with Safe Software’s FoundersNew Year's Fireside Chat with Safe Software’s Founders
New Year's Fireside Chat with Safe Software’s Founders
 
Taking Off with FME: Elevating Airport Operations to New Heights
Taking Off with FME: Elevating Airport Operations to New HeightsTaking Off with FME: Elevating Airport Operations to New Heights
Taking Off with FME: Elevating Airport Operations to New Heights
 
Initiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Mastering DevOps-Driven Data Integration with FME
Mastering DevOps-Driven Data Integration with FMEMastering DevOps-Driven Data Integration with FME
Mastering DevOps-Driven Data Integration with FME
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

FME Data Transformation for the Geographic Support System Initiative

  • 1. FME Data Transformation for the Geographic Support System Initiative Jay E. Spurlin Software Architect and Development Manager for the GSS-I Feature Source Evaluation software system April 8, 2013
  • 2. U.S. Census Bureau • The Census Bureau serves as the leading source of quality data about the nation's people and economy. We honor privacy, protect confidentiality, share our expertise globally, and conduct our work openly. We are guided on this mission by our strong and capable workforce, our readiness to innovate, and our abiding commitment to our customers. 2
  • 3. Geography Division • The Geography Division plans, coordinates, and administers all geographic and cartographic activities needed to facilitate the Census Bureau's statistical programs throughout the US and its territories. We manage the Census Bureau's programs to continuously update features, boundaries and geographic entities in TIGER and the Master Address File (MAF). We also conduct research into geographic concepts, methods, and standards needed to facilitate the Census Bureau's data collection and dissemination programs. 3
  • 4. GSS-I • In support of the 2020 Decennial Census, the Census Bureau is evaluating what areas should be targeted for a traditional, on-the-ground address canvassing operation and in which areas a traditional canvassing operation is not necessary. • The task the Census Bureau is undertaking is determining how to decide which areas should be considered for targeting – GEO has evaluated the MAF/TIGER database and assigned quality indicators to each of the census tracts – A Targeted Address Canvassing strategy has been developed that contains an inventory of criteria for evaluation 4
  • 5. GSS-I • The Geographic Partnership program is now underway. – GEO is receiving both address and spatial data from invited partners • This data is at the state, county, and local level. • The data is being evaluated and integrated with the MAF/TIGER database. • The next step is to determine what level of feedback we can give to the partners about their data. • GEO is also working with statisticians on predictive modeling to help determine where to target. • The combination of the evaluation of the current MAF/TIGER database, the partner data, and the predictive modeling will contribute to the recommendation on which areas of the country should be considered for targeting. 5
  • 6. The Geographic Partnership Program • A partner provides a set of source files • The source files are moved inside the Census firewall via a secure web-exchange module • The content inventory of the files undergoes initial verification • The files are preserved, as supplied, for later reference • A more detailed content assessment is done, including verification the files meet the minimum guidelines for content and metadata • The files are prepared for automated processing, including re-projection and mapping to a standardized schema • A series of (mostly) automated checks is run, which provides metrics about the data in the files • An interactive review is conducted, in which the files and their associated metrics are reviewed and a decision is made how to capture any new data • Any data that are not useful for updating the MAF/TIGER database get removed from the files • Features or addresses are added or modified, using an automated conflate and review process – or – an interactive update process 6
  • 7. Feature Source Evaluation Software • A number of MAF/TIGER spatial layers will be extracted for the extent of the partner entity • An analyst will use the supplied data and metadata to map the provided source schema to a standardized schema, and the supplied road centerline file will be converted to an ArcSDE layer, re-projected, and the name and MTFCC mappings applied • The feature names in the source file will be standardized to the parsed, MAF/TIGER naming conventions • The standardized feature names will be checked to see if any contain illegal charactersor prohibited or generic names • A topological check will be run, to gauge the topological stability of the source file • A completeness / change detection check will be run to attempt to identify areas in the source file that contain features not found in MAF/TIGER • A comparison will be run between the universe of feature names in the source file and the universe of feature names found in MAF/TIGER within the extent of the entity • All intersections that meet the requirements for CE95 assessment will be identified 7
  • 8. Previous FME Technology Architecture
    • FME Workspaces were developed using FME Workbench 2012 on desktop workstations running 32-bit Windows XP Service Pack 3
    • FME Server 2012 (FME Engine only) on batch servers running Red Hat Enterprise Linux 5, connected to a SAN (Storage Area Network)
    [Diagram components: Linux Batch Server; Cronacle job-queueing system; Perl and shell scripts; FME Server (command-line invocation of FME Engines); Oracle Run-Time Client; MAF/TIGER (Oracle Database); shapefiles on SAN]
  • 9. New FME Technology Architecture
    • FME Workspaces are developed using FME Workbench 2012 SP3 on desktop workstations running 32-bit Windows XP Service Pack 3
    • FME Server 2012 SP3 (FME Server Console) on batch servers running Red Hat Enterprise Linux 5
    • FME Server 2012 SP3 on a Windows server, with SAN (Storage Area Network) disk(s) mounted via Samba
    [Diagram components: Linux Batch Server; Windows Web Server; Cronacle job-queueing system; Perl and shell scripts; FME Server Console (remote job submission to FME Server); FME Server (full installation); ArcGIS for Server; Oracle Run-Time Client; MAF/TIGER (Oracle Database); shapefiles on SAN; ArcSDE geodatabase]
  • 11. Topology Check
    • The Topology Check workspace compiles a number of topology- and tolerance-based metrics:
      – Gaps: endpoints within 5 meters of any line segment
      – Overshoots: line segments extending less than 5 meters beyond an intersection
      – Tiny Features: features with a total length of less than 5 meters
      – Floating Features: features or connected sets of features that are not connected to the rest of the road network
      – Exact Duplicates: features whose geometry and name are identical to those of another feature
      – Coincident: features whose geometry overlaps with another feature
      – Crossing: features that cross but do not intersect at a node
      – Multi-part: features that consist of multiple geometry parts
      – Cutbacks: features containing angles of less than 25 degrees
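Three of the simpler metrics above can be computed directly from line geometry. This is a minimal Python sketch, assuming features are already projected to a meter-based coordinate system and represented as plain coordinate lists; the real workspace uses FME transformers, the function names are illustrative, and treating a reversed line as an exact duplicate is an assumption.

```python
import math
from collections import Counter

TINY_LENGTH_M = 5.0  # tiny-feature threshold from the slide
CUTBACK_DEG = 25.0   # cutback angle threshold from the slide


def length(coords):
    """Total length of a polyline given as [(x, y), ...] in meters."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))


def vertex_angles(coords):
    """Interior angle (degrees) at each interior vertex of a polyline."""
    angles = []
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        v1 = (a[0] - b[0], a[1] - b[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue  # skip repeated vertices
        cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles


def topology_metrics(features):
    """Count tiny features, cutbacks, and exact duplicates."""
    tiny = sum(1 for f in features if length(f["coords"]) < TINY_LENGTH_M)
    cutbacks = sum(
        1 for f in features
        if any(ang < CUTBACK_DEG for ang in vertex_angles(f["coords"]))
    )
    # Key each feature by name plus a direction-independent geometry key
    # (assumption: a line and its reverse count as the same geometry).
    keys = Counter(
        (f["name"], min(tuple(f["coords"]), tuple(reversed(f["coords"]))))
        for f in features
    )
    duplicates = sum(n - 1 for n in keys.values() if n > 1)
    return {"tiny": tiny, "cutbacks": cutbacks, "exact_duplicates": duplicates}


features = [
    {"name": "MAIN ST", "coords": [(0, 0), (100, 0)]},
    {"name": "MAIN ST", "coords": [(100, 0), (0, 0)]},        # reversed copy
    {"name": "OAK AVE", "coords": [(0, 0), (3, 0)]},          # 3 m long
    {"name": "ELM ST", "coords": [(0, 0), (50, 0), (5, 5)]},  # sharp turn
]
print(topology_metrics(features))
# {'tiny': 1, 'cutbacks': 1, 'exact_duplicates': 1}
```

The metrics that depend on relationships between features (gaps, overshoots, crossings) need a spatial index and a topology build, which is where FME's transformers do the heavy lifting.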
  • 12. Completeness / Change Detection Check
    • The MAF/TIGER road centerline features and the feature source file road centerline features will be compared using an FME workspace.
    • The MAF/TIGER features will be buffered to a distance of 15 meters, then overlaid with the source file features.
    • Any source file feature parts that fall outside of the buffer areas will be chained together, and the total length of difference (and the length of each part) will be reported as an evaluation metric.
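The buffer-and-overlay comparison can be approximated without a polygon library. The Python sketch below swaps in a different technique from the FME workspace: instead of building 15-meter buffer polygons, it splits each source line into pieces of about one meter and tests whether each piece's midpoint lies more than 15 meters from every MAF/TIGER segment. Accuracy therefore depends on the hypothetical `step` parameter, coordinates are assumed to be in meters, and the function names are illustrative.

```python
import math

BUFFER_M = 15.0  # buffer distance from the slide


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment ab."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))


def outside_parts(source_coords, tiger_lines, step=1.0):
    """Lengths of the chained parts of one source line outside the buffer."""
    tiger_segs = [(ln[i], ln[i + 1]) for ln in tiger_lines for i in range(len(ln) - 1)]
    parts, current = [], 0.0
    for a, b in zip(source_coords, source_coords[1:]):
        seg_len = math.dist(a, b)
        n = max(1, math.ceil(seg_len / step))
        for i in range(n):
            t = (i + 0.5) / n  # midpoint of the i-th piece of this segment
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            d = min((point_segment_distance(mid, s, e) for s, e in tiger_segs),
                    default=math.inf)
            if d > BUFFER_M:
                current += seg_len / n  # extend the current outside part
            elif current > 0:
                parts.append(current)   # part ends where the line re-enters
                current = 0.0
    if current > 0:
        parts.append(current)
    return parts


tiger = [[(0, 0), (100, 0)]]
source = [(0, 5), (100, 5), (100, 45)]  # runs 40 m north at the east end
parts = outside_parts(source, tiger)
print(parts, sum(parts))  # [30.0] 30.0
```

Summing `parts` across all source features gives the total length of difference; the individual entries correspond to the per-part lengths the slide mentions.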
  • 13. CE95 Qualifying Intersection Identification
    • Qualifying intersections must meet the following criteria:
      – Must consist of three roads (a “T” intersection) or four roads (an “X” intersection)
      – Must consist of only secondary roads or local roads
      – Roads must meet at 90- or 180-degree angles, with a tolerance of plus or minus 15 degrees
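The three criteria on this slide can be expressed as a small predicate. This is a minimal Python sketch, assuming each candidate node is described by the bearings (in degrees) of the roads leaving it and by their MTFCC values; S1200 (secondary road) and S1400 (local neighborhood road) are the MTFCC codes for the two qualifying road classes. The function names are illustrative and are not part of the FME workspace.

```python
QUALIFYING_MTFCC = {"S1200", "S1400"}  # secondary and local roads
TOLERANCE_DEG = 15.0                   # plus/minus tolerance from the slide


def arc_angles(bearings):
    """Angles (degrees) between consecutive rays, going around the node."""
    b = sorted(x % 360 for x in bearings)
    return [(b[(i + 1) % len(b)] - b[i]) % 360 for i in range(len(b))]


def near(angle, target, tol=TOLERANCE_DEG):
    """True if angle is within tol degrees of target."""
    return abs(angle - target) <= tol


def is_qualifying(bearings, mtfccs):
    """True for a T or X intersection of secondary/local roads at ~90/180 degrees."""
    if len(bearings) not in (3, 4):          # T = 3 rays, X = 4 rays
        return False
    if not set(mtfccs) <= QUALIFYING_MTFCC:  # only secondary/local roads
        return False
    # Every gap between neighboring rays must be near 90 or near 180 degrees.
    return all(near(a, 90) or near(a, 180) for a in arc_angles(bearings))


print(is_qualifying([0, 90, 180], ["S1400", "S1400", "S1200"]))     # True
print(is_qualifying([0, 45, 180], ["S1400"] * 3))                   # False
print(is_qualifying([0, 90, 180, 270], ["S1100"] + ["S1400"] * 3))  # False
```

Because the gaps around a node always sum to 360 degrees, requiring every gap to be near 90 or 180 admits only the roughly 90/90/180 pattern for three rays and the roughly 90/90/90/90 pattern for four.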
  • 14. Thank You!  Questions?  For more information:  Jay E. Spurlin  jay.e.spurlin@census.gov  U.S. Census Bureau  http://www.census.gov/geo/www/gss/

Editor's Notes

  1. I work in the Geography Division, or GEO, as we refer to it. We manage MAF/TIGER (Master Address File / Topologically Integrated Geographic Encoding and Referencing), which is a geospatial database system. The data are stored in Oracle Spatial Topology Manager format and are used in support of the various censuses and surveys conducted by the Census Bureau.
  2. This is the basic set of steps through which a set of partner-supplied source files proceeds. Currently, this is a highly manual process, and most of the processing is done on shapefiles using ArcGIS for Desktop.
     A partner provides a set of source files; this could be through a Regional Office contact, Community TIGER, or a direct upload.
     The source files are moved inside the Census firewall via a secure web-exchange module.
     The content inventory of the files undergoes initial verification, to make sure someone has not accidentally supplied their laundry list.
     The files are preserved, as supplied, for later reference. This provides a restart point, if one is ever needed, as well as a reference against which future submissions can be compared to determine change over time.
     A more detailed content assessment is done, including verification that the files meet the minimum guidelines for content and metadata.
     The files are prepared for automated processing, including re-projection and mapping to a standardized schema. The feature names are standardized to fit the parsed MAF/TIGER naming convention, and the metadata are used to derive the MAF/TIGER Feature Classification Code (MTFCC) for each record.
     A series of (mostly) automated checks is run, which provides metrics about the data in the files. For addresses, this includes a range of geocoding checks and comparisons for the addresses and for the address point locations, if they were provided. For the spatial features, I’ll talk more about these checks in a moment.
     An interactive review is conducted, in which the files and their associated metrics are reviewed and an assessment is made of how many new features or addresses have been supplied, as well as how many attribute or shape updates.
     Based on this review, a decision is made about how to capture any new data: whether the data can continue through an automated update process or should be handled through an interactive update process.
     If the automated process is appropriate, any data that are not useful for updating the MAF/TIGER database are removed from the files.
     Features or addresses are added and/or modified using the method chosen during the interactive review: either an automated conflate-and-review process or an interactive update process.
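As an illustration of the initial content-inventory step, here is a minimal Python sketch that groups submitted file names into shapefiles and reports missing components. The required set (.shp, .shx, .dbf) comes from the Esri shapefile format; treating .prj as recommended, ignoring files that are not part of a shapefile, and the function name `inventory` are assumptions rather than the Census Bureau's actual guidelines.

```python
from collections import defaultdict
from pathlib import PurePath

REQUIRED = {".shp", ".shx", ".dbf"}  # mandatory shapefile components
RECOMMENDED = {".prj"}               # coordinate system definition (assumed)


def inventory(filenames):
    """Group files by shapefile base name and report missing components."""
    groups = defaultdict(set)
    for name in filenames:
        p = PurePath(name)
        groups[p.stem.lower()].add(p.suffix.lower())
    report = {}
    for stem, exts in sorted(groups.items()):
        if not exts & REQUIRED:
            continue  # not part of a shapefile (e.g., a stray document)
        report[stem] = {
            "missing_required": sorted(REQUIRED - exts),
            "missing_recommended": sorted(RECOMMENDED - exts),
        }
    return report


files = ["Roads.shp", "roads.shx", "roads.dbf", "roads.prj",
         "streets.shp", "streets.dbf", "laundry_list.docx"]
for stem, problems in inventory(files).items():
    print(stem, problems)
# roads {'missing_required': [], 'missing_recommended': []}
# streets {'missing_required': ['.shx'], 'missing_recommended': ['.prj']}
```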
  3. For the purposes of this discussion, we will focus on the Feature Source Evaluation software, as opposed to the Address Source Evaluation software. There are two separate, dedicated software systems for evaluating spatial features and addresses, though the GSS-I architecture is integrated to include both. The business model, hardware and software architecture, technology architecture, and security models have been integrated; only the application architectures have been separated, and only because there are established, separate areas of development expertise for spatial features, geographic entities, and addresses.
     The list of functionality on this slide is the first set of functions targeted for production release at the end of March 2013. Other checks have been proposed and will likely be added to the software at a later date. Each piece of functionality listed corresponds to a module in the Feature Source Evaluation software system.
     A number of MAF/TIGER spatial layers will be extracted for the extent of the partner entity. These will include the road centerline layer, a number of geographic entity boundaries for reference, and the topological edge layer with the primary feature name for each edge. These layers will be extracted using automated FME workspaces, but those workspaces are fairly simple: they read from Oracle Spatial using an SDO_FILTER SQL query, narrow the selection with an AreaOnAreaOverlayer or Clipper, and write to an ArcSDE geodatabase, so I don’t plan to show any examples of them.
     An analyst will use the supplied data and metadata to map the provided source schema to a standardized schema, and the supplied road centerline file will be converted to an ArcSDE layer, re-projected if necessary, and the name and MTFCC mappings applied.
     We will look at some example transformers in a few minutes.
     The feature names in the source file will be standardized to the parsed MAF/TIGER naming conventions. In production, this will be a Java application, but for the current manual procedures, an FME workspace makes an HTTPFetcher call to a published web service to do the feature name standardization, with a Decelerator to avoid overloading the web service.
     The standardized feature names will be checked for illegal characters and for prohibited or generic names; this is another Java application.
     A topological check will be run to gauge the topological stability of the source file. This will be accomplished using a fairly complicated FME workspace, which we will look at in detail shortly.
     A completeness / change detection check will be run to attempt to identify areas in the source file that contain features not found in MAF/TIGER. This will also be accomplished using an FME workspace, which we will also look at in a moment.
     A comparison will be run between the universe of feature names in the source file and the universe of feature names found in MAF/TIGER within the extent of the entity; this will be another Java application.
     All intersections that meet the requirements for conducting the CE95 accuracy assessment will be identified. The CE95 accuracy value is stated as a distance in meters and denotes the circular error at the 95% confidence level: there is a 95% chance that each coordinate falls within that distance of its “ground truth” position. This is the final FME workspace that we will be looking at today.
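Because CE95 is a 95th-percentile radius, it can be estimated empirically once the selected intersections have been measured against ground truth. The Python sketch below uses a nearest-rank percentile of horizontal offsets. This estimator is an assumption made for illustration (the FGDC National Standard for Spatial Data Accuracy, for example, derives its 95% figure from the RMSE instead), and the function names and sample values are hypothetical.

```python
import math


def horizontal_errors(measured, reference):
    """Horizontal offsets (meters) between paired (x, y) positions."""
    return [math.dist(m, r) for m, r in zip(measured, reference)]


def ce95_empirical(errors):
    """Nearest-rank 95th percentile: the smallest error bounding >= 95% of them."""
    if not errors:
        raise ValueError("at least one error value is required")
    ranked = sorted(errors)
    rank = (95 * len(ranked) + 99) // 100  # ceil(0.95 * n) in integer math
    return ranked[rank - 1]


# Hypothetical example: 100 assessed intersections with offsets of 0.1 m .. 10.0 m.
errors = [i / 10 for i in range(1, 101)]
print(ce95_empirical(errors))  # 9.5
```

With 100 assessed intersections, as in the sampling plan described for the CE95 workspace, the estimate is simply the 95th-smallest offset. Integer arithmetic for the rank avoids floating-point surprises in `0.95 * n`.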
  4. Previously, our general technology architecture as it related to FME was very simple. FME Server was installed on our production Linux batch servers, and FME Engines were invoked via the command line from Perl scripts driven by Cronacle-based control systems. To keep things simple and better highlight the differences in architecture, the illustrations on this slide and the next depict only the production, batch configuration as it relates to FME.
  5. The technology architecture for FME was restructured for GSS-I to support products and processes that depend on ArcGIS on Windows. The Geography Division deployment of ArcGIS for Server is limited to Windows servers, because a Linux deployment was not considered a viable option, for various reasons. This prompted us to research and develop a new technology architecture pattern for using FME. The old pattern is also still in use, but the new pattern will be applied to GSS-I and several other new software systems.
  6. One of the business functions for which we are using FME is crosswalking (or “transmogrification,” as some of our subject matter experts have taken to calling it). This mapping of each source file schema to a standard schema is configured using FME Workbench, and the data transformation is done using FME Server. Source schemas can, and do, vary widely. As you might imagine, the string manipulation and filter transformers come in extremely handy for these mappings.
     The example on the left shows the use of the AttributeValueMapper transformer to transform a set of road type identifiers into MAF/TIGER Feature Classification Codes.
     The example on the right shows the use of the StringSearcher transformer to find all instances of a street classification code that end with the digit ‘5’, and then set the MTFCC value to the code that designates the feature as a “Ramp”.
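The two transformer examples can be approximated in a few lines of Python. In this sketch the partner road-type codes (`HWY`, `ART`, `LOC`) and the function name `to_mtfcc` are hypothetical; the MTFCC values S1100 (primary road), S1200 (secondary road), S1400 (local neighborhood road), and S1630 (ramp) are real MAF/TIGER codes. Applying the ramp rule before the lookup is an assumption about how the two separate examples would be combined.

```python
import re

# Lookup table in the spirit of the AttributeValueMapper example
# (the partner codes on the left are hypothetical).
ROAD_TYPE_TO_MTFCC = {
    "HWY": "S1100",  # primary road
    "ART": "S1200",  # secondary road
    "LOC": "S1400",  # local neighborhood road
}
RAMP_MTFCC = "S1630"
RAMP_CODE = re.compile(r"5$")  # StringSearcher-style test: code ends in '5'


def to_mtfcc(road_type, class_code="", default=None):
    """Derive an MTFCC from a partner road type and classification code."""
    if RAMP_CODE.search(class_code):
        return RAMP_MTFCC
    return ROAD_TYPE_TO_MTFCC.get(road_type, default)


print(to_mtfcc("LOC", "41"))  # S1400
print(to_mtfcc("ART", "15"))  # S1630
print(to_mtfcc("XYZ", "40"))  # None
```

The `default` argument gives unmapped partner values a fallback, so that unexpected codes surface as a known value (or `None`) instead of raising an error mid-run.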
  7. The topology check workspace uses various transformers to collect metrics about certain types of features or feature interactions in the feature source file. Please note that not all of these are technically “wrong” topologically; they are only meant as markers for identifying general topology or network stability and for predicting MAF/TIGER update behavior. The list of metrics might shrink or grow over time, as more partner files are processed and we learn more about which situations indicate data issues or cause problems during the update of the MAF/TIGER database.
     {show topology workspace and explain}
     The road centerlines are projected to the North American Lambert Conformal Conic projection, which is conformal: it preserves shape (and thereby angles) locally, but it does not preserve distance exactly.
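The floating-features metric is essentially a connected-components problem. This is a minimal Python sketch using union-find, assuming lines connect only where they share an identical endpoint (no snapping tolerance and no mid-line intersections) and that the largest component represents the main road network; the function names are illustrative, and the FME workspace may define connectivity differently.

```python
from collections import Counter


def find(parent, x):
    """Find the root of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def floating_features(features):
    """Return ids of features outside the largest connected component.

    features: {feature_id: [(x, y), ...]}; two lines are connected only if
    they share an identical endpoint (an assumption; no snapping tolerance).
    """
    parent = {fid: fid for fid in features}
    first_owner = {}  # endpoint -> first feature seen with that endpoint
    for fid, coords in features.items():
        for pt in (coords[0], coords[-1]):
            if pt in first_owner:
                # Merge this feature's component with the endpoint owner's.
                parent[find(parent, fid)] = find(parent, first_owner[pt])
            else:
                first_owner[pt] = fid
    roots = {fid: find(parent, fid) for fid in features}
    main_root, _ = Counter(roots.values()).most_common(1)[0]
    return sorted(fid for fid, root in roots.items() if root != main_root)


roads = {
    "a": [(0, 0), (10, 0)],
    "b": [(10, 0), (20, 0)],
    "c": [(20, 0), (20, 10)],
    "d": [(50, 50), (60, 50)],  # an isolated pair of segments
    "e": [(60, 50), (70, 50)],
}
print(floating_features(roads))  # ['d', 'e']
```

Union-find keeps the pass close to linear in the number of features, which matters for county-sized partner files with tens of thousands of segments.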
  8. {show the change detection workspace and explain}
  9. Please check with LFB.
     For the CE95 accuracy assessment, qualifying intersections must be perpendicular ‘T’ or ‘X’ intersections (plus or minus 15 degrees) on secondary and/or local roads.
     {show CE95 QI workspace and explain}
     The road type selection is accomplished using a TestFilter.
     The names of the attributes that contain the MTFCC code (road type) and road name are passed in via published parameters.
     The road centerlines are projected to the North American Lambert Conformal Conic projection, which preserves shape (and thereby angle).
     The TopologyBuilder is used to find all of the intersection nodes.
     “T” and “X” intersections are identified by counting the number of rays emanating from each node star (the number of elements in the _node_angle list).
     The _fme_arc_angle values are exposed with an AttributeExposer, and a composite test in a Tester transformer checks the angle ranges.
     The nodes are projected back to NAD83.
     The requirement was to create at least 200 randomly selected nodes, with the goal of assessing the accuracy of 100 of them.
     A RandomNumberGenerator and Sorter are used to randomly sort and output all the nodes, allowing the user to work through as many as necessary.
     The CoordinateExtractor is used to expose the x and y coordinate values as attributes.
     The StringConcatenator is used to string together all of the road names, which were preserved from the line segments during the topology build.
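The node-star and random-ordering steps can also be sketched outside FME. This Python sketch assumes features are lists of at least two projected (x, y) coordinates, treats identical endpoints as the same node (a simplification of what the TopologyBuilder does), measures bearings counterclockwise from east, and uses `random.shuffle` in place of the RandomNumberGenerator and Sorter pair. The angle test itself is left out, and the function names are illustrative.

```python
import math
import random
from collections import defaultdict


def node_stars(features):
    """Group line endpoints into nodes and record each ray leaving a node.

    features: list of dicts with "name" and "coords" (projected, 2+ vertices).
    Nodes are identical endpoints (an assumption; no snapping tolerance).
    """
    stars = defaultdict(list)
    for f in features:
        c = f["coords"]
        # A ray leaves each endpoint toward the adjacent vertex on the line.
        for node, nxt in ((c[0], c[1]), (c[-1], c[-2])):
            bearing = math.degrees(math.atan2(nxt[1] - node[1], nxt[0] - node[0])) % 360
            stars[node].append((bearing, f["name"]))
    return stars


def random_candidates(stars, seed=None):
    """Nodes with 3 or 4 rays, in random order, with their road names joined."""
    rng = random.Random(seed)
    rows = [
        {"x": node[0], "y": node[1], "rays": len(rays),
         "names": " / ".join(sorted({name for _, name in rays}))}
        for node, rays in stars.items() if len(rays) in (3, 4)
    ]
    rng.shuffle(rows)  # random order, like a random key plus a sort
    return rows


roads = [
    {"name": "MAIN ST", "coords": [(-100, 0), (0, 0)]},
    {"name": "MAIN ST", "coords": [(0, 0), (100, 0)]},
    {"name": "OAK AVE", "coords": [(0, 0), (0, 100)]},
]
candidates = random_candidates(node_stars(roads), seed=42)
print(candidates)  # [{'x': 0, 'y': 0, 'rays': 3, 'names': 'MAIN ST / OAK AVE'}]
sample = candidates[:200]  # at least 200 nodes are needed for review
```

Shuffling the full candidate list, rather than drawing exactly 200, mirrors the workspace's choice to output all nodes in random order so the reviewer can keep going past the first 200 if some prove unusable.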