  • 1. Conflation, Data Quality and MADness
      • ESRI Developer Meetup, June 7th, 2011
      • U.S. Environmental Protection Agency (USEPA), Office of Environmental Information
      • David G Smith, PE, PLS
      • 202-566-0797, Smith.DavidG@epa.gov, Twitter: @DruidSmith
  • 2. Metadata??
  • 3. FRS Overview: Facility Registry System
      • FRS is a data aggregator.
      • FRS performs integration, validation and QA across over 30 federal databases and over 50 state, territory and tribal databases.
      • FRS contains information on nearly 2.8 million facilities.
      • More than 80% of facilities have lat/long information.
  • 4. What FRS Does
      • FRS improves program facility data validity from 40% to 95% by selecting the best contact and location information from multiple data sources.
      • Allows EPA, public, academic, and investment communities to evaluate compliance with environmental regulations.
      • Provides a robust, complete view of facility information, facilitating cross-media analyses: community-based initiatives; environmental justice analyses; NEPA assessments; emergency response; other mission needs (TMDL program, climate change analysis, etc.).
  • 5. FRS Features
      • Provides a more complete, holistic, cross-media view of key facility information through verification and data management procedures.
      • Incorporates layers of quality control: the FRS record is checked for completeness, consistency, and validity and is owned by FRS.
      • Integrates information from program national systems, state master facility records, tribal partners, and other federal agencies.
      • Supported by a network of data stewards covering both geographic and programmatic areas of expertise.
      • Fully integrated with the Locational Data and the Integrated Error Correction Process (IECP).
  • 6. FRS Features
      • Provides essential support for applications that rely on integrated views of facilities: GIS applications (EnviroMapper, MyEnvironment); public access applications (Envirofacts, Cleanups in My Community (CIMC)); enforcement systems and applications (IDEA, OTIS, ECHO, ICIS).
      • Offers specialized services to applications in need of accurate facility information: Emergency Response, TRI-ME web, DMR Loadings Tool.
      • Provides web services, enabling data exchanges with state partners on the Environmental Exchange Network.
  • 7. FRS Scope: Major Programs Represented in FRS
      • Air: AFS, AQS, CAMDBS, EGRID, NEI, RBLC, RFS (Ethanol)
      • Water: PCS, ICIS-NPDES, SDWIS, CWNS
      • Chemical Releases: TRIS, RMP, TSCA, SSTS, FRP, BRAC
      • Hazardous Waste: ACRES, CERCLIS, RCRAINFO, RADINFO
      • Enforcement/Compliance: ICIS, ECRM, NCDB
      • Schools: NCES, GNIS, BIA INDIAN SCHOOL
      • Other: LANDFILL
      • http://www.epa.gov/enviro/html/frs_demo/new_crosswalks.html
  • 8. FRS Data Model (High Level Data Model). Diagram labels: Organization; Industrial Classification; Affiliation; Individual; Individual; Supplemental Interest; Mailing Address; Alternative Name; Facility/Site; Geospatial; Environmental Interest
  • 9. FRS Data Pipeline
  • 10. QA Process
  • 11. Integration? Air Permit Coordinate, Water Permit Coordinate, Toxics Permit Coordinate: Best Facility Coordinate?
  • 12. FRS Processing: Q/A Enhancement, Data Collection, Data Publishing
  • 13. The FRS geospatial database provides web services, database connections and spatial queries for a wide variety of web mapping applications, for example MyEnvironment, Cleanups in My Community, IDEA/ECHO/OTIS and many others.
  • 14. For Title 40 regulated programs, CDX collects locational and parametric data for the program offices, and facility data goes to FRS.
  • 15. Several program offices have their own systems that collect and manage locational and parametric data. Envirofacts pulls data from these, and FRS serves as the locational component for Envirofacts.
  • 16. FRS contains many data enhancement, lookup and validation services that assist other CDX flows.
  • 17. FRS receives locational data updates and edits from regional data stewards as needed.
  • 18. Envirofacts pulls data from the program offices, taking in parametric data and sending locational data to FRS. FRS serves as the geospatial component of Envirofacts.


  • 19. Locational Reference Table
      • All underlying information from programs is retained, including locational data.
      • For any given facility, there may be multiple individual locations that have been gathered, e.g. an associated air stack location, water outfall location, front gate location, et cetera.
      • MAD Codes help us assess how to handle locational data quality and understand what the data represents.
      • http://www.epa.gov/enviro/html/locational/lrt_viewer.html
  • 20. MAD Codes: MAD Codes help us assess how to handle locational data quality and understand what the data represents.
  • 22. Match & Integrate Facility Data
      • Scoring method to determine if two records are the same facility:
          • 25 points: parsed street number
          • 50 points: matching standardized city name, standardized county name, state and zip
      • Score 100: an environmental interest is created for the source and associated with the matched FRS record.
      • Score 50 to 100: FRS creates a new record and a new associated environmental interest; the new record is identified as having possible matches.
      • Score <30: FRS creates a new record with a new interest.
  • 24. Business Case
      • Users benefit from high quality integrated locational data for facilities for enforcement, compliance, analysis, assessment and community impact.
      • Being able to assess and manage large amounts of data of varying quality, e.g. VGI.
  • 25. Thank You - URLs
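
The scoring rules on slide 22 can be read as a small decision function. The sketch below is a hypothetical Python illustration, not FRS code: the function names and record field names are assumptions. Only the two point values and the three score bands come from the slide. The slide does not say how the remaining points up to 100 are earned, or what happens for scores from 30 up to 50, so the sketch leaves both open.

```python
from typing import Dict, Optional

# Point values stated on the slide; other scoring components are not listed there.
STREET_NUMBER_POINTS = 25
LOCALITY_POINTS = 50  # standardized city, county, state and zip all match

# Hypothetical field names for the locality comparison.
LOCALITY_FIELDS = ("city", "county", "state", "zip")


def partial_score(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Score only the two components the slide names."""
    score = 0
    if a.get("street_number") and a.get("street_number") == b.get("street_number"):
        score += STREET_NUMBER_POINTS
    if all(a.get(f) and a.get(f) == b.get(f) for f in LOCALITY_FIELDS):
        score += LOCALITY_POINTS
    return score


def match_outcome(score: int) -> Optional[str]:
    """Map a total score to the outcome described on the slide."""
    if score >= 100:
        return "link interest to matched FRS record"
    if 50 <= score < 100:
        return "new record, flagged as having possible matches"
    if score < 30:
        return "new record with new interest"
    return None  # 30 to 49: not specified on the slide


# Made-up records for illustration only.
rec_a = {"street_number": "100", "city": "SPRINGFIELD", "county": "EXAMPLE",
         "state": "VA", "zip": "00000"}
rec_b = dict(rec_a)
print(match_outcome(partial_score(rec_a, rec_b)))
```

With only the two listed components, two identical records reach 75 points and fall into the "possible matches" band; whatever else contributes to a score of 100 is not shown in the deck.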

Editor's Notes

  1. DS will provide words for this slide
  2. Clean & validate source data, geocode. Integrate: unique facility ID linkages, matching. Select best pick data: locational & non-locational. DISCUSS BUSINESS RULES AND PRECEDENCE. Note: we don’t change program data.
  3. This slide is meant for viewing in slideshow mode. When in slideshow mode, click on the titles (Data Collection, Q/A Enhancement and Data Publishing) in order to open and close the relevant box of text.
  4. In addition to cross-media integration, FRS has a core mission of locational data improvement, and as such, FRS has a framework for working toward iterative improvement of locational data. Toward assessing locational data, it is important to understand what the data represents, to allow the best-quality data and best representation to bubble to the top. Some of the key pieces of this are the LRT, which collects up and houses all of the various locational data about a given facility. To attempt to make sense of the various locations, LRT also tracks record level metadata called “MAD codes”, and uses a “best pick” algorithm for sorting through the LRT data. These will be briefly discussed in the following slides.
  5. As mentioned, LRT is the set of database tables that house all locational data associated with a given facility. Shown here is DuPont’s Spruance facility near Richmond, VA, which has 48 LRT records coming from several programs. The table shows the data within the LRT; several of the columns deal with MAD codes.
  6. MAD codes are “method accuracy description” codes. They help us understand whether we are dealing with something like a high-precision GPS coordinate collected in the field or a very vague location (a dart thrown at a map). They also help us understand what the location represents, whether air stack locations, water outfalls, front gate, plant centroid, or another type of feature. These may be very important to program offices, but for getting a general location of where the main facility is, they may in some cases be off by quite a bit. For example, a large industrial campus might cover many square miles.
  7. MAD codes are a component of EPA’s Latitude/Longitude Data Standard, with some key pieces being information about the locational accuracy estimate, code sets for method of collection, reference point code (descriptions of what the point represents), and spatial reference system (FRS uses the federal NAD83 standard, versus WGS84 or state plane coordinates)
  8. Once the earlier data fields have been checked, records are scored by their similarity to other records in the following manner. The score determines whether or not a record is treated as a new record.
  9. The first version of the EPA Latitude/Longitude Data Standard was developed back in 1998. This current version dates from 2006 but in reality, little has changed (except for the code values used) since 1998.
  10. Are we talking about developing a business case or communicating a business case, or both? How about: “The FRS community of interest can facilitate a discussion among EN partners to review, improve and communicate the business case for FRS.” I don’t think the “FRS can work with….” phrase is clear. An FRS data steward can work with partners, but FRS itself isn’t capable of working with anyone. Does that mean that EN partners can use FRS to share data and improve its quality? This slide suggests there is some dialogue about FRS involving EN partners. Aside from the technical aspects of flowing data, I don’t think that’s happening. We aggregate data on facilities dealing with industrial classifications, points of contact, organizational affiliation and other information to provide a holistic view of the facility across programs. For data enhancements, we are now indexing facilities to census block, HUC12s, congressional district and other geographies. We are doing a number of validations to check addresses against NAVTEQ streets data and USPS postal databases; we validate lat/long values as well as doing many other checks.
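
Notes 4 to 7 describe the LRT, MAD codes and a "best pick" step without giving its rules. As a rough illustration only, the Python sketch below range-checks candidate locations for one facility, then ranks them by a reference-point preference and by accuracy estimate. The class, field names, code values and the ranking rule are all assumptions made for this example; the actual FRS business rules and precedence are not given in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LrtRecord:
    """One candidate location for a facility (hypothetical, simplified fields)."""
    program: str
    latitude: float
    longitude: float
    accuracy_m: Optional[float]  # accuracy estimate in meters, None if unknown
    reference_point: str         # what the point represents
    collection_method: str       # how the coordinate was collected


# Assumed preference for a general facility location; lower is better.
REFERENCE_POINT_RANK = {"plant centroid": 0, "front gate": 1,
                        "air stack": 2, "water outfall": 2}


def is_valid(rec: LrtRecord) -> bool:
    """Basic range check on latitude and longitude."""
    return -90.0 <= rec.latitude <= 90.0 and -180.0 <= rec.longitude <= 180.0


def best_pick(records: List[LrtRecord]) -> Optional[LrtRecord]:
    """Return the preferred valid record, or None if none is valid."""
    candidates = [r for r in records if is_valid(r)]
    if not candidates:
        return None

    def sort_key(r: LrtRecord):
        rank = REFERENCE_POINT_RANK.get(r.reference_point, 3)
        # Unknown accuracy sorts after any known estimate.
        accuracy = r.accuracy_m if r.accuracy_m is not None else float("inf")
        return (rank, accuracy)

    # Source records are never modified, only ranked.
    return min(candidates, key=sort_key)


# Made-up candidate locations for illustration only.
records = [
    LrtRecord("AIR", 37.40, -77.40, 5.0, "air stack", "GPS"),
    LrtRecord("WATER", 37.41, -77.41, 50.0, "plant centroid", "interpolation"),
    LrtRecord("TRI", 137.0, -77.40, 1.0, "plant centroid", "GPS"),  # bad latitude
]
print(best_pick(records))
```

In this made-up example the third record is dropped by the range check, and the plant centroid is preferred over the more precise air stack point because, under the assumed ranking, what a point represents outweighs how accurately it was measured.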