Creating a Data-Driven Government
Big Data With Purpose
Dr Tyrone W A Grandison
Deputy Chief Data Officer
<< Log(radiances) >>
The US as a histogram
dim light average light intense light
radiance roughly proxies for people activity
commercedataservice.github.io/tutorial_viirs_part1
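The binning idea on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the tutorial's code: the raster below is synthetic (it is not VIIRS data), and the function name `log_radiance_histogram` and the choice of 10 bins are assumptions made for the example.

```python
import numpy as np


def log_radiance_histogram(radiance, bins=10):
    """Count how many pixels fall into each bin of log10(radiance)."""
    radiance = np.asarray(radiance, dtype=float)
    # Keep only strictly positive pixels so the logarithm is defined.
    positive = radiance[radiance > 0]
    counts, edges = np.histogram(np.log10(positive), bins=bins)
    return counts, edges


# Synthetic stand-in for a nighttime-lights raster (assumption, NOT real data).
rng = np.random.default_rng(0)
fake_raster = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 100))

counts, edges = log_radiance_histogram(fake_raster, bins=10)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    # Low bins correspond to dim light, high bins to intense light.
    print(f"{left:6.2f} to {right:6.2f}: {n} pixels")
```

Each printed row is one bar of the histogram: the left-hand bins hold dim pixels and the right-hand bins hold intense ones.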
Matrix of histograms
Two histogram comparison
New York City Las Vegas
$Y(\text{Labor Force}_i) = X(\text{Radiance}_{i,j}, \ldots, \text{Radiance}_{i,n})$
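One way to read the relationship on this slide is as a regression of a labor force measure on radiance features for each area. The slide does not say which model is used, so the sketch below assumes ordinary least squares and uses synthetic numbers throughout; `true_beta`, the number of areas, and the number of radiance features are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_areas, n_features = 50, 4

# Synthetic radiance features per area, e.g. pixel counts per light-intensity
# bin (assumption: these are NOT real measurements).
X = rng.uniform(0, 100, size=(n_areas, n_features))

# Hypothetical "true" relationship used only to generate example data.
true_beta = np.array([0.5, 1.0, 2.0, 4.0])
y = X @ true_beta + rng.normal(0.0, 1.0, size=n_areas)

# Add an intercept column and solve the least-squares problem.
X_with_intercept = np.column_stack([np.ones(n_areas), X])
beta_hat, _, rank, _ = np.linalg.lstsq(X_with_intercept, y, rcond=None)

print("intercept:", round(float(beta_hat[0]), 3))
print("slopes:   ", np.round(beta_hat[1:], 3))
```

On real data the fit would need validation against held-out labor force estimates before anyone relied on it.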
Illegal fishing
gas flares
population
?!?!?!
growth and opportunity
69,583 datasets ~ 35.9%
government takes
on the hardest,
inelastic problems
“What’s your stack?”
“How fast is your GPU cluster in
traversing the graph?”
“Are you a Spark guy?”
Micro touch vs long touch
In government, there’s a lot
more to algorithmic accuracy
than
a score.
TPR
AUC
F-1
Prec.
MSE
MAPE
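For readers unfamiliar with the abbreviations on this slide, the sketch below computes each score from scratch with NumPy. The labels, predictions, and scores are toy values invented for the example, and the function names are hypothetical. The functions assume at least one positive and one negative case, and nonzero true values for MAPE.

```python
import numpy as np


def classification_scores(y_true, y_pred):
    """True positive rate (TPR), precision, and F-1 for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # predicted positive, actually positive
    fp = np.sum(~y_true & y_pred)   # predicted positive, actually negative
    fn = np.sum(y_true & ~y_pred)   # predicted negative, actually positive
    tpr = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": float(tpr), "precision": float(precision), "f1": float(f1)}


def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def regression_scores(y_true, y_pred):
    """Mean squared error (MSE) and mean absolute percentage error (MAPE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    return {"mse": float(mse), "mape": float(mape)}


# Toy values (assumptions for illustration only).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

cls = classification_scores(labels, predicted)
area = auc(labels, scores)
reg = regression_scores([100, 200, 400], [110, 190, 400])
print(cls, area, reg)
```

None of these numbers says whether a model is usable in the field, which is the slide's point.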
signal + purpose
signal + purpose
useful
information
signal + purpose
direction, meaning
signal + purpose
viability
optimum
n-dimensional
data
• A reason for existence
• Access to the field
• Access to actionable data
• Ethical intervention points
• Methodologically defensible yet intellectually
accessible
• Path to sustainability
Six conditions for data awesomeness
Influence strategy and operations
Seed for innovation
40 Projects
Algorithmic
Intelligence
For New Exporters
Our Client, Our
Goal
New Exporters
Project
XX,XXX
Case: Who is export-ready and to
what degree?
Unsupervised
Learning with a
hint of supervised
learning
Differentiated services
for
new markets
Case: A trade specialist in rural
America may need to drive 2
hours to meet a potential exporter.
Conversion
Scoring
Problem
Know your utility
before you go
Case: Which positions in a
company are likely to use which
services?
Transition
probabilities
Sets expectations
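The slide does not describe how the transition probabilities are estimated. One simple reading, sketched below, is to count observed (position, service) pairs and normalize each position's counts so they sum to 1. The job titles, service names, and the function `transition_probabilities` are hypothetical and exist only for this example.

```python
from collections import Counter, defaultdict


def transition_probabilities(pairs):
    """Estimate P(target | source) from a list of (source, target) pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1

    probs = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        # Normalize each row so the probabilities for one source sum to 1.
        probs[source] = {target: n / total for target, n in targets.items()}
    return probs


# Hypothetical observations of which position used which service.
observed = [
    ("CEO", "market research"),
    ("CEO", "trade counseling"),
    ("CEO", "market research"),
    ("Sales manager", "trade shows"),
    ("Sales manager", "trade shows"),
    ("Sales manager", "market research"),
    ("Sales manager", "trade shows"),
]

probs = transition_probabilities(observed)
for position, services in probs.items():
    for service, p in services.items():
        print(f"{position} -> {service}: {p:.2f}")
```

A table like this gives a specialist a rough expectation of what a contact is likely to need before the first meeting.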
We’re just getting
started.
Data
Education
Upskill through data
education to seed for
change and improvement
Commerce
Data Academy
Start small: an experiment
4 three-hour courses taught by General
Assembly
Pilot Results
422 Registrations
90% Attendance rate
Data Science I: Basics / Working with
Teams (Git and GitHub) / Intro to Object-
Oriented Programming (Python &
JavaScript) / Using APIs (Intro to REST) /
Intro to Photoshop / Intro to Python / Basic
SQL (Using Sqlite3) / Building APIs / Intro
to R / Intro to JavaScript / Intro to Data
Analysis with Python / Data Wrangling with
pandas / Agile Development / HTML + CSS
/ Storytelling with Data / Excel / Intro to
Machine Learning / Visual Analytics with
Python / Data Storytelling with R
2016 Season (Scale Experiment)
14 three-hour courses taught by
Commerce Data Service staff
2 two-week intensives on data science
and data visualization via General
Assembly
Option to be a data scientist or data
engineer-in-residence
Initial Response
3,500 Registrations
15 Participants for
In-Residence program
10 Bureaus represented
1 Model forked by
another federal
agency
4x more courses
6.9x growth in interest
unlimited potential
the upshot
Data skills are now a “thing”
+ there is an internal market
Data Usability
Commerce Data
valuable, open, big, underutilized, unused
Commerce Data Usability Project
commerce.gov/datausability
Find the right users
Understand security
Find affordable housing
Determine hail risk
Predict rainfall and flooding
Determine human activity using satellite data
Help with water management
a novel analysis or question posed
to the data
—
visually arresting graphics and
engagement with the public
—
open, free code and data for the
public to use
Contribute
Income
Inequality
Income Inequality is a hard
topic to interact with…
So people don’t.
How might we create a better
‘conversation’ and/or experience with
data around income inequality?
purpose
Create a basis of knowledge for
Americans on income inequality
initially…
Eventually a one-stop hub for making
income-related decisions combining
Census and BLS data.
intention
● Accessible via American FactFinder (AFF).
● AFF doesn’t show distributions of individuals.
American Community Survey (ACS)
Current Population Survey (CPS)
● Limits:
● Medians falling in the upper, open-ended interval are
plugged with “$250,000”
● The data sets aggregate everyone above $100,000 together
● Limitations on job-to-job comparison
● Granularity of breakdowns
ACS Public Use Microdata Sample
(PUMS)
● Very Rich Data Set
● Difficult To Use
The MIDAAS Project
https://midaas.commerce.gov
School-to-Prison
Pipeline
The lives of too many girls of
color are characterized by:
Early Sexual Abuse, Chronic Aversive Stress ➪
School Failure ➪ Sexual Exploitation ➪ Prison.
12% of African-American girls
7% of Native American girls
6% of white boys
2% of white girls.
Every year, girls of color are suspended from
school at higher rates than any other group
Annual Suspension Rates
Many of these girls are disproportionately funneled
through the juvenile justice system.
Girls are the fastest growing segment of the
juvenile justice system.
                         US Population   Detained and Committed
African American Girls   14%             32%
Native American Girls    1%              3.5%
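The gap between the two columns can be expressed as a simple ratio. The sketch below assumes the first figure for each group is its share of the US population and the second is its share of those detained and committed, as the column headings suggest. The function name `disproportionality_ratio` is hypothetical.

```python
def disproportionality_ratio(share_in_system, share_in_population):
    """How many times over- or under-represented a group is (1.0 = parity)."""
    if share_in_population <= 0:
        raise ValueError("population share must be positive")
    return share_in_system / share_in_population


# (share of US population %, share of detained and committed %) from the slide.
groups = {
    "African American girls": (14.0, 32.0),
    "Native American girls": (1.0, 3.5),
}

ratios = {
    name: disproportionality_ratio(system, population)
    for name, (population, system) in groups.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x their population share")
```

A ratio above 1 means the group appears in the juvenile justice system more often than its population share alone would predict.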
How Do We Use Data to Address
This Problem?
Help Girls of Color
http://www.helpgirlsofcolor.org
Stay tuned.
Dr Tyrone W A Grandison
Deputy Chief Data Officer
tgrandison@doc.gov
commerce.gov/dataservice
github.com/CommerceDataService

Editor's Notes

  1. On October 28, 2011, a Delta II rocket took off from Vandenberg Air Force Base in California.
  2. Onboard was the Suomi NPP satellite, a nearly 2,000 kg spacecraft whose mission is to add to the Earth's environmental and climate data records, helping us to better understand society. The satellite mission was made possible by a partnership between the National Oceanic and Atmospheric Administration (NOAA) and NASA.
  3. Onboard, NPP carries various instruments that collect information about the Earth system. One particular instrument, the Visible Infrared Imaging Radiometer Suite or VIIRS, a 277 kg imaging device, holds the potential to help us understand Earth in unprecedented ways.
  4. While NPP flies in a sun-synchronous orbit, the VIIRS instrument goes to work. It can see everything from atmospheric conditions, clouds, the Earth's radiation budget, clear-air land and water surfaces, sea surface temperature, and ocean color to low-light visible imagery. It also captures nighttime lights, enabling far-ranging applications.
  5. Looking at the continental US, nighttime lights are distributed in non-random patterns.
  6. On a macroscale, we can see how large cities connect to towns, with the arteries in between.
  7. We can also see activity on the high seas, with boats and oil rigs along the Gulf Coast.
  8. And it’s more than a pretty picture. It’s data. It’s big data. In fact, the US nighttime lights profile can be turned into a histogram. Think about taking a photo of the US from space with your nifty digital camera and then building a histogram of the lights. We are basically binning the light so we know how many pixels fall into each level of light intensity.
  9. And that light intensity holds the potential to help us understand population dynamics: we could ballpark the number of people on the ground, allowing researchers to tie it to labor force estimates and economic output. This representation of data holds clues to how society collectively behaves. Let's look at an example.
  10. Let’s zoom in a bit on the 35 largest metro areas in the US. See the spiderweb patterns and the clustering of light. They indicate patterns in urban development, sprawl, economic activity, and residential activity. And using nighttime lights, we can quantify them.
  11. In fact, when we break down satellite imagery into histograms, we can see clear differences in the amount and intensity of light. Cities with less light will have smaller histograms. Cities with more light and higher population density will have a tail to the right. In small cities, the more clustered the central business district is, the longer the right tail.
  12. In New York, the light distribution has a mix of dim and bright lights. But in Las Vegas, it’s dimmer with one super bright urban core. One intensely bright pixel in one city will not mean the same as an equally bright pixel in another. The clustering, residential patterns, and employment will also differ.
  13. Our team is experimenting with ways to convert the signal into more timely measures of society and the economy, and to find where we can develop derivative data series. The key to new data-driven societal insights is somewhere in that data.
  14. But we're certainly not the first to take a crack at it and it doesn’t take much effort to find brilliant scientists at Commerce who are finding ways to use the data. For example, Dr Chris Elvidge -- a remote sensing scientist based out of NOAA’s Boulder Research Facility -- has spent most of his career drumming up ways of using nighttime imagery.
  15. Using VIIRS, he has found ways to detect illegal fishing, the location and spread of wildfires, and gas flares that add greenhouse gas emissions. VIIRS can also help estimate GDP and other social indicators, especially in the rural parts of the developing world, and measure the ROI of electrification projects.
  16. The data is there. It’s collected every day. And there is more there than many of us could imagine.
  17. Just from the VIIRS instrument, we collect about 2.5 terabytes of raw data per day, which expands to much more when we consider all the processed data. This is what Commerce is about. We collect some of the highest-value data around and find ways to use it to advance and better society and the economy.
  18. This is what my team is about. I’m part of the leadership team of the Commerce Data Service, a new data startup within the Office of the Secretary, where I lead data science initiatives advancing the missions of the 12 bureaus of Commerce. The Data Service was established in November last year and we've been quickly growing and moving to take on some of the hard problems across the bureaus...
  19. Bureaus like the Census Bureau, NOAA, the Patent and Trademark Office, and the Bureau of Economic Analysis, among other agencies, which together produce about 36% of the federal open data available through data.gov. Essentially, we're one of the data big dogs.
  20. As the Deputy Chief Data Officer of the US Department of Commerce, I have the extraordinary privilege of working with some of the brightest scientists and policymakers in the country.
  21. We have satellites and radar stations that help us understand the environment.
  22. We conduct well over 200 of the highest-quality demographic and economic surveys in the world, which support research on trade, urban planning and schooling.
  23. And it's not for nothing. I'd like to take you through what it means to work on data projects in government. Government takes on the hardest problems and we need data to take on those problems. If any one person needs help and asks for help, it’s the government that needs to step up to the challenge, whether it’s for defense, homelessness, housing, healthcare, education or the economy.
  24. According to the Census Bureau, we have nearly 320 million Americans. That’s 320 million customers. At the Commerce Data Service, we are doing our part by helping to make government more data-driven. But given the nature of our portfolio, we have to work differently.
  25. I often hear people start a data conversation with “what’s your stack?”, “how fast is your GPU cluster?”, “are you a Spark guy?”. This indicates to me that someone is starting a project with technology first.
  26. Well, the thing is, our modes of interaction with our customers are not usually micro-touches such as purchases, likes and views. The actions of a government are mostly long touches -- hard conversations, in-person services, laws and policies to create the right conditions.
  27. This is a hard realization for me. The first conversation a data scientist needs to have when starting a gov project is with the people out in the field. It's humbling, it's tough, but ultimately, there is more to algorithmic accuracy than the data. There’s the operational awareness. Both are equally important. We need to take a hard look at what data can actually do.
  28. In government, data science projects need to start with conversations around signal + purpose.
  29. Signal pertains to the substance of data. It’s about whether that data even makes sense for what you want to do: whether it matches the right time frames, the geographic resolution, and the fidelity and reliability of the way it’s collected. There are data systems that can detect wildfires, but as amazing as they are, if they're slightly off the decision time scale, they can’t be used. Data is an amazing national resource, but it needs to be shaped and understood.
  30. For data to effect change, we need adoption of products. Adoption is achieved through understanding purpose. We’re here to do good. We need to have a purpose to do good.
  31. A great mission might not have good data. Great data might not have an actionable purpose. Jointly, signal and purpose are a way to proxy for viability.
  32. Ultimately, in government we do not have simple 1- or 2-dimensional problems,
  33. because data is only one of n dimensions of a project when considering everything else in the world.
  34. Thus, to ensure we're doing right by the public, we've worked out a set of six conditions for data and delivery awesomeness:
      • A reason for existence. Why is there a policy, program or process? How does it work? What is the system blueprint, both tech and social? This is the key for developing a theory of change.
      • Access to the field. We need to speak with people who actually act on information and understand how they view new products and data. It's ultimately about them.
      • Access to actionable data. We need to be able to dive quickly and deeply into the data to find signal, as a data product without signal in the data is just a pretty picture.
      • Ethical intervention points. Using the social blueprint, we need to find an intervention point where a data science product would make sense.
      • Methodologically defensible yet intellectually accessible. Many data scientists like to go down the path of algorithmic splendor, but we can't do that in our world as it alienates too many stakeholders. So, our work needs to be methodologically bulletproof by research standards but explainable by a generalist. Once we have buy-in, we can re-introduce that splendor.
      • Path to sustainability. Lastly, projects need an endpoint or a reason to be sustainable. And this is born out of testing.
  35. These conditions allow us to create change, influence strategy, and seed for innovation.
  36. And we apply this to all projects in our current portfolio of 40 projects. The vast majority are in the R&D phase, but I'd like to talk about a few projects that are now in the open.
  37. One of our efforts uses data science to help strengthen export services.
  38. And to broaden and deepen impact, ITA and the US Commercial Service, which has trade specialists in 100 cities and 75 countries worldwide, are collaborating with the Commerce Data Service to incorporate data into their US national field strategy.
  39. Example client
  40. We call this the New Exporters Project and it’s an effort to experiment using data science to combine ITA’s client data with commercial data sources to find untapped markets.
  41. In a given year, ITA reaches thousands of businesses, providing everything from business matchmaking services to market reports to company due diligence.
  42. ITA is looking to reach far more businesses through their business disruption initiative. By fine-tuning services by customer segment, they can reach a far broader audience of businesses. Here are a few examples of what data science can do:
  43. Think about all the companies that are export-ready and don’t know it. Using a combination of unsupervised learning and supervised learning, we’re developing fine-tuned ways of searching for untouched companies, figuring out which company types are more likely to use which types of services, and migrating to a market-wide view.
  44. How about the trade specialist in rural America who may need to drive 2 hours to meet a potential exporter? That’s a huge time spend. We’re developing scoring models to figure out the potential utility of our services ahead of time, before that long drive. For example, smaller manufacturing facilities may be associated with lighter touch services like market reports – so an emailed report may actually be a better first step. Likewise, small to medium sized businesses with a larger market cap in certain industries may be able to afford to invest in developing international relationships.
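The scoring idea in this note can be sketched in a few lines. This is a minimal illustration, not ITA's model: the feature names, the weights, and the rule that a longer drive requires a higher score are all assumptions made up for the example. A real model would be fitted to historical client outcomes.

```python
import math

# Hypothetical weights, for illustration only.
WEIGHTS = {
    "intercept": -1.0,
    "is_manufacturer": 0.4,
    "log_employees": 0.3,
    "prior_light_touch": 0.8,
}


def conversion_score(company):
    """Logistic score in (0, 1): assumed likelihood that a visit pays off."""
    z = WEIGHTS["intercept"]
    z += WEIGHTS["is_manufacturer"] * company["is_manufacturer"]
    z += WEIGHTS["log_employees"] * math.log(company["employees"])
    z += WEIGHTS["prior_light_touch"] * company["prior_light_touch"]
    return 1.0 / (1.0 + math.exp(-z))


def recommend_first_step(company, drive_hours):
    """Suggest a visit only when the score clears a bar that rises with drive time."""
    required = min(0.9, 0.5 + 0.1 * drive_hours)  # assumed rule of thumb
    if conversion_score(company) >= required:
        return "in-person visit"
    return "emailed market report"


small_plant = {"is_manufacturer": 1, "employees": 10, "prior_light_touch": 0}
mid_size_firm = {"is_manufacturer": 1, "employees": 500, "prior_light_touch": 1}

print(recommend_first_step(small_plant, drive_hours=2))
print(recommend_first_step(mid_size_firm, drive_hours=2))
```

The point of the design is that the specialist sees a recommendation before committing to the drive; the threshold rule is where field staff would encode their own judgment.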
  45. Which positions in a company will use which services? It may be that different positions in a given company ask for one service over another -- but creating a rule of thumb is a statistical research problem. Having biz dev in a title may be associated with more light touches. A CEO title may actually be a wildcard. So, having a good lead-off offering could be the difference between use and non-use.
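The slide for this case is labeled "Transition probabilities". One simple reading of that, offered here as a sketch under assumptions, is to estimate how often one service is followed by another in a client's history. The service histories below are invented for illustration; in practice they could be split by job title to compare positions.

```python
from collections import Counter, defaultdict


def transition_probabilities(histories):
    """Estimate first-order transition probabilities between consecutive services."""
    counts = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Hypothetical service histories, one list per client contact.
histories = [
    ["market report", "matchmaking", "due diligence"],
    ["market report", "matchmaking"],
    ["market report", "market report", "due diligence"],
]

probs = transition_probabilities(histories)
for state, row in probs.items():
    print(state, row)
```

A table like this sets expectations: it suggests which offering is a plausible lead-off and what a client is likely to ask for next.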
  46. Exporting is clearly a Commerce priority. We’re just getting started.
  47. One of the priorities at Commerce is data education and upskilling – both internally and externally.
  48. More data skills will improve efficiency. The smallest behavioral change may scale. So, at Commerce, we’ve launched an internal initiative called the Commerce Data Academy.
  49. Back in December, the Data Service launched the Commerce Data Academy to show what’s possible through data.
  50. We started with a pilot of 4 three-hour classes taught by General Assembly.
  51. And as it was a pilot, we didn’t think that we would end up with 422 registrations and a 90% attendance rate. Who would’ve thought?
  52. We then started to think… what if we went big? Hail Mary it. And expand the offering to cover JavaScript, Machine Learning and basic programming.
  53. And we scaled it to 14 three-hour classes taught by our Data Service staff, with 2 two-week-long intensives taught by General Assembly.
  54. We’ve seen a huge bump. Now we have 3,500 registrations. In addition, the 10 most committed public servants from the Academy are now on detail with our shop to exercise those new skills to build products and capacity for their home agencies. This model has worked out so well that at least one other agency has forked the CDA model.
  55. Four times as many courses led to 6.9x growth in interest, which tells us that there is unlimited potential to disrupt the skills space.
  56. The upshot is that showing our skills in the open has established data skills as a “thing” within the Department of Commerce, and there is now a new internal market for data products.
  57. Another area we are focusing on is Data Usability.
  58. Commerce has some of the most highly-valued data sets. Unfortunately, they are often under-utilized or unused, primarily because they are difficult to find, hard to understand and even harder to process (because many do not understand the collection constraints involved in the production of the data).
  59. Usability of data is dependent on the context, examples, and compelling purpose. And to help open data move to open knowledge, we’re stepping up our game. We launched the Commerce Data Usability Project to publish long form tutorials that illustrate data use cases, code, and narrative around high-value, high potential data. And it's targeted at undergraduate and graduate students -- the next generation of data scientists who are hungry to learn. We’ve partnered with private sector companies, academia, and nonprofits to show how data is being used around the country.
  60. We have a nice bench of contributors and more are always coming.
      - Mapbox has contributed two tutorials on how to get started with interactive web maps using NOAA Global Weather Forecast data;
      - Zillow has produced a tutorial on analyzing housing affordability combining their data and Census data;
      - Earth Genome illustrated how to manipulate digital elevation model data that plays a key role in wetlands models.
  61. We are highlighting the power of contextualizing and illuminating #OpenData. How many people here believe that #OpenData can currently help them find their customers and users? The Commerce Data Service provides very specific detail on doing just that using data from the Census American Community Survey (ACS). See http://commercedataservice.github.io/tutorial_acs_rank/. #OpenData from the Department can help businesses understand their computer security (http://commercedataservice.github.io/tutorial_nist_nvd/), find affordable housing options for their employees (http://commercedataservice.github.io/tutorial_zillow_acs/), determine weather risk (http://commercedataservice.github.io/tutorial_noaa_hail/), predict rainfall and flooding issues (http://commercedataservice.github.io/tutorial_mapbox_part1/), determine hotbeds of human activity using satellite data (http://commercedataservice.github.io/tutorial_viirs_part1/), and address water management concerns (http://commercedataservice.github.io/tutorial_earthgenome/).
  62. In the coming weeks, Microsoft and Columbia University have signed up to release a series of tutorials on how to begin to use analytical tools. Many more to come and we welcome collaborations. There is agreement out there that a product gets used if people are furnished with a basic understanding of what that product is. In data and tech, free and balanced education really is a powerful tool. More and more organizations want to show how open data works for them.
  63. Our tutorials are designed to engage data audiences, encourage adoption of datasets and associated workflows, and facilitate innovation. To do this, we’ve ensured that all tutorials are built according to the following guidelines:
      • A novel analysis or question posed to the data
      • Visually arresting graphics
      • Open and free code and data for the public to use
      It is important to note that we are language, method, and approach agnostic. This is what you have to do if you want to contribute to the initiative.
  64. Income Inequality is one of the formidable challenges of our time.
  65. However, it is a hard topic … and not many people talk about or interact with it because of this.
  66. Our mission was to use data to drive this conversation.
  67. We want to create a data-driven platform to focus on this issue. The first thing we have to do is examine the data sources.
  68. The ACS does not have the detail that we require.
  69. The Census Current Population Survey (CPS) has limitations that preclude us from having a conversation on the detailed data. These limitations include:
      • Medians falling in the upper, open-ended interval are plugged with “$250,000”
      • The data sets aggregate everyone above $100,000 together
      • Limitations on job-to-job comparison
      • Granularity of breakdowns
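To see why lumping everyone above $100,000 together limits the conversation, here is a small sketch. The bracket edges and the two sets of incomes are made up for illustration and are not taken from the CPS tables; the point is only that very different upper tails become indistinguishable once they share an open-ended top bracket.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bracket lower bounds; the last bracket is open-ended.
EDGES = [0, 25_000, 50_000, 75_000, 100_000]


def bracket_label(income):
    """Return the label of the bracket an income falls into."""
    i = max(bisect_right(EDGES, income) - 1, 0)
    lower = EDGES[i]
    if i == len(EDGES) - 1:
        return f"${lower:,}+"
    return f"${lower:,}-${EDGES[i + 1] - 1:,}"


def bracket_counts(incomes):
    """Count how many incomes fall into each bracket."""
    return Counter(bracket_label(x) for x in incomes)


# Two invented groups with very different top incomes.
group_a = [30_000, 60_000, 120_000, 150_000]
group_b = [30_000, 60_000, 900_000, 2_500_000]

print(bracket_counts(group_a))
print(bracket_counts(group_b))
```

Both groups produce the same bracket counts, which is exactly the detail that record-level data restores.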
  70. The PUMS is the data that we choose to use.
      Very rich data set:
      • Individual and household data sets
      • Income breakdowns by type
      • Job breakdown by industry
      • Geographic breakdown below State
      Difficult to use:
      • USA individual file alone is 2 Excel files!!!
      • Data Dictionary is 138 pages!!!
      • Very specific ways to match variables that are difficult to understand
  71. MIDAAS is an API and website that unpacks the ACS PUMS data and creates a forum for us to have that discussion.
  72. Another issue is the School-to-Prison pipeline.
  73. We’re just warming up. That’s just a few of the 40 projects. Big ones on the way. Stay tuned.