Using AI to unleash the power of unstructured government data
- 1. Using AI to unleash the power of unstructured government data: Applications and examples of natural language processing (NLP) across government
Deloitte Center for Government Insights
Copyright © 2019 Deloitte Development LLC. All rights reserved.
- 2. Why is NLP required?
• Government agencies awash in unstructured data.
• Difficult to analyze unstructured text.
• Useful information may be trapped inside the data.
How to tap into such information?
Solution: Natural Language Processing (NLP)
• Derive actionable insights
• Connect the dots
• Facilitate policy analysis
But…
- 3. What is NLP?
• A branch of Computer Science
• Also known as Computational Linguistics
• Allows the computer to communicate with humans (text, audio, …)
Helps generate information and absorb information
Uses technology from
• Linguistics
• AI
• Machine learning
• Formal Language Theory
Ultimate goal is to help communication
Source: Ayn de Jesus, “AI for speech recognition,” August 2018; SAS Institute Inc., “Natural language processing: What it is and why it matters,” accessed December 19, 2018.
- 4. The evolution of NLP and underlying algorithms
[Timeline figure]
1949: IBM sponsored the ‘Index Thomisticus’, a computer-readable compilation of St. Aquinas’ works
1950: Turing test of “Computing Machinery and Intelligence”
1954: Georgetown Russian translation experiment
1960s: Pattern recognition and “nearest neighbour” algorithms
1980s to 2000s (the slide layout does not preserve the exact decade of each item): Machine Learning algorithms introduced; richer statistical models; advanced speech recognition technologies; topic modelling introduced; Natural Language Generation takes off
2003: Advanced topic models such as LDA introduced
2006: The term ‘deep learning’ introduced
2015-16: Neural Machine Translation gets implemented
2017: Conversational AI gathers momentum
Source: Roberto Busa, S.J., and the Invention of the Machine-Generated Concordance; Bhargav Shah, “The power of natural language processing: Today’s boom in artificial intelligence,” Medium; Chris Smith et al., “The history of artificial intelligence,” University of Washington; Eric Eaton, “Introduction to machine learning,” presentation, University of Pennsylvania; Kendall Fortney, “Pre-processing in natural language machine learning,” Towards Data Science; Clark Boyd, “The past, present, and future of speech recognition technology,” Medium; Hofmann, “Probabilistic latent semantic indexing,” Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval; Robert Dale, Barbara Di Eugenio, and Donia Scott, “Introduction to the special issue on natural language generation,” September 1998; Medium, “History and frontier of the neural machine translation,” August 17, 2017; Ram Menon, “The rise of the conversational AI,” Forbes, December 4, 2017.
- 5. Key capabilities of NLP
• Information Extraction: Automatically extracts structured information from unstructured documents.
• Text Categorization: Automatically categorizes documents into a predefined set of categories.
• Text Clustering: Groups documents into clusters so that documents within a cluster are similar to one another.
• Relationship Extraction: Establishes semantic relations between entities.
• Named Entity Resolution: Automatically extracts and classifies the named entities into predefined labels and links them to a specific ontology.
• Topic Modeling: Automatically uncovers hidden topics from large collections of documents.
• Sentiment Analysis: Decodes the meaning behind human language and helps analyze people’s sentiment.
Source: Fabrizio Sebastiani, “Text categorization,” 2005; Meaning Cloud, “What is text clustering?”; Paul A. Watters, “Named entity resolution in social media,” Elsevier, 2016; Nguyen Bach and Sameer Badaskar, “A review of relation extraction,” May 2011, Stanford.edu. “Sentiment Analysis- What is Sentiment Analysis”.
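To make the text categorization capability concrete, here is a minimal sketch using a naive Bayes classifier written with the Python standard library only. The classifier choice, the function names, and the training sentences are illustrative assumptions; the slides do not name a specific algorithm or dataset.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(labeled_docs):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary

def categorize(text, model):
    """Return the category with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training examples, invented for illustration only.
training = [
    ("The clinic reported a delay in patient vaccine records", "healthcare"),
    ("Hospital staff requested updated drug safety guidance", "healthcare"),
    ("The utility filed its annual emissions and energy report", "energy"),
    ("Wind and solar permits increased across the region", "energy"),
]
model = train_naive_bayes(training)
predicted = categorize("New guidance on vaccine safety for hospital patients", model)
print(predicted)
```

A production system would need far more labeled documents per category than this toy set; the sketch only shows the shape of the task: labeled examples in, a predefined category out.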
- 6. NLP cuts across domains and can help address critical government issues
Domains:
• Healthcare
• Defense and National Security
• Financial Services
• Energy and Environment
Government issues:
• Analyze public feedback
• Improve predictions
• Improve regulatory compliance
• Enhance policy analysis
- 7. Improving predictions at the US Food and Drug Administration (FDA)
The National Center for Toxicological Research (NCTR) at FDA used
NLP to identify relevant drug groups.
Topic modeling was used on:
➢ 10 years of reports extracted from the FDA’s Adverse Event
Reporting System (FAERS).
➢ Over 60,000 drug-adverse event pairs.
The objective: To better predict potential adverse drug reactions.
Source: Mitra Rocca, “Lessons Learned from NLP Implementations at FDA,” U.S. Food and Drug Administration,
June 15, 2017.
- 8. Improving forensics investigations at DARPA, Department of Defense
In 2012, the DoD’s Defense Advanced Research Projects Agency (DARPA) launched the Deep Exploration and Filtering of Text (DEFT) program.
The DEFT program uses NLP in an effort to uncover connections implicit in large text documents.
The objective: To improve the efficiency of defense and intelligence analysts who investigate multiple documents to detect anomalies and causal relationships.
Source: Boyan Onyshkevych, “Deep Exploration and Filtering of Text (DEFT),” US Department of Defense, Defense Advanced Research Projects Agency.
- 9. Analyzing public feedback for the UK government
Source: Gov.UK, “Understanding More from User Feedback,” by Dan Heron, Data in Government Blog, November 9, 2016,
https://dataingovernment.blog.gov.uk/2016/11/09/understanding-more-from-user-feedback/ .
The UK government uses Latent Dirichlet Allocation (LDA), a
method of topic modeling, to analyze public comments on
GOV.UK.
LDA is designed to help the government to uncover any
relation between customer complaints and comments.
For example, mortgage complaints often contain allegations of
racial discrimination.
The objective: To enable the government to better address
public feedback.
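As a rough illustration of how LDA assigns words to topics, the sketch below implements a small collapsed Gibbs sampler using only the Python standard library. It is not the GOV.UK team's implementation: the function name `lda_gibbs`, the hyperparameter values, and the sample comments are all assumptions made for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (doc_topic, topic_word, vocabulary), where doc_topic[d][k] and
    topic_word[k][w] hold raw topic-assignment counts.
    """
    rng = random.Random(seed)
    vocabulary = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocabulary)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight each topic by its fit to this document and this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocabulary

# Hypothetical feedback comments, invented for illustration only.
comments = [
    "passport renewal form would not submit online".split(),
    "online passport form kept timing out".split(),
    "tax refund payment arrived late".split(),
    "late tax payment penalty was unclear".split(),
]
doc_topic, topic_word, vocabulary = lda_gibbs(comments, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [word for word, count in counts.most_common(3) if count > 0])
```

On a corpus this small the resulting topics are unstable and depend on the seed; real feedback analysis would use thousands of comments and a maintained library rather than a hand-written sampler.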
- 10. Informing policy-making at the Food and Drug Administration (FDA)
The Center for Tobacco Products (CTP), part of the FDA, uses topic modeling to identify key terms and cluster documents based on topics (menthol, youth consumption of menthol, etc.).
The objective:
➢ To understand the impact of manufacture and distribution of tobacco products.
➢ To inform policy-making, particularly concerning the implicit marketing of tobacco products to youths.
Source: U.S. Food and Drug Administration, “Data Mining at FDA,” by Hesha J. Duggirala et al, August 20, 2018,
https://www.fda.gov/scienceresearch/dataminingatfda/ucm446239.html.
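The slide does not describe how CTP identifies key terms. As one simple stand-in for that step, the sketch below ranks terms per document by TF-IDF, a different and much simpler technique than topic modeling. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_key_terms(documents, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    key_terms = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        key_terms.append(ranked[:top_n])
    return key_terms

# Hypothetical document snippets, invented for illustration only.
documents = [
    "menthol cigarettes menthol flavor sales report",
    "youth survey on menthol use among youth",
    "sales report on cigar distribution",
]
key_terms = tfidf_key_terms(documents)
for terms in key_terms:
    print(terms)
```

Terms that appear in every document score zero under this weighting, which is why common words drop out without a stopword list.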
- 11. Getting started with NLP
• Define the problem: Define the problem that the agency faces and identify which technologies, including NLP, might best address it.
• Build the team: Create a team at the beginning of the project and define specific responsibilities. Recruit data science experts from outside to build a robust capability.
• Identify the data: Clean the data, create labels, and perform exploratory analysis. Some datasets may be easily acquired; others may not be in a machine-readable format.
• Develop models: Develop the NLP models that best suit the needs of the initiative leaders. The data science team could develop ways to reuse the data and code in the future.
• Test and deploy the model: Amend the NLP model based on user feedback and deploy it after thorough testing. Most importantly, train the end users.
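The data step (clean the data, create labels, perform exploratory analysis) might look like the following minimal sketch. The stopword list, function names, and sample records are assumptions for illustration, not part of the original guidance.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real projects use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "was", "my"}

def clean_text(raw):
    """Strip markup remnants, lowercase, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    letters_only = re.sub(r"[^a-z\s]", " ", no_tags.lower())
    return re.sub(r"\s+", " ", letters_only).strip()

def explore(records):
    """Summarize (text, label) records: label balance and frequent terms."""
    label_counts = Counter(label for _, label in records)
    term_counts = Counter()
    for text, _ in records:
        term_counts.update(
            term for term in clean_text(text).split() if term not in STOPWORDS
        )
    return label_counts, term_counts

# Hypothetical labeled records, invented for illustration only.
records = [
    ("<p>The permit office was CLOSED on Monday!</p>", "complaint"),
    ("Thanks for the quick permit renewal.", "praise"),
    ("My permit application is still pending...", "complaint"),
]
label_counts, term_counts = explore(records)
print(label_counts.most_common())
print(term_counts.most_common(5))
```

Checking label balance and the most frequent terms early can show whether the labels are skewed or the text still contains extraction debris before any model is trained.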
- 12. Contacts
William D. Eggers
Executive Director, Deloitte Center for
Government Insights
Deloitte Services LP
+1 571 882 6585
weggers@deloitte.com
Matt Gracie
Managing Director, Strategy and Analytics,
Deloitte Consulting LLP
+1 410 507 7839
magracie@deloitte.com
- 13. About Deloitte
Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”), its network of member firms, and their related entities. DTTL and each of
its member firms are legally separate and independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients. In the United States, Deloitte refers to one or more
of the US member firms of DTTL, their related entities that operate using the “Deloitte” name in the United States and their respective affiliates. Certain services may not be available to attest
clients under the rules and regulations of public accounting. Please see www.deloitte.com/about to learn more about our global network of member firms.
Disclaimer
This publication contains general information only and Deloitte is not, by means of this publication, rendering accounting, business, financial, investment, legal, tax, or other professional advice or
services. This publication is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any
decision or taking any action that may affect your business, you should consult a qualified professional advisor.
Deloitte shall not be responsible for any loss sustained by any person who relies on this publication.