Making Decisions in a World Awash in Data: We’re going to need a different boat : Anthony Scriffignano’s Talk for the MIT Program on Information Science
In his abstract, Scriffignano summarizes as follows:
I explore some of the ways in which the massive availability of data is changing the types of questions we must ask in the context of making business decisions. Truth be told, nearly all organizations struggle to make sense out of the mounting data already within the enterprise. At the same time, businesses, individuals, and governments continue to try to outpace one another, often in ways that are informed by newly-available data and technology, but just as often using that data and technology in alarmingly inappropriate or incomplete ways. Multiple “solutions” exist to take data that is poorly understood, promising to derive meaning that is often transient at best. A tremendous amount of “dark” innovation continues in the space of fraud and other bad behavior (e.g. cyber crime, cyber terrorism), highlighting that there are very real risks to taking a fast-follower strategy in making sense out of the ever-increasing amount of data available. Tools and technologies can be very helpful or, as Scriffignano puts it, “they can accelerate the speed with which we hit the wall.” Drawing on unstructured, highly dynamic sources of data, fascinating inference can be derived if we ask the right questions (and maybe use a bit of different math!). This session will cover three main themes: the new normal (how the data around us continues to change), how we are reacting (bringing data science into the room), and the path ahead (creating a mindset in the organization that evolves). Ultimately, what we learn is governed as much by the data available as by the questions we ask. This talk, both relevant and occasionally irreverent, will explore some of the new ways data is being used to expose risk and opportunity and the skills we need to take advantage of a world awash in data.
1. MIT Brown Bag Session:
Making Decisions in a
World Awash in Data:
We’re going to need a
different boat!
Anthony J. Scriffignano, Ph.D.
SVP/Chief Data Scientist
Dun & Bradstreet
14 November 2016
2. 2
The Jetsons image is licensed under Creative Commons
We need to seriously think
about the implications of
trivial inference from data…
3. 3
How is the data landscape changing?
What are we doing in data science to respond?
What type of skills and thinking are required to
remain relevant in this evolving world?
4. 4
The New Normal:
How the data around us
continues to change
What’s changing?
What’s New?
When do we have enough?
5. 5
We must critically observe what is changing
Reading about events that happened in the past, listening to people who are not present.
Reading about things in the “now” – can’t tell who is communicating with whom – does anybody really know what is going on?
Courtesy Get Smart (1965–1970)
Information was created and shared in silos
According to Google Plus: Precisely five photographs were ever taken of Neil Armstrong while Apollo 11 operated on the surface of the moon. Only four of those photos show Armstrong outside the Lunar Module and actually moonwalking. Only three of them show Armstrong in direct view, rather than a reflection. (Aug 31, 2012)
6. 6
A qualitative look at a shorter period from the
perspective of an extensible analytical frame…
What about the digital footprint of all of the smartphones?
What about the social networks of the crowd?
What about the metadata in the photos?
What are the opportunity costs to other activities?
The largest corpus of data preceded the event
Most data created about the event had significant and asymmetric latency
The rate of “data decay” attributable to the participants in the event is significant
Pope Benedict Inauguration
Pope Francis Inauguration
BEWARE THE DANGERS OF MAKING INFERENCE OUTSIDE FRAMES OF REFERENCE. KNOW THE RULES OF WHEN YOU CAN, OR PAY THE PRICE!
8. 8
When is enough enough?
Identify the Scenario
• Relative size
• Key question
Assess
Dispositive Threshold
• Estimate
• Triangulate
Decision Elasticity
• Bias
• Opportunity cost
DATA IN HAND
DISCOVERABLE DATA
EXISTING BUT INACCESSIBLE DATA
@Scriffignano1
9. 9
VOLUME
Data sensing
Curation, data at rest vs. data in motion
More is not necessarily better
VELOCITY
Judging
Simultaneity of truth
Time of curation vs. time of creation
The myth of real time data
VERACITY
Triangulation, non-regressive methods
Malfeasance innovation, regulatory
All true data is not necessarily simultaneously true
VARIETY
Entity extraction, discovery
Data that exists but is unavailable, unstructured
More data is being disregarded than used
VALUE
Disambiguation
Opportunity cost of curation, single use data
Value deteriorates at an alarming rate
Challenging assumptions: Taking a More Sophisticated View of “Big Data”
10. 10
This change gives rise to choices about “where to land”
• Where to “land”
• How to measure in a changing environment
• Implications of choosing poorly
11. 11
How are We Reacting:
Bringing data science into the room
“Black cat” problems
Emerging relationships
Quantum observation / thinking
12. Implications
Emerging Digital Technologies: Behaviors
In scope for today’s discussion

Emerging trend: Adoption of Natural Language Processing
Description:
• Semantic Disambiguation: automated parsing and analysis of unstructured and structured text, and spoken language – or, in a more advanced state, across multiple languages
Opportunities:
• Detecting conversations of interest in the ocean of social chatter
• Foundational capability for detecting relationships between entities of interest and their behaviors
Risks:
• Falling behind “bad guys,” who may use increasingly sophisticated “cloaking” techniques in digital communications

Emerging trend: Export of Fintech innovation into other areas
Description:
• Social decision-making
• Blockchain and related technologies: promotes transparency and security of transactions
Opportunities:
• Supports regulatory efforts, e.g. privacy
• Exporting techniques outside of finance
Risks:
• New malfeasance, especially with alternate digital identity
• Dispositive threshold

Emerging trend: Emergence of AI in place of human decision makers
Description:
• Relegation of decision making to algorithms, e.g. in trading, but potentially in healthcare, transportation, energy, etc.
Opportunities:
• Significantly faster, cheaper and more efficient outcomes
• Drives scale / consistency
Risks:
• Cybersecurity is even more critical
• Economically optimal and socially beneficial?
• Perpetuating heuristic error
13. 13
How data is discovered and used is constantly changing, and so is its context.
Situational awareness…
Unstructured Data
• More than 85% of data creation is unstructured
• Language and use of language are constantly evolving
• Commonly available tools and solutions only address a small part of this space
14. 14
Focusing on smaller parts of the problem space…
CHALLENGES
The “John Smith” problem
Caroline M Smith
302 N Liberty St.
Albion, IA
Addr. Type: Residential
Carrie Smith
Monolith Corporation
1716 Locust St.
Des Moines, IA
Addr. Type: Commercial
Caroline Smith
University of Iowa
21 E Market St.
Iowa City, IA
Addr. Type: Commercial
The “Sybil” problem
Carrie Smith
Tenderheart Daycare
2635 Cleveland Dr.
Adel, IA
Addr. Type: Commercial
The “Ann Taylor” problem
PROGRESSIVE DECOMPOSITION
Who is speaking?
About whom?
How do they feel?
In what context?
EPISTEMOLOGY
Dispositive threshold
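
One way to picture the “John Smith” problem with the records on this slide is a naive pairwise name comparison. The sketch below is illustrative only: the field names, the `NICKNAMES` table, and the `name_key` helper are assumptions introduced here, not the method described in the talk.

```python
from itertools import combinations

# Records copied from the slide; the field names are assumptions for illustration.
RECORDS = [
    {"name": "Caroline M Smith", "org": None, "city": "Albion, IA"},
    {"name": "Carrie Smith", "org": "Monolith Corporation", "city": "Des Moines, IA"},
    {"name": "Caroline Smith", "org": "University of Iowa", "city": "Iowa City, IA"},
    {"name": "Carrie Smith", "org": "Tenderheart Daycare", "city": "Adel, IA"},
]

# Hypothetical nickname table; a real system would need a far richer one.
NICKNAMES = {"carrie": "caroline"}


def name_key(name):
    """Reduce a name to (canonical first name, last name), dropping middle initials."""
    parts = name.lower().replace(".", "").split()
    first, last = parts[0], parts[-1]
    return (NICKNAMES.get(first, first), last)


def candidate_pairs(records):
    """Return index pairs whose normalized names collide."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if name_key(a["name"]) == name_key(b["name"])
    ]


pairs = candidate_pairs(RECORDS)
print(len(pairs), "candidate pairs from", len(RECORDS), "records")
```

Under this rule every pair of records is a candidate match, so the name alone cannot say whether these are one person or four. Deciding that requires further evidence (employer, address type, location) and some notion of when the evidence is sufficient.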
15. 15
At the same time, we must continue to innovate in traditional spaces
where we have strong enabling capabilities.
16. 16
Intentionally focusing on smaller parts of the problem space…
THE CHALLENGE
The “John Smith” problem
Caroline M Smith
302 N Liberty St.
Albion, IA
Addr. Type: Residential
Carrie Smith
Monolith Corporation
1716 Locust St.
Des Moines, IA
Addr. Type: Commercial
Caroline Smith
University of Iowa
21 E Market St.
Iowa City, IA
Addr. Type: Commercial
The “Sybil” problem
Carrie Smith
Tenderheart Daycare
2635 Cleveland Dr.
Adel, IA
Addr. Type: Commercial
What would it look like?
The “Ann Taylor” problem
Who is speaking?
About whom?
How do they feel?
In what context?
Epistemology
17. 17
Simple mean of sentiment
Weighted mean of sentiment
Standard deviation of sentiment
Thought experiment (assuming completion)
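
The three measures named on this slide can be sketched as follows. The sentiment scores and weights are made-up values for illustration (the slide gives no data), and the idea that a weight might stand for something like a speaker's reach is an assumption.

```python
import math


def simple_mean(scores):
    """Unweighted average of sentiment scores."""
    return sum(scores) / len(scores)


def weighted_mean(scores, weights):
    """Average of sentiment scores where each score counts in proportion to its weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def std_dev(scores):
    """Population standard deviation of sentiment scores."""
    mu = simple_mean(scores)
    return math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))


# Hypothetical data: +1.0 is fully positive, -1.0 is fully negative.
scores = [1.0, -1.0, 1.0, -1.0]
weights = [3.0, 1.0, 3.0, 1.0]  # assumed weighting, e.g. by reach

print("simple mean:  ", simple_mean(scores))
print("weighted mean:", weighted_mean(scores, weights))
print("std deviation:", std_dev(scores))
```

With these values the simple mean is 0.0, which looks neutral, while the standard deviation of 1.0 shows the speakers are split between two extremes, and the weighted mean of 0.5 shows the more heavily weighted voices are positive. No single one of the three measures tells the whole story.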
18. 18
New Answers…
Are there identifiable
modes? Do they change
when speaking about
their own enterprise?
Is the leadership leading
or lagging mean
sentiment? How does this
change over time? How
do leaders influence?
19. 19
To avoid getting distracted by “all things social”, the
science involves continuous evolution and focus on
specific use cases that drive value
USE CASES
Context / Behavior
Sentiment Attribution
Entity Extraction
CONFOUNDING CHARACTERISTICS
Sarcasm: ABC corporation is a wonderful company, if you don’t do business with them.
Neologism: Be sure to like us on FaceBook and use #shallow when you Tweet.
Grammar variations: FBI is Hunting Terrorists With Explosives.
Punctuation / Intrusion of foreign language: “Hi mom!” vs. “Hi, mom?”
Intentional mis-spelling: RU There?
DERIVING EMPIRICAL MEASURES THAT INFORM USE CASES
D&B proprietary information, do not distribute or copy without permission
Feedback
21. 21
Watch this space…
• Inter- and intra-language correlation : deciding when things mean the same thing
• Inter- and intra-language transformation : transforming inference among languages
• Changing behavior to attract/obviate grapheme analysis : reacting to changing language
• Emerging “metalanguage” (e.g. “textspeak”) : reacting to language about language
• A language of “things” : reacting to new languages used by automation
• Using language to hide language : reacting to attempts to obscure via language
• Unicode is not universal… : understanding the limitations of automation
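
One narrow, concrete instance of the last bullet: two strings that look identical on screen can be different sequences of code points, so a naive comparison treats them as different. This is a minimal sketch using Python's standard `unicodedata` module; the example word is chosen here for illustration and does not come from the talk.

```python
import unicodedata

composed = "caf\u00e9"     # 'é' stored as one code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)

# The two strings render the same but differ in length and content.
same_raw = composed == decomposed

# Normalizing both to the same form (NFC here) makes them comparable.
same_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(len(composed), len(decomposed), same_raw, same_nfc)
```

Normalization handles this particular case, but it does not address the broader issues on the slide, such as meaning that shifts across languages or deliberate obfuscation.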
22. Implications
Emerging Digital Technologies: Relationships
In scope for today’s discussion

Emerging trend: Connected Space
Description:
• Discovery of Complex Counterparty Relationships: construction of an n-dimensional space where dyadic relationships are established into a connected graph
Opportunities:
• Discovering relationships between entities of interest and their behavioral patterns
Risks:
• Changing the environment by measuring it (e.g. fraud) and creating smarter malfeasance
• Confirmation bias based on available data

Emerging trend: Large Scale Machine Learning
Description:
• Intelligence models which rely on computational intelligence: automated learning, classification and storage
Opportunities:
• Extending discovery of relationships beyond rule-based approaches
• New valuable insights, esp. counter-intuitive and paradoxical
Risks:
• “Garbage in, garbage out”
• Exporting human bias into training sets

Emerging trend: Digital Everything
Description:
• Scientific concepts become computable as data is digitized in new and more nuanced ways (e.g. digital X-ray, fitness monitors, autonomous vehicles)
Opportunities:
• Discovery of patterns and solutions that are theoretically “invisible”
Risks:
• A powerful weapon in hands of a “bad guy”
• Laws will always lag evolution
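
The “Connected Space” idea, dyadic relationships assembled into a connected graph, can be sketched in a few lines. This is a simplified illustration, not the approach used at Dun & Bradstreet: the entity names are hypothetical, and the graph is a plain undirected one with none of the dimensions (time, perspective) the slides mention.

```python
from collections import defaultdict, deque


def build_graph(dyads):
    """Turn a list of (entity, entity) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in dyads:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Group entities that are linked directly or through intermediaries (BFS)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical counterparties; each pair is one observed relationship.
dyads = [("A Corp", "B Ltd"), ("B Ltd", "C Inc"), ("D LLC", "E GmbH")]
components = connected_components(build_graph(dyads))
for component in components:
    print(sorted(component))
```

Here "A Corp" and "C Inc" never appear in the same pair, yet they land in the same component through "B Ltd". Surfacing that kind of indirect relationship is what the graph view adds over a list of pairs.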
23. 23
The Black Cat Problem
… but also some very ominous
Dealing with “Black Cat” problems
• Signals
• Systemic measures
• Anomaly detection
• Isotropism
• Character / quality measures
• Data sensing
• Triggers
24. 24
Visualizing extremely complex, changing relationships addresses questions never before feasible
Dyadic relationships across multiple perspectives
Abstracting dimensions
Time
“What If” scenarios
News
Social signals
Market Data
Events
Internal Data
Some examples
• Understanding Signals derived from changes to business information
• Discovering and investigating clusters of unusual behavior
• Exploring the impact of new regulation
• Applying standard measures to a highly dynamic environment
• Exploring the impact of new market forces
• Studying the real or potential impact of supply chain interruptions
• Investigating emerging capabilities (e.g. reputational risk)
Asking new questions never before feasible
Observing key measures over time
Blending with similarly constructed graphs
25. Understanding language as a changing phenomenon
leads to greater computational capability with
regard to changing behavior
TRADITIONAL VIEW
Money laundering
Corporate Identity Theft
Bust-out
Shell Company
Trade Rings
MORE NUANCED VIEW
Cybersecurity - inside out/outside in
Data sovereignty
Permissible use
Discovering prior behavior vs. emerging behavior in extremely large sets of data
26. Manifesting new relationships in our behavior
Data Privacy
Expressed Consent
Intellectual Property
Transferring data across borders
New data assets
Compliance with industry standards and best practices
Data sovereignty
Cybersecurity
Integrated global value chain
Emergency preparedness
National security: cyber-everything
Border protection
Balance of Trade
27. 27
The Path Ahead:
Creating a mindset in the
organization that evolves
Where do we need to focus
Future challenges
28. Implications
Emerging Digital Technologies: “Things”
In scope for today’s discussion

Emerging trend: Changing Nature of Computing
Description:
• Neuromorphic computing: Biologically-inspired techniques aiming to mimic human thought
• Quantum computing: Non-Boolean physical and algorithmic devices
• Computing as a commodity: ubiquitous access to unlimited computational power on a “pay-as-you-go” model
• Analytics / Data Science as a Service: Attempts to deliver capabilities down-market or in a more scalable way
Opportunities:
• Solutions to previously intractable problems enabled by increased “parallelism,” and ability to compute on non-binary concepts
Risks:
• ‘Bad guys’ get access to unprecedented computing power and techniques
• Need to rethink cybersecurity

Emerging trend: Internet of Things
Description:
• Objects connected on a network, sending and receiving data, distinctions by use (e.g. Industrial)
Opportunities:
• Gleaning previously unobserved patterns of behaviors and relationships
Risks:
• Cybersecurity and privacy challenges
• Authentication / validation
29. 29
Quantum, Neuromorphic Computing
Technology to go beyond digital / Boolean computation
Neural Networks – integrated computing models inspired by biological analogs
Quantum Computing – Continuous, non-Boolean computational engines and algorithms
Substitution of Addition/Multiplication/AND & OR with Translation/Rotation & Expectation
Potential applications:
• Quicker non-deterministic searches from Quantum Algorithms
• Reconstruction of implied data/metadata models
• Quantum Pattern Discovery (e.g. anisotropism, neosophism)
30. 30
Quantum Computing
Information stored in qubits; qubits can be in base states (|0>, |1>) or any superposition or entanglement (inner product) of these.
Each additional qubit adds infinitely many more superpositions
Calculations are performed by unitary transformations – essentially linear translations or rotations – on the qubit states
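
A single-qubit illustration of “calculations are performed by unitary transformations”: the sketch below represents a qubit state as two complex amplitudes and applies the Hadamard gate, a standard unitary. It is a classical toy simulation in plain Python, chosen here for illustration. It covers one qubit only and says nothing about entanglement or real quantum hardware.

```python
import math


def apply_gate(gate, state):
    """Multiply a 2x2 matrix by a 2-element state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return [abs(amplitude) ** 2 for amplitude in state]


KET_0 = [1 + 0j, 0 + 0j]  # base state |0>

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]  # unitary: applying it twice gives the identity

superposed = apply_gate(HADAMARD, KET_0)     # equal superposition of |0> and |1>
restored = apply_gate(HADAMARD, superposed)  # back to |0>, up to rounding

print(probabilities(superposed))
print(probabilities(restored))
```

After one application the two outcomes are equally likely (about 0.5 each). A second application returns the state to |0>, showing that the transformation is reversible and preserves total probability, which is what being unitary means in practice.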
31. 31
Food for Thought…
Traditional thinking
• Deterministic Add/change/delete/search
• Workflow
• Value Chain dynamics
• Regressive analysis
Quantum thinking
• Non-Deterministic approach to monotonicity
• Probabilistic workflow / heuristics
• Value Chain order-n understanding
• Non-Regressive analysis, especially with new data/new behavior
34. 34
It is very important to have a working definition of
“Innovation”
New ways of doing old things?
Doing new things?
(Traditional) Ways of doing new things?
Creating simpler unsolved problems?
New ways of knowing?
New ways of cheating?
35. 35
Reflecting on the “Data Journey” of Large Multinational Organizations
From focus on data to focus on meaning
From analytic focus to holistic focus
From ingestion to curation
From early adoption to integrated value chain
Technology Process
Mindset People
36. 36
The Evolving Leadership Mindset for Data-Inspired Organizations
Initial Focus
• Awareness of a skills gap
• Recognition of significant risk and opportunity
• Some degree of feeling overwhelmed by the “new”
• Silos of evolution, internal competition due to new roles/responsibilities
Emerging Mindset
• Focus on evolving the existing workforce and hiring new skills
• As landscape rapidly changes, skills required shift from tools to competencies
• More focus on governance, provenance, security, malfeasance
• More educated customers placing higher demands on the organization
Modern Mindset
• Breaking down silos of data and innovation
• Ability to measure and respond in a truly agile fashion
• Reflective leaders who constantly re-evaluate and constantly learn
• Analytical, data-based decisions evolve through new methods, learning
37. 37
Reflecting on the Journey -- Things to Consider
Too Much Data?
• Start with a problem or question, not a tool or dataset
• Understand the going-in assumptions
• Continuously evaluate how the environment is changing
New Types of Data
• Select methods (especially non-regressive) carefully
• Use new methods and visualizations for a reason, not for an expedient
• Be aware of “Black Cat” problems
The Burning Platform
• Continuously evaluate new skills and capabilities
• Continuously evaluate new ways of knowing, breaking down problems into smaller pieces, reducing complexity
• The decision to do nothing is a decision to sink further behind
38. 38
We can’t solve problems by using
the same kind of thinking we used
when we created them.