A Thinking Person's Guide to Using Big Data for Development: Myths, Opportunities, and Pitfalls
Accompanying paper: “Caveat Emptor: The Risks of Using Big Data for Human Development,” IEEE Technology and Society Magazine 38(3):82-90, September 2019. DOI: 10.1109/MTS.2019.2930273
Available at: https://www.researchgate.net/publication/335745617_Caveat_Emptor_The_Risks_of_Using_Big_Data_for_Human_Development
1. A THINKING PERSON’S GUIDE TO BIG DATA FOR DEVELOPMENT: MYTHS, OPPORTUNITIES, AND PITFALLS
Junaid Qadir
Associate Professor, Information Technology University (ITU), Pakistan
5. Talk’s agenda
1 Is big data a panacea?
2 Why are human social systems complex?
• Doesn’t the data speak for itself?
• Big data’s big bias problem
• What about missing data?
• Most patterns are just illusions
• Are proper experiments no longer necessary?
• Big data obviates the need for nuanced subjective understanding
• The law of unintended consequences
3 Preventing big data from becoming a weapon of math destruction
• When models become self-fulfilling
• The way forward
7. Big data / technological utopians
“Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.” (Chris Anderson, “The End of Theory,” Wired, 2008)
The article attributed to Peter Norvig (of Google) the claim: “All models are wrong, and increasingly you can succeed without them.”
8. Doesn’t the data speak for itself?
[Figure: Copernicus vs. Ptolemy]
9. Big Data Hubris: Big Data Fundamentalism
“To set the record straight: That's a silly statement, I didn't say it, and I disagree with it.” (Peter Norvig)
Using data and statistics is not a new idea in science, and it is always done with respect to a theory.
Predictions depend on theory:
• Prediction requires a leap of faith (i.e., belief in some assumptions)
• e.g., the theory that the future will be similar to the past
10. Sometimes things happen just by chance
Science requires differentiating effects from chance.
Does the ‘magic tablet’ work as advertised, or is it just a gimmick?
“If you’re trying to establish cause-and-effect relationships, do try to do so with a properly designed experiment.” - Robert Hooke
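One standard way to separate an effect from chance is a permutation test: if the tablet does nothing, the “tablet” and “placebo” labels are interchangeable, so we shuffle them many times and count how often chance alone produces a difference as large as the one observed. This is a minimal sketch assuming a simple two-group comparison of means; the function name `permutation_test` and the scores are hypothetical illustrations, not data from any real trial.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend the group labels are arbitrary
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical symptom-improvement scores (illustrative only).
tablet = [5, 7, 6, 8, 5, 6]
placebo = [6, 5, 7, 6, 5, 7]
print(f"p = {permutation_test(tablet, placebo):.3f}")
```

A large p-value means a difference of this size shows up routinely under pure chance, so the data give no evidence that the tablet works. A small one is still not proof of cause and effect unless the groups were randomly assigned, which is exactly why a properly designed experiment matters.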
12. Isn’t “data’s bigness” sufficient?
1936 US Elections: A. Landon vs. F. Roosevelt
Literary Digest poll: People polled: 10 million; Replies received: 2.3 million; Prediction: landslide victory for Landon
Gallup poll: People polled: 50,000; Prediction: Roosevelt
Outcome: Roosevelt won in a landslide.
No amount of big data can help in drawing correct inferences if the dataset is systematically biased.
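A small simulation makes the point concrete. It assumes a hypothetical electorate in which the people on an easy-to-reach list (think of the telephone and car-registration lists often blamed for the 1936 polling error) vote differently from everyone else. All numbers are invented for illustration, and `poll` is a hypothetical helper.

```python
import random

# Hypothetical electorate (illustrative numbers, not 1936 data):
# 30% of voters are on an easy-to-reach list and back candidate R at 30%;
# the other 70% back R at 75%.
P_LISTED = 0.30
SUPPORT_LISTED = 0.30
SUPPORT_UNLISTED = 0.75
# True share for R: 0.3 * 0.3 + 0.7 * 0.75, about 0.615.
TRUE_SHARE = P_LISTED * SUPPORT_LISTED + (1 - P_LISTED) * SUPPORT_UNLISTED


def poll(n, listed_only, rng):
    """Return the share of n respondents who back R.

    listed_only=True samples only from the easy-to-reach list (biased frame);
    listed_only=False samples uniformly from the whole electorate.
    """
    backers = 0
    for _ in range(n):
        listed = listed_only or rng.random() < P_LISTED
        support = SUPPORT_LISTED if listed else SUPPORT_UNLISTED
        if rng.random() < support:
            backers += 1
    return backers / n


rng = random.Random(42)
big_biased = poll(200_000, listed_only=True, rng=rng)
small_random = poll(2_000, listed_only=False, rng=rng)
print(f"truth {TRUE_SHARE:.3f} | 200,000 biased: {big_biased:.3f} | 2,000 random: {small_random:.3f}")
```

The huge biased sample only makes its error more precise: it converges tightly on the list members’ preference (about 0.30) instead of the electorate’s, while the much smaller random sample lands near the truth.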
13. Survivorship bias (or survival bias) is the logical error of concentrating only on the people or things that made it past some selection process.
Bullet holes in the plane: which part to strengthen? (Abraham Wald)
What about the missing data?
“Is there any point to which you would wish to draw my attention?”
“To the curious incident of the dog in the night-time.”
“The dog did nothing in the night-time.”
“That was the curious incident,” remarked Sherlock Holmes.
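Wald’s insight can be reproduced with a toy simulation. It assumes, hypothetically, that hits land uniformly across five aircraft sections but that hits to the engine or cockpit are far more likely to bring a plane down; the section names and lethality numbers are invented for illustration. Counting holes only on planes that return makes the most vulnerable sections look the safest.

```python
import random
from collections import Counter

SECTIONS = ["engine", "cockpit", "fuselage", "wings", "tail"]
# Hypothetical chance that a single hit in each section downs the plane.
LETHALITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05, "tail": 0.1}


def simulate(n_planes=10_000, hits_per_plane=3, seed=7):
    """Return (hits on all planes, hits on planes that made it back)."""
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane survives only if no individual hit is fatal.
        if all(rng.random() >= LETHALITY[s] for s in hits):
            returned_hits.update(hits)
    return all_hits, returned_hits


all_hits, returned_hits = simulate()
for section in SECTIONS:
    print(f"{section:9s} all planes: {all_hits[section]:5d}  returned planes: {returned_hits[section]:5d}")
```

Reinforcing the spots with the most holes among returning planes would protect places that were already survivable; the missing engine and cockpit holes, like the dog that did nothing in the night-time, are the informative data.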
14. The central challenge is not to be fooled by randomness, which is hard to avoid if your findings are not guided by theory.
It is easy to mistake noise for the signal.
15. Most patterns are just illusions
Deriving a theory after seeing the data is dangerous.
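To see why, this sketch generates an outcome and 1,000 candidate “explanatory” variables that are all pure random noise, then keeps whichever candidate correlates best with the outcome. The sample sizes and the helper names `pearson` and `noise` are assumptions for illustration. With enough candidates, some correlation looks impressive in-sample purely by chance, and it does not survive a fresh sample.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
N_OBS, N_CANDIDATES = 30, 1_000


def noise(n):
    """n independent standard-normal draws: no signal at all."""
    return [rng.gauss(0, 1) for _ in range(n)]


outcome = noise(N_OBS)
candidates = [noise(N_OBS) for _ in range(N_CANDIDATES)]

# "Theory after the data": keep whichever candidate fits best in this sample.
best = max(range(N_CANDIDATES), key=lambda i: abs(pearson(candidates[i], outcome)))
in_sample_r = pearson(candidates[best], outcome)

# Replication: fresh observations of the same (pure-noise) variable and outcome.
out_of_sample_r = pearson(noise(N_OBS), noise(N_OBS))

print(f"best in-sample r = {in_sample_r:.2f}, replication r = {out_of_sample_r:.2f}")
```

The in-sample correlation would look meaningful if it were the only one tested; it is an artifact of having searched through 1,000 possibilities.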
16. ‘Proofiness’ prejudice
“The first lesson that you must learn is that, when I call for statistics about the rate of infant mortality, what I want is proof that fewer babies died when I was Prime Minister than when anyone else was Prime Minister. That is a political statistic.” – Churchill
“The art of using bogus mathematical arguments to prove something
that you know in your heart is true—even when it’s not.”
“If you torture the data enough, it will confess”
18. 2 Why are human social systems complex?
“Technology amplifies human intent and capacity;
it doesn't substitute for them.”—Kentaro Toyama
19. Not everything that counts can be counted!
Jay Forrester (MIT)
Omitting structures or variables
known to be important because
numerical data are unavailable is
actually less scientific and less
accurate than using your best
judgment to estimate their values.
To omit such variables is
equivalent to saying they have zero
effect—probably the only value that
is known to be wrong!
Forrester (1961, p. 57).
20. Fitting reality to the model
A man, after trying on a made-to-order suit, says to the tailor, “The sleeve is two inches too long!”
The tailor says, “No, just bend your elbow like this. See, it pulls up the sleeve.”
The man says, “But look now at the collar! When I bend my elbow, the collar goes halfway up the back of my head.”
The tailor says, “Raise your head up and back. Perfect.”
The man leaves the store wearing the suit, his right elbow crooked and sticking out. The only way he can walk is with a herky-jerky, spastic gait.
Just then, two passersby notice him.
The first: “Look at that poor crippled guy. My heart goes out to him.”
The second: “Yeah, but his tailor must be a genius! The suit is a perfect fit!”
21. Humanities: what use? Isn’t it all obvious?
Sociologists concluded, on the basis of a large, expensive study (comprising 600,000 WW2 servicemen), that “men from rural backgrounds were usually in better spirits during their Army life than soldiers from city backgrounds.”
Most people reacted: “Aha, that makes perfect sense. Rural men are accustomed to harsher living standards and more physical labor than city men, so naturally they had an easier time adjusting.”
But actually, the finding was the opposite.
Once told, people could still intuitively understand it: “Aha, city men are more used to working in crowded conditions and in corporations, with chains of command, strict standards of clothing and social etiquette, and so on.” Even this is obvious!
When every answer and its opposite appears equally obvious, then “something is wrong with the entire argument of ‘obviousness.’”
22. Why are social systems complex?
Social systems are complex adaptive systems
23. Counterintuitive behavior of social systems
Jay Forrester (MIT)
Most people believe cause and effect are closely related in time and space, while in complex dynamic systems cause and effect are often distant in time and space.
Social systems are not linear but belong to the class of systems called multi-loop nonlinear feedback systems.
Interrelationships in systems are far more interesting and important than separate details.
“Nonlinearity means that the act of playing the game has a way of changing the rules.” (James Gleick, Chaos)
24. The whole can be different from the parts
Sectional views can differ considerably, even qualitatively, from the reality.
25. The importance of “Systems Thinking” (Peter Senge, The Fifth Discipline)
1) Today’s problems come from yesterday’s solutions.
2) The harder you push, the harder the system pushes back.
3) Behavior grows better before it grows worse.
4) The easy way out usually leads back in.
5) The cure can be worse than the disease.
6) Faster is slower.
7) Cause and effect are not closely related in time and space.
8) Small changes can produce big results – but the areas of highest leverage are often the least obvious.
9) You can have your cake and eat it too – but not all at once.
10) Dividing an elephant in half does not produce two small elephants.
11) There is no blame.
26. The law of unintended consequences
Social systems act as
“an enigma within a riddle within a mystery”
27. The Cobra Effect (unintended consequences) refers to the way that measures taken to improve a situation can directly make it worse.
During the British Raj, a bounty system was devised to counter the rise of venomous cobras. The system seemed to work really well: a lot of cobras were killed. Except that entrepreneurs figured out they could make money by farming cobras and killing more of them. After the government scrapped the system, the farmers released their now-worthless cobras, and there were more cobras than before.
28. Campbell’s Law
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
29. Goodhart’s Law
“When a measure becomes a target, it ceases to be a good measure.”
“GNP can increase while human lives shrivel.” (Mahbub ul Haq)
31. THERE ARE ETHICAL CHOICES IN EVERY SINGLE ALGORITHM WE BUILD
Should someone buy expensive medicine for his sick child, depriving the rest of the family of essential nutrition?
This is an ethico-philosophical question, not an algorithmic one.
When dimensions are heterogeneous, every index reflects subjective preferences.
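The point about heterogeneous dimensions can be made concrete with a weighted composite index. In this sketch the two regions, their normalized income and health scores, and the weights are all hypothetical; nothing in the data changes, yet the ranking flips depending on how much weight one chooses to give income versus health.

```python
# Hypothetical normalized scores in [0, 1] (illustrative, not real data).
REGIONS = {
    "Region A": {"income": 0.9, "health": 0.4},
    "Region B": {"income": 0.5, "health": 0.8},
}


def composite(scores, w_income):
    """Weighted average of two dimensions; the weight is a value judgment."""
    return w_income * scores["income"] + (1 - w_income) * scores["health"]


def ranking(w_income):
    """Regions sorted from best to worst under the given income weight."""
    return sorted(REGIONS, key=lambda r: composite(REGIONS[r], w_income), reverse=True)


print(ranking(0.3))  # ['Region B', 'Region A']: health-heavy weights favor B
print(ranking(0.7))  # ['Region A', 'Region B']: income-heavy weights favor A
```

With these numbers the two regions tie at an income weight of 0.5; any choice on either side encodes a subjective preference about what matters, which the data alone cannot settle.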
32. Machine predictions going awry!
In the movie Minority Report, the cop (played by Tom Cruise) tackles and handcuffs individuals who have committed no crime (yet), proclaiming things like:
“By mandate of the District of Columbia Precrime Division, I’m placing you under arrest for the future murder of Sarah Marks and Donald Dubin.”
The arrested person confronts Cruise and asks: “You ever get any false positives?”
In fact, it is very easy to design a pre-crime algorithm that will catch ALL the criminals: just flag everyone. The catch is the false positives.
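A few lines of arithmetic show why catching every criminal says nothing, on its own, about usefulness. The population size and number of future offenders below are hypothetical round numbers, and `confusion_counts` is an illustrative helper, not a library function.

```python
def confusion_counts(predictions, labels):
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    return tp, fp, fn


# Hypothetical population: 100,000 people, of whom 100 will commit a crime.
labels = [i < 100 for i in range(100_000)]
flag_everyone = [True] * len(labels)  # the "perfect" pre-crime algorithm

tp, fp, fn = confusion_counts(flag_everyone, labels)
recall = tp / (tp + fn)     # 1.0: no future criminal escapes
precision = tp / (tp + fp)  # 0.001: 999 of every 1,000 arrests are false positives
print(f"recall={recall}, precision={precision}, false positives={fp}")
```

This is why “You ever get any false positives?” is the right question: a predictor has to be judged on its precision and false-positive rate, not only on how many true cases it catches.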
34. The Pygmalion Effect
A famous map of New York State was created by the cartographers Lindberg and Alpers.
Agloe, NY was not a real town. It was a paper town: a booby trap to catch plagiarizers.
People, going by the map, figured that Agloe must have gone missing and rebuilt it!
A few years after Lindberg and Alpers set their map trap, the fake town appeared on a Rand McNally map, prompting the two mapmakers to sue for copyright infringement.
35. Big data can help strengthen stereotypes
Harvard Professor Latanya Sweeney
36. Conclusions
1. Big data is a fantastic tool for supplementing traditional data analysis methods.
2. But a thinking person realizes that big data do not automatically solve the problem that has obsessed statisticians and scientists for centuries: the problem of insight, of inferring what is going on, and of figuring out how we might intervene to change a system for the better.
3. We need AI/ML/big data algorithms that are ethical (i.e., fair, transparent, and generally beneficial).