Looking at a problem or data in isolation might have worked in a slow and disconnected past, but it is now a dangerous practice. The world is networked, and the Six Degrees of Separation have shrunk to two or three courtesy of the connectivity of people, machines and things. Nothing is singular and isolated anymore, and establishing causality, future implications and likely outcomes is no longer simple or certain. Few systems can be treated as a Black Box with an input and an output related by some stable and linear function. Multiple inputs and outputs and stochastic transfer functions rule, and the resulting combinatorics confound us to the point where uncertainty is now the ‘uncomfortable’ norm!
Data mining is about drilling down to the fine detail in relatively small and contained data sets. A PC, spreadsheets, structured data and simple analysis tools are the hallmarks of this domain. Big Data is about the Big Picture: relationships, paths and links that are way beyond the PC and simple tools. The field is huge, sophisticated, fast evolving and very specialised. It is already challenging many long-held truths and discovering new ones, whilst revealing previously unknown relationships. The biggest problem is that we lack ‘Big Understanding’, or even the capacity to analyse and model situations to the point where clarity emerges. Our most powerful tools turn out to be computer modelling, simulation, Artificial Intelligence and Visualisation.
Day by day our machine dependency grows as we tackle the vital Green, Social, Health, Science and Industry issues. We need the necessary wisdoms and truths in order to make wise decisions that will impact future generations. So it is no accident that our Symbiosis with machines grows hand in hand with our abilities. But Big Data is no panacea; it is one of a raft of powerful new tools, and should be seen as such in our mission to gain better understanding and greater wisdom.
3. WHY All The Excitement?
A new tool addresses some of our biggest challenges:
-A fully networked, connected and increasingly open world
-Disparity of disconnected/siloed disciplines and industries
-Rapid rise of complexity far exceeding human abilities
-Gross failure of old thinking, models and methods
-Rise of non-linearity and emergent behaviour
-Displacement of technologies and people
-Growth of interdisciplinary relationships
-Acceleration of technology and change
-Freedom of data and information
-Globalisation of everything
- +++++
4. WHY The BIG Deal?
These problem sets are way beyond:
-The desktop
-Past thinking
-Spreadsheets
-Old databases
-Simple analysis
-Basic mathematics
-Simple programming
-Relational databases
-All our past experiences
- +++++
5. WHY Is It All So Important?
We are in a new era of:
-Novel causalities
-Correlation discovery
-More powerful computer models
-Unusual and unexpected solutions
-Extremely rare event identification
-Unexpected behaviours and outcomes
-Original classes of relationship discovery
-Degree of freedom reduction from six to two
-Rare and improbable association identification
-Previously unseen classes of objects/behaviours
-Essential element creation for sustainable futures
- +++++
6. FREEDOM What Degree of Separation?
Degrees of separation:
                 Analogue World                Digital World
                 (Disconnected, Insulated)     (Connected, Networked)
People           <6                            <3
Organisations    <5                            <2
Machines         ∞                             <10
Things           ∞                             <30
The smaller the separation, the bigger the networking and data generated
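A degree of separation can be read as the fewest links needed to get from one node of a network to another. The following is a minimal sketch of that idea using a breadth-first search; the people and links are hypothetical and chosen only to show how one extra connection shortens the path.

```python
from collections import deque


def add_link(links, a, b):
    """Record an undirected link between two nodes."""
    links.setdefault(a, set()).add(b)
    links.setdefault(b, set()).add(a)


def degrees_of_separation(links, start, target):
    """Return the fewest hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # no path: an 'infinite' separation


# Hypothetical sparse network: a simple chain of acquaintances.
sparse = {}
for a, b in [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan"), ("Dan", "Eve")]:
    add_link(sparse, a, b)

# Same network with one extra (hypothetical) connection added.
dense = {name: set(friends) for name, friends in sparse.items()}
add_link(dense, "Ann", "Dan")

print(degrees_of_separation(sparse, "Ann", "Eve"))  # 4
print(degrees_of_separation(dense, "Ann", "Eve"))   # 2
print(degrees_of_separation(sparse, "Ann", "Zed"))  # None
```

A node with no path to the target returns None, which corresponds to the ∞ entries for a disconnected world.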
7. BIG Global, Dynamic, Distributed
We have to address new sets of issues and dimensions created by a fast moving digital world that have remained largely unseen, untapped, and of a scale and complexity never seen before
8. SCALE! Beyond Human Capacity
A small sample of our formidable challenges:
-Sustainability
-Genome Decode
-Protein structure/folding
-Social network associations
-Genome-protein communication
-Disease causality and propagation
-Global/National medical records analysis
-Seismic analysis for raw material location
-Money laundering and tax avoidance tracing
-Terrorist and criminal activity characterisation
-Astronomical data analysis and ‘body’ classification
+++++
9. SIZE How Much Data?
[Chart: data created in EBytes/day (log scale, 0.001 to 1000) against year (2000 to 2030), showing the spread of data creation estimates]
More data created in 2002 than in all of time up to that point!
Beyond Moore’s Law: exponential growth that is accelerating
10. SIZE How Much Data?
>> than the per day
2003 ~ 5EB/year
2015 ~ 5EB/day
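The two figures above imply a growth rate that can be worked out directly. This is a rough sketch only: it assumes a constant exponential rate between the two dates, which is a simplification since the growth is described as accelerating, and the variable names are illustrative.

```python
import math

# Figures taken from the slide; everything derived below assumes a
# constant exponential growth rate between 2003 and 2015.
EB_PER_DAY_2003 = 5 / 365   # 5 EB/year expressed as EB/day
EB_PER_DAY_2015 = 5.0       # 5 EB/day
YEARS = 2015 - 2003

growth_factor = EB_PER_DAY_2015 / EB_PER_DAY_2003
annual_multiplier = growth_factor ** (1 / YEARS)
doubling_time_years = YEARS * math.log(2) / math.log(growth_factor)

print(f"Overall growth: about {growth_factor:.0f}x in {YEARS} years")
print(f"Average growth: about {annual_multiplier:.2f}x per year")
print(f"Doubling time: about {doubling_time_years * 12:.0f} months")
```

Under that assumption the daily volume grows roughly 365-fold in 12 years, or about 1.64x per year, doubling about every 17 months. That is faster than the two-year doubling commonly quoted for Moore's Law.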
11. WHERE Does It All Come From?
Things, People, Machines
Education, Health Care, Institutions, Government, Communication +++++
Transportation, Networking, Commerce, Business, Security, Policing, Science, Media +++++
Competition, Exploration, Research, Markets, Green, Social, Open, Apps +++++
12. ANALYTICS Structured, Un-Structured, Semi-Structured
Applicability:
-Retail
-Science
-Banking
-Security
-Defence
-Medicine
-Wholesale
-Commerce
-Production
-Technology
-Manufacturing
-Government
-Exploration
-Institutions
-Resourcing
-Innovation
-Education
-Creativity
-Logistics
-Energy
- +++++
13. ANALYTICS Lots of Detail v Relationships
Data Mining: Micro-View, Data & Detail (Limited, Contained, Constrained)
Big Data: Macro-View, Relationships (Expanding, Tending to The Infinite)
14. HUH? Knowns, Unknowns, Unknown Unknowns
The many problems:
-Certain and well defined challenges
-Suspected or manifest in some way, but ill defined
-Yet to be discovered, become apparent, or present problems
-Primary limitations are our ability to detect and characterise
-Secondary limitations include our inability to recognise significance
-Causality, probability and statistics conspire to conceal, confuse and trick us!
- +++++
15. TRUTH Is It Static, Veracious?
The Earth is:
-Flat
-Static
-Spherical
-An oblate spheroid
-The centre of the universe
-The axis of the sun and stars
Plants:
-Grow out of the soil
-Have no sensory facility
-Cannot grow without light
The ability to observe, measure and model with increasing accuracy creates dynamic and more relevant truths in line with our growing knowledge & reality
In general ‘truth’ is dynamic and not a fixed entity - it mutates as we gather more information and create deeper understandings
16. HUMAN Limited Reasoning and Analysis!
Big Data scale and complexity:
-Render Big Data beyond human abilities alone
-See structured and relational databases falling far short
-Make crude correlation and association analysis inadequate
-Include many disparate/hidden relationships that confound us
-Introduce multi-dimensional visualisation/conceptualisation difficulties
-Extend analysis beyond ‘Order 5’ mathematical models/general methods
- +++++
17. THEORY We Do Not Have One!
Big Data really needs a Big Theory:
-Complexity confounds us
-There are no generalised solutions
-There is no suitable math framework
-To some degree we are working partially blind
-We can only use what we have already established
-Computer modelling/simulation/analogues can be used
-Hypothesis testing and experimental trials are often vital
- +++++
We Have ‘NO’ General Purpose Tools/Methodologies
19. Vital Symbiosis & Augmentation
[Figure: MAN versus AI, each rated from Low to High across: Analysis, Modelling, Processing, Mathematics, Computation; History, Intuition, Creativity, Experience, Dimensionality +]
20. PICTURES In Form & Scale
Vary with people, things +++:
-Social
-Intent
-Interest
-Mobility
-Browsing
-Expertise
-Ownership
-Connectivity
-Consumption
-Communication
- +++++
21. PICTURES We Might Conjure!
This will vary with people, things, organisation:
-Social
-Intent
-Interest
-Mobility
-Browsing
-Expertise
-Ownership
-Connectivity
-Consumption
-Communication
- +++++
22. TOOLS Key to Success: Specialised
Generally beyond the abilities of most companies:
-Graph theory
-Hash filtering
-Causality testing
-Weighted mapping
-Trajectory projection
-N Dimensional sifting
- +++++
23. THE # Key to Analysis By Transformation
Very weak to strong relationship identification:
-Generally applicable
-Reveals subtle relationships
-Effective for small and big data
-Weeds out/reveals concealed links
- +++++
27. GENESIS The Journey Has Just Begun
Big Data and Complexity need generalised theories, just as physics needed thermodynamics and quantum mechanics
28. UNDERSTANDING Never Been so Difficult
Our single biggest challenge as a species:
-Our past was built on it
-Our future depends upon it
-We are not evolving to be any smarter
-Ultimately we are limited by our tools
-Technology is our only survival route
29. END GAME Wisdoms, Knowledge, Uncertainty
Our big conceptual challenge: moving on from a history of ‘static truths’ and mostly clear and certain answers to a world dominated by the probabilistic and uncertain, where ‘the truth’ has to be updated and rewritten
Our primary tool here: increasingly powerful instruments of observation and measurement, complemented by computer deep modelling and simulation