- A biologist is interested in large, unstructured IoT data to gain insights from connections between different data points, similar to how biologists study connections between biological entities.
- Currently the IoT lacks common data formats, ontologies to provide context to things, and data repositories, limiting analytical flexibility and insights that can be gained.
- Biologists overcame similar problems by developing standards like gene ontologies, metadata requirements, and public data repositories, enabling knowledge inference from large, diverse datasets.
- Key concepts from biology that could help the IoT include developing ontologies to define thing functions, processes, and localizations in order to organize knowledge and enable inference across the large, diverse data generated by the IoT.
3. • Why a biologist is interested in
large, unstructured data
• What is wrong with the IoT in its
current state
• How biologists deal with similar
problems
• Which academic concepts would
be useful in the IoT
WHAT TO EXPECT IN THE NEXT HOUR…
(including questions!)
4. • Why a biologist is interested
in large, unstructured data
• What is wrong with the IoT in its
current state
• How biologists deal with similar
problems
• Which academic concepts would
be useful in the IoT
WHAT TO EXPECT IN THE NEXT 10 MINUTES
5. DNA = storage of a blueprint
RNA = ‘active copy’ of DNA
protein = the building blocks
of cells and tissues
LIFE AS WE KNOW IT
DNA → RNA (transcription)
RNA → protein (translation)
Gregor Johann Mendel,
exhibited in the Library at the NIMR
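The DNA → RNA → protein flow on this slide can be sketched in a few lines of Python. This is a toy under stated assumptions: the input is taken to be the coding strand, so transcription simply swaps T for U, and the hypothetical `CODON_TABLE` holds only a handful of entries from the standard genetic code rather than all 64 codons.

```python
# Toy model of DNA -> RNA -> protein; not a bioinformatics tool.

# A small subset of the standard genetic code (RNA codon -> one-letter amino acid).
# "*" marks a stop codon. Codons missing from this table raise a KeyError.
CODON_TABLE = {
    "AUG": "M",  # methionine, the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (same sequence, U instead of T)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons in frame from the first base; stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCAAATAA")
    print(mrna)             # AUGUUUGGCAAAUAA
    print(translate(mrna))  # MFGK
```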
6. ‣ Reading DNA information
‣ Determining “the sequence
of a gene” was a PhD in the
early 1980s
‣ Data processing was mainly
transcribing the observation
into a research paper
BIOLOGY THEN AND NOW
SEQUENCE INFORMATION
Sanger sequencing
ca. 1980
http://www.eplantscience.com
7. 189,739,230,107 base pairs on 15th April 2015
(up from 159,813,411,760 base pairs in April 2015)
‣ We can sequence a human
genome in half a day
‣ Sequence databases grow
faster than storage capacity
‣ Data processing is the key
step in scientific
understanding
BIOLOGY THEN AND NOW
SEQUENCE INFORMATION
1990: automation, kilobases a day
2007: next-gen sequencing, megabases a day
2015: 1000s of instruments worldwide
8. BIOLOGY THEN AND NOW
GENE ACTIVITY INFORMATION
‣ When are genes needed?
‣ Classical molecular biology
workflow, taking days…
‣ Data is semi-quantitative;
testing one gene at a time
Northern blot, ca. 1995
‣ High-throughput gene expression profiling
since mid-1990s
‣ Quantitative information for every gene in an
organism
‣ Key challenge is the graphical representation
and interpretation of the data
screenshot from
FlyBase, today
9.
‣ Signal transduction and
metabolic pathways
‣ Characterisation of proteins
and substrates that mediate
chemical reactions
‣ Nobel prize material
BIOLOGY THEN AND NOW
BIOCHEMISTRY
10. ‣ We know about 250k metabolites
‣ 100k protein structures
‣ on the order of 10k different
chemical reactions
BIOLOGY THEN AND NOW
BIOCHEMISTRY
“The Robot Scientist”
“small molecules”
(Organic & Biomolecular Chemistry Blog)
protein
(via the Protein Databank, www.pdb.org)
11. ‣ Everything is connected
‣ Big, noisy, often
unstructured data
‣ We are learning how biological
entities depend on each other
DNA > RNA > proteins
12. • Why a biologist is interested in
large, unstructured data
• What is wrong with the IoT
in its current state
• How biologists deal with similar
problems
• Which academic concepts would
be useful in the IoT
WHAT TO EXPECT IN THE NEXT 5 MINUTES
13. ‣ Everything is connected
‣ Big, noisy, often
unstructured data
www.thingslearn.com
Analytics, context integration, machine learning
and predictive modelling for the IoT.
14. 0 clean shirts left
+
washing machine estimates 97% of
your last pack of powder used
+
it’s Wednesday, 23:55
+
the last four Thursdays had a
morning business meeting
+
the car is parked 20 m from a shop
+
last retail activity: 8 sec ago
Send immediate text reminder
to pick up washing powder +
send tweet from @BorisHouse
“need identified” +
“notification appropriate”
Actionable insight.
From everything.
15. NO ANALYTICAL FLEXIBILITY IN M2M/IOT
Matt Hatton, Machina Research
The BLN IoT ‘14
Internet replaces wire
It’s all about the
context
M2M
consumer
IoT
defined input-process-output (I-P-O),
like it’s 1975
context
Is this hot?
16. LIFE SCIENCE STRATEGIES
DON’T WORK IN THE IOT
- There are no commonly accepted
  - ‘catalogue’ of things,
  - ‘ontology’ of things,
  - ‘data format’ of things,
  - ‘meta data’ for things.
- Most businesses are driven by revenue, not long-term strategic vision
- Service providers have no need to publish unless they’re …
- Data can be highly personal (cheap excuse)
17. Trojan Room coffee pot, ca. 1993
Oct. 1995
“The Internet of Things”, Kevin Ashton, ca. 1999
20 YEARS OF NON-CONVERGENT EVOLUTION
FIRST DATA | POTENTIAL RECOGNISED | TODAY’S REALITY
“ignorant coexistence”
➡ Commonly accepted platforms
and formats for data exchange
➡ Meta-data deposition is a must
➡ Infrastructure provides entry
point for computational
knowledge inference
“designed to ask questions”
18. • Why a biologist is interested in
large, unstructured data
• What is wrong with the IoT in its
current state
• How biologists deal with
similar problems
• Which academic concepts would
be useful in the IoT
WHAT TO EXPECT IN THE NEXT 10 MINUTES
19. Oct. 1995
TOWARDS THE MIAME STANDARD AND
DATA REPOSITORIES
cf. IoT
Nov. 1993
MInimum Information About a
Microarray Experiment (MIAME)
20. META DATA, SHARING AND
DATA REPOSITORIES
founded in Nov. 1999
“But this is a complex and ambitious project, and is one of the biggest challenges that
bioinformatics has yet faced. Major difficulties stem from the detail required to describe the
conditions of an experiment, and the relative and imprecise nature of measurements of
expression levels. The potentially huge volume of data only adds to these difficulties.”
Nature, Feb. 2000
Nov. 2000
Oct. 2002
Wide adoption as
requirement for
publication in
scientific journals
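The MIAME principle, that data is only accepted together with a minimum set of annotations, carries over directly to things. The sketch below assumes a hypothetical checklist (`REQUIRED_FIELDS`) and a toy `Repository`; neither the field names nor the class come from MIAME or from any IoT standard.

```python
from dataclasses import dataclass, field

# Hypothetical "minimum information" checklist for an IoT data deposit,
# in the spirit of MIAME. The field names are illustrative only.
REQUIRED_FIELDS = ("thing_id", "thing_type", "unit", "sampling_interval_s", "location")


@dataclass
class Repository:
    """Toy public repository that only accepts annotated deposits."""
    records: list = field(default_factory=list)

    def missing_fields(self, record: dict) -> list:
        # A required field counts as missing if it is absent, None or "".
        return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]

    def deposit(self, record: dict) -> bool:
        missing = self.missing_fields(record)
        if missing:
            print("rejected, missing:", ", ".join(missing))
            return False
        self.records.append(record)
        return True


repo = Repository()
repo.deposit({"thing_id": "t-001", "readings": [21.5, 21.7]})  # rejected
repo.deposit({
    "thing_id": "t-002",
    "thing_type": "thermostat",
    "unit": "degC",
    "sampling_interval_s": 60,
    "location": "living room",
    "readings": [21.5, 21.7],
})
print(len(repo.records))  # 1
```

As with journals requiring MIAME-compliant deposition, the gate sits at the point of publication: raw readings alone are refused, annotated readings are stored.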
21. META DATA, SHARING AND
DATA REPOSITORIES
cf. IoT 2014
since 2003
http://en.wikipedia.org/wiki/Silo
25. CURRENT GOVERNMENT
INVESTMENTS INTO GENE
ONTOLOGY
NIH alone spent $44,616,906 on the
ontology structure since 2001
(I don’t have data for UK/EU spending)
~100 full-time salaries for experts with
domain-specific knowledge
~40,000 terms
26. story
measurements
+ meta data
open, public repositories
human
curators
ontology
terms
community
PUBLISH OR PERISH
ok?
journal
informal exchange - no credit!
funders
assessment
The majority of this
infrastructure is paid for by
governments and charities
industry!
28. OUR PROBLEM IS KNOWLEDGE
DATA != INSIGHT
WITHOUT ORGANISING IT
29. • Why a biologist is interested in
large, unstructured data
• What is wrong with the IoT in its
current state
• How biologists deal with similar
problems
• Which academic concepts
would be useful in the IoT
WHAT TO EXPECT IN THE NEXT 10 MINUTES
30. measurements
+ meta data
storage &
provenance
human
curators
ontology
terms
user
PUBLISH OR YOU’RE NOT DOING IOT
ok?
Maybe the majority of this
infrastructure should be
paid for by governments?
company
cloud
device
registration
“ “
privileges
data
added value
31. WHAT IS AN ONTOLOGY?
used to establish conceptual
connections between entities
knowledge inference
finger
ontology structure
- body part
- limb
- arm
- hand
- thumb
- finger
ontology rules
‣ controlled vocabulary
‣ clearly defined relationships
(diagram relations: “is a”, “connects to”, “part of”)
with ontological reasoning, a computer can
infer that “finger is a body part”, although we
haven’t explicitly defined it that way
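The finger example can be reproduced with a few lines of forward chaining. The exact edges below are an assumption, because the slide's diagram only partly survives, and rule 3 (whatever is part of a body part is itself a body part) is a hypothetical rule added for this toy; real ontologies such as GO define their own relation semantics.

```python
from itertools import product

# Toy facts as (subject, relation, object) triples. The edges are assumed,
# not copied from the slide.
FACTS = {
    ("arm", "is_a", "limb"),
    ("limb", "is_a", "body part"),
    ("hand", "part_of", "arm"),
    ("finger", "part_of", "hand"),
    ("thumb", "is_a", "finger"),
}


def infer(facts):
    """Forward-chain over the rules below until no new triple is derived."""
    known = set(facts)
    while True:
        derived = set()
        for (a, r1, b), (c, r2, d) in product(known, repeat=2):
            if b != c:
                continue
            # Rule 1: "is_a" and "part_of" are each transitive.
            if r1 == r2 and r1 in ("is_a", "part_of"):
                derived.add((a, r1, d))
            # Rule 2: a subtype inherits its parent's "part_of" links.
            if r1 == "is_a" and r2 == "part_of":
                derived.add((a, "part_of", d))
            # Rule 3 (hypothetical, for this toy only): whatever is part of
            # a body part is itself a body part.
            if r1 == "part_of" and r2 == "is_a" and d == "body part":
                derived.add((a, "is_a", "body part"))
        if derived <= known:
            return known
        known |= derived


inferred = infer(FACTS)
print(("finger", "is_a", "body part") in FACTS)     # False: never stated
print(("finger", "is_a", "body part") in inferred)  # True: inferred
```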
32. ARE PEOPLE NOT ALREADY USING
ONTOLOGIES IN THE IOT?
Semantic Sensor Network Ontology
“thermostat”
The idea is not new! Cf. the extension of the Semantic Web
with the Semantic Sensor Network (SSN) ontology.
‣ catalogs
‣ conventions
http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
33. ONTOLOGIES HAVE TO BE
PRAGMATIC COMPROMISES
Gene Ontology annotation
15 years of research
47 publications
100+ authors
50+ PhDs
15 direct annotations
~150 inferred annotations
34. THE THREE BRANCHES OF GO
Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352
Localization: Where is an entity acting?
Function: What does the entity do?
Process: When is the entity needed?
35. Inferences on “is a”, “part of”,
“regulates” and “has part”
from geneontology.org; Ashburner et al., Nat Genet. 2000, 25(1):25-9.
GO AND CONTEXT
36. THE BRANCHES OF GO AND THE IOT
Localization: inside, (my?) home, living room
Function:
measures temperature
regulates temperature
interacts with user directly
interacts with user via app
Process:
regulation of temperature
measurement of ambient temperature
‘is proxy / is avatar’ for
presence
fire
ice age
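A minimal sketch of how the three GO-style branches could be attached to things and then queried. The `ThingAnnotation` schema, the `things_with` helper and the washing machine entry are hypothetical; only the thermostat terms come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ThingAnnotation:
    """A thing annotated along three GO-style branches (hypothetical schema)."""
    name: str
    localization: set = field(default_factory=set)  # where is the thing?
    function: set = field(default_factory=set)      # what does the thing do?
    process: set = field(default_factory=set)       # what can it be an avatar for?


THINGS = [
    ThingAnnotation(
        name="thermostat",
        localization={"inside", "home", "living room"},
        function={"measures temperature", "regulates temperature",
                  "interacts with user via app"},
        process={"regulation of temperature", "measurement of ambient temperature",
                 "avatar for presence", "avatar for fire"},
    ),
    ThingAnnotation(
        name="washing machine",  # illustrative second thing, not from the slide
        localization={"inside", "home"},
        function={"washes clothes", "estimates detergent use"},
        process={"avatar for presence"},
    ),
]


def things_with(branch: str, term: str) -> list:
    """Names of things whose annotation in `branch` contains `term`."""
    return [t.name for t in THINGS if term in getattr(t, branch)]


print(things_with("process", "avatar for presence"))     # ['thermostat', 'washing machine']
print(things_with("function", "regulates temperature"))  # ['thermostat']
```

Queries on the process branch are what make a thing usable as a proxy: an application looking for presence does not need to know it is talking to a thermostat.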
37. A LAST WORD ON PRAGMATISM
“perfect” ontology
The SSN Ontology allows for
inference entirely on the basis
of its structure and annotation.
In reality, many parameters are
difficult to establish and the
effort to annotate things
outweighs the utility.
“crude” ontology
A simplified structure allows for
quick annotation even by non-
specialists.
The lack of details can lead to
clashes in the ontology =>
more smartness has to go into
software; more coding effort.
1 billion
different things
1 million
use cases
38. 0 clean shirts left
+
washing machine estimates 97% of
your last pack of powder used
+
it’s Wednesday, 23:55
+
the last four Thursdays had a
morning business meeting
+
the car is parked 20 m from a shop
+
last retail activity: 8 sec ago
Send immediate text reminder
to pick up washing powder +
send tweet from @BorisHouse
“need identified” +
“notification appropriate”
Actionable insight.
From everything.
“not home”
“buying”
credit card: “highly personal device” ~ alive and awake
3% left and
not pressed
“indicator of esteem”
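The chain of observations on this slide amounts to two rules: one decides that a need exists, the other that interrupting the user is acceptable right now. The sketch below encodes them with a hypothetical `Context` record; all thresholds are assumptions chosen to match the example, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Signals from the slide as plain values (hypothetical schema)."""
    clean_shirts_left: int
    powder_used_fraction: float  # washing machine's estimate, 0..1
    weekday: str
    hour: int
    thursday_meetings_last_four_weeks: int
    metres_car_to_shop: float
    seconds_since_retail_activity: float


def need_identified(c: Context) -> bool:
    # Assumed rule: no clean shirt, powder nearly used up, late on Wednesday,
    # and a Thursday-morning meeting in each of the last four weeks.
    return (c.clean_shirts_left == 0
            and c.powder_used_fraction >= 0.95
            and c.weekday == "Wednesday" and c.hour >= 20
            and c.thursday_meetings_last_four_weeks == 4)


def notification_appropriate(c: Context) -> bool:
    # Assumed rule: the user is near a shop ("not home") and a card payment
    # happened moments ago ("buying", so awake and reachable).
    return c.metres_car_to_shop <= 50 and c.seconds_since_retail_activity <= 60


def decide(c: Context) -> list:
    """Return the actions to trigger; empty if either condition fails."""
    if need_identified(c) and notification_appropriate(c):
        return ["text reminder: pick up washing powder", "tweet from @BorisHouse"]
    return []


wednesday_night = Context(0, 0.97, "Wednesday", 23, 4, 20, 8)
print(decide(wednesday_night))
```

The thresholds are placeholders; the point of the slide is that the insight only emerges once signals from unrelated things carry enough context to be combined.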
39. Today’s biology is a
quantitative, data-
rich science.
Infrastructure for ‘big
data’ was driven by
academics.
Data is only useful if
it can be turned
into knowledge.
Understanding of data
requires ‘data about
the data’.
Meta-data should be
in a universally
understood format.
Ontologies provide
context.
Gene Ontology
(GO) is a de facto
standard.
Human curation is
key to GO.
Public funders and
industry contribute
significantly to GO.
Should governments
be involved in IoT?
GO is not ‘one size fits
all’, but has a few
useful concepts.
What does the thing
do? Thing function.
For what can the
thing be an avatar?
Thing process.
Where is the thing?
Thing localization.
@BorisAdryan