What the Internet of Things should learn from the life sciences. About the utility of open data, ontologies and public repositories, which are routinely used in the academic life sciences but rarely in the IoT.
2. Who is
@BorisAdryan
• Computational biologist
• Research group leader
• Lecturer in genome biology
• Advisor at
• 2015 Fellow of the
3. LIFE AS WE KNOW IT
DNA = storage of a blueprint
transcription
RNA = ‘active copy’ of DNA
translation
protein = the building blocks
of cells and tissues
Gregor Johann Mendel,
exhibited in the Library at the NIMR
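The flow from DNA to RNA to protein on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: it assumes the input is the coding strand (so transcription reduces to replacing T with U), it uses the standard genetic code, and the function names `transcribe` and `translate` as well as the example sequence are made up for this sketch.

```python
# Standard genetic code, with bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Return the RNA 'active copy' of a coding-strand DNA sequence."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA into one-letter amino acids, from the first AUG to a stop codon."""
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for pos in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base example sequence, chosen only for illustration.
dna = "ATGGCCTTTTAA"
rna = transcribe(dna)
protein = translate(rna)
print(rna, protein)
```

Real transcription reads the template strand and real genes contain regulatory and non-coding regions; the sketch ignores both to keep the three-step picture visible.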
4. BIOLOGY THEN AND NOW
SEQUENCE INFORMATION
• Reading DNA information
• Determining “the sequence
of a gene” was a PhD in the
early 1980s
• Data processing was mainly
transcribing the observation
into a research paper
Sanger sequencing
ca. 1980
http://www.eplantscience.com
5. BIOLOGY THEN AND NOW
SEQUENCE INFORMATION
181,563,676,918 bases on 15th October 2014
(from 165,722,980,375 bases on 24th August 2014)
• We can sequence a human
genome in half a day
• Sequence databases grow
faster than storage capacity
• Data processing is the key
step in scientific
understanding
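The two database sizes quoted on this slide are enough for a back-of-the-envelope growth estimate. The sketch below uses only those two figures and dates; the doubling time additionally assumes exponential growth between the two snapshots, which is an assumption of this example and not a claim from the slide.

```python
from datetime import date
from math import log

# Figures as quoted on the slide.
bases_aug = 165_722_980_375
bases_oct = 181_563_676_918
days = (date(2014, 10, 15) - date(2014, 8, 24)).days

added = bases_oct - bases_aug          # bases added between the two snapshots
per_day = added / days                 # average bases added per day
growth = bases_oct / bases_aug         # relative growth over the period

# Assumption: growth is exponential at a constant rate over this period.
doubling_days = log(2) * days / log(growth)

print(f"{days} days, {added:,} bases added, {per_day:,.0f} bases per day")
print(f"growth factor {growth:.4f}, estimated doubling time {doubling_days:.0f} days")
```

Two data points cannot confirm the growth model, so the doubling time is an order-of-magnitude figure at best.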
6. BIOLOGY THEN AND NOW
GENE ACTIVITY INFORMATION
• When are genes needed?
• Classical molecular biology
workflow, taking days…
• Data is semi-quantitative;
testing one gene at a time
Northern blot for d-vhl
ca. May 1999
7. BIOLOGY THEN AND NOW
GENE ACTIVITY INFORMATION
• High-throughput gene
expression profiling since
mid-1990s
• Quantitative information for
every gene in an organism
• Key challenge is the
presentation and
interpretation of the data
8. BIOLOGY THEN AND NOW
BIOCHEMISTRY
• Signal transduction and
metabolic pathways
• Characterisation of proteins
and substrates that mediate
chemical reactions
• Nobel prize material
9. BIOLOGY THEN AND NOW
BIOCHEMISTRY
• We know about 250k
metabolites
• 100k protein structures
• on the order of 10k
different chemical
reactions
10. ‣ We are learning how
biological entities depend
on each other
‣ Everything is connected
‣ Big, noisy, often
unstructured data
11. ‣ Everything is connected
‣ Big, noisy, often
unstructured data
www.thingslearn.com
Analytics, context integration, machine learning
and predictive modelling for the IoT.
12. THERE’S NO ANALYTICAL
FLEXIBILITY IN M2M/IOT
Matt Hatton, Machina Research
The BLN IoT ‘14
Internet replaces wire
It’s all about the
connectedness
M2M
consumer
IoT
13. LIFE SCIENCE STRATEGIES
DON’T WORK IN THE IOT
- There is no commonly accepted
- ‘catalogue’ of things,
- ‘ontology’ of things,
- ‘data format’ of things,
- ‘meta data’ for things.
- Most businesses are driven by revenue, not
long-term strategic vision
- Service providers have no need to publish
- Data can be highly personal (cheap excuse)
unless they’re
17. CURRENT GOVERNMENT
INVESTMENTS INTO GENE
ONTOLOGY
NIH alone spent $44,616,906 on the
ontology structure since 2001
(no data for UK/EU spending)
~100 full-time salaries for experts with
domain-specific knowledge
~40,000 terms
19. META DATA, SHARING AND
DATA REPOSITORIES
founded in Nov. 1999
Nature
Feb. 2000
“But this is a complex and ambitious project, and is one of the biggest challenges that
bioinformatics has yet faced. Major difficulties stem from the detail required to describe the
conditions of an experiment, and the relative and imprecise nature of measurements of
expression levels. The potentially huge volume of data only adds to these difficulties.”
Nov. 2000 Oct. 2002
Wide adoption as
requirement for
publication in
scientific journals
20. META DATA, SHARING AND
DATA REPOSITORIES
cf. IoT 2014
since 2003
Semantic Sensor Network Ontology
http://en.wikipedia.org/wiki/Silo
21. PUBLISH OR PERISH
story
measurements
+ meta data
open, public repositories
human
curators
ontology
terms
community
ok?
journal
informal exchange - no credit!
funders
assessment
industry!
The majority of this
infrastructure is paid for by
governments and charities
23. PUBLISH OR YOU’RE NOT DOING IOT
measurements
+ meta data
storage &
provenance
human
curators
ontology
terms
user
ok?
Maybe the majority of this
infrastructure should be
paid for by governments?
company
cloud
device
registration
added privileges data
value
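The workflow on this slide (measurements plus meta data, ontology terms, provenance, and a curation check before acceptance) can be sketched as a small data structure. Everything here is hypothetical: the `THING:` term identifiers, the `Reading` fields and the `curation_problems` check are invented for illustration, since no commonly accepted ontology or data format of things exists yet.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical controlled vocabulary; a real one would be community-curated.
THING_ONTOLOGY = {
    "THING:0001": "temperature sensor",
    "THING:0002": "indoor location",
    "THING:0003": "degree Celsius",
}


@dataclass
class Reading:
    """One measurement together with the meta data needed to reuse it."""
    device_id: str            # identifier from device registration
    value: float              # the measurement itself
    recorded_at: datetime     # when the measurement was taken
    terms: tuple = ()         # ontology term IDs describing the reading
    provenance: str = ""      # who or what produced and stored the data


def curation_problems(reading):
    """Return the reasons a curator would reject this reading (empty if ok)."""
    problems = []
    if not reading.terms:
        problems.append("no ontology terms")
    problems += [f"unknown term {t}" for t in reading.terms if t not in THING_ONTOLOGY]
    if not reading.provenance:
        problems.append("missing provenance")
    return problems


good = Reading("device-42", 21.5, datetime(2014, 11, 1, 12, 0, tzinfo=timezone.utc),
               terms=("THING:0001", "THING:0003"), provenance="example-cloud")
bad = Reading("device-43", 19.0, datetime(2014, 11, 1, 12, 5, tzinfo=timezone.utc),
              terms=("THING:9999",))

print(curation_problems(good))
print(curation_problems(bad))
```

The point of the check is the same as in the life-science repositories: a submission is only accepted, and only earns its submitter added privileges, once its meta data refers to shared terms.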
24. WHAT THE IOT SHOULD LEARN
FROM THE LIFE SCIENCES
• Given the predicted importance and impact of the IoT, we cannot and
should not leave the development of infrastructure to commercial
stakeholders alone.
• We need a lot more incentives to participate and targeted investment
from the government (“the funders”) into reliable infrastructure.
• It took the computational life sciences less than 4 years(!) to grow from
a grass roots movement to having industry-scale, expandable
infrastructure.
• Shared vision, dogmatic implementation, effective lobbying.
@BorisAdryan is interested in hearing about IoT job opportunities.