The document summarizes key findings from the 2010 Global Burden of Disease Study, which assessed the prevalence of 291 diseases and injuries and 67 risk factors across 187 countries from 1990 to 2010. It found that ischemic heart disease, HIV/AIDS, and lower respiratory infections caused the most disability-adjusted life years lost globally. The study applied consistent estimation methods to comprehensively measure the health burden and trends in both communicable and non-communicable diseases. It represents an improvement over previous analyses by strengthening statistical methods and expanding the scope of conditions and risk factors considered.
Welcome to incoming bioinformatics students at UCSF
1. Biological & Medical Informatics: the beginning
Daniel Himmelstein
September 24, 2014
Hand Drawn Map of SF
by Jenni Sparks
Before the Money Came
Bettye LaVette
8. The exponential rise of ‘omics’
Andrew Su
on Twitter
‘omics’ — collective characterization and
quantification of biomolecules
9. Data Scientist: The Sexiest Job of the 21st Century
Meet the people who can coax treasure out of messy, unstructured data.
by Thomas H. Davenport and D.J. Patil
When Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a start-up. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren’t seeking out connections with the people who were already on the site at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, “It was
Harvard Business Review, October 2012
Artwork: Tamar Cohen, Andrew J Buboltz, 2011
Definition (Wikipedia): the study of the generalizable extraction of knowledge from data
13. Diversity of the Marine Metagenome
• Aggregate microbial rDNA content of a seawater sample
• richness of operational taxonomic units (OTUs)
• species distribution modeling
Ladau et al. (2013) ISME
doi:10.1038/ismej.2013.37
Katie Pollard
[Map: sampling locations by data set (MICROBIS, FUHRMAN2008, POMMIER2007, GOS); longitude -180° to 180°, latitude -90° to 90°]
Figure S1: Sampling locations for data used in constructing maps. Models with
zero to eight parameters were fitted using MICROBIS data. Predictive performance of
the models was evaluated using both internal measures of model performance (AIC, BIC,
and PRESS) and three independent data sets, collected at the locations shown in red,
green, and yellow (see Table S1). Analyses were based on 377 samples (234 MICROBIS,
30 GOS, 9 POMMIER2007, 103 FUHRMAN2008) collected from 164 distinct locations.
14. Diversity in June
Ladau et al. (2013) ISME
doi:10.1038/ismej.2013.37
[Map: Log10(OTU Richness), color scale 2.05 to 2.65]
15. Diversity in December
Ladau et al. (2013) ISME
doi:10.1038/ismej.2013.37
[Map: Log10(OTU Richness), color scale 2.05 to 2.65]
16. Slime Mold & the Greater Tokyo Rail System
Tero et al (2010) Science
DOI: 10.1126/science.1177894
http://youtu.be/GwKuFREOgmo
• 17 cm (7 in) agar-filled petri dish
• plasmodium for Tokyo
• quaker oats for cities
• vegetate for a day
• decentralized, distributed planning
17. Tero et al (2010) Science
DOI: 10.1126/science.1177894
aftermath: no illumination
aftermath: geographic constraint using illumination
18. The SlimeNet was comparable or preferable to the RealNet in terms of:
• efficiency
• fault tolerance
• cost
Actual Rail Network Slime Tubule Network
Tero et al (2010) Science
DOI: 10.1126/science.1177894
19. Human Evolution & Population Genetics
John Novembre
Ryan Hernandez
• 3,192 Europeans
• 500,568 SNPs
• Reduced to 2d (PCA)
Veeramah & Hammer (2014) Nat Rev Genet
doi:10.1038/nrg3625
out-of-Africa bottleneck
• Europeans have less genetic diversity than Africans
Novembre et al (2008) Nature
doi:10.1038/nature07331
20. Genes mirror geography within Europe
Novembre et al (2008) Nature
doi:10.1038/nature07331
• Despite the low diversity among Europeans, 500 thousand common variants distinguish populations with high resolution.
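The "Reduced to 2d (PCA)" step can be illustrated with a minimal sketch. It assumes a toy genotype matrix (rows are individuals, columns are SNPs coded as 0/1/2 allele counts) that is made up for illustration. The power-iteration approach and the function names are choices made here to keep the example dependency-free, not the method used by Novembre et al.

```python
import random


def center_columns(matrix):
    """Subtract each SNP's mean genotype so every column has mean zero."""
    n = len(matrix)
    means = [sum(column) / n for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def pca_scores(genotypes, n_components=2, iterations=500, seed=0):
    """Return each individual's coordinates on the top principal components.

    Uses power iteration with deflation on the individuals-by-individuals
    Gram matrix, which stays small when individuals are far fewer than SNPs.
    """
    x = center_columns(genotypes)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        eigenvalue = 0.0
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(value * value for value in w) ** 0.5
            if norm == 0:
                break
            v = [value / norm for value in w]
            eigenvalue = norm
        # Scores on this component: unit eigenvector scaled by the singular value.
        components.append([value * eigenvalue ** 0.5 for value in v])
        # Deflate so the next round finds the next-largest component.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return [list(coords) for coords in zip(*components)]


# Hypothetical toy data: two individuals from each of two populations.
toy_genotypes = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 2, 1, 2],
]
for individual, (pc1, pc2) in enumerate(pca_scores(toy_genotypes)):
    print(individual, round(pc1, 2), round(pc2, 2))
```

In this toy case the first component separates the two made-up populations; the sign of each component is arbitrary.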
21. Medical Informatics
- An invited segment by Antoine Lizée -
How to build intelligence around patient medical records
Adriana Karembeu & Antoine Lizée
at Sandler Neurosciences Center, UCSF
22. 4500 visits - 600 patients - 10th year (UCSF EPIC STUDY)
Images (~200MB/visit)
• Brain MRI: T1, T2, proton density
• Processed MRI: Cortical Thickness, Myelin
• Overlays: CT, Myelin, Anatomical labels
Genotypes (~1MB/patient)
• GWAS: 500,000+ SNPs
• HLA: A, B, C, DRB1, DQB1
(Para-) Clinical Data (~250 variables/visit)
• Patient data: Age, sex, history, etc.
• Clinical data: Clinical Scores, treatments
• Patient reported: Quality of Life questionnaires
• Processed data: MRI-based
ReferenceData
27. Facebook Friends
Labels: Highschool, Camp, College, Kin, UCSF, Research, Debate
1,278 nodes (1 type)
40,255 edges (1 type)
http://dhimmel.com
Complex Diseases
Node types: Genes, Diseases, Pathophysiologies, Tissues, Genomic Positions, Perturbations, Canonical Pathways, BioCarta, KEGG, Reactome, miRNA, TFBS, Cancer Hoods, Cancer Modules, GO: BP, GO: MF, GO: CC, Oncogenic, Immunologic
29,241 nodes (19 types)
1,608,168 edges (20 types)
http://het.io
28. Network
Graph Subset
[Figure: nodes Multiple Sclerosis, Crohn’s Disease, IRF1, IL2RA, IRF8, CXCR4, STAT3, ITCH, SUMO1, Leukocyte, Lung; edge types expression, interaction, association, localization]
MetaPaths
GeTlD, GiGaD, GaDaGaD, GaD, GaDmPmD, GiGiGaD, GaDlTlD, GeTeGaD, GmMmGaD
Feature Computation
metapath  path                                   metaedge-specific degrees  path degree product  degree weighted path count
GiGaD     IRF1, IL2RA, Multiple Sclerosis        4 1 1 4                    0.25
GiGaD     IRF1, IRF8, Multiple Sclerosis         4 1 1 4                    0.25
GiGaD     IRF1, CXCR4, Multiple Sclerosis        4 2 1 4                    0.177                0.677
GeTlD     IRF1, Leukocyte, Multiple Sclerosis    2 1 1 1                    0.707                0.707
$PDP(path) = \prod_{d \in D_{path}} d^{-w}$
$DWPC(metapath) = \sum_{path \in Paths} PDP(path)$
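The PDP and DWPC definitions on this slide can be checked against its own numbers with a short sketch. The damping exponent w = 0.5 is an assumption inferred from the values shown (for example, degrees 4 1 1 4 giving 0.25), and the function names and path labels are made up for illustration.

```python
from math import prod


def path_degree_product(degrees, w=0.5):
    """PDP(path): product of d ** -w over the metaedge-specific degrees of a path."""
    return prod(d ** -w for d in degrees)


def degree_weighted_path_count(paths_degrees, w=0.5):
    """DWPC(metapath): sum of PDP over every path that follows the metapath."""
    return sum(path_degree_product(degrees, w) for degrees in paths_degrees)


# Degrees as listed on the slide for paths between IRF1 and multiple sclerosis (MS).
# Grouping the first three under one metapath is inferred from the edge types shown.
interaction_association_paths = {
    "IRF1, IL2RA, MS": [4, 1, 1, 4],
    "IRF1, IRF8, MS": [4, 1, 1, 4],
    "IRF1, CXCR4, MS": [4, 2, 1, 4],
}
expression_localization_paths = {
    "IRF1, Leukocyte, MS": [2, 1, 1, 1],
}

for group in (interaction_association_paths, expression_localization_paths):
    for name, degrees in group.items():
        print(name, round(path_degree_product(degrees), 3))
    print("DWPC:", round(degree_weighted_path_count(group.values()), 3))
```

With w = 0.5 the sketch reproduces the slide's values: 0.25, 0.25, and 0.177 summing to 0.677 for the three gene-mediated paths, and 0.707 for the leukocyte path.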
Machine Learning
[Plot: Standardized Coefficient per feature: {Cancer Hood}, {Positional}, GeTeGaD, GiGeTlD, GeTlD, {GO Function}, {GO Component}, {miRNA Target}, {BioCarta}, {Oncogenic}, {TF Target}, GaD (any gene), {Cancer Module}, GiGiGaD, {GO Process}, GiGaD, {KEGG}, {Immunologic}, {Reactome}, {Perturbation}, GaDmPmD, GaD (any disease), GaDlTlD, GaDaGaD]
Method (AUROC): ridge (0.829), lasso (0.823)
Performance
[Plot: Recall vs. False Positive Rate]
Partition (AUROC): Testing (0.829), Training (0.810)
Combine Predictions & Statistical Evidence
[Plot: Density vs. Meta2.5 P-value]
Discover Novel Susceptibility Genes
Gene   Meta2.5  HNLP   WTCCC2
JAK2   0.047    0.102  0.0015
REL    0.001    0.040  0.0003
SH2B3  0.012    0.034  0.0130
RUNX3  0.016    0.025  0.0073
Table 5. Multiple sclerosis gene discovery.
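The AUROC figures reported for ridge, lasso, and the training and testing partitions summarize how well predicted scores rank true associations above non-associations. A minimal sketch of that statistic follows, computed as the fraction of positive and negative pairs ranked correctly. The labels and scores are hypothetical, not data from the study.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 marks a known gene-disease association, 0 a non-association.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(round(auroc(example_labels, example_scores), 3))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic, so a rank-based formulation would be preferable for large data sets.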
30. Mechanisms of Pathogenesis
Gene—{MSigDB Collection}—Gene—Disease DWPC Model
[Plot: AUROC (Lasso, Ridge) by MSigDB collection: Positional, Cancer Hood, BioCarta, GO Component, miRNA Target, GO Function, Reactome, Oncogenic, TF Target, KEGG, GO Process, Cancer Module, Immunologic, Perturbation]
[Plot: AUROC (Lasso, Ridge) by feature: GiGaD, GeTeGaD, GeTlD, GiGeTlD, GaDaGaD, GaDmPmD, GiGiGaD, GaDlTlD, GaD (any gene), GaD (any disease)]
Pathophysiology: degenerative, immunologic, metabolic, neoplastic, psychiatric, unspecific
37. Per Article Cost
from "Open Access: Market Size, Share, Forecast, and Trends"
Outsell. January 31, 2013
Subscription: $4,000.00
Open Access: $950.00
UCSF Open Access Fund
http://www.library.ucsf.edu/services/scholpub/oa/fund/eligibility
Fully OA Journal: $2,000
Hybrid OA: $1,000
• PeerJ: lifetime publishing plan for $99
• eLife: currently no APC, “pain free publication”
• PLOS, BMC, Specialty Pubs
• F1000 Research, pre-review publication
• preprints, arXiv & bioRxiv
38. Article-level metrics
doi:10.1371/journal.pone.0013636.g005
Open Access increases Citations
Gargouri et al. PLOS One. 2010
• Alternative to journal impact factor
• Citations, downloads, views, social media
• Accelerates science: impact factor = rejection
• Expands the audience evaluating article importance and quality
• Already used: h-index
Grow in importance
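The h-index mentioned above is the largest h such that an author has h papers with at least h citations each. A minimal sketch, using made-up citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for five papers.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))  # 4: four papers have at least 4 citations each
```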
39. Public Data increases Citations
Piwowar & Vision (2013)
DOI: 10.7717/peerj.175
• 10,555 microarray studies
• Classified studies by data availability
• 8 categories of covariates
40. Availability & Reuse
• only applies to original research articles
• journals often withhold the typeset version
• does not affect reuse
Creative Commons Attribution Alone
Mandatory Archiving
NIH: PubMed Central
UC: eScholarship
• subscription journals require the transfer of article ownership
• enforce the article copyright
• require licensing for reuse
41. Tools for Efficiency & Reproducibility
Version control:
Online code repositories:
Interactive programming environments: IPython notebook