Human Genetics & Big Data
Human Genetics & Ethics
Today we talk about
technology and methodology
Me, Us
• Allen Day, Principal Data Scientist, MapR
Human Genetics PhD, UCLA School of Medicine
6 years Hadoop, 10 years R (Genetics/Biostatistics)

• MapR
Distributes open source components for Hadoop
Adds major technology for performance, HA, and industry-standard APIs

• See Also
– @allenday @mapR
– http://slideshare.net/allenday
– “allenday” most places (twitter, github, maprtech.com, etc.)
What Does Machine Learning Look
Like?
What Does Machine Learning Look
Like Under the Covers?
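\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^{T} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^{T} \\ A_2^{T} \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^{T} A_1 & A_1^{T} A_2 \\ A_2^{T} A_1 & A_2^{T} A_2 \end{bmatrix}
\]

\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^{T} A_1 & A_1^{T} A_2 \\ A_2^{T} A_1 & A_2^{T} A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]

\[
r_1 = \begin{bmatrix} A_1^{T} A_1 & A_1^{T} A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]

O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality
O(κ d log k) or O(d log κ log k) for larger k, looser quality
Here’s how to keep it simple yet powerful…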
Behavior of a
crowd helps us
understand what
individuals will do

HOW RECOMMENDATIONS WORK
Recommendations
Alice

Charles

Alice got an apple and a
puppy

Charles got a bicycle
Recommendations
Alice

Bob

Charles

Alice got an apple and a
puppy

Bob got an apple

Charles got a bicycle
Recommendations
Alice

Bob

Charles

?

What else would Bob like?
Recommendations
Alice

Bob

Charles

A puppy, of course!
Recommendations
What if everybody gets a pony?
Alice

Bob

Charles

?

Now what does Bob want?
Log Files
Alice
Charles
Charles
Alice

Alice
Bob
Bob
Log Files
u1   t1
u2   t2
u2   t3
u1   t4
u1   t3
u3   t3
u3   t1
Log Files and Dimensions
u1   t1
u2   t2
u2   t3
u1   t4
u1   t3
u3   t3
u3   t1

Things
t1
t2
t3
t4

Users
u1 Alice
u2 Charles
u3 Bob
History Matrix
          t1   t2   t3   t4
Alice     ✔         ✔    ✔
Bob       ✔         ✔
Charles        ✔    ✔
Co-occurrence Matrix
      t1   t2   t3   t4
t1              2    1
t2              1
t3    2    1         1
t4    1         1
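A minimal sketch of how the history and co-occurrence matrices can be derived from the log. The (user, thing) pairs are the ones on the "Log Files" slide; the function names `history` and `cooccurrence` are illustrative assumptions, not part of the talk.

```python
from collections import defaultdict
from itertools import permutations

# (user, thing) pairs as listed on the "Log Files" slide.
LOG = [("u1", "t1"), ("u2", "t2"), ("u2", "t3"), ("u1", "t4"),
       ("u1", "t3"), ("u3", "t3"), ("u3", "t1")]


def history(log):
    """Map each user to the set of things they interacted with."""
    rows = defaultdict(set)
    for user, thing in log:
        rows[user].add(thing)
    return dict(rows)


def cooccurrence(rows):
    """For each ordered pair of distinct things, count the users who have both."""
    counts = defaultdict(int)
    for things in rows.values():
        for a, b in permutations(sorted(things), 2):
            counts[(a, b)] += 1
    return dict(counts)


H = history(LOG)
C = cooccurrence(H)

for pair in sorted(C):
    print(pair, C[pair])
```

Only off-diagonal pairs are counted, and pairs that never occur together are simply absent from the result, which keeps the structure sparse.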
Indicator Matrix

✔
Indicator Matrix

✔
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators:

(t1)
Problems with Raw Co-occurrence
• Very popular items co-occur with everything
– Welcome document
– Elevator music
– Everybody wants a pony

• That isn’t interesting
– We want anomalous co-occurrence
Recommendation Basics
• Co-occurrence
          t3   not t3
t1        2    1
not t1    1    1
Co-occurrence Matrix
[2x2 table; only the labels "not" and the counts 1, 1 survive extraction]
Spot the Anomaly
          A       not A
B         13      1000
not B     1000    100,000
Score: 0.90

          A       not A
B         1       0
not B     0       10,000
Score: 4.52

          A       not A
B         1       0
not B     0       2
Score: 1.95

          A       not A
B         10      0
not B     0       100,000
Score: 14.3

• LLR (log likelihood ratio) is roughly like standard deviations
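A minimal sketch of the score behind the four tables, assuming the standard G² log-likelihood ratio test on a 2x2 contingency table. Taking the square root of the LLR reproduces the four scores on the slide to rounding, which is why the result reads roughly like a number of standard deviations. The function names are illustrative assumptions.

```python
import math


def xlogx(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy: N ln N - sum(k ln k), where N = sum(counts)."""
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    cells = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - cells))


def root_llr(k11, k12, k21, k22):
    """Square root of the LLR (unsigned in this sketch)."""
    return math.sqrt(llr(k11, k12, k21, k22))


# The four tables from the slide, as (k11, k12, k21, k22).
TABLES = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 10_000),
    (1, 0, 0, 2),
    (10, 0, 0, 100_000),
]

for table in TABLES:
    print(table, round(root_llr(*table), 2))
```

The first table has by far the largest raw co-occurrence count (13) yet the lowest score, because 13 is close to what chance alone would produce given how common A and B are.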
Genes => Traits => Behaviors => Fitness
Typical Dimensions
in Genetics/Medicine
• Genotype
• Gene Expression
• Samples
• Phenotypes
Incidence/Co-occurrence
in Genetics/Medicine
• Genotype * Phenotype
• Genotype * Genotype (sample similarity)
• Sample * Sample (gene expression similarity)
– Known genes => Sample annotation
– Expression Level * Expression Level (sample similarity)
– Known samples => Gene annotation

• Gene expression * Phenotype
– Etiological subtypes & re-diagnosis

• Phenotype * Phenotype
– (expression distance OR genotype distance) Etiological reclassification
DTRA102-007 – Forensic DNA
Analysis Kit for Genetic Intelligence
• Sex
• Blood type
• Ancestry
• Hair morphology
• Dimples
• Freckles
• Shoe size
• Flat-footedness
• Vision correction
• Ear lobe attachment
• Ear lobe crease
• 5th digit clinodactyly
• Eye color, hair color, skin color
• Height, handedness
• Etc.

https://sbirsource.com/grantiq#/topics/85383
DTRA102-007: Sex and Ancestry
Genotype and Phenotypes & GWAS
DTRA102-007: chr7 Earlobe Morphology
SNPs and SNPs
HapMap: Genotype call / spatial ordering

This is the essence of the HapMap Project
Samples and Samples
Label sex based on expression
[Scatter plot of samples; axes: XIST log10(RMA) and RPS4Y1 log(RMA), ticks from 1.5 to 3.5]

Celsius: a community resource for Affymetrix microarray data.
http://www.ncbi.nlm.nih.gov/pubmed/17570842
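A rough sketch of labeling sex from the two expression values named on the slide. XIST is transcribed from the inactive X chromosome, so it is typically high in female samples; RPS4Y1 is Y-linked, so it is typically high in male samples. The comparison rule, the `margin` parameter, and the sample values below are assumptions for illustration; the talk does not state a decision rule.

```python
def label_sex(xist, rps4y1, margin=0.5):
    """Label one sample from log-scale XIST and RPS4Y1 expression.

    `margin` is a hypothetical minimum gap between the two values;
    samples inside the margin are left unlabeled rather than guessed.
    """
    if xist - rps4y1 >= margin:
        return "female"
    if rps4y1 - xist >= margin:
        return "male"
    return "ambiguous"


# Hypothetical samples as (XIST, RPS4Y1), within the plotted 1.5 to 3.5 range.
SAMPLES = {
    "s1": (3.2, 1.7),
    "s2": (1.6, 3.1),
    "s3": (2.4, 2.5),
}

labels = {name: label_sex(x, y) for name, (x, y) in SAMPLES.items()}
print(labels)
```

Leaving an explicit "ambiguous" class is a design choice: samples near the diagonal are better flagged for review than forced into a label.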
Expression and Expression (10K+ samples)
Gene Annotation (co-expression)
[Heatmap: gene-by-gene matrix; row and column labels: FZD10, SLC28A3, HSPC159, BDKRB1, HAS2, XYLT1, RNF24, SOD2, RELB, RLF, NUPL1, EIF2C2, FOSL1, RELA, ETNK1, MMP12, AKR1C1, TNMD, CYTL1, SOX5, MIA, CHST3, PDLIM4, PDPN, WISP1, C1QTNF3, THBS3, COL10A1, COL11A1, EPYC, MATN3, MAST4, NGF, EDIL3, ITGA10, HAPLN1, MATN4, LECT1, MATN1, COL9A1, COL11A2, ACAN, CSPG4, MMP13, NOS2A, LIF, MMP3, BMP2, BMP6]

Disease gene characterization through large-scale co-expression analysis.
http://www.ncbi.nlm.nih.gov/pubmed/20046828
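A small sketch of a gene-by-gene co-expression matrix, assuming Pearson correlation across samples as the similarity measure (the slide does not say which measure was used). The gene names come from the slide; the expression values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical expression levels: one value per sample for each gene.
EXPRESSION = {
    "COL10A1": [1.0, 2.0, 3.0, 4.0],
    "COL11A1": [1.1, 2.1, 2.9, 4.2],
    "SOD2": [4.0, 3.0, 2.0, 1.0],
}


def coexpression(expr):
    """Correlation for every ordered pair of genes (including self-pairs)."""
    genes = sorted(expr)
    return {(g, h): pearson(expr[g], expr[h]) for g in genes for h in genes}


CO = coexpression(EXPRESSION)
for (g, h), r in sorted(CO.items()):
    print(f"{g:8s} {h:8s} {r:+.3f}")
```

Genes whose rows look alike in this matrix are candidates for sharing an annotation, which is the "known samples => gene annotation" idea from the earlier slide.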
Co-expression (10K samples) and Linkage
Gene Annotation / Set Completion
[Heatmaps: gene-by-gene matrices with the same row and column gene labels as the previous slide (FZD10 through BMP6); co-expression + linkage => set completion]

Disease gene characterization through large-scale co-expression analysis.
http://www.ncbi.nlm.nih.gov/pubmed/20046828
Typical Dimensions
in Genetics/Medicine
• Genotype
• Gene Expression
• Samples
• Phenotypes (traits/behavior)
Typical Dimensions
in Behavioral Data
• Genotype
• Gene Expression
• Samples / Individuals
• Phenotype
– Traits
– Behaviors
Traits and Behaviors
Content Topic Modeling / UX Personalization
Behaviors and Outcomes
Economic Fitness (Korn/Ferry)

=>
Allen

Korn/Ferry ProSpective
http://linkedin.kornferry.com
Behavior of a
crowd helps us
understand what
individuals will do

HOW CROSS-RECOMMENDATIONS
WORK
Example Multi-modal Inputs
• Overlap in restaurant visits is useful
• Big spender cues
• Cuisine as an indicator
• Review text as an indicator
Too Limited
• People do more than one kind of thing
• Different kinds of behaviors give different quality,
quantity and kind of information
• We don’t have to do co-occurrence
• We can do cross-occurrence
• Result is cross-recommendation
For example
• Users enter queries (A)
– (actor = user, item=query)

• Users view videos (B)
– (actor = user, item=video)

• A^T A gives query recommendation
– “did you mean to ask for”

• B^T B gives video recommendation
– “you might like these videos”
The punch-line
• B^T A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
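A minimal sketch of cross-occurrence with plain Python lists. A is users x queries and B is users x videos, as defined above; the tiny matrices, the helper names, and the ranking function are assumptions for illustration. The sketch ranks by raw counts; per the earlier slides, a real system would keep only anomalous cross-occurrences (for example by LLR).

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols]
            for row in x]


# Hypothetical data: 3 users, 2 queries, 3 videos.
A = [[1, 0],      # user 0 entered query 0
     [1, 1],      # user 1 entered queries 0 and 1
     [0, 1]]      # user 2 entered query 1
B = [[1, 0, 1],   # user 0 viewed videos 0 and 2
     [1, 0, 0],   # user 1 viewed video 0
     [0, 1, 0]]   # user 2 viewed video 1

AtA = matmul(transpose(A), A)   # queries x queries
BtB = matmul(transpose(B), B)   # videos x videos
BtA = matmul(transpose(B), A)   # videos x queries


def videos_for_query(query):
    """Videos ranked by how many users both entered the query and viewed them."""
    scores = [(row[query], video) for video, row in enumerate(BtA)]
    return [video for score, video in sorted(scores, key=lambda s: -s[0])
            if score > 0]


print(BtA)
print(videos_for_query(0))
```

Nothing here inspects query text or video metadata; the mapping from queries to videos comes entirely from users who did both.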
Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres del paco” times 400
– not much else

• Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users x label clicks

• Remember viewing history
– This gives B = users x items

• Cross recommend
– B’A = label to item mapping

• After several users click, results are whatever
users think they should be
Previous Click Histories
[Diagram: user1 to user5 and the items (1 to 8) they clicked]
Detect similar content: 2 & 8
[Diagram: user1 to user5 and items 1 to 8]
Call to Action – Request Clicks
[Diagram: user1 to user5 and items 1 to 8, plus a "Show me more:" prompt offering sports, comedy, technology; marked "Under Construction"]
Guess Labels:
4=sports ; 2 & 8=comedy
[Diagram: "Show me more:" labels mapped to items: sports to 4, comedy to 2 & 8, technology "Under construction"]
Extrapolate
[Diagram: userX, the "Show me more: comedy" prompt, and items 1, 3, 2, 8, 4]
Matrices A (U*Q) and B (U*V)
A: Users x Query Terms (Query Term = Clicked Term)
B: Users x Clicked Videos
Join on dimension U…
Relate Q to V
[Diagram: Query Terms x Clicked Videos]
Medicine
Forensics

Job Performance

Genes => Traits => Behaviors => Fitness

Psychometrics
Movie Preferences
Genes

Job
Performance
(Traits/Behaviors) and Outcomes
Reproductive Fitness (eHarmony)
eHarmony @ Hadoop World: Data Science of Love
http://eharmony.com

= 185cm
Allen
Medicine
Forensics

Job Performance

Genes => Traits => Behaviors => Fitness

Psychometrics
Movie Preferences

Fitness
Reproductive Outcomes
Thank You!!

Mais conteúdo relacionado

Destaque

Whitney Wheeler Resume (1) (4).PDF
Whitney Wheeler Resume (1) (4).PDFWhitney Wheeler Resume (1) (4).PDF
Whitney Wheeler Resume (1) (4).PDFWhitney Wheeler
 
Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...
Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...
Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...Philip Calvert
 
Mapa conceptual. GESTIÓN DE PROYECTO
Mapa conceptual. GESTIÓN DE PROYECTOMapa conceptual. GESTIÓN DE PROYECTO
Mapa conceptual. GESTIÓN DE PROYECTOMarcela Leon
 
App Sharing - Wezeit - LI Yan
App Sharing - Wezeit - LI YanApp Sharing - Wezeit - LI Yan
App Sharing - Wezeit - LI Yan妍 李
 
Linked in is it working for you
Linked in is it working for youLinked in is it working for you
Linked in is it working for youLesley Morrissey
 
Recent Events in Fund History
Recent Events in Fund HistoryRecent Events in Fund History
Recent Events in Fund HistoryKurtosys Systems
 
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBIHadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBIAllen Day, PhD
 
The Geology of South Raasay Dissertation
The Geology of South Raasay DissertationThe Geology of South Raasay Dissertation
The Geology of South Raasay DissertationJonathan Edwards
 
Smart Systems Revolutionizing Ag - Jason Bull
Smart Systems Revolutionizing Ag - Jason BullSmart Systems Revolutionizing Ag - Jason Bull
Smart Systems Revolutionizing Ag - Jason BullUIResearchPark
 
VR Introduction
VR IntroductionVR Introduction
VR IntroductionAdam Chen
 
How to Create a Content Marketing Tactical Plan for LinkedIn
How to Create a Content Marketing Tactical Plan for LinkedInHow to Create a Content Marketing Tactical Plan for LinkedIn
How to Create a Content Marketing Tactical Plan for LinkedInLinkedIn
 
Service Design meets UX Design
Service Design meets UX DesignService Design meets UX Design
Service Design meets UX DesignFranziska Semer
 
Transforming safe html
Transforming safe htmlTransforming safe html
Transforming safe htmlPrakhar Joshi
 

Destaque (16)

Whitney Wheeler Resume (1) (4).PDF
Whitney Wheeler Resume (1) (4).PDFWhitney Wheeler Resume (1) (4).PDF
Whitney Wheeler Resume (1) (4).PDF
 
PDHPE
PDHPEPDHPE
PDHPE
 
Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...
Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...
Social Media and LinkedIn for IFAs and Financial Planners - Full-day workshop...
 
Mapa conceptual. GESTIÓN DE PROYECTO
Mapa conceptual. GESTIÓN DE PROYECTOMapa conceptual. GESTIÓN DE PROYECTO
Mapa conceptual. GESTIÓN DE PROYECTO
 
Svarka ageev
Svarka ageevSvarka ageev
Svarka ageev
 
App Sharing - Wezeit - LI Yan
App Sharing - Wezeit - LI YanApp Sharing - Wezeit - LI Yan
App Sharing - Wezeit - LI Yan
 
Linked in is it working for you
Linked in is it working for youLinked in is it working for you
Linked in is it working for you
 
Recent Events in Fund History
Recent Events in Fund HistoryRecent Events in Fund History
Recent Events in Fund History
 
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBIHadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
Hadoop and Genomics - What you need to know - Cambridge - Sanger Center and EBI
 
Sixth sense technology
Sixth sense technologySixth sense technology
Sixth sense technology
 
The Geology of South Raasay Dissertation
The Geology of South Raasay DissertationThe Geology of South Raasay Dissertation
The Geology of South Raasay Dissertation
 
Smart Systems Revolutionizing Ag - Jason Bull
Smart Systems Revolutionizing Ag - Jason BullSmart Systems Revolutionizing Ag - Jason Bull
Smart Systems Revolutionizing Ag - Jason Bull
 
VR Introduction
VR IntroductionVR Introduction
VR Introduction
 
How to Create a Content Marketing Tactical Plan for LinkedIn
How to Create a Content Marketing Tactical Plan for LinkedInHow to Create a Content Marketing Tactical Plan for LinkedIn
How to Create a Content Marketing Tactical Plan for LinkedIn
 
Service Design meets UX Design
Service Design meets UX DesignService Design meets UX Design
Service Design meets UX Design
 
Transforming safe html
Transforming safe htmlTransforming safe html
Transforming safe html
 

Semelhante a 20131212 - Sydney - Garvan Institute - Human Genetics and Big Data

アイ・トレーニング10点)
アイ・トレーニング10点)アイ・トレーニング10点)
アイ・トレーニング10点)kenji sakuma
 
Pocket dot grid pages
Pocket dot grid pagesPocket dot grid pages
Pocket dot grid pagesHIKOO
 
Fairisle knitting
Fairisle knittingFairisle knitting
Fairisle knittingzafiro555
 
Deep dive into Nagios analytics
Deep dive into Nagios analyticsDeep dive into Nagios analytics
Deep dive into Nagios analyticsDatadog
 
Aiello-Lammens: Global Sensitivity Analysis for Impact Assessments.
Aiello-Lammens:  Global Sensitivity Analysis for Impact Assessments.Aiello-Lammens:  Global Sensitivity Analysis for Impact Assessments.
Aiello-Lammens: Global Sensitivity Analysis for Impact Assessments.questRCN
 
Optimal Nudging. Presentation UD.
Optimal Nudging. Presentation UD.Optimal Nudging. Presentation UD.
Optimal Nudging. Presentation UD.r-uribe
 
2018 jsm vancouver
2018 jsm vancouver2018 jsm vancouver
2018 jsm vancouverBin Chen
 
The Ecology of Forage Fish in the Salish Sea
The Ecology of Forage Fish in the Salish SeaThe Ecology of Forage Fish in the Salish Sea
The Ecology of Forage Fish in the Salish SeaTessa Francis
 
Comparing public RNA-seq data
Comparing public RNA-seq dataComparing public RNA-seq data
Comparing public RNA-seq datamikaelhuss
 
A Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage EvolutionA Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage Evolutionjon_bell
 
Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?Laura Berry
 
Consumer Preferences in Real Estate Markets
Consumer Preferences in Real Estate MarketsConsumer Preferences in Real Estate Markets
Consumer Preferences in Real Estate MarketsDominik Kalisch
 
Unit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth RoundUnit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth RoundSebastiano Panichella
 
データ社会を生きる技術〜人工知能のHypeとHope〜
データ社会を生きる技術〜人工知能のHypeとHope〜データ社会を生きる技術〜人工知能のHypeとHope〜
データ社会を生きる技術〜人工知能のHypeとHope〜Ichigaku Takigawa
 
Innovation & Issue process
Innovation & Issue processInnovation & Issue process
Innovation & Issue processKatsuhito Okada
 
Advanced Procedural Rendering in DirectX11 - CEDEC 2012
Advanced Procedural Rendering in DirectX11 - CEDEC 2012 Advanced Procedural Rendering in DirectX11 - CEDEC 2012
Advanced Procedural Rendering in DirectX11 - CEDEC 2012 smashflt
 

Semelhante a 20131212 - Sydney - Garvan Institute - Human Genetics and Big Data (20)

アイ・トレーニング10点)
アイ・トレーニング10点)アイ・トレーニング10点)
アイ・トレーニング10点)
 
Pocket dot grid pages
Pocket dot grid pagesPocket dot grid pages
Pocket dot grid pages
 
Fairisle knitting
Fairisle knittingFairisle knitting
Fairisle knitting
 
Deep dive into Nagios analytics
Deep dive into Nagios analyticsDeep dive into Nagios analytics
Deep dive into Nagios analytics
 
Aiello-Lammens: Global Sensitivity Analysis for Impact Assessments.
Aiello-Lammens:  Global Sensitivity Analysis for Impact Assessments.Aiello-Lammens:  Global Sensitivity Analysis for Impact Assessments.
Aiello-Lammens: Global Sensitivity Analysis for Impact Assessments.
 
Optimal Nudging. Presentation UD.
Optimal Nudging. Presentation UD.Optimal Nudging. Presentation UD.
Optimal Nudging. Presentation UD.
 
2018 jsm vancouver
2018 jsm vancouver2018 jsm vancouver
2018 jsm vancouver
 
Rgraphics
RgraphicsRgraphics
Rgraphics
 
The Ecology of Forage Fish in the Salish Sea
The Ecology of Forage Fish in the Salish SeaThe Ecology of Forage Fish in the Salish Sea
The Ecology of Forage Fish in the Salish Sea
 
17 polishing
17 polishing17 polishing
17 polishing
 
Comparing public RNA-seq data
Comparing public RNA-seq dataComparing public RNA-seq data
Comparing public RNA-seq data
 
A Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage EvolutionA Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage Evolution
 
Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?
 
Tokyor16
Tokyor16Tokyor16
Tokyor16
 
Consumer Preferences in Real Estate Markets
Consumer Preferences in Real Estate MarketsConsumer Preferences in Real Estate Markets
Consumer Preferences in Real Estate Markets
 
Unit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth RoundUnit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth Round
 
データ社会を生きる技術〜人工知能のHypeとHope〜
データ社会を生きる技術〜人工知能のHypeとHope〜データ社会を生きる技術〜人工知能のHypeとHope〜
データ社会を生きる技術〜人工知能のHypeとHope〜
 
14 case-study
14 case-study14 case-study
14 case-study
 
Innovation & Issue process
Innovation & Issue processInnovation & Issue process
Innovation & Issue process
 
Advanced Procedural Rendering in DirectX11 - CEDEC 2012
Advanced Procedural Rendering in DirectX11 - CEDEC 2012 Advanced Procedural Rendering in DirectX11 - CEDEC 2012
Advanced Procedural Rendering in DirectX11 - CEDEC 2012
 

20131212 - Sydney - Garvan Institute - Human Genetics and Big Data

  • 1. Human Genetics & Big Data
  • 2. Human Genetics & Big Data Human Genetics & Ethics Today we talk about technology and methodology
  • 3. Me, Us • Allen Day, Principal Data Scientist, MapR: Human Genetics PhD, UCLA School of Medicine; 6 years Hadoop, 10 years R (Genetics/Biostatistics) • MapR: distributes open source components for Hadoop; adds major technology for performance, HA, industry standard APIs • See Also – @allenday @mapR – http://slideshare.net/allenday – “allenday” most places (twitter, github, maprtech.com, etc.)
  • 4. What Does Machine Learning Look Like?
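  • 5. What Does Machine Learning Look Like Under the Covers?
      $O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small $k$, high quality
      $O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger $k$, looser quality
      $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
      $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
      $$r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
      Here’s how to keep it simple yet powerful…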
  • 6. Behavior of a crowd helps us understand what individuals will do HOW RECOMMENDATIONS WORK
  • 7. Recommendations Alice Charles Alice got an apple and a puppy Charles got a bicycle
  • 8. Recommendations Alice Bob Charles Alice got an apple and a puppy Bob got an apple Charles got a bicycle
  • 11. Recommendations Alice What if everybody gets a pony? Bob Charles ? Now what does Bob want?
  • 14. Log Files and Dimensions. Log (user, thing): u1 t1; u2 t2; u2 t3; u1 t4; u1 t3; u3 t3; u3 t1. Things: t1, t2, t3, t4. Users: u1 Alice, u2 Charles, u3 Bob
  • 18. Indicator Matrix ✔ id: t4 title: puppy desc: The sweetest little puppy ever. keywords: puppy, dog, pet indicators: (t1)
  • 19. Problems with Raw Co-occurrence • Very popular items co-occur with everything – Welcome document – Elevator music – Everybody wants a pony • That isn’t interesting – We want anomalous co-occurrence
  • 22. Spot the Anomaly
      Four 2×2 contingency tables (columns A / not A, rows B / not B) and their scores:
        1. B: 13, 1000; not B: 1000, 100,000; score 0.90
        2. B: 1, 0; not B: 0, 10,000; score 4.52
        3. B: 1, 0; not B: 0, 2; score 1.95
        4. B: 10, 0; not B: 0, 100,000; score 14.3
      • LLR (log likelihood ratio) is roughly like standard deviations
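The recommendation slides (14 through 22) describe a pipeline: a user/thing log, then pairwise co-occurrence counts, then an anomaly score per 2×2 table. The sketch below is one minimal way to reproduce it in Python. It assumes the scores on slide 22 are the square root of the G² log-likelihood ratio statistic, computed here in its entropy form. The deck does not state the formula, and the function names (`history`, `cooccurrence`, `root_llr`) are illustrative, not taken from the deck.

```python
import math
from collections import defaultdict
from itertools import combinations

# (user, thing) pairs as listed on slide 14.
LOG = [("u1", "t1"), ("u2", "t2"), ("u2", "t3"), ("u1", "t4"),
       ("u1", "t3"), ("u3", "t3"), ("u3", "t1")]


def history(log):
    """Group the log into a user -> set-of-things history."""
    hist = defaultdict(set)
    for user, thing in log:
        hist[user].add(thing)
    return dict(hist)


def cooccurrence(hist):
    """Count, for each unordered pair of things, how many users have both."""
    counts = defaultdict(int)
    for things in hist.values():
        for a, b in combinations(sorted(things), 2):
            counts[(a, b)] += 1
    return dict(counts)


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def root_llr(k11, k12, k21, k22):
    """Square root of the G^2 statistic for a 2x2 contingency table.

    k11 = A and B, k12 = A without B, k21 = B without A, k22 = neither.
    (Assumed to be the score shown on slide 22.)
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (row + col - mat)))


COOC = cooccurrence(history(LOG))

# The four tables from slide 22, in slide order.
SLIDE_22_TABLES = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 10_000),
    (1, 0, 0, 2),
    (10, 0, 0, 100_000),
]
SLIDE_22_SCORES = [root_llr(*table) for table in SLIDE_22_TABLES]

if __name__ == "__main__":
    print(COOC)
    print([round(score, 2) for score in SLIDE_22_SCORES])
```

The first table has the largest raw co-occurrence count (13) but the lowest score, because A and B are both common enough that 13 joint events is close to what chance predicts. This is the "everybody wants a pony" problem from slide 19: raw counts reward popularity, while the LLR score rewards co-occurrence that is surprising given the marginals.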
  • 23. Genes => Traits => Behaviors => Fitness
  • 25. Incidence/Co-occurrence in Genetics/Medicine • Genotype * Phenotype • Genotype * Genotype (sample similarity) • Sample * Sample (gene expression similarity) – Known genes => Sample annotation – Expression Level * Expression Level (sample similarity) – Known samples => Gene annotation • Gene expression * Phenotype – Etiological subtypes & re-diagnosis • Phenotype * Phenotype – (expression distance OR genotype distance) Etiological reclassification
  • 26. DTRA102-007 – Forensic DNA Analysis Kit for Genetic Intelligence • Sex • Blood type • Ancestry • Hair morphology • Dimples • Freckles • Shoe size • Flat-footedness • Vision correction • Ear lobe attachment • Ear lobe crease • 5th digit clinodactyly • Eye color, hair color, skin color • Height, handedness • Etc https://sbirsource.com/grantiq#/topics/85383
  • 28. Genotype and Phenotypes & GWAS DTRA102-007: chr7 Earlobe Morphology
  • 29. SNPs and SNPs HapMap: Genotype call / spatial ordering This is the essence of the HapMap Project
  • 30. Samples and Samples Label sex based on expression [Scatter plot: RPS4Y1 log(RMA) against XIST log10(RMA), both axes running from 1.5 to 3.5] Celsius: a community resource for Affymetrix microarray data. http://www.ncbi.nlm.nih.gov/pubmed/17570842
  • 31. Expression and Expression (10K+ samples): Gene Annotation (co-expression) [Figure: gene-by-gene co-expression matrix; both axes carry the same gene labels: SLC28A3 HSPC159 BDKRB1 HAS2 XYLT1 RNF24 RNF24 SOD2 RELB RLF NUPL1 EIF2C2 FOSL1 RELA ETNK1 MMP12 AKR1C1 TNMD CYTL1 SOX5 MIA CHST3 PDLIM4 PDPN FZD10 WISP1 C1QTNF3 THBS3 COL10A1 COL10A1 COL11A1 COL11A1 EPYC MATN3 MAST4 NGF EDIL3 ITGA10 HAPLN1 HAPLN1 MATN4 ACAN ACAN ACAN LECT1 MATN1 COL9A1 COL11A2 COL11A2 CSPG4 MMP13 NOS2A LIF MMP3 BMP2 BMP6] Disease gene characterization through large-scale co-expression analysis. http://www.ncbi.nlm.nih.gov/pubmed/20046828
  • 32. Co-expression (10K samples) and Linkage: Gene Annotation / Set Completion [Figure: two gene-labeled panels joined by "+ =>"; gene labels: SLC28A3 HSPC159 BDKRB1 HAS2 XYLT1 RNF24 RNF24 SOD2 RELB RLF NUPL1 EIF2C2 FOSL1 RELA ETNK1 MMP12 AKR1C1 TNMD CYTL1 SOX5 MIA CHST3 PDLIM4 PDPN FZD10 WISP1 C1QTNF3 THBS3 COL10A1 COL10A1 COL11A1 COL11A1 EPYC MATN3 MAST4 NGF EDIL3 ITGA10 HAPLN1 HAPLN1 MATN4 ACAN ACAN ACAN LECT1 MATN1 COL9A1 COL11A2 COL11A2 CSPG4 MMP13 NOS2A LIF MMP3 BMP2 BMP6] Disease gene characterization through large-scale co-expression analysis. http://www.ncbi.nlm.nih.gov/pubmed/20046828
  • 33. Typical Dimensions in Genetics/Medicine • Genotype • Gene Expression • Samples • Phenotypes (traits/behavior)
  • 34. Typical Dimensions in Behavioral Data • Genotype • Gene Expression • Samples / Individuals • Phenotype – Traits – Behaviors
  • 35. Traits and Behaviors Content Topic Modeling / UX Personalization
  • 36. Behaviors and Outcomes Economic Fitness (Korn/Ferry) => Allen Korn/Ferry ProSpective http://linkedin.kornferry.com
  • 37. Behavior of a crowd helps us understand what individuals will do HOW CROSS-RECOMMENDATIONS WORK
  • 38. Example Multi-modal Inputs • • • • Overlap in restaurant visits is useful Big spender cues Cuisine as an indicator Review text as an indicator
  • 39. Too Limited • People do more than one kind of thing • Different kinds of behaviors give different quality, quantity and kind of information • We don’t have to do co-occurrence • We can do cross-occurrence • Result is cross-recommendation
  • 40. For example • Users enter queries (A) – (actor = user, item = query) • Users view videos (B) – (actor = user, item = video) • AᵀA gives query recommendation – “did you mean to ask for” • BᵀB gives video recommendation – “you might like these videos”
  • 41. The punch-line • BᵀA recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
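The three products from slides 40 and 41 (AᵀA, BᵀB and BᵀA) can be tried out on a toy data set. What follows is a minimal pure-Python sketch, not the speaker's implementation: the three users, the query and video names, and the helpers `cross_occurrence` and `videos_for_query` are all invented for illustration.

```python
# Toy histories (hypothetical data). Rows are users u1..u3.
queries = ["flamenco", "guitar"]
videos = ["v_paco", "v_vanhalen", "v_cats"]

A = [  # users x queries: 1 if the user entered the query
    [1, 1],  # u1
    [1, 0],  # u2
    [0, 1],  # u3
]
B = [  # users x videos: 1 if the user viewed the video
    [1, 1, 0],  # u1
    [1, 0, 0],  # u2
    [0, 1, 1],  # u3
]


def cross_occurrence(X, Y):
    """Return X^T Y: entry [i][j] counts users who did both X-item i and Y-item j."""
    n_x, n_y = len(X[0]), len(Y[0])
    return [
        [sum(row_x[i] * row_y[j] for row_x, row_y in zip(X, Y)) for j in range(n_y)]
        for i in range(n_x)
    ]


AtA = cross_occurrence(A, A)  # query x query: "did you mean to ask for"
BtB = cross_occurrence(B, B)  # video x video: "you might like these videos"
BtA = cross_occurrence(B, A)  # video x query: videos in response to a query


def videos_for_query(query):
    """Rank videos by their raw cross-occurrence count with the given query."""
    q = queries.index(query)
    scored = [(BtA[v][q], videos[v]) for v in range(len(videos))]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

Note that the ranking never looks at video titles or other meta-data; it only uses who queried what and who watched what. Raw counts are used here for brevity.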
  • 42. Real-life example • Query: “Paco de Lucia” • Conventional meta-data search results: – “hombres del paco” times 400 – not much else • Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 44. Hypothetical Example • Want a navigational ontology? • Just put labels on a web page with traffic – This gives A = users x label clicks • Remember viewing history – This gives B = users x items • Cross recommend – B’A = label to item mapping • After several users click, results are whatever users think they should be
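The label-to-item mapping B’A from slide 44 can be sketched with sparse click logs instead of dense matrices. This is an illustrative sketch under stated assumptions: the users, item numbers, label names, and the helpers `cross_recommend` and `guess_labels` are hypothetical, and picking the single most frequent label per item is just one simple way to read the result.

```python
from collections import Counter, defaultdict

# Hypothetical click logs. label_clicks plays the role of A (users x label clicks),
# views plays the role of B (users x items).
label_clicks = {
    "user1": {"comedy"},
    "user2": {"sports"},
    "user3": {"comedy"},
    "user4": {"technology"},
}
views = {
    "user1": {2, 8},
    "user2": {4},
    "user3": {2, 8},
    "user4": {5},
}


def cross_recommend(views, label_clicks):
    """Sparse B'A: counts[item][label] = users who viewed the item and clicked the label."""
    counts = defaultdict(Counter)
    for user, items in views.items():
        for item in items:
            for label in label_clicks.get(user, ()):
                counts[item][label] += 1
    return counts


def guess_labels(counts):
    """Assign each item the label that co-occurs with it most often."""
    return {item: labels.most_common(1)[0][0] for item, labels in counts.items()}


counts = cross_recommend(views, label_clicks)
guessed = guess_labels(counts)
```

With only a handful of clicks the mapping is noisy; as more users click, the counts settle on whatever the users think the labels should mean.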
  • 46. Detect similar content: 2 & 8 [Figure: users user1 to user5 against items 1 to 8]
  • 47. Call to Action – Request Clicks • Show me more: sports, comedy, technology [Figure: users user1 to user5 against items 1 to 8, marked “Under Construction”]
  • 48. Guess Labels: 4=sports ; 2 & 8=comedy • Show me more: sports (4), comedy (2 & 8), technology [Figure: users user1 to user5 against items 1 to 8, marked “Under construction”]
  • 50. Matrices A (U*Q) and B (U*V) [Figure labels: Users, Query Terms, Clicked Videos; Query Term = Clicked Term]
  • 51. Join on dimension U… [Figure labels: Query Terms, Users]
  • 52. Relate Q to V [Figure labels: Query Terms, Users]
  • 53. Relate Q to V [Figure labels: Query Terms, Clicked Videos]
  • 54. Medicine Forensics Job Performance Genes => Traits => Behaviors => Fitness Psychometrics Movie Preferences
  • 56. (Traits/Behaviors) and Outcomes Reproductive Fitness (eHarmony) eHarmony @ Hadoop World: Data Science of Love http://eharmony.com
  • 57. (Traits/Behaviors) and Outcomes Reproductive Fitness (eHarmony) eHarmony @ Hadoop World: Data Science of Love http://eharmony.com = 185cm Allen
  • 60. Medicine Forensics Job Performance Genes => Traits => Behaviors => Fitness Psychometrics Movie Preferences Fitness Reproductive Outcomes
  • 62. Me, Us • Allen Day, Principal Data Scientist, MapR Human Genetics PhD, UCLA School of Medicine 6 years Hadoop, 10 years R (Genetics/Biostatistics) • MapR Distributes open source components for Hadoop Adds major technology for performance, HA, industry-standard APIs • See Also – @allenday @mapR – http://slideshare.net/allenday – “allenday” most places (twitter, github, maprtech.com, etc.)

Editor's Notes

  1. Note to speaker: Move quickly through the first two slides just to set the tone of familiar use cases but somewhat complicated under-the-covers math and algorithms… You don’t need to explain or discuss these examples at this point… just mention one or two. Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email….
  2. Talk track: Under the covers, machine learning looks very complicated. So how do you get from here to the familiar examples? Tonight’s presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.
  3. I suppressed the slide and added a duplicate with an arrow to show that the line from the indicator matrix goes into the indicator field of the same Solr index that stores meta-data for each item.
  4. Allen: I suppressed the slide and added a duplicate with an arrow to show that the line from the indicator matrix goes into the indicator field of the same Solr index that stores meta-data for each item. You may want to explain that the model that produces the indicator matrix can be built with Apache Mahout or other approaches. A nifty way to deploy it is to use Apache Solr (such as LucidWorks) to build an index of meta-data for the items (shown here). Then the output of the ML model, the indicator data (also shown here), goes into a field in the same index. All this is done offline ahead of time, which makes the actual recommendation step fast. A new user arrives and interacts, and that event triggers a Solr search to find matching IDs in the indicator fields of different documents; those matches are the source of the recommendation. Because only that part is done live, the response can be FAST.
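The deployment idea in note 4 (indicators computed offline and stored in a field next to each item's meta-data, then matched against a user's recent history at request time) can be imitated without a search engine. The sketch below is a plain-Python stand-in and does not use any Solr API; the documents, the field name `indicators`, and the function `recommend` are assumptions made for illustration.

```python
# Hypothetical item documents: meta-data plus an "indicators" field that an
# offline co-occurrence job would have filled in ahead of time.
index = [
    {"id": "t1", "title": "apple", "indicators": ["t3"]},
    {"id": "t3", "title": "puppy", "indicators": ["t1", "t4"]},
    {"id": "t4", "title": "bicycle", "indicators": ["t3"]},
]


def recommend(history, index, limit=10):
    """Score each document by how many of the user's recent item IDs appear in
    its indicator field, skipping items already in the user's history."""
    seen = set(history)
    scored = []
    for doc in index:
        if doc["id"] in seen:
            continue
        hits = len(seen.intersection(doc["indicators"]))
        if hits:
            scored.append((hits, doc["id"]))
    # Most matches first; ties broken by ID so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]
```

The only work done at request time is the lookup; building the indicator field is the expensive part and happens beforehand. A real search engine would also apply its own relevance weighting, which this sketch replaces with a simple match count.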
  5. Point out what matters is SIGNIFICANT or interesting co-occurrence (meaning anomalous co-occurrence). Ponies don’t help because everybody wants a pony
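One way to make "anomalous co-occurrence" concrete is a significance score on the 2x2 table of who has which item. The notes do not say which test is meant; the sketch below assumes Dunning's log-likelihood ratio (G²), a common choice for this kind of data, and the function names and counts are illustrative only.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: users with both items    k12: users with item A only
    k21: users with item B only   k22: users with neither
    """
    n = k11 + k12 + k21 + k22
    cells = x_log_x(k11) + x_log_x(k12) + x_log_x(k21) + x_log_x(k22)
    rows = x_log_x(k11 + k12) + x_log_x(k21 + k22)
    cols = x_log_x(k11 + k21) + x_log_x(k12 + k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (cells + x_log_x(n) - rows - cols))


# Everybody (100 users) has a pony; 10 of them also have a puppy.
pony_vs_puppy = llr(10, 90, 0, 0)
# The same 10 users have both an apple and a puppy; nobody else has either.
apple_vs_puppy = llr(10, 0, 0, 90)
```

The pony score comes out at essentially zero because owning a pony tells us nothing about who owns a puppy, while the apple score is large because the two items are held by exactly the same users.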
  6. Human HG-U133A CELs are automatically classified for sex of the tissue or cell line of origin. Orange points are manually curated as male and are also correctly classified as male. Red points are manually curated male that are falsely classified as female. Wheat points are classified as male but do not have manually curated results. These three types of points are also denoted by different shapes in the order of triangle, filled triangle, and circle respectively. All points are classified by assigning two clusters in five-dimensional probeset space, two of which are shown. x-axis, 221728_x_at, XIST; y-axis, 201909_at, RPS4Y1.
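The caption in note 6 says samples are classified by assigning two clusters in probeset space, but it does not name the clustering algorithm. As a rough sketch, assuming plain k-means with k = 2 and using only the two plotted dimensions (XIST, RPS4Y1), the idea looks like this; the sample values and the function `two_means` are made up for illustration.

```python
def two_means(points, iterations=20):
    """Plain k-means with k=2. Returns (labels, centers)."""
    # Deterministic start: the points with the smallest and largest first coordinate.
    centers = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min((0, 1), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in (0, 1):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical (XIST, RPS4Y1) log expression values for six samples.
samples = [(3.2, 1.6), (3.0, 1.7), (3.3, 1.5), (1.6, 3.1), (1.7, 3.3), (1.5, 3.0)]
labels, centers = two_means(samples)

# RPS4Y1 is Y-linked, so the cluster with the higher mean RPS4Y1 is called male.
male_cluster = max((0, 1), key=lambda c: centers[c][1])
sexes = ["male" if lab == male_cluster else "female" for lab in labels]
```

Samples with high XIST and low RPS4Y1 land in the female cluster, and the reverse pattern lands in the male cluster, which matches the two groups visible in the scatter plot.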
  7. The genomic position (x-axis) of probesets within a 6 megabase region centered at the location of TTN, a gene known to be associated with LGMD2, is plotted versus the Pearson correlation coefficient (y-axis) to a list of probesets targeting other genes known to be associated with LGMD2 (excluding TTN) across 11636 HG-U133_Plus_2 microarrays. Solid circles: probesets targeting TTN; [symbol image missing]: probesets for genes of unknown function; open circles: probesets for known genes in the interval.
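The y-axis in note 7 is a Pearson correlation between one probeset and a list of reference probesets. The caption does not say how the per-reference correlations are combined, so the sketch below assumes a simple mean; the expression profiles and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_correlation(candidate, references):
    """Average correlation of a candidate profile to a list of reference profiles
    (the averaging is an assumption of this sketch)."""
    return sum(pearson(candidate, ref) for ref in references) / len(references)


# Hypothetical expression profiles across four arrays.
known_disease_probesets = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0]]
co_expressed = [1.1, 2.0, 3.2, 3.9]
unrelated = [3.0, 1.0, 4.0, 2.0]

score_co_expressed = mean_correlation(co_expressed, known_disease_probesets)
score_unrelated = mean_correlation(unrelated, known_disease_probesets)
```

A candidate gene in the interval that tracks the known disease genes across many arrays scores close to 1, while an unrelated gene scores near 0. In the study the profiles span thousands of arrays rather than four.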
  8. Allen: What do you plan to say about this? General example without anything proprietary?
  10. Allen: this is the transitional slide from talking about more than one input to going one step further: cross-recommendation. I doubt you want to use it as is, but I’ve included it FYI.
  11. Allen: additional transitional slide