4. Lab for Bioinformatics and computational genomics
10 “genome hackers”
mostly engineers (statistics)
42 scientists
technicians, geneticists, clinicians
>100 people
Hardware/software engineers,
mathematicians, molecular biologists
5. What is Bioinformatics ?
• Application of information technology to the
storage, management and analysis of biological
information (Facilitated by the use of
computers)
– Sequence analysis?
– Molecular modeling (HTX) ?
– Phylogeny/evolution?
– Ecology and population studies?
– Medical informatics?
– Image Analysis ?
– Statistics ? AI ?
– Heavy-current or light-current engineering (power or electronics) ?
6. Promises of genomics and bioinformatics
• Medicine (Pharma)
– Genome analysis allows the targeting of genetic
diseases
– The effect of a disease or of a therapeutic on RNA and
protein levels can be elucidated
– Knowledge of protein structure facilitates drug design
– Understanding of genomic variation allows the tailoring
of medical treatment to the individual’s genetic make-up
• The same techniques can be applied to crop (Agro) and
livestock improvement (Animal Health)
7. Bioinformatics: What’s in a name ?
• Early 1990s
• “Bio-informatics”:
[Figure: computing power and GenBank, log scale, plotted against time (years)]
8. Bioinformatics: What’s in a name ?
• Early 1990s
• “Bio-informatics”:
– convergence of explosive growth in
biotechnology, paralleled by the explosive growth
in information technology
• Not new: people have been using
“computers” in biology for more than 30 years
• In silico biology, database biology, ...
12. PCR + dye termination
Suddenly, a flash of insight caused him to pull the car
off the road and stop. He awakened his friend
dozing in the passenger seat and excitedly
explained to her that he had hit upon a solution -
not to his original problem, but to one of even
greater significance. Kary Mullis had just conceived
of a simple method for producing virtually unlimited
copies of a specific DNA sequence in a test tube -
the polymerase chain reaction (PCR).
16. Goal of the course
• More than an introduction to ...: the course
aims to provide an underlying
understanding of the
various techniques.
• Beyond the use of recipes, which can be
found in parts of the syllabus,
insight into
– how databases work
– and the underlying algorithms
• makes it possible
– to apply changing interfaces to new
problems.
19. Exam
• Theory
– Four insight questions about the course (including
!!)
• Practical (“open-book”)
– About four exercises that usually require
writing a program
• Grade split 50/50
28. Genome Size
DOGS: Database Of Genome Sizes
E. coli = 4.2 x 10^6
Yeast = 18 x 10^6
Arabidopsis = 80 x 10^6
C. elegans = 100 x 10^6
Drosophila = 180 x 10^6
Human/Rat/Mouse = 3000 x 10^6
Lily = 300 000 x 10^6
With ... : 99.9 %
To primates: 99%
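The size differences above are easier to appreciate as ratios. The following is a minimal sketch that uses only the figures listed on this slide (historical estimates, taken as-is); the helper names `fold_difference` and `differing_positions` are made up for this illustration.

```python
# Genome sizes in base pairs, copied from the slide (historical estimates).
GENOME_SIZES_BP = {
    "E. coli": 4.2e6,
    "Yeast": 18e6,
    "Arabidopsis": 80e6,
    "C. elegans": 100e6,
    "Drosophila": 180e6,
    "Human/Rat/Mouse": 3000e6,
    "Lily": 300000e6,
}


def fold_difference(species_a, species_b, sizes=GENOME_SIZES_BP):
    """Return how many times larger the genome of species_a is than that of species_b."""
    return sizes[species_a] / sizes[species_b]


def differing_positions(genome_size_bp, percent_identity):
    """Expected number of differing positions between two genomes of this size."""
    return genome_size_bp * (100.0 - percent_identity) / 100.0


# Print the genomes from smallest to largest, relative to E. coli.
for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda item: item[1]):
    print(f"{name:16s} {size:.1e} bp  ({fold_difference(name, 'E. coli'):.0f}x E. coli)")
```

With these numbers the lily genome is about 100 times the human genome, and a 99.9% identity over 3000 x 10^6 positions still leaves roughly 3 million differences.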
31. And this is just the beginning ….
Next Generation Sequencing is here
32. Basics of the “old” technology
• Clone the DNA.
• Generate a ladder of labeled (colored) molecules
that are different by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
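To get a feel for the scale of the “old” technology, the figures on this slide can be combined with the 3000 x 10^6 human genome size from the Genome Size slide. This is a back-of-the-envelope sketch only: the `coverage` parameter (how many times each base is read on average) is an assumption added for illustration and does not come from the slide.

```python
import math

NUCLEOTIDES_PER_RUN = 57_000      # per machine per run, from the slide
HUMAN_GENOME_BP = 3000 * 10**6    # from the Genome Size slide


def runs_needed(genome_bp, coverage=1.0, per_run=NUCLEOTIDES_PER_RUN):
    """Number of machine runs needed to read genome_bp letters `coverage` times."""
    return math.ceil(genome_bp * coverage / per_run)


def reads_needed(genome_bp, read_length, coverage=1.0):
    """Number of strings of read_length letters needed for the given coverage."""
    return math.ceil(genome_bp * coverage / read_length)


print("Runs at 1x coverage:", runs_needed(HUMAN_GENOME_BP))
print("Runs at 8x coverage (hypothetical):", runs_needed(HUMAN_GENOME_BP, coverage=8))
print("500-letter strings at 1x:", reads_needed(HUMAN_GENOME_BP, 500))
print("1,000-letter strings at 1x:", reads_needed(HUMAN_GENOME_BP, 1000))
```

Even a single pass over a human-sized genome takes tens of thousands of runs on one machine, which is why throughput was the bottleneck.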
33. Basics of the “new” technology
• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color
scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of DNA.
• Strings are 30-300 letters long
• Multiple images are interpreted as 0.4 to 1.2
Gb/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
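The last step, mapping short strings to a genome, can be sketched with a simple k-mer index. This is a toy illustration of the idea and not how any particular aligner works: the function names, the seed length `k` and the `max_mismatches` parameter are assumptions, and only the forward strand is searched.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=0):
    """Return reference positions where the read aligns with at most max_mismatches.

    The first k letters of the read are used as an exact seed; each seed hit is
    then checked letter by letter against the reference.
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_index(reference, k=4)
print(map_read("TAGCCGAT", reference, index, k=4))   # a read with one placement
print(map_read("ACGT", reference, index, k=4))       # a read from a repeated region
```

A read that falls in a repeated region comes back with several positions, which is the ambiguity that longer or paired reads help to resolve.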
34. Next Generation Technologies
• 454
– Emulsion PCR
– Polymerase
– Natural Nucleotides
• 20-100 Mb for 5-15k
– 1% error rate
– Homopolymers
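Homopolymers (runs of the same base) are listed next to the error rate because the length of such a run is hard to call accurately with this chemistry. A minimal sketch for locating the runs in a sequence, so that calls inside them can be treated with more caution; the function name and the `min_length` threshold are assumptions for illustration.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=3):
    """Return (start, base, length) for each run of identical bases >= min_length."""
    runs = []
    pos = 0
    for base, group in groupby(sequence):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACGTTTTAGGGC"))
```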
41. Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), for E. coli and human. Source: Jay Shendure]
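The quantity plotted in this figure can be approximated on any sequence by counting how many k-mers occur exactly once. The sketch below is a simplification under stated assumptions: it looks at single (not paired) k-mers, on one strand only, and the function name is made up for illustration.

```python
from collections import Counter


def unique_kmer_fraction(genome, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once in genome."""
    n_positions = len(genome) - k + 1
    if n_positions <= 0:
        return 0.0
    counts = Counter(genome[i:i + k] for i in range(n_positions))
    unique = sum(1 for count in counts.values() if count == 1)
    return unique / n_positions


toy_genome = "ACGTACGA"
for k in (2, 3, 4):
    print(k, unique_kmer_fraction(toy_genome, k))
```

On a real genome the fraction rises quickly with k and then levels off, which is the point of the slide: beyond a modest length, extra read length adds little for resequencing.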
55. Paired End Reads are Important!
[Figure: Read 1 and Read 2 separated by a known distance, across repetitive DNA and unique DNA. A single read maps to multiple positions; a paired read maps uniquely.]
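The idea on this slide can be sketched in a few lines: a read from a repeat has several candidate positions, and its mate, at a known distance, picks the right one. This is a toy illustration under simplifying assumptions (exact matches, both reads on the same strand, distance measured start to start); the function names and the `tolerance` parameter are hypothetical.

```python
def find_all(reference, read):
    """Return every start position where read occurs exactly in reference."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def resolve_pair(reference, read1, read2, distance, tolerance=0):
    """Return (pos1, pos2) placements whose start-to-start spacing matches distance."""
    placements = []
    for pos1 in find_all(reference, read1):
        for pos2 in find_all(reference, read2):
            if abs((pos2 - pos1) - distance) <= tolerance:
                placements.append((pos1, pos2))
    return placements


# "AAGGCC" is the repeat; "CGCGTAT" occurs only once.
reference = "AAGGCC" + "TTTT" + "GATTACA" + "AAGGCC" + "CGCGTAT"
print(find_all(reference, "AAGGCC"))                        # ambiguous on its own
print(resolve_pair(reference, "AAGGCC", "CGCGTAT", distance=6))  # resolved by the mate
```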
63. [Figure: genetic testing approaches placed along a log axis of target size, 1 to 10^9 bp (full genome). Labels: GENETIC; whole-genome sequencing; enrichment seq (exome); PCR; enrichment; targeted panels; instrument and assay providers; CLIA lab service providers.]
69. Weblems
• What ?
– Web-based problems (about the current lesson
and/or preparation for the next lesson)
• When ?
– End of each lesson
• How ?
– Solutions online via screencasts
– Practical session
– Preparation for the practical exam ...
Not all problems necessarily require
program code ...
70. Weblems
W1.1: To which phyla do the following species belong (a)
starfish (b) ginkgo tree (c) scorpion
W1.2: What are the common names for the following
species (a) Orycteropus afer (b) Beta vulgaris (c)
Macrocystis pyrifera
W1.3: What species has the smallest known genome ? And
is genome size related to number of genes ?
W1.4: What are the 5 latest genomes published ? How
complete is “coverage” ?
W1.5: For approximately 10% of Europeans, the painkiller
codeine is ineffective because the patients lack the
enzyme that converts codeine into the active molecule,
morphine. What is the most common mutation that
causes this condition ?