Pravech Ajawatanawong, Ph.D.
Department of Microbiology
Faculty of Science
Mahidol University
Wisdom of the Land
Mahidol University
BIOINFORMATICS
in a Nutshell
Biology Is Extremely Complex, Indeed!!!
“… We think that physics is complicated
because it is hard for us to understand, and
because physics books are full of difficult
mathematics. But the objects that physicists
study are still basically simple objects.
…
The objects and phenomena that a physics
book describes are simpler than a single cell
in the body of its author. …”
Hierarchy of Organization
molecule
(chlorophyll)
organelle
(chloroplast)
cell
(plant cell)
tissue
(plant epithelial)
organ
(leaf)
organism
(maple tree)
population
(maple population)
ecosystem
biome
Amazing Organization
Emergent Properties
“Each level of biological
organization has
emergent properties.”
We know very little
about biology as a whole.
Bioinformatics is an interdisciplinary field that develops methods and software
tools for understanding biological data. As an interdisciplinary field of science,
bioinformatics combines computer science, statistics, mathematics, and engineering
to study and process biological data.
— Wikipedia —
Bioinformatics derives knowledge from computer analysis of biological data.
These can consist of the information stored in the genetic code, but also
experimental results from various sources, patient statistics, and scientific
literature. Research in bioinformatics includes method development for storage,
retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of
biology and is highly interdisciplinary, using techniques and concepts from
informatics, statistics, mathematics, chemistry, biochemistry, physics, and
linguistics. It has many practical applications in different areas of biology and
medicine.
— Michael Nilges & Jens P. Linge, Institut Pasteur —
What is Bioinformatics?
Bioinformatics in My Opinion!!!
Bioinformatics is an interdisciplinary subject that uses knowledge
and techniques from computer science, mathematics, statistics,
information technology and linguistics to extract information from
massive biological data.
Synonyms of BIOINFORMATICS
computational biology
biocomputing
computational molecular biology
Computer Scientist Biologist
Bioinformatics ≠ Computer + Biology
Bioinformatician
Bioinformaticians are
the bridge between these groups
Central Dogma
How does information flow?
Ref: http://genius.com/Biology-genius-the-central-dogma-annotated
DNA = Coca Cola
phosphoric acid = phosphate backbone
sugar = deoxyribose
water = water
caffeine = A, T, C and G
Sequence = Strings
DNA = {A, T, C, G}
protein = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}
RNA = {A, U, C, G}
CATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACT
ACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACTA
CGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGA
CTACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATGACT
MSFQDIQQSEHFLLRPSEKVQKLETSQWPLLLKNFDKLNVLTNHYVPIPSGCSPLKRSIEDYVKSGFINLDKPA
NPSSHEVVAWAKRILKVDKTGHSGTLDPKVTGCLIVCIERATRLVKSQQGAGKEYVCIFHLHSPVEDEQKVAKN
IERLTGALFQRPPLISAVKRQLRVRTVYESKMLEYDKDKGMGVFWVSCEAGTYIRTMCVHLGLFLGVGGQMQEL
RRVRSGINSEKEGLVTMHDILDAQWLYENHKDESYLRRAIKPLEALLTSHKRVIMKDTAVNALCYGAKIMLPGV
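The idea that a sequence is a string over a fixed alphabet can be sketched in Python. This is only an illustration and not part of the original slides: the function names `is_valid` and `transcribe` are assumptions chosen for the demo, and the test fragment is simply the first 13 letters of the DNA example above.

```python
# Alphabets from the slide, written as Python sets of one-letter symbols.
DNA = set("ATCG")
RNA = set("AUCG")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid(sequence: str, alphabet: set) -> bool:
    """Return True if every symbol of the sequence belongs to the alphabet."""
    return set(sequence) <= alphabet


def transcribe(dna: str) -> str:
    """Rewrite a DNA coding-strand string as RNA by replacing T with U."""
    if not is_valid(dna, DNA):
        raise ValueError("not a DNA sequence")
    return dna.replace("T", "U")


if __name__ == "__main__":
    fragment = "CATCAGCTCCACG"  # first 13 letters of the DNA example above
    print(is_valid(fragment, DNA))   # True
    print(is_valid(fragment, RNA))   # False, because RNA has no T
    print(transcribe(fragment))      # CAUCAGCUCCACG
```

Because the alphabets are plain sets, the same check works unchanged for protein strings.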
Main Types of Biological Data
Sequence Data
Structural Data
Profile Data
(Some) Areas of Bioinformatics
Biodatabase
Sequence Analysis
Structural Bioinformatics
Microarray Data Analysis
Systems Biology
Biodatabase
Why Do Biologists Need Databases?
PubMed
The World's Largest Biodatabases
http://www.ncbi.nlm.nih.gov
Ref: http://www.nlm.nih.gov/about/2015CJ.html
Growth of GenBank
PDBJ
KEGG Database
Pfam Database
Sequence Analysis—a rosetta stone of life
“SEQUENCE ANALYSIS is the process of subjecting a DNA, RNA or
peptide sequence to any of a wide range of analytical methods to
understand its features, function, structure, or evolution.”
— Wikipedia —
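Two of the simplest analyses of this kind can be sketched in a few lines of Python: GC content (a basic sequence feature) and the reverse complement (the opposite DNA strand). The function names and the six-letter example sequence are hypothetical, chosen only for illustration.

```python
def gc_content(dna: str) -> float:
    """Fraction of G and C letters in a DNA string (0.0 for an empty string)."""
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read in the 5'-to-3' direction."""
    complement = str.maketrans("ATCG", "TAGC")
    return dna.translate(complement)[::-1]


if __name__ == "__main__":
    seq = "ATGCGC"  # hypothetical example sequence
    print(gc_content(seq))          # 4 of the 6 letters are G or C
    print(reverse_complement(seq))  # GCGCAT
```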
Charles Darwin
1809–1882
His Voyage with HMS Beagle
Darwin also spent much time thinking about geology. Despite bouts of seasickness, he read Lyell’s Principles of Geology […]
[Figure 22.5: The voyage of HMS Beagle. World map of the route with an inset of the Galápagos Islands; photo labels: "Darwin in 1840, after his return from the voyage" and "HMS Beagle in port".]
On the Origin of Species
“… It is a truly wonderful fact—the
wonder of which we are apt to overlook
from familiarity—that all animals and all
plants throughout all time and space
should be related to each other in group
subordinate to group, in the manner
which we everywhere behold namely,
varieties of the same species most
closely related together, species of the
same genus less closely and unequally
related together, forming sections and
sub-genera, species of distinct genera
much less closely related, and genera
related in different degrees, forming sub-
families, families, orders, subclasses,
and classes. …”
— Charles Darwin —
Darwin’s Tree
Visualization of Phylogeny
[…] shorter than 250 amino acid residues were discarded because they are too
short for reliable control trees. All proteins in the resulting clusters are
referred to here as seed orthologs.
Figure 6. Evolutionary relationships among the 35 eukaryotes used in this thesis
(Hejnol et al., 2009; Parfrey et al., 2010).
Ajawatanawong P. (2014) Mine the gaps, Uppsala University.
Carl Richard Woese
1928–2012
professor of microbiology at the
University of Illinois at Urbana–
Champaign
famous for defining the Archaea by
using 16S rRNA phylogeny
originated the idea behind the RNA world
hypothesis
SSU rDNA Structure of Bacteria
SSU—Ideal Molecular Marker
Nucleic acid sequencing—16S rRNA gene
(rDNA), oligonucleotide signature (e.g. indels)
As genome sequencing becomes cheaper,
genome-based bacterial systematics
is emerging
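Marker-based systematics starts from how different two aligned marker sequences are. A minimal sketch of one common measure, the p-distance (proportion of differing sites), is shown below. The function name, the gap handling and the three short aligned fragments are assumptions for illustration; they are not real 16S rRNA gene data.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    differences = sum(1 for x, y in compared if x != y)
    return differences / len(compared)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real 16S rRNA gene data.
    aligned = {
        "taxon_A": "ACGTACGTAC",
        "taxon_B": "ACGTACGTTC",
        "taxon_C": "ACG-ACTTTC",
    }
    names = sorted(aligned)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = p_distance(aligned[first], aligned[second])
            print(first, second, round(d, 3))
```

A matrix of such distances is the usual input to distance-based tree-building methods; real analyses normally apply a substitution model rather than the raw proportion.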
Norman R Pace
Deep Evolution of Bacteria
Aquificae
Thermotogae
Chloroflexi
Deinococcus-Thermus
Thermophiles
Archaea
Thermophilic (optimum temperature
around 85ºC)
paraphyletic group
non-peptidoglycan cell wall
similar to Archaea
Bacteria with Photosynthesis
Chloroflexi
Firmicutes
Archaea
Cyanobacteria
Chlorobi
α-proteobacteria
Photosynthetic Bacteria
γ-proteobacteria
green non-sulfur bacteria
heliobacteria
cyanobacteria
green sulfur bacteria
purple sulfur bacteria
β-proteobacteria
purple non-sulfur bacteria
purple non-sulfur bacteria
Proteobacteria
β-proteobacteria
γ-proteobacteria
Archaea
ε-proteobacteria
δ-proteobacteria
α-proteobacteria
Proteobacteria
Proteus—God of the Ocean
Gram Positive Bacteria
Actinobacteria
Firmicutes
Gram Positive Bacteria
Jim Henson
Genome Is a Book of Life
ATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAG
CGACTACACATTCGACTCAGCATCGACTACGCATCAGCTCC
ACGCATCAGCGACTACACATTCGACTCAGCATCGACTACGC
ATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCAT
CGACTACGCATCAGCTCCACGCATCAGCGACTACACATTCG
ACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACT
ACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCA
TCAGCGACTACACATTCGACTCAGCATCGACTACGCATCAG
CTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACT
ACGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCA
GCATCGACTACGCATCAGCTCCACGCATCAGCGACTACACA
TTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGC
GACTACACATTCGACTCAGCATGACTACACATTCGACTCAG
CATCGACTACGCATCAGCTCCACGCATCAGCGACTACACAT
TCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCG
ACTACACATTCGACTCAGCATCGACTACGCATCAGCTCCAC
GCATCAGCGACTACACATTCGACTCAGCATCGACTACGCAT
CAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCG
ACTACGCATCAGCTCCACGCATCAGCGACTACACATTCGAC
TCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACTAC
ACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATC
AGCGACTACACATTCGACTCAGCATCGACTACGCATCAGCT
CCACGCATCAGCGACTAAAAACTCGCGCCTACAGCGCATCA
GCATACGACTACAACGACAGCAGCAGCAGCAGCAGCAGCAG
CAGCGCCCCAGAAGAGAGAGAACACATTCGACTCAGCATCG
ACTACGCATCAGCTCCACGCATTCAGCTCCACTACCGACGA
TTAATCTACTACTACTCCCCTATTTCACCTATTTACATCAC
AAAACCGACTCGACATCAGCTCTTCGCATCAGCTACGACGC
ATCAAGCAGACGACTACGACCGCGCGACAGCAGCGACACTC
CCGCGCAACCAACAGATAGATAGATAGAAAAACCGACTCGA
CATCAGCTCTTCGCATCAGCTACGACGCATCAAGCAGACGA
CTACGACCGCGCGACAGCAGCGACACTCCCGCGCAACCAAC
AGATAGATAGATAGAAAAACCGACTCGACATCAGCTCTTCG
CATCAGCTACGACGCATCAAGCAGACGACTACGACCGCGCG
ACAGCAGCGACACTCCCGCGCAACCAACAGATAGATAGATA
GAAAAACCGACTCGACATCAGCTCTTCGCATCAGCTACGAC
GCATCAAGCAGACGACTACGACCGCGCGACAGCAGCGACAC
TCCCGCGCAACCAACAGATAGATAGATAGAAAAACCGACTC
GACATCAGCTCTTCGCATCAGCTACGACGCATCAAGCAGAC
GACTACGACCGCGCGACAGCAGCGACACTCCCGCGCAACCA
ACAGATAGATAGATAGAAAACCGACTCGACATCAGCTCTTC
GCATCAGCTACGACGCATCAAGCAGACGACTACGACCGCGC
GACAGCAGCGACACTCCCGCGCAACCAACAGATAGATAGAT
AGAAAAACCGACTCGACATCAGCTCTTCGCATCAGCTACGA
CGCATCAAGCAGACGACTACGACCGCGCGACAGCAGCGACA
CTCCCGCGCAACCAACAGATAGATAGATAGAAAAACCGACT
CGACATCAGCTCTTCGCATCAGCTACGACGCATCAAGCAGA
CGACTACGACCGCGCGACAGCAGCGACACTCCCGCGCAACC
AACAGATAGATAGATAGAAAAACCGACTCGCTACGACGCAT
CAAGCAGACGACTACGACCGCGCGACAGCAGCGACACTCCC
GCGCAACCAACAGATAGATAGATAGAAAAACCGACTCATCC
GCCCCCCCCCCGCGCGCCGAACTAGACATCAGCTCTTCGCA
TCAGCTACGACGCATCAAGCAGACGACTACGACCGCGCGAC
AGCAGCGACACTCCCGCGCAACCAACAGATAGATAGATAGA
Genome Sequencing
think big!!!
The first bacterial genome
(Haemophilus influenzae)
The first eukaryotic genome
(Saccharomyces cerevisiae)
The first archaeal genome
(Methanococcus jannaschii)
[…] Homo), plants (Zea), and fungi (Coprinus) constitute small and peripheral branches of even eukaryotic cellular diversity. If the animals, plants, and fungi are taken to comprise taxonomic “kingdoms,” then we must recognize as kingdoms at least a dozen other eucaryotic groups, all microbial, with as much or more independent evolutionary history than that which separates the three traditional eukaryotic kingdoms (13).
The rRNA and other molecular data solidly confirm the notion stemming from the last century that the major organelles of eukaryotes—mitochondria and chloroplasts—are derived from bacterial symbionts that have undergone specialization through coevolution with the host cell. Sequence comparisons establish mitochondria as representatives of Proteobacteria (the group in Fig. 1 including Escherichia and Agrobacterium) and chloroplasts as derived from cyanobacteria (Synechococcus and Gloeobacter in Fig. 1) (14). Thus, all respiratory and photosynthetic capacity of eukaryotic cells was obtained from bacterial symbionts; the “endosymbiont hypothesis” for the origin of organelles is no longer hypothesis but well-grounded fact. The nuclear component of the modern eukaryotic cell did not derive from one of the prokaryotic lineages, however. The rRNA and other molecular trees show that the eukaryotic nuclear line of descent extends as deeply into the history of life as do the bacterial and archaeal lineages. The mitochondrion and chloroplast came in relatively late. This late evolution is evidenced by the fact that mitochondria and chloroplasts diverged […]
[…] processing mechanisms occurred. Thus, modern representatives of Eucarya and Archaea share many properties that differ from bacterial cells in fundamental ways. […]
[Fig. 1: tree based on SSU rRNA sequences (caption truncated)]
The First Plant Genome
Arabidopsis thaliana
$1000 per Genome in 2015
[Chart: cost of genome sequencing compared with Moore's law for computing costs, 2002 to 2012; cost axis from US$100 million down to $1,000.]
As next-generation sequencers entered the market, the price dropped precipitously.
The price of sequencing a whole human genome hovers around $5,000 and is expected to drop even lower.
The $1,000 genome
BY ERIKA CHECK HAYDEN
With a unique programme, the US government has managed to drive the cost of genome sequencing down towards a much-anticipated target.
In Silicon Valley, Moore’s law seems to stand on equal footing with the natural laws codified by Isaac Newton. Intel co-founder Gordon Moore’s iconic observation that computing power tends to double — and that its price therefore halves — every 2 years has held true for nearly 50 years with only minor revision. But as an exemplar of rapid change, it is the target of playful abuse from genome researchers.
In dozens of presentations over the past few years, scientists have compared the slope of Moore’s law with the swiftly dropping costs of DNA sequencing. For a while they kept pace, but since about 2007, it has not even been close. The price of sequencing an average human genome has plummeted from about US$10 million to a few thousand dollars in just six years. That does not just outpace Moore’s law — it makes the once-powerful predictor of unbridled progress look downright sedate. And just as the easy availability of personal computers changed the world, the breakneck pace of genome-technology development has revolutionized bioscience research. It is also set to cause seismic shifts in medicine.
In the eyes of many, a fair share of the credit for this success goes to a grant scheme run by the US National Human Genome Research Institute (NHGRI). Officially called the Advanced Sequencing Technology awards, it is known more widely as the $1,000 and $100,000 genome programmes. Started in 2004, the scheme has awarded grants to 97 groups of academic and industrial scientists, including some at every major sequencing company.
It has encouraged mobility and cooperation among technologists, and helped to launch dozens of competing companies, staving off the stagnation that many feared would take hold after the Human Genome Project wrapped up in 2003. “The major companies in the space have really changed the way people do sequencing, and it all started with the NHGRI funding,” says Gina Costa, who has worked for five influential companies and is now a vice-president at Cypher Genomics, a genome-interpretation firm in San Diego, California.
A GIANT’S LEGACY
The $1,000 genome programme, now close to achieving its goal, will award its final grants this year. As technology enthusiasts look to future challenges, the coming milestone raises questions about how the roughly $230-million government programme managed to achieve such success, and whether its winning formula can be applied elsewhere. It benefited from fortuitous timing and the lack of an entrenched industry. But Jeffery Schloss, director of the division of genome sciences at the NHGRI in Bethesda, Maryland, who has run the programme from its inception, says that its achievements also suggest that there are ways to navigate public–private partnerships successfully. “One of our challenges is to figure out what is the right role for the government; to not get in the way, but feed the pipeline of private-sector technology development,” he says.
The quest to sequence the first human genome was a massive […]
modified from: Hayden EC. (2014) Nature 507:294–295.
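The comparison with Moore's law can be made concrete with a little arithmetic. The sketch below assumes a steady exponential decline and takes "a few thousand dollars" to mean US$5,000 (the figure quoted in the chart callout); both are simplifying assumptions, and the function name `halving_time` is hypothetical.

```python
import math


def halving_time(start_cost: float, end_cost: float, years: float) -> float:
    """Years needed for the cost to halve, assuming a steady exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


if __name__ == "__main__":
    # Moore's law as stated in the article: the price halves every 2 years.
    moore_fold = 2 ** (6 / 2)
    # Sequencing: about US$10 million down to an assumed US$5,000 in six years.
    sequencing_fold = 10_000_000 / 5_000
    print(moore_fold)        # 8.0
    print(sequencing_fold)   # 2000.0
    print(round(halving_time(10_000_000, 5_000, 6), 2))  # about 0.55 years
```

Under these assumptions the sequencing cost halved roughly every six to seven months, against every 24 months for computing costs.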
Human Genome
The human genome is 380,000 times longer
than the sequence shown here.
From Gene to Genome
Human Genome Project
Achievements beyond HGP
[…] published. Today, there are more than 20,000 metagenomic projects publically available, and many terabytes of sequencing data have been produced. The myriad of ecosystems includes numerous animal and human microbiomes, soils of all types, fresh and salt water samples, and even plant–microbe interaction systems.
[Fig. 1: number of genomes sequenced, plotted by year; vertical axis from 0 to 16,000.]
20 Years of Bacterial Genome Sequencing
Land M, et al. (2015) Funct Integr Genomics 15,141–161.
Deinococcus radiodurans
Deinococcus radiodurans
deinos—unusual
extraordinarily resistant to oxidative
stress, including desiccation and
radiation
survives radiation doses of around 3–5
million rad (100 rad can kill a human)
genome 1 genome 2
genome 3
core
genes
decorative
genes
Comparative Genomics
[…] based on 16S sequences, DDH and biochemical tests sometimes results in combinations or divisions that are not supported by their genome content. As a result, species, genera, and complete families are being shifted and reordered, in an ongoing process.
[Fig. 6: Core and pan-genome of 2085 E. coli genomes.]
decorative genes
core genes
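The core and "decorative" (accessory) genes in the diagram map directly onto set operations. The sketch below is illustrative only: the function name `core_and_pan` is an assumption, and the gene content assigned to the three genomes is a made-up toy example, not real annotation.

```python
def core_and_pan(genomes: dict) -> tuple:
    """Return (core, pan, accessory) gene sets for a mapping of genome -> genes."""
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        raise ValueError("at least one genome is required")
    core = set.intersection(*gene_sets)  # genes present in every genome
    pan = set.union(*gene_sets)          # genes present in at least one genome
    accessory = pan - core               # the "decorative" genes
    return core, pan, accessory


if __name__ == "__main__":
    # Toy gene content, invented for illustration.
    genomes = {
        "genome 1": {"dnaA", "recA", "lacZ"},
        "genome 2": {"dnaA", "recA", "stx1"},
        "genome 3": {"dnaA", "recA", "lacZ", "fimH"},
    }
    core, pan, accessory = core_and_pan(genomes)
    print(sorted(core))       # ['dnaA', 'recA']
    print(sorted(accessory))  # ['fimH', 'lacZ', 'stx1']
    print(len(pan))           # 5
```

As more genomes are added, the core can only shrink or stay the same and the pan-genome can only grow or stay the same, which is the pattern core and pan-genome plots display.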
Microbiome
We are entering a new era of omics,
a wide variety of large-scale, multi-dimensional biology.
Features of Omics approach:
high-throughput, data-driven, holistic, top-down methods
understanding cell metabolism in one ‘integrated system’
high-output, requires bioinformatics to analyze & manipulate
From Standalone Biology to ‘Omics’ Study
GENOMICS: DNA
TRANSCRIPTOMICS: mRNA (from DNA by transcription)
PROTEOMICS: Protein (from mRNA by translation)
METABOLOMICS: Metabolite (from protein activity in metabolism)
Omics Study Relies on Central Dogma
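The first two steps of the central dogma, transcription and translation, can be sketched on strings. The code below builds the standard genetic code from its conventional compact listing (bases in T, C, A, G order); the function names and the 12-letter example gene are hypothetical, and real pipelines also handle reading frames, ambiguous bases and alternative genetic codes.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA coding strand to mRNA: replace T with U."""
    return dna.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # the table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCAAGTAA"  # hypothetical example gene
    mrna = transcribe(gene)
    print(mrna)             # AUGGCCAAGUAA
    print(translate(mrna))  # MAK
```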
Understanding the Genome Is Not Enough
genomics is static
we don’t know which set of genes is expressed under a particular
condition
some phenotypes are the consequence of interactions between
genes (emergent property)
many changes happen in processes downstream of the genetic
information (not in the DNA)
DNA Chip
Microarray Chip
Microarray Technology
Microarray Data
Clustering of Microarray Data
microarray data clustering
the trees on the top and left are
dendrograms
the plot is always genes
versus conditions
the intensity of each color
represents the level of expression
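The dendrograms come from repeatedly merging the most similar expression profiles. The sketch below shows only the first step of such an agglomerative clustering, using correlation distance (1 minus the Pearson correlation) between genes. The function names, the choice of distance and the expression values are assumptions for illustration; a full clustering would use a statistics library.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (spread_x * spread_y)


def closest_pair(profiles):
    """Return (distance, gene1, gene2) for the pair with the smallest 1 - r."""
    names = sorted(profiles)
    best = None
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            distance = 1 - pearson(profiles[first], profiles[second])
            if best is None or distance < best[0]:
                best = (distance, first, second)
    return best


if __name__ == "__main__":
    # Rows are genes, columns are conditions; the values are invented.
    expression = {
        "gene_a": [1.0, 2.0, 3.0, 4.0],
        "gene_b": [2.0, 4.0, 6.0, 8.5],
        "gene_c": [4.0, 3.0, 2.0, 1.0],
    }
    distance, first, second = closest_pair(expression)
    print(first, second)  # gene_a gene_b
```

Repeating the merge on the remaining groups produces the tree drawn beside the heat map.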
2D-gel Electrophoresis
http://elte.prompt.hu/sites/default/files/tananyagok/practical_biochemistry/ch07s03.html
Comparative 2D-gel Electrophoresis
http://elte.prompt.hu/sites/default/files/tananyagok/practical_biochemistry/ch07s03.html
modified from Venter et al. (2001) Science 291:1304–1351.
Transfer/carrier protein (203, 0.7%)
Transcription factor (1850, 6.0%)
Nucleic acid enzyme (2308, 7.5%)
Signaling molecule (376, 1.2%)
Receptor (1543, 5.0%)
Kinase (868, 2.8%)
Select regulatory molecule (988, 3.2%)
Transferase (610, 2.0%)
Synthase and synthetase (313, 1.0%)
Oxidoreductase (656, 2.1%)
Lyase (117, 0.4%)
Ligase (56, 0.2%)
Isomerase (163, 0.5%)
Hydrolase (1227, 4.0%)
Viral protein (100, 0.3%)
Miscellaneous (1318, 4.3%)
Cell adhesion (577, 1.9%)
Chaperone (159, 0.5%)
Cytoskeletal structural protein (876, 2.8%)
Extracellular matrix (437, 1.4%)
Immunoglobulin (264, 0.9%)
Ion channel (406, 1.3%)
Motor (376, 1.2%)
Structural protein of muscle (296, 1.0%)
Proto-oncogene (902, 2.9%)
Select calcium-binding protein (34, 0.1%)
Intracellular transporter (350, 1.1%)
Transporter (533, 1.7%)
Molecular function unknown (12,809, 41.7%)
Sector groups: Signal transduction; Enzyme; Nucleic acid binding; None
FIGURE 15.10 Functional classification of the 26,383 genes predicted by Celera Genomics’ first draft of the
sequence of the human genome. Each sector gives the number and percentage of gene products in each
functional class in parentheses. Note that some classes overlap: a proto-oncogene, for example, may encode […]
Not All Proteins Are Enzymes
Microbiome
microbiome = all the microbial populations that live in a particular habitat
(e.g., human gut, skin, vagina)
Human Microbiome Studies
Hand–Skin Microbiome
51 college students (after exam)
targeting V2 region of bacterial 16S rRNA gene
>150 species per palm, with intra- and interpersonal variation
hands from the same individual share 17% of species-level phylotypes
women have higher diversity than men

Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 

Último (20)

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Bioinformatics in a Nutshell

  • 1. BIOINFORMATICS in a Nutshell. Pravech Ajawatanawong, Ph.D., Department of Microbiology, Faculty of Science, Mahidol University (Wisdom of the Land)
  • 2. Biology Is Extremely Complex, Indeed!!! “… We think that physics is complicated because it is hard for us to understand, and because physics books are full of difficult mathematics. But the objects that physicists study are still basically simple objects. … The objects and phenomena that a physics book describes are simpler than a single cell in the body of its author. …”
  • 3. Hierarchy of Organization: molecule (chlorophyll), organelle (chloroplast), cell (plant cell), tissue (plant epithelial), organ (leaf), organism (maple tree), population (maple population), ecosystem, biome
  • 5. Emergent Properties: molecule (chlorophyll), organelle (chloroplast), cell (plant cell), tissue (plant epithelial), organ (leaf), organism (maple tree), population (maple population), ecosystem, biome. “Each level of biological organization has emergent properties.” We know very little about biology as a whole.
  • 6. What is Bioinformatics? “Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to study and process biological data.” (Wikipedia) “Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine.” (Michael Nilges & Jens P. Linge, Institut Pasteur)
  • 7. Bioinformatics in My Opinion!!! Bioinformatics is an interdisciplinary subject that uses knowledge and techniques from computer science, mathematics, statistics, information technology and linguistics to extract information from massive biological data. Synonyms of BIOINFORMATICS: computational biology, biocomputing, computational molecular biology
  • 8. Bioinformatics ≠ Computer + Biology. Computer Scientist, Bioinformatician, Biologist: bioinformaticians are the bridge between these groups
  • 9. Central Dogma: How does information flow? Ref: http://genius.com/Biology-genius-the-central-dogma-annotated
  • 10. DNA = Coca Cola: phosphoric acid ↔ phosphate backbone; sugar ↔ deoxyribose; water ↔ water; caffeine ↔ A, T, C and G
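The information flow named on slide 9 (DNA to RNA to protein) can be sketched in a few lines. This is only an illustration, not part of the original deck: the function names and the input sequence are made up, and `CODON_TABLE` is deliberately partial (a real analysis would use the full 64-codon standard genetic code).

```python
# Minimal sketch of the central dogma: DNA -> mRNA -> protein.
# CODON_TABLE is a partial, illustrative table (assumption); the standard
# genetic code has 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon is not in the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")  # hypothetical 12-base coding sequence
protein = translate(mrna)
print(mrna, protein)
```

The toy input covers only codons present in the partial table; any other codon raises a `KeyError`, which makes the limitation visible rather than silently guessing.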
  • 11. Sequence = Strings DNA = {A, T, C, G} protein = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y} RNA = {A, U, C, G} CATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACT ACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACTA CGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGA CTACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATGACT MSFQDIQQSEHFLLRPSEKVQKLETSQWPLLLKNFDKLNVLTNHYVPIPSGCSPLKRSIEDYVKSGFINLDKPA NPSSHEVVAWAKRILKVDKTGHSGTLDPKVTGCLIVCIERATRLVKSQQGAGKEYVCIFHLHSPVEDEQKVAKN IERLTGALFQRPPLISAVKRQLRVRTVYESKMLEYDKDKGMGVFWVSCEAGTYIRTMCVHLGLFLGVGGQMQEL RRVRSGINSEKEGLVTMHDILDAQWLYENHKDESYLRRAIKPLEALLTSHKRVIMKDTAVNALCYGAKIMLPGV
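Because sequences are strings over small alphabets, ordinary string operations already do useful work. The sketch below assumes nothing beyond the three alphabets listed on slide 11; the helper names `is_valid` and `reverse_complement` are illustrative, and the two test sequences are the first letters of the DNA and protein examples on the slide.

```python
from collections import Counter

# Alphabets as given on the slide.
DNA_ALPHABET = set("ATCG")
RNA_ALPHABET = set("AUCG")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid(seq: str, alphabet: set) -> bool:
    """True if every character of seq belongs to the given alphabet."""
    return set(seq.upper()) <= alphabet


def reverse_complement(dna: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return dna.upper().translate(str.maketrans("ATCG", "TAGC"))[::-1]


dna = "CATCAGCTCCACGCATCAGCGACT"   # first 24 bases of the slide's DNA example
protein = "MSFQDIQQSEHFLLRPSEKV"   # first 20 residues of the slide's protein example

print(is_valid(dna, DNA_ALPHABET))          # DNA letters only
print(is_valid(protein, DNA_ALPHABET))      # contains non-DNA letters
print(is_valid(dna, PROTEIN_ALPHABET))      # A, C, G, T are also amino-acid codes
print(Counter(dna))                         # base composition
print(reverse_complement(dna))
```

One consequence worth noticing: the DNA alphabet is a subset of the protein alphabet, so an alphabet check alone cannot prove that a short string is DNA rather than protein.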
  • 12. Main Types of Biological Data: Sequence Data, Structural Data, Profile Data
  • 13. (Some) Areas of Bioinformatics: Biodatabase, Sequence Analysis, Structural Bioinformatics, Microarray Data Analysis, Systems Biology
  • 15. Why Do Biologists Need Databases?
  • 17. The World's Largest Biodatabases http://www.ncbi.nlm.nih.gov
  • 19. PDBJ
  • 22. Sequence Analysis: a Rosetta Stone of life. “Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.” (Wikipedia)
  • 23. Charles Darwin, 1809–1882. His Voyage with HMS Beagle. Figure 22.5: The voyage of HMS Beagle (world map of the route with an inset of the Galápagos Islands; photographs: Darwin in 1840, after his return from the voyage; HMS Beagle in port). Textbook excerpt visible on the slide: “Darwin also spent much time thinking about geology. Despite bouts of seasickness, he read Lyell’s Principles of Geology […]”
  • 24. On the Origin of Species “… It is a truly wonderful fact—the wonder of which we are apt to overlook from familiarity—that all animals and all plants throughout all time and space should be related to each other in group subordinate to group, in the manner which we everywhere behold, namely, varieties of the same species most closely related together, species of the same genus less closely and unequally related together, forming sections and sub-genera, species of distinct genera much less closely related, and genera related in different degrees, forming sub-families, families, orders, subclasses, and classes. …” (Charles Darwin)
  • 26. Visualization of Phylogeny. Excerpt: “[…] shorter than 250 amino acid residues were discarded because they are too short for reliable control trees. All proteins in the resulting clusters are referred to here as seed orthologs.” Figure 6. Evolutionary relationships among the 35 eukaryotes used in this thesis (Hejnol et al., 2009; Parfrey et al., 2010). Source: Ajawatanawong P. (2014) Mine the gaps, Uppsala University.
  • 27. Carl Richard Woese, 1928–2012: professor of microbiology at the University of Illinois at Urbana–Champaign; famous for defining the Archaea by using 16S rRNA phylogeny; originated the idea of the RNA world hypothesis
  • 28. SSU rDNA Structure of Bacteria
  • 29. SSU: Ideal Molecular Marker. Nucleic acid sequencing: 16S rRNA gene (rDNA), oligonucleotide signatures (e.g. indels). As genome sequencing becomes cheaper, genome-based bacterial systematics is on its way.
  • 31. Deep Evolution of Bacteria. Tree labels: Aquificae, Thermotogae, Chloroflexi, Deinococcus-Thermus, Archaea. Thermophiles: thermophilic (optimum temperature around 85ºC); paraphyletic group; non-peptidoglycan cell wall similar to Archaea
  • 32. Bacteria with Photosynthesis (tree also shows Archaea). Photosynthetic bacteria: Chloroflexi (green non-sulfur bacteria); Firmicutes (heliobacteria); Cyanobacteria (cyanobacteria); Chlorobi (green sulfur bacteria); γ-proteobacteria (purple sulfur bacteria); α-proteobacteria (purple non-sulfur bacteria); β-proteobacteria (purple non-sulfur bacteria)
  • 35. Genome Is a Book of Life ATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAG CGACTACACATTCGACTCAGCATCGACTACGCATCAGCTCC ACGCATCAGCGACTACACATTCGACTCAGCATCGACTACGC ATCAGCTCCACGCATCAGCGACTACACATTCGACTCAGCAT CGACTACGCATCAGCTCCACGCATCAGCGACTACACATTCG ACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACT ACACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCA TCAGCGACTACACATTCGACTCAGCATCGACTACGCATCAG CTCCACGCATCAGCGACTACACATTCGACTCAGCATCGACT ACGCATCAGCTCCACGCATCAGCGACTACACATTCGACTCA GCATCGACTACGCATCAGCTCCACGCATCAGCGACTACACA TTCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGC GACTACACATTCGACTCAGCATGACTACACATTCGACTCAG CATCGACTACGCATCAGCTCCACGCATCAGCGACTACACAT TCGACTCAGCATCGACTACGCATCAGCTCCACGCATCAGCG ACTACACATTCGACTCAGCATCGACTACGCATCAGCTCCAC GCATCAGCGACTACACATTCGACTCAGCATCGACTACGCAT CAGCTCCACGCATCAGCGACTACACATTCGACTCAGCATCG ACTACGCATCAGCTCCACGCATCAGCGACTACACATTCGAC TCAGCATCGACTACGCATCAGCTCCACGCATCAGCGACTAC ACATTCGACTCAGCATCGACTACGCATCAGCTCCACGCATC AGCGACTACACATTCGACTCAGCATCGACTACGCATCAGCT CCACGCATCAGCGACTAAAAACTCGCGCCTACAGCGCATCA GCATACGACTACAACGACAGCAGCAGCAGCAGCAGCAGCAG CAGCGCCCCAGAAGAGAGAGAACACATTCGACTCAGCATCG ACTACGCATCAGCTCCACGCATTCAGCTCCACTACCGACGA TTAATCTACTACTACTCCCCTATTTCACCTATTTACATCAC AAAACCGACTCGACATCAGCTCTTCGCATCAGCTACGACGC ATCAAGCAGACGACTACGACCGCGCGACAGCAGCGACACTC CCGCGCAACCAACAGATAGATAGATAGAAAAACCGACTCGA CATCAGCTCTTCGCATCAGCTACGACGCATCAAGCAGACGA CTACGACCGCGCGACAGCAGCGACACTCCCGCGCAACCAAC AGATAGATAGATAGAAAAACCGACTCGACATCAGCTCTTCG CATCAGCTACGACGCATCAAGCAGACGACTACGACCGCGCG ACAGCAGCGACACTCCCGCGCAACCAACAGATAGATAGATA GAAAAACCGACTCGACATCAGCTCTTCGCATCAGCTACGAC GCATCAAGCAGACGACTACGACCGCGCGACAGCAGCGACAC TCCCGCGCAACCAACAGATAGATAGATAGAAAAACCGACTC GACATCAGCTCTTCGCATCAGCTACGACGCATCAAGCAGAC GACTACGACCGCGCGACAGCAGCGACACTCCCGCGCAACCA ACAGATAGATAGATAGAAAACCGACTCGACATCAGCTCTTC GCATCAGCTACGACGCATCAAGCAGACGACTACGACCGCGC GACAGCAGCGACACTCCCGCGCAACCAACAGATAGATAGAT AGAAAAACCGACTCGACATCAGCTCTTCGCATCAGCTACGA CGCATCAAGCAGACGACTACGACCGCGCGACAGCAGCGACA CTCCCGCGCAACCAACAGATAGATAGATAGAAAAACCGACT 
CGACATCAGCTCTTCGCATCAGCTACGACGCATCAAGCAGA CGACTACGACCGCGCGACAGCAGCGACACTCCCGCGCAACC AACAGATAGATAGATAGAAAAACCGACTCGCTACGACGCAT CAAGCAGACGACTACGACCGCGCGACAGCAGCGACACTCCC GCGCAACCAACAGATAGATAGATAGAAAAACCGACTCATCC GCCCCCCCCCCGCGCGCCGAACTAGACATCAGCTCTTCGCA TCAGCTACGACGCATCAAGCAGACGACTACGACCGCGCGAC AGCAGCGACACTCCCGCGCAACCAACAGATAGATAGATAGA
  • 36. Genome Sequencing: think big!!! The first bacterial genome (Haemophilus influenzae); the first eukaryotic genome (Saccharomyces cerevisiae); the first archaeal genome (Methanococcus jannaschii). Excerpt shown on the slide: […] Homo), plants (Zea), and fungi (Coprinus) constitute small and peripheral branches of even eukaryotic cellular diversity. If the animals, plants, and fungi are taken to comprise taxonomic “kingdoms,” then we must recognize as kingdoms at least a dozen other eucaryotic groups, all microbial, with as much or more independent evolutionary history than that which separates the three traditional eukaryotic kingdoms (13). The rRNA and other molecular data solidly confirm the notion stemming from the last century that the major organelles of eukaryotes (mitochondria and chloroplasts) are derived from bacterial symbionts that have undergone specialization through coevolution with the host cell. Sequence comparisons establish mitochondria as representatives of Proteobacteria (the group in Fig. 1 including Escherichia and Agrobacterium) and chloroplasts as derived from cyanobacteria (Synechococcus and Gloeobacter in Fig. 1) (14). Thus, all respiratory and photosynthetic capacity of eukaryotic cells was obtained from bacterial symbionts; the “endosymbiont hypothesis” for the origin of organelles is no longer hypothesis but well-grounded fact. The nuclear component of the modern eukaryotic cell did not derive from one of the prokaryotic lineages, however. The rRNA and other molecular trees show that the eukaryotic nuclear line of descent extends as deeply into the history of life as do the bacterial and archaeal lineages. The mitochondrion and chloroplast came in relatively late. This late evolution is evidenced by the fact that mitochondria and chloroplasts diverged […] processing mechanisms occurred. Thus, modern representatives of Eucarya and Archaea share many properties that differ from bacterial cells in fundamental ways. […] Fig. 1 (caption cut off on the slide): phylogenetic tree based on SSU rRNA sequences.
  • 37. The First Plant Genome: Arabidopsis thaliana
  • 38. $1000 per Genome in 2015. Chart: cost of genome sequencing compared with Moore's law for computing costs (y-axis $1,000 to US$100 million; x-axis 2002 to 2012). Chart notes: “As next-generation sequencers entered the market, the price dropped precipitously.” “The price of sequencing a whole human genome hovers around $5,000 and is expected to drop even lower.” Article shown on the slide: “The $1,000 genome” by Erika Check Hayden. “With a unique programme, the US government has managed to drive the cost of genome sequencing down towards a much-anticipated target.” Excerpt: In Silicon Valley, Moore’s law seems to stand on equal footing with the natural laws codified by Isaac Newton. Intel co-founder Gordon Moore’s iconic observation that computing power tends to double — and that its price therefore halves — every 2 years has held true for nearly 50 years with only minor revision. But as an exemplar of rapid change, it is the target of playful abuse from genome researchers. In dozens of presentations over the past few years, scientists have compared the slope of Moore’s law with the swiftly dropping costs of DNA sequencing. For a while they kept pace, but since about 2007, it has not even been close. The price of sequencing an average human genome has plummeted from about US$10 million to a few thousand dollars in just six years. That does not just outpace Moore’s law — it makes the once-powerful predictor of unbridled progress look downright sedate. And just as the easy availability of personal computers changed the world, the breakneck pace of genome-technology development has revolutionized bioscience research. It is also set to cause seismic shifts in medicine. In the eyes of many, a fair share of the credit for this success goes to a grant scheme run by the US National Human Genome Research Institute (NHGRI). Officially called the Advanced Sequencing Technology awards, it is known more widely as the $1,000 and $100,000 genome programmes. Started in 2004, the scheme has awarded grants to 97 groups of academic and industrial scientists, including some at every major sequencing company. It has encouraged mobility and cooperation among technologists, and helped to launch dozens of competing companies, staving off the stagnation that many feared would take hold after the Human Genome Project wrapped up in 2003. “The major companies in the space have really changed the way people do sequencing, and it all started with the NHGRI funding,” says Gina Costa, who has worked for five influential companies and is now a vice-president at Cypher Genomics, a genome-interpretation firm in San Diego, California. A GIANT’S LEGACY: The $1,000 genome programme, now close to achieving its goal, will award its final grants this year. As technology enthusiasts look to future challenges, the coming milestone raises questions about how the roughly $230-million government programme managed to achieve such success, and whether its winning formula can be applied elsewhere. It benefited from fortuitous timing and the lack of an entrenched industry. But Jeffery Schloss, director of the division of genome sciences at the NHGRI in Bethesda, Maryland, who has run the programme from its inception, says that its achievements also suggest that there are ways to navigate public–private partnerships successfully. “One of our challenges is to figure out what is the right role for the government; to not get in the way, but feed the pipeline of private-sector technology development,” he says. The quest to sequence the first human genome was a massive […] © 2014 Macmillan Publishers Limited. All rights reserved. Modified from: Hayden EC. (2014) Nature 507:294–295.
  • 39. Human Genome: The human genome is 380,000 times longer than the sequence shown here.
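To make the comparison with Moore's law concrete, here is a rough back-of-the-envelope calculation. It uses only figures quoted on the slide (about US$10 million falling to a few thousand dollars in six years, and a price that halves every 2 years under Moore's law); taking $5,000 as the end cost and assuming a steady exponential decline are simplifications made for this sketch.

```python
import math


def halving_time(start_cost: float, end_cost: float, years: float) -> float:
    """Years per halving, assuming a steady exponential decline."""
    return years / math.log2(start_cost / end_cost)


start_cost = 10_000_000  # about US$10 million (from the excerpt)
end_cost = 5_000         # assumption: "a few thousand dollars" taken as the chart's $5,000
years = 6

# Moore's law: the price halves every 2 years, so three halvings in six years.
moore_cost = start_cost / 2 ** (years / 2)
seq_halving = halving_time(start_cost, end_cost, years)

print(f"Moore's-law price after {years} years: ${moore_cost:,.0f}")
print(f"Observed sequencing halving time: {seq_halving:.2f} years")
```

Under these assumptions a Moore's-law decline would leave the price above one million dollars, while the observed drop corresponds to the cost halving roughly every six to seven months.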
  • 40. From Gene to Genome
  • 43. 20 Years of Bacterial Genome Sequencing. Chart: number of genomes sequenced per year (y-axis 0 to 16,000). Excerpt: “[…] Today, there are more than 20,000 metagenomic projects publicly available, and many terabytes of sequencing data have been produced. The myriad of ecosystems includes numerous animal and human microbiomes, soils of all types, fresh and salt water samples, and even plant–microbe interaction systems.” Land M, et al. (2015) Funct Integr Genomics 15:141–161.
  • 44. Deinococcus radiodurans (deinos: unusual): extraordinarily resistant to oxidative stress, including desiccation and radiation; survives radiation doses of around 3–5 million rad (100 rad can kill a human)
  • 45. Comparative Genomics. Diagram: genome 1, genome 2, genome 3, with core genes and decorative genes. Excerpt: “[…] based on 16S sequences, DDH and biochemical tests sometimes results in combinations or divisions that are not supported by their genome content. As a result, species, genera, and complete families are being shifted and reordered, in an ongoing process.” Fig. 6: Core and pan-genome of 2085 E. coli genomes (core genes, decorative genes).
  • 46. Microbiome. We are entering a new era of omics: a wide variety of large-scale, multi-dimensional approaches to biology.
  • 47. From Standalone Biology to ‘Omics’ Study. Features of the omics approach: high-throughput, data-driven, holistic, top-down methods; understanding cell metabolism as one ‘integrated system’; high-output, requires bioinformatics to analyze and manipulate the data
  • 49. Understanding the Genome Is Not Enough: genomics is static; we do not know the set of genes that are expressed in a particular condition; some phenotypes are the consequence of interactions between genes (emergent properties); many changes happen in processes downstream of the genetic information (not in the DNA)
  • 53. Clustering of Microarray Data: microarray data clustering; the trees on the top and left are just dendrograms; the plot is always genes versus conditions; the intensity of each color represents the level of expression
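The core/decorative split on slide 45 maps directly onto set operations. In the sketch below the gene names and genome contents are invented for illustration, and "decorative genes" is read as the genes that are not shared by every genome (often called accessory genes), which is an interpretation rather than something the slide states.

```python
# Hypothetical gene content of three genomes (illustrative data only).
genomes = {
    "genome 1": {"geneA", "geneB", "geneC", "geneD"},
    "genome 2": {"geneA", "geneB", "geneC", "geneE"},
    "genome 3": {"geneA", "geneB", "geneF"},
}

# Core genome: genes present in every genome.
core = set.intersection(*genomes.values())

# Pan-genome: every gene seen in at least one genome.
pan = set.union(*genomes.values())

# "Decorative" genes (assumed meaning): in the pan-genome but not in the core.
decorative = pan - core

print("core:", sorted(core))
print("pan-genome size:", len(pan))
print("decorative:", sorted(decorative))
```

As more genomes are added, the core can only shrink or stay the same and the pan-genome can only grow or stay the same, which is the behaviour a core and pan-genome plot such as the one cited on the slide would display.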
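The dendrograms described on slide 53 come from hierarchical clustering of expression profiles. The following is a minimal pure-Python sketch, not the method used for the slide's figure: the expression values are invented, and Euclidean distance with average linkage is just one common choice. It clusters genes (rows) only; the tree over conditions would be built the same way on the transposed matrix.

```python
import math
from itertools import combinations

# Toy expression matrix: gene -> expression level in three conditions (invented values).
expression = {
    "gene1": [1.0, 2.0, 8.0],
    "gene2": [1.1, 2.1, 7.9],
    "gene3": [9.0, 1.0, 1.0],
    "gene4": [8.8, 1.2, 0.9],
}


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(expression[g], expression[h]) for g in c1 for h in c2)
    return total / (len(c1) * len(c2))


def cluster(names):
    """Agglomerative clustering; returns the merges in the order they happen."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters, as a dendrogram does from the leaves upward.
        a, b = min(combinations(clusters, 2), key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges


merges = cluster(sorted(expression))
for left, right in merges:
    print(left, "+", right)
```

Genes with similar profiles are joined first, so the order of merges is what a dendrogram draws from its leaves up to the root.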
  • 56. Not All Proteins Are Enzymes. Figure 15.10: Functional classification of the 26,383 genes predicted by Celera Genomics’ first draft of the sequence of the human genome. Each sector gives the number and percentage of gene products in each functional class in parentheses. Note that some classes overlap: a proto-oncogene, for example, may encode … Sector group labels: signal transduction, enzyme, nucleic acid binding, none. Classes: Transfer/carrier protein (203, 0.7%); Transcription factor (1850, 6.0%); Nucleic acid enzyme (2308, 7.5%); Signaling molecule (376, 1.2%); Receptor (1543, 5.0%); Kinase (868, 2.8%); Select regulatory molecule (988, 3.2%); Transferase (610, 2.0%); Synthase and synthetase (313, 1.0%); Oxidoreductase (656, 2.1%); Lyase (117, 0.4%); Ligase (56, 0.2%); Isomerase (163, 0.5%); Hydrolase (1227, 4.0%); Viral protein (100, 0.3%); Miscellaneous (1318, 4.3%); Cell adhesion (577, 1.9%); Chaperone (159, 0.5%); Cytoskeletal structural protein (876, 2.8%); Extracellular matrix (437, 1.4%); Immunoglobulin (264, 0.9%); Ion channel (406, 1.3%); Motor (376, 1.2%); Structural protein of muscle (296, 1.0%); Proto-oncogene (902, 2.9%); Select calcium-binding protein (34, 0.1%); Intracellular transporter (350, 1.1%); Transporter (533, 1.7%); Molecular function unknown (12,809, 41.7%). Modified from Venter et al. (2001) Science 291:1304–1351.
  • 57. Microbiome: microbiome = all microbial populations living in a particular habitat (e.g. human gut, skin, vagina, etc.)
  • 59. Hand–Skin Microbiome: 51 college students (after an exam); targeting the V2 region of the bacterial 16S rRNA gene; >150 species/palm, with intra- and interpersonal variation; hands from the same individual share 17% of species-level phylotypes; women have higher diversity than men
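The counts on slide 56 are enough to check the slide's headline claim numerically. The sketch below copies the class counts from the figure; which classes count as "enzymes" is an assumption made here (the seven catalytic classes from transferase to hydrolase), not a grouping taken from the source.

```python
# Class counts as listed in the figure.
counts = {
    "Transfer/carrier protein": 203, "Transcription factor": 1850,
    "Nucleic acid enzyme": 2308, "Signaling molecule": 376,
    "Receptor": 1543, "Kinase": 868, "Select regulatory molecule": 988,
    "Transferase": 610, "Synthase and synthetase": 313,
    "Oxidoreductase": 656, "Lyase": 117, "Ligase": 56, "Isomerase": 163,
    "Hydrolase": 1227, "Viral protein": 100, "Miscellaneous": 1318,
    "Cell adhesion": 577, "Chaperone": 159,
    "Cytoskeletal structural protein": 876, "Extracellular matrix": 437,
    "Immunoglobulin": 264, "Ion channel": 406, "Motor": 376,
    "Structural protein of muscle": 296, "Proto-oncogene": 902,
    "Select calcium-binding protein": 34, "Intracellular transporter": 350,
    "Transporter": 533, "Molecular function unknown": 12809,
}

# Assumption: treat these seven catalytic classes as "enzymes".
ENZYME_CLASSES = ["Transferase", "Synthase and synthetase", "Oxidoreductase",
                  "Lyase", "Ligase", "Isomerase", "Hydrolase"]

total = sum(counts.values())  # total class assignments; classes may overlap
enzyme_count = sum(counts[name] for name in ENZYME_CLASSES)

print("total assignments:", total)
print(f"unknown function: {100 * counts['Molecular function unknown'] / total:.1f}%")
print(f"enzyme classes:   {100 * enzyme_count / total:.1f}%")
```

The total of the listed counts is larger than the 26,383 predicted genes, and the printed percentages match the figure only when computed against that larger total, which fits the caption's note that some classes overlap.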