Experimental methods and the big data sets

Improved Medical Education in Basic Sciences for Better Medical Practicing (ImproveMEd)
Systems biology for medicine
II. Experimental methods and the big data sets

Experiments in systems biology do not need to be omics-scale to satisfy the criteria of systems biology!
Consider experiments focused on subsystems, e.g. tracking mRNAs relevant to energy metabolism under different feeding regimes, or at multiple time points from the onset of feeding.
What distinguishes systems biology from common scientific research:
1. Quantitative data – where, how much and how fast (dynamics, time dependence) an entity changes
2. Computational models – simulations based on precise quantification and timing
Positive and negative controls and at least 3 replicates are still needed!
1. Hypothesis-generating studies!
2. Hypothesis-driven studies that follow a targeted subset of molecules (or a targeted organelle) – small-scale systems biology.
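
The points about quantitative, time-resolved data and at least 3 replicates can be illustrated with a minimal Python sketch. All numbers below are made up, and the names `time_course`, `summarize` and `rates_of_change` are hypothetical; the sketch only shows how replicate measurements could be reduced to "how much" (mean and SD) and "how fast" (change per hour).

```python
from statistics import mean, stdev

# Hypothetical relative mRNA levels (arbitrary units), three replicates per time point.
# Keys are hours after the onset of feeding; all values are made up for illustration.
time_course = {
    0.0: [1.00, 1.10, 0.90],
    1.0: [1.90, 2.10, 2.00],
    2.0: [3.80, 4.20, 4.00],
}

def summarize(replicates_by_time):
    """Return (time, mean, sample SD) for each time point, sorted by time."""
    return [
        (t, mean(values), stdev(values))
        for t, values in sorted(replicates_by_time.items())
    ]

def rates_of_change(summary):
    """Return the change in mean level per hour between consecutive time points."""
    return [
        (m2 - m1) / (t2 - t1)
        for (t1, m1, _), (t2, m2, _) in zip(summary, summary[1:])
    ]

summary = summarize(time_course)
for t, m, sd in summary:
    print(f"t = {t:.1f} h: mean = {m:.2f}, SD = {sd:.2f}")
print("rate of change per hour:", [round(r, 2) for r in rates_of_change(summary)])
```

Values of this kind (level and rate per time point) are what a computational model would then be fitted to.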

Molecules:
• DNAs – microarrays & sequencing-based technologies
• RNAs – mRNA sequencing
• Proteins – mass spectrometry-based proteomics
• Lipids – liquid chromatography & mass spectrometry
• Metabolites – liquid chromatography & mass spectrometry

Population average techniques
• Population of cells or tissue sample ≈ 1 million cells or more
• Average over many cells
Single cell techniques
• Single cell sample
• Cell-to-cell variability
(Figure: HepaRG stable cell line, liver tissue, hepatocyte spheroids)

Microarrays – differential expression
• Dominant technique in the 2000s
• Predominantly used for measuring transcript levels
• Other uses: genotyping, DNA mapping (copy number variation), DNA methylation
• A microarray consists of spots, each printed with a different oligonucleotide
• Hybridization between the fluorescently labeled sample (target) and the printed oligonucleotides (probes)
• GEO database
https://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html

Array CGH
comparative genomic hybridization
• Molecular cytogenetic method / karyotyping
• CNV of the test sample relative to a reference sample
• Based on the assumption that 2 samples from closely related individuals (healthy and sick) differ in the gain or loss of a chromosome or a chromosomal region
• Large-scale analysis of tumor-specific unbalanced DNA rearrangements with a resolution of 5-10 megabases
• OMIM database
https://www.ncbi.nlm.nih.gov/omim

Microarray methylation assay – epigenome
• Epigenetic regulation of gene expression is important in development, genomic imprinting, tumorigenesis…
• Genome-wide CpG methylation level at 30 000 to 500 000 CpG loci covering the whole genome
• The first step is bisulfite conversion of samples – unmethylated C is converted into U, while methylated C is protected and remains C
• Converted DNA is further amplified (and U is exchanged for T)
• Fragmented and denatured DNA is hybridized with two types of allele-specific beads for each locus
• The final nucleotide of the annealed allele-specific oligonucleotide is further extended by a labeled ddNTP
• Software calculates the relative fluorescence of each locus and grades it as 0, 0.5 or 1 (homozygous unmethylated, heterozygous, or homozygous methylated)
• ENCODE
https://genome.ucsc.edu/encode/dataMatrix/encodeChipMatrixHuman.html
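
The relative-fluorescence step can be sketched in Python as follows. This is an illustration, not the vendor software: it assumes the commonly used beta value, M / (M + U + offset), with an offset of 100. The cut-offs used for grading (0.25 and 0.75) and the bead intensities are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Relative methylation signal: M / (M + U + offset).

    The offset (assumed here to be 100) stabilizes the ratio at low intensities.
    """
    return methylated / (methylated + unmethylated + offset)

def grade(beta, low=0.25, high=0.75):
    """Map a beta value to 0, 0.5 or 1 using hypothetical cut-offs."""
    if beta < low:
        return 0.0   # homozygous unmethylated
    if beta > high:
        return 1.0   # homozygous methylated
    return 0.5       # heterozygous

# Made-up (methylated, unmethylated) bead intensities for three loci.
loci = {"locus_A": (200, 9700), "locus_B": (5000, 4900), "locus_C": (9700, 200)}
for name, (m, u) in loci.items():
    b = beta_value(m, u)
    print(name, round(b, 2), grade(b))
```

With these made-up intensities the three loci are graded 0, 0.5 and 1 respectively.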

Sequencing-based technologies
• Starts with DNA or RNA isolation followed by
either DNA fragmentation or cDNA synthesis
• The next step is amplification (clone fragments into a vector, transform bacteria with the vectors and amplify)
• Parallel sequencing of many short DNA pieces
• Assembly of contiguous fragments
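
The assembly step can be illustrated with a toy greedy overlap merge in Python. This is a deliberately simplified sketch on made-up, error-free reads; production assemblers use graph-based methods and handle sequencing errors, repeats and both strands. The function names and the minimum overlap of 3 bases are hypothetical choices.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    fragments = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(fragments) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(fragments):
            for j, b in enumerate(fragments):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining fragments stay separate contigs
        merged = fragments[best_i] + fragments[best_j][best_n:]
        fragments = [f for k, f in enumerate(fragments) if k not in (best_i, best_j)]
        fragments.append(merged)
    return fragments

# Made-up short reads that tile one 13-base sequence.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
print(greedy_assemble(reads))
```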

Whole-genome sequencing (WGS) vs. whole-exome sequencing (WES)
• WGS attempts to sequence the entire genome. Some sequences are technically challenging to sequence (telomeres, centromeres, high GC content or repetitive loci) and are missed by common sequencing platforms, which results in 95-98% genome coverage, but in a uniform way.
• WES sequences only the exome (coding sequences), about 2% of the genome, but each exon is sequenced 30-100x (high depth). DNA-RNA hybridization is used to select coding regions, which causes over-representation of so-called 'hot spots' and under-representation of other regions, so some variants are missed. Quicker, cheaper and easier data analysis, but low overall coverage.
(Figure labels: WGS – uniform coverage; WES – biased coverage)
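
The trade-off between breadth and depth follows from simple arithmetic: average depth is the total number of sequenced bases divided by the size of the targeted region. The Python sketch below assumes a human genome of roughly 3.1 billion bases, an exome of 2% of that, a hypothetical run of 40 million reads of 150 bases, and that every read lands on target (which overstates real WES depth).

```python
def mean_depth(n_reads, read_length, target_size):
    """Average sequencing depth = total sequenced bases / size of the targeted region."""
    return n_reads * read_length / target_size

GENOME_SIZE = 3.1e9              # approximate human genome size in bases (assumption)
EXOME_SIZE = 0.02 * GENOME_SIZE  # about 2% of the genome

n_reads, read_length = 40_000_000, 150  # hypothetical run

print(f"WGS depth: {mean_depth(n_reads, read_length, GENOME_SIZE):.1f}x")
print(f"WES depth: {mean_depth(n_reads, read_length, EXOME_SIZE):.1f}x")
```

For the same sequencing effort the exome is covered about 50 times more deeply than the whole genome, which is why WES reaches 30-100x at lower cost.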

Exome sequencing uses an enrichment step to select targeted DNA
• Mendelian disorders often disrupt protein-coding regions
• The exome is a good source of rare disease variants
• Microarrays can be used for fragment isolation
• WES gives high sequencing depth
• Bamshad et al. (2011) Nature Reviews Genetics

Cost per genome won the race and WGS has become the preferred method, especially for tumor-specific rearrangements
Evolution of sequencing methods
1. First-generation (Sanger) sequencing: chain-termination technology – ddNTPs are used to terminate chain synthesis, and the fragments are then separated by capillary electrophoresis (one strand at a time; slow, accurate, expensive)
2. Second (next) generation sequencing – sequencing by synthesis – introduction of nanotechnology – parallel sequencing with no need for a separation step – limited read length
3. Third-generation sequencing – immobilized polymerase + fluorescent dNTPs + excellent optics – detection of base incorporation and base modification

Data coming out of WGS or WES need verification over a large population (more than 1000 individuals)!
The database of Genotypes and Phenotypes (dbGaP)
https://www.ncbi.nlm.nih.gov/gap
Genome-wide association studies
Foo et al. (2012) Nature Reviews Neurology

RNA-Seq
transcriptome
• Parallel sequencing of mRNA, rRNA, miRNA…
• Gene expression profile
• Quantitative method ideal for systems
biology experiments
• Identification of alternative splicing variants
and new transcript variants
• GEO database
https://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html
• TargetScan database (miRNA)
http://www.targetscan.org/vert_71/
Wang et al. (2009) Nature Reviews Genetics
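
To make the "quantitative method" point concrete, here is a minimal Python sketch of one common way to turn raw read counts into comparable expression values, transcripts per million (TPM). The source does not prescribe a normalization; TPM is chosen here only as an example, and the gene names, counts and transcript lengths are made up.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize counts, then scale the sample to sum to 1e6."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

# Hypothetical read counts and transcript lengths (in kilobases) for three genes.
counts = {"geneA": 100, "geneB": 400, "geneC": 500}
lengths_kb = {"geneA": 1.0, "geneB": 2.0, "geneC": 5.0}

for gene, value in tpm(counts, lengths_kb).items():
    print(f"{gene}: {value:,.0f} TPM")
```

Note that geneC has the most reads but, being the longest transcript, ends up with the same TPM as geneA.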

RNA-seq has become the dominant transcriptome method and has replaced microarrays
Wang et al. (2009) Nature Reviews Genetics

ChIP-seq
Chromatin Immunoprecipitation
Sequencing
• Combines immunoprecipitation of transcription factors, other DNA-binding proteins or modified histones (using specific antibodies) with deep sequencing of the co-immunoprecipitated DNA
• Discovers regulatory sequences, promoters,
enhancers, silencers, spacers…
• Functional organization of genome

Western blot was the first method toward quantification and identification of more than one protein
• Semi-quantitative because of the non-linear kinetics of the enzyme-labeled secondary antibody and substrate reaction
• Chemiluminescence detection and X-ray film exposure also have non-linear kinetics
• LI-COR infrared fluorescence detection gives little background, and the fluorescent signal is linearly proportional to the amount of antibody
• https://www.licor.com/bio/applications/quantitative_western_blots/

Forward and Reverse Phase Protein Array (FPPA & RPPA)
• An attempt to transform the low-throughput Western blot method (8-16 sample lanes) into a high-throughput method
• In the forward version, many antibodies (with high specificity) are spotted on a slide and one sample is probed with all of them
• In the reverse version, many samples are spotted on a slide and probed with one antibody for the presence of its epitope
• Requires high-specificity antibodies; expensive.
Mass Spectrometry
proteomics, lipidomics,
metabolomics
• Proteomics often uses tandem mass spectrometry (two stages of mass analysis performed in sequence)
• Quantitative because it measures the total level of proteins
• Gives details about post-translational modifications
• If combined with immunoprecipitation, gives information about protein interactions
• Involves many steps: separation, digestion, enrichment,
repeated separation, ionization, mass filtering (MS1),
fragmentation and mass analysis (MS2), identification,
quantification
• Many variations

Mass Spectrometry
metabolomics
The final step is a bioinformatics search for matches in a large database of known peptides.
Different versions of MS and different databases are used for lipidomics and metabolomics.
UniProtKB database
https://web.expasy.org/docs/swiss-prot_guideline.html
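
The database search can be sketched in Python as an in-silico tryptic digest followed by mass matching. This is a simplified illustration: it matches only the intact peptide mass, whereas real search engines also score the MS2 fragment spectra. The two protein sequences, the observed mass and the 10 ppm tolerance are hypothetical; the residue masses are standard monoisotopic values.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match(observed_mass, database, tol_ppm=10.0):
    """Return (protein, peptide) pairs whose theoretical mass is within tol_ppm of the observed mass."""
    hits = []
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            theoretical = peptide_mass(peptide)
            if abs(observed_mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((name, peptide))
    return hits

# Hypothetical mini-database with made-up sequences.
database = {"protein_1": "MAGKSEEPR", "protein_2": "GLSDGEWQK"}
print(match(616.2816, database))
```

Splitting on a zero-width pattern with `re.split` requires Python 3.7 or later.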

Proteomic analysis decoded differences between the MAPK pathway in normal and tumor cells.
Choudhary & Mann (2010) Nature Reviews Molecular Cell Biology

Liquid chromatography (LC) &
LC/MS
lipidomics, metabolomics
Palermo et al. (2017) Analytica Chimica Acta
Very often LC and MS are combined and used in sequence. Also, lipidomics and metabolomics can be done in sequence on the same samples.
LC is used as a separation technique to divide the sample into fractions, while MS is used to identify the molecules.

Systems biology experiments collect big data using high-throughput methods:
• DNAs – microarrays & sequencing-based technologies (NGS) – exome/genome
• DNA + regulatory proteins – ChIP-seq – ReMap
• RNAs – RNA-seq – transcriptome
• Proteins – MS – proteome
• Lipids – LC/MS – lipidome
• Metabolites – LC/MS – metabolome
