Bioinformatics.pptx
1. BIOINFORMATICS
Introduction
Bioinformatics integrates computer science with mathematical and statistical methods to
manage and analyze biological information. Its three main subdisciplines, which overlap
with computational biology, are:
1. The development and implementation of tools that enable efficient access and management of
different types of information.
2. The analysis and interpretation of various types of data including nucleotide and amino acid
sequences, protein domains, and protein structures.
3. The development of new algorithms and statistics with which to assess relationships among
members of large data sets.
2. DNA (Deoxyribonucleic Acid)
Devil lies iN detAils
DNA is the core of our, say, "Blood-ie" existence, and
bioinformatics is turning it into our "Blood-E" existence.
In theory, a few strands of DNA can hold up to 10^8 terabytes of
data.
DNA sequences can also be put to many uses, from tracing the
roots of our existence to treating currently incurable diseases
such as AIDS or Parkinson's.
3. FILE FORMATS
Sequence formats – FASTA, FASTQ
Alignment formats – SAM, BAM, CRAM
Variant formats – VCF
Generic feature formats – GFF, GTF
Other formats – BED, tar.gz, PDB, PED, MAP, CSV, JSON
Why so many formats?
The many different ways of generating and using sequencing data
have given rise to the file formats listed above. Each format has
its own use cases, depending on:
Compatibility with specific software
Data processing, parsing, and human readability needs
Efficiency for storage
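As a small illustration of the parsing and readability trade-off, the sketch below reads FASTA records from plain text using only the Python standard library. The function name `parse_fasta` and the sample records are hypothetical, and the sketch assumes the record ID is the first token of each header line; real pipelines would more likely rely on an established parsing library.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    Assumption: each record starts with a '>' header line, and the record
    ID is the first whitespace-delimited token of that header.
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line.upper()
    return records


# Hypothetical example data, not taken from any real database entry.
example = """>seq1 demo record
ACGTAC
GTTA
>seq2
ggcc
"""
parsed = parse_fasta(example.splitlines())
```

Because FASTA is plain text, a parser this short is enough to read it; binary formats such as BAM or CRAM trade that readability for storage efficiency and need dedicated tools.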
4. DATABASES
The databases may also be divided into:
1) Primary databases
2) Secondary databases or value added databases
The primary databases contain data in its original form, taken as such
from the source, e.g., GenBank, DDBJ, EMBL, PIR, PDB, and SWISS-PROT.
The secondary (value-added) databases contain subclassified, annotated data
and information, e.g., OMIM (Online Mendelian Inheritance in Man; gene and clinical data),
GDB (Genome Database; human), PROSITE and BLOCKS (protein motifs), and KEGG and
EcoCyc (metabolism).
6. NUCLEOTIDE SEQUENCE ANALYSIS
Sequence Analysis
a) Once a DNA sequence is available in a computer file, a computer can
analyze it. The first step is usually a database similarity search.
b) This is one of the most commonly used and most powerful techniques
in bioinformatics. It allows searching a sequence database (e.g., GenBank)
for sequences that are similar to a given sequence.
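The idea of a similarity search can be sketched as scoring a query against every entry in a database and ranking the results. The example below uses Smith-Waterman local alignment as the scoring method; this is a swapped-in teaching technique, not how production tools work (they typically use faster heuristics). The scoring values, the function names, and the toy database are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score the query against every entry and rank hits, best first."""
    hits = [(name, local_alignment_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; the names and sequences are invented.
toy_db = {
    "exact": "TTTACGTACGTTT",
    "one_mismatch": "TTTACGAACGTTT",
    "unrelated": "GGGGGGGGGGGGG",
}
ranking = search("ACGTACG", toy_db)
```

Here the entry containing the query verbatim ranks first, the entry with a single mismatch second, and the unrelated sequence last.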
7. BIOINFORMATICS SOFTWARE AND ITS
APPLICATIONS
SWISS-MODEL – An automated, knowledge-based protein modelling server.
3D-JIGSAW – Three-dimensional models for proteins based on homologues of
known structure.
CPHmodels – Automated neural-network-based protein modelling server.
ESyPred3D – Automated homology modelling program using neural networks.
Geno3D – Automatic modelling of protein three-dimensional structure.
SDSCI – Protein structure homology modelling server.
UgenePlot – Open-source software for building workflows related to
bioinformatics.
8. IN SILICO DRUG DESIGN
One of the major thrusts of current bioinformatics approaches is the prediction and
identification of biologically active candidates, and mining and storage of related
information.
Drugs are usually developed only once the particular target of the drug's
action has been identified and studied.
The number of potential targets for the drug discovery process is increasing exponentially.
Mining and warehousing of the human genome sequence using bioinformatics has
helped to define and classify the nucleotide compositions of those genes, which are
responsible for the coding of target proteins, in addition to identifying new targets that
offer more potential for new drugs.
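One simple, concrete form of classifying a gene's nucleotide composition is counting base frequencies and GC content. The sketch below is only an illustration under that assumption; the function names are hypothetical, and characters other than A, C, G, and T are ignored.

```python
from collections import Counter
from typing import Dict


def nucleotide_composition(seq: str) -> Dict[str, float]:
    """Return the fraction of A, C, G, and T among the unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    composition = nucleotide_composition(seq)
    return composition["G"] + composition["C"]


# Invented example sequence, used only to show the call.
example_gc = gc_content("ATGC")
```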
9. BIO-COMPUTERS
Biocomputers use systems of biologically derived molecules, such as
DNA and proteins, to perform computations involving the storage,
retrieval, and processing of data.
Nanobiotechnology gives scientists the ability to engineer biomolecular
systems so that they interact in a way that ultimately yields the
computational functionality of a computer.
The promising field of biocomputer research utilizes the science behind
nano-sized biomaterials to create various forms of computational
devices, which may have many potential applications in the future.
10. CRISPR
CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats, the hallmark of a
bacterial defence system that forms the basis for CRISPR-Cas9 genome editing technology.
CRISPRs were first discovered in Archaea (and later in bacteria) by Francisco
Mojica, a scientist at the University of Alicante in Spain. He proposed that CRISPRs serve
as part of the bacterial immune system, defending against invading viruses.
CRISPR “spacer” sequences are transcribed into short RNA sequences (“CRISPR RNAs”
or “crRNAs”) capable of guiding the system to matching sequences of DNA. When the target DNA
is found, Cas9 – one of the enzymes produced by the CRISPR system – binds to the DNA and
cuts it, shutting the targeted gene off. Using modified versions of Cas9, researchers can activate
gene expression instead of cutting the DNA. These techniques allow researchers to study the
gene’s function.
Research also suggests that CRISPR-Cas9 can be used to target and modify “typos” in the
three-billion-letter sequence of the human genome in an effort to treat genetic disease.
11. CRISPR AND ITS APPLICATION
Using CRISPR for genome editing
CRISPR stands for Clustered Regularly Interspaced Short Palindromic
Repeats. CRISPR sequences were originally identified in the Escherichia coli
(E. coli) genome and were found to function as part of an RNA-based adaptive
immune system that targets and destroys genetic parasites at the DNA level.
12. Once Cas9 nucleases are guided to the target DNA and create a double-strand
break (DSB) 3-4 bases upstream of the protospacer adjacent motif (PAM) sequence,
there are two ways the break can be repaired. If no donor DNA is present,
repair occurs by error-prone non-homologous end joining (NHEJ),
resulting in an indel that effectively knocks out protein function. If a donor
DNA template is present, the break can instead be repaired by homology-directed
repair (HDR).
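The targeting step can be sketched as a string search: find where the guide (spacer) sequence matches the target DNA, check that a PAM follows it, and place the cut a few bases upstream of the PAM. The sketch assumes the NGG PAM of the commonly used Streptococcus pyogenes Cas9, searches the given strand only, and uses a cut offset of 3 bases. The function name and the sequences are hypothetical.

```python
from typing import List, Tuple


def find_cut_sites(target: str, spacer: str,
                   cut_offset: int = 3) -> List[Tuple[int, int]]:
    """Return (protospacer_start, cut_index) for each spacer match that is
    immediately followed by an NGG PAM on the given strand.

    cut_index is 0-based: the break falls between target[cut_index - 1]
    and target[cut_index], i.e. cut_offset bases upstream of the PAM.
    """
    target = target.upper()
    spacer = spacer.upper()
    sites: List[Tuple[int, int]] = []
    start = target.find(spacer)
    while start != -1:
        pam_start = start + len(spacer)
        pam = target[pam_start:pam_start + 3]
        # Assumption: NGG PAM (any base followed by two guanines).
        if len(pam) == 3 and pam[1:] == "GG":
            sites.append((start, pam_start - cut_offset))
        start = target.find(spacer, start + 1)
    return sites


# Invented 20-nt guide and target, for illustration only.
spacer = "GATTACAGATTACAGATTAC"
target = "CC" + spacer + "TGG" + "AAAA"
sites = find_cut_sites(target, spacer)
no_pam = find_cut_sites("CC" + spacer + "TAA", spacer)
```

In this toy target the spacer matches at index 2 and is followed by TGG, so one cut site is reported; the second call finds the same spacer but no PAM, so it reports none.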
13. Live Imaging of DNA/mRNA with CRISPR/Cas9
DNA visualization is an important application in understanding a variety of cellular
processes, such as replication, transcription, and recombination, and the
interactions between DNA and associated proteins and RNA. Two techniques are
commonly used for DNA imaging: fluorescence in situ hybridization (FISH) and
fluorescent tagging of DNA-binding proteins. FISH uses fluorescently tagged
nucleic acid probes to bind and visualize DNA. While this technique offers the
flexibility to target specific sequences through base pairing of the nucleic acid
probes, it cannot be used for live imaging because of the requirement for sample
fixation. Conversely, proteins tagged with a fluorescent label can be used for live
imaging, but are limited by their fixed target sequences, restricting their use
mostly to repetitive DNA sequences, such as telomeres.
14. CRISPR for Transcriptional Activation and Repression
Several research groups have harnessed the specificity and easy
reprogrammability of the CRISPR/Cas9 system to create targetable CRISPR/Cas9
ribonucleoprotein complexes that can either activate (CRISPRa) or interfere
(CRISPRi) with transcription of any desired coding region within a genome.
These systems fuse dCas9 to a well-characterized transcription-regulatory
domain, using pre-designed guide RNAs to direct the complex upstream of the
transcription start site. Because dCas9 is catalytically inactive, the complex can be
targeted to specific loci without cleaving or altering the genomic DNA. After dCas9
binds the targeted DNA sequence, the fused transcription-regulatory domains are
then able to recruit repressive or activating effectors to modify gene expression.