Bioinformatics – An Overview
Kudipudi.Srinivas

Research Scholar, Dept. of Computer Science, S.V.K.P & Dr. K.S. Raju Arts & Science College, Penugonda-534320, India
Kudipudi_sri@yahoo.com

ABSTRACT: This presentation gives an overview of bioinformatics, covering the major databases
available online and at major research centers. The major databases, called mother databases,
are the nucleic acid sequence databases and the protein sequence databases. Bioinformatics is viewed
as an interface between biological information and the information technology employed for tasks
such as protein and DNA sequence analysis. Transcription and translation are explained through
the central dogma of molecular biology, which states that the sequence of a strand of DNA
corresponds to the amino acid sequence of a protein. Two or more sequences can be compared by
alignment methods, namely pairwise and multiple alignment. Database search tools such as BLAST
and FASTA perform intensive pairwise alignment of a query sequence against all entries of a
database and report the sequences with the best scores. Phylogenetic methods are used to
reconstruct the relationships between macromolecular sequences and thereby the genetic connections
between species. The paper also describes applications of bioinformatics in various industries
(food, pharmaceutical, agricultural, medical, etc.) and the technologies that have enabled the
analysis of biological problems in multiple dimensions.


Keywords: Protein, DNA, FASTA, BLAST, Phylogenetic Tree, Orthologous



Introduction:
    •    Bioinformatics is the application of computational techniques to the management and analysis
         of biological information.


    •    Bioinformatics describes using computational techniques to access, analyze, and interpret the
         biological information in any of the available biological databases.
1. DATABASES:
  1.1. Primary Databases
  Sequences obtained by various sequencing approaches, falling into categories such as
  •   EST: Expressed Sequence Tags
  •   GSS: Genome Survey Sequences
  •   STS: Sequence Tagged Sites and
  •   HTG: High Throughput Genomic Sequences
      are deposited in different nucleic acid and protein databases, which can be accessed by
      people all over the world through the World Wide Web. The major databases, called mother
      databases, are the nucleic acid and protein sequence databases.


      1.1.1. Nucleic Acid Databases:
             The nucleic acid sequence databases hold annotated entries for nucleic acid
      sequences (DNA and RNA), with information such as the source organism and region and the
      date on which the sequence was determined.
      The major nucleic acid databases are:
      •   European Molecular Biology Laboratory (EMBL)
          http://www.ebi.ac.uk/
      •   GenBank (National Center for Biotechnology Information, NCBI)
          http://www.ncbi.nlm.nih.gov/
      •   DNA Data Bank of Japan (DDBJ)
          http://www.ddbj.nig.ac.jp/
      These three databases collaborate with one another and exchange data every day.


      1.1.2. Protein Sequence Databases:
             A protein sequence database holds information on proteins whose sequences have been
      translated from RNA sequences and on proteins sequenced directly by methods such as
      N-terminal sequencing.
      The major protein sequence databases are:
      •   Protein Information Resource (PIR)
          http://pir.georgetown.edu/
      •   Swiss-Prot
          http://us.expasy.org/sprot/
1.2. Secondary Databases:


       Derived databases, built from the sequence information available in the primary
   databases, are called secondary databases. Examples include:
   CUTG: Codon Usage Database, Japan (codon usage tabulated from GenBank)
   COGs: Clusters of Orthologous Groups of proteins, from NCBI
   PROSITE: sequence patterns written as regular expressions
   PRINTS: aligned motifs (fingerprints)
   BLOCKS: aligned motifs stored as blocks
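
   To illustrate what "regular expressions" means for PROSITE, the sketch below converts a
   PROSITE-style pattern into a Python regular expression and scans a sequence with it. The
   function name prosite_to_regex and the toy sequence are hypothetical, and only the common
   pattern elements are handled (x, [..], {..}, repeat counts, and the < and > anchors); it is
   a simplified illustration, not the PROSITE scanning software itself.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into a Python regular expression (simplified)."""
    pattern = pattern.rstrip(".")          # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by hyphens
        # '<' anchors the pattern at the N-terminus, '>' at the C-terminus
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Optional repetition count such as x(2) or x(2,4)
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":                 # x matches any residue
            core = "."
        elif element.startswith("{"):      # {P} means any residue except P
            core = "[^" + element[1:-1] + "]"
        else:                              # a single residue or a set like [ST]
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# N-glycosylation site pattern, applied to a made-up toy sequence
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
matches = re.findall(regex, "MKNGTAANPSL")
print(regex, matches)
```

   The second asparagine in the toy sequence is followed by a proline, so it is excluded by the
   {P} element and only the first site is reported.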


   1.3. Structure Databases:
       The major structure databases hold structural data for proteins or DNA whose
structure has been determined by either X-ray crystallography or NMR (Nuclear Magnetic
Resonance). The Protein Data Bank gives the atomic coordinates, bond angles and torsion angles of
various proteins, and the Nucleic Acid Database gives the same details for DNA and its forms, i.e.,
A-DNA, B-DNA, etc.
Protein Data Bank (PDB)
http://www.rcsb.org/pdb/
The Nucleic Acid Database (NDB)
http://ndbserver.rutgers.edu/NDB/ndb.html
Cambridge Structural Database (CSD)
http://www.ccdc.cam.ac.uk/


       These databases are an organized way to store the tremendous amount of sequence
information that accumulates from laboratories worldwide. Each database has its own specific
format. Three major database organizations around the world are responsible for maintaining most of
this data; they largely ‘mirror’ one another.
2. The Central Dogma of Biology:




[Figure: Central Dogma: Flow of Information]


       The flow of information from DNA to RNA to protein is described by the central dogma of
molecular biology, which states that the sequence of a strand of DNA corresponds to the amino acid
sequence of a protein.




   2.1. Transcription


          Transcription is the process in which messenger RNA (mRNA) molecules are synthesized
   from DNA. In eukaryotic cells, transcription takes place in the nucleus. During transcription
   only one of the two DNA strands corresponding to a gene (the template strand) is copied into
   mRNA. The mRNA molecule is complementary to the bases of the template strand. mRNA molecules
   are short-lived: they travel out to the cytoplasm, where they direct the synthesis of a
   protein, and are then degraded.
Transcription depends on complementary base pairing: A in the DNA template pairs with U in the RNA,
T with A, C with G, and G with C. Only one of the DNA strands is transcribed, and therefore the
resulting mRNA molecule is single stranded. The amount of transcription of any given gene can be
directly controlled by the cell. Once the mRNA molecules leave the nucleus and enter the cytoplasm,
they are loaded onto ribosomes. It is at the ribosomes that protein synthesis occurs, by a process
called translation. Ribosomes are composed of ribosomal RNA (rRNA) and ribosomal proteins.
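
The base-pairing rule of transcription can be sketched in a few lines of Python. This is a minimal
illustration, assuming the template strand is written 3'->5' so that the resulting mRNA reads
5'->3'; the function name transcribe and the example strand are hypothetical.

```python
# DNA template base -> RNA base that pairs with it during transcription
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') complementary to a DNA template strand given 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5.upper())

mrna = transcribe("TACGGCATT")
print(mrna)  # AUGCCGUAA
```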


2.2. Translation


       Translation is the process in which mRNA molecules are translated into proteins at the
ribosome. The nucleotides of the mRNA molecule are read by the ribosome so that each set of three
nucleotides, called a codon, specifies a single amino acid. Therefore, the first three nucleotides
of the coding sequence encode the first amino acid, the next three bases the second amino acid,
and so on. The rules by which the base sequence of the mRNA molecule is translated into the primary
amino acid sequence of a protein are called the genetic code.
       There are 64 different possible codons (there are 4 bases, A, U, C and G, and each codon
has 3 bases, so 4^3 = 64) and 20 amino acids. Most amino acids are encoded by more than one
codon, and therefore the genetic code is said to be degenerate. No codon codes for more
than one amino acid.
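
The codon count and the degeneracy of the code can be checked directly. The sketch below builds the
standard genetic code from a compact 64-letter string (one-letter amino acid codes in U, C, A, G
order, with * for stop) and counts how many codons map to each amino acid; the variable names are
illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(c) for c in product(BASES, repeat=3)]   # 4^3 = 64 codons
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

codons_per_aa = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

print(len(codons))                                   # number of possible codons
print(stop_codons)                                   # codons that encode no amino acid
print(codons_per_aa["L"], codons_per_aa["M"])        # leucine vs methionine
```

Each codon appears exactly once as a dictionary key, so no codon maps to more than one amino acid,
while an amino acid such as leucine is reached from several codons.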
       Three of the codons do not specify the incorporation of any amino acid. These are known
as the stop codons: UAA, UAG and UGA. They are found at the end of the mRNA coding
sequence and tell the ribosome to stop translating the message and release the protein. The
mRNA is read one codon at a time from the 5' end to the 3' end. Translation
usually starts at a start codon (AUG), which codes for methionine.
       Each successive codon is read and the corresponding amino acid incorporated into the protein
chain until a stop codon is encountered. The codons in an mRNA molecule do not directly recognize
the amino acids that must be incorporated. Instead, this process is directed by a group of adapter
molecules called transfer RNAs (tRNAs). Every codon, except the stop codons, is recognized by a
tRNA molecule. A tRNA molecule has an anticodon, a set of three bases that can pair with the
complementary codon in the mRNA. The 3' end of a tRNA molecule is attached to an amino acid. In
the translation process, a ribosome reads an mRNA molecule codon by codon.


           At each codon, a tRNA molecule with an anticodon complementary to that codon attaches
   to the mRNA. It brings with it the appropriate amino acid, which is then incorporated into the
   growing polypeptide chain. Once the amino acid has been added, the tRNA molecule is released and
   the ribosome moves on to read the next codon in the mRNA chain. This process continues until the
   ribosome reads a stop codon. At this point the ribosome releases the mRNA molecule and the
   completed protein. The tRNA molecule functions as an interpreter, reading codons in the mRNA
   molecule and translating them into amino acids. In this way, the sequence of base pairs in a
   given gene determines the amino acid sequence of the protein.
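
   The reading process described above (find the start codon, read one codon at a time, stop at a
   stop codon) can be sketched as follows. This is a simplified illustration using the standard
   genetic code; the function name translate and the example mRNA strings are hypothetical, and
   real translation initiation involves more than locating the first AUG.

```python
from itertools import product

# Standard genetic code keyed by RNA codon; '*' marks a stop codon
CODON_TABLE = dict(zip(
    ("".join(c) for c in product("UCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                          # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":              # stop codon: release the protein
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```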


3. Alignment:
        An alignment is a representation of two or more protein or nucleotide sequences in which
homologous amino acids or nucleotides are placed in the same columns and missing amino acids or
nucleotides are replaced with gaps.


   3.1. Pairwise Alignment:
        In pairwise alignment, only two sequences are compared. Two sequences can be
   compared either by global alignment or by local alignment. In global alignment the sequences are
   aligned over their entire length to obtain the maximum number of matches and the minimum number
   of gaps. In local alignment, the alignment is restricted to the region of highest similarity.
   Local alignment uses the Smith-Waterman algorithm and global alignment uses the
   Needleman-Wunsch algorithm. The best alignment is the one with the maximum score, where
   matches contribute positive scores and gaps and mismatches contribute negative scores.
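
   The difference between the two approaches can be seen in a small dynamic-programming sketch.
   The code below computes only the best score (not the alignment itself) and assumes a simple
   scoring scheme of +1 for a match, -1 for a mismatch and -2 per gap position; these values,
   the function names and the toy sequences are assumptions chosen for illustration.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style score: both sequences aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman style score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

s1, s2 = "TTTGATTACATTT", "CCCGATTACACCC"
print(global_score(s1, s2), local_score(s1, s2))
```

   For these toy sequences the local score reflects only the shared GATTACA core, while the
   global score is pulled down by the mismatching flanks.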
        Pairwise alignment is used to infer the function of unknown genes or proteins by finding
   similar sequences of known function. This is done by comparing the unknown sequence with
   whole nucleic acid or protein databases. Database search tools such as BLAST and FASTA
   perform intensive pairwise alignment of a query sequence against all entries of a database
   and report the sequences with the best scores.
3.2. Multiple Alignment:
      Multiple alignment, in which more than two sequences are compared, is used to find
conserved regions among gene or protein sequences and to study the phylogenetic
relationships of macromolecular sequences, i.e., to find evolutionarily related organisms. The major
multiple alignment programs are ClustalW, ClustalX and T-Coffee.


ClustalW: A general-purpose multiple sequence alignment program for DNA or protein
sequences. It produces biologically meaningful multiple sequence alignments of divergent sequences,
calculates the best match for the selected sequences, and lines them up so that the identities,
similarities and differences can be seen. The cladograms or phylograms obtained are used to examine
the evolutionary relationships between species. It can be either downloaded or used online at
http://www.ebi.ac.uk/clustalW/. ClustalX is the user-friendly X Window based version of ClustalW,
which can be downloaded and used locally. T-Coffee is more accurate than ClustalW
for sequences with less than 30% identity, but it is slower.
http://www.ch.embnet.org/software/TCoffee.html
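
Once a multiple alignment is available, conserved positions can be read off column by column. The
sketch below assumes the alignment is given as equal-length strings with "-" for gaps; the function
name conserved_columns and the three toy sequences are hypothetical and are not output from
ClustalW or T-Coffee.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where all sequences share one residue (no gaps)."""
    return [
        i for i, column in enumerate(zip(*alignment))
        if len(set(column)) == 1 and column[0] != "-"
    ]

toy_alignment = [
    "MKT-LLVA",
    "MKTALLIA",
    "MRT-LLVA",
]
print(conserved_columns(toy_alignment))  # [0, 2, 4, 5, 7]
```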


  Basic Local Alignment Search Tool (BLAST):
         BLAST is a heuristic search algorithm for sequence similarity searching, for example to
  identify homologs of a query sequence. When a sequence is submitted to BLAST, the program
  searches it against all sequences in the database of the user's choice and reports those
  sequences whose similarity to the query passes a threshold (in BLAST, a cutoff on the alignment
  score or expectation value). The threshold value is set by the user.
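
  The heuristic in BLAST rests on finding short exact word matches ("seeds") between the query
  and database sequences before any alignment is attempted. The sketch below shows only this
  seeding idea in a heavily simplified form: it indexes every word of length k in a toy database
  and ranks entries by the number of query words they contain. The function names, the toy
  database and the choice k = 3 are assumptions for illustration; real BLAST also uses neighbouring
  words, extends seeds into alignments and evaluates their statistical significance.

```python
from collections import defaultdict

def build_word_index(database, k):
    """Map every length-k word to the set of database entries that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def seed_hits(query, index, k):
    """Count, per database entry, how many query word positions hit that entry."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    # Highest seed count first; ties broken by entry name
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

toy_database = {            # made-up protein fragments
    "seq1": "MKTAYIAKQR",
    "seq2": "GGSAYIAKLL",
    "seq3": "PPPPWWWWCC",
}
index = build_word_index(toy_database, k=3)
print(seed_hits("TAYIAKQ", index, k=3))
```

  Entries with no shared word are never examined further, which is where the speed of a seeded
  search comes from.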
  BLASTing Protein sequences:
         BLASTing a protein sequence is appropriate when we already have a protein sequence
  and want to find other similar protein sequences in a sequence database. Two flavors of
  BLAST that take a protein query are
            blastp : Compares a protein sequence with a protein database.
            tblastn : Compares a protein sequence with a nucleotide database translated in all
                      six reading frames.


  FASTA:
         FASTA was the first widely used program for database similarity searching. For nucleotide
  searches, FASTA may be more sensitive than BLAST. FASTA can be very specific when identifying
  long regions of low similarity, especially for highly diverged sequences. The FASTA submission
  form is available at http://www.ebi.ac.uk/fasta33/
4. Phylogenetic Analysis:
       Phylogenetic methods are used to reconstruct the relationships between macromolecular
sequences and thereby find the genetic connections and relationships between species. The results
of a phylogenetic analysis may be depicted as a hierarchical branching diagram, a 'cladogram' or
'phylogenetic tree'. Programs for phylogenetic analysis (the PHYLIP package) are available at
http://evolution.genetics.washington.edu/phylip.html. This software can be downloaded free of cost
and used locally, or it can be used online at http://bioportal.bic.nus.edu.sg/phylip/. TreeView and
PhyloDraw are the major user-friendly programs for displaying the hierarchical clustering in
different formats for publication and analysis. Besides PHYLIP, other programs
such as PAUP, MEGA, TreeconW and Winboot are popular for phylogenetic analysis.
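
As one concrete example of how a hierarchical tree can be derived from aligned sequences, the
sketch below computes pairwise p-distances (the fraction of differing positions) and clusters them
with UPGMA, a simple distance-based method. UPGMA is chosen here only for illustration and is not
singled out by the packages listed above; the function names and the four toy sequences A to D are
hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(aligned):
    """Cluster aligned sequences with UPGMA and return the topology as a Newick string."""
    sizes = {name: 1 for name in aligned}            # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))       # closest pair of clusters
        merged = f"({a},{b})"
        for k in sizes:
            if k in (a, b):
                continue
            # Size-weighted average of the distances from the two merged clusters
            dist[frozenset((merged, k))] = (
                dist[frozenset((a, k))] * sizes[a] + dist[frozenset((b, k))] * sizes[b]
            ) / (sizes[a] + sizes[b])
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"

toy_sequences = {           # made-up aligned DNA sequences
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTTCGGAT",
    "D": "TCGATCGGTT",
}
print(upgma(toy_sequences))  # (((A,B),C),D);
```

The nested parentheses give the branching order: A and B are joined first because they differ at
the fewest positions, and D joins last.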


5. Applications of Bioinformatics
   5.1. Food Industry:
          Functional genomics is playing a major role in the food biotechnology industry. The
   complete genome sequences available in different databases provide information that can be
   used for finding metabolic pathways and various digestive enzymes, improving cell factories, and
   developing novel preservation methods. Information about the various microbes that
   assist in food digestion, like E. coli, also plays a vital role in the major achievements of
   the food industry using bioinformatics.


   5.2. Agriculture:
          Crops are improved by producing plants that carry genes for resistance to pathogens
   such as fungi and bacteria. Homology searches, finding conserved motifs, and molecular modeling
   are useful in identifying disease-resistance genes. Pesticides and insecticides that can
   efficiently kill pathogens and pests are designed by molecular modeling.


   5.3. Pharmaceutical industry and Medical science:
          Bioinformatics, computational biology and cheminformatics are playing a key role in
   the pharmaceutical industry, where new drug targets are identified from genomic data at a much
   faster rate. Disease-causing genes are identified using the tools of genomics and proteomics.
   Drug lead identification and drug optimization have become easier using these tools. Besides
   drugs, the pharmaceutical industry is using sequence information in the production of
   vaccines and therapeutic proteins. In designing a new drug, bioinformatics tools have been of
   great help in identifying the target disease and interesting lead compounds and, through
   docking studies, in finding the effective interaction between the drug and its target.
            Pharmacoinformatics is the area of medical informatics concerned with modeling and
    simulation of the behavior of drugs, and control of such behavior by individualized dosage
    regimens for each patient to achieve explicitly chosen therapeutic goals. The credibility of
    serum concentration data is a major factor in such modeling.
            Medical informatics is a scientific discipline concerned with the systematic
    processing of data, information and knowledge in medicine and health care. Computerization of
    the patient record is expected to resolve long-standing problems with the current paper-based
    system.


6. Bioinformatics in India


        In India, various research and development units, centers and sub-centers, and
pharmaceutical companies are doing research on aspects of bioinformatics such as proteomics,
genomics, development of sequence analysis tools, molecular modeling and drug design. The Department
of Biotechnology (DBT), New Delhi, has emphasized starting bioinformatics centers with the help
of BTISnet (Biotechnology Information System) for the proper application of bioinformatics in various
sectors of science and technology for the benefit of researchers. DBT has sponsored various
Bioinformatics Distributed Information Centers (DICs) and Distributed Information Sub-Centers
(Sub-DICs) all over India.


           The lists of the DICs and the Sub-DICs can be seen at the following websites.
      http://dbtindia.nic.in/btis/dic.html
      http://dbtindia.nic.in/bits/subdic.html





Mais conteúdo relacionado

Mais procurados

Dna structure for learners to understand better
Dna structure for learners to understand betterDna structure for learners to understand better
Dna structure for learners to understand betterCharles Monaledi
 
2 introduction to cell biology
2 introduction to cell biology2 introduction to cell biology
2 introduction to cell biologysaveena solanki
 
protein-protein interaction
protein-protein  interactionprotein-protein  interaction
protein-protein interactionZeshan Haider
 
Gutell 089.book bioinfomaticsdictionary.2004
Gutell 089.book bioinfomaticsdictionary.2004Gutell 089.book bioinfomaticsdictionary.2004
Gutell 089.book bioinfomaticsdictionary.2004Robin Gutell
 
Protein databases
Protein databasesProtein databases
Protein databasessarumalay
 
Bioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalBioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalJennifer Shelton
 
Genomics and proteomics II
Genomics and proteomics IIGenomics and proteomics II
Genomics and proteomics IINikolay Vyahhi
 
Internship Report
Internship ReportInternship Report
Internship ReportNeha Gupta
 
Mitochondrial gene expression
Mitochondrial gene expression Mitochondrial gene expression
Mitochondrial gene expression Ibad khan
 
DNA and Protein Synthesis
DNA and Protein SynthesisDNA and Protein Synthesis
DNA and Protein SynthesisNarendra Manwar
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...SBituila
 
Lec 12 level 3-nu (gene expression and synthesis of protein)
Lec 12 level 3-nu (gene expression and synthesis of protein)Lec 12 level 3-nu (gene expression and synthesis of protein)
Lec 12 level 3-nu (gene expression and synthesis of protein)dream10f
 
Chapter 12.3 dna,rna and protein
Chapter 12.3  dna,rna and proteinChapter 12.3  dna,rna and protein
Chapter 12.3 dna,rna and proteinValerie Evans
 

Mais procurados (20)

Dna structure for learners to understand better
Dna structure for learners to understand betterDna structure for learners to understand better
Dna structure for learners to understand better
 
2 introduction to cell biology
2 introduction to cell biology2 introduction to cell biology
2 introduction to cell biology
 
protein-protein interaction
protein-protein  interactionprotein-protein  interaction
protein-protein interaction
 
Gutell 089.book bioinfomaticsdictionary.2004
Gutell 089.book bioinfomaticsdictionary.2004Gutell 089.book bioinfomaticsdictionary.2004
Gutell 089.book bioinfomaticsdictionary.2004
 
Unit 1 transcription
Unit 1 transcriptionUnit 1 transcription
Unit 1 transcription
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Biomedical genomics lecture
Biomedical genomics lectureBiomedical genomics lecture
Biomedical genomics lecture
 
Bioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalBioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formal
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
Genomics and proteomics II
Genomics and proteomics IIGenomics and proteomics II
Genomics and proteomics II
 
Gene control and function
Gene control and functionGene control and function
Gene control and function
 
Internship Report
Internship ReportInternship Report
Internship Report
 
Ppi
PpiPpi
Ppi
 
Mitochondrial gene expression
Mitochondrial gene expression Mitochondrial gene expression
Mitochondrial gene expression
 
Central dogma
Central dogmaCentral dogma
Central dogma
 
DNA and Protein Synthesis
DNA and Protein SynthesisDNA and Protein Synthesis
DNA and Protein Synthesis
 
TrEMBL
TrEMBLTrEMBL
TrEMBL
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Lec 12 level 3-nu (gene expression and synthesis of protein)
Lec 12 level 3-nu (gene expression and synthesis of protein)Lec 12 level 3-nu (gene expression and synthesis of protein)
Lec 12 level 3-nu (gene expression and synthesis of protein)
 
Chapter 12.3 dna,rna and protein
Chapter 12.3  dna,rna and proteinChapter 12.3  dna,rna and protein
Chapter 12.3 dna,rna and protein
 

Destaque

FFPE Applications Solutions brochure
FFPE Applications Solutions brochureFFPE Applications Solutions brochure
FFPE Applications Solutions brochureAffymetrix
 
Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Affymetrix
 
Concordance_of_HTA_array_and_real_time_qPCR_results
Concordance_of_HTA_array_and_real_time_qPCR_resultsConcordance_of_HTA_array_and_real_time_qPCR_results
Concordance_of_HTA_array_and_real_time_qPCR_resultsAndrea Ujvari
 
Solutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochureSolutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochureAffymetrix
 
Supporting high throughput high-biotechnologies in today’s research environme...
Supporting high throughput high-biotechnologies in today’s research environme...Supporting high throughput high-biotechnologies in today’s research environme...
Supporting high throughput high-biotechnologies in today’s research environme...Ed Dodds
 
Genomica - Microarreglos de DNA
Genomica - Microarreglos de DNAGenomica - Microarreglos de DNA
Genomica - Microarreglos de DNAUlises Urzua
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisYaoyu Wang
 
Methods in molecular_biology
Methods in molecular_biologyMethods in molecular_biology
Methods in molecular_biologyDr. Khuram Aziz
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformaticsbiinoida
 

Destaque (10)

FFPE Applications Solutions brochure
FFPE Applications Solutions brochureFFPE Applications Solutions brochure
FFPE Applications Solutions brochure
 
Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq
 
Concordance_of_HTA_array_and_real_time_qPCR_results
Concordance_of_HTA_array_and_real_time_qPCR_resultsConcordance_of_HTA_array_and_real_time_qPCR_results
Concordance_of_HTA_array_and_real_time_qPCR_results
 
Solutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochureSolutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochure
 
Supporting high throughput high-biotechnologies in today’s research environme...
Supporting high throughput high-biotechnologies in today’s research environme...Supporting high throughput high-biotechnologies in today’s research environme...
Supporting high throughput high-biotechnologies in today’s research environme...
 
Synthetic Biology
Synthetic BiologySynthetic Biology
Synthetic Biology
 
Genomica - Microarreglos de DNA
Genomica - Microarreglos de DNAGenomica - Microarreglos de DNA
Genomica - Microarreglos de DNA
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression Analysis
 
Methods in molecular_biology
Methods in molecular_biologyMethods in molecular_biology
Methods in molecular_biology
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 

Semelhante a 916215 bioinformatics-over-view

L-1_Nucleic acid.pptx
L-1_Nucleic acid.pptxL-1_Nucleic acid.pptx
L-1_Nucleic acid.pptxMithilaBanik
 
Dna and protein synthesis
Dna and protein synthesisDna and protein synthesis
Dna and protein synthesisPaula Mills
 
Chapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and Translation
Chapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and TranslationChapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and Translation
Chapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and Translationj3di79
 
Analysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir FilterAnalysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir FilterIJMER
 
Provide an in depth description of biological information transfer (.pdf
Provide an in depth description of biological information transfer (.pdfProvide an in depth description of biological information transfer (.pdf
Provide an in depth description of biological information transfer (.pdfMALASADHNANI
 
IB Biology 2.7 Slides: Transcription & Translation
IB Biology 2.7 Slides: Transcription & TranslationIB Biology 2.7 Slides: Transcription & Translation
IB Biology 2.7 Slides: Transcription & TranslationJacob Cedarbaum
 
1. Explain how a gene directs the synthesis of a protein. Give the l.pdf
1. Explain how a gene directs the synthesis of a protein. Give the l.pdf1. Explain how a gene directs the synthesis of a protein. Give the l.pdf
1. Explain how a gene directs the synthesis of a protein. Give the l.pdfarjunanenterprises
 
RNA- STRUCTURE AND FUNCTIONS
RNA- STRUCTURE AND FUNCTIONSRNA- STRUCTURE AND FUNCTIONS
RNA- STRUCTURE AND FUNCTIONSSushrutMohapatra
 
Rna seq and chip seq
Rna seq and chip seqRna seq and chip seq
Rna seq and chip seqJyoti Singh
 
Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30Ahmadreza Rafati Roudsari
 
7.3 translation
7.3 translation 7.3 translation
7.3 translation dabagus
 
CELL REPLICATION.pptx
CELL REPLICATION.pptxCELL REPLICATION.pptx
CELL REPLICATION.pptxRizaCatli2
 
• Define transcription• Define translation• What are the 3 steps.pdf
• Define transcription• Define translation• What are the 3 steps.pdf• Define transcription• Define translation• What are the 3 steps.pdf
• Define transcription• Define translation• What are the 3 steps.pdfarihantelehyb
 

Semelhante a 916215 bioinformatics-over-view (20)

L-1_Nucleic acid.pptx
L-1_Nucleic acid.pptxL-1_Nucleic acid.pptx
L-1_Nucleic acid.pptx
 
Introduction
IntroductionIntroduction
Introduction
 
Lesson 13.2
Lesson 13.2Lesson 13.2
Lesson 13.2
 
Dna and protein synthesis
Dna and protein synthesisDna and protein synthesis
Dna and protein synthesis
 
Chapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and Translation
Chapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and TranslationChapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and Translation
Chapter 20 Molecular Genetics Lesson 2 - Genes_Transcription and Translation
 
Analysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir FilterAnalysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir Filter
 
Provide an in depth description of biological information transfer (.pdf
Provide an in depth description of biological information transfer (.pdfProvide an in depth description of biological information transfer (.pdf
Provide an in depth description of biological information transfer (.pdf
 
Microbial genetics lectures 10, 11, and 12
Microbial genetics lectures 10, 11, and 12 Microbial genetics lectures 10, 11, and 12
Microbial genetics lectures 10, 11, and 12
 
IB Biology 2.7 Slides: Transcription & Translation
IB Biology 2.7 Slides: Transcription & TranslationIB Biology 2.7 Slides: Transcription & Translation
IB Biology 2.7 Slides: Transcription & Translation
 
1. Explain how a gene directs the synthesis of a protein. Give the l.pdf
1. Explain how a gene directs the synthesis of a protein. Give the l.pdf1. Explain how a gene directs the synthesis of a protein. Give the l.pdf
1. Explain how a gene directs the synthesis of a protein. Give the l.pdf
 
Da2 (1)
Da2 (1)Da2 (1)
Da2 (1)
 
Genetics
GeneticsGenetics
Genetics
 
RNA- STRUCTURE AND FUNCTIONS
RNA- STRUCTURE AND FUNCTIONSRNA- STRUCTURE AND FUNCTIONS
RNA- STRUCTURE AND FUNCTIONS
 
Nucleic acids
Nucleic   acidsNucleic   acids
Nucleic acids
 
protein synthesis
protein synthesisprotein synthesis
protein synthesis
 
Rna seq and chip seq
Rna seq and chip seqRna seq and chip seq
Rna seq and chip seq
 
Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30
 
7.3 translation
7.3 translation 7.3 translation
7.3 translation
 
CELL REPLICATION.pptx
CELL REPLICATION.pptxCELL REPLICATION.pptx
CELL REPLICATION.pptx
 
• Define transcription• Define translation• What are the 3 steps.pdf
• Define transcription• Define translation• What are the 3 steps.pdf• Define transcription• Define translation• What are the 3 steps.pdf
• Define transcription• Define translation• What are the 3 steps.pdf
 

916215 bioinformatics-over-view

  • 1. Bioinformatics – An Overview Kudipudi.Srinivas Research Scholar, Dept of Computer Science, S.V.K.P & Dr.K.S Raju Atrs & Science College,Penugonda-534320, India Kudipudi_sri@yahoo.com ABSTRACT : This presentation gives an overview of Bioinformatics covering major databases available online as well as at major research centers. The major databases called mother databases are the nucleic acid databases and protein sequence databases. Bioinformatics has been visualized as an interface between biological information and information technology that are employed for Protein sequencing, DNA sequencing etc. The concept of Transcription and Translation processes are explained by the central dogma of molecular biology, which states that the sequences of a strand of DNA correspond to the amino acid sequence of a protein. Representation of two or more sequences can be compared by alignment methods such as Pairwise and Multiple alignments. Some database search tools like BLAST, FASTA are some of the programs which do intensive pairwise alignment of our query sequence to all the database sequence entries and gives out the sequences with best scores. Phylogenetic methods are used to reconstruct the relationships between macromolecular sequences finding the genetic connections and relationships between species. The paper also explains the application of bioinformatics in the various industries e.g. Food, Pharmaceutical, Agricultural, Medical, etc., and the technologies that have enabled the analysis of biological problems in multiple dimensions. Keywords: Protein, DNA, FASTA, BLAST, Phylogenetic Tree, Orthologus Introduction: • Bioinformatics is the application of computational techniques to the management and analysis of biological information. • Bioinformatics describes using computational techniques to access, analyze, and interpret the biological information in any of the available biological databases.
  • 2. 1. DATABASES:
    1.1. Primary Databases
    Sequences obtained by various sequencing techniques, such as
      • EST: Expressed Sequence Tags
      • GSS: Genome Survey Sequences
      • STS: Sequence Tagged Sites and
      • HTG: High Throughput Genomic sequences
    have been deposited in different nucleic acid and protein databases, which can be accessed by people all over the world through the World Wide Web. The major databases, called mother databases, are the nucleic acid and protein sequence databases.
    1.1.1. Nucleic Acid Databases:
    The nucleic acid sequence databases consist of annotated nucleic acid sequences (DNA and RNA), with information such as the source organism and the date on which the sequence was determined. The major nucleic acid databases are:
      • European Molecular Biology Laboratory (EMBL) http://www.ebi.ac.uk/
      • GenBank (National Center for Biotechnology Information, NCBI) http://www.ncbi.nlm.nih.gov/
      • DNA Data Bank of Japan (DDBJ) http://www.ddbj.nig.ac.jp/
    These three databases collaborate and exchange data with one another every day.
    1.1.2. Protein Sequence Databases:
    A protein sequence database holds information on proteins whose sequences have been deduced by translating RNA sequences and on proteins sequenced directly by methods such as N-terminal sequencing. The major protein sequence databases are:
      • Protein Information Resource (PIR) http://pir.georgetown.edu/
      • Swiss-Prot http://us.expasy.org/sprot/
  • 3. 1.2. Secondary Databases:
    Derived databases, which are built from the sequence information available in the primary databases, are called secondary databases. Fine examples of secondary databases are:
      • CUTG: Codon Usage Database of Japan
      • COG: Clusters of Orthologous Groups of proteins, from NCBI
      • PROSITE, which stores motifs as regular expressions
      • PRINTS, which stores aligned motifs
      • BLOCKS, which stores aligned motifs as blocks
    1.3. Structure Databases:
    The major structure databases hold the structural data of proteins or DNA whose structure has been determined by either X-ray crystallography or NMR (Nuclear Magnetic Resonance). The Protein Data Bank gives details of the coordinates, bond angles and torsion angles of various proteins, and the Nucleic Acid Database gives the same details for DNA and its forms, e.g. A-DNA or B-DNA.
      • Protein Data Bank (PDB) http://www.rcsb.org/pdb/
      • The Nucleic Acid Database (NDB) http://ndbserver.rutgers.edu/NDB/ndb.html
      • Cambridge Structural Database (CSD) http://www.ccdc.cam.ac.uk/
    These databases are an organized way to store the tremendous amount of sequence information that accumulates from laboratories worldwide. Each database has its own specific format. Three major database organizations around the world are responsible for maintaining most of this data; they largely ‘mirror’ one another.
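PROSITE, listed above among the secondary databases, describes motifs with a pattern syntax that maps closely onto ordinary regular expressions. The following is a minimal sketch of that mapping in Python; the function name `prosite_to_regex` is hypothetical, and only the basic syntax elements are handled (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for sequence ends). The example pattern is the well-known N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern into a Python regular expression.

    Sketch only: covers the common syntax elements, not every PROSITE rule.
    """
    regex = pattern.rstrip(".")          # patterns may end with a period
    regex = regex.replace("-", "")       # '-' only separates positions
    # {ABC} means "any residue except A, B or C"
    regex = regex.replace("{", "[^").replace("}", "]")
    # (n) and (n,m) are repeat counts
    regex = regex.replace("(", "{").replace(")", "}")
    regex = regex.replace("x", ".")      # lowercase x is the wildcard
    regex = regex.replace("<", "^").replace(">", "$")
    return regex


# N-glycosylation site: N, then anything but P, then S or T, then anything but P
glyco = prosite_to_regex("N-{P}-[ST]-{P}")
print(glyco)

# Scan a made-up protein fragment for the motif
fragment = "MKNGSAALNPSQ"
for hit in re.finditer(glyco, fragment):
    print(hit.start(), hit.group())
```

Because the result is an ordinary regular expression, the same converted pattern can be reused to scan any number of sequences.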
  • 4. 2. The Central Dogma of Biology:
    Central Dogma: Flow of Information
    The flow of information in the cell is explained by the central dogma of molecular biology, which states that the sequence of a strand of DNA corresponds to the amino acid sequence of a protein.
    2.1. Transcription
    Transcription is the process in which messenger RNA (mRNA) molecules are synthesized from DNA molecules. Transcription takes place in the nucleus. During transcription only one of the DNA strands corresponding to a gene (the template strand) is copied into mRNA. The resulting mRNA molecule is complementary to the bases of the template strand. mRNA molecules have short lives: they travel out to the cytoplasm, where they direct the synthesis of a protein, and then they are destroyed.
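The copying of the template strand into a complementary mRNA can be sketched in a few lines of Python. This is an illustration only: the function name `transcribe` and the example sequence are made up, and the template is assumed to be written in the 3'→5' direction so that the mRNA comes out 5'→3'.

```python
# Complementary base pairing during transcription:
# DNA template base -> RNA base placed opposite it
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template):
    """Return the mRNA complementary to a DNA template strand.

    The template is assumed to be given 3'->5', so the returned
    mRNA reads 5'->3'.
    """
    return "".join(PAIRING[base] for base in template.upper())


mrna = transcribe("TACCGGAAAATT")
print(mrna)
```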
  • 5. Transcription depends on complementary base pairing: A in the DNA template pairs with U in the RNA, T with A, C with G and G with C. Only one of the DNA strands is transcribed, and therefore the resulting mRNA molecule is single stranded. The amount of transcription of any given gene can be directly controlled by the cell. Once the mRNA molecules leave the nucleus and enter the cytoplasm, they are loaded onto the ribosomes. It is at the ribosomes that protein synthesis occurs, by a process called translation. The ribosomes are composed of ribosomal RNA (rRNA) and ribosomal proteins.
    2.2. Translation
    Translation is the process in which mRNA molecules are translated into proteins at the ribosome. The nucleotides of the mRNA molecule are read by the ribosome so that each set of three nucleotides, called a codon, specifies a single amino acid. Therefore, the first three nucleotides of the coding sequence encode the first amino acid, the next three the second amino acid, and so on. The rules by which the base sequence of the mRNA molecule is translated into the primary amino acid sequence of a protein are called the genetic code. There are 64 different possible codons (there are 4 bases, A, U, C and G, and each codon has 3 bases, so 4^3 = 64) and 20 amino acids. Most amino acids are coded for by more than one codon, and therefore the genetic code is said to be degenerate. No codon codes for more than one amino acid. Three of the codons do not specify the incorporation of any amino acid. These are known as the stop codons: UAA, UAG and UGA. They are found at the end of the mRNA coding sequence, and they tell the ribosome to stop translating the message and release the protein. The mRNA is read one codon at a time from the 5' end to the 3' end. Translation usually starts at a start codon (AUG), which codes for methionine. Each successive codon is read and the corresponding amino acid is incorporated into the protein chain until a stop codon is encountered.
The codons in an mRNA molecule do not directly recognize the amino acids that must be incorporated. Instead this process is directed by a group of adapter molecules called transfer RNAs (tRNAs). Every codon, except the stop codons, is recognized by a tRNA molecule. A tRNA molecule has an anti-codon end, which is made of a set of three bases. These bases can pair with the complementary codon in the mRNA. The 3' end of a
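The codon arithmetic and the reading of an mRNA from start codon to stop codon can be made concrete with a short Python sketch. It builds the standard genetic code from a compact one-letter string (bases in the order U, C, A, G, with `*` marking stop codons) and translates a made-up mRNA. The function name `translate` and the example sequence are illustrative assumptions, and the sketch simply starts at the first AUG it finds, which is a simplification of how start sites are chosen in the cell.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' = stop codon
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")

# 4 bases in 3 positions give 4**3 = 64 codons
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")


def translate(mrna):
    """Translate an mRNA (5'->3') from the first AUG up to a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: release the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE), STOP_CODONS)
print(translate("GGAUGGCCUUUUAAGG"))
```

Counting the codons that map to each letter in `AMINO_ACIDS` shows the degeneracy directly: leucine (L), for instance, appears six times, while methionine (M) appears once.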
  • 6. tRNA molecule is attached to an amino acid. In the translation process, a ribosome reads an mRNA molecule codon by codon. At each codon, a tRNA molecule with an anti-codon complementary to that codon attaches to the mRNA. It brings with it the appropriate amino acid, which is then incorporated into the growing polypeptide chain. Once the amino acid has been added, the tRNA molecule is released and the ribosome moves on to read the next codon in the mRNA chain. This process continues until the ribosome reads a stop codon. At this point the ribosome releases the mRNA molecule and the completed protein. The tRNA molecule functions as an interpreter, reading codons in the mRNA molecule and translating them into amino acids. In this way, the sequence of bases in a given gene determines the amino acid sequence of the protein.
    3. Alignment:
    An alignment is a representation of two or more protein or nucleotide sequences in which homologous amino acids or nucleotides are placed in the same columns and missing amino acids or nucleotides are replaced with gaps.
    3.1. Pairwise Alignment:
    In pairwise alignment only two sequences are compared. Two sequences can be compared either by global alignment or by local alignment. In global alignment the sequences are aligned over their entire length to get the maximum number of matches and the minimum number of gaps. In local alignment, the alignment is restricted to the region of highest similarity. Local alignment uses the Smith-Waterman algorithm and global alignment uses the Needleman-Wunsch algorithm. The best alignment is the one with the maximum score, where positive scores are given for matches and negative scores for gaps and mismatches. Pairwise alignment is used to infer the function of unknown genes or proteins by finding similar sequences of known function. This is done by comparing the unknown sequence with the whole of the nucleic acid or protein databases.
Database search tools such as BLAST and FASTA perform intensive pairwise alignment of a query sequence against all the database sequence entries and return the sequences with the best scores.
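Global alignment by the Needleman-Wunsch algorithm, as described above, can be sketched with a small dynamic programming table. The scoring values used here (+1 for a match, -1 for a mismatch, -1 for a gap) are assumptions chosen for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right corner to recover one best alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = needleman_wunsch("ACGT", "AGT")
print(best)
print(row1)
print(row2)
```

When several alignments share the best score, the traceback returns just one of them; which one depends on the order in which the three moves are tried.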
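The idea of scoring a query against every database entry and reporting the best hits can be sketched with an exhaustive Smith-Waterman local alignment score. This is not how BLAST or FASTA work internally: those programs use heuristics to avoid aligning the query fully against every entry. The scoring values (+2 match, -1 mismatch, -2 gap), the function names and the tiny database are all assumptions made for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database, top=3):
    """Score the query against every entry and return the best-scoring hits."""
    hits = [(smith_waterman_score(query, seq), name)
            for name, seq in database.items()]
    return sorted(hits, reverse=True)[:top]


# Hypothetical miniature database
database = {
    "seq1": "TTACGTTT",
    "seq2": "CCCCCC",
    "seq3": "GGACGAGG",
}
for score, name in search("ACGT", database):
    print(name, score)
```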
  • 7. 3.2. Multiple Alignment:
    Multiple alignment, in which more than two sequences are compared, is used for finding conserved regions among gene sequences and protein sequences, and to study the phylogenetic relationships of macromolecular sequences, i.e. to find evolutionarily related organisms. The major multiple alignment programs are ClustalW, ClustalX and T-Coffee.
    ClustalW: This is a general purpose multiple sequence alignment program for DNA or protein sequences. It gives biologically meaningful multiple sequence alignments of divergent sequences, calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. The cladograms or phylograms obtained are used to see the evolutionary relationships between species. ClustalW can be either downloaded or used online at http://www.ebi.ac.uk/clustalW/. ClustalX is the X-window based, user-friendly version of ClustalW, which can be downloaded and used locally on our machine. T-Coffee is more accurate than ClustalW for sequences with less than 30% identity, but it is slower. http://www.ch.embnet.org/software/TCoffee.html
    Basic Local Alignment Search Tool (BLAST): BLAST is a heuristic search algorithm for sequence similarity searching, for example to identify homologs of a query sequence. When a sequence is submitted to the BLAST program, it is searched against all the sequences of the database of the user's choice, and the result lists those sequences whose percent identity exceeds a particular threshold value. The threshold value is set by the user.
    BLASTing Protein sequences: BLASTing protein sequences is what we want to do if we already have a protein sequence and we want to find other similar protein sequences in a sequence database. Two flavors of BLAST that deal with protein queries are:
      • blastp: compares a protein sequence with a protein database.
      • tblastn: compares a protein sequence with a nucleotide database.
FASTA: FASTA was the first widely used program for database similarity searching. For nucleotide searches, FASTA may be more sensitive than BLAST. FASTA can be very specific when identifying long regions of low similarity, especially for highly diverged sequences. The FASTA submission form can be obtained at http://www.ebi.ac.uk/fasta33/
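Two quantities mentioned in this section, conserved regions in a multiple alignment and percent identity between sequences, are easy to compute once the sequences are already aligned. The sketch below assumes the alignment has been produced elsewhere (for example by a program such as ClustalW) and is given as equal-length strings with `-` for gaps. The sequences are made up, the function names are hypothetical, and percent identity is computed over positions where neither sequence has a gap, which is only one of several definitions in use.

```python
def conserved_columns(alignment):
    """Return the indices of columns where every sequence has the same residue."""
    return [i for i, column in enumerate(zip(*alignment))
            if len(set(column)) == 1 and column[0] != "-"]


def percent_identity(a, b):
    """Percent of identical residues over positions without a gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical pre-aligned protein fragments
alignment = [
    "MKT-AYIAK",
    "MKTQAYLAK",
    "MRT-AYIAK",
]
print(conserved_columns(alignment))

# Keep only the sequences whose identity to the first one passes a threshold
threshold = 80.0
query = alignment[0]
passing = [seq for seq in alignment[1:]
           if percent_identity(query, seq) >= threshold]
print(passing)
```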
  • 8. 4. Phylogenetic Analysis:
    Phylogenetic methods are used to reconstruct the relationships between macromolecular sequences, finding the genetic connections and relationships between species. The results of phylogenetic analysis may be depicted as a hierarchical branching diagram, a ‘cladogram’ or ‘phylogenetic tree’. Programs for phylogenetic analysis are available at http://evolution.genetics.washington.edu/phylip.html. This software (PHYLIP) can be downloaded free of cost and used locally, or it can be used online at http://bioportal.bic.nus.edu.sg/phylip/. TreeView and PhyloDraw are the major user-friendly programs for displaying the hierarchical clustering in different formats for publication and easy analysis. Besides PHYLIP, other programs such as PAUP, MEGA, TreeconW and Winboot are popular for phylogenetic analysis.
    5. Applications of Bioinformatics
    5.1. Food Industry:
    Functional genomics is playing a major role in the food biotechnology industry. The complete genome sequence information available in different databases can be used for finding metabolic pathways, identifying various digestive enzymes, improving cell factories and developing novel presentation methods. Information about microbes that assist in food digestion, such as E. coli, also plays a vital role in the major achievements of the food industry using bioinformatics.
    5.2. Agriculture:
    Crops are improved by producing plants that carry genes for resistance to pathogens such as fungi and bacteria. Homology searches, finding conserved motifs, and molecular modeling are useful in identifying disease resistance genes. Pesticides and insecticides that can efficiently kill the pathogens and pests are designed by molecular modeling.
    5.3. Pharmaceutical industry and Medical science:
    Bioinformatics, computational biology and cheminformatics are playing a key role in the pharmaceutical industry in identifying new drug targets from genomic data at a much faster rate.
Disease-causing genes are identified using the tools of genomics and proteomics. Drug lead identification and drug optimization have become easier using the tools of genomics and proteomics. Besides drugs, the pharmaceutical industry is using sequence information in the production of vaccines and therapeutic proteins. The processes of designing a new drug using bioinformatics
  • 9. tools have been of great help in identifying the target disease and interesting lead compounds and, through docking studies, in finding the effective interaction between the drug and its target. Pharmacoinformatics is the area of medical informatics concerned with modeling and simulation of the behavior of drugs, and with control of such behavior by individualized dosage regimens for each patient to achieve explicitly chosen therapeutic goals. The credibility of serum concentration data is a major factor in such modeling. Medical informatics is a scientific discipline concerned with the systematic processing of data, information and knowledge in medicine and health care. Computerization of the patient record is expected to resolve long-standing problems with the current paper-based system.
    6. Bioinformatics in India
    In India there are various research and development units, centers and sub-centers, and pharmaceutical industries doing research on various aspects of bioinformatics such as proteomics, genomics, development of sequence analysis tools, molecular modeling and drug design. The Department of Biotechnology (DBT), New Delhi, has emphasized starting bioinformatics centers with the help of BTISnet (Biotechnology Information System) for the proper application of bioinformatics in various sectors of science and technology for the benefit of researchers. DBT has sponsored various Bioinformatics Distributed Information Centers (DICs) and Distributed Information Sub-Centers (Sub-DICs) all over India. The list of the DICs and the Sub-DICs can be seen at the following websites. http://dbtindia.nic.in/btis/dic.html http://dbtindia.nic.in/bits/subdic.html
    References:
    1. Bioinformatics – A Beginner’s Guide by Jean-Michel Claverie, PhD & Cedric Notredame, PhD
    2. Introduction to Bioinformatics by Arthu