Data mining in Bioinformatics:
Data Mining is the process of automatic discovery of novel and understandable models and
patterns from large amounts of data involving methods at the intersection of machine learning,
statistics and database systems. Bioinformatics is the science of storing, analyzing, and utilizing
information from biological data such as sequences, molecules, gene expressions, and
pathways. Development of novel data mining methods will play a fundamental role in
understanding these rapidly expanding sources of biological data.
Data mining is an interdisciplinary subfield of computer science and statistics with an overall
goal to extract information from a large set of data and transform the information into a
comprehensible structure for further use. Data mining is the analysis step of the "knowledge
discovery in databases" process or KDD (Fig.1). Aside from the raw analysis step, it also
involves database and data management aspects, data pre-processing, model and inference
considerations, interestingness metrics, complexity considerations, post-processing of
discovered structures, visualization, and online updating.
Fig. 1: The process of KDD and the steps involved.
Data mining approaches are well suited to bioinformatics, where enormous
volumes of data are deposited continuously. The extensive databases of biological information
create both challenges and opportunities for developing novel data mining methods. The
Workshop on Data Mining in Bioinformatics (BIOKDD) has been held every year since 2001 with
the goal of encouraging KDD researchers worldwide to take on the numerous challenges that
bioinformatics offers.
The difference between data analysis and data mining is that data analysis is used to test models
and hypotheses on a dataset, e.g., analyzing the effectiveness of a marketing campaign,
regardless of the amount of data; in contrast, data mining uses machine learning and statistical
models to uncover hidden patterns in a large volume of data.
BOTMT:604
Bioinformatics and Biophysics
Prepared By-
Dr. Sangeeta Das.
Assistant Professor, Department of Botany, Bahona College, Jorhat, Assam, India.
Data mining Tools in Bioinformatics:
Various tools for data mining are used in bioinformatics. The following are the tools for
nucleotide sequence analysis:
1. BLAST:
The Basic Local Alignment Search Tool (BLAST) compares gene and protein sequences
against others in public databases. It now comes in several variants, including PSI-BLAST,
PHI-BLAST, and BLAST 2 Sequences. Specialized BLASTs are also available for human,
microbial, malaria, and other genomes, as well as for vector contamination, immunoglobulins,
and tentative human consensus sequences.
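The idea of a local alignment can be illustrated with a small sketch. BLAST itself uses a faster seed-and-extend heuristic; the code below instead uses the exact Smith-Waterman dynamic programming method, which is the problem BLAST approximates. The function name and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration, not BLAST defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score between a and b.

    Scoring values are illustrative assumptions, not BLAST parameters.
    """
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A higher score means the two sequences share a longer or better-conserved local region; unrelated sequences score at or near zero.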
2. Electronic PCR:
This tool searches a target DNA sequence for sequence-tagged sites (STSs) that have
been used as landmarks in various types of genomic maps. It compares the query sequence
against data in NCBI’s UniSTS, a unified, non-redundant view of STSs from a wide range of
sources.
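The search for an STS can be sketched as looking for a primer pair that flanks a product of roughly the expected size. This is a simplified illustration that assumes exact primer matches on the forward strand only; the function names, the `margin` parameter, and the example sequences are hypothetical and are not taken from the e-PCR program.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_sts(seq, fwd, rev, expected_size, margin=50):
    """Return (start, product_size) for each exact primer-pair hit in seq.

    fwd matches the given strand; rev is the reverse primer, so its
    reverse complement is what appears downstream on the given strand.
    """
    rev_rc = reverse_complement(rev)
    hits = []
    start = seq.find(fwd)
    while start != -1:
        end = seq.find(rev_rc, start + len(fwd))
        while end != -1:
            size = end + len(rev_rc) - start
            # Keep only products close to the size recorded for the STS.
            if abs(size - expected_size) <= margin:
                hits.append((start, size))
            end = seq.find(rev_rc, end + 1)
        start = seq.find(fwd, start + 1)
    return hits
```

For example, `find_sts("TTACGTACGGGGGGGGTTGACCAA", "ACGTAC", "GGTCAA", 20, margin=0)` reports one hit starting at position 2 with a 20-base product.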
3. Entrez:
Entrez, the Global Query Cross-Database Search System, is a federated search engine, or
web portal, that allows users to search many discrete health sciences databases at the National
Center for Biotechnology Information (NCBI) website. The name "Entrez" (meaning "Come
in" in French) was chosen to reflect the spirit of welcoming the public to search the content
available from the National Library of Medicine (NLM).
Entrez Global Query is an integrated search and retrieval system that provides access to all
databases simultaneously with a single query string and user interface. Entrez can efficiently
retrieve related sequences, structures, and references. The Entrez system can provide views of
gene and protein sequences and chromosome maps. Some textbooks are also available online
through the Entrez system. Entrez searches databases such as PubMed, PubMed Central,
Site Search, online Books, Online Mendelian Inheritance in Man (OMIM), the nucleotide
sequence database (GenBank), the protein sequence database, Genome Project, UniGene, and
the NLM Catalog.
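Entrez can also be queried from a program through NCBI's E-utilities web interface, which these notes do not otherwise cover. The sketch below only builds the URL for an ESearch request and does not contact the server; the helper name `esearch_url` is hypothetical, and the database names and search term are example values.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# The same query string can be pointed at different Entrez databases.
for database in ("pubmed", "nucleotide", "protein"):
    print(esearch_url(database, "BRCA1", retmax=5))
```

Fetching one of these URLs returns the identifiers of matching records, which can then be retrieved with further E-utilities requests.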
Each Entrez Gene record encapsulates a wide range of information for a given gene and
organism. When possible, the information includes results of analyses that have been done on
the sequence data. The amount and type of information presented depend on what is available
for a particular gene and organism, and may include:
(1) a graphic summary of the genomic context, intron/exon structure, and flanking genes
(2) a link to a graphic view of the mRNA sequence, which in turn shows biological features such
as CDS, SNPs, etc.
(3) links to gene ontology and phenotypic information
(4) links to corresponding protein sequence data and conserved domains
(5) links to related resources, such as mutation databases.
Entrez Gene is the successor to LocusLink.
4. Model Maker:
It allows the user to view the evidence (mRNAs, ESTs, and gene predictions) that was aligned to
the assembled genomic sequence to build a gene model, and to edit the model by selecting or
removing putative exons. Model Maker is accessible from sequence maps that were analyzed
at NCBI and displayed in Map Viewer.
5. ORF (Open Reading Frame) Finder:
ORF Finder identifies all possible ORFs in a DNA sequence by locating the standard and
alternative stop and start codons. The deduced amino acid sequences can then be searched
against GenBank with BLAST. ORF Finder is also packaged in the sequence submission software
Sequin.
6. SAGEmap:
It is a tool for performing statistical tests designed specifically for differential-type analyses of
SAGE (Serial Analysis of Gene Expression) data. The data include SAGE libraries generated
by individual labs as well as those generated by the Cancer Genome Anatomy Project (CGAP),
which have been submitted to Gene Expression Omnibus (GEO). Gene expression profiles that
compare the expression in different SAGE libraries are also available on the Entrez GEO
Profiles pages. It is possible to enter a query sequence in the SAGEmap resource to determine
which SAGE tags are in the sequence, then map to the associated SAGE tag records and view the
expression of those tags in different CGAP SAGE libraries.
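A differential-type comparison of one tag between two SAGE libraries can be sketched with a simple statistical test. These notes do not say which test SAGEmap applies, so the code below uses a pooled two-proportion z-test purely as a stand-in; the function name and the example counts are hypothetical.

```python
import math


def tag_z_test(count_a, total_a, count_b, total_b):
    """Compare one tag's frequency in two libraries.

    Returns (z, two_sided_p). A pooled two-proportion z-test is an
    assumption made for illustration, not SAGEmap's documented method.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0          # tag absent (or universal) in both libraries
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value
```

For instance, a tag counted 50 times among 10,000 tags in one library and 10 times among 10,000 in another gives a large positive z and a very small p-value, suggesting higher expression in the first library.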
7. Spidey:
It aligns one or more mRNA sequences to a single genomic sequence. Spidey will try to
determine the exon/intron structure, returning one or more models of the genomic structure,
including the genomic/mRNA alignments for each exon.
8. VecScreen:
It is a tool for identifying segments of a nucleic acid sequence that may be of vector, linker, or
adapter origin prior to sequence analysis or submission. VecScreen was developed to combat
the problem of vector contamination in public sequence databases.
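The idea of flagging vector-derived segments can be sketched with exact k-mer matching. This is a toy stand-in, not VecScreen's method: it assumes a single known vector sequence, checks the forward strand only, and uses a hypothetical word size `k`. Intervals are reported as half-open (start, end) positions in the query.

```python
def vector_segments(query, vector, k=8):
    """Return merged (start, end) query intervals built from k-mers found in vector."""
    kmers = {vector[i:i + k] for i in range(len(vector) - k + 1)}
    segments = []
    for i in range(len(query) - k + 1):
        if query[i:i + k] in kmers:
            if segments and i <= segments[-1][1]:
                # Overlaps or touches the previous hit: extend that segment.
                segments[-1] = (segments[-1][0], i + k)
            else:
                segments.append((i, i + k))
    return segments
```

With the vector fragment `GAATTCGAGCTCGGTACC` and the query `AAAAGAATTCGAGCTCTTTTTTTT`, the function returns `[(4, 16)]`, marking the stretch that should be trimmed before analysis or submission.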
