DATABASE TECHNOLOGIES
IN BIOINFORMATICS
GLEB SKLYR
THE PROBLEM
• Bioinformatics research produces highly irregular and
unstructured data
• Example: gene EGFR
THE PROBLEM
• New technologies allow data to be generated faster, more cheaply, and in
larger quantities
• Example:
Gebelhoff, Robert. "Sequencing the genome creates so much data we don’t know what to do with it." The Washington Post. WP Company, 07 July 2015.
Web. 01 May 2017.
THE PROBLEM
• Bioinformatics data is generated globally and is stored and
processed at multiple sites around the world. Each research
center and university has its own data storage solution, and
many different centralized repositories exist
• Examples:
THE PROBLEM
• Additionally, data analysis algorithms are complex
• Examples:
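- Pairwise global alignment by dynamic programming (Needleman-Wunsch): O(NM) for sequences of lengths N and M
- Exact multiple sequence alignment: O(2^N L^N) for N sequences of length L
…Most tools, BLAST included, use heuristic approaches instead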
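To make the O(NM) cost concrete, here is a minimal sketch of the dynamic-programming score for pairwise global alignment (Needleman-Wunsch). The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not values taken from the slides.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    The nested loops visit every (i, j) cell once, which is where the
    O(N*M) running time comes from. Only two rows of the table are kept.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                          prev[j] + gap,        # gap in b
                          curr[j - 1] + gap)    # gap in a
        prev = curr
    return prev[len(b)]


identical = global_alignment_score("GATTACA", "GATTACA")
one_deletion = global_alignment_score("ACGT", "AGT")
```

Doubling both sequence lengths roughly quadruples the work, which is why heuristic tools are preferred for database-scale searches.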
MOTIVATION
• Understand the “secret of life”. How biology works
• Replicate biological processes
• Cure disease
• Much more
MOTIVATION
• Every paper repeats the 3 points: data is unstructured,
scattered, and growing fast (“data tsunami”)
• This field has many problems that individual companies do
not face, which makes it unique
• What solutions exist? What solutions are proposed?
• As a database administrator or designer, how can you ease the
hard work that goes into bioinformatics?
EXISTING WORK – XML IN RDBMS
EXISTING WORK – ORACLE RDBMS
• Offers an XML data type
• Has data mining libraries
• Continuously working to adapt to industry standards
• ACID – Atomicity, Consistency, Isolation, Durability
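The atomicity part of ACID can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational database (it is not Oracle), and the `sample` table, its columns, and the `insert_batch` helper are hypothetical names invented for this example.

```python
import sqlite3


def insert_batch(conn: sqlite3.Connection, rows) -> bool:
    """Insert every row or none of them; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if it raises.
        with conn:
            conn.executemany(
                "INSERT INTO sample (accession, organism) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (accession TEXT PRIMARY KEY, organism TEXT NOT NULL)"
)

ok = insert_batch(conn, [("S1", "Homo sapiens"), ("S2", "Mus musculus")])
# The second row repeats primary key S1, so the whole batch is rolled back,
# including the otherwise valid S3 row.
failed = insert_batch(conn, [("S3", "Danio rerio"), ("S1", "duplicate key")])

count = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
```

After both calls the table still holds only the two rows from the first batch, so a partially loaded batch never becomes visible.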
PROBLEM
• Relational databases are constrained by schema and
relationships: every row in a table has the same columns, and
foreign key constraints fix how tables relate
• Performance degrades with increasing schema complexity,
data volume, and data distribution
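The two constraints named above can be seen in a short sketch, again using Python's built-in sqlite3 module as a stand-in for a relational database. The `gene` and `variant` tables, the `try_sql` helper, and the sample values are assumptions made for illustration; `BRCA9` is a deliberately made-up gene symbol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT NOT NULL)")
conn.execute("""CREATE TABLE variant (
    id INTEGER PRIMARY KEY,
    gene_symbol TEXT NOT NULL REFERENCES gene(symbol),
    protein_change TEXT NOT NULL)""")
conn.execute("INSERT INTO gene VALUES ('EGFR', '7')")
conn.execute("INSERT INTO variant (gene_symbol, protein_change) VALUES ('EGFR', 'L858R')")


def try_sql(sql: str) -> str:
    """Run one statement and report 'ok' or the name of the error raised."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.Error as exc:
        return type(exc).__name__


# A record with an extra attribute does not fit the fixed column set.
unknown_column = try_sql(
    "INSERT INTO gene (symbol, chromosome, aliases) VALUES ('TP53', '17', 'p53')"
)
# A variant that points at a gene missing from the gene table is rejected.
orphan_variant = try_sql(
    "INSERT INTO variant (gene_symbol, protein_change) VALUES ('BRCA9', 'X1Y')"
)
```

Accepting the extra `aliases` attribute would require a schema change (for example `ALTER TABLE`) that affects every row, which is the rigidity the slide describes.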
SOLUTION – NOSQL SYSTEMS
• Are not restricted by schema or relationships
• Designed with performance in mind
• Designed with data distribution in mind
• Highly scalable
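As a rough sketch of what "designed with data distribution in mind" can mean, the example below assigns each record to a node by hashing its key. This is a deliberate simplification: the node names, key format, and `node_for` function are hypothetical, and production systems generally use more elaborate schemes (such as consistent hashing) so that adding a node moves only a fraction of the data.

```python
import hashlib
from collections import Counter


def node_for(key: str, nodes: list[str]) -> str:
    """Pick a node for a key from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap by node count.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]
keys = [f"read-{i:05d}" for i in range(3000)]

# Count how many keys land on each node.
placement = Counter(node_for(key, nodes) for key in keys)
```

Because placement depends only on the key, any client can locate a record without consulting a central index, and the load spreads across nodes without manual assignment.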
SOLUTIONS – MONGODB
UNSTRUCTURED DATA
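The document model can be sketched without a database server. The example below uses plain Python dictionaries as an in-memory stand-in for a document collection; it does not call any MongoDB driver. The `find` and `field_names` helpers are hypothetical, and the field values are illustrative (the `HYPOTHETICAL1` record is made up).

```python
import json

# Each document carries only the fields it has; no shared column set is required.
records = [
    {"symbol": "EGFR", "chromosome": "7", "aliases": ["ERBB1", "HER1"]},
    {"symbol": "TP53", "chromosome": "17", "pathways": ["apoptosis"]},
    {"symbol": "HYPOTHETICAL1"},  # made-up record with almost no annotation
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def field_names(docs):
    """Return the sorted union of field names seen across all documents."""
    return sorted({key for d in docs for key in d})


on_chr7 = find(records, chromosome="7")
all_fields = field_names(records)
# Documents map directly to JSON, the usual interchange form for document stores.
serialized = json.dumps(records[0], sort_keys=True)
```

Documents that lack a queried field are simply skipped, so new kinds of annotation can be added to some records without touching the others.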
SOLUTIONS – CASSANDRA FOR COMPUTATIONALLY INTENSIVE DATA
CASE STUDY - BIGNASIM
CONCLUSION
• NoSQL technologies are the future of bioinformatics
• In a field of unstructured, distributed, and rapidly growing
data, it is important to be able to pick the right system for your
application
BIBLIOGRAPHY
• Blackwell, Bruce, and Siva Ravada. "Oracle's technology for bioinformatics and future directions." ACM Digital
Library. Australian Computer Society, Inc., n.d. Web. 03 May 2017.
• Alger, Abdullah. "Redis and MongoDB in the biomedical domain." Compose Articles. Compose Articles, 03 Feb.
2017. Web. 03 May 2017.
• Aniceto, Rodrigo, Rene Xavier, Maristela Holanda, Maria Emilia Walter, and Sergio Lifschitz. "Genomic data
persistency on a NoSQL database system." 2014 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM) (2014): n. pag. Web.
• Gebelhoff, Robert. "Sequencing the genome creates so much data we don’t know what to do with it." The
Washington Post. WP Company, 07 July 2015. Web. 01 May 2017.
• Guimaraes, Valeria, Fernanda Hondo, Rodrigo Almeida, Harley Vera, Maristela Holanda, Aleteia Araujo, Maria
Emilia Walter, and Sergio Lifschitz. "A study of genomic data provenance in NoSQL document-oriented database
systems." 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2015): n. pag. Web.
• Hospital, Adam, Pau Andrio, Cesare Cugnasco, Laia Codo, Yolanda Becerra, Pablo D. Dans, Federica Battistini,
Jordi Torres, Ramón Goñi, Modesto Orozco, and Josep Ll. Gelpí. "BIGNASim: a NoSQL database structure and
analysis portal for nucleic acids simulation data." Nucleic Acids Research 44.D1 (2015): n. pag. Web.
• Lima, Iasmini, Matheus Oliveira, Diego Kieckbusch, Maristela Holanda, Maria Emilia M. T. Walter, Aleteia Araujo,
Marcio Victorino, Waldeyr M. C. Silva, and Sergio Lifschitz. "An evaluation of data replication for bioinformatics
workflows on NoSQL systems." 2016 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM) (2016): n. pag. Web.
• Stromback, Lena, and Juliana Freire. "XML Management for Bioinformatics Applications." Computing in Science &
Engineering 13.5 (2011): 12-23. Web.
QUESTIONS

Database Technologies for Storing Bioinformatics Data
