3. THE PROBLEM
• Emerging technologies allow data to be generated faster, more cheaply, and in
larger quantities
• Example:
- Gebelhoff, Robert. "Sequencing the genome creates so much data we don’t know what to do with it." The Washington Post. WP Company, 07 July 2015. Web. 01 May 2017.
4. THE PROBLEM
• Bioinformatics data is generated globally, and it is stored and
processed at multiple sites around the world. Each research
center and university has its own data storage solution, and
many different centralized repositories exist
• Examples:
5. THE PROBLEM
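• Additionally, data analysis algorithms are computationally expensive
• Examples:
- Pairwise alignment by dynamic programming (Needleman-Wunsch for global, Smith-Waterman for local alignment) of sequences of lengths $N$ and $M$: $O(NM)$
- Exact multiple sequence alignment of $N$ sequences of length $L$: $O(2^N L^N)$
…so most practical tools, BLAST included, use heuristic approaches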
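To make the $O(NM)$ figure concrete, here is a minimal Python sketch of Needleman-Wunsch global alignment scoring. The function name and the scoring scheme (match +1, mismatch -1, linear gap penalty -1) are illustrative assumptions, not taken from the cited sources, and the sketch returns only the optimal score, without the traceback that recovers the alignment itself.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b.

    Fills an (N+1) x (M+1) table, so time and memory are both O(NM).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            score[i][j] = max(diag, up, left)
    return score[n][m]


print(needleman_wunsch("GATTACA", "GATTACA"))  # identical strings: 7 matches
print(needleman_wunsch("AC", "A"))
```

Extending the same table to $N$ sequences of length $L$ gives roughly $L^N$ cells, each with $2^N - 1$ possible predecessor moves, which is where the $O(2^N L^N)$ bound for exact multiple alignment comes from.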
6. MOTIVATION
• Understand the “secret of life”: how biology works
• Replicate biological processes
• Cure disease
• Much more
7. MOTIVATION
• Every paper surveyed repeats the same three points: data is
unstructured, scattered, and growing fast (the “data tsunami”)
• The field faces many problems that individual companies do
not, which makes it unique
• What solutions exist? What solutions are proposed?
• As a database administrator or designer, how can you ease the
hard work that goes into bioinformatics?
9. EXISTING WORK – ORACLE RDBMS
• Offers an XML data type
• Has data mining libraries
• Continuously adapts to industry standards
• ACID: Atomicity, Consistency, Isolation, Durability
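Oracle itself is not used in this sketch; Python's built-in `sqlite3` module stands in as a small ACID-compliant relational engine to illustrate the "A" (atomicity). The `sequences` table and the `load_batch` helper are hypothetical. A simulated failure partway through a batch leaves no partial rows behind, while a clean run commits the whole batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE sequences (id TEXT PRIMARY KEY, residues TEXT NOT NULL)")


def load_batch(rows, fail_midway=False):
    """Insert a batch of sequences as one transaction: all rows or none."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, row in enumerate(rows):
                if fail_midway and i == 1:
                    raise RuntimeError("simulated crash during load")
                conn.execute("INSERT INTO sequences VALUES (?, ?)", row)
    except RuntimeError:
        pass  # the partial batch has already been rolled back


batch = [("seq1", "ACGT"), ("seq2", "GGCA")]
load_batch(batch, fail_midway=True)
count_after_failure = conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]
load_batch(batch)
count_after_success = conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]
print(count_after_failure, count_after_success)
```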
10. PROBLEM
• Relational databases are constrained by schema and
relationships: every row in a table has the same columns, and
foreign key constraints must be satisfied
• Performance degrades as schema complexity, data volume,
and data distribution increase
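The constraint half of this slide can be sketched with SQLite standing in for a relational engine such as Oracle; the `organism` and `gene` tables are hypothetical, and the sketch says nothing about the performance claim. An insert carrying an undeclared attribute is rejected, and so is a row whose foreign key points at a missing parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE gene ("
    " id INTEGER PRIMARY KEY,"
    " organism_id INTEGER NOT NULL REFERENCES organism(id),"
    " symbol TEXT NOT NULL)"
)
conn.execute("INSERT INTO organism VALUES (1, 'Homo sapiens')")
conn.execute("INSERT INTO gene VALUES (1, 1, 'BRCA1')")

errors = []

# Every row must fit the declared columns: an extra attribute is rejected.
try:
    conn.execute(
        "INSERT INTO gene (id, organism_id, symbol, tissue) VALUES (2, 1, 'TP53', 'liver')"
    )
except sqlite3.OperationalError as exc:
    errors.append(("schema", str(exc)))

# Foreign keys must reference an existing parent row.
try:
    conn.execute("INSERT INTO gene VALUES (3, 99, 'MYC')")
except sqlite3.IntegrityError as exc:
    errors.append(("foreign_key", str(exc)))

for kind, message in errors:
    print(kind, "->", message)
```

Storing a new attribute such as `tissue` would first require an `ALTER TABLE` on the whole table, which is the kind of rigidity NoSQL systems aim to avoid.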
11. SOLUTION – NOSQL SYSTEMS
• Are not restricted by schema or relationships
• Designed with performance in mind
• Designed with data distribution in mind
• Highly scalable
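The schema-free and distribution-oriented design can be illustrated with a toy in-memory document store. Everything here (`ShardedDocumentStore`, `put`, `get`, the shard count, the sample documents) is hypothetical and is not the API of MongoDB, Redis, or any other real product; it only shows documents of different shapes living side by side and being placed on shards by hashing their keys.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, Optional


class ShardedDocumentStore:
    """Toy schema-free document store that spreads documents over shards by key hash.

    Hypothetical illustration only; not the API of any real NoSQL system.
    """

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # shard index -> {document id -> JSON text}
        self.shards: Dict[int, Dict[str, str]] = defaultdict(dict)

    def _shard_for(self, doc_id: str) -> int:
        # SHA-256 is stable across runs, unlike Python's salted built-in hash().
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def put(self, doc_id: str, document: dict) -> int:
        """Store a document of any shape and return the shard it landed on."""
        shard = self._shard_for(doc_id)
        self.shards[shard][doc_id] = json.dumps(document)
        return shard

    def get(self, doc_id: str) -> Optional[dict]:
        raw = self.shards[self._shard_for(doc_id)].get(doc_id)
        return None if raw is None else json.loads(raw)


store = ShardedDocumentStore(num_shards=4)
# Documents in one store need not share any fields: no schema is declared up front.
placements = {
    "read:001": store.put("read:001", {"sequence": "ACGTAC", "quality": "IIIIII"}),
    "gene:BRCA1": store.put("gene:BRCA1", {"symbol": "BRCA1", "organism": "Homo sapiens"}),
}
print(placements)
print(store.get("gene:BRCA1"))
```

Plain modulo hashing is the simplest placement rule, but it moves most keys when the shard count changes; production systems typically use consistent hashing or range partitioning to limit that reshuffling.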
15. CONCLUSION
• NoSQL technologies are the future of bioinformatics
• In a field of unstructured, distributed, and rapidly growing
data, it is important to be able to pick the right system for your
application
16. BIBLIOGRAPHY
• Alger, Abdullah. "Redis and MongoDB in the biomedical domain." Compose Articles. Compose Articles, 03 Feb. 2017. Web. 03 May 2017.
• Aniceto, Rodrigo, Rene Xavier, Maristela Holanda, Maria Emilia Walter, and Sergio Lifschitz. "Genomic data persistency on a NoSQL database system." 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2014): n. pag. Web.
• Blackwell, Bruce, and Siva Ravada. "Oracle's technology for bioinformatics and future directions." ACM Digital Library. Australian Computer Society, Inc., n.d. Web. 03 May 2017.
• Gebelhoff, Robert. "Sequencing the genome creates so much data we don’t know what to do with it." The Washington Post. WP Company, 07 July 2015. Web. 01 May 2017.
• Guimaraes, Valeria, Fernanda Hondo, Rodrigo Almeida, Harley Vera, Maristela Holanda, Aleteia Araujo, Maria Emilia Walter, and Sergio Lifschitz. "A study of genomic data provenance in NoSQL document-oriented database systems." 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2015): n. pag. Web.
• Hospital, Adam, Pau Andrio, Cesare Cugnasco, Laia Codo, Yolanda Becerra, Pablo D. Dans, Federica Battistini, Jordi Torres, Ramón Goñi, Modesto Orozco, and Josep Ll. Gelpí. "BIGNASim: a NoSQL database structure and analysis portal for nucleic acids simulation data." Nucleic Acids Research 44.D1 (2015): n. pag. Web.
• Lima, Iasmini, Matheus Oliveira, Diego Kieckbusch, Maristela Holanda, Maria Emilia M. T. Walter, Aleteia Araujo, Marcio Victorino, Waldeyr M. C. Silva, and Sergio Lifschitz. "An evaluation of data replication for bioinformatics workflows on NoSQL systems." 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2016): n. pag. Web.
• Stromback, Lena, and Juliana Freire. "XML Management for Bioinformatics Applications." Computing in Science & Engineering 13.5 (2011): 12-23. Web.