A research paper review presentation on "A High-Throughput Bioinformatics Distributed Computing Platform", presented by Md. Habibur Rahman, BIT0216, Institute of Information Technology, University of Dhaka.
A High Throughput Bioinformatics Distributed Computing Platform
1. INSTITUTE OF INFORMATION TECHNOLOGY (IIT), UNIVERSITY OF DHAKA
A High-Throughput Bioinformatics Distributed Computing Platform
19-09-2012
2. Presented by:
Md. Habibur Rahman
BIT 0216
Institute of Information Technology
University of Dhaka
Bangladesh
3. The contributors of the paper
Thomas M. Keane, Andrew J. Page, James O. McInerney,
and Thomas J. Naughton
Bioinformatics and Pharmacogenomics Laboratory,
National University of Ireland, Maynooth, Co. Kildare,
Ireland
Department of Computer Science, National University of
Ireland, Maynooth, Co. Kildare, Ireland
Homepage: http://www.cs.nuim.ie/distibuted
4. Publication venue
18th IEEE Symposium on Computer-Based Medical Systems (CBMS’05)
5. Suitability of Bioinformatics to Distributed Computing
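Many bioinformatics problems exhibit a class of algorithmic parallelism referred to as coarse-grained parallelism: the work splits into large, independent tasks that need little communication with each other.
High compute-to-data ratio: each task performs a large amount of computation relative to the amount of data sent to it.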
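A minimal sketch of what such a workload looks like, using pairwise sequence edit distance as a stand-in task (an assumption for illustration, not an algorithm taken from the paper): each task receives two short strings but does work proportional to the product of their lengths, and tasks never communicate with each other. The function names are hypothetical, and Python threads stand in for donor machines only to keep the example self-contained; because of CPython's global interpreter lock they will not give a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming.

    Input size is len(a) + len(b), but the work is len(a) * len(b) cell
    updates: a high compute-to-data ratio for long sequences.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def run_independent_tasks(pairs, workers=4):
    """Farm out independent tasks; results come back in input order.

    No task depends on another, so any free worker can take any task.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: edit_distance(*p), pairs))
```

In a real deployment each task would be shipped to a separate machine or process; the point here is the shape of the work, not the threading.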
6. Topic and Problem Overview
Demand for high-performance computing in bioinformatics has increased dramatically due to the rapid growth in the size of genomic databases.
Traditional database search algorithms cannot perform a full search of a large database in a reasonable time.
Heuristic algorithms make such searches feasible, but at the cost of reduced search sensitivity.
Evolutionary biology faces a similar problem: reconstructing phylogenetic trees is computationally expensive, so greedy heuristic algorithms are commonly used.
7. Proposed solution
o According to the authors of the paper:
“We present a general-purpose programmable distributed computing platform suitable for deployment in a typical university environment where many semi-idle desktop PC’s are connected via a network”
The system is fully cross-platform.
Two distributed bioinformatics applications built on the platform:
i) DSEARCH
ii) DPRml
8. Proposed solution (cont.)
o Java Distributed Computing platform
- Client-server model.
- The server controls the resources (database, algorithm or computer hardware).
- The model is divided into three separate pieces of software: server, client and remote interface.
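The server/client/remote-interface split can be sketched in-process as below. This is a hypothetical illustration, not the platform's actual Java API: WorkServer, get_unit, submit, status and run_job are invented names, threads stand in for client PCs, and status() plays the role of the progress information a remote interface might query.

```python
import queue
import threading


class WorkServer:
    """Hypothetical server that owns a job and hands out numbered work units."""

    def __init__(self, units):
        self._pending = queue.Queue()
        for unit_id, payload in enumerate(units):
            self._pending.put((unit_id, payload))
        self._results = {}
        self._lock = threading.Lock()
        self.total = len(units)

    def get_unit(self):
        """Give the next unit to a client, or None when no work remains."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def submit(self, unit_id, result):
        """Record a client's result for one unit."""
        with self._lock:
            self._results[unit_id] = result

    def status(self):
        """Progress report (completed, total), as a remote interface might show."""
        with self._lock:
            return len(self._results), self.total

    def results(self):
        """Results ordered by unit number."""
        with self._lock:
            return [self._results[i] for i in sorted(self._results)]


def client(server, compute):
    """Hypothetical donor client: fetch a unit, compute, send it back, repeat."""
    while (unit := server.get_unit()) is not None:
        unit_id, payload = unit
        server.submit(unit_id, compute(payload))


def run_job(units, compute, n_clients=3):
    """Start a server and several clients (threads standing in for PCs)."""
    server = WorkServer(units)
    workers = [threading.Thread(target=client, args=(server, compute))
               for _ in range(n_clients)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return server
```

In this sketch clients pull work from the server rather than having it pushed to them, so a PC that is busy with its owner's tasks simply asks for units less often.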
9. Proposed solution (cont.)
Fig. Diagram of the complete system
10. Proposed solution (cont.)
Installation and Deployment
- The platform consists of three executable JAR files, one each for the server, client and remote interface.
- The client runs as a low-priority background service on each donor PC.
- Hardware requirement: at least a Pentium IV processor.
- Supported operating systems: Windows, Sun Solaris, Mac OS X and Linux.
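One way a client process can make itself low priority on Unix-like systems is to raise its own niceness, sketched below with Python's os.nice. This is an assumption about how such a service could behave, not a description of how the platform's Java client does it; lower_priority is a hypothetical name, and os.nice is unavailable on Windows, where the sketch simply does nothing.

```python
import os


def lower_priority(increment=19):
    """Raise this process's niceness so it mostly uses otherwise-idle CPU time.

    Returns the new niceness, or None where os.nice is unavailable
    (e.g. Windows) or the change is refused.
    """
    if not hasattr(os, "nice"):
        return None
    try:
        return os.nice(increment)
    except OSError:
        return None
```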
11. DPRml
- Distributed Phylogeny Reconstruction by maximum likelihood
Previous situation:
Maximum likelihood (ML) is one of the most accurate techniques for reconstructing phylogenies.
Parallel ML programs had been developed for reconstructing large and accurate phylogenetic trees.
However, they were implemented in platform-specific languages.
12. DPRml (cont.)
- Distributed Phylogeny Reconstruction by maximum likelihood
With the distributed computing platform:
DPRml is one of the most general and powerful likelihood-based phylogenetic tree-building programs.
It uses a proven tree-building algorithm and the Phylogenetic Analysis Library (PAL).
Multiple phylogenetic computations can run simultaneously.
It is a platform-independent ML program.
13. DPRml (cont.)
- Distributed Phylogeny Reconstruction by maximum likelihood
Speedup testing:
Fig. Speedup achieved by running 6 simultaneous DPRml problems using between 1 and 40 semi-idle processors.
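Speedup in tests like this is conventionally defined as the ratio of single-processor to p-processor running time. The slide does not state the exact measurement procedure, so the standard definitions are given here as an assumption, with T(p) denoting wall-clock time on p processors:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Ideal (linear) speedup gives S(p) = p and efficiency E(p) = 1. For example, a job taking 400 minutes on one processor and 20 minutes on 40 processors would have S(40) = 20 and E(40) = 0.5 (illustrative numbers, not results from the paper).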
14. DSEARCH
Fully cross-platform parallel database search program.
Operates in a master-slave environment.
Splits the database into fixed-size units, which are then searched on the donor machines.
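Splitting a FASTA database into work units can be sketched as below. The slide does not say how a unit's size is measured, so this sketch assumes a fixed number of sequences per unit; parse_fasta and split_into_units are hypothetical names, not DSEARCH functions.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) records from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_units(records, unit_size):
    """Group records into consecutive units of at most unit_size sequences."""
    if unit_size < 1:
        raise ValueError("unit_size must be positive")
    return [records[i:i + unit_size] for i in range(0, len(records), unit_size)]
```

Unit size trades scheduling overhead against load balance: larger units mean fewer round trips to the master, while smaller units spread the work more evenly across machines of different speeds.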
15. DSEARCH (cont.)
Speedup testing, using:
- a FASTA database file,
- a FASTA query sequence file,
- a search scheme, and
- a configuration file.
Fig. Speedup achieved by DSEARCH running on between 1 and 80 semi-idle processors.
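Given a database split into units and a query sequence, the master-slave search pattern can be sketched as: each slave scores the sequences in its unit and returns its best hits, and the master merges the partial hit lists into one ranking. The scoring function here (count of shared k-mers) is a toy stand-in for a real search scheme and is not DSEARCH's algorithm; all names are hypothetical.

```python
import heapq


def kmers(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_unit(query, unit, k=3, top=5):
    """Slave side: score every (name, sequence) in one unit against the query.

    Returns up to `top` hits as (score, name) tuples, best first.
    """
    q = kmers(query, k)
    scored = [(len(q & kmers(seq, k)), name) for name, seq in unit]
    return heapq.nlargest(top, scored)


def merge_hits(partial_results, top=5):
    """Master side: combine per-unit hit lists into one global top-`top` list."""
    return heapq.nlargest(top, (hit for hits in partial_results for hit in hits))
```

Because each unit keeps only its own best hits, the master receives a small result list per unit instead of scores for the whole database.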
16. My criticism and future work
The paper gives no detailed description of how the applications work on the distributed computing platform.
If spare clock cycles are not available on the semi-idle PCs, the system will not deliver its best performance.
Failures in the network interconnecting the desktop PCs will reduce performance.
Future work: improve and expand the range of bioinformatics applications for the system.
17. Conclusion
“There should not be any conclusion to research work; it is a continual process, and it will continue for the betterment of human beings.”
18. Any questions?