A High Throughput Bioinformatics Distributed Computing Platform

INSTITUTE OF INFORMATION TECHNOLOGY (IIT), UNIVERSITY OF DHAKA

19-09-2012 1


Presented by:

Md. Habibur Rahman
BIT 0216
Institute of Information Technology
University of Dhaka
Bangladesh



Authors of the paper

Thomas M. Keane, Andrew J. Page, James O. McInerney,
and Thomas J. Naughton

Bioinformatics and Pharmacogenomics Laboratory,
National University of Ireland, Maynooth, Co. Kildare,
Ireland

Department of Computer Science, National University of
Ireland, Maynooth, Co. Kildare, Ireland

Homepage: http://www.cs.nuim.ie/distibuted



Publication

18th IEEE Symposium on Computer-Based Medical Systems (CBMS’05)



Suitability of Bioinformatics for Distributed Computing

• Many bioinformatics problems exhibit a class of algorithmic parallelism referred to as coarse-grained parallelism: the work divides into large, independent units.
• High compute-to-data ratio.
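As a rough illustration of coarse-grained parallelism (a sketch, not code from the paper): each work unit below is independent, needs only a small input, and could be handed to any worker without communicating with the others. The function `gc_content`, the helper `run_units` and the sample sequences are made up for this example, and a local thread pool stands in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in one sequence (stand-in for an expensive, independent task)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical work units: none of them depends on any other.
SEQUENCES = ["ATGC", "GGCC", "ATAT"]


def run_units(sequences, workers=4):
    """Process every unit independently; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_content, sequences))


if __name__ == "__main__":
    print(run_units(SEQUENCES))
```

Because the units share no state, adding workers needs no change to the task itself, which is the property that makes such problems a good fit for donated desktop machines.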



Topic and Problem Overview

• Demand for high-performance computing in bioinformatics has increased dramatically because of the rapid growth in the size of genomic databases.
• Traditional database search algorithms cannot perform a full search of a large database in a reasonable time.
• Heuristic algorithms make the search feasible, but they reduce its sensitivity.
• In evolutionary biology, phylogenetic tree reconstruction likewise relies on greedy heuristic algorithms.
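The slides do not spell out why tree building needs heuristics. One standard reason, added here as background rather than taken from the paper, is that the number of unrooted binary tree topologies for n labelled taxa is (2n-5)!!, which grows far too fast for exhaustive search. A small sketch (the function name `unrooted_tree_count` is illustrative):

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted binary tree topologies for n labelled taxa: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10):
        print(n, unrooted_tree_count(n))
```

Even at 10 taxa there are already about two million candidate topologies, so practical programs evaluate only a small, greedily chosen subset.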



Proposed solution
o According to the authors of the paper:

“We present a general-purpose programmable distributed computing platform
suitable for deployment in a typical university environment
where many semi-idle desktop PC’s are connected via a
network”
• The system is fully cross-platform.
• Two distributed bioinformatics applications:
i) DSEARCH
ii) DPRml


Proposed solution (cont.)

o Java distributed computing platform

- Client-server model

- The server controls the resources (database, algorithm
or computer hardware)

- The system is divided into three separate pieces of
software: server, client and remote interface.
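The division of labour can be pictured with a minimal in-process sketch. This is not the platform's actual Java API: the class names `Server` and `Client`, their methods, and the use of `sum` as the "algorithm" are all assumptions made for illustration. The remote interface is represented only by reading `server.results` at the end.

```python
from collections import deque


class Server:
    """Hypothetical server: owns the work units and collects the results."""

    def __init__(self, units):
        self._pending = deque(enumerate(units))
        self.results = {}

    def next_unit(self):
        """Return (unit_id, payload), or None when no work is left."""
        return self._pending.popleft() if self._pending else None

    def submit(self, unit_id, result):
        self.results[unit_id] = result


class Client:
    """Hypothetical client: pulls a unit, runs the algorithm, returns the result."""

    def __init__(self, server, algorithm):
        self.server = server
        self.algorithm = algorithm

    def run_once(self) -> bool:
        unit = self.server.next_unit()
        if unit is None:
            return False
        unit_id, payload = unit
        self.server.submit(unit_id, self.algorithm(payload))
        return True


def run_demo():
    server = Server([[1, 2, 3], [4, 5], [6]])
    clients = [Client(server, sum) for _ in range(2)]
    # Let the clients take turns until none of them finds any work.
    while any([c.run_once() for c in clients]):
        pass
    return server.results


if __name__ == "__main__":
    print(run_demo())
```

Clients pull work rather than having it pushed to them, so a slow or busy desktop simply asks for units less often.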

19-09-2012 8



Fig. Diagram of the complete system



• Installation and Deployment
- Consists of three executable JAR files corresponding to
the server, client and remote interface.
- The client runs as a low-priority background service.
- Hardware specification: at least a Pentium IV processor.
- OS compatibility: Windows, Sun Solaris, Mac OS X and
Linux.



DPRml
- Distributed Phylogeny Reconstruction by Maximum Likelihood

Previous situation:
Maximum likelihood (ML) is one of the most accurate techniques
for reconstructing phylogenies.
Parallel ML programs had been developed for reconstructing large and
accurate phylogenetic trees.
These programs were implemented in platform-specific languages.



DPRml (cont.)

After the development of the distributed computing platform:
One of the most general and powerful likelihood-based phylogenetic
tree-building programs.
Uses a proven tree-building algorithm and the Phylogenetic Analysis
Library (PAL).
Can run multiple phylogenetic computations at the same time.
Platform-independent ML program.



DPRml (cont.)


Speedup testing:

Fig. Speedup achieved by running 6 simultaneous DPRml problems
using between 1 and 40 semi-idle processors.
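For reading a speedup plot like this one, the usual definitions are speedup = T(1) / T(p) and efficiency = speedup / p, where T(p) is the run time on p processors. The sketch below uses made-up timings purely to show the arithmetic; they are not measurements from the paper, and the function names are illustrative.

```python
def speedup(t_one: float, t_p: float) -> float:
    """Speedup: run time on one processor divided by run time on p processors."""
    if t_one <= 0 or t_p <= 0:
        raise ValueError("run times must be positive")
    return t_one / t_p


def efficiency(t_one: float, t_p: float, processors: int) -> float:
    """Efficiency: speedup divided by the number of processors (1.0 is ideal)."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return speedup(t_one, t_p) / processors


if __name__ == "__main__":
    # Hypothetical timings, NOT taken from the paper: 1200 s on 1 CPU, 40 s on 40 CPUs.
    print(speedup(1200.0, 40.0))
    print(efficiency(1200.0, 40.0, 40))
```

On semi-idle machines, efficiency below 1.0 is expected, since the donors' owners take priority over the background client.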


DSEARCH

• Fully cross-platform parallel database search program.
• Operates in a master-slave environment.
• Splits the database into fixed-size units that are subsequently
searched on the donor machines.
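The splitting step might look roughly like the sketch below. It assumes that "fixed size" means a fixed number of sequences per unit; the slides do not say how DSEARCH measures unit size, and the function names `parse_fasta` and `split_into_units` are illustrative, not part of DSEARCH.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_into_units(records, unit_size):
    """Group records into units of at most unit_size records each (assumed unit measure)."""
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    unit = []
    for record in records:
        unit.append(record)
        if len(unit) == unit_size:
            yield unit
            unit = []
    if unit:  # last, possibly smaller, unit
        yield unit


if __name__ == "__main__":
    db = ">s1\nATGC\nGG\n>s2\nTTAA\n>s3\nCCGG\n"
    for i, unit in enumerate(split_into_units(parse_fasta(db), 2)):
        print(i, unit)
```

Each unit can then be sent to a donor machine together with the query sequence, and the master merges the per-unit hits.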



DSEARCH (cont.)

Speedup testing, using:

- A FASTA database file
- A FASTA query sequence file
- A search scheme
- A configuration file

Fig. Speedup achieved by DSEARCH running on
between 1 and 80 semi-idle processors.



My criticism and future work

• The paper gives no detailed description of how the applications work on the
distributed computing platform.
• If spare clock cycles are not available on the semi-idle PCs, the
system will not deliver its best performance.
• A failure of the network connecting the desktop PCs will reduce
performance.
• Future work: improve and expand the range of bioinformatics applications for
the system.



Conclusion

“Research work should not have a conclusion.
It is a continual process, and it will
continue for the betterment of human
beings.”



ANY QUESTIONS?
