SlideShare a Scribd company logo
1 of 30
Download to read offline
GeneIndex: an open source
parallel program for enumerating
and locating words in a genome

 Huian Li, David Hart, Matthias Mueller, Ulf Markward, Craig Stewart


                            August 3, 2009
                            A    t3
Contents
•   Motivation
•   Serial algorithm
•   Parallel implementation
•   Performance Analysis
•   Conclusion
Motivation

             Question from a Biology professor:
             Given a word length, is th computational
             Gi           dl   th i the         t ti l
             task of scanning a DNA sequence and
             recording th positions of all possible
                  di the      iti     f ll     ibl
             words trivial?

            5    10   15   20   25   30
            *    *    *    *    *    *
   5’   TAGCCGTGGCGGAGCCTCTTGGCTTTGTTTATTC         3’
Serial algorithm
• Straightforward implementation:
     Binary coding for A, C, G, T. For example:
     A:
     A 00
     C: 01
     G: 10
     T: 11

          5         10        15        20
          *         *         *         *
  T A G C C G T G G C G G A G C C T C T T G ...
  110010010110111010011010001001011101111110 ...
Serial algorithm
• Given a sequence and a word length k in order to
                                         k,
  list all possible words, we scan the sequence once
  from left to right
                 g
           5        10        15        20
           *        *         *         *
  T A G C C G T G G C G G A G C C T C T T G       ...
  110010010110111010011010001001011101111110      ...
  T A G C
  11001001
    A G C C
    00100101
      G C C G
      10010110
                                    C T T G       ...
                                    0
                                    011111100     ...
Serial algorithm
           5        10        15        20
           *        *         *         *
  T A G C C G T G G C G G A G C C T C T T G ...
  110010010110111010011010001001011101111110 ...
  T A G C
  11001001
    A G C C
    00100101

  ENCODE( AGCC )
  ENCODE("AGCC") =
  ENCODE("TAGC") & MASK << 2 | ENCODE('C')

  MASK = 4k-1 – 1 = 111111 (in case of k = 4)
Serial algorithm
• This essentially becomes a sorting problem since
                                         problem,
  each word is now converted into an integer.
• Each word is associated with its position
  information: (Encoded Word, Position)
• Sorting has to be stable so that for the same words
                                                  words,
  their positions will be in a certain order.
Serial algorithm
Implementation details:
• Words & positions are stored in a long long
  integer (8 bytes = 64 bits)
• Hash table with a linked list for each entry
• S
  Space required f all words i given sequence i
              i d for ll     d in i             is
  pre-allocated, instead of malloc one by one
• M tl AND OR and SHIFT LEFT operations.
  Mostly AND,         d SHIFT-LEFT           ti
Word frequencies
Word distribution
Motivation for parallel implementation



                  Another question
                  from Biology professor:
                  How about the human genome?



        Fact: Human genome includes
        about 3 billion DNA bases.
Parallel implementation: input
Large dataset input:
• Each process reads its own partition from the input
  file.
  file
• Boundary area between neighboring processes has
  to be considered.
        considered
                agcatgcatgcatcgatcgatcgatgc
                atcgatgcatcgatacgatgcatgcta
                gacgatacgagcatgcatctagcatgc
                      t      t    t t     t
                agtagcatgcatcgatgcattagcatg
                ctagctagcatgctagcatgcatcgat
                g
                gcatgctagcatgctagctagcatgct
                    g   g   g   g     g   g
                atgcatgcatgcatcatgcatcgatcg
                atcgtgcaatgcatgctacgatgcatg
                catcagtcagcatgcatgcatcgatcg
                atgcatcgatcgatgcatgcatgacga
                 t    t   t  t    t     t
                gcaatgatgcagtcatgcatcgacgag
                catcgatcgatgcatgcatgcat
Parallel implementation: load balancing
Computation and load balancing:
1. Each process deals with its own piece of data
2 All processes perform global sorting
2.
        Straightforward implementation: binary tree merge
    X   sorting
       Possible solution but could be problematic
       Ideal solution leading to load balancing
Parallel implementation: load balancing
Straightforward implementation: binary tree merge
  sorting
                                               AAAA:
                                               AAAC:
                                         P0    ...



                                               TTTG:
                                               TTTT:




                      AAAA:
                                                                         AAAA:
                      AAAC:
                                                                         AAAC:
                                                                         AAAC
                P0    ...
                                                                 P2      ...


                      TTTG:
                                                                         TTTG:
                      TTTT:
                                                                         TTTT:




         AAAA: 5, 9                AAAA: 19                 AAAA: 4, 8                AAAA: 35
         AAAC: 22                  AAAC: 12                 AAAC: 67                  AAAC: 46
    P0   ...                  P1   ...                 P2   ...                  P3   ...



         TTTG: 101                 TTTG: 201                TTTG: 88                  TTTG: 40
         TTTT: 80                  TTTT: 26                 TTTT: 53                  TTTT: 30
Parallel implementation: load balancing
Possible solution but could be problematic:
   Straightforward solution: partition word range [0, 4k) equally,
     so each process has
            4 k  i 4 k  i  1 
               ,                  , where i = 0, 1, ..., n-1
             n     n      



    AAAA:               AAAA:         AAAA:             AAAA:
    AAAC:               AAAC:         AAAC:             AAAC:
    CAAC:               CAAC:         CAAC:             CAAC:
    CTTT:               CTTT:         CTTT:             CTTT: `
    GAAA:               GAAA:         GAAA:             GAAA:
    GTTT:               GTTT:         GTTT:             GTTT:
    TTTG:               TTTG:         TTTG:             TTTG:
    TTTT:               TTTT:         TTTT:             TTTT:
Parallel implementation: load balancing
Implementation of the straightforward solution:
• Problem is that some words occur more often than others,
  leading to different memory requests for different p
        g                   y q                      processes



     AAAA: 5, 9     CAAA: 19       GAAA: 4, 8     TAAA: 35, 93
     AAAC: 22, 37   CAAC: 12, 47   GAAC: 67, 72   TAAC: 46
     ...            ...            ...            ...



     ATTG: 101      CTTG: 201      GTTG: 88       TTTG: 40, 87
     ATTT: 80       CTTT: 26       GTTT: 53       TTTT: 15, 30
Parallel implementation: load balancing
Ideal solution leading to load balancing:
•   partition the total number of words L-k+1 equally, so each
    p
    process has (  (L-k+1)/n words, where L is the length of
                          )          ,                  g
    given sequence, k is the given word length.
Implementation:
  p
1. After each process scanned its own piece, we know that:
           4 k 1
                                      L  k 1
           
           x 0
                    f (Wx , Pi ) 
                                         n
                                               , where i = 0 1 ..., n 1
                                                           0, 1,    n-1


                            g [0,4k) into many small divisions
2. We divide the word range [                y
   with total divisions of d (where d>>n):
               4k
                  ( j 1) 1
          d 1 d
                                              L  k 1
          
          j 0
                             f (Wx , Pi ) 
                                                 n
                                                       , where i = 0, 1, ..., n-1
                         4k
                    x      j
                         d
Parallel implementation: load balancing
Implementation:
3. The number of words in each small division as below:
                       4k
                          ( j 1) 1
                       d
         T (i, j )         f (W , P ) x       i       , where i = 0, 1, ..., n-1
                           4k
                         x  j
                           d
                                                            and j = 0, 1, ..., d-1


4. The total number of words in each small division across all
   p
   processes will be:
                       4k
                          ( j 1) 1
                  n 1 d
         T ( j)             f (W , P )   x       i   , where j = 0 1 ..., d-1
                                                                    0, 1,    d 1
                  i 0       4k
                           x  j
                             d
Parallel implementation: load balancing
Implementation:
5. Find an array of boundary B, such that:
           B ( m 1)
                       L  k 1
            mT ( j )  n
           jB( )

    where B(0)= 0,
          B(0<m<n) = any number in (0,d-1),
      and B(n)= d-1

6. Process Pi should have all words in:
      [B(i),
      [B(i) B(i+1)) , where i = 0 1 ..., n-1
                                      0, 1, n 1.
Parallel implementation: load balancing



   AAAA:       AAAA:       AAAA:          AAAA:
   AAAC:       AAAC:       AAAC:          AAAC:
   CAAC:       CAAC:       CAAC:          CAAC:
   CTTT:       CTTT:       CTTT:          CTTT: `
   GAAA:       GAAA:       GAAA:          GAAA:
   GTTT:       GTTT:       GTTT:          GTTT:
   TTTG:       TTTG:       TTTG:          TTTG:
   TTTT:       TTTT:       TTTT:          TTTT:
Parallel implementation: output
Output:
• Each process creates its own output file
• If necessary, all files can concatenate into one
     necessary
  single file, while keeping the order

   AAAA: 5, 9, 10
   AAAC: 22, 37     ...                   ...

   ...                                    TTTG: 15, 30
                                          TTTT: 40, 87
Testbed
Testbed specification
• Consists of 768 IBM JS21 blades
• On each blade:
     • 2 dual core PowerPC CPUs @ 2 5GHz
         dual-core                  2.5GHz
     • 8 GB memory
     • SUSE Linux Enterprise Server 9 (ppc)
• Interconnect: Myrinet
• Parallel environment: MPI
Performance analysis

    Number of    Number of      D. melanogaster       H. sapiens
      nodes      processes         k=6       k=25      k=6       k=25
             1             1        95      23203
             2             2        53       5770
             4             4        29       1500
             8             8        15        386
            16            16          9       107      212     93672
            32            32          6         32     118     24934
            64            64          5         14      73      5998
          128            128          9         11      59      1450
          256            256        11          15      49       558
          512            512        18          22      71       195


  Timings of running against two datasets on BigRed using 1 PPN (SECONDS)
Performance analysis

    Number of    Number of      D. melanogaster       H. sapiens
      nodes      processes         k=6       k=25      k=6       k=25
             1             2        54       5958
             2             4        31       1545
             4             8        17        400
             8            16        11        115
            16            32          8         40     156     25661
            32            64          6         19     101      6233
            64           128          7         12      82      1506
          128            256        25          16      61       576
          256            512        20          27      81       198
          512           1024        34          37     115       170


  Timings of running against two datasets on BigRed using 2 PPN (SECONDS)
Performance analysis

    Number of    Number of      D. melanogaster       H. sapiens
      nodes      processes         k=6       k=25      k=6       k=25
             1             4        37       1738
             2             8        23        453
             4            16        15        130
             8            32        10          46
            16            64          9         23     163      6649
            32           128        11          17     121      1632
            64           256        20          25      96       634
          128            512        39          42     120       233
          256           1024        79          90     180       187
          512           2048       134        131      270       281


  Timings of running against two datasets on BigRed using 4 PPN (SECONDS)
Performance analysis
                             Scalability in terms of node numbers
                40.00

                35.00

                30.00

                25.00
           up
      Speedu



                                                                          1 ppn
                20.00                                                     2 ppn

                15.00                                                     4 ppn
                                                                            pp

                10.00

                 5.00

                 0.00
                        16        32      64     128     256        512
                                       Number of nodes


          Scalability of enumerating 6-mers in H. sapiens
Performance analysis
                          Scalability in terms of node numbers
                600.00


                500.00


                400.00
           up
      Speedu



                                                                       1 ppn
                300.00                                                 2 ppn

                                                                       4 ppn
                                                                         pp
                200.00
                200 00


                100.00


                  0.00
                         16     32      64     128     256       512
                                     Number of nodes


          Scalability of enumerating 25-mers in H. sapiens
Conclusion
• Addressed questions from the biology professor 
• Complicate solution aroused from memory
  restriction.
  restriction
• It can handle words of length up to 30.
• It can find often-repeated words, rarely-occurred , or
          fi d ft         t d      d       l          d
  even non-occurred words.
• It scales relatively well on l
        l      l ti l    ll    large cluster machines.
                                       l t       hi
• We recently developed a Java version for small
  DNA sequences, which was “      “our future work”. It can
                                       f          ”
  zoom in or zoom out to view distribution and
  frequencies interactively.
  f          i i t     ti l
The End




          Thank you

More Related Content

What's hot

Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCSR2011
 
SNM 2007, Washington DC
SNM 2007, Washington DCSNM 2007, Washington DC
SNM 2007, Washington DCgzferl
 
Class3
 Class3 Class3
Class3issbp
 
The H.264 Integer Transform
The H.264 Integer TransformThe H.264 Integer Transform
The H.264 Integer TransformIain Richardson
 
timing-analysis
 timing-analysis timing-analysis
timing-analysisVimal Raj
 
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...SSA KPI
 
Optimum failure truncated testing strategies
Optimum failure truncated testing strategies Optimum failure truncated testing strategies
Optimum failure truncated testing strategies ASQ Reliability Division
 
Class8
 Class8 Class8
Class8issbp
 
Os6 2
Os6 2Os6 2
Os6 2issbp
 
Alba ffs flim 2012
Alba ffs flim 2012Alba ffs flim 2012
Alba ffs flim 2012benbarbieri
 

What's hot (10)

Csr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyenCsr2011 june18 12_00_nguyen
Csr2011 june18 12_00_nguyen
 
SNM 2007, Washington DC
SNM 2007, Washington DCSNM 2007, Washington DC
SNM 2007, Washington DC
 
Class3
 Class3 Class3
Class3
 
The H.264 Integer Transform
The H.264 Integer TransformThe H.264 Integer Transform
The H.264 Integer Transform
 
timing-analysis
 timing-analysis timing-analysis
timing-analysis
 
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
 
Optimum failure truncated testing strategies
Optimum failure truncated testing strategies Optimum failure truncated testing strategies
Optimum failure truncated testing strategies
 
Class8
 Class8 Class8
Class8
 
Os6 2
Os6 2Os6 2
Os6 2
 
Alba ffs flim 2012
Alba ffs flim 2012Alba ffs flim 2012
Alba ffs flim 2012
 

Viewers also liked

Appraisers Direct, Inc.
Appraisers Direct, Inc.Appraisers Direct, Inc.
Appraisers Direct, Inc.ClintCornett
 
5 Vampir Configuration At IU
5 Vampir Configuration At IU5 Vampir Configuration At IU
5 Vampir Configuration At IUPTIHPA
 
Air Traffic
Air TrafficAir Traffic
Air Traffichumair73
 
A Common Sense Approach Electronic
A Common Sense Approach   ElectronicA Common Sense Approach   Electronic
A Common Sense Approach ElectronicRegina Henderson
 
Github:fi Presentation
Github:fi PresentationGithub:fi Presentation
Github:fi PresentationPTIHPA
 

Viewers also liked (7)

Appraisers Direct, Inc.
Appraisers Direct, Inc.Appraisers Direct, Inc.
Appraisers Direct, Inc.
 
5 Vampir Configuration At IU
5 Vampir Configuration At IU5 Vampir Configuration At IU
5 Vampir Configuration At IU
 
Air Traffic
Air TrafficAir Traffic
Air Traffic
 
A Common Sense Approach Electronic
A Common Sense Approach   ElectronicA Common Sense Approach   Electronic
A Common Sense Approach Electronic
 
Aca Talent
Aca TalentAca Talent
Aca Talent
 
Ciclismo Neiva
Ciclismo NeivaCiclismo Neiva
Ciclismo Neiva
 
Github:fi Presentation
Github:fi PresentationGithub:fi Presentation
Github:fi Presentation
 

Similar to GeneIndex: an open source parallel program for enumerating and locating words in a genome

Overlap Layout Consensus assembly
Overlap Layout Consensus assemblyOverlap Layout Consensus assembly
Overlap Layout Consensus assemblyZhuyi Xue
 
Horspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTUHorspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTUSaMarthHitnalli
 
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Flink Forward
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingVasia Kalavri
 
Matlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptMatlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptIndra Hermawan
 
Algorithim lec1.pptx
Algorithim lec1.pptxAlgorithim lec1.pptx
Algorithim lec1.pptxrediet43
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...Positive Hack Days
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big DataPingCAP
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionMengmeng Xu
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...IOSR Journals
 
Algorithm Analysis
Algorithm AnalysisAlgorithm Analysis
Algorithm AnalysisMegha V
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdfFrangoCamila
 
Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02mansab MIRZA
 

Similar to GeneIndex: an open source parallel program for enumerating and locating words in a genome (20)

Overlap Layout Consensus assembly
Overlap Layout Consensus assemblyOverlap Layout Consensus assembly
Overlap Layout Consensus assembly
 
Fine Grained Complexity
Fine Grained ComplexityFine Grained Complexity
Fine Grained Complexity
 
discrete-hmm
discrete-hmmdiscrete-hmm
discrete-hmm
 
Horspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTUHorspool Algorithm in Design and Analysis of Algorithms in VTU
Horspool Algorithm in Design and Analysis of Algorithms in VTU
 
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processing
 
Matlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.pptMatlab tutorial and Linear Algebra Review.ppt
Matlab tutorial and Linear Algebra Review.ppt
 
Algorithim lec1.pptx
Algorithim lec1.pptxAlgorithim lec1.pptx
Algorithim lec1.pptx
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big Data
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action Detection
 
Biochip
BiochipBiochip
Biochip
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Lec20 dimension1
Lec20 dimension1Lec20 dimension1
Lec20 dimension1
 
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...An Efficient Biological Sequence Compression Technique Using  LUT and Repeat ...
An Efficient Biological Sequence Compression Technique Using LUT and Repeat ...
 
Lec1
Lec1Lec1
Lec1
 
Algorithm Analysis
Algorithm AnalysisAlgorithm Analysis
Algorithm Analysis
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdf
 
Analysis
AnalysisAnalysis
Analysis
 
Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02Asymptotics 140510003721-phpapp02
Asymptotics 140510003721-phpapp02
 

More from PTIHPA

2010 05 hands_on
2010 05 hands_on2010 05 hands_on
2010 05 hands_onPTIHPA
 
Trace Visualization
Trace VisualizationTrace Visualization
Trace VisualizationPTIHPA
 
2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurement2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurementPTIHPA
 
2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configurationPTIHPA
 
2010 03 papi_indiana
2010 03 papi_indiana2010 03 papi_indiana
2010 03 papi_indianaPTIHPA
 
Overview: Event Based Program Analysis
Overview: Event Based Program AnalysisOverview: Event Based Program Analysis
Overview: Event Based Program AnalysisPTIHPA
 
Switc Hpa
Switc HpaSwitc Hpa
Switc HpaPTIHPA
 
Statewide It Robert Henschel
Statewide It Robert HenschelStatewide It Robert Henschel
Statewide It Robert HenschelPTIHPA
 
3 Vampir Trace In Detail
3 Vampir Trace In Detail3 Vampir Trace In Detail
3 Vampir Trace In DetailPTIHPA
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace VisualizationPTIHPA
 
1 Vampir Overview
1 Vampir Overview1 Vampir Overview
1 Vampir OverviewPTIHPA
 
4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir UsagePTIHPA
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorPTIHPA
 
Big Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopBig Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopPTIHPA
 

More from PTIHPA (14)

2010 05 hands_on
2010 05 hands_on2010 05 hands_on
2010 05 hands_on
 
Trace Visualization
Trace VisualizationTrace Visualization
Trace Visualization
 
2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurement2010 02 instrumentation_and_runtime_measurement
2010 02 instrumentation_and_runtime_measurement
 
2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration
 
2010 03 papi_indiana
2010 03 papi_indiana2010 03 papi_indiana
2010 03 papi_indiana
 
Overview: Event Based Program Analysis
Overview: Event Based Program AnalysisOverview: Event Based Program Analysis
Overview: Event Based Program Analysis
 
Switc Hpa
Switc HpaSwitc Hpa
Switc Hpa
 
Statewide It Robert Henschel
Statewide It Robert HenschelStatewide It Robert Henschel
Statewide It Robert Henschel
 
3 Vampir Trace In Detail
3 Vampir Trace In Detail3 Vampir Trace In Detail
3 Vampir Trace In Detail
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace Visualization
 
1 Vampir Overview
1 Vampir Overview1 Vampir Overview
1 Vampir Overview
 
4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
 
Big Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopBig Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing Workshop
 

Recently uploaded

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Recently uploaded (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

GeneIndex: an open source parallel program for enumerating and locating words in a genome

  • 1. GeneIndex: an open source parallel program for enumerating and locating words in a genome Huian Li, David Hart, Matthias Mueller, Ulf Markward, Craig Stewart August 3, 2009 A t3
  • 2. Contents • Motivation • Serial algorithm • Parallel implementation • Performance Analysis • Conclusion
  • 3. Motivation Question from a Biology professor: Given a word length, is th computational Gi dl th i the t ti l task of scanning a DNA sequence and recording th positions of all possible di the iti f ll ibl words trivial? 5 10 15 20 25 30 * * * * * * 5’ TAGCCGTGGCGGAGCCTCTTGGCTTTGTTTATTC 3’
  • 4. Serial algorithm • Straightforward implementation: Binary coding for A, C, G, T. For example: A: A 00 C: 01 G: 10 T: 11 5 10 15 20 * * * * T A G C C G T G G C G G A G C C T C T T G ... 110010010110111010011010001001011101111110 ...
  • 5. Serial algorithm • Given a sequence and a word length k in order to k, list all possible words, we scan the sequence once from left to right g 5 10 15 20 * * * * T A G C C G T G G C G G A G C C T C T T G ... 110010010110111010011010001001011101111110 ... T A G C 11001001 A G C C 00100101 G C C G 10010110 C T T G ... 0 011111100 ...
  • 6. Serial algorithm 5 10 15 20 * * * * T A G C C G T G G C G G A G C C T C T T G ... 110010010110111010011010001001011101111110 ... T A G C 11001001 A G C C 00100101 ENCODE( AGCC ) ENCODE("AGCC") = ENCODE("TAGC") & MASK << 2 | ENCODE('C') MASK = 4k-1 – 1 = 111111 (in case of k = 4)
  • 7. Serial algorithm • This essentially becomes a sorting problem since problem, each word is now converted into an integer. • Each word is associated with its position information: (Encoded Word, Position) • Sorting has to be stable so that for the same words words, their positions will be in a certain order.
  • 8. Serial algorithm Implementation details: • Words & positions are stored in a long long integer (8 bytes = 64 bits) • Hash table with a linked list for each entry • S Space required f all words i given sequence i i d for ll d in i is pre-allocated, instead of malloc one by one • M tl AND OR and SHIFT LEFT operations. Mostly AND, d SHIFT-LEFT ti
  • 11. Motivation for parallel implementation Another question from Biology professor: How about the human genome? Fact: Human genome includes about 3 billion DNA bases.
  • 12. Parallel implementation: input Large dataset input: • Each process reads its own partition from the input file. file • Boundary area between neighboring processes has to be considered. considered agcatgcatgcatcgatcgatcgatgc atcgatgcatcgatacgatgcatgcta gacgatacgagcatgcatctagcatgc t t t t t agtagcatgcatcgatgcattagcatg ctagctagcatgctagcatgcatcgat g gcatgctagcatgctagctagcatgct g g g g g g atgcatgcatgcatcatgcatcgatcg atcgtgcaatgcatgctacgatgcatg catcagtcagcatgcatgcatcgatcg atgcatcgatcgatgcatgcatgacga t t t t t t gcaatgatgcagtcatgcatcgacgag catcgatcgatgcatgcatgcat
  • 13. Parallel implementation: load balancing Computation and load balancing: 1. Each process deals with its own piece of data 2 All processes perform global sorting 2. Straightforward implementation: binary tree merge X sorting  Possible solution but could be problematic  Ideal solution leading to load balancing
  • 14. Parallel implementation: load balancing Straightforward implementation: binary tree merge sorting AAAA: AAAC: P0 ... TTTG: TTTT: AAAA: AAAA: AAAC: AAAC: AAAC P0 ... P2 ... TTTG: TTTG: TTTT: TTTT: AAAA: 5, 9 AAAA: 19 AAAA: 4, 8 AAAA: 35 AAAC: 22 AAAC: 12 AAAC: 67 AAAC: 46 P0 ... P1 ... P2 ... P3 ... TTTG: 101 TTTG: 201 TTTG: 88 TTTG: 40 TTTT: 80 TTTT: 26 TTTT: 53 TTTT: 30
  • 15. Parallel implementation: load balancing Possible solution but could be problematic:  Straightforward solution: partition word range [0, 4k) equally, so each process has  4 k  i 4 k  i  1   ,   , where i = 0, 1, ..., n-1  n n  AAAA: AAAA: AAAA: AAAA: AAAC: AAAC: AAAC: AAAC: CAAC: CAAC: CAAC: CAAC: CTTT: CTTT: CTTT: CTTT: ` GAAA: GAAA: GAAA: GAAA: GTTT: GTTT: GTTT: GTTT: TTTG: TTTG: TTTG: TTTG: TTTT: TTTT: TTTT: TTTT:
  • 16. Parallel implementation: load balancing Implementation of the straightforward solution: • Problem is that some words occur more often than others, leading to different memory requests for different p g y q processes AAAA: 5, 9 CAAA: 19 GAAA: 4, 8 TAAA: 35, 93 AAAC: 22, 37 CAAC: 12, 47 GAAC: 67, 72 TAAC: 46 ... ... ... ... ATTG: 101 CTTG: 201 GTTG: 88 TTTG: 40, 87 ATTT: 80 CTTT: 26 GTTT: 53 TTTT: 15, 30
  • 17. Parallel implementation: load balancing Ideal solution leading to load balancing: • partition the total number of words L-k+1 equally, so each p process has ( (L-k+1)/n words, where L is the length of ) , g given sequence, k is the given word length. Implementation: p 1. After each process scanned its own piece, we know that: 4 k 1 L  k 1  x 0 f (Wx , Pi )  n , where i = 0 1 ..., n 1 0, 1, n-1 g [0,4k) into many small divisions 2. We divide the word range [ y with total divisions of d (where d>>n): 4k ( j 1) 1 d 1 d L  k 1   j 0 f (Wx , Pi )  n , where i = 0, 1, ..., n-1 4k x j d
  • 18. Parallel implementation: load balancing Implementation: 3. The number of words in each small division as below: 4k ( j 1) 1 d T (i, j )   f (W , P ) x i , where i = 0, 1, ..., n-1 4k x  j d and j = 0, 1, ..., d-1 4. The total number of words in each small division across all p processes will be: 4k ( j 1) 1 n 1 d T ( j)    f (W , P ) x i , where j = 0 1 ..., d-1 0, 1, d 1 i 0 4k x  j d
  • 19. Parallel implementation: load balancing Implementation: 5. Find an array of boundary B, such that: B ( m 1) L  k 1 mT ( j )  n jB( ) where B(0)= 0, B(0<m<n) = any number in (0,d-1), and B(n)= d-1 6. Process Pi should have all words in: [B(i), [B(i) B(i+1)) , where i = 0 1 ..., n-1 0, 1, n 1.
  • 20. Parallel implementation: load balancing AAAA: AAAA: AAAA: AAAA: AAAC: AAAC: AAAC: AAAC: CAAC: CAAC: CAAC: CAAC: CTTT: CTTT: CTTT: CTTT: ` GAAA: GAAA: GAAA: GAAA: GTTT: GTTT: GTTT: GTTT: TTTG: TTTG: TTTG: TTTG: TTTT: TTTT: TTTT: TTTT:
  • 21. Parallel implementation: output Output: • Each process creates its own output file • If necessary, all files can concatenate into one necessary single file, while keeping the order AAAA: 5, 9, 10 AAAC: 22, 37 ... ... ... TTTG: 15, 30 TTTT: 40, 87
  • 23. Testbed specification • Consists of 768 IBM JS21 blades • On each blade: • 2 dual core PowerPC CPUs @ 2 5GHz dual-core 2.5GHz • 8 GB memory • SUSE Linux Enterprise Server 9 (ppc) • Interconnect: Myrinet • Parallel environment: MPI
  • 24. Performance analysis Number of Number of D. melanogaster H. sapiens nodes processes k=6 k=25 k=6 k=25 1 1 95 23203 2 2 53 5770 4 4 29 1500 8 8 15 386 16 16 9 107 212 93672 32 32 6 32 118 24934 64 64 5 14 73 5998 128 128 9 11 59 1450 256 256 11 15 49 558 512 512 18 22 71 195 Timings of running against two datasets on BigRed using 1 PPN (SECONDS)
  • 25. Performance analysis Number of Number of D. melanogaster H. sapiens nodes processes k=6 k=25 k=6 k=25 1 2 54 5958 2 4 31 1545 4 8 17 400 8 16 11 115 16 32 8 40 156 25661 32 64 6 19 101 6233 64 128 7 12 82 1506 128 256 25 16 61 576 256 512 20 27 81 198 512 1024 34 37 115 170 Timings of running against two datasets on BigRed using 2 PPN (SECONDS)
  • 26. Performance analysis Number of Number of D. melanogaster H. sapiens nodes processes k=6 k=25 k=6 k=25 1 4 37 1738 2 8 23 453 4 16 15 130 8 32 10 46 16 64 9 23 163 6649 32 128 11 17 121 1632 64 256 20 25 96 634 128 512 39 42 120 233 256 1024 79 90 180 187 512 2048 134 131 270 281 Timings of running against two datasets on BigRed using 4 PPN (SECONDS)
  • 27. Performance analysis Scalability in terms of node numbers 40.00 35.00 30.00 25.00 up Speedu 1 ppn 20.00 2 ppn 15.00 4 ppn pp 10.00 5.00 0.00 16 32 64 128 256 512 Number of nodes Scalability of enumerating 6-mers in H. sapiens
  • 28. Performance analysis Scalability in terms of node numbers 600.00 500.00 400.00 up Speedu 1 ppn 300.00 2 ppn 4 ppn pp 200.00 200 00 100.00 0.00 16 32 64 128 256 512 Number of nodes Scalability of enumerating 25-mers in H. sapiens
  • 29. Conclusion • Addressed questions from the biology professor  • Complicate solution aroused from memory restriction. restriction • It can handle words of length up to 30. • It can find often-repeated words, rarely-occurred , or fi d ft t d d l d even non-occurred words. • It scales relatively well on l l l ti l ll large cluster machines. l t hi • We recently developed a Java version for small DNA sequences, which was “ “our future work”. It can f ” zoom in or zoom out to view distribution and frequencies interactively. f i i t ti l
  • 30. The End Thank you