SlideShare uma empresa Scribd logo
1 de 12
SCABIO – a framework for bioinformatics
          algorithms in Scala


             Markus Gumbel


               Institute for
            Medical Informatics


           Mannheim University
       of Applied Sciences / Germany
                                          Hochschule
                                          Mannheim
SCABIO (Scala algorithms for bioinformatics)
  is...
      a framework for bioinformatics algorithms written in
       Scala
      used primarily for education (since 2010)
           introduced for a bioinformatics lecture & lab
           contains mainly standard algorithms
      open source but not yet released
       (i.e. no announcements etc.)
           premiere today!
           https://github.com/markusgumbel/scalabioalg
           Apache License Version 2.0                          Hochschule
                                                               Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012             2
Why a new framework, why Scala?

      For education
           generic dynamic programming package to demonstrate
           optimization principle of many bioinformatics algorithms
           it’s fun!
      Scala
           seamless integration with other JVM-based frameworks
           like BioJava or bioscala
           combines functional and object-oriented paradigms
           suitable for concurrent algorithms (e.g. AKKA)
           built-in support for domain specific languages etc.
                                                                      Hochschule
                                                                      Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012                    3
Modules, Packages and Core-Classes

  Maven
subproject




                                                     Hochschule
                                                     Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012   4
Prerequisites / Installation

      Little dependencies
           Java 5 + Scala 2.9 for all modules
           Apache commons-math library (*.stat.Frequency
           used just one time) in module core
           Java Universal Network/Graph Framework (JUNG) in
           module demo (RNA viewer)
           BioJava 3 in module bioscala
      Right now no real installation procedure for users
      For developers: Maven 2
                                                                  Hochschule
                                                                  Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012                5
SCABIO’s dynamic programming (DP) package

      Algorithm paradigm for                       A DP-based algorithm
       optimization (min/max)                        requires the specification of
         Bellman’s Principle of Optimality
                                                     the following information for
                                                     each state
        “If I have an already optimal
        solution at step n - 1 and I choose             What decisions can result in
        a decision for step n which is                  being in this state?
        optimal relative to n - 1 then I get
        the optimal solution for step n.”               What are the previous
                                                        states depending on a
      (intermediate) results of the                    decision?
       steps are stored in a table                      What is the value (cost) to
      Solution (=decisions) can be                     get to this state when a
       derived from the table                           specific decision was made?
                                                                                    Hochschule
                                                                                    Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012                                  6
Example: Pairwise alignment
    (global, semi-global, local)
                                                  class Alignment(val s1: String, val s2: String, val mode: AlignmentMode,
                                                  ... extends DynPro[AlignmentStep] ... {

        Table size n              m              def n = s1.length + 1
                                                  def m = s2.length + 1                                      Just to give an
sim = 0.0                                                                                                   idea of the LOCs
                                                  def decisions(idx: Idx) = {
ACG-T.                                               if (idx.i == 0 && idx.j == 0) Array(NOTHING)
||| 
    |
ACGATC
         decision(idx)                               else if (idx.i == 0) Array(INSERT)
                                                     else if (idx.j == 0) Array(DELETE)
                                                     else Array(INSERT, DELETE, BOTH)
     | 0        1      2    3    4     5     6     }
     | -        A      C    G    A     T     C
-   -|----    ----   ---- ---- ---- ----- -----    def prevStates(idx: Idx, d: AlignmentStep) = d match {
0        previousStates(idx, d)
    -| 0.*   -2.    -4. -6. -8. -10. -12.           case NOTHING => Array()
1   A|-2.      1.*   -1. -3. -5.     -7.   -9.       case _ => Array(idx + deltaIdx(d))
2   C|-4.     -1.     2.* 0. -2.     -4.   -6.     }
3   G|-6.     -3.     0.   3.* 1.* -1.     -3.
4   T|-8.     -5.    -2.   1.   2.    2.*   0.*     def value(idx: Idx, d: AlignmentStep) = d match {
                                                      case NOTHING => 0
     
BBBIBI
         value(idx, d)                                case DELETE => if (idx.j == 0 && allowLeftGapsString2) 0 else values(d)
                        Decisions for                 case INSERT => if (idx.i == 0 && allowLeftGapsString1) 0 else values(d)
                       optimal solution               case _ => score(s1(idx.i - 1), s2(idx.j - 1))
                                                    }
                     (B = both, I =Insert)
                                                  ...                                                                           Hochschule
                                                                                                                                Mannheim
                                                  }

Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012                                                                         7
Example: Viterbi-algorithm for pattern
  recognition (e.g. CpG islands)
                                                 class Viterbi(val s: Array[Char], val alphabet: Array[Char],
                                                  val states: Array[Char], val transP: Array[Array[Double]],
                                                  val emmP: Array[Array[Double]]) extends DynPro[Int]
                                                  with Backpropagation[Int] with MatrixPrinter[Int] {
      Table size n        m
                                                     def n = s.length + 1
                                                     def m = states.length

                                                     def decisions(idx: Idx) = {
      decision(idx)                                   if (idx.i == 0) (0 to 0).toArray
                                                       else (1 to states.length).toArray
                                                     }

                                                     def prevStates(idx: Idx, d: Int) =
                                                      if (idx.i > 0) Array(Idx(idx.i - 1, d - 1)) else Array()
      previousStates(idx, d)
                                                     def value(idx: Idx, dState: Int) = (idx.i, idx.j) match {
                                                       case (0, jState) => log(transP(0)(jState + 1))
                                                       case (iChar, jState) => {
                                                         val idxS = alphabet.indexOf(s(iChar - 1))
                                                         val e = emmP(jState + 1 - 1)(idxS) // state, char
      value(idx, d)                                     val t = transP(dState)(jState + 1)
                                                         log(e * t)
                                                       }
                                                     }
                                                     ...                                                             Hochschule
                                                                                                                     Mannheim
                                                 }

Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012                                                                   8
Example: Nussinov-algorithm for secondary
  RNA-structures
  class NussinovEnergy(val s: String)                                  def prevStates(idx: Idx, d: NussinovDecision) = d.move match {
   extends DynPro[NussinovDecision] {                                    case EXTEND_BEGINNING => Array(Idx(idx.i + 1, idx.j))
                                                                         case EXTEND_END => Array(Idx(idx.i, idx.j - 1))
   def n = s.length                             Again: Just to           case PAIR => Array(Idx(idx.i + 1, idx.j - 1))
   def m = n                                   give an idea of           case SPLIT => {
                                                                           Array(Idx(idx.i, d.k), Idx(d.k + 1, idx.j))
   def decisions(idx: Idx) = {
                                                  the LOCs               }
     if (idx.j - idx.i < 1) Array()                                    }
     else {
       val splitList = if (idx.j - idx.i > 1) {                        def value(idx: Idx, d: NussinovDecision) = d.move match {
         val h = for (k <- (idx.i + 1) to (idx.j - 1)) yield {           case PAIR =>
           new NussinovDecision(SPLIT, idx, k, ' ', ' ')                  val e = (s(idx.i), s(idx.j)) match {
         }                                                                  case ('A', 'U') => -2.0
         h.toList                                                           case ('U', 'A') => -2.0
       } else Nil                                                           case ('C', 'G') => -3.0
       val otherList = for (state <- NussinovState.values;                  case ('G', 'C') => -3.0
                        if (state != SPLIT)) yield {                        case ('G', 'U') => -1.0
         new NussinovDecision(state, idx, 0, s(idx.i), s(idx.j))            case ('U', 'G') => -1.0
       }                                                                    case _ => 0.0
       val u = splitList.toList ::: otherList.toList ::: Nil              }
       u.toArray                                                          if (idx.j - idx.i > hairPinLoopSize) e else 0
     }                                                                   case _ => 0
   }                                                                   }

                                                                       override def extremeFunction = MIN                               Hochschule

                                                                   }                                                                    Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012                                                                                      9
Example for BioJava 3 Integration
                                                 cast to BioJava Class
   object DNASeq {
     implicit def seq2string(seq: DNASequence) = seq.toString

       def main(args: Array[String]) {
         // AAI44185: Homo sapiens (human) NR1H4 protein
         // AAL57619: Gallus gallus (chicken) part. orphan nucl. receptor FXR
         val fxr1 = getEMBLBank("AAI44185").values.iterator.next
         val fxr2 = getEMBLBank("AAL57619").values.iterator.next
         val alignFxr = new Alignment(fxr1, fxr2, GLOBAL)            How can a
         println("sim = " + alignFxr.similarity)                        DNA-
         println(alignFxr.makeAlignmentString(alignFxr.solution))    Sequence
       }                                                             be a string?

       def getEMBLBank(accId: String) = {
         val url = new URL("http://www.ebi.ac.uk/ena/data/view/" +
                 accId + "&display=fasta")
         val ios = url.openStream()
         val map = FastaReaderHelper.readFastaDNASequence(ios)
         ios.close()
         map
       }                                                                       Hochschule
                                                                               Mannheim
   }
Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012                            10
Discussion & Outlook

      Source-code very concise
      How will SCABIO compete in real life
       bioinformatics applications?
      Future work
           Make project more open source
           More test cases
           More documentation
                manual in work (> 12 pages)
                scaladoc on-line
           Implement concurrent algorithms            Hochschule
                                                      Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012   11
Thank you! Questions?

      SCABIO
           https://github.com/markusgumbel/scalabioalg
           Apache License Version 2.0
      Contact
           m.gumbel@hs-mannheim.de
           http://www.mi.hs-mannheim.de/gumbel
      I am really looking forward for discussions
       with the BioJava, bioscala, ... folks
      There’s also a poster
                                                              Hochschule
                                                              Mannheim




Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012           12

Mais conteúdo relacionado

Mais procurados

Lecture11
Lecture11Lecture11
Lecture11
Bo Li
 
CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4 CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4
zukun
 
Declarative Name Binding and Scope Rules
Declarative Name Binding and Scope RulesDeclarative Name Binding and Scope Rules
Declarative Name Binding and Scope Rules
Eelco Visser
 
SEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODEL
SEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODELSEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODEL
SEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODEL
grssieee
 
slides for "Supervised Model Learning with Feature Grouping based on a Discre...
slides for "Supervised Model Learning with Feature Grouping based on a Discre...slides for "Supervised Model Learning with Feature Grouping based on a Discre...
slides for "Supervised Model Learning with Feature Grouping based on a Discre...
Kensuke Mitsuzawa
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
Jia-Bin Huang
 

Mais procurados (20)

Lecture11
Lecture11Lecture11
Lecture11
 
Intro to threp
Intro to threpIntro to threp
Intro to threp
 
06slide
06slide06slide
06slide
 
05slide
05slide05slide
05slide
 
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
 
CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4 CVPR2010: higher order models in computer vision: Part 4
CVPR2010: higher order models in computer vision: Part 4
 
Getting started with image processing using Matlab
Getting started with image processing using MatlabGetting started with image processing using Matlab
Getting started with image processing using Matlab
 
ICPR 2012
ICPR 2012ICPR 2012
ICPR 2012
 
Declarative Name Binding and Scope Rules
Declarative Name Binding and Scope RulesDeclarative Name Binding and Scope Rules
Declarative Name Binding and Scope Rules
 
Orsi Vldb11
Orsi Vldb11Orsi Vldb11
Orsi Vldb11
 
Getting Started With Scala
Getting Started With ScalaGetting Started With Scala
Getting Started With Scala
 
SEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODEL
SEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODELSEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODEL
SEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODEL
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 7
 
CS 354 Object Viewing and Representation
CS 354 Object Viewing and RepresentationCS 354 Object Viewing and Representation
CS 354 Object Viewing and Representation
 
CS 354 Acceleration Structures
CS 354 Acceleration StructuresCS 354 Acceleration Structures
CS 354 Acceleration Structures
 
Java cơ bản java co ban
Java cơ bản java co ban Java cơ bản java co ban
Java cơ bản java co ban
 
slides for "Supervised Model Learning with Feature Grouping based on a Discre...
slides for "Supervised Model Learning with Feature Grouping based on a Discre...slides for "Supervised Model Learning with Feature Grouping based on a Discre...
slides for "Supervised Model Learning with Feature Grouping based on a Discre...
 
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
Goodfellow, Bengio, Couville (2016) "Deep Learning", Chap. 6
 
Functional Programming in C++
Functional Programming in C++Functional Programming in C++
Functional Programming in C++
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
 

Semelhante a M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala

Mahout scala and spark bindings
Mahout scala and spark bindingsMahout scala and spark bindings
Mahout scala and spark bindings
Dmitriy Lyubimov
 
Stockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS RasterStockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS Raster
ACSG Section Montréal
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical models
butest
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
gothicane
 

Semelhante a M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala (20)

Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Graph convolutional networks in apache spark
Graph convolutional networks in apache sparkGraph convolutional networks in apache spark
Graph convolutional networks in apache spark
 
Mining Functional Patterns
Mining Functional PatternsMining Functional Patterns
Mining Functional Patterns
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel Processing
 
Koalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkKoalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache Spark
 
Dask glm-scipy2017-final
Dask glm-scipy2017-finalDask glm-scipy2017-final
Dask glm-scipy2017-final
 
Mahout scala and spark bindings
Mahout scala and spark bindingsMahout scala and spark bindings
Mahout scala and spark bindings
 
Introduction to OpenCV
Introduction to OpenCVIntroduction to OpenCV
Introduction to OpenCV
 
Stockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS RasterStockage, manipulation et analyse de données matricielles avec PostGIS Raster
Stockage, manipulation et analyse de données matricielles avec PostGIS Raster
 
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16)
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical models
 
Scala in Places API
Scala in Places APIScala in Places API
Scala in Places API
 
Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011Liszt los alamos national laboratory Aug 2011
Liszt los alamos national laboratory Aug 2011
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Language
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Arvindsujeeth scaladays12
Arvindsujeeth scaladays12Arvindsujeeth scaladays12
Arvindsujeeth scaladays12
 
Task based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationTask based Programming with OmpSs and its Application
Task based Programming with OmpSs and its Application
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
 

Mais de Jan Aerts

Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?
Jan Aerts
 
Visual Analytics talk at ISMB2013
Visual Analytics talk at ISMB2013Visual Analytics talk at ISMB2013
Visual Analytics talk at ISMB2013
Jan Aerts
 
Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)
Jan Aerts
 

Mais de Jan Aerts (20)

VIZBI 2014 - Visualizing Genomic Variation
VIZBI 2014 - Visualizing Genomic VariationVIZBI 2014 - Visualizing Genomic Variation
VIZBI 2014 - Visualizing Genomic Variation
 
Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?
 
Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?
 
Visual Analytics talk at ISMB2013
Visual Analytics talk at ISMB2013Visual Analytics talk at ISMB2013
Visual Analytics talk at ISMB2013
 
Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)
 
Humanizing Data Analysis
Humanizing Data AnalysisHumanizing Data Analysis
Humanizing Data Analysis
 
Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
 
L Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformaticsL Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformatics
 
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...
 
S Cain - GMOD in the cloud
S Cain - GMOD in the cloudS Cain - GMOD in the cloud
S Cain - GMOD in the cloud
 
B Temperton - The Bioinformatics Testing Consortium
B Temperton - The Bioinformatics Testing ConsortiumB Temperton - The Bioinformatics Testing Consortium
B Temperton - The Bioinformatics Testing Consortium
 
J Goecks - The Galaxy Visual Analysis Framework
J Goecks - The Galaxy Visual Analysis FrameworkJ Goecks - The Galaxy Visual Analysis Framework
J Goecks - The Galaxy Visual Analysis Framework
 
S Cain - GMOD in the cloud
S Cain - GMOD in the cloudS Cain - GMOD in the cloud
S Cain - GMOD in the cloud
 
B Chapman - Toolkit for variation comparison and analysis
B Chapman - Toolkit for variation comparison and analysisB Chapman - Toolkit for variation comparison and analysis
B Chapman - Toolkit for variation comparison and analysis
 
P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...
P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...
P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...
 
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
 
S Cheng - eagle-i: development and expansion of a scientific resource discove...
S Cheng - eagle-i: development and expansion of a scientific resource discove...S Cheng - eagle-i: development and expansion of a scientific resource discove...
S Cheng - eagle-i: development and expansion of a scientific resource discove...
 
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
 
A Kalderimis - InterMine: Embeddable datamining components
A Kalderimis - InterMine: Embeddable datamining componentsA Kalderimis - InterMine: Embeddable datamining components
A Kalderimis - InterMine: Embeddable datamining components
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutes
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

M Gumbel - SCABIO: a framework for bioinformatics algorithms in Scala

  • 1. SCABIO – a framework for bioinformatics algorithms in Scala Markus Gumbel Institute for Medical Informatics Mannheim University of Applied Sciences / Germany Hochschule Mannheim
  • 2. SCABIO (Scala algorithms for bioinformatics) is...  a framework for bioinformatics algorithms written in Scala  used primarily for education (since 2010) introduced for a bioinformatics lecture & lab contains mainly standard algorithms  open source but not yet released (i.e. no announcements etc.) premiere today! https://github.com/markusgumbel/scalabioalg Apache License Version 2.0 Hochschule Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 2
  • 3. Why a new framework, why Scala?  For education generic dynamic programming package to demonstrate optimization principle of many bioinformatics algorithms it’s fun!  Scala seamless integration with other JVM-based frameworks like BioJava or bioscala combines functional and object-oriented paradigms suitable for concurrent algorithms (e.g. AKKA) built-in support for domain specific languages etc. Hochschule Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 3
  • 4. Modules, Packages and Core-Classes Maven subproject Hochschule Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 4
  • 5. Prerequisites / Installation  Little dependencies Java 5 + Scala 2.9 for all modules Apache commons-math library (*.stat.Frequency used just one time) in module core Java Universal Network/Graph Framework (JUNG) in module demo (RNA viewer) BioJava 3 in module bioscala  Right now no real installation procedure for users  For developers: Maven 2 Hochschule Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 5
  • 6. SCABIO’s dynamic programming (DP) package  Algorithm paradigm for  A DP-based algorithm optimization (min/max) requires the specification of Bellman’s Principle of Optimality the following information for each state “If I have an already optimal solution at step n - 1 and I choose What decisions can result in a decision for step n which is being in this state? optimal relative to n - 1 then I get the optimal solution for step n.” What are the previous states depending on a  (intermediate) results of the decision? steps are stored in a table What is the value (cost) to  Solution (=decisions) can be get to this state when a derived from the table specific decision was made? Hochschule Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 6
  • 7. Example: Pairwise alignment (global, semi-global, local) class Alignment(val s1: String, val s2: String, val mode: AlignmentMode, ... extends DynPro[AlignmentStep] ... {  Table size n m def n = s1.length + 1 def m = s2.length + 1 Just to give an sim = 0.0 idea of the LOCs def decisions(idx: Idx) = { ACG-T. if (idx.i == 0 && idx.j == 0) Array(NOTHING) |||  | ACGATC decision(idx) else if (idx.i == 0) Array(INSERT) else if (idx.j == 0) Array(DELETE) else Array(INSERT, DELETE, BOTH) | 0 1 2 3 4 5 6 } | - A C G A T C - -|---- ---- ---- ---- ---- ----- ----- def prevStates(idx: Idx, d: AlignmentStep) = d match { 0 previousStates(idx, d) -| 0.* -2. -4. -6. -8. -10. -12. case NOTHING => Array() 1 A|-2. 1.* -1. -3. -5. -7. -9. case _ => Array(idx + deltaIdx(d)) 2 C|-4. -1. 2.* 0. -2. -4. -6. } 3 G|-6. -3. 0. 3.* 1.* -1. -3. 4 T|-8. -5. -2. 1. 2. 2.* 0.* def value(idx: Idx, d: AlignmentStep) = d match { case NOTHING => 0  BBBIBI value(idx, d) case DELETE => if (idx.j == 0 && allowLeftGapsString2) 0 else values(d) Decisions for case INSERT => if (idx.i == 0 && allowLeftGapsString1) 0 else values(d) optimal solution case _ => score(s1(idx.i - 1), s2(idx.j - 1)) } (B = both, I =Insert) ... Hochschule Mannheim } Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 7
  • 8. Example: Viterbi-algorithm for pattern recognition (e.g. CpG islands) class Viterbi(val s: Array[Char], val alphabet: Array[Char], val states: Array[Char], val transP: Array[Array[Double]], val emmP: Array[Array[Double]]) extends DynPro[Int] with Backpropagation[Int] with MatrixPrinter[Int] {  Table size n m def n = s.length + 1 def m = states.length def decisions(idx: Idx) = {  decision(idx) if (idx.i == 0) (0 to 0).toArray else (1 to states.length).toArray } def prevStates(idx: Idx, d: Int) = if (idx.i > 0) Array(Idx(idx.i - 1, d - 1)) else Array()  previousStates(idx, d) def value(idx: Idx, dState: Int) = (idx.i, idx.j) match { case (0, jState) => log(transP(0)(jState + 1)) case (iChar, jState) => { val idxS = alphabet.indexOf(s(iChar - 1)) val e = emmP(jState + 1 - 1)(idxS) // state, char  value(idx, d) val t = transP(dState)(jState + 1) log(e * t) } } ... Hochschule Mannheim } Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 8
  • 9. Example: Nussinov-algorithm for secondary RNA-structures class NussinovEnergy(val s: String) def prevStates(idx: Idx, d: NussinovDecision) = d.move match { extends DynPro[NussinovDecision] { case EXTEND_BEGINNING => Array(Idx(idx.i + 1, idx.j)) case EXTEND_END => Array(Idx(idx.i, idx.j - 1)) def n = s.length Again: Just to case PAIR => Array(Idx(idx.i + 1, idx.j - 1)) def m = n give an idea of case SPLIT => { Array(Idx(idx.i, d.k), Idx(d.k + 1, idx.j)) def decisions(idx: Idx) = { the LOCs } if (idx.j - idx.i < 1) Array() } else { val splitList = if (idx.j - idx.i > 1) { def value(idx: Idx, d: NussinovDecision) = d.move match { val h = for (k <- (idx.i + 1) to (idx.j - 1)) yield { case PAIR => new NussinovDecision(SPLIT, idx, k, ' ', ' ') val e = (s(idx.i), s(idx.j)) match { } case ('A', 'U') => -2.0 h.toList case ('U', 'A') => -2.0 } else Nil case ('C', 'G') => -3.0 val otherList = for (state <- NussinovState.values; case ('G', 'C') => -3.0 if (state != SPLIT)) yield { case ('G', 'U') => -1.0 new NussinovDecision(state, idx, 0, s(idx.i), s(idx.j)) case ('U', 'G') => -1.0 } case _ => 0.0 val u = splitList.toList ::: otherList.toList ::: Nil } u.toArray if (idx.j - idx.i > hairPinLoopSize) e else 0 } case _ => 0 } } override def extremeFunction = MIN Hochschule } Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 9
  • 10. Example for BioJava 3 Integration cast to BioJava Class object DNASeq { implicit def seq2string(seq: DNASequence) = seq.toString def main(args: Array[String]) { // AAI44185: Homo sapiens (human) NR1H4 protein // AAL57619: Gallus gallus (chicken) part. orphan nucl. receptor FXR val fxr1 = getEMBLBank("AAI44185").values.iterator.next val fxr2 = getEMBLBank("AAL57619").values.iterator.next val alignFxr = new Alignment(fxr1, fxr2, GLOBAL) How can a println("sim = " + alignFxr.similarity) DNA- println(alignFxr.makeAlignmentString(alignFxr.solution)) Sequence } be a string? def getEMBLBank(accId: String) = { val url = new URL("http://www.ebi.ac.uk/ena/data/view/" + accId + "&display=fasta") val ios = url.openStream() val map = FastaReaderHelper.readFastaDNASequence(ios) ios.close() map } Hochschule Mannheim } Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 10
  • 11. Discussion & Outlook  Source-code very concise  How will SCABIO compete in real life bioinformatics applications?  Future work Make project more open source More test cases More documentation  manual in work (> 12 pages)  scaladoc on-line Implement concurrent algorithms Hochschule Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 11
  • 12. Thank you! Questions?  SCABIO https://github.com/markusgumbel/scalabioalg Apache License Version 2.0  Contact m.gumbel@hs-mannheim.de http://www.mi.hs-mannheim.de/gumbel  I am really looking forward for discussions with the BioJava, bioscala, ... folks  There’s also a poster Hochschule Mannheim Markus Gumbel - SCABIO - BOSC '12 - 14.07.2012 12