BioPerl Update 2010:
Towards a Modern BioPerl
Chris Fields (UIUC)
BOSC 7-10-10
Present Day BioPerl



✤   Addressing new bioinformatics problems

✤   Collaborations in Open Bioinformatics Foundation

✤   Google Summer of Code
Towards a Modern BioPerl



✤   Lowering the barrier for new users to become involved

✤   Using Modern Perl language features

✤   Dealing with the BioPerl monolith
BioPerl 2.0?



✤   BioPerl and Modern Perl OOP (Moose)

✤   BioPerl and Perl 6
Background

✤   Started in 1996, many contributors over the years
    ✤   Jason Stajich (UCR)               ✤   Ian Korf (Wash U)

    ✤   Hilmar Lapp (NESCent)             ✤   Chris Mungall (NCBO)

    ✤   Heikki Lehväslaiho (KAUST)        ✤   Brian Osborne (BioTeam)

    ✤   Georg Fuellen (Bielefeld)         ✤   Steve Trutane (Stanford)

    ✤   Ewan Birney (Sanger, EBI)         ✤   Sendu Bala (Sanger)

    ✤   Aaron Mackey (Univ. Virginia)     ✤   Dave Messina (Sonnhammer Lab)

    ✤   Chris Dagdigian (BioTeam)         ✤   Mark Jensen (TCGA)

    ✤   Steven Brenner (UC-Berkeley)      ✤   Rob Buels (SGN)

    ✤   Lincoln Stein (OICR, CSHL)        ✤   Many, many more!
Background


✤   Open source: ‘Released under the same license as Perl itself’, i.e.
    the Artistic License or the GPL

✤   http://bioperl.org

✤   Core developers - make releases, drive the project, set vision

✤   Regular contributors - have direct commit access
BioPerl Distributions



✤   BioPerl Core - the main distribution (aka ‘bioperl-live’ if using dev
    version)

✤   BioPerl-Run - Perl ‘wrappers’ for common bioinformatics tools

✤   BioPerl-DB - BioSQL ORM to BioPerl classes
Biological Sequences
✤   Bio::Seq - sequence record class
         #!/usr/bin/perl -w

         use Modern::Perl;
         use Bio::Seq;

         my $seq_obj = Bio::Seq->new(-seq             =>   "aaaatgggggggggggccccgtt",
                                     -display_id      =>   "ABC12345",
                                     -desc            =>   "example 1",
                                     -alphabet        =>   "dna");

         say $seq_obj->display_id;   # ABC12345
         say $seq_obj->desc;         # example 1
         say $seq_obj->seq;          # aaaatgggggggggggccccgtt

         my $revcom = $seq_obj->revcom; # new Bio::Seq, but revcom
         say $revcom->seq;          # aacggggcccccccccccatttt
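
What revcom computes can be sketched in plain Perl without BioPerl. The helper name revcom_string is hypothetical and is not part of BioPerl; it assumes only the unambiguous bases A, C, G and T, whereas Bio::Seq handles more than that.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): reverse-complement a DNA string.
# Assumes only the unambiguous bases A, C, G, T in either case.
sub revcom_string {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;      # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;      # complement each base
    return $rc;
}

print revcom_string("aaaatg"), "\n";   # catttt
```

Applying the helper twice returns the original string, which is a quick sanity check for any reverse-complement routine.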
Sequence I/O
✤   Bio::SeqIO - sequence I/O stream classes (pluggable)
                 #!/usr/bin/perl -w

                 use Modern::Perl;
                 use Bio::SeqIO;

                 my ($infile, $outfile) = @ARGV;

                 my $in = Bio::SeqIO->new(-file => $infile,
                                          -format => 'genbank');
                 my $out = Bio::SeqIO->new(-file => ">$outfile",
                                          -format => 'fasta');

                 while (my $seq_obj = $in->next_seq) {
                     say $seq_obj->display_id;
                     $out->write_seq($seq_obj);
                 }
Sequence Features

✤   Bio::SeqFeature::Generic - generic SF implementation
use Modern::Perl;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-file => shift,
                         -format => 'genbank');

while (my $seq_obj = $in->next_seq) {
    for my $feat_obj ($seq_obj->get_SeqFeatures) {
        say "Primary tag: ".$feat_obj->primary_tag;
        say "Location: ".$feat_obj->location->to_FTstring;
        for my $tag ($feat_obj->get_all_tags) {
            say " tag: $tag";
            for my $value ($feat_obj->get_tag_values($tag)) {
                say "    value: $value";
            }
        }
    }
}

GenBank File
     source            1..2629
                       /organism="Enterococcus faecalis OG1RF"
                       /mol_type="genomic DNA"
                       /strain="OG1RF"
                       /db_xref="taxon:474186"
     gene              25..>2629
                       /gene="pyr operon"
                       /note="pyrimidine biosynthetic operon"

Output
     Primary tag: source
     Location: 1..2629
      tag: db_xref
         value: taxon:474186
      tag: mol_type
         value: genomic DNA
      tag: organism
         value: Enterococcus faecalis OG1RF
      tag: strain
         value: OG1RF
Report Parsing
     Query= gi|1786183|gb|AAC73113.1| (AE000111) aspartokinase I,
     homoserine dehydrogenase I [Escherichia coli]
              (820 letters)

     Database: ecoli.aa
                4289 sequences; 1,358,990 total letters

     Searching..................................................done

                                                                            Score       E
     Sequences producing significant alignments:                            (bits)    Value

     gb|AAC73113.1|   (AE000111)   aspartokinase I, homoserine dehydrogen...   1567   0.0
     gb|AAC76922.1|   (AE000468)   aspartokinase II and homoserine dehydr...    332   1e-91
     gb|AAC76994.1|   (AE000475)   aspartokinase III, lysine sensitive [E...    184   3e-47
     gb|AAC73282.1|   (AE000126)   uridylate kinase [Escherichia coli]           42   3e-04

     >gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I [Escherichia
                coli]
               Length = 820

      Score = 1567 bits (4058), Expect = 0.0
      Identities = 806/820 (98%), Positives = 806/820 (98%)

     Query: 1   MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA 60
                MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA
     Sbjct: 1   MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA 60
Report Parsing
✤   Bio::SearchIO
#!/usr/bin/perl -w

use Modern::Perl;
use Bio::SearchIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => 'ecoli.bls');

while( my $result = $in->next_result ) {
  while( my $hit = $result->next_hit ) {
    while( my $hsp = $hit->next_hsp ) {
      say "Query=".$result->query_name;
      say " Hit=".$hit->name;
      say " Length=".$hsp->length('total');
      say " Percent_id=".$hsp->percent_identity."\n";
    }
  }
}

Output
     Query=gi|1786183|gb|AAC73113.1|
      Hit=gb|AAC73113.1|
      Length=820
      Percent_id=98.2926829268293

     Query=gi|1786183|gb|AAC73113.1|
      Hit=gb|AAC76922.1|
      Length=821
      Percent_id=29.5980511571255

     Query=gi|1786183|gb|AAC73113.1|
      Hit=gb|AAC76994.1|
      Length=471
      Percent_id=30.1486199575372

     Query=gi|1786183|gb|AAC73113.1|
      Hit=gb|AAC73282.1|
      Length=97
      Percent_id=28.8659793814433
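
The Percent_id figure for the first hit can be tied back to the ‘Identities = 806/820’ line of the BLAST report. The sketch below assumes percent identity is identical positions divided by alignment length, times 100, which is consistent with the numbers shown; the helper name percent_identity_from_counts is hypothetical and is not a Bio::SearchIO method.

```perl
use strict;
use warnings;

# Hypothetical helper: percent identity from the two numbers on an
# "Identities = X/Y" line (assumption: identical / aligned length * 100).
sub percent_identity_from_counts {
    my ($identical, $aligned_length) = @_;
    return 100 * $identical / $aligned_length;
}

# Values from the first hit in the report: Identities = 806/820
print "Percent_id=", percent_identity_from_counts(806, 820), "\n";
```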
Local/Remote Database Interfaces

✤   Bio::DB::GenBank

              #!/usr/bin/perl -w

              use Modern::Perl;
              use Bio::DB::GenBank;

              my $db_obj = Bio::DB::GenBank->new;    # query NCBI nuc db

              my $seq_obj = $db_obj->get_Seq_by_acc('A00002');

              say $seq_obj->display_id;   # A00002
              say $seq_obj->length();     # 194




✤   Also EntrezGene, GenPept, RefSeq, UniProt, EBI, etc.
And Lots More!

✤   Bio::Align/IO            ✤   Bio::Map/IO

✤   Bio::Assembly/IO         ✤   Bio::Restriction/IO

✤   Bio::Tree/IO             ✤   Bio::Structure/IO

✤   Local flatfile databases   ✤   Bio::Factory

✤   Bio::Graphics            ✤   Bio::Tools::Run (catch-all namespace)

✤   SeqFeature databases     ✤   Bio::Factory (create objects)

✤   Bio::Pedigree/IO         ✤   Bio::Range/Location

✤   Bio::Coordinate/IO
Current Development
Next-Gen Sequence



✤   Second-generation/next-generation sequencing

    ✤   This is Lincoln Stein

    ✤   There is a reason he is smiling...
Next-Gen Sequence

✤   Bio-SamTools - support for SAM and BAM data (via SamTools)

✤   Bio-BigFile - support for BigWig/BigBed (via Jim Kent’s UCSC tools)

    ✤   Separate CPAN distributions

✤   GBrowse (Lincoln’s talk this afternoon), BioPerl

    ✤   Via SeqFeatures (high-level API for both modules)

    ✤   Via Bio::Assembly and BioPerl-Run (using the above modules)
Data Courtesy R. Khetani, M. Hudson, G. Robinson
New Tools/Wrappers

✤   BowTie            ✤   Infernal v.1.0
✤   BWA               ✤   NCBI eUtils (SOAP, CGI-based)
✤   MAQ               ✤   TopHat/CuffLinks (upcoming)
✤   BEDTools (beta)   ✤   The Cloud - bioperl-max
✤   SAMTools
✤   HMMER3
✤   BLAST+
✤   PAML

Mark Jensen, Thomas Sharpton, Dave Messina, Kai Blin, Dan Kortschak
Collaborations

  Peter J. A. Cock, Christopher J. Fields, Naohisa Goto, Michael L. Heuer and
  Peter M. Rice. The Sanger FASTQ file format for sequences with quality
  scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research,
  2010, Vol. 38, No. 6, 1767–1771. doi:10.1093/nar/gkp1137
  (Published online 16 December 2009)
The Google Summer of Code



✤   O|B|F was accepted this year for the first time

✤   Headed by Rob Buels (SGN), with some help from Hilmar Lapp and
    myself

✤   Six projects, covering BioPerl, BioJava, Biopython, BioRuby
The Google Summer of Code

✤   BioPerl has actually been part of the Google Summer of Code for the
    last three years (as have many other Bio*):

    ✤   NESCent - admin: H. Lapp:

        ✤   2008 - PhyloXML parsing (student: Mira Han)

        ✤   2009 - NeXML parsing (student: Chase Miller)

    ✤   O|B|F - admin: R. Buels:

        ✤   2010 - Alignment subsystem refactoring (student: Jun Yin)
GSoC - Alignment Subsystem

✤   Clean up current code

✤   Include capability of dealing with large datasets

✤   Target next-gen data, very large alignments?

    ✤   Abstract the backend (DB, memory, etc.)

    ✤   SAM/BAM may work (via Bio::DB::SAM)

    ✤   ...but what about protein sequences?
Towards a Modern BioPerl
Towards a Modern BioPerl


✤   BioPerl will be turning 15 soon

✤   What can we improve?

✤   What can we do with the current code?

✤   Maybe some of it can be used in a BioPerl 2.0?

✤   Or a BioPerl 6?
What We Can Do Now



✤   Lower the barrier

✤   Use Modern Perl

✤   Deal with the monolith
Lower the Barrier

✤   We have already started on this - May 2010

✤   Migrate source code repository to git and GitHub

✤   Original BioPerl developers are added as collaborators on GitHub...

    ✤   ...but now anyone can ‘fork’ BioPerl, make changes, submit
        ‘pull requests’, etc.

✤   Since May, we have had many forks and pull requests with code reviews
    (so a decent success)
Using Modern Perl

✤   Minimal version of Perl required for BioPerl is v5.6.1

✤   Even v5.8.1 is considered quite old

✤   Both the 5.6.x and 5.8.x releases are EOL (as of Dec. 2008)
Using Modern Perl

say

print "I like newlines\n";

say "I like newlines";

defined-or

# also replaces values that are defined but false (0, '')
$foo ||= 'default';

if (!defined($foo)) {
    $foo = 'default'
}

$foo //= 'default';

yada yada

sub implement_me {
    shift->throw_not_implemented
}

sub implement_me { ... }     # yada yada
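
The difference between the two defaulting idioms shows up when the value is defined but false. A minimal sketch, with hypothetical helper names, for Perl 5.10 or later:

```perl
use strict;
use warnings;
use feature 'say';   # 'say' needs Perl 5.10 or later, as does '//='

# Hypothetical helpers contrasting the two defaulting idioms.
sub default_with_or {
    my ($value) = @_;
    $value ||= 'default';   # replaces any false value, including 0 and ''
    return $value;
}

sub default_with_defined_or {
    my ($value) = @_;
    $value //= 'default';   # replaces only undef
    return $value;
}

say default_with_or(0);               # default
say default_with_defined_or(0);       # 0
say default_with_defined_or(undef);   # default
```

A legitimate value of 0, such as a strand or a score, survives only with the defined-or form.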
Using Modern Perl

Smart Match

if ($key ~~ %hash) { # like exists
    # do something
}

if ($foo ~~ /\d+/ ) { # like =~
    # do something
}

given/when

given ($foo) {
    when (%lookup) { ... }
    when (/^(\d+)/) { ... }
    when (/^[A-Za-z]+/) { ... }
    default { ... }
}
Dealing with the Monolith

✤   Release manager nightmares:

    ✤   Remote databases disappear (XEMBL)

    ✤   Others change service or URLs (SeqHound)

    ✤   Services become obsolete (Pise)

    ✤   Developers move on, disappear, modules bit-rot (not saying :)

✤   How do we solve this problem?
Dealing with the Monolith

                           Classes    Tests (Files)
    bioperl-live (Core)      874      23146 (341)
    bioperl-run              123*      2468 (80)
    bioperl-db                72        113 (16)
    bioperl-network            9        327 (9)

 * Had 285 more prior to Pise module removal!
Dealing with the Monolith


✤   Maybe we shouldn’t be friendly to the monolith

✤   Maybe we should ‘blow it up’

✤   (Of course, that means make the code modular)

✤   It was originally designed with that somewhat in mind (interfaces)
Dealing with the Monolith

✤   Separate distributions make it easier to submit fixes as needed

    ✤   However, separate distributions make developing a little trickier

✤   Can we create a distribution that resembles BioPerl as users know it?

✤   Is this something we should worry about?

    ✤   YES

    ✤   Don’t alienate end-users!
Towards BioPerl 2.0?



✤   Biome: BioPerl with Moose

✤   BioPerl6: self-explanatory
Biome

✤   BioPerl classes implemented in Moose

✤   GitHub: http://github.com/cjfields/biome

✤   Implemented: Ranges, Locations, simple PrimarySeq, Annotation,
    SeqFeatures, prototype SeqIO

✤   Interfaces converted to Moose Roles

✤   ‘Type’-checking used for data types
Role
package Biome::Role::Range;

use Biome::Role;
use Biome::Types qw(SequenceStrand);

requires 'to_string';

# Attributes
has strand    =>   (
    isa       =>   SequenceStrand,
    is        =>   'rw',
    default   =>   0,
    coerce    =>   1
);

has start     => (
    is        => 'rw',
    isa       => 'Int',
);

has end       => (
    is        => 'rw',
    isa       => 'Int'
);

sub length {
    $_[0]->end - $_[0]->start + 1;
}

Class
package Biome::Range;

use Biome;

with 'Biome::Role::Range';

sub to_string {
    my ($self) = @_;
    return sprintf("(%s, %s) strand=%s",
                   $self->start,
                   $self->end,
                   $self->strand);
}
BioPerl 6


✤   BioPerl6: http://github.com/cjfields/bioperl6

✤   Little has been done beyond simple implementations

✤   Code is open to anyone for experimentation

✤   Ex: Philip Mabon donated a FASTA grammar:
Grammar (FASTA)
grammar Bio::Grammar::Fasta {
    token TOP {
        ^<fasta>+ $
    }
    token fasta {
        <description_line> <sequence>
    }
    token description_line {
        ^^\> <id> <.ws> <description> \n
    }
    token id {
        | <identifier>
        | <generic_id>
    }
    token identifier {
        \S+
    }
    token generic_id {
        \S+
    }
    token description {
        \N+
    }
    token sequence {
        <-[>]>+
    }
}

Actions (FASTA)
class Bio::Grammar::Actions::Fasta {
    method TOP($/){
        my @matches = gather for $/<fasta> -> $m {
            take $m.ast;
        };
        make @matches;
    }
    method fasta($/){
        my $id = $/<description_line>.ast<id>;
        my $desc = $/<description_line>.ast<description>;
        my $obj = Bio::PrimarySeq.new(
            display_id  => $id,
            description => $desc,
            seq         => $/<sequence>.ast);
        make $obj;
    }
    method description_line($/){
        make $/;
    }
    method id($/) {
        make $/;
    }
    method description($/){
        make $/;
    }
    method sequence($/){
        make (~$/).subst("\n", '', :g);
    }
}
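
For comparison, the same record structure can be sketched in Perl 5 with one regular expression: a ‘>’ line holding an id and a description, followed by sequence text that runs to the next ‘>’. The helper name parse_fasta and the hash keys are hypothetical; this is not how Bio::SeqIO parses FASTA, and unlike the grammar it accepts an empty description.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or BioPerl6): parse FASTA text
# into a list of hash references, one per record.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    # '>' id, optional horizontal whitespace, description to end of line,
    # then everything up to the next '>' as the raw sequence.
    while ($text =~ /^>(\S+)[^\S\n]*([^\n]*)\n([^>]*)/mg) {
        my ($id, $desc, $seq) = ($1, $2, $3);
        $seq =~ s/\s+//g;    # drop newlines, as the 'sequence' action does
        push @records, { display_id => $id, description => $desc, seq => $seq };
    }
    return @records;
}

my @seqs = parse_fasta(">seq1 example one\nACGT\nTTGA\n>seq2 example two\nGGCC\n");
printf "%s [%s] %s\n", @{$_}{qw(display_id description seq)} for @seqs;
```

The grammar/actions split in the Perl 6 version separates recognising the text from building objects; the Perl 5 sketch folds both into one loop.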
Grammar (FASTA)                                    Actions (FASTA)
grammar Bio::Grammar::Fasta {              class Bio::Grammar::Actions::Fasta {
     token TOP {                               method TOP($/){
        ^<fasta>+ $                                my @matches = gather for $/<fasta> -> $m {
                                                       take $m.ast;
    }                                              };
    token fasta {
        <description_line> <sequence>              make @matches;
    }                                          }
                                               method fasta($/){
    token description_line    {                    my $id =$/<description_line>.ast<id>;
        ^^> <id> <.ws> <description> n           my $desc = $/<description_line>.ast<description>;
    }                                              my $obj = Bio::PrimarySeq.new(
    token id           {                               display_id => $id,
        | <identifier>                                 description => $desc,
        | <generic_id>                                 seq         => $/<sequence>.ast);
    }                                              make $obj;
    token identifier   {                       }
        S+                                    method description_line($/){
    }                                              make $/;
    token generic_id {                         }
        S+                                    method id($/) {
    }                                              make $/;
                                               }
    token description   {                      method description($/){
        N+                                        make $/;
    }                                          }
    token sequence      {                      method sequence($/){
        <-[>]>+                                    make (~$/).subst("n", '', :g);
    }                                          }
}                                          }
Grammar (FASTA)                                    Actions (FASTA)
grammar Bio::Grammar::Fasta {              class Bio::Grammar::Actions::Fasta {
     token TOP {                               method TOP($/){
        ^<fasta>+ $                                my @matches = gather for $/<fasta> -> $m {
                                                       take $m.ast;
    }                                              };
    token fasta {
        <description_line> <sequence>              make @matches;
    }                                          }
                                               method fasta($/){
    token description_line    {                    my $id =$/<description_line>.ast<id>;
        ^^> <id> <.ws> <description> n           my $desc = $/<description_line>.ast<description>;
    }                                              my $obj = Bio::PrimarySeq.new(
    token id           {                               display_id => $id,
        | <identifier>                                 description => $desc,
        | <generic_id>                                 seq         => $/<sequence>.ast);
    }                                              make $obj;
    token identifier   {                       }
        S+                                    method description_line($/){
    }                                              make $/;
    token generic_id {                         }
        S+                                    method id($/) {
    }                                              make $/;
                                               }
    token description   {                      method description($/){
        N+                                        make $/;
    }                                          }
    token sequence      {                      method sequence($/){
        <-[>]>+                                    make (~$/).subst("n", '', :g);
    }                                          }
}                                          }
Grammar (FASTA)                                    Actions (FASTA)
grammar Bio::Grammar::Fasta {              class Bio::Grammar::Actions::Fasta {
     token TOP {                               method TOP($/){
        ^<fasta>+ $                                my @matches = gather for $/<fasta> -> $m {
                                                       take $m.ast;
    }                                              };
    token fasta {
        <description_line> <sequence>              make @matches;
    }                                          }
                                               method fasta($/){
    token description_line    {                    my $id =$/<description_line>.ast<id>;
        ^^> <id> <.ws> <description> n           my $desc = $/<description_line>.ast<description>;
    }                                              my $obj = Bio::PrimarySeq.new(
    token id           {                               display_id => $id,
        | <identifier>                                 description => $desc,
        | <generic_id>                                 seq         => $/<sequence>.ast);
    }                                              make $obj;
    token identifier   {                       }
        S+                                    method description_line($/){
    }                                              make $/;
    token generic_id {                         }
        S+                                    method id($/) {
    }                                              make $/;
                                               }
    token description   {                      method description($/){
        N+                                        make $/;
    }                                          }
    token sequence      {                      method sequence($/){
        <-[>]>+                                    make (~$/).subst("n", '', :g);
    }                                          }
}                                          }
Grammar (FASTA)                                    Actions (FASTA)
grammar Bio::Grammar::Fasta {              class Bio::Grammar::Actions::Fasta {
     token TOP {                               method TOP($/){
        ^<fasta>+ $                                my @matches = gather for $/<fasta> -> $m {
                                                       take $m.ast;
    }                                              };
    token fasta {
        <description_line> <sequence>              make @matches;
    }                                          }
                                               method fasta($/){
    token description_line    {                    my $id =$/<description_line>.ast<id>;
        ^^> <id> <.ws> <description> n           my $desc = $/<description_line>.ast<description>;
    }                                              my $obj = Bio::PrimarySeq.new(
    token id           {                               display_id => $id,
        | <identifier>                                 description => $desc,
        | <generic_id>                                 seq         => $/<sequence>.ast);
    }                                              make $obj;
    token identifier   {                       }
        S+                                    method description_line($/){
    }                                              make $/;
    token generic_id {                         }
        S+                                    method id($/) {
    }                                              make $/;
                                               }
    token description   {                      method description($/){
        N+                                        make $/;
    }                                          }
    token sequence      {                      method sequence($/){
        <-[>]>+                                    make (~$/).subst("n", '', :g);
    }                                          }
}                                          }
Acknowledgements


✤   All BioPerl developers

✤   Chris Dagdigian and Mauricio Herrera Cuadra (O|B|F gurus)

✤   Cross-Collaborative work: Peter Cock (Biopython), Pjotr Prins
    (BioLib, BioRuby), Naohisa Goto (BioRuby), Michael Heuer and
    Andreas Prlic (BioJava), Peter Rice (EMBOSS)

✤   Questions? Do we even have time?

Fields bosc2010 bio_perl

  • 1. BioPerl Update 2010: Towards a Modern BioPerl Chris Fields (UIUC) BOSC 7-10-10
  • 2. Present Day BioPerl ✤ Addressing new bioinformatics problems ✤ Collaborations in Open Bioinformatics Foundation ✤ Google Summer of Code
  • 3. Towards a Modern BioPerl ✤ Lowering the barrier for new users to become involved ✤ Using Modern Perl language features ✤ Dealing with the BioPerl monolith
  • 4. BioPerl 2.0? ✤ BioPerl and Modern Perl OOP (Moose) ✤ BioPerl and Perl 6
  • 5. Background ✤ Started in 1996, many contributors over the years ✤ Jason Stajich (UCR) ✤ Ian Korf (Wash U) ✤ Hilmar Lapp (NESCent) ✤ Chris Mungall (NCBO) ✤ Heikki Lehväslaiho (KAUST) ✤ Brian Osborne (BioTeam) ✤ Georg Fuellen (Bielefeld) ✤ Steve Trutane (Stanford) ✤ Ewan Birney (Sanger, EBI) ✤ Sendu Bala (Sanger) ✤ Aaron Mackey (Univ. Virginia) ✤ Dave Messina (Sonnhammer Lab) ✤ Chris Dagdigian (BioTeam) ✤ Mark Jensen (TCGA) ✤ Steven Brenner (UC-Berkeley) ✤ Rob Buels (SGN) ✤ Lincoln Stein (OICR, CSHL) ✤ Many, many more!
  • 6. Background ✤ Open source: ‘Released under the same license as Perl itself’ i.e. Artistic ✤ http://bioperl.org ✤ Core developers - make releases, drive the project, set vision ✤ Regular contributors - have direct commit access
  • 7. BioPerl Distributions ✤ BioPerl Core - the main distribution (aka ‘bioperl-live’ if using dev version) ✤ BioPerl-Run - Perl ‘wrappers’ for common bioinformatics tools ✤ BioPerl-DB - BioSQL ORM to BioPerl classes
  • 8. Biological Sequences ✤ Bio::Seq - sequence record class #!/bin/perl -w use Modern::Perl; use Bio::Seq; my $seq_obj = Bio::Seq->new(-seq => "aaaatgggggggggggccccgtt", -display_id => "ABC12345", -desc => "example 1", -alphabet => "dna"); say $seq_obj->display_id; # ABC12345 say $seq_obj->desc; # example 1 say $seq_obj->seq; # aaaatgggggggggggccccgtt my $revcom = $seq_obj->revcom; # new Bio::Seq, but revcom say $revcom->seq; # aacggggcccccccccccatttt
  • 9. Sequence I/O ✤ Bio::SeqIO - sequence I/O stream classes (pluggable) #!/usr/bin/perl -w use Modern::Perl; use Bio::SeqIO; my ($infile, $outfile) = @ARGV; my $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank'); my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta'); while (my $seq_obj = $in->next_seq) { say $seq_obj->display_id; $out->write_seq($seq_obj); }
  • 10. Sequence Features ✤ Bio::SeqFeature::Generic - generic SF implementation GenBank File use Modern::Perl; source 1..2629 use Bio::SeqIO; /organism="Enterococcus faecalis OG1RF" /mol_type="genomic DNA" my $in = Bio::SeqIO->new(-file => shift, /strain="OG1RF" -format => 'genbank'); /db_xref="taxon:474186" gene 25..>2629 while (my $seq_obj = $in->next_seq) { /gene="pyr operon" for my $feat_obj ($seq_obj->get_SeqFeatures) { /note="pyrimidine biosynthetic operon" say "Primary tag: ".$feat_obj->primary_tag; say "Location: ".$feat_obj->location->to_FTstring; Primary tag: source for my $tag ($feat_obj->get_all_tags) { Location: 1..2629 say " tag: $tag"; tag: db_xref for my $value ($feat_obj->get_tag_values($tag)) { value: taxon:474186 say " value: $value"; tag: mol_type } value: genomic DNA } tag: organism } value: Enterococcus faecalis OG1RF } tag: strain value: OG1RF
  • 13. Report Parsing
        Query= gi|1786183|gb|AAC73113.1| (AE000111) aspartokinase I,
        homoserine dehydrogenase I [Escherichia coli]
                 (820 letters)

        Database: ecoli.aa
                   4289 sequences; 1,358,990 total letters

        Searching..................................................done

                                                                             Score     E
        Sequences producing significant alignments:                         (bits)  Value

        gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogen...  1567   0.0
        gb|AAC76922.1| (AE000468) aspartokinase II and homoserine dehydr...   332   1e-91
        gb|AAC76994.1| (AE000475) aspartokinase III, lysine sensitive [E...   184   3e-47
        gb|AAC73282.1| (AE000126) uridylate kinase [Escherichia coli]          42   3e-04

        >gb|AAC73113.1| (AE000111) aspartokinase I, homoserine dehydrogenase I
                   [Escherichia coli]
                   Length = 820

         Score = 1567 bits (4058), Expect = 0.0
         Identities = 806/820 (98%), Positives = 806/820 (98%)

        Query: 1  MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA 60
                  MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA
        Sbjct: 1  MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA 60
  • 14. Report Parsing
    ✤ Bio::SearchIO
        #!/usr/bin/perl -w
        use Modern::Perl;
        use Bio::SearchIO;

        my $in = Bio::SearchIO->new(-format => 'blast',
                                    -file   => 'ecoli.bls');
        while( my $result = $in->next_result ) {
            while( my $hit = $result->next_hit ) {
                while( my $hsp = $hit->next_hsp ) {
                    say "Query=".$result->query_name;
                    say " Hit=".$hit->name;
                    say " Length=".$hsp->length('total');
                    say " Percent_id=".$hsp->percent_identity."\n";
                }
            }
        }
    Output
        Query=gi|1786183|gb|AAC73113.1|
         Hit=gb|AAC73113.1|
         Length=820
         Percent_id=98.2926829268293

        Query=gi|1786183|gb|AAC73113.1|
         Hit=gb|AAC76922.1|
         Length=821
         Percent_id=29.5980511571255

        Query=gi|1786183|gb|AAC73113.1|
         Hit=gb|AAC76994.1|
         Length=471
         Percent_id=30.1486199575372

        Query=gi|1786183|gb|AAC73113.1|
         Hit=gb|AAC73282.1|
         Length=97
         Percent_id=28.8659793814433
  • 15. Local/Remote Database Interfaces
    ✤ Bio::DB::GenBank
        #!/bin/perl -w
        use Modern::Perl;
        use Bio::DB::GenBank;

        my $db_obj = Bio::DB::GenBank->new;    # query NCBI nuc db
        my $seq_obj = $db_obj->get_Seq_by_acc('A00002');
        say $seq_obj->display_id;    # A00002
        say $seq_obj->length();      # 194
    ✤ Also EntrezGene, GenPept, RefSeq, UniProt, EBI, etc.
  • 16. And Lots More! ✤ Bio::Align/IO ✤ Bio::Map/IO ✤ Bio::Assembly/IO ✤ Bio::Restriction/IO ✤ Bio::Tree/IO ✤ Bio::Structure/IO ✤ Local flatfile databases ✤ Bio::Graphics ✤ Bio::Tools::Run (catch-all namespace) ✤ SeqFeature databases ✤ Bio::Factory (create objects) ✤ Bio::Pedigree/IO ✤ Bio::Range/Location ✤ Bio::Coordinate/IO
  • 18. Next-Gen Sequence ✤ Second-generation/next-generation sequencing ✤ This is Lincoln Stein ✤ There is a reason he is smiling...
  • 19. Next-Gen Sequence ✤ Bio-SamTools - support for SAM and BAM data (via SamTools) ✤ Bio-BigFile - support for BigWig/BigBed (via Jim Kent’s UCSC tools) ✤ Separate CPAN distributions ✤ GBrowse (Lincoln’s talk this afternoon), BioPerl ✤ Via SeqFeatures (high-level API for both modules) ✤ Via Bio::Assembly and BioPerl-Run (using the above modules)
  • 20. Data Courtesy R. Khetani, M. Hudson, G. Robinson
  • 21. New Tools/Wrappers ✤ BowTie ✤ BWA ✤ MAQ ✤ BEDTools (beta) ✤ SAMTools ✤ HMMER3 ✤ BLAST+ ✤ PAML ✤ Infernal v.1.0 ✤ NCBI eUtils (SOAP, CGI-based) ✤ TopHat/CuffLinks (upcoming) ✤ The Cloud - bioperl-max
    Mark Jensen, Thomas Sharpton, Dave Messina, Kai Blin, Dan Kortschak
  • 22. Collaborations
    Peter J. A. Cock, Christopher J. Fields, Naohisa Goto, Michael L. Heuer and Peter M. Rice. "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants." Survey and Summary, Nucleic Acids Research, 2010, Vol. 38, No. 6, 1767–1771. doi:10.1093/nar/gkp1137
    Affiliations: Plant Pathology, SCRI, Invergowrie, Dundee, UK; Institute for Genomic Biology, University of Illinois at Urbana-Champaign, USA; Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, Japan; Harbinger Partners, Inc., St. Paul, MN, USA; EMBL Outstation - Hinxton, European Bioinformatics Institute, Cambridge, UK
    Received October 13, 2009; Revised November 13, 2009; Accepted November 17, 2009; Published online 16 December 2009
  • 23. The Google Summer of Code ✤ O|B|F was accepted this year for the first time ✤ Headed by Rob Buels (SGN), with some help from Hilmar Lapp and myself ✤ Six projects, covering BioPerl, BioJava, Biopython, BioRuby
  • 24. The Google Summer of Code ✤ BioPerl has actually been part of the Google Summer of Code for the last three years (as have many other Bio*): ✤ NESCent - admin: H. Lapp: ✤ 2008 - PhyloXML parsing (student: Mira Han) ✤ 2009 - NeXML parsing (student: Chase Miller) ✤ O|B|F - admin: R. Buels: ✤ 2010 - Alignment subsystem refactoring (student: Jun Yin)
  • 25. GSoC - Alignment Subsystem ✤ Clean up current code ✤ Include capability of dealing with large datasets ✤ Target next-gen data, very large alignments? ✤ Abstract the backend (DB, memory, etc.) ✤ SAM/BAM may work (via Bio::DB::SAM) ✤ ...but what about protein sequences?
  • 26. Towards a Modern BioPerl
  • 27. Towards a Modern BioPerl ✤ BioPerl will be turning 15 soon ✤ What can we improve? ✤ What can we do with the current code? ✤ Maybe some that we can use in a BioPerl 2.0? ✤ Or a BioPerl 6?
  • 28. What We Can Do Now ✤ Lower the barrier ✤ Use Modern Perl ✤ Deal with the monolith
  • 29. Lower the Barrier ✤ We have already started on this - May 2010 ✤ Migrated source code repository to git and GitHub ✤ Original BioPerl developers are added as collaborators on GitHub... ✤ ...but now anyone can ‘fork’ BioPerl, make changes, submit ‘pull requests’, etc. ✤ Since May, we have had many forks and pull requests with code reviews (so a decent success)
  • 30. Using Modern Perl ✤ Minimal version of Perl required for BioPerl is v5.6.1 ✤ Even v5.8.1 is considered quite old ✤ Both the 5.6.x and 5.8.x releases are EOL (as of Dec. 2008)
  • 32. Using Modern Perl
    say
        print "I like newlines\n";
        say "I like newlines";
    defined-or
        # work only if false && defined
        $foo ||= 'default';
        if (!defined($foo)) {
            $foo = 'default'
        }
        $foo //= 'default';
    yada yada
        sub implement_me {
            shift->throw_not_implemented
        }
        sub implement_me { ... }    # yada yada
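The difference between the two default-assignment forms on this slide is easiest to see with a value that is defined but false. The following is a minimal sketch in core Perl (5.10 or later); the helper names `with_or` and `with_defined_or` are made up for illustration and are not part of BioPerl.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: fill in a default with ||= (the pre-5.10 idiom).
sub with_or {
    my ($value) = @_;
    $value ||= 'default';    # replaces ANY false value: undef, 0, '', '0'
    return $value;
}

# Hypothetical helper: fill in a default with //= (Perl 5.10+).
sub with_defined_or {
    my ($value) = @_;
    $value //= 'default';    # replaces only undef
    return $value;
}

# Compare both forms on an undefined value, two false-but-defined values,
# and an ordinary true value.
for my $input (undef, 0, '', 'ABC12345') {
    my $shown = defined $input ? "'$input'" : 'undef';
    say "$shown: ||= gives '", with_or($input),
        "', //= gives '", with_defined_or($input), "'";
}
```

Only `//=` leaves a legitimate `0` or empty string untouched, which matters whenever a false value is meaningful data rather than a missing argument.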
  • 33. Using Modern Perl
    Smart Match
        if ($key ~~ %hash) {     # like exists
            # do something
        }
        if ($foo ~~ /\d+/ ) {    # like =~
            # do something
        }
    given/when
        given ($foo) {
            when (%lookup) { ... }
            when (/^(\d+)/) { ... }
            when (/^[A-Za-z]+/) { ... }
            default { ... }
        }
  • 34. Dealing with the Monolith ✤ Release manager nightmares: ✤ Remote databases disappear (XEMBL) ✤ Others change service or URLs (SeqHound) ✤ Services become obsolete (Pise) ✤ Developers move on, disappear, modules bit-rot (not saying :) ✤ How do we solve this problem?
  • 35. Dealing with the Monolith
                               Classes   Tests (Files)
        bioperl-live (Core)    874       23146 (341)
        bioperl-run            123*      2468 (80)
        bioperl-db             72        113 (16)
        bioperl-network        9         327 (9)
        * Had 285 more prior to Pise module removal!
  • 36. Dealing with the Monolith ✤ Maybe we shouldn’t be friendly to the monolith ✤ Maybe we should ‘blow it up’ ✤ (Of course, that means make the code modular) ✤ It was originally designed with that somewhat in mind (interfaces)
  • 37. Dealing with the Monolith ✤ Separate distributions make it easier to submit fixes as needed ✤ However, separate distributions make developing a little trickier ✤ Can we create a distribution that resembles BioPerl as users know it? ✤ Is this something we should worry about? ✤ YES ✤ Don’t alienate end-users!
  • 38. Towards BioPerl 2.0? ✤ Biome: BioPerl with Moose ✤ BioPerl6: self-explanatory
  • 39. Biome ✤ BioPerl classes implemented in Moose ✤ GitHub: http://github.com/cjfields/biome ✤ Implemented: Ranges, Locations, simple PrimarySeq, Annotation, SeqFeatures, prototype SeqIO ✤ Interfaces converted to Moose Roles ✤ ‘Type’-checking used for data types
  • 40. Role
        package Biome::Role::Range;
        use Biome::Role;
        use Biome::Types qw(SequenceStrand);

        requires 'to_string';

        # Attributes
        has strand => (
            isa     => SequenceStrand,
            is      => 'rw',
            default => 0,
            coerce  => 1
        );
        has start => (
            is  => 'rw',
            isa => 'Int',
        );
        has end => (
            is  => 'rw',
            isa => 'Int'
        );

        sub length {
            $_[0]->end - $_[0]->start + 1;
        }
    Class
        package Biome::Range;
        use Biome;

        with 'Biome::Role::Range';

        sub to_string {
            my ($self) = @_;
            return sprintf("(%s, %s) strand=%s",
                           $self->start,
                           $self->end,
                           $self->strand);
        }
  • 41. BioPerl 6 ✤ BioPerl6: http://github.com/cjfields/bioperl6 ✤ Little has been done beyond simple implementations ✤ Code is open to anyone for experimentation ✤ Ex: Philip Mabon donated a FASTA grammar:
  • 42. Grammar (FASTA)
        grammar Bio::Grammar::Fasta {
            token TOP {
                ^<fasta>+ $
            }
            token fasta {
                <description_line> <sequence>
            }
            token description_line {
                ^^\> <id> <.ws> <description> \n
            }
            token id {
                | <identifier>
                | <generic_id>
            }
            token identifier {
                \S+
            }
            token generic_id {
                \S+
            }
            token description {
                \N+
            }
            token sequence {
                <-[>]>+
            }
        }
  • 43. Actions (FASTA)
        class Bio::Grammar::Actions::Fasta {
            method TOP($/){
                my @matches = gather for $/<fasta> -> $m {
                    take $m.ast;
                };
                make @matches;
            }
            method fasta($/){
                my $id = $/<description_line>.ast<id>;
                my $desc = $/<description_line>.ast<description>;
                my $obj = Bio::PrimarySeq.new(
                    display_id  => $id,
                    description => $desc,
                    seq         => $/<sequence>.ast);
                make $obj;
            }
            method description_line($/){
                make $/;
            }
            method id($/) {
                make $/;
            }
            method description($/){
                make $/;
            }
            method sequence($/){
                make (~$/).subst("\n", '', :g);
            }
        }
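For readers more at home in Perl 5, the record structure that the FASTA grammar and actions describe can be approximated with a few regular expressions. This is only a rough sketch in core Perl, not BioPerl's or BioPerl6's parser: `parse_fasta` is a hypothetical helper, it returns plain hash references rather than `Bio::PrimarySeq` objects, and it is more lenient than the grammar (it accepts a header with no description and strips all whitespace from the sequence, not just newlines).

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of BioPerl): parse FASTA text into a list of
# hash references carrying the same three fields the actions class fills in.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    # Each record begins with '>' at the start of a line (grammar: ^^\>).
    for my $chunk (split /^>/m, $text) {
        next unless $chunk =~ /\S/;    # skip the empty piece before the first '>'
        my ($header, $body) = split /\n/, $chunk, 2;
        # <id> is the first run of non-whitespace characters;
        # <description> is whatever remains on the header line.
        my ($id, $desc) = $header =~ /^(\S+)\s*(.*)$/;
        $body //= '';
        $body =~ s/\s+//g;             # drop newlines (and any other whitespace)
        push @records, {
            display_id  => $id,
            description => $desc,
            seq         => $body,
        };
    }
    return @records;
}

# Usage: two made-up records, the first wrapped over two lines.
my $fasta = ">ABC12345 example 1\naaaatggggg\nggggggcccc\n>XYZ1 example 2\nttttcccc\n";
for my $rec (parse_fasta($fasta)) {
    say join "\t", @{$rec}{qw(display_id description seq)};
}
```

The grammar-based version keeps the format description (tokens) apart from the object construction (actions); the sketch above folds both into one loop, which is shorter but harder to extend to other formats.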
  • 50. Acknowledgements ✤ All BioPerl developers ✤ Chris Dagdigian and Mauricio Herrera Cuadra (O|B|F gurus) ✤ Cross-Collaborative work: Peter Cock (Biopython), Pjotr Prins (BioLib, BioRuby), Naohisa Goto (BioRuby), Michael Heuer and Andreas Prlic (BioJava), Peter Rice (EMBOSS) ✤ Questions? Do we even have time?