BioRuby
Project Update
Raoul J.P. Bonnal (r@bioruby.org)
Life Science Informatics, Integrative Biology Program
Fondazione INGM, Italy
co-authors: Toshiaki Katayama, Pjotr Prins, Mitsuteru Nakao, Christian M Zmasek, Naohisa Goto
11th Annual Bioinformatics Open Source Conference (BOSC) 2010
Boston, Massachusetts, USA
Introduction

BioRuby - a bioinformatics library for the Ruby language
Ruby:
• is an object-oriented scripting language, functional and reflective
• became popular through "Ruby on Rails"
• was created by Matz in 1993 in Japan
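To make "object-oriented, functional and reflective" concrete, here is a minimal plain-Ruby sketch. The `Dna` class is hypothetical and written only for this illustration; it is not BioRuby's `Bio::Sequence` API.

```ruby
# Hypothetical illustration class, not part of BioRuby.
class Dna
  COMPLEMENT = { 'a' => 't', 't' => 'a', 'g' => 'c', 'c' => 'g' }.freeze

  attr_reader :bases

  # Object-oriented: state and behaviour live together in one object.
  def initialize(bases)
    @bases = bases.downcase
  end

  # Functional style: build a new value with map, without mutating self.
  def reverse_complement
    Dna.new(bases.reverse.each_char.map { |b| COMPLEMENT.fetch(b) }.join)
  end

  # Fraction of bases that are g or c.
  def gc_content
    bases.count('gc').fdiv(bases.length)
  end
end

seq = Dna.new('ATGC')
puts seq.reverse_complement.bases

# Reflective: ask the object what it can do, then call a method by name.
puts seq.respond_to?(:gc_content)
puts seq.public_send(:gc_content)
```

The same three traits are what let a library such as BioRuby expose sequences as objects that can be chained and inspected interactively.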
BioRuby & Platforms
[Diagram: BioRuby, installed with "gem install bio", sits on the Ruby interpreter, which runs on the operating systems. Ruby and RubyEE are shown on the "Performances" side; JRuby, with access to Java libraries, on the "Portability" side. Later builds of the slide add BioLib and Cytoscape.]
History
[Timeline: 2008, 2009, 2010]
• Themes along the timeline: WebServices, Workflows, SemanticWeb
• Releases: 1.3.0, 1.3.1, 1.4.0
• Move to git (shown as "--- / +++ git")
• GSoC (first): phyloXML
• GSoC (second): Ruby 1.9.2; NeXML I/O, RDF triples; infer gene duplications
• 2010: Code fest, BOSC

GitHub:
    http://github.com/bioruby/bioruby
GSoC references:
    Ruby 1.9.2 support of BioRuby (OBF)
    Develop an API for NeXML I/O, and RDF triples for BioRuby (NESCent)
    Implementation of algorithm to infer gene duplications in BioRuby (OBF)
    Implementing phyloXML support in BioRuby (NESCent)
BioRuby Features

Category               Modules
Object                 Sequence, pathway, tree, bibliography reference
Sequence manipulation  translation, alignment, location, mapping, feature table,
                       molecular weight, design siRNA, restriction enzyme
Format                 GenBank, EMBL, UniProt, KEGG, PDB, MEDLINE, REBASE, FASTQ, GFF,
                       MSF, ABIF, SCF, GCG, Lasergene, GEO SOFT, Gene Ontology
Tool                   BLAST, FASTA, EMBOSS, HMMER, InterProScan, GenScan, BLAT, Sim4,
                       Spidey, MEME, ClustalW, MUSCLE, MAFFT, T-Coffee, ProbCons
Phylogeny              PHYLIP, PAML, phyloXML, NEXUS, Newick
Web Service            NCBI, EBI, DDBJ, KEGG, TogoWS, PSORT, TargetP, PTS1, SOSUI, TMHMM
OBDA                   BioSQL, BioFetch, indexed flat files
Shell                  Interactive environment for rapid bioinformatics analyses
Relevant New Features 1

Bio::SQL Interoperable storage of sequences -Raoul Bonnal-
  require 'bio'
  # also needs active_record (ORM)
  # and your database adapter (MySQL, PostgreSQL, JDBC)
  connection = Bio::SQL.establish_connection(
    {'development' => {'hostname' => your_host_name,
                       'database' => 'CoolBioSeqDB',
                       'adapter'  => 'jdbcmysql',
                       'username' => 'Raoul',
                       'password' => 'SmartPassword'}},
    'development')
  # read a GenBank file and store each entry:
  my_storage = Bio::SQL::Biodatabase.find(:first)
  genbank = Bio::GenBank.open('dbvrl1.gb')
  genbank.each_entry do |gb|
    Bio::SQL::Sequence.new(:biosequence => gb.to_biosequence,
                           :biodatabase => my_storage)
  end

  # fetching an accession is easy
  Bio::SQL.fetch_accession(your_accession).to_biosequence.output(:embl)
Relevant New Features 2

Bio::PhyloXML r/w by -Diana Jaunzeikare, Christian M Zmasek-
  require 'bio' # needs libxml-ruby

  # Create a parser
  phyloxml = Bio::PhyloXML::Parser.new('example.xml')

  # Consume the trees
  phyloxml.each do |tree|
    puts tree.name
  end

  # Writing (tree2 is a tree built or parsed earlier)
  writer = Bio::PhyloXML::Writer.new('my_tree.xml')
  writer.write(tree2)

  # Extract information
  phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
  phyloxml.each do |tree|
    tree.each_node do |node|
      print 'Scientific name: ', node.taxonomies[0].scientific_name, "\n"
    end
  end

Han, M. V. and Zmasek, C. M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
Relevant New Features 3

Bio::FASTQ r/w Next Generation Sequencing FASTQ -Naohisa Goto-
  require 'bio'
  # merge a FASTA file and its matching quality file into FASTQ
  ff_fasta = Bio::FlatFile.open('filename.fasta')
  ff_qual  = Bio::FlatFile.open('filename.qual')

  while entry_fasta = ff_fasta.next_entry
    seq = entry_fasta.to_biosequence
    seq.quality_score_type = :phred
    seq.quality_scores = ff_qual.next_entry.data
    puts seq.output(:fastq,
                    :title => entry_fasta.definition)
  end

   ●   Formats supported: SOLEXA, ILLUMINA

Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
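The FASTQ variants named above differ mainly in how a score becomes a character. The following plain-Ruby sketch (no BioRuby required) shows that step using the ASCII offsets described by Cock et al. (2010): 33 for Sanger, 64 for Solexa and Illumina 1.3+. The helper names are hypothetical, and the sketch deliberately ignores the fact that Solexa scores use a different scale from Phred scores.

```ruby
# ASCII offsets per FASTQ variant (Cock et al., 2010).
QUALITY_OFFSETS = { sanger: 33, solexa: 64, illumina: 64 }.freeze

# Hypothetical helper: integer scores -> quality string.
def encode_quality(scores, variant = :sanger)
  offset = QUALITY_OFFSETS.fetch(variant)
  scores.map { |q| (q + offset).chr }.join
end

# Hypothetical helper: quality string -> integer scores.
def decode_quality(string, variant = :sanger)
  offset = QUALITY_OFFSETS.fetch(variant)
  string.bytes.map { |b| b - offset }
end

# Hypothetical helper: format one four-line FASTQ record.
def fastq_record(title, sequence, scores, variant = :sanger)
  unless sequence.length == scores.length
    raise ArgumentError, 'one score per base is required'
  end
  "@#{title}\n#{sequence}\n+\n#{encode_quality(scores, variant)}\n"
end

puts fastq_record('read1', 'ACGT', [40, 30, 20, 10])
```

Because the same character can mean different scores under different offsets, a reader has to know the variant before decoding, which is why the slide code sets `quality_score_type` explicitly.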
Relevant New Features 4

Bio::NCBI::REST example
  require 'bio'
  ncbi = Bio::NCBI::REST::ESearch.new
  ncbi.search("nucleotide", "tardigrada")
  ncbi.count("nucleotide", "tardigrada")
  ncbi.nucleotide("tardigrada")
  ncbi.taxonomy("tardigrada")
  ncbi.pubmed("tardigrada", "reldate" => 365)
  ncbi.pubmed("mammoth mitochondrial genome")

Bio::TogoWS entry point for PDBj, NCBI, DDBJ, EBI, KEGG
  require 'bio'
  t = Bio::TogoWS::REST.new
  puts t.entry('genbank', 'AF237819')
  puts t.search('uniprot', 'lung cancer')
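Calls like the ESearch ones above end up as HTTP requests to NCBI's E-utilities. As a rough sketch of what such a request looks like, the snippet below builds an ESearch URL with Ruby's standard library only and makes no network call. `esearch_uri` is a hypothetical helper, not a BioRuby method, and BioRuby's actual request construction may differ.

```ruby
require 'uri'

# Public ESearch endpoint of the NCBI E-utilities.
ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Hypothetical helper: build the request URI for a database and search term.
# Extra E-utilities parameters (for example 'reldate') are merged in as given.
def esearch_uri(db, term, extra = {})
  params = { 'db' => db, 'term' => term }.merge(extra)
  uri = URI(ESEARCH_URL)
  uri.query = URI.encode_www_form(params)
  uri
end

puts esearch_uri('pubmed', 'tardigrada', 'reldate' => 365)
puts esearch_uri('pubmed', 'mammoth mitochondrial genome')
```

Keeping URL construction separate from the HTTP call makes the query easy to inspect and test without touching the network.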
BioRuby is Agile
●   OpenBio* developers are the stakeholders
    ●    Faster iteration process
    ●    Frequent meetings (mail, Skype/voice chat, IRC)
●   Test everything (tests are required for new features)
     –   Improves quality and maintainability, and guarantees portability
     –   Ruby unit testing framework, RSpec
●   GitHub
    ●    Low barriers for new developers
    ●    32 forks and 100 people watching us

[Image: Agile Manifesto]
Moving to Agile Programming
[Chart: number of tests and of tutorial lines for each BioRuby release (1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0); vertical axis from 0 to 2500.]
Refactoring
[Chart: number of files, classes, modules and methods for each BioRuby release (1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0); vertical axis from 0 to 3500.]
Ongoing Work
●   Semantic Web (started at BioHackathon 2010)
    ●   Expose data in RDF
    ●   Consume SPARQL endpoints efficiently
●   Ruby 1.9.2 support of BioRuby (GSoC & OBF)
    ●   Improved performance
●   Develop an API for NeXML I/O, and RDF triples for BioRuby (GSoC & NESCent)
●   Implementation of algorithm to infer gene duplications in BioRuby (GSoC & OBF)
PlugIn system
●   We want a BioRuby core that is stable on every OS
    ●   But we also want to use experimental code ASAP
    ●   With BioRuby + BioRuby plugins + Rails we can build multiple
        applications that share a single core and add specific features
        –   User or Application
●   Suggest guidelines for the plugin namespace
    ●   On GitHub you can find our plugins by searching for
        bioruby-plugin-NAME
PlugIn system
The plugin system will be delivered with the next BioRuby release
BioGraphics -Jan Aerts-
For biologists:
bioruby --plugin install graphics

For geeks:
bioruby --plugin install git://github.com/user/repo.git

It's very experimental
What We Need



●   Better integration with R
●   Better support for data visualization (interpretation)
●   Detailed Roadmap
Publications
BioRuby: Bioinformatics software for the Ruby programming language (submitted)
    Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama

The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and
   workflows (accepted)
   Toshiaki Katayama et al.

Toshiaki Katayama, Mitsuteru Nakao and Toshihisa Takagi (2010)
    TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services, Nucleic Acids
    Research, 2010, Vol. 38, No. suppl_2 W706-W711, doi:10.1093/nar/gkq386 (Web Server Issue 2010)

Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010).
   The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
   Nucleic Acids Res, 38(6), 1767-1771.

Over 24 articles use BioRuby in their analyses; check the up-to-date list:
   http://bioruby.open-bio.org/wiki/Research_using_BioRuby
Acknowledgments
●   BioRuby Team
    ●   Toshiaki Katayama*
    ●   Naohisa Goto*
    ●   Pjotr Prins*
    ●   Mitsuteru Nakao*
    ●   Jan Aerts*
    ●   Christian M Zmasek*
    ●   All GSoC students
●   Open Bioinformatics Foundation
●   Database Center for Life Science
●   Google Summer of Code
●   NESCent (National Evolutionary Synthesis Center)

* co-author

Bonnal bosc2010 bio_ruby

  • 1. BioRuby Project Update Raoul J.P. Bonnal co-authors: Toshiaki Katayama r@bioruby.org Pjotr Prins Life Science Informatics Mitsuteru Nakao Integrative Biology Program Fondazione INGM Christian M Zmasek Italy Nahoisa Goto 11th Annual Bioinformatic Open Source Conference (BOSC) 2010 Boston, Massachusetts, USA
  • 2. Introduction BioRuby - bioinformatics library for Ruby language • Object oriented scripting language, functional and reflective • has become popular by "Ruby on Rails“ • created by Matz in 1993 in Japan
  • 3. BioRuby & Platforms Ruby Interpreter Performances Portability Ruby JRuby RubyEE Java libraries gem install bio Operating Systems
  • 4. BioRuby & Platforms BioLib Ruby Interpreter Performances Portability Ruby JRuby RubyEE Java libraries gem install bio Operating Systems
  • 5. BioRuby & Platforms Cytoscape Ruby Interpreter Performances Portability Ruby JRuby RubyEE Java libraries gem install bio Operating Systems
  • 6. History 2008 2009 2010 WebServices Workflows SemanticWeb Code fest 1.3.0 1.4.0 1.3.1 BOSC --- GSoC GSoC +++ git •phyloXML •Ruby 1.9.2 •NeXML I/O, RDF triples •Infer gene duplications GitHub: GSoC references: http://github.com/bioruby/bioruby Ruby 1.9.2 support of BioRuby (OBF) Develop an API for NeXML I/O, and, RDF triples for BioRuby (NESCent) Implementation of algorithm to infer gene duplications in BioRuby (OBF) Implementing phyloXML support in BioRuby (NESCent)
  • 7. BioRuby Features Category Modules Object Sequence pathway, tree, bibliography reference Sequence translation, alignment, location,mapping, feature table, molecular Manipulation weight, design siRNA, restriction enzyme Format GenBank, EMBL, UniProt, KEGG, PDB, MEDLINE, REBASE, FASTQ, GFF, MSF, ABIF, SCF, GCG, Lasergene, GEO SOFT, Gene Ontology Tool BLAST, FASTA, EMBOSS, HMMER, InterProScan,GenScan, BLAT, Sim4, Spidey, MEME, ClustalW, MUSCLE, MAFFT, T-Coffee, ProbCons Phylogeny PHYLIP, PAML, phyloXML, NEXUS, Newick Web Service NCBI, EBI, DDBJ, KEGG, TogoWS, PSORT, TargetP, PTS1, SOSUI, TMHMM ODBA BioSQL, BioFetch, indexed flat files Shell Interactive environment for rapid Bioinformatics analyses
  • 8. Relevant New Features1 Bio::SQL Interoperable storage of sequences -Raoul Bonnal- require ‘bio’ #active_record (ORM) #your_database_adapter (MYSQL, Postgresql,JDBC) connection = Bio::SQL.establish_connection({‘development=>{‘hostname=>you_host_name, ‘database’=> ‘CoolBioSeqDB’, ‘adapter’=> ‘jdbcmysql’ ‘username’=> ‘Raoul’, ‘password’=> ‘SmartPassword’}, ‘development’) #read a GenBank file and store: my_sotrage = Bio::SQL::Biodatabase.find(:first) genbank = Bio::GenBank.open(‘dbvrl1.gb’) genbank.each_entry do |gb| Bio::SQL::Sequence.new(:biosequence=>gb.to_biosequence, :biodatabase=>my_sotrage) end #fetch an accession is easy Bio::SQL.fetch_accession(your_accession).to_biosequence.output(:embl)
  • 9. Relevant New Features 2: Bio::PhyloXML, read/write (Diana Jaunzeikare, Christian M Zmasek)
      require 'bio'
      # requires libxml-ruby
      # Create a parser
      phyloxml = Bio::PhyloXML::Parser.new('example.xml')
      # Consume the trees
      phyloxml.each do |tree|
        puts tree.name
      end
      # Writing
      writer = Bio::PhyloXML::Writer.new('my_tree.xml')
      writer.write(tree2)
      # Extract information
      phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
      phyloxml.each do |tree|
        tree.each_node do |node|
          print 'Scientific name: ', node.taxonomies[0].scientific_name, "\n"
        end
      end
      Han, M. V. and Zmasek, C. M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
  • 10. Relevant New Features 3: Bio::FASTQ, read/write Next Generation Sequencing FASTQ (Naohisa Goto)
      require 'bio'
      ff_fasta = Bio::FlatFile.open('filename.fasta')
      ff_qual = Bio::FlatFile.open('filename.qual')
      while entry_fasta = ff_fasta.next_entry
        seq = entry_fasta.to_biosequence
        seq.quality_score_type = :phred
        seq.quality_scores = ff_qual.next_entry.data
        puts seq.output(:fastq, :title => entry_fasta.definition)
      end
      ● Formats supported: Solexa, Illumina
      Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
  • 11. Relevant New Features 4
      Bio::NCBI::REST example
      require 'bio'
      ncbi = Bio::NCBI::REST::ESearch.new
      ncbi.search("nucleotide", "tardigrada")
      ncbi.count("nucleotide", "tardigrada")
      ncbi.nucleotide("tardigrada")
      ncbi.taxonomy("tardigrada")
      ncbi.pubmed("tardigrada", "reldate" => 365)
      ncbi.pubmed("mammoth mitochondrial genome")
      Bio::TogoWS, entry point for PDBj, NCBI, DDBJ, EBI, KEGG
      require 'bio'
      t = Bio::TogoWS::REST.new
      puts t.entry('genbank', 'AF237819')
      puts t.search('uniprot', 'lung cancer')
  • 12. BioRuby is Agile
      ● OpenBio* developers are the stakeholders
      ● Faster iteration process
      ● Frequent meetings (mail, Skype/voice chat, IRC)
      ● Test everything (tests are required for new features)
        – Improves quality and maintainability, and guarantees portability
        – Ruby unit testing framework, RSpec
      ● GitHub
      ● Low barriers for new developers
      ● 32 forks and 100 people watching us
      Agile Manifesto
  • 13. Moving to Agile Programming [chart: number of tests and tutorial lines per BioRuby release, 1.0.0 to 1.4.0; y-axis 0 to 2500]
  • 14. Refactoring [chart: number of files, classes, modules and methods per BioRuby release, 1.0.0 to 1.4.0; y-axis 0 to 3500]
  • 15. Ongoing Work
      ● Semantic Web (started at BioHackathon 2010)
      ● Expose data in RDF
      ● Consume SPARQL endpoints efficiently
      ● Ruby 1.9.2 support of BioRuby (GSoC & OBF)
      ● Improved performance
      ● Develop an API for NeXML I/O, and RDF triples for BioRuby (GSoC & NESCent)
      ● Implementation of algorithm to infer gene duplications in BioRuby (GSoC & OBF)
  • 16. Plugin system
      ● We want a BioRuby core that is stable on every OS
      ● But we also want to use experimental code as soon as possible
      ● BioRuby + BioRuby plugin + Rails: multiple applications can share a single core while adding specific features
        – Per user or per application
      ● Suggest guidelines for the plugin namespace
      ● On GitHub you can find our plugins by searching for bioruby-plugin-NAME
  • 17. Plugin system
      The plugin system will be delivered with the next BioRuby release.
      BioGraphics (Jan Aerts)
      For biologists: bioruby --plugin install graphics
      For geeks: bioruby --plugin install git://github.com/user/repo.git
      It's very experimental.
  • 18. What We Need ● Better integration with R ● Better support for data visualization (interpretation) ● Detailed Roadmap
  • 19. Publications
      ● Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama. BioRuby: Bioinformatics software for the Ruby programming language (submitted).
      ● Toshiaki Katayama et al. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows (accepted).
      ● Toshiaki Katayama, Mitsuteru Nakao and Toshihisa Takagi (2010). TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Research, Vol. 38, No. suppl_2, W706-W711, doi:10.1093/nar/gkq386 (Web Server Issue 2010).
      ● Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
      Over 24 articles use BioRuby in their analyses; see the up-to-date list: http://bioruby.open-bio.org/wiki/Research_using_BioRuby
  • 20. Acknowledgments
      BioRuby Team:
      ● Toshiaki Katayama*
      ● Naohisa Goto*
      ● Pjotr Prins*
      ● Mitsuteru Nakao*
      ● Jan Aerts*
      ● Christian M Zmasek*
      ● All GSoC students
      Organizations: Open Bioinformatics Foundation; Database Center for Life Science; Google Summer of Code; NESCent (National Evolutionary Synthesis Center)
      * co-author
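
Slide 10 turns a FASTA file plus a quality file into FASTQ records through Bio::FASTQ. As a rough illustration of what such a record contains, the sketch below builds one by hand in plain Ruby, without the bio gem. The helper names `encode_quality` and `fastq_record` are hypothetical and are not part of BioRuby. It assumes the usual encodings: Sanger FASTQ stores each Phred score as the ASCII character with code score + 33, and Illumina 1.3+ FASTQ uses score + 64. The Solexa variant uses a different score scale and is not covered here.

```ruby
# ASCII offset added to each Phred score, per FASTQ variant.
# Assumption: only the Sanger (33) and Illumina 1.3+ (64) variants are handled.
FASTQ_OFFSETS = { sanger: 33, illumina: 64 }.freeze

# Hypothetical helper (not a BioRuby method):
# encode an array of Phred scores as a FASTQ quality string.
def encode_quality(scores, variant = :sanger)
  offset = FASTQ_OFFSETS.fetch(variant)
  scores.map { |q| (q + offset).chr }.join
end

# Hypothetical helper (not a BioRuby method):
# build one four-line FASTQ record from a title, a sequence and its scores.
def fastq_record(title, sequence, scores, variant = :sanger)
  unless sequence.length == scores.length
    raise ArgumentError, 'sequence and quality scores differ in length'
  end
  "@#{title}\n#{sequence}\n+\n#{encode_quality(scores, variant)}\n"
end

# Example: four bases with Phred scores 40, 30, 20 and 10.
puts fastq_record('read1', 'ACGT', [40, 30, 20, 10])
# The quality line is "I?5+" in the Sanger variant.
```

In real analyses the Bio::FASTQ code from slide 10 is the better choice, since it also parses files and handles the Solexa score scale; the sketch only shows why the quality score type must be known before a record is written.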