Bio.Phylo
A unified phylogenetics toolkit for Biopython


                Eric Talevich

            Institute of Bioinformatics
              University of Georgia


               June 29, 2010
Abstract


       Bio.Phylo is a new phylogenetics library for:

• Exploring, modifying and annotating trees
• Reading & writing standard file formats
• Quick visualization
• Gluing together computational pipelines



                 Availability: Biopython 1.54
A quick survey of file formats

   Newick (a.k.a. New Hampshire) is a simple nested-parens
          format:    (A, (B, C), (D, E))
             • Extended and tweaked, which led to NHX (and to
               parsing problems)

   Nexus is a collection of formats, including Newick trees
             • More than just tree data... still tough to parse

   PhyloXML is an XML-based replacement for NHX
             • Annotations formalized as XML elements;
               extensible with user-defined element types

   NeXML is an XML-based successor to Nexus
             • Ontology-based: key-value assignments have
               semantic meaning
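To make the "simple nested-parens" claim concrete, here is a minimal sketch of a recursive-descent reader for the names-only subset of Newick, independent of Biopython. `parse_newick` is a hypothetical helper: it returns nested lists for internal nodes and strings for leaf names, and it deliberately ignores branch lengths, internal labels, comments and quoting (a label such as `A:0.1` would be kept whole). Those extensions are what make real-world Newick and NHX harder to parse.

```python
# Hypothetical sketch: parse names-only Newick into nested Python lists.
# Not Bio.Phylo's parser; branch lengths, comments and quoting are ignored.

def parse_newick(text):
    """Return nested lists for internal nodes and str for leaf names."""
    # Drop all whitespace and the optional terminating semicolon.
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return children
        # Leaf: read the label up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


print(parse_newick("(A, (B, C), (D, E))"))
```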
Demo: What’s in a tree?

1. Read a simple Newick file
2. Inspect through IPython
3. Draw with PyLab/matplotlib
4. Promote to a PhyloXML tree
5. Set branch colors
6. Write a PhyloXML file

# In a terminal, make a simple Newick file
# Then launch the IPython interpreter and read the file


% cat > simple.dnd <<EOF
> (((A,B),(C,D)),(E,F,G))
> EOF

% ipython --pylab
>>> from Bio import Phylo
>>> tree = Phylo.read('simple.dnd', 'newick')
# String representation shows the object structure

>>> print(tree)

Tree(weight=1.0, rooted=False, name='')
    Clade(branch_length=1.0)
        Clade(branch_length=1.0)
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='A')
                Clade(branch_length=1.0, name='B')
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='C')
                Clade(branch_length=1.0, name='D')
        Clade(branch_length=1.0)
            Clade(branch_length=1.0, name='E')
            Clade(branch_length=1.0, name='F')
            Clade(branch_length=1.0, name='G')
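The printout above is an indented outline: each clade is written on its own line, one indentation level deeper than its parent. A minimal sketch of that recursion, assuming a hypothetical `Clade` dataclass with `name`, `branch_length` and `clades` fields (not Bio.Phylo's own class or its `__str__`):

```python
# Hypothetical sketch of an indented, one-clade-per-line tree printout.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clade:
    # Illustrative stand-in; field names mirror the printed output above.
    name: Optional[str] = None
    branch_length: float = 1.0
    clades: List["Clade"] = field(default_factory=list)


def outline(clade, depth=0, indent="    "):
    """Return one line per clade, indented by its depth below the root."""
    attrs = f"branch_length={clade.branch_length}"
    if clade.name:
        attrs += f", name={clade.name!r}"
    lines = [f"{indent * depth}Clade({attrs})"]
    for child in clade.clades:
        lines.extend(outline(child, depth + 1, indent))
    return lines


tree = Clade(clades=[
    Clade(clades=[Clade(name="A"), Clade(name="B")]),
    Clade(clades=[Clade(name="C"), Clade(name="D")]),
])
print("\n".join(outline(tree)))
```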
# Draw an ASCII-art dendrogram

>>> Phylo.draw_ascii(tree, column_width=52)

                                ______________ A
                 ______________|
                |              |______________ B
  ______________|
 |              |               ______________ C
 |              |______________|
_|                             |______________ D
 |
 |               ______________ E
 |              |
 |______________|______________ F
                |
                |______________ G
>>> tree.rooted = True
>>> Phylo.draw_graphviz(tree)

[Figure: Graphviz rendering of the rooted tree, leaves A through G]
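Graphviz layout programs such as `dot` and `neato` consume text in the DOT language. As a rough sketch of what a tree looks like in that form, assuming a nested-list tree (strings for leaves) and a hypothetical `to_dot` helper that invents ids `n1`, `n2` and so on for unnamed internal nodes (this is not how `draw_graphviz` is implemented):

```python
# Hypothetical sketch: emit Graphviz DOT text for a nested-list tree.

def to_dot(tree):
    """Return DOT text; leaves are strings, internal nodes are lists."""
    lines = ["graph tree {"]
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, str):
            return f'"{node}"'  # leaf: its quoted name is the node id
        counter += 1
        node_id = f"n{counter}"
        lines.append(f'  {node_id} [label=""];')  # unlabeled internal node
        for child in node:
            child_id = visit(child)
            lines.append(f"  {node_id} -- {child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


print(to_dot([[["A", "B"], ["C", "D"]], ["E", "F", "G"]]))
```

Saved to a file such as `tree.dot`, the output can be rendered with Graphviz directly, e.g. `neato -Tpng tree.dot -o tree.png`.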
# Promote a basic tree to PhyloXML
>>> from Bio.Phylo.PhyloXML import Phylogeny
>>> phy = Phylogeny.from_tree(tree)
>>> print(phy)

Phylogeny(rooted=True, name='')
    Clade(branch_length=1.0)
        Clade(branch_length=1.0)
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='A')
                Clade(branch_length=1.0, name='B')
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='C')
                Clade(branch_length=1.0, name='D')
        Clade(branch_length=1.0)
            Clade(branch_length=1.0, name='E')
            Clade(branch_length=1.0, name='F')
            Clade(branch_length=1.0, name='G')
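Promotion copies a plain tree into a richer class that has room for extra annotations such as color. The sketch below shows the general pattern with hypothetical `BasicClade` and `RichClade` dataclasses and a recursive `from_clade` classmethod; it illustrates the idea only and makes no claim about how `Phylogeny.from_tree` is implemented.

```python
# Hypothetical sketch of "promoting" a basic tree to a richer subclass.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BasicClade:
    name: Optional[str] = None
    branch_length: Optional[float] = None
    clades: List["BasicClade"] = field(default_factory=list)


@dataclass
class RichClade(BasicClade):
    # Extra annotation slot that the basic format cannot store.
    color: Optional[Tuple[int, int, int]] = None

    @classmethod
    def from_clade(cls, clade):
        """Recursively copy a BasicClade tree into new RichClade objects."""
        return cls(
            name=clade.name,
            branch_length=clade.branch_length,
            clades=[cls.from_clade(child) for child in clade.clades],
        )


basic = BasicClade(clades=[BasicClade(name="A", branch_length=1.0)])
rich = RichClade.from_clade(basic)
rich.clades[0].color = (250, 128, 114)
```

Because this sketch builds new objects, annotating the promoted tree leaves the original untouched.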
Branch color
>>> phy.root.color = (128, 128, 128)
Or:
>>> phy.root.color = '#808080'
Or:
>>> phy.root.color = 'gray'
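The three assignments above express the same color in different notations. A minimal sketch of how such input could be normalized to an (r, g, b) tuple; `to_rgb` and the three-entry `COLOR_NAMES` table (standard CSS RGB values) are hypothetical and much smaller than any real color-name list:

```python
# Hypothetical sketch: normalize tuple, hex-string or named colors to RGB.

COLOR_NAMES = {  # tiny illustrative subset, CSS RGB values
    "gray": (128, 128, 128),
    "salmon": (250, 128, 114),
    "blue": (0, 0, 255),
}


def to_rgb(color):
    """Return an (r, g, b) tuple of ints in 0-255."""
    if isinstance(color, tuple):
        rgb = color
    elif isinstance(color, str) and color.startswith("#"):
        hexcode = color[1:]
        if len(hexcode) != 6:
            raise ValueError(f"expected #RRGGBB, got {color!r}")
        # Two hex digits per channel: RR, GG, BB.
        rgb = tuple(int(hexcode[i:i + 2], 16) for i in (0, 2, 4))
    elif isinstance(color, str):
        rgb = COLOR_NAMES[color.lower()]  # KeyError for unknown names
    else:
        raise TypeError(f"unsupported color value: {color!r}")
    if len(rgb) != 3 or not all(0 <= v <= 255 for v in rgb):
        raise ValueError(f"RGB components must be 0-255, got {rgb!r}")
    return tuple(rgb)


print(to_rgb("#808080"))
```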

Find clades by attribute values:
>>> mrca = phy.common_ancestor({'name': 'E'}, {'name': 'F'})
>>> mrca.color = 'salmon'

Directly index a clade:
>>> phy.clade[0, 1].color = 'blue'
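Both lookups reduce to walking paths from the root. A minimal sketch, assuming a hypothetical `Clade` dataclass: `mrca` finds the root-to-target path for each attribute query and returns the deepest clade all paths share, and `by_index` follows a tuple of child indices the way `phy.clade[0, 1]` does on the slide. These helpers are illustrative, not Bio.Phylo's code.

```python
# Hypothetical sketch: MRCA by attribute query, and clade lookup by index path.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clade:
    name: Optional[str] = None
    clades: List["Clade"] = field(default_factory=list)


def find_path(root, **attrs):
    """Return clades from root down to the first match, or None."""
    if all(getattr(root, k, None) == v for k, v in attrs.items()):
        return [root]
    for child in root.clades:
        path = find_path(child, **attrs)
        if path is not None:
            return [root] + path
    return None


def mrca(root, *queries):
    """Deepest clade shared by the root-to-target paths of every query."""
    paths = [find_path(root, **q) for q in queries]
    if any(p is None for p in paths):
        raise ValueError("a target was not found in the tree")
    ancestor = root
    for nodes in zip(*paths):  # compare paths level by level
        if all(n is nodes[0] for n in nodes):
            ancestor = nodes[0]
        else:
            break
    return ancestor


def by_index(root, path):
    """Follow child indices from the root, e.g. (0, 1)."""
    clade = root
    for i in path:
        clade = clade.clades[i]
    return clade


# (((A,B),(C,D)),(E,F,G)) as in the demo
root = Clade(clades=[
    Clade(clades=[
        Clade(clades=[Clade(name="A"), Clade(name="B")]),
        Clade(clades=[Clade(name="C"), Clade(name="D")]),
    ]),
    Clade(clades=[Clade(name="E"), Clade(name="F"), Clade(name="G")]),
])
```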

>>> Phylo.draw_graphviz(phy, prog='neato')
[Figure: neato layout of the colored tree, leaves A through G]
# Save the color annotations in phyloXML

>>> Phylo.write(phy, 'simple-color.xml', 'phyloxml')

<phy:phyloxml xmlns:phy="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
        <branch_length>1.0</branch_length>
        <color>
            <red>128</red>
            <green>128</green>
            <blue>128</blue>
        </color>
        <clade>
            <branch_length>1.0</branch_length>
            <clade>
                 <branch_length>1.0</branch_length>
                 <clade>
                     <name>A</name>
                     ...
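The XML above nests a `<color>` element with `<red>`, `<green>` and `<blue>` children inside each `<clade>`. A minimal standard-library sketch that builds the same shape with `xml.etree.ElementTree`; `clade_element` is a hypothetical helper that writes only name, branch length and color, so it is far from a complete phyloXML writer and is not schema-checked:

```python
# Hypothetical sketch: build phyloXML-style clade/color elements with the stdlib.
import xml.etree.ElementTree as ET


def clade_element(branch_length=None, rgb=None, name=None):
    """Return a <clade> element with optional name, branch length and color."""
    clade = ET.Element("clade")
    if name is not None:
        ET.SubElement(clade, "name").text = name
    if branch_length is not None:
        ET.SubElement(clade, "branch_length").text = str(branch_length)
    if rgb is not None:
        color = ET.SubElement(clade, "color")
        for tag, value in zip(("red", "green", "blue"), rgb):
            ET.SubElement(color, tag).text = str(value)
    return clade


root = ET.Element("phyloxml", xmlns="http://www.phyloxml.org")
phylogeny = ET.SubElement(root, "phylogeny", rooted="true")
phylogeny.append(clade_element(1.0, rgb=(128, 128, 128)))
print(ET.tostring(root, encoding="unicode"))
```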
Thanks


Holla:
  • Brad Chapman and Christian Zmasek, GSoC 2009 mentors
  • The Biopython developers, feat. Peter J. A. Cock,
    Frank Kauff & Cymon J. Cox
  • Hilmar Lapp & the NESCent Phyloinformatics program
  • Google’s Open Source Programs Office
  • My professor, Dr. Natarajan Kannan
  • Developers like you
Q&A



• Which 3rd-party applications should we wrap in
  Bio.Phylo.Applications? (e.g. RAxML, MrBayes)
• Which other libraries should we support interoperability with?
  (PyCogent, ape)
• What other algorithms are simple, stable and relevant?
  (Consensus, rooting)
• Features for systematics? (Geography, PopGen integration?)
Extra: Tree methods
>>> dir(tree)

collapse                      get_terminals
collapse_all                  is_bifurcating
common_ancestor               is_monophyletic
count_terminals               is_parent_of
depths                        is_preterminal
distance                      ladderize
find_any                      prune
find_clades                   split
find_elements                 total_branch_length
get_nonterminals              trace
get_path

   See: http://biopython.org/DIST/docs/api/Bio.Phylo.BaseTree.TreeMixin-class.html
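Most of these methods are tree traversals. A minimal sketch of four of them, written as a mixin so that any class with a `clades` list inherits them; `TreeMethods` and `Clade` are hypothetical, and the method names echo the listing above without claiming Bio.Phylo's signatures (the real `find_clades`, for instance, accepts search criteria):

```python
# Hypothetical sketch of a few traversal methods shared through a mixin.
from dataclasses import dataclass, field
from typing import List, Optional


class TreeMethods:
    def find_clades(self):
        """Yield this clade and all descendants in preorder."""
        yield self
        for child in self.clades:
            yield from child.find_clades()

    def get_terminals(self):
        """Leaves are clades with no children."""
        return [c for c in self.find_clades() if not c.clades]

    def count_terminals(self):
        return len(self.get_terminals())

    def total_branch_length(self):
        """Sum every branch length in the subtree; missing lengths count as 0."""
        return sum(c.branch_length or 0.0 for c in self.find_clades())


@dataclass
class Clade(TreeMethods):
    name: Optional[str] = None
    branch_length: Optional[float] = None
    clades: List["Clade"] = field(default_factory=list)


tree = Clade(clades=[
    Clade(name="A", branch_length=1.0),
    Clade(branch_length=0.5, clades=[
        Clade(name="B", branch_length=2.0),
        Clade(name="C", branch_length=2.0),
    ]),
])
print(tree.count_terminals(), tree.total_branch_length())
```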
Extra: The Bio.Phylo class hierarchy




Figure: Inheritance relationship among the core classes
Extra: PhyloXML classes

 $ pydoc Bio.Phylo.PhyloXML

Accession              Date                 Point
Alphabet               Distribution         Polygon
Annotation             DomainArchitecture   Property
BaseTree               Events               ProteinDomain
BinaryCharacters       Id                   Reference
BranchColor            MolSeq               Sequence
Clade                  Other                SequenceRelation
CladeRelation          Phylogeny            Taxonomy
Confidence             Phyloxml             Uri


            See: http://biopython.org/wiki/PhyloXML

Bio.Phylo: Phylogenetics in Biopython (BOSC 2010)