Sequence Matrix
 Gene concatenation made easy
  Gaurav Vaidya (1), David Lohman (2), Rudolf Meier (2)

                           1: NeatCo Asia, Singapore.
                           2: Department of Biological Sciences,
                              National University of Singapore, Singapore.
Our goals


 ✤   Many powerful tools exist for concatenating sequences.

 ✤   But adding new sequences to an existing dataset is tedious and time-consuming.

 ✤   Our initial goal: a simple, user-friendly program for concatenating sequences.

 ✤   We also added a few tools to help you look for lab contamination in your dataset.
Sequence Matrix


✤   Written in Java.

    ✤   Uses graphical user interface libraries.

    ✤   Works on different operating systems.

    ✤   Easy to install: download and run the batch file.
Importing sequences



✤   You can use the sequence names as
    entered in the input file.

✤   Or you can ask Sequence Matrix to try
    to identify the species names.
Importing sequences

✤   Sequences mode:

    ✤   gi|237510679|gb|AY556753.2|Daubentonia madagascariensis voucher WE94001
        5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2,
        complete sequence; and 28S ribosomal RNA gene, partial sequence

    ✤   gi|237510678|gb|AY556735.2|Macaca sylvanus voucher OK96022
        5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2,
        complete sequence; and 28S ribosomal RNA gene, partial sequence

✤   Species name:

    ✤   Daubentonia madagascariensis

    ✤   Macaca sylvanus
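
As a rough illustration of how a species name might be pulled out of a GenBank-style title like the ones above, here is a minimal Python sketch. The function name `guess_species_name` and the regular expression are assumptions made for this example; they are not Sequence Matrix's own heuristic, which may behave differently.

```python
import re
from typing import Optional

# Hypothetical pattern: a capitalised genus followed by a lowercase epithet.
_BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def guess_species_name(title: str) -> Optional[str]:
    """Guess a binomial name from a pipe-delimited sequence title."""
    # The free-text description is the last pipe-delimited field.
    description = title.split("|")[-1]
    match = _BINOMIAL.search(description)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


title = ("gi|237510679|gb|AY556753.2|Daubentonia madagascariensis "
         "voucher WE94001 5.8S ribosomal RNA gene, partial sequence")
print(guess_species_name(title))
```

A simple pattern like this will misread titles that do not begin with a binomial, so the guessed names still need to be checked by eye.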
Importing sequences



✤   A common source of error is forgetting
    to recode leading and trailing gaps as
    missing information.

✤   Sequence Matrix can automatically
    replace such gaps with question marks.
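
The recoding itself is simple to sketch. The Python below assumes `-` is the gap character and `?` marks missing data; the function name `recode_terminal_gaps` is made up for this example and is not taken from Sequence Matrix.

```python
def recode_terminal_gaps(seq: str, gap: str = "-", missing: str = "?") -> str:
    """Replace leading and trailing gaps with the missing-data symbol."""
    without_lead = seq.lstrip(gap)
    lead = len(seq) - len(without_lead)
    core = without_lead.rstrip(gap)
    trail = len(without_lead) - len(core)
    # Internal gaps stay as they are; only the two ends are recoded.
    return missing * lead + core + missing * trail


print(recode_terminal_gaps("--AC-GT---"))
```

Internal gaps are left alone because they may represent real insertions or deletions, whereas gaps at the ends usually mean the sequence was simply not read that far.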
Importing sequences: Naming



✤   Sequences from one dataset are matched up to another dataset by sequence name.

    ✤   Errors in sequence naming need to be fixed.

✤   We recommend naming your files by gene name: ‘coi’, ‘cytb’, ‘28S’ and so on.
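
To show why names matter, here is a minimal Python sketch of concatenation by sequence name. It assumes each gene is already aligned and held as a dictionary from sequence name to sequence; the function `concatenate` and the choice to pad absent genes with `?` are assumptions for illustration, not a description of Sequence Matrix's internals.

```python
def concatenate(genes: dict) -> dict:
    """genes maps gene name -> {sequence name -> aligned sequence}."""
    taxa = sorted({name for seqs in genes.values() for name in seqs})
    combined = {name: "" for name in taxa}
    for gene, seqs in genes.items():
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene!r} are not the same length")
        width = lengths.pop()
        for name in taxa:
            # A taxon with no sequence for this gene gets missing data.
            combined[name] += seqs.get(name, "?" * width)
    return combined


genes = {
    "coi": {"Homo sapiens": "ACGT", "Gorilla gorilla": "ACGA"},
    "cytb": {"Homo sapiens": "TT", "Gorila gorilla": "TA"},  # note the typo
}
for name, seq in concatenate(genes).items():
    print(name, seq)
```

The misspelt "Gorila gorilla" ends up as its own row, padded with missing data, which is the kind of naming error that has to be fixed before the datasets line up.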
Export: Taxonsets


✤   By default, we generate taxonsets on the
    basis of:

    ✤   Combined length.

    ✤   Number of character sets.

    ✤   Information for a particular gene.
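
The three criteria above can be sketched in Python as follows. The thresholds `min_length` and `min_charsets`, the taxonset names, and the decision to count only characters that are neither gaps nor missing data are all assumptions for this example; the rules Sequence Matrix actually applies may differ.

```python
def informative_length(seq: str) -> int:
    """Count characters that are neither gaps nor missing data."""
    return sum(1 for ch in seq if ch not in "-?")


def build_taxonsets(matrix: dict, min_length: int, min_charsets: int) -> dict:
    """matrix maps taxon -> {gene -> sequence}."""
    taxonsets = {}
    # Taxa whose combined sequence length reaches the threshold.
    taxonsets[f"combined_length_{min_length}"] = sorted(
        taxon for taxon, genes in matrix.items()
        if sum(informative_length(s) for s in genes.values()) >= min_length
    )
    # Taxa with at least the given number of character sets.
    taxonsets[f"charsets_{min_charsets}"] = sorted(
        taxon for taxon, genes in matrix.items() if len(genes) >= min_charsets
    )
    # One taxonset per gene, listing the taxa that have data for it.
    for gene in sorted({g for genes in matrix.values() for g in genes}):
        taxonsets[f"has_{gene}"] = sorted(
            taxon for taxon, genes in matrix.items() if gene in genes
        )
    return taxonsets


matrix = {"A": {"coi": "ACGT", "cytb": "AC"}, "B": {"coi": "AC??"}}
for name, taxa in build_taxonsets(matrix, 4, 2).items():
    print(name, taxa)
```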
Gene trees



✤   Two ways to do them:

    ✤   Use the taxonset of taxa having information for a particular gene to exclude other
        taxa.

    ✤   Export the entire dataset with one file per column.
Export features



✤   You can also export the Sequence Matrix table as an Excel-readable text file.

    ✤   Supervisory mode.

    ✤   Keep track of a project as it grows.
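
An Excel-readable text file is typically just tab-delimited text. The Python sketch below writes one row per taxon and one column per gene, assuming each cell holds the sequence length; the real table may show different information, and `export_table` is a name invented for this example.

```python
import csv
import io


def export_table(matrix: dict, genes: list) -> str:
    """matrix maps taxon -> {gene -> sequence}; returns tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Taxon"] + genes)
    for taxon in sorted(matrix):
        row = [taxon]
        for gene in genes:
            seq = matrix[taxon].get(gene)
            # Leave the cell empty when the taxon has no data for this gene.
            row.append(len(seq) if seq is not None else "")
        writer.writerow(row)
    return buffer.getvalue()


matrix = {"A": {"coi": "ACGT"}, "B": {"coi": "AC", "cytb": "T"}}
print(export_table(matrix, ["coi", "cytb"]))
```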
Character sets


✤   We can read character sets defined in
    Nexus CHARSET and TNT xgroup
    commands.

✤   These can be “split” into individual
    columns, or imported as a single
    column representing the entire file.
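
As an illustration of "splitting", the Python sketch below reads simple Nexus `CHARSET name = start-end;` statements and cuts a concatenated sequence into one piece per character set. It handles only single contiguous ranges, and the function names are assumptions for this example; a full Nexus reader supports more syntax than this.

```python
import re

# Matches only the simplest form, e.g. "CHARSET coi = 1-658;".
_CHARSET = re.compile(
    r"charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE
)


def parse_charsets(nexus_text: str) -> dict:
    """Return {charset name: (start, end)} with 1-based inclusive positions."""
    return {
        name: (int(start), int(end))
        for name, start, end in _CHARSET.findall(nexus_text)
    }


def split_by_charsets(seq: str, charsets: dict) -> dict:
    """Cut one concatenated sequence into a piece per character set."""
    # Convert 1-based inclusive ranges to Python's 0-based slices.
    return {name: seq[start - 1:end] for name, (start, end) in charsets.items()}


sets_block = "CHARSET coi = 1-4;\nCHARSET cytb = 5-6;"
charsets = parse_charsets(sets_block)
print(split_by_charsets("ACGTTT", charsets))
```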
Excision


✤   Individual sequences can be excised
    from the dataset.

✤   Excised sequences will not be exported.

    ✤   Sequence Matrix will warn you about
        that.
Contamination


✤   You thought you were sequencing Gorilla gorilla

    ✤   but you were really sequencing Homo sapiens.

✤   We have two tools you can use:

    ✤   If Homo sapiens is in your dataset.

    ✤   If Homo sapiens is not in your dataset (experimental!).
H. sapiens in dataset

✤   Looks for pairs of sequences whose
    pairwise distance is very low.

✤   Expected difference depends on gene:

    ✤   28S doesn’t change very much, but

    ✤   COI changes very quickly.

✤   Some interpretation is required.
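
The idea can be sketched in Python with uncorrected pairwise distances (the proportion of differing sites). The threshold is left to the caller because, as noted above, what counts as "very low" depends on the gene. The function names and the choice to skip gaps, `?` and `N` are assumptions for this example, not Sequence Matrix's own implementation.

```python
from itertools import combinations


def p_distance(a: str, b: str):
    """Proportion of differing sites, skipping gaps and missing data."""
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-?N" or y in "-?N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else None


def suspicious_pairs(seqs: dict, threshold: float) -> list:
    """Return (name, name, distance) for pairs at or below the threshold."""
    hits = []
    for (n1, s1), (n2, s2) in combinations(sorted(seqs.items()), 2):
        d = p_distance(s1, s2)
        if d is not None and d <= threshold:
            hits.append((n1, n2, d))
    return hits


coi = {
    "Gorilla gorilla": "ACGTACGT",
    "Homo sapiens": "ACGTACGT",
    "Macaca sylvanus": "ACGAACTT",
}
print(suspicious_pairs(coi, threshold=0.0))
```

An identical pair is only a flag, not proof of contamination: for a slowly evolving gene such as 28S, closely related species may legitimately share a sequence.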
H. sapiens not present

✤   Use “Pairwise Distance Mode” to look
    for unusual pairwise distances.

✤   Ignore one charset, then sort taxa based
    on their pairwise distance to a
    “reference taxon”.

    ✤   Colour sequences by their individual
        pairwise distances to the reference
        taxon.
H. sapiens not present

✤   Colour pairwise distances on the gene
    in question by their pairwise distance to
    the reference taxon.

✤   Look for colour variation which is
    unusual or out of place.

✤   We would expect sequences from
    different species to be correlated
    together.
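
The comparison described on these two slides can be sketched as a small table: for each taxon, one distance to the reference taxon on every charset except the gene in question, and one distance on that gene alone. The Python below is an assumption-laden sketch (uncorrected distances, made-up function names, numbers instead of colours), not the program's actual algorithm.

```python
def p_distance(a: str, b: str):
    """Proportion of differing sites, skipping gaps and missing data."""
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-?N" or y in "-?N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else None


def distance_table(matrix: dict, reference: str, gene: str) -> list:
    """matrix maps taxon -> {gene -> aligned sequence}.

    Returns (taxon, distance on other genes, distance on `gene`) rows,
    sorted by the distance on the other genes.
    """
    ref = matrix[reference]
    rows = []
    for taxon, genes in matrix.items():
        if taxon == reference:
            continue
        # Ignore the gene in question when measuring overall similarity.
        others = [g for g in genes if g != gene and g in ref]
        rest = p_distance(
            "".join(ref[g] for g in others),
            "".join(genes[g] for g in others),
        )
        on_gene = p_distance(ref.get(gene, ""), genes.get(gene, ""))
        rows.append((taxon, rest, on_gene))
    # Taxa with no comparable data sort last.
    rows.sort(key=lambda r: (r[1] is None, r[1] if r[1] is not None else 0.0))
    return rows


matrix = {
    "Ref": {"coi": "AAAA", "28S": "CCCC"},
    "Near": {"coi": "AAAT", "28S": "CCCC"},
    "Far": {"coi": "AAAA", "28S": "CCGG"},
}
for row in distance_table(matrix, "Ref", "coi"):
    print(row)
```

In this toy data, "Far" is distant from the reference on 28S yet identical to it on COI, which is the kind of out-of-place value worth a second look.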
Pairwise distance mode

✤   You need to vary:

    ✤   The gene you are studying.

    ✤   The reference taxon being compared
        against.

✤   Possibly helpful as an alert mechanism.
Summary

✤   Sequence Matrix allows you to assemble and examine multigene, multitaxon datasets.

✤   Taxonsets allow you to analyse subsets of your data in downstream programs.

✤   Excising sequences gives you greater control over which sequences to analyse.

✤   You can look for contamination in two ways:

    ✤   Looking for very low pairwise distances across your entire dataset.

    ✤   Looking for unusual pairwise distances in Pairwise Distance Mode.
Acknowledgements

✤   Rudolf Meier

✤   Zhang Guanyang

✤   Farhan Ali

✤   David Lohman

✤   Everybody at the NUS DBS
    Evolutionary Biology lab.
Question time!
