Milko stat seq_toulouse


Valeriya Simeonova


Milko Krachunov², Ivan Popov¹, Valeria Simeonova², Irena Avdjieva¹, Paweł Szczęsny³, Urszula Zelenkiewicz³, Piotr Zelenkiewicz³, Dimitar Vassilev¹
¹ Bioinformatics group, AgroBioInstitute, Bulgaria
² Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, Bulgaria
³ Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
Detection and correction of errors in metagenomic 16S RNA parallel sequencing

NGS errors – common problems
Errors are introduced into the assembled reads by imperfections of both biological and mathematical origin;
The same sample cannot be re-sequenced in metagenomic studies;
The error rate tends to increase with every step of the process;
There is no easy way to distinguish a “sequencing error” from a “rare variant”;
Many existing methods and algorithms address different aspects of the problem, but no unified solution is available;
Large amounts of data are difficult to process with common software.

Significance of 16S RNA sequencing
Highly conserved between different species of bacteria and
archaea;
Sequence analysis is done with universal PCR primers;
Contains hypervariable regions that can provide species-specific signature sequences;
Suitable for phylogenetic studies;
Suitable for metagenomic studies.

General approach in metagenomic biodiversity studies
454 Sequencing
Filtering / Denoising
Multiple alignment
Distance matrix
OTU clusters with abundance count
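
The last two steps of this pipeline can be illustrated with a small sketch. It assumes the reads are already aligned to equal length, takes the proportion of mismatching columns (p-distance) as the distance, and groups reads with a simple greedy single-pass clustering at a cut-off of 0.05. The function names are illustrative, and this is not the clustering performed by ClaMS or any other tool named in these slides.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing columns between two aligned reads.

    Columns where both reads carry a gap ("-") are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(reads):
    """Symmetric matrix of pairwise p-distances."""
    n = len(reads)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = p_distance(reads[i], reads[j])
    return matrix


def greedy_otus(reads, cutoff=0.05):
    """Put each read into the first cluster whose seed lies within the cutoff.

    A read that is not close enough to any existing seed starts a new cluster.
    Returns a dict mapping seed index -> list of member indices.
    """
    matrix = distance_matrix(reads)
    clusters = {}
    for i in range(len(reads)):
        for seed in clusters:
            if matrix[i][seed] <= cutoff:
                clusters[seed].append(i)
                break
        else:
            clusters[i] = [i]
    return clusters


def abundance(clusters):
    """Number of reads per OTU, keyed by seed index."""
    return {seed: len(members) for seed, members in clusters.items()}
```

The result depends on the input order because seeds are chosen greedily; a hierarchical method over the full distance matrix would avoid that at a higher cost.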

A. Raw data characteristics and processing
Two separate runs of metagenomic 16S RNA fragments, sequenced on the 454 platform and converted to FASTA format:
run 02 – 46429 short reads
run 04 – 41386 short reads
Our task – extract, denoise and correct only the quality reads.
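
As a rough illustration of the extraction step, the sketch below parses FASTA text, keeps reads that pass a filter, and bins read lengths for a histogram. The slides do not say what makes a read a "quality read", so the length bounds and the rejection of ambiguous bases are assumptions, and all function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def keep_quality_reads(records, min_len=200, max_len=600):
    """Assumed filter: length within bounds and only A, C, G, T present."""
    for header, seq in records:
        if min_len <= len(seq) <= max_len and set(seq) <= set("ACGT"):
            yield header, seq


def length_histogram(records, bin_size=50):
    """Count reads per length bin; keys are the lower edge of each bin."""
    counts = {}
    for _, seq in records:
        start = (len(seq) // bin_size) * bin_size
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))
```

An open file handle can be passed directly to `read_fasta`, since it only needs an iterable of lines.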

Classification and performance evaluation
ClaMS parameters:
Distance cut-off: 0.05
Signature type: DBC
k-mer length: 3
Existing taxonomy: 4th Level

Aim of the method – idea outline
To deal with the heterogeneous nature of the data, similar or related sequences are given more importance in the error evaluation.
The naïve approach: if a base at a given position is less common than the sequencer error rate, assume it is likely an error and replace it with the most common base.
Our modification: calculate the occurrence of the base among reads that are similar to the evaluated read in the given region, either giving those reads larger weights or using them exclusively.
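
A minimal sketch of the naïve approach, assuming the reads are already aligned to equal length so that each column can be counted independently. The function name and the default error rate of 0.01 are illustrative assumptions, and gaps are treated as just another symbol.

```python
from collections import Counter


def naive_correct(reads, error_rate=0.01):
    """Replace bases rarer than error_rate in their column.

    The replacement is the most common base of that column.
    """
    columns = [Counter(column) for column in zip(*reads)]
    total = len(reads)
    corrected = []
    for read in reads:
        fixed = []
        for base, counts in zip(read, columns):
            if counts[base] / total < error_rate:
                base = counts.most_common(1)[0][0]
            fixed.append(base)
        corrected.append("".join(fixed))
    return corrected
```

Because every column is counted over all reads, a genuine but rare organism in a mixed sample looks the same as a sequencing error here, which is the weakness the modification targets.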

Progress so far
Calculate the occurrence rates of every base in reads that are identical to the evaluated read within a window with a radius of n bases
Preliminary results: the first basic implementation leads to an increase in the number of OTUs found with ClaMS
Under development
Good choice(s) of approach for alignment of the reads
Empirical evaluation of the parameters
Comparative evaluation of the variants of the approach
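
The window-based occurrence rates described under "Progress so far" might look roughly like the sketch below. It is not the authors' implementation. It assumes aligned reads of equal length, and it assumes the evaluated position itself is excluded from the identity check, since otherwise every matching read would trivially carry the same base. The names, the radius and the error rate are all illustrative.

```python
from collections import Counter


def window_matches(read, other, pos, radius):
    """True if other equals read on the window around pos, ignoring pos itself."""
    lo, hi = max(0, pos - radius), min(len(read), pos + radius + 1)
    return (read[lo:pos] == other[lo:pos]
            and read[pos + 1:hi] == other[pos + 1:hi])


def local_rates(reads, index, pos, radius):
    """Occurrence rate of each base at pos among reads matching reads[index]."""
    target = reads[index]
    counts = Counter(other[pos] for other in reads
                     if window_matches(target, other, pos, radius))
    total = sum(counts.values())  # at least 1: the read always matches itself
    return {base: n / total for base, n in counts.items()}


def window_correct(reads, radius=2, error_rate=0.01):
    """Replace a base only if it is rare among locally identical reads."""
    corrected = []
    for index, read in enumerate(reads):
        fixed = list(read)
        for pos, base in enumerate(read):
            rates = local_rates(reads, index, pos, radius)
            if rates[base] < error_rate:
                fixed[pos] = max(rates, key=rates.get)
        corrected.append("".join(fixed))
    return corrected
```

With this restriction, a rare variant whose neighbourhood differs from the dominant sequence is compared only against its own kind and is left alone, while an isolated odd base inside an otherwise common neighbourhood is still corrected. The sketch compares every read against every other read at every position, so it is only practical for small inputs.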

Software used in this project:
Python: http://www.python.org/
Cython: http://cython.org/
MEGA (Molecular Evolutionary Genetics Analysis): http://www.megasoftware.net/
Muscle: http://www.drive5.com/muscle/
SHREC (SHort Read Error Correction method): http://ww2.cs.mu.oz.au/~schroder/shrec_www/
ClaMS (Classifier for Metagenomic Sequences): http://clams.jgi-psf.org/
NINJA (modified): http://nimbletwist.com/software/ninja/index.html
R-package: http://www.r-project.org/



5. Our approach:


7. Raw data length histogram (panels: Run 02, Run 04)

8. B. Correction with SHREC

9. C. Correction with our method:


14. Thank you. Contact: milko@3mhz.net

Editor's notes

Should the last two swap places?
Anything additional?
Def. title!
One more additional slide?
