Milko Krachunov (2), Ivan Popov (1), Valeria Simeonova (2), Irena Avdjieva (1),
Paweł Szczęsny (3), Urszula Zelenkiewicz (3), Piotr Zelenkiewicz (3),
Dimitar Vassilev (1)
(1) Bioinformatics group, AgroBioInstitute, Bulgaria
(2) Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, Bulgaria
(3) Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
Detection and correction of errors in metagenomic 16S RNA parallel sequencing
NGS errors – common problems
Errors are introduced into the assembled reads by imperfections of both biological and mathematical origin;
It is impossible to re-sequence the same sample in metagenomic studies;
The error rate tends to increase at every step of the process;
There is no easy way to differentiate between a “sequencing error” and a “rare variant”;
Many existing methods and algorithms address different aspects of the problem, but no unified solution is available;
Large amounts of data are difficult to process with common software.
Significance of 16S RNA sequencing
Highly conserved between different species of bacteria and archaea;
Sequence analysis is done with universal PCR primers;
Contains hypervariable regions that can provide species-specific signature sequences;
Suitable for phylogenetic studies;
Suitable for metagenomic studies.
General approach in metagenomic biodiversity studies
454 Sequencing
Filtering / Denoising
Multiple alignment
Distance matrix
OTU clusters with abundance count
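The last two steps of this pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which distance measure or clustering method is used, so the p-distance, the single-linkage grouping, the cut-off value and all function names below are hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing columns between two aligned reads.

    Columns where both reads have a gap are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances."""
    n = len(aligned)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = p_distance(aligned[i], aligned[j])
    return dist


def cluster_otus(dist, cutoff):
    """Single-linkage grouping with union-find.

    Reads whose distance is at most `cutoff` end up in the same OTU.
    Returns a Counter mapping an OTU representative to its abundance.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= cutoff:
            parent[find(i)] = find(j)
    return Counter(find(i) for i in range(n))


# Toy multiple alignment: three near-identical reads and two distant ones.
aligned_reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT",
                 "TTGTACCTAA", "TTGTACCTAA"]
otus = cluster_otus(distance_matrix(aligned_reads), cutoff=0.15)
abundances = sorted(otus.values(), reverse=True)
```

With this toy input the reads fall into two groups, which is the "OTU clusters with abundance count" output in miniature. An uncorrected sequencing error that pushes a read past the cut-off would create a spurious extra OTU, which is why denoising comes before this step.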
Our approach:
A. Raw data characteristics and processing
Two separate runs of metagenomic 16S RNA fragments, sequenced on the 454 platform and converted to FASTA format:
run 02 – 46429 short reads
run 04 – 41386 short reads
Our task – extract, denoise and correct only the high-quality reads.
[Figure: raw data length histograms for Run 02 and Run 04]
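A read-length histogram like the one on this slide can be computed directly from the FASTA files. The sketch below is a minimal example, not the project's code: the slides do not state how "quality reads" are selected, so the minimum-length and ambiguous-base criteria, the bin width and the function names are all assumptions. A small in-memory string stands in for a real run file.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(records, bin_width=50):
    """Count reads per length bin; each key is the lower bound of a bin."""
    return Counter((len(seq) // bin_width) * bin_width for _, seq in records)


def keep_quality_reads(records, min_length, max_ambiguous=0):
    """Assumed filter: keep reads that are long enough and have few N bases."""
    return [(h, s) for h, s in records
            if len(s) >= min_length and s.upper().count("N") <= max_ambiguous]


# Tiny stand-in for a run file (real input would be opened with open(path)).
demo = io.StringIO(">r1\nACGTACGTAC\nGT\n>r2\nACGN\n>r3\nACGTAC\n")
records = list(read_fasta(demo))
hist = length_histogram(records, bin_width=5)
kept = keep_quality_reads(records, min_length=6)
```

Here `r2` is dropped because it is both too short and contains an ambiguous base, while `r1` and `r3` are kept.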
B. Correction with SHREC
C. Correction with our method:
Classification and performance evaluation
ClaMS parameters:
Distance cut-off: 0.05
Signature type: DBC
k-mer length: 3
Existing taxonomy: 4th Level
Aim of the method – idea outline
To deal with the heterogeneous nature of the data, similar or related sequences are given more importance in the error evaluation.
The naïve approach: if a base is less common than the sequencer error rate, assume it is likely an error and replace it with the most common base.
Our modification: calculate the occurrence of the base in reads that are similar in the given region, assigning them bigger weights or using them exclusively.
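The difference between the two ideas can be illustrated with a short Python sketch. It assumes the reads are already aligned to equal length; the function names, the similarity weight (fraction of matching bases in a window around the position) and the exaggerated error rate in the demo are assumptions for illustration, not the implementation described in the slides.

```python
from collections import Counter


def naive_correct(aligned, error_rate):
    """Naive approach: replace any base rarer than `error_rate` in its
    alignment column with the most common base of that column."""
    columns = [Counter(col) for col in zip(*aligned)]
    n = len(aligned)
    corrected = []
    for read in aligned:
        fixed = []
        for base, counts in zip(read, columns):
            if counts[base] / n < error_rate:
                base = counts.most_common(1)[0][0]
            fixed.append(base)
        corrected.append("".join(fixed))
    return corrected


def weighted_rate(aligned, read_idx, pos, radius):
    """Occurrence rate of aligned[read_idx][pos], where every read is
    weighted by its similarity to the evaluated read in the window
    around `pos` (the position itself is excluded from the window)."""
    target = aligned[read_idx]
    lo, hi = max(0, pos - radius), min(len(target), pos + radius + 1)
    window = [i for i in range(lo, hi) if i != pos]
    total = matching = 0.0
    for read in aligned:
        if window:
            weight = sum(read[i] == target[i] for i in window) / len(window)
        else:
            weight = 1.0
        total += weight
        if read[pos] == target[pos]:
            matching += weight
    return matching / total if total else 0.0


# Eight reads of an abundant organism and two reads of a rare one.
reads = ["ACGTA"] * 8 + ["TTCAG"] * 2
flattened = naive_correct(reads, error_rate=0.25)
rare_rate = weighted_rate(reads, read_idx=8, pos=2, radius=2)
```

In this toy case the naive rule rewrites both rare reads into the abundant sequence, because each of their bases occurs in only 20% of the column. The weighted rate for the same base is 1.0, since only the reads that resemble the evaluated read in that region contribute to the count.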
Progress so far
Calculate the occurrence rate of every base among reads that are identical to the evaluated read within a window with a radius of n bases.
Preliminary results: the first basic implementation leads to an increase in the number of OTUs found with ClaMS.
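The window-based calculation described above might look roughly like the following sketch. It assumes pre-aligned reads of equal length and treats a base as an error when its rate among the context-matching reads falls below a threshold; the function names, the threshold and the replacement rule are assumptions for this example, not the authors' code.

```python
from collections import Counter


def window_counts(aligned, read_idx, pos, radius):
    """Base counts at `pos` among reads identical to the evaluated read
    in the window of `radius` bases on each side (pos itself excluded)."""
    target = aligned[read_idx]
    lo, hi = max(0, pos - radius), min(len(target), pos + radius + 1)

    def same_context(read):
        return (read[lo:pos] == target[lo:pos]
                and read[pos + 1:hi] == target[pos + 1:hi])

    return Counter(read[pos] for read in aligned if same_context(read))


def correct_read(aligned, read_idx, radius, error_rate):
    """Replace bases that are rare among the context-matching reads."""
    fixed = []
    for pos, base in enumerate(aligned[read_idx]):
        counts = window_counts(aligned, read_idx, pos, radius)
        total = sum(counts.values())  # at least 1: the read matches itself
        if counts[base] / total < error_rate:
            base = counts.most_common(1)[0][0]
        fixed.append(base)
    return "".join(fixed)


# Nine correct reads, one read with a single error, two reads of a rare organism.
reads = ["ACGTACGT"] * 9 + ["ACGAACGT"] + ["TTGCATTA"] * 2
fixed_error = correct_read(reads, read_idx=9, radius=2, error_rate=0.2)
kept_rare = correct_read(reads, read_idx=10, radius=2, error_rate=0.2)
```

In this example the isolated error is corrected because nine reads share its surrounding context but disagree on the base, while the rare reads are left untouched because no abundant read matches their context. Keeping such rare variants distinct is consistent with a higher OTU count after correction, although the toy data here says nothing about the real runs.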
Under development
A good choice of approach(es) for aligning the reads
Empirical evaluation of the parameters
Comparative evaluation of the variants of the approach
Software used in this project:
Python: http://www.python.org/
Cython: http://cython.org/
MEGA (Molecular Evolutionary Genetics Analysis): http://www.megasoftware.net/
Muscle: http://www.drive5.com/muscle/
SHREC (SHort Read Error Correction method): http://ww2.cs.mu.oz.au/~schroder/shrec_www/
ClaMS (Classifier for Metagenomic Sequences): http://clams.jgi-psf.org/
NINJA (modified): http://nimbletwist.com/software/ninja/index.html
R-package: http://www.r-project.org/
milko@3mhz.net
Thank you


Editor's notes

  1. Last two change places?
  2. Anything additional?
  3. Def. title!
  4. One more additional slide?