
vectorQC: 'A pipeline for assembling and annotation of vectors'


DNA vectors are widely used in molecular cloning, genetic engineering, gene-expression studies, and other applications. Sequence validation of vector DNA is a crucial quality-control step before a vector is used. As sequencing costs fall rapidly, it becomes cost-effective to verify vector quality with high-throughput sequencing and bioinformatics analysis. VectorQC is an automatic pipeline for the quality control of a collection of sequenced DNA vectors. The pipeline is built with the Nextflow framework and is distributed with a Docker container, which makes it easy to install, modify, and reuse on any Unix-compatible OS, whether on a single computer, a cluster, or the cloud.

Published in: Science

  1. vectorQC: A pipeline for assembling and annotation of vectors. Luca Cozzuto, Bioinformatics Core Facility
  2. Background: A vector is a DNA molecule used as a vehicle to carry foreign genetic material into a cell, where it can be replicated and/or expressed. The vector itself is generally a DNA sequence that consists of an insert (transgene) and a larger sequence that serves as the "backbone" of the vector.
  5. Background: figure showing a vector entering a host cell, where it is used either for amplification (cloning vector) or for expression (expression vector).
  6. Background: A vector is composed of several elements:
      • Origin of replication
      • Cloning sites: one or more target sites for restriction enzymes
      • Reporter genes: genes whose function is switched on or off by a successful insertion, so that positive colonies can be distinguished by colour
      • Antibiotic resistance: for selecting only the colonies that contain the vector
      • Promoter
      • …
     Figure: the pBR322 plasmid. Source: Wikipedia
  7. The problem: Vectors are now considered basic tools in biotechnology, and it is quite common for a lab or facility to maintain a library of vectors. Every year that passes increases the risk of mislabelling, construct degradation, and contamination. Quality control of the integrity of both the vector backbone and the inserted DNA can help avoid wasting time and money and reduce errors.
  8. Solution, involving three units: the Biomolecular Screening & Protein Technologies Unit, the Genomics Unit, and the Bioinformatics Unit
  11. Solution: a pool of vectors → massive sequencing → analysis with a reproducible pipeline → result: a report and a map of each vector, plus a database
  14. The pipeline, vectorQC: fragmented DNA → quality trimming and assembly → scaffolds / whole constructs → annotation of features (using a DB of features + a list of inserts) → annotations → generation of maps, report, and sequences
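Skewer trims the raw reads before assembly. Its exact algorithm is not described in the slides, so the sketch below swaps in a well-known alternative for illustration: the running-sum 3' quality-trimming rule used by BWA and cutadapt. The function name, the Phred+33 encoding, and the default cutoff of 20 are assumptions, not vectorQC settings.

```python
from typing import Tuple


def quality_trim_3prime(seq: str, qual: str, cutoff: int = 20,
                        offset: int = 33) -> Tuple[str, str]:
    """Trim low-quality bases from the 3' end of a read (hypothetical helper).

    Running-sum rule as in BWA/cutadapt: scanning from the 3' end, add
    (cutoff - quality) for each base and cut where the sum is largest.
    This is not necessarily what Skewer does internally.
    """
    running = 0
    best = 0
    cut = len(qual)  # default: keep the whole read
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - offset)  # Phred score from ASCII
        if running < 0:
            break  # good-quality bases dominate from here on; stop scanning
        if running > best:
            best = running
            cut = i
    return seq[:cut], qual[:cut]


# Usage: the two '#' (Phred 2) bases at the 3' end are removed.
print(quality_trim_3prime("ACGTACGT", "IIIIII##"))  # ('ACGTAC', 'IIIIII')
```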
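As part of read assembly, FLASH can optionally merge overlapping read pairs into a single longer read. The following is a simplified sketch of that idea, not FLASH's actual scoring: read 2 is reverse-complemented, each candidate overlap is scored by its mismatch ratio, and the best one is used to join the pair. It ignores base qualities, and the names and thresholds (`merge_pair`, `min_overlap`, `max_mismatch`) are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch: float = 0.1) -> Optional[str]:
    """Merge a read pair whose 3' ends overlap; return None if none fits."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc2 = revcomp(r2)  # put read 2 on the same strand as read 1
    best = None  # (mismatch ratio, overlap length)
    # Longest overlaps first, so ties are resolved in favour of longer ones.
    for ov in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-ov:], rc2[:ov]))
        ratio = mismatches / ov
        if ratio <= max_mismatch and (best is None or ratio < best[0]):
            best = (ratio, ov)
    if best is None:
        return None
    # Keep read 1 as-is and append the part of read 2 beyond the overlap.
    return r1 + rc2[best[1]:]


fragment = "ACGTTGCAAGGCTTACCGATCGGA"
r1 = fragment[:16]
r2 = revcomp(fragment[8:])  # read 2 comes from the opposite strand
print(merge_pair(r1, r2, min_overlap=5) == fragment)  # True
```

Ties in mismatch ratio are broken in favour of the longer overlap, which is less likely to be a chance match.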
  17. Quality control and trimming
      • FastQC: QC of the initial and trimmed reads
      • Skewer: trimming of the raw reads
      Read assembly
      • FLASH: merging of overlapping reads (optional)
      • SPAdes: assembly, corrected with a custom script to account for the circularity of the vector
      • Custom script: randomly joins the scaffolds into a single molecule
      Annotation
      • BLAST: annotation of features and, where present, detection of the DNA insert
      • Restrict (EMBOSS): detection of restriction enzyme sites
      • Circular Genome Viewer: generation of the maps
      • MultiQC: collection of the results into a comprehensive report
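The slides state that the SPAdes assembly is corrected with a custom script to address circularity, but the script is not shown. When a circular molecule such as a plasmid is assembled into a linear contig, the end of the contig commonly repeats its beginning. Assuming that is the artefact being corrected, a minimal sketch (hypothetical helper `trim_circular_overlap`, exact matches only) removes the duplicated end:

```python
def trim_circular_overlap(contig: str, min_overlap: int = 20) -> str:
    """Remove the copy of the contig start that reappears at its end.

    Hypothetical helper: finds the longest exact match between a prefix
    and a suffix of the contig (at least min_overlap bases, at most half
    the contig length) and drops the duplicated suffix.
    """
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if k > 0 and contig[:k] == contig[-k:]:
            return contig[:-k]  # the suffix duplicates the prefix
    return contig  # no terminal overlap found; leave unchanged


unit = "ACGTTGCAAGGCTTACCGATCGGA"
contig = unit + unit[:10]  # simulated contig whose end repeats its start
print(trim_circular_overlap(contig, min_overlap=5) == unit)  # True
```

A real script would probably also need to tolerate a few mismatches in the overlap, which this exact-match version does not.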
  19. Available resources
      • Database of features: from the PlasMapper tool, but can be expanded
      • Database of restriction enzymes: REBASE
      Custom resources
      • Insert list: a custom FASTA file with the names of the inserts
      https://github.com/biocorecrg/vectorQC
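Restriction sites are detected with EMBOSS Restrict using the REBASE database. The sketch below illustrates only the underlying search, not Restrict itself: because a vector is circular, a recognition site can span the origin, so the first `len(site) - 1` bases are appended before searching, and both strands are scanned. The function name and the two-enzyme table are hypothetical; EcoRI (GAATTC) and BamHI (GGATCC) serve only as examples.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites_circular(seq: str, site: str) -> List[int]:
    """0-based forward-strand starts where the site or its reverse complement occurs.

    Only plain A/C/G/T recognition sites are handled; IUPAC ambiguity
    codes used by many REBASE entries would need pattern expansion.
    """
    seq, site = seq.upper(), site.upper()
    # Append the first len(site) - 1 bases so matches can span the origin.
    extended = seq + seq[:len(site) - 1]
    hits = set()
    for pattern in {site, revcomp(site)}:  # a set: palindromes searched once
        start = extended.find(pattern)
        while start != -1:
            hits.add(start)
            start = extended.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical two-enzyme table; a real run would load sites from REBASE.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}
plasmid = "TTCAAGGATCCAAAAGAA"  # the EcoRI site spans the origin
for name, site in ENZYMES.items():
    print(name, find_sites_circular(plasmid, site))
# EcoRI [15]
# BamHI [5]
```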
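Insert detection relies on BLAST searches against the custom FASTA file of inserts. As a lighter-weight stand-in for illustration only (a swapped-in technique, not what vectorQC does), the sketch below parses that kind of FASTA file and reports each insert whose k-mers are largely present on either strand of the assembled vector. The function names, the default `k = 15`, and the 0.9 threshold are assumptions.

```python
from typing import Dict, List, Optional, Set

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record name (first word of the header) to its sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def kmer_set(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def detect_inserts(vector: str, inserts_fasta: str, k: int = 15,
                   min_fraction: float = 0.9) -> Dict[str, float]:
    """Report inserts whose k-mers are (mostly) present in the vector."""
    vector = vector.upper()
    circular = vector + vector[:k - 1]  # include k-mers across the origin
    vector_kmers = kmer_set(circular, k) | kmer_set(revcomp(circular), k)
    found = {}
    for name, seq in parse_fasta(inserts_fasta).items():
        insert_kmers = kmer_set(seq, k)
        if not insert_kmers:
            continue  # insert shorter than k
        fraction = len(insert_kmers & vector_kmers) / len(insert_kmers)
        if fraction >= min_fraction:
            found[name] = fraction
    return found


inserts = ">ins1 example\nACGTTGCAAGGC\nTTACCGATCGGA\n>ins2\nATATATATATATATAT\n"
vector = "TTTTTTTTTT" + "ACGTTGCAAGGCTTACCGATCGGA" + "GGGGGGGGGG"
print(detect_inserts(vector, inserts, k=8))  # {'ins1': 1.0}
```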
  26. Good practices
      • Continuous integration
      • Docker image on Docker Hub with automated builds
  27. Next developments
      • Improving the assembly: removing low-coverage contigs
      • Comparison with a reference: if one is provided, checking the concordance of the contigs with it
      • Detection of variants: SNP / indel calling against the reference, if provided
      https://github.com/biocorecrg/vectorQC
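Removing low-coverage contigs is listed as a next development, without a specific criterion. One possible approach, sketched here as an assumption: SPAdes contig headers typically encode a coverage value (e.g. `NODE_1_length_5369_cov_48.38`), so contigs can be kept only if their coverage is at least a fraction of that of the best-covered contig. The 10% relative cutoff and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the coverage field of SPAdes-style names like "NODE_1_length_5369_cov_48.38".
COV_RE = re.compile(r"_cov_(\d+(?:\.\d+)?)")


def contig_coverage(header: str) -> float:
    match = COV_RE.search(header)
    if match is None:
        raise ValueError(f"no coverage field in header: {header!r}")
    return float(match.group(1))


def filter_low_coverage(records: List[Tuple[str, str]],
                        min_relative: float = 0.1) -> List[Tuple[str, str]]:
    """Keep contigs whose coverage is at least min_relative x the highest."""
    if not records:
        return []
    coverages = [contig_coverage(header) for header, _ in records]
    cutoff = max(coverages) * min_relative
    return [rec for rec, cov in zip(records, coverages) if cov >= cutoff]


contigs = [
    ("NODE_1_length_5000_cov_250.5", "ACGT"),
    ("NODE_2_length_300_cov_3.1", "GG"),
    ("NODE_3_length_800_cov_40.0", "TT"),
]
print([h for h, _ in filter_low_coverage(contigs)])
# ['NODE_1_length_5000_cov_250.5', 'NODE_3_length_800_cov_40.0']
```

A relative cutoff adapts to samples sequenced at different depths, which matters when many vectors are pooled in one run.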
  28. Thank you! Toni Hermoso Pulido, Julia Ponomarenko, Sarah Bonnin, Jochen Hecht (Genomics Unit), Carlo Carolis (BS&PT Unit)
