DNA vectors are widely used in molecular cloning, gene engineering, studies of gene expression and other applications. Sequence validation of vector DNA is a crucial quality control step before using the vector. With the cost of sequencing rapidly decreasing, it has become cost-effective to ensure vector quality using high-throughput sequencing and bioinformatics analysis. VectorQC is an automatic pipeline for quality control of a collection of sequenced DNA vectors. The pipeline is built using the Nextflow framework and is distributed with a Docker container, which makes the pipeline easy to install, modify, and re-use on any Unix-compatible OS on a computer, cluster or cloud.
2. Background
A vector is a DNA molecule used as a vehicle to carry foreign genetic material into a
cell, where it can be replicated and/or expressed.
The vector itself is generally a DNA sequence that consists of an insert (transgene) and
a larger sequence that serves as the "backbone" of the vector.
5. Background
[Figure: a vector and a host cell, with two uses of the vector: amplification (cloning vector) and expression (expression vector)]
6. Background
A vector is composed of different elements:
• Origin of replication
• Cloning sites: one or more targets for restriction enzymes
• Reporter genes: genes whose function is activated or inactivated by a successful insertion, which colours the positive colonies
• Antibiotic resistance: for selecting only the colonies containing the vector
• Promoter
• …
[Figure: the pBR322 plasmid. Source: Wikipedia]
7. The problem
Nowadays vectors are considered basic tools in biotechnology, and having a library of vectors in a lab or facility is quite common.
With each passing year, the risk of mislabelling, construct degradation and contamination increases.
Quality control of the integrity of the vector backbone and of the inserted DNA could help avoid wasting time and money and reduce errors.
14. vectorQC
Workflow:
• Fragmented DNA → quality trimming and assembly → scaffolds / whole constructs
• Scaffolds / whole constructs + DB of features + list of inserts → annotation of features → annotations
• Annotations → generating maps; generating report and sequences
17. Quality control and trimming
• FastQC: QC of initial and trimmed reads
• Skewer: trimming of the raw reads
Read assembly
• FLASH: merging of overlapping reads (optional)
• SPAdes: assembly, which is then corrected by a custom script to account for circularity
• Custom script: joins the scaffolds, in random order, into a single molecule
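A minimal Python sketch of how a circularity correction could work. It is not taken from the vectorQC script. It assumes that a circular construct assembled as a linear contig ends with a repeat of its own start, and it removes that repeat. The function name and the min_overlap parameter are illustrative.

```python
def trim_circular_overlap(contig, min_overlap=10):
    """Remove a terminal repeat from a linear contig of a circular molecule.

    Assumption (not from vectorQC): the end of the contig repeats its start.
    Returns (trimmed_sequence, overlap_length); overlap_length is 0 when no
    repeat of at least min_overlap bases is found.
    """
    seq = contig.upper()
    # Try the longest candidate first; an overlap cannot exceed half the contig.
    for size in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:size] == seq[-size:]:
            return seq[:-size], size
    return seq, 0


# Toy example: a 36 bp "plasmid" whose first 12 bases are repeated at the end.
plasmid = "ATGGCCAAGTTGACCAGTGCCGTTCCGGTGCTCACC"
linear_contig = plasmid + plasmid[:12]
trimmed, overlap = trim_circular_overlap(linear_contig)
print(len(linear_contig), len(trimmed), overlap)
```

Real data would need tolerance for mismatches in the repeat, which this exact-match sketch does not provide.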
Annotation
• BLAST: annotating features and, where present, detecting the DNA insert
• restrict (EMBOSS): detecting restriction enzyme sites
• Circular Genome Viewer: generating the maps
• MultiQC: collecting the results in a comprehensive report
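To illustrate what detecting restriction enzyme sites on a circular construct involves, here is a small Python sketch. It is a simplified stand-in, not the EMBOSS restrict program: it handles only the three palindromic recognition sites listed (EcoRI, BamHI, HindIII), uses exact matching, and reports where each site starts rather than where the enzyme cuts. The function and variable names are illustrative.

```python
# Recognition sequences of three common enzymes (all palindromic, so
# searching one strand is enough for these particular sites).
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence, enzymes=RECOGNITION_SITES, circular=True):
    """Return {enzyme: [1-based start positions]} for exact site matches."""
    seq = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        # For a circular molecule, append the first len(site) - 1 bases so
        # that a site spanning the origin is also found.
        searchable = seq + seq[: len(site) - 1] if circular else seq
        positions = []
        start = searchable.find(site)
        while start != -1:
            positions.append(start + 1)
            start = searchable.find(site, start + 1)
        hits[name] = positions
    return hits


# Toy sequence: the EcoRI site is split across the origin (GAA...TTC).
toy = "TTCAAGCTTGGGGATCCAAAGAA"
print(find_sites(toy))
print(find_sites(toy, circular=False))
```

Treating the sequence as circular matters here: in the toy example the EcoRI site is only visible when the end of the sequence is joined back to its start.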
19. Available resources
• Database of features: from the PlasMapper tool, but can be expanded
• Database of restriction enzymes: REBASE
Custom resources
• Insert list: custom FASTA file with the names of the inserts
https://github.com/biocorecrg/vectorQC
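As an illustration of how an insert list in FASTA format could be used, the Python sketch below reads the records and reports which inserts occur in a circular construct. Exact substring matching on both strands is used here as a simple stand-in for the BLAST search; the insert names, sequences and function names are hypothetical examples, not part of vectorQC.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def detect_inserts(construct, inserts):
    """Names of inserts found (exact match, either strand) in a circular construct."""
    seq = construct.upper()
    doubled = seq + seq  # lets a match run across the origin
    found = []
    for name, insert in inserts.items():
        if not insert or len(insert) > len(seq):
            continue
        if insert in doubled or reverse_complement(insert) in doubled:
            found.append(name)
    return found


# Hypothetical insert list with two toy records.
insert_list = """>insertA example insert
ATGGTGAGCAAG
GGCGAG
>insertB
CCCCCCCCCCCC
"""
inserts = parse_fasta(insert_list)
print(detect_inserts("GGCGAGTTTTAAAATTTTATGGTGAGCAAG", inserts))
```

Unlike BLAST, exact matching misses inserts that carry a mutation, so this only shows the data flow from insert list to detected insert.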
27. Next developments
• Improving the assembly: removing low-coverage contigs
• Comparison with a reference: if a reference is provided, check the concordance of the contigs with it
• Detection of variants: SNP / indel calling against the reference, if provided
https://github.com/biocorecrg/vectorQC
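One possible approach to removing low-coverage contigs, sketched in Python. It is only an illustration of the planned feature, not an implemented part of vectorQC. It assumes contig names in the SPAdes style (NODE_<id>_length_<n>_cov_<x>) and drops contigs whose coverage is below a fraction of the highest coverage; the function name and the min_fraction threshold are illustrative choices.

```python
import re

# SPAdes-style contig name, e.g. NODE_1_length_5432_cov_120.5
HEADER = re.compile(r"^NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def filter_low_coverage(contigs, min_fraction=0.1):
    """Keep contigs whose coverage is at least min_fraction of the maximum.

    contigs maps contig name to sequence. Names that do not follow the
    SPAdes pattern are kept, since their coverage is unknown.
    """
    coverage = {}
    for name in contigs:
        match = HEADER.match(name)
        if match:
            coverage[name] = float(match.group(3))
    if not coverage:
        return dict(contigs)
    threshold = min_fraction * max(coverage.values())
    return {
        name: seq
        for name, seq in contigs.items()
        if name not in coverage or coverage[name] >= threshold
    }


# Toy example: the second contig has far lower coverage than the first.
example = {
    "NODE_1_length_20_cov_150.25": "A" * 20,
    "NODE_2_length_12_cov_2.10": "C" * 12,
}
print(list(filter_low_coverage(example)))
```

A relative threshold is used because absolute coverage varies from one sequenced vector to another; an absolute cut-off would be an equally reasonable choice.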
28. Thank you!
Toni Hermoso Pulido
Julia Ponomarenko
Sarah Bonnin
Jochen Hecht (Genomics Unit)
Carlo Carolis (BS&PT Unit)