Use of NCBI Databases in qPCR Assay Design

Integrated DNA Technologies
Elisabeth Wagner, PhD
Scientific Applications Specialist

1
Session Outcomes
 You will:
 Learn which NCBI tools are useful for designing qPCR assays
 Become proficient using tools for qPCR design in the IDT SciTools® suite
 Navigate the features and tools available on the NCBI website
 Obtain sequence information for your gene of interest
 Perform a BLAST search for assay specificity
 Search for SNPs
 Understand how to proceed with a basic qPCR design

2
qPCR Design Covers A Lot of Ground
There are many uses for quantitative PCR.
Some examples:
 Gene expression
 Copy number variation
 Genotyping
 Multi-species analysis
 Splice variant specific (or common) expression
We will address the general considerations for design in this session,
and cover more specific examples later this afternoon.

3
SciTools® Overview
 http://www.idtdna.com/pages/scitools
 Several tools are available in the IDT SciTools® suite to assist with qPCR design
 1. RealTime PCR Tool
 2. PrimerQuest® Tool
 3. OligoAnalyzer® Tool
 4. PrimeTime® Predesigned qPCR Assay Database

4
NCBI Databases Overview:
 1. Obtain sequence information for your gene of interest
 NCBI Nucleotide or Gene
 2. Perform a BLAST search for assay specificity
 NCBI BLAST
 3. Search for SNPs
 NCBI dbSNP
NCBI lets you access all of the information needed for design
in one location.

5
Using NCBI Databases for Custom qPCR Assay Design

NCBI Overview (National Center for Biotechnology Information)
 Founded in 1988 as part of the United States National Library of Medicine
 Houses a series of databases relevant to biotechnology and biomedicine
 Curates GenBank, a database of over 1 × 10^12 bp of DNA sequences
 Maintains the Gene database, which integrates gene-specific information from numerous species
 Maintains dbSNP, a database of reported Single Nucleotide Polymorphisms (SNPs)
 Provides the BLAST sequence similarity search program
 Maintains PubMed, a journal database for biomedical literature
 Much, much more information!
6

NCBI Database Search: Sequence Information for qPCR Assay Design
http://www.ncbi.nlm.nih.gov/
7

NCBI Sequence Files
Files:
 Can be entered by anyone
 May or may not be checked for accuracy
 May contain contaminated sequence (plasmid or other)
 May contain annotation errors
Accession numbers:
 Letters at the beginning indicate the type of file
 Nucleotide sequences start with 1 or 2 letters:
8

The RefSeq Database
 non-redundant
 explicitly linked nucleotide and protein
sequences
 ongoing curation by NCBI staff and
collaborators, with reviewed records
indicated
 includes data validation and
format consistency
 distinct accession numbers
 all accessions include an
underscore '_' character
 Different versions are tracked
9

RefSeq Accession Numbers
 mRNAs and Proteins
 NM_123456 Curated mRNA
 NP_123456 Curated Protein
 NR_123456 Curated non-coding RNA
 XM_123456 Predicted mRNA
 XP_123456 Predicted Protein
 XR_123456 Predicted non-coding RNA
 Gene Records
 NG_123456 Reference Genomic Sequence
 Chromosome
 NC_123455 Microbial replicons, organelle
genomes, human chromosomes
 AC_123455 Alternate assemblies
 Assemblies
 NT_123456 Contig
 NW_123456 WGS Supercontig
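The prefix list above can be turned into a small lookup. This is a minimal sketch, assuming you only need the prefixes shown on this slide; the function name `describe_accession` is hypothetical and not part of any NCBI or IDT tool.

```python
# Prefix-to-meaning table copied from the RefSeq accession list above.
REFSEQ_PREFIXES = {
    "NM": "Curated mRNA",
    "NP": "Curated protein",
    "NR": "Curated non-coding RNA",
    "XM": "Predicted mRNA",
    "XP": "Predicted protein",
    "XR": "Predicted non-coding RNA",
    "NG": "Reference genomic sequence",
    "NC": "Chromosome (microbial replicons, organelle genomes, human chromosomes)",
    "AC": "Alternate assembly",
    "NT": "Contig",
    "NW": "WGS supercontig",
}


def describe_accession(accession: str) -> str:
    """Return the record type for a RefSeq-style accession, or a fallback message."""
    prefix, underscore, _ = accession.strip().upper().partition("_")
    if not underscore:
        # All RefSeq accessions include an underscore, so this is something else.
        return "Not a RefSeq accession (no underscore)"
    return REFSEQ_PREFIXES.get(prefix, "Unknown RefSeq prefix")


print(describe_accession("NM_123456"))
print(describe_accession("XR_123456.2"))
```

A check like this can help when deciding whether a record is curated (N-prefixed) or predicted (X-prefixed) before using it as a design template.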
10

Accessing Sequence Information in NCBI
11

NCBI Gene Database Information: Gene Search
12

Sequence Data Searches Using Nucleotide
 Sequence Files
 mRNA and genomic
 Transcript variants
http://www.ncbi.nlm.nih.gov/nuccore
13

Data Retrieval: Graphics View
15

Data Retrieval: FASTA Sequence Format
16

17
Using PrimerQuest® Tool for Custom qPCR Designs

18
PrimerQuest® Tool for Generating Custom qPCR Designs
Highly customizable tool

19
You Can Use NCBI Accession Number or FASTA Sequence

20
Once a Sequence Is Entered, 3 Defaults Become Available
Often you will need to adjust the parameters of the tool to meet
experimental design requirements

21
PrimerQuest® Tool Assay Output

22
Parameter Changes Depend on the Assay Required
Before changing anything, make sure you have selected the correct assay
Sometimes you simply need to increase the number of designs returned
It is unlikely that you will need to change these parameters

23
Directing the Design to a Specific Region
Target a particular “junction”

24
Examples
Excluded region: 260-280
Excluded region (probe): 260-280
Target region: 260-280

25
Changing Primer/Probe Parameters
If the target is particularly biased (AT or GC rich), you may need to change primer/probe
parameters (e.g., length)

26
Once the Initial Design Is Complete, Go Back to NCBI
Use NCBI tools to:
 Check whether the assay is specific (BLAST)
 Ensure there are no SNPs to worry about (dbSNP)
Use the IDT OligoAnalyzer® Tool to:
 Check primers (and probe) for secondary structure and dimer formation

27
Using NCBI BLAST to Check for Primer Specificity

28
What is BLAST?—Getting to BLAST
 http://www.ncbi.nlm.nih.gov/
 Or http://blast.ncbi.nlm.nih.gov/Blast.cgi

29
What is BLAST (Basic Local Alignment Search Tool)?
 BLAST is provided by the National Center for Biotechnology Information (NCBI)
 Aligns a user-defined query (sequence) to a wide variety of databases
 Can translate the query or the database to align sequences
 Can align 2 or more sequences together
 Uses a heuristic algorithm to create alignments very quickly
 Breaks sequences into “words” and searches the database for matches
 Reassembles these matches based on the criteria entered

30
What is BLAST?—Basic BLAST

31
How BLAST Works—Words
 BLAST divides the query sequence into subsets called “words”,
which the algorithm uses to perform the alignment
 Example (35 nt sequence):
CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT
 All possible words that can be generated from the sequence are
used for the alignment
 With 7-letter words, the maximum number of words for this sequence is 29 (35 - 7 + 1)
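The word count for the example sequence can be reproduced with a sliding window. This is a minimal sketch of the word-splitting idea only, not of the BLAST algorithm itself; the function name `blast_words` is hypothetical.

```python
def blast_words(sequence: str, word_size: int = 7) -> list[str]:
    """Return every overlapping word of length word_size from the sequence."""
    return [
        sequence[start:start + word_size]
        for start in range(len(sequence) - word_size + 1)
    ]


# The 35 nt example sequence from the slide.
query = "CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT"
words = blast_words(query)

print(len(words))   # 29 words: 35 - 7 + 1
print(words[0])     # first word
print(words[-1])    # last word
```

A sequence of length L yields at most L - W + 1 words of size W, which is why short primers produce very few words at the default word size and benefit from a smaller one.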

32
Overview—Definitions
 Hit: a sequence to which the query is aligned and that is returned in the
BLAST results
 Identity: the extent of exact matches between 2 sequences (e.g.,
ACGT and ACGG have 75% identity)
 Similarity = Positives (in BLAST scoring)
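The identity example can be checked directly. This is a minimal sketch, assuming the two sequences are already aligned, have the same length, and contain no gaps; `percent_identity` is a hypothetical helper, not a BLAST function.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with exactly matching bases in two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


print(percent_identity("ACGT", "ACGG"))  # 3 of 4 positions match: 75.0
```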

33
How BLAST Works—Scores
 The BLAST raw score is converted to a bit score for each alignment using
parameters based on statistics described in Karlin and Altschul (1990)
(www.ncbi.nlm.nih.gov/pmc/articles/PMC53667/pdf/pnas01031-0226.pdf).
 A high score does not necessarily indicate that the query is unique
 The score depends only on the alignment, the length of the sequence, and the
length of the database
 The E-value is the expected number of alignments with an equivalent or better
score that would occur by chance
 Calculated using the bit score and the lengths of the query and database
 Tells you the relative strength of the alignment
 Shorter sequences have higher E-values because the probability of finding that
sequence by chance is higher
 A low E-value does not mean you have a unique match!
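The relationship between bit score, sequence lengths, and E-value can be sketched with the standard formula E = m × n × 2^(-S'), where S' is the bit score and m and n are the query and database lengths. This is a simplified illustration: BLAST itself uses adjusted "effective" lengths, and the bit scores and database size below are hypothetical numbers chosen only to show the trend.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2^(-S'): expected number of chance alignments scoring at least S'."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values for illustration only.
DB_LEN = 3_000_000_000  # assumed database size in bases

# A short primer can only reach a modest bit score, even with a perfect match.
short_hit = expected_chance_hits(bit_score=40.0, query_len=20, db_len=DB_LEN)
# A long query with a perfect match reaches a much higher bit score.
long_hit = expected_chance_hits(bit_score=400.0, query_len=200, db_len=DB_LEN)

print(f"short query E-value: {short_hit:.3g}")
print(f"long query E-value:  {long_hit:.3g}")
```

Because each extra bit of score halves the E-value, short primer queries rarely produce the very small E-values seen with full-length transcripts, so E-values for primers are best compared against each other rather than against a fixed cutoff.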

34
BLAST Assessment for qPCR Primers
 Go to the BLAST server:
 http://blast.ncbi.nlm.nih.gov/Blast.cgi
Enter both primer sequences as a single query,
separated by 7 or more Ns
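Building the combined query by hand is error-prone, so it can help to generate it. This is a minimal sketch; the function name `build_primer_query` and the two primer sequences are hypothetical placeholders, and the default spacer of 10 Ns is an arbitrary choice that satisfies the "7 or more" guideline.

```python
def build_primer_query(forward: str, reverse: str, spacer_len: int = 10) -> str:
    """Join two primers with a run of Ns so BLAST treats them as separate segments."""
    if spacer_len < 7:
        raise ValueError("use a spacer of at least 7 Ns")
    return forward.upper() + "N" * spacer_len + reverse.upper()


# Hypothetical primer sequences, for illustration only.
forward_primer = "CGATCGGGCATCACACAAAG"
reverse_primer = "ATTTCTACTACATAACTTTG"

print(build_primer_query(forward_primer, reverse_primer))
```

The resulting string can be pasted into the BLAST query box as one sequence.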

35
Select the Correct Database
“Others” is the most general but contains a lot of sequences. If possible, use
Human- or Mouse-specific databases
For species with completed genome projects, consider using “NCBI Genomes”
to limit BLAST results

36
Change the Parameters of the BLAST Scoring
Select a less rigorous algorithm
Change Word size to “7”

37
Looking at the Results
The Graphic Summary can immediately give you a sense of what the overall results are
Hover over each result in the graphic to identify the sequence name

38
Then Look at Results List
Look at E-value and Query Coverage. Look for jumps in either/both.
Looks like the assay is specific to a single gene by transcript
Ignore the “alternate” chromosome assemblies

39
Investigate details of alignment
Check distance between primer binding sites if looking at mRNA
Open Graphics result in a new tab/window

40
BLAST Shows Primer Aligned to Sequence
Zoom out with “-” sign
You can grab within the window and drag the sequence side to side

41
The Target Gene is on Chromosome 6
This looks promising with primers on different exons.

42
But We Had Other Chromosomal Hits...
“real” transcript
Pseudogene: doesn’t look transcribed
Primers (red bar indicates mismatch)

43
And Another One...
Another pseudogene.
But what’s this?
Intron of a transcribed gene, so potentially present in RNA samples.
Recommend avoiding if possible.

44
Using NCBI to Check for SNPs

45
While Assessing BLAST Results, Also Assess for SNPs

46
Investigate SNPs in Primer Binding Sites

47
Assessing SNP Data
Tells you it’s a single base substitution
Indicates alternate forms (here recorded on opposite strand)
Indicates allele frequency if known
Sometimes more frequency data at bottom of page

48
SNP Data Roughly Divided by Risk
Trusted source
Very low frequency
No data; not likely to be problematic
Significant risk; redesign if possible

49
Using OligoAnalyzer® Tool to Check Primers and Probes

50
Checking Primers with OligoAnalyzer® Tool
 PrimerQuest® design tools give you the “best” assays for the region
specified
 They check for self- and hetero-dimers, but this is only part of the scoring
system used
 An assay may be “better” even with dimer issues if it scores well on
other parameters
 Go to the OligoAnalyzer Tool
 Perform self-dimer checks for primers and probe
 Perform heterodimer checks on all primer/probe combinations (especially
important to include all combinations when multiplexing)
 Check hairpin structures.
 Look for structures with stability (ΔG) of < -9 kcal/mol
 Or multiple hairpins forming with ΔG < -4 kcal/mol
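The two thresholds above can be applied to a list of predicted ΔG values. This is a minimal sketch, assuming you have copied the ΔG values (in kcal/mol) for each predicted structure out of the OligoAnalyzer Tool by hand; `needs_review` is a hypothetical helper and the example values are made up.

```python
def needs_review(delta_gs: list[float]) -> bool:
    """Flag an oligo if any structure is below -9 kcal/mol,
    or if two or more structures are below -4 kcal/mol."""
    any_very_stable = any(dg < -9.0 for dg in delta_gs)
    several_moderately_stable = sum(dg < -4.0 for dg in delta_gs) >= 2
    return any_very_stable or several_moderately_stable


# Hypothetical delta-G values for illustration only.
print(needs_review([-9.5]))         # one very stable structure
print(needs_review([-4.5, -5.0]))   # multiple moderately stable structures
print(needs_review([-1.2, -2.3]))   # nothing notable
```

A flag here is only a prompt to look at the structure itself; whether it matters depends on details such as whether the 3′ end is extendable.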

51
Assessing Dimer Data
Looks stable (< -9 kcal/mol), but this is not “dangerous”; avoid if possible, but OK
Looks stable (< -9 kcal/mol); not extendable, so not a problem
Doesn’t look stable (> -9 kcal/mol), but there is danger of extension and exponential amplification!

52
Assessing Hairpin Structures
 Based on UNAfold predictions

IDT PrimeTime® Predesigned qPCR Database
53

54
Primer and Probe Design Criteria for PrimeTime® Assays
 Primers
 equal Tm (60–63 °C)
 15–30 bases in length
 no runs of 4 or more Gs
 amplicon size 50–150 bp (max 400 bp)
 Probe
 length no longer than 30–35 bases
 Tm value 4–10 °C higher than primers
 no runs of 4 or more consecutive Gs
 G+C content 30–80%
 no G at the 5′ end
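The sequence-only criteria above (length, G runs, G+C content, 5′ base) can be checked with a few lines of code. This is a minimal sketch: it does not calculate Tm or amplicon size, the function names are hypothetical, and the 35-base probe limit is taken as the upper end of the "30–35 bases" range.

```python
import re


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def has_g_run(seq: str, run_length: int = 4) -> bool:
    """True if the sequence contains run_length or more consecutive Gs."""
    return re.search("G{%d,}" % run_length, seq.upper()) is not None


def check_primer(seq: str) -> list[str]:
    """Return a list of criteria the primer fails (empty list means none)."""
    problems = []
    if not 15 <= len(seq) <= 30:
        problems.append("length outside 15-30 bases")
    if has_g_run(seq):
        problems.append("run of 4 or more Gs")
    return problems


def check_probe(seq: str, max_len: int = 35) -> list[str]:
    """Return a list of criteria the probe fails (empty list means none)."""
    problems = []
    if len(seq) > max_len:
        problems.append("probe too long")
    if has_g_run(seq):
        problems.append("run of 4 or more Gs")
    if not 30.0 <= gc_percent(seq) <= 80.0:
        problems.append("G+C content outside 30-80%")
    if seq.upper().startswith("G"):
        problems.append("G at the 5' end")
    return problems


# Hypothetical sequences, for illustration only.
print(check_primer("ACGTACGTACGTACGTAC"))
print(check_probe("GATTACAGATTACAGATTACA"))
```

Tm and dimer checks still need a thermodynamic tool such as the OligoAnalyzer Tool; this sketch only screens out sequences that fail the simple rules.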
