DNA Day
Hospital Universitario La Paz, Madrid, Spain, April 28th, 2014
The first server of the Spanish Population Variability.
Freely available: http://ciberer.es/bier/exome-server/
See also related tools:
BiERapp: http://bierapp.babelomics.org (to help in the prioritization of disease genes)
TEAM: http://team.babelomics.org (to manage gene panels for targeted resequencing-based diagnostics)
1. CIBERER Exome Server (CES) The server of the Spanish Population Variability
Joaquín Dopazo, PhD
Department of Computational Genomics,
CIPF, Valencia
Hospital Universitario La Paz, Madrid
April 28, 2014
2. Why is it interesting to have a Spanish exome variant repository?
Rationale: Local variability is more important than previously thought. The existence of numerous local rare variants, many of them (apparently) deleterious, hampers the prioritization of disease variants.
Data recycling: CIBERER has accumulated a large number of samples that can be used as (pseudo)controls of the normal population
3. Pipeline of data analysis
•Primary processing (primary analysis): FASTQ file → Initial QC → Mapping → BAM file → Variant calling → VCF file
•Secondary analysis (successive filtering): Variant annotation → Filtering by effect → Filtering by MAF (1000 genomes, EVS, local variants) → Filtering by family segregation
•Gene prioritization: knowledge-based prioritization (proximity to other known disease genes: functional proximity, network proximity), burden tests and other prioritization methods
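The successive filtering of the secondary analysis can be sketched in a few lines. This is a minimal illustration, not the BiER pipeline: the `Variant` record, the effect terms in `DAMAGING_EFFECTS`, the 1% MAF cut-off and the dominant segregation rule are all assumptions chosen for the example, and the gene names are placeholders. The MAF step keeps a variant only if it is rare in every source, so adding the local controls as one more source can only remove candidates.

```python
from dataclasses import dataclass, field

# Hypothetical set of consequence terms treated as potentially damaging
DAMAGING_EFFECTS = {"missense_variant", "stop_gained", "frameshift_variant",
                    "splice_donor_variant", "splice_acceptor_variant"}


@dataclass
class Variant:
    gene: str
    effect: str
    # Minor allele frequency per reference source, e.g. {"1000g": 0.0, "EVS": 0.0, "local": 0.0}
    maf: dict = field(default_factory=dict)
    # Alternative-allele dosage (0, 1 or 2) per family member
    genotypes: dict = field(default_factory=dict)


def filter_by_effect(variants):
    return [v for v in variants if v.effect in DAMAGING_EFFECTS]


def filter_by_maf(variants, max_maf=0.01):
    # Keep a variant only if it is rare in every source, local controls included
    return [v for v in variants if max(v.maf.values(), default=0.0) <= max_maf]


def filter_by_segregation(variants, affected, unaffected):
    # Dominant model: every affected member carries it, no unaffected member does
    return [v for v in variants
            if all(v.genotypes.get(s, 0) >= 1 for s in affected)
            and all(v.genotypes.get(s, 0) == 0 for s in unaffected)]


def prioritize(variants, affected, unaffected, max_maf=0.01):
    # Successive filtering: effect -> MAF -> family segregation
    kept = filter_by_effect(variants)
    kept = filter_by_maf(kept, max_maf)
    kept = filter_by_segregation(kept, affected, unaffected)
    return sorted({v.gene for v in kept})


# Toy family: I-1 (affected parent), I-2 (unaffected parent), II-1 (affected child)
variants = [
    Variant("GENE_A", "missense_variant", {"1000g": 0.0, "EVS": 0.0, "local": 0.0},
            {"I-1": 1, "I-2": 0, "II-1": 1}),
    Variant("GENE_B", "stop_gained", {"1000g": 0.0, "EVS": 0.0, "local": 0.04},
            {"I-1": 1, "I-2": 0, "II-1": 1}),
    Variant("GENE_C", "synonymous_variant", {"1000g": 0.0, "EVS": 0.0, "local": 0.0},
            {"I-1": 1, "I-2": 0, "II-1": 1}),
]
candidates = prioritize(variants, affected=["I-1", "II-1"], unaffected=["I-2"])

# Same data, ignoring the local source: GENE_B is no longer filtered out
no_local = [Variant(v.gene, v.effect, {k: f for k, f in v.maf.items() if k != "local"},
                    v.genotypes) for v in variants]
candidates_no_local = prioritize(no_local, affected=["I-1", "II-1"], unaffected=["I-2"])
```

With these toy data, GENE_C is removed by the effect filter and GENE_B only by the local frequency, so the candidate list shrinks from two genes to one when local variants are used.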
4. Using known variants and their population frequencies for filtering
•Typically, dbSNP, 1000 genomes and the 6515 exomes from the ESP are used as sources of population frequencies.
•We selected 75 local controls to add an extra filtering step to the analysis pipeline
Novembre et al., 2008. Genes mirror geography within Europe. Nature
Comparison of Spanish controls to 1000g
How important do you think local information is for detecting disease genes?
5. Filtering with or without local variants
Number of genes as a function of the number of individuals studied, for a dominant disease (autosomal dominant retinitis pigmentosa)
The use of local variants makes an enormous difference
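The effect in the figure can be reproduced qualitatively with a toy model. The sketch below assumes (the slide does not say so) that the curve counts genes carrying rare candidate variants in all of the first k affected, unrelated individuals; the genes, local frequencies and the 1% cut-off are invented for illustration. The local filter drops genes whose variants are common in the Spanish controls, so fewer genes survive for the same number of individuals.

```python
# Hypothetical toy data: rare candidate variants in each unrelated affected
# individual, as (gene, local_maf) pairs; None = not seen in the local controls.
affected = [
    [("GENE_D", None), ("GENE_X", 0.05), ("GENE_Y", 0.03), ("GENE_Z", None)],
    [("GENE_D", None), ("GENE_X", 0.05), ("GENE_Y", 0.03), ("GENE_W", None)],
    [("GENE_D", None), ("GENE_X", 0.05), ("GENE_Z", None)],
]


def candidate_genes(individual, use_local, max_local_maf=0.01):
    # With the local filter, drop genes whose variant is common in local controls
    return {gene for gene, local_maf in individual
            if not use_local or local_maf is None or local_maf <= max_local_maf}


def genes_vs_individuals(individuals, use_local):
    # Number of genes shared by the first k affected individuals, k = 1..n
    counts, shared = [], None
    for individual in individuals:
        genes = candidate_genes(individual, use_local)
        shared = genes if shared is None else shared & genes
        counts.append(len(shared))
    return counts


without_local = genes_vs_individuals(affected, use_local=False)
with_local = genes_vs_individuals(affected, use_local=True)
```

With these toy data the shared-gene counts for one, two and three individuals are 4, 3, 2 without the local filter and 2, 1, 1 with it.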
6. What do we know about the variability of the Spanish population?
7. Using CIBERER families to create a first version of the database of local variability of Spanish population
•In each family we select two unrelated members (preferably the parents)
•If there are no parents, then one of the children (unaffected, if possible) is selected
•A total of 75, out of the 136 samples available among the families analyzed in the BiER, were initially selected.
•Variant files (VCF) were obtained following the same pipeline (with missing values included) and merged.
•Genotype proportions and MAFs were obtained for all the variable positions. ONLY this information is used in the web server.
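A minimal sketch of that last step, turning merged per-sample genotypes into the aggregated counts and MAF that the server exposes. It assumes diploid, biallelic positions with VCF-style genotype strings, and `summarize_position` is a hypothetical name. Keeping missing calls separate is what the merge "with missing values included" makes possible: a sample without a call is not silently counted as homozygous reference.

```python
from collections import Counter


def summarize_position(genotypes):
    """Aggregate per-sample VCF genotypes at one diploid, biallelic position.

    Genotypes are strings such as "0/0", "0/1", "1/1" or "./." (missing);
    phased separators ("|") are accepted. Only counts and the MAF are returned,
    so no individual genotype leaves this function.
    """
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1  # not called: must not be counted as 0/0
            continue
        n_alt = sum(allele != "0" for allele in alleles)  # assumes biallelic sites
        counts[("0/0", "0/1", "1/1")[n_alt]] += 1
    called = counts["0/0"] + counts["0/1"] + counts["1/1"]
    alt_freq = (counts["0/1"] + 2 * counts["1/1"]) / (2 * called) if called else 0.0
    return {"0/0": counts["0/0"], "0/1": counts["0/1"], "1/1": counts["1/1"],
            "missing": counts["missing"], "maf": min(alt_freq, 1 - alt_freq)}


summary = summarize_position(["0/0", "0/1", "1/1", "./.", "0/0"])
```

Here four samples are called, carrying 3 alternative alleles out of 8, so the MAF is 0.375 and the missing call is reported separately.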
8. Samples used
UNIT | n | %
U723 | 12 | 16
U737 | 11 | 14.7
U759 | 2 | 2.7
U705 | 10 | 13.3
U720 | 12 | 16
U732 | 1 | 1.3
U755 | 3 | 4
U746 | 9 | 12
U728 | 2 | 2.7
U729 | 3 | 4
U703 | 7 | 9.3
U718 | 1 | 1.3
U730 | 2 | 2.7
Total | 75 | 100
DISEASE | n | %
3-Methylglutaconic aciduria | 11 | 14.7
Atypical fracture | 4 | 5.3
Autosomal DOMINANT non-syndromic hearing loss | 1 | 1.3
Autosomal RECESSIVE non-syndromic hearing loss | 1 | 1.3
BCKDK-deficiency disease | 2 | 2.7
CMT | 1 | 1.3
Congenital disorder of glycosylation types I and II | 8 | 10.7
CoQ disease | 3 | 4.0
CoQ10 deficiency and DNA depletion | 3 | 4.0
CoQ10 deficiency | 2 | 2.7
Inherited Metabolic Disease | 2 | 2.7
MMD (Multiple deletion of mitochondrial DNA) | 4 | 5.3
MSUD (Maple Syrup Urine Disease) | 1 | 1.3
Opitz | 8 | 10.7
Pelizaeus-like | 2 | 2.7
RCD (Respiratory complexes deficiency) | 8 | 10.7
Retinitis pigmentosa | 11 | 14.7
Usher | 3 | 4.0
Total | 75 | 100.0
[Figure: samples by gender (man, woman) and by phenotype (affected, healthy)]
9. Variability spectrum of the Spanish population
A total of 131,897 variant positions unique to the Spanish population were detected across all 75 samples together. Approximately 90,000 of them were singletons. Of these variants, 51,295 are non-synonymous changes and 18,450 are synonymous changes (a singleton-driven pattern, opposite to that of the variants shared with 1000g and EVS, which come from polymorphic positions).
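Counts like these can be derived from the merged data with a few lines. In this sketch a variant is "unique" if its (chromosome, position, alternative allele) key is absent from a precomputed set of 1000g and EVS variants, and a singleton is a variant whose alternative allele was observed only once across all samples; both definitions, the record layout and the toy data are assumptions, not the exact criteria behind the figures above.

```python
from collections import Counter


def spectrum(variants, known):
    """Summarize variants absent from the reference catalogues.

    variants: dicts with chrom, pos, alt, alt_count (alternative alleles
    observed across all samples) and effect. known: set of (chrom, pos, alt)
    keys present in 1000g or EVS (assumed to be precomputed).
    """
    novel = [v for v in variants if (v["chrom"], v["pos"], v["alt"]) not in known]
    singletons = sum(1 for v in novel if v["alt_count"] == 1)
    by_effect = Counter(v["effect"] for v in novel)
    return len(novel), singletons, by_effect


# Hypothetical toy records
variants = [
    {"chrom": "1", "pos": 100, "alt": "T", "alt_count": 1, "effect": "non-synonymous"},
    {"chrom": "1", "pos": 200, "alt": "G", "alt_count": 3, "effect": "synonymous"},
    {"chrom": "2", "pos": 300, "alt": "A", "alt_count": 1, "effect": "synonymous"},
]
n_novel, n_singletons, by_effect = spectrum(variants, known={("1", 200, "G")})
```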
10. The CIBERER Exome Server (CES): the first repository of variability of the Spanish population
Only one other similar initiative exists: GoNL http://www.nlgenome.nl/
http://ciberer.es/bier/exome-server/
11. Information provided
Genotypes in the different reference populations
Genomic coordinates, variation, and gene.
SNPid if any
13. Variants can also be seen in their genomic context
GenomeMaps viewer (Medina et al., 2013, NAR) embedded in the application. GenomeMaps is the official genome viewer of the ICGC (http://dcc.icgc.org/)
14. Occurrence of pathological variants in the “normal” population
The reference genome carries the mutated allele
Nine carriers in 1000 genomes
One affected individual and 73 carriers in EVS
16. Spanish variability database. FAQ
What is stored in the database?
ONLY the frequencies of the genotypes observed at positions where a variant has been found in at least one individual. This information is obtained from unrelated Spanish individuals.
What information is provided by the database?
Aggregated information on the genotype frequencies at the variable positions in the requested gene(s).
Is it possible to know whether a particular individual is stored in the database?
No, unless you sequence the individual and check whether their genotypes are compatible with the frequencies in the database, but that seems stupid because you would already have the information you were after.
Let's imagine that I am stupid and have managed to find out that the individual is in the database: can I retrieve her/his genome?
No, it is impossible to reconstruct it from the aggregated information.
17. Spanish variability database. FAQ
Who can contribute?
Anyone (especially if you are sequencing with public resources)
What do you need to submit?
Anonymized files of variants (VCF: Variant Call Format)
Why VCFs?
Because we need to check that your contribution contains no relatives of the individuals in the database
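The check for relatives needs individual genotypes, which is why full VCFs are requested. One simple way to do it, sketched below and not necessarily what CES does, is an identity-by-state screen: parent-offspring pairs (and duplicated samples) share at least one allele at virtually every site, so their IBS0 fraction is close to zero. The sketch assumes genotypes taken at common, well-called sites (at rare sites almost any pair shares the reference allele); the threshold and sample names are hypothetical, and reliably detecting more distant relatives calls for dedicated kinship estimators such as KING or PLINK's IBD estimation.

```python
def ibs0_fraction(g1, g2):
    """Fraction of jointly called sites where two samples share no allele (IBS0).

    Genotypes are alternative-allele dosages (0, 1, 2) at biallelic sites,
    with None for missing calls. IBS0 means one sample is 0 and the other 2.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no jointly called sites")
    return sum(1 for a, b in pairs if abs(a - b) == 2) / len(pairs)


# Hypothetical cut-off. A parent and child always share at least one allele,
# so their IBS0 fraction is ~0 apart from genotyping errors or de novo mutations.
MAX_IBS0 = 0.001


def possible_relatives(new_sample, database, max_ibs0=MAX_IBS0):
    # Names of stored samples that look too closely related to the new one
    return [name for name, genotypes in database.items()
            if ibs0_fraction(new_sample, genotypes) <= max_ibs0]


# Hypothetical stored samples (dosages at the same six sites)
database = {
    "S1": [1, 1, 1, 2, 0, 1],
    "S2": [2, 1, 0, 1, 2, 0],
}
flagged = possible_relatives([0, 1, 2, 1, None, 2], database)
```

The missing call at the fifth site is skipped; S1 never shows opposite homozygotes against the new sample and is flagged, while S2 does so at 3 of 5 sites.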
18. What’s next?
•Strategic steps:
–Populating the database with contributions from CIBERER and external groups. Future project: SPANEx
–Opening the database
•Technical steps:
–Automatic access to the local variability data via webservices
–Use in gene discovery pipelines
–Use for the interpretation of incidental findings in diagnostic panels
19. Table of Spanish Frequencies (TSF)
Chr | Position | Ref | Alt | 0/0 | 0/1 | 1/1
1 | 1365313 | A | T | 75 | 0 | 0
1 | 1484884 | G | A | 70 | 4 | 1
2 | 326252 | T | C | 25 | 35 | 15
[Figure: CES data flow. Labels: CES input; external; internal; VCFs; Unrelated? (YES/NO); Spanish? (YES/NO); DB of Spanish variants (DBSV); counts; Table of Spanish Frequencies (TSF); CES use; other countries; regional]
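Because the TSF stores only genotype counts per position, allele frequencies can be derived from it without any individual data. A minimal sketch, assuming diploid genotypes and counts without missing calls; `maf_from_counts` is an illustrative name, and the rows are the three example rows of the table.

```python
def maf_from_counts(hom_ref, het, hom_alt):
    """Minor allele frequency from 0/0, 0/1 and 1/1 genotype counts (diploid)."""
    n_alleles = 2 * (hom_ref + het + hom_alt)
    alt_freq = (het + 2 * hom_alt) / n_alleles
    return min(alt_freq, 1 - alt_freq)


# Rows of the TSF example: chr, position, ref, alt, 0/0, 0/1, 1/1
tsf = [
    ("1", 1365313, "A", "T", 75, 0, 0),
    ("1", 1484884, "G", "A", 70, 4, 1),
    ("2", 326252, "T", "C", 25, 35, 15),
]
mafs = {(chrom, pos): maf_from_counts(aa, ab, bb)
        for chrom, pos, ref, alt, aa, ab, bb in tsf}
```

For the three example rows this gives MAFs of 0, 6/150 = 0.04 and 65/150 (about 0.433).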
20. Future of the database of variation in the Spanish population
CIBERER contributions
SPANEx contributions
21. CIBERER exome server roadmap
•Phase I (June 2014): CES, 76 CIBERER samples (unaffected)
•Phase II (2014): CES II, 76+269+X samples (mixed)
–76 CIBERER samples (unaffected)
–269 MGP samples (healthy controls)
–More CIBERER samples
•Phase III (2015): CES II, 1000+76+269+X samples (mixed)
–SPANEX: 1000 exomes, CIBERER
22. Future utilization. Access via webservices
Access to aggregated data on variation and genotype frequencies only; therefore, there are no associated confidentiality or privacy issues.
Spanish variation database
CellBase (Bleda et al., 2012, NAR): our data server system, now at the EBI
23. BiERapp: the interactive filtering tool for easy candidate prioritization
http://bierapp.babelomics.org
24. Panel (real or virtual) manager
Tool for defining panels
New filter based on local population variant frequencies
If no diagnostic variants appear, then secondary findings can be studied
Diagnostic mutations
http://team.babelomics.org
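The panel workflow on this slide (panel genes, a local-frequency filter, and a fallback to secondary findings when no diagnostic variant appears) can be sketched as follows. This is not TEAM's implementation: the record fields, the 1% threshold, the variant identifiers and the choice to report known diagnostic mutations regardless of their local frequency (healthy carriers of pathological variants do exist in population databases) are all assumptions.

```python
def triage(variants, panel_genes, known_diagnostic, max_local_maf=0.01):
    """Return ("diagnostic", hits) or ("secondary", rare variants to study).

    variants: dicts with id, gene and local_maf (frequency in the local
    population). known_diagnostic: ids of variants already known to be causal.
    """
    # Known diagnostic mutations in panel genes are reported regardless of
    # their local frequency
    hits = [v for v in variants
            if v["gene"] in panel_genes and v["id"] in known_diagnostic]
    if hits:
        return "diagnostic", hits
    # Otherwise, keep only variants that are rare in the local population
    rare = [v for v in variants if v["local_maf"] <= max_local_maf]
    return "secondary", rare


# Hypothetical variants and panel
variants = [
    {"id": "var1", "gene": "GENE_P", "local_maf": 0.0},
    {"id": "var2", "gene": "GENE_Q", "local_maf": 0.08},
    {"id": "var3", "gene": "GENE_R", "local_maf": 0.002},
]
panel = {"GENE_P", "GENE_Q"}
status, found = triage(variants, panel, known_diagnostic={"var1"})
fallback_status, fallback_found = triage(variants, panel, known_diagnostic=set())
```

When var1 is a known diagnostic mutation it is reported directly; otherwise the locally common var2 is filtered out and var1 and var3 remain as secondary findings to study.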
25. Take home message
•Local variability is critical for distinguishing real pathologic variants from local polymorphisms
•CES will be populated with the SPANEX project (M.A. Moreno talk)
•CES is the starting point of a more ambitious crowdsourcing project that aims at constructing a high-resolution map of the Spanish population variation
•Contributions to CES comply with confidentiality requirements. No patient information is shared, only statistical information.
26. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and…
...the INB, National Institute of Bioinformatics (Functional Genomics Node) and the CIBERER Network of Centers for Rare Diseases, and…
...the Medical Genome Project (Sevilla)
@xdopazo @bioinfocipf