Background: The major bottleneck in genome sequencing is no longer data generation but the computational challenges of data analysis, display and integration. New approaches and methods are therefore required to meet these challenges. Visual analytics is the representation and presentation of data that exploits human visual perception abilities in order to amplify cognition. Opportunities exist for African researchers to expand the use of visual discovery tools and curated datasets to enable visual discovery (exploration, mining and analysis via interactive visual interfaces) of bioinformatics results from high-quality genomics research.
Methods: We are developing a system of visual analytics resources based on molecular and clinical data, including the molecular consequences of single nucleotide variants, the RNA-seq expression levels of transcripts, and the functional sites in protein sequences.
Results: We have developed an initial set of visual analytics resources, with the major intrinsic protein (MIP) family of water and glycerol transporters (aquaporins) as the use case. Members of this protein family have been implicated in diverse cardiometabolic diseases. The computational resources developed can be adapted to other gene lists, including those obtained from high-throughput assays. The long-term goal of the project is to empower researchers to make discoveries from large-scale molecular and clinical datasets to support decision-making on genetic and environmental determinants of cardiometabolic diseases in Africa.
4. Map of Africa showing the distribution of nodes in the H3ABioNet network
5. H3Africa: Bioinformatics Network
• H3ABioNet: a sustainable African Bioinformatics Network for H3Africa
The network provides:
• computational infrastructure and hardware,
• human resources,
• tools and computational solutions for genomic and population-based research, and
• communications among African researchers and other interested parties.
These aims are achieved by:
• providing user support,
• training and capacity development,
• research and tools development, and
• outreach and communication.
6. Organization of the HVP Nigeria Node
ICCAC Country Representative: Prof. Oyekanmi Nash
Alternate Representative: Hadiza Rasheed-Jada
Reports directly to the DG/CEO, NABDA/FMST
7. Organization of the HVP Nigeria Node (II)
The staff members of the Node include:
• Alternate Representative - Hadiza Rasheed-Jada
• Node Manager - Atinuke Hassan
• Systems Administrator - Adekunle Farouk
• Research Associates - Abimbola Kashim, Deborah Fasesan, Taoheed Abdulkareem, Ayodele Fakoya, Adijat Ozohu Jimoh
• Post-doctoral Researcher - Dr. Segun Fatumo
8. Background – Cardiometabolic Diseases
• Worldwide, cardiometabolic diseases are major causes of disability, rising healthcare costs and deaths
• Examples: type 2 diabetes, hypertension, dyslipidemia, coronary heart disease and chronic kidney disease
• Over the next 7 years, Africa is projected to experience the largest increase in death rates from cardiovascular disease, cancer, respiratory disease and diabetes (Aikins et al., 2010)
Noncommunicable disease      Projected deaths, AFR 2015   Projected deaths, AFR 2030   Fold change (2030/2015)
Diabetes mellitus                            205,378.79                   390,614.91   1.90
Malignant neoplasms                          521,029.65                   966,876.53   1.86
Other neoplasms                               20,155.67                    37,375.03   1.85
Cardiovascular diseases                    1,179,320.20                 1,966,212.66   1.67
Respiratory diseases                         234,649.72                   356,651.78   1.52
Source: Global Health Estimates (GHE) 2013: Deaths by age, sex and cause
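The fold change column is the ratio of projected 2030 deaths to projected 2015 deaths for each cause. A minimal sketch that recomputes it from the table values above, rounded to two decimals as in the table:

```python
# Projected deaths in the African Region (AFR), copied from the table above:
# cause -> (2015, 2030)
projected_deaths = {
    "Diabetes mellitus": (205_378.79, 390_614.91),
    "Malignant neoplasms": (521_029.65, 966_876.53),
    "Other neoplasms": (20_155.67, 37_375.03),
    "Cardiovascular diseases": (1_179_320.20, 1_966_212.66),
    "Respiratory diseases": (234_649.72, 356_651.78),
}


def fold_change(deaths_2015: float, deaths_2030: float) -> float:
    """Ratio of projected 2030 deaths to projected 2015 deaths."""
    return deaths_2030 / deaths_2015


fold_changes = {
    cause: round(fold_change(d2015, d2030), 2)
    for cause, (d2015, d2030) in projected_deaths.items()
}

for cause, fc in fold_changes.items():
    print(f"{cause:<25} {fc:.2f}")
```

Diabetes mellitus shows the largest relative increase (about 1.9-fold), although cardiovascular diseases remain the largest cause of death in absolute terms.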
9. A Strategy in Africa to Address the Burden of Cardiometabolic Diseases
• Genomic and Environmental Determinants (H3Africa Projects)
• H3Africa Kidney Disease Research Network
• Genomic and environmental risk factors for cardiometabolic disease in Africans
• Burden, spectrum and etiology of type 2 diabetes in sub-Saharan Africa
• …..
10. Examples of Projected Massive and Complex Datasets from H3Africa Projects (2013…)
Type 2 Diabetes Project
• 12,000 Cases and 12,000 Controls
• Sequencing of known T2DM regions
• Genome-wide genotyping arrays
• Whole exome/genome sequencing
Body Composition Project
• African genome structure
• Phenotyping and sampling for Cohorts
• Genetic and environmental contribution to body composition (~12,000 individuals)
These research investigations rely heavily on bioinformatics analysis of, and inferences from, large and heterogeneous datasets obtained from populations inside and outside Africa.
12. “The major bottleneck in genome sequencing is
no longer data generation—the computational
challenges around data analysis, display and
integration are now rate limiting. New
approaches and methods are required to meet
these challenges”.
National Human Genome Research Institute Strategic Plan:
Charting a course for genomic medicine from base pairs to bedside
http://www.genome.gov/Pages/About/Planning/2011NHGRIStrategicPlan.pdf
Making Discoveries from the Massive and Complex Genomics Datasets and Bioinformatics Results from H3Africa Projects
13. Visual Discovery Tools
Visual Discovery Tasks
• Exploration
• Mining
• Analysis
Visual discovery tools allow researchers to access and analyze data visually at the speed of thought, with minimal or no IT assistance, and then share the results of their discoveries with colleagues, usually in the form of an interactive dashboard.
Benefits
• Data sharing
• Collaboration
• Easy to Deploy
• Research with Limited or No Internet Access
14. What is Visual Analytics?
http://www.slideshare.net/TableauSoftware/visual-analytics-best-practices
“Visual analytics is the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition.”
- Andy Kirk, author of “Data Visualization: A Successful Design Process”
19. H3ABioNet Workshop: Visual Analytics of Human Genomics Variation Datasets
July 2013
Opportunities exist for African researchers to expand the use of visual discovery tools and curated datasets to enable visual discovery (exploration, mining and analysis via interactive visual interfaces) of bioinformatics results from high-quality genomics research.
21. Long-Term Goal of Project
• A visual analytical system for the discovery of molecular consequences of variants, and of linked transcript expression, for sets of genes or gene families
http://www.ensembl.org/info/genome/variation/predicted_data.html
Figure: Molecular consequences of gene variants on transcripts
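Ensembl groups the consequence terms described on the page cited above into impact categories (HIGH, MODERATE, LOW, MODIFIER). When a variant has several consequences across transcripts, ranking them by impact is a common first step. The sketch below assumes a small hand-copied subset of that table (verify it against the current Ensembl page); `most_severe` is a hypothetical helper, not part of the system described here.

```python
# A small subset of Ensembl consequence terms and their impact categories,
# copied by hand for illustration; check the current Ensembl table before use.
IMPACT = {
    "stop_gained": "HIGH",
    "splice_donor_variant": "HIGH",
    "missense_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "3_prime_UTR_variant": "MODIFIER",
    "intron_variant": "MODIFIER",
}
# Lower rank means more severe
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def most_severe(consequences):
    """Return the most severe known consequence term, or None if none is known.

    Terms missing from IMPACT are ignored; ties keep the term seen first.
    """
    known = [term for term in consequences if term in IMPACT]
    if not known:
        return None
    return min(known, key=lambda term: IMPACT_RANK[IMPACT[term]])


print(most_severe(["intron_variant", "missense_variant", "synonymous_variant"]))
```

Here the missense consequence (MODERATE) outranks the synonymous (LOW) and intronic (MODIFIER) ones.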
22. Research Approach
Obtain Datasets
• Ensembl Genome Browser (www.ensembl.org): BioMart for genes and variants
• Database of Alternate Transcript Expression (DBATE): data download for transcript expression values
Data Cleaning and Preparation
• Scripting and spreadsheets
Construct Views and Dashboards
To address scientific questions such as:
• Identify the molecular consequences of gene variants (single nucleotide variants) in a specific disease or trait.
• Identify gene variants that result in multiple molecular consequences in gene transcripts.
• Identify gene variants specific to a transcript.
• Compare RNA-seq expression values of gene transcripts across tissues.
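The cleaning step and the second and third questions above reduce to grouping a variant-transcript table. A minimal Python sketch using only the standard library, assuming a hypothetical tab-separated export with columns gene, transcript, variant and consequence (real BioMart headers depend on the attributes selected; all IDs here are placeholders). It trims whitespace, drops rows without a variant ID and exact duplicates, then lists variants with more than one consequence and variants annotated on exactly one transcript.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: one row per variant-transcript pair. Gene, transcript
# and variant IDs are placeholders, not real Ensembl identifiers.
RAW_EXPORT = (
    "gene\ttranscript\tvariant\tconsequence\n"
    "GENE1\tGENE1-001\tvar1\tmissense_variant\n"
    "GENE1\tGENE1-001\tvar1\tmissense_variant\n"
    "GENE1 \tGENE1-002\tvar1\t intron_variant\n"
    "GENE1\tGENE1-001\tvar2\tsynonymous_variant\n"
    "GENE1\tGENE1-002\tvar2\tsynonymous_variant\n"
    "GENE1\tGENE1-004\tvar3\tmissense_variant\n"
    "GENE1\tGENE1-003\t\tintron_variant\n"
)


def load_clean_rows(text):
    """Parse tab-separated text; trim whitespace; drop blank-ID and duplicate rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    seen, rows = set(), []
    for row in reader:
        clean = {k.strip(): (v or "").strip() for k, v in row.items()}
        key = tuple(clean.values())
        if not clean["variant"] or key in seen:
            continue  # unlinkable row, or exact duplicate of an earlier row
        seen.add(key)
        rows.append(clean)
    return rows


def group_values(rows, key_field, value_field):
    """Map each key (e.g. a variant) to the set of values seen with it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row[key_field]].add(row[value_field])
    return grouped


rows = load_clean_rows(RAW_EXPORT)
consequences = group_values(rows, "variant", "consequence")
transcripts = group_values(rows, "variant", "transcript")

# Variants with more than one molecular consequence across transcripts
multi_consequence = sorted(v for v, cs in consequences.items() if len(cs) > 1)
# Variants annotated on exactly one transcript
transcript_specific = {v: next(iter(ts)) for v, ts in transcripts.items() if len(ts) == 1}

print(multi_consequence)    # ['var1']
print(transcript_specific)  # {'var3': 'GENE1-004'}
```

The first question (consequences in a specific disease or trait) additionally needs phenotype annotations for each variant, which this sketch does not model.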
23. Use Case – Gene Families
Aquaporins – water and glycerol transporters
13 mammalian aquaporins (AQP0–AQP12).
Their malfunction or absence is linked to disease.
Adipose AQP7 deficiency is associated with an increase in intracellular glycerol content.
Up-regulation of AQP1 in the glomeruli of most diseased kidneys.
Reference: Hibuse et al. (2005). Aquaporin 7 deficiency is associated with development of obesity through activation of adipose glycerol kinase. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):10993-8. http://www.ncbi.nlm.nih.gov/pubmed/16009937
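Adapting the resources to another gene family, or to a gene list from a high-throughput assay, starts with restricting the variant table to the genes of interest. A minimal sketch assuming a hypothetical row format with a `gene` field; the labels AQP0–AQP12 come from the slide, and the single override (aquaporin 0 is encoded by the MIP gene) illustrates that family labels do not always match official gene symbols. The override table is an assumption to be extended before querying a database.

```python
# The 13 mammalian aquaporins listed on the slide (AQP0-AQP12)
AQUAPORIN_LABELS = [f"AQP{i}" for i in range(13)]

# Family labels that differ from official gene symbols (assumed, incomplete)
SYMBOL_OVERRIDES = {"AQP0": "MIP"}


def gene_symbols(labels, overrides):
    """Replace family labels with official symbols where an override exists."""
    return [overrides.get(label, label) for label in labels]


def filter_to_genes(rows, symbols):
    """Keep only rows whose 'gene' field is in the gene list."""
    wanted = set(symbols)
    return [row for row in rows if row["gene"] in wanted]


symbols = gene_symbols(AQUAPORIN_LABELS, SYMBOL_OVERRIDES)

# Hypothetical rows; only the AQP7 row belongs to the gene family
example_rows = [{"gene": "AQP7", "variant": "var1"}, {"gene": "GENE1", "variant": "var2"}]
print(filter_to_genes(example_rows, symbols))
```

Swapping `AQUAPORIN_LABELS` for any other list of symbols is all that changes when moving to a different gene set.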
25. Visual Analytical System for Screening Disease-Linked Gene Variants
• Integrates data from Ensembl and the Database of Alternate Transcript Expression (DBATE)
• Blends data dimensions from multiple data sources
• Identifies variants linked to transcripts
Insight: rs199936776 is unique to transcript AQP7-004 and could affect the expression of that transcript or the properties of the encoded protein isoform.
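Blending the two data sources can be thought of as a join on the transcript ID: each variant-transcript link from the Ensembl side is paired with that transcript's expression values by tissue from the expression side. A minimal sketch with made-up IDs, tissues and values (no particular DBATE units are assumed); `top_tissue` is a hypothetical helper that reports, for each variant, the tissue where its linked transcripts are most highly expressed.

```python
from collections import defaultdict

# Hypothetical inputs: variant -> transcript links and per-tissue expression.
# All IDs, tissues and values are made up for illustration.
variant_links = [("var1", "GENE1-001"), ("var3", "GENE1-004")]
expression_rows = [
    ("GENE1-001", "adipose", 12.0),
    ("GENE1-001", "kidney", 3.5),
    ("GENE1-004", "adipose", 0.8),
    ("GENE1-004", "kidney", 6.1),
]


def index_expression(rows):
    """Index expression values as transcript -> {tissue: value}."""
    index = defaultdict(dict)
    for transcript, tissue, value in rows:
        index[transcript][tissue] = value
    return index


def blend(links, expression_index):
    """Join each variant-transcript link with that transcript's tissue values."""
    return [
        (variant, transcript, tissue, value)
        for variant, transcript in links
        for tissue, value in sorted(expression_index.get(transcript, {}).items())
    ]


def top_tissue(blended_rows):
    """For each variant, the (tissue, value) with the highest expression
    among the transcripts linked to it."""
    best = {}
    for variant, _transcript, tissue, value in blended_rows:
        if variant not in best or value > best[variant][1]:
            best[variant] = (tissue, value)
    return best


blended = blend(variant_links, index_expression(expression_rows))
for row in blended:
    print(row)
print(top_tissue(blended))
```

Each blended row carries both data dimensions (variant and tissue expression), which is the shape a dashboard view would typically plot.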
27. Summary
In Africa, researchers will be able to use visual discovery tools to make discoveries from large-scale molecular and clinical datasets to support decision-making on genetic and environmental determinants of cardiometabolic diseases.
Visual analytics can facilitate collaboration between data experts and subject-matter experts.
28. Acknowledgments
• H3Africa Bioinformatics Network (H3ABioNet)
– National Human Genome Research Institute
– NIH Common Fund
– Grant U41HG006941
• National Institutes of Health
• Dr. Raphael Isokpehi, Bethune-Cookman University, Florida, USA
• National Biotechnology Development Agency, Federal Ministry of Science and Technology, Nigeria
• Visual Analytics in Biology Curriculum Network