New insights into the human
genome by ENCODE
What is a gene?
ENCODE definition:
• A union of genomic sequences encoding a coherent set of potentially overlapping functional products.
(Gerstein et al., 2007)
It's been ten years since scientists sequenced the human genome.
But what do all these letters mean?
21,000 genes
ENCODE, the Encyclopedia of DNA Elements, has answers
Aiming to delineate all of the functional elements encoded in the human genome sequence
ENCODE Consortium
(The ENCODE Project Consortium, 2011)
• Pilot phase: 2003-2007
• Technology development phase
• Production phase: 2007-2012, 30 papers
ENCODE
• Major methods
• Data production and initial analysis
• Accessing ENCODE data
• Working with ENCODE data
• Data analysis
• Limitations
• Threads – Nature explorer
Major Methods
(The ENCODE Project Consortium, 2004)
Overall data flow
(The ENCODE Project Consortium, 2011)
(The ENCODE Project Consortium, 2011)
RNA-seq – Isolation of RNA sequences followed by high-throughput sequencing
CAGE – Capture of the methylated cap at the 5’ end of RNA, followed by high-throughput sequencing
RNA-PET – Simultaneous capture of RNAs with both a 5’ methyl cap and a poly(A) tail
ChIP-seq – Chromatin immunoprecipitation followed by sequencing
FAIRE-seq – Formaldehyde-assisted isolation of regulatory elements: crosslinking, phenol extraction, and sequencing of the DNA fragments in the aqueous phase
(The ENCODE Project Consortium, 2011)
ENCODE cell types
(The ENCODE Project Consortium, 2011)
ENCODE data production and initial analyses
• Since 2007, ENCODE has developed methods and performed a large
number of sequence-based studies to map functional elements across
the human genome.
• The elements mapped (and approaches used) include:
  - RNA transcribed regions (RNA-seq, CAGE, RNA-PET and manual annotation)
  - Protein-coding regions (mass spectrometry)
  - Transcription-factor-binding sites (ChIP-seq and DNase-seq)
  - Chromatin structure (DNase-seq, FAIRE-seq, histone ChIP-seq)
  - DNA methylation sites (RRBS assay)
(The ENCODE Project Consortium, 2012)
Transcribed and protein-coding regions
• In total, GENCODE-annotated exons of protein-coding genes cover 2.94% of the genome, or 1.22% when only protein-coding exons are counted.
• Protein-coding genes span 33.45% of the genome from the outermost start to stop codons, or 39.54% from promoter to poly(A) site.
• Additional protein-coding genes remain to be found.
• In addition, they annotated 8,801 automatically derived small RNAs and 9,640 manually curated long non-coding RNA (lncRNA) loci.
• GENCODE also annotated 11,224 pseudogenes.
(The ENCODE Project Consortium, 2012)
Process flow of experimental evaluation of
pseudogene transcription
Experimental validation
results showing the
transcription of pseudogenes
in different tissues
(Pei et al., 2012)
ENCODE gene and transcript annotations.
(The ENCODE Project Consortium, 2011)
RNA
• They sequenced RNA from different cell lines and multiple
subcellular fractions to develop an extensive RNA expression
catalogue.
• They used CAGE-seq (5’ cap-targeted RNA isolation and
sequencing) to identify 62,403 transcription start sites (TSSs) in tier 1 and 2 cell types.
(The ENCODE Project Consortium, 2012)
A large majority of GENCODE elements are detected by
RNA-seq data
(Djebali et al., 2012)
Protein bound regions
• They mapped the binding locations of 119 different DNA-binding proteins and a number of RNA polymerase components in 72 cell types using ChIP-seq.
• Overall, 636,336 binding regions covering 231 megabases (8.1%) of the genome are enriched for regions bound by DNA-binding proteins across all cell types.
(The ENCODE Project Consortium, 2012)
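The figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the ENCODE analysis; the function names are illustrative. It derives the genome size implied by "231 megabases = 8.1%" and the mean length of a binding region.

```python
def implied_genome_size_mb(covered_mb: float, percent_of_genome: float) -> float:
    """Genome size in megabases implied by a covered length and its percentage."""
    return covered_mb / (percent_of_genome / 100.0)


def mean_region_length_bp(covered_mb: float, n_regions: int) -> float:
    """Average region length in base pairs, given total coverage in megabases."""
    return covered_mb * 1_000_000 / n_regions


# Values taken from the slide: 636,336 regions covering 231 Mb (8.1% of the genome).
genome_mb = implied_genome_size_mb(231, 8.1)
mean_bp = mean_region_length_bp(231, 636_336)
print(f"Implied genome size: about {genome_mb:.0f} Mb")
print(f"Mean binding-region length: about {mean_bp:.0f} bp")
```

The implied genome size of roughly 2.85 gigabases is consistent with the percentages being computed over the sequenced (non-gap) portion of the reference assembly, although the slide does not state the denominator explicitly.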
Occupancy of transcription factors and RNA
polymerase II on human chromosome 6p as
determined by ChIP-seq
(The ENCODE Project Consortium, 2011)
DNase I hypersensitive sites and footprinting
• Chromatin accessibility, characterized by DNase I hypersensitivity, is the hallmark of regulatory DNA regions.
• They identified 2.89 million unique, non-overlapping DNase I hypersensitive sites (DHSs) by DNase-seq in 125 cell types, most of which lie distal to TSSs.
• In tier 1 and tier 2 cell types, they identified an average of 205,109 DHSs per cell type, encompassing an average of 1.0% of the genomic sequence in each cell type, and 3.9% in aggregate.
(The ENCODE Project Consortium, 2012)
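To make "unique, non-overlapping" sites and "coverage in aggregate" concrete, here is a minimal sketch that merges overlapping intervals pooled from several cell types and computes the fraction of a genome they cover. It illustrates the general idea only and is not the consortium's actual procedure; the coordinates, cell-type labels and genome size are toy assumptions.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into a non-overlapping set."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def fraction_covered(intervals: List[Interval], genome_size: int) -> float:
    """Fraction of the genome covered by the merged intervals."""
    covered = sum(end - start for _, start, end in merge_intervals(intervals))
    return covered / genome_size


# Toy DHS calls from two hypothetical cell types.
cell_type_a = [("chr1", 100, 200), ("chr1", 400, 500)]
cell_type_b = [("chr1", 150, 300), ("chr2", 100, 200)]

pooled = cell_type_a + cell_type_b
print(merge_intervals(pooled))
print(fraction_covered(pooled, genome_size=10_000))
```

Because sites from different cell types overlap only partly, the aggregate coverage is larger than the per-cell-type coverage but smaller than the sum over cell types, which is why 1.0% per cell type can add up to 3.9% in aggregate.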
Density of DNase I cleavage sites for selected cell types
(Thurman et al., 2012)
• On average, 98.5% of the occupancy sites of transcription factors mapped by ENCODE ChIP-seq lie within accessible chromatin defined by DNase I hotspots.
• Using genomic DNase I footprinting on 41 cell types, they identified 8.4 million distinct DNase I footprints.
(The ENCODE Project Consortium, 2012)
Regions of histone modification
• They assayed chromosomal locations for up to 12 histone
modifications and variants in 46 cell types, across tier 1 and 2.
(The ENCODE Project Consortium, 2012) (http://www.factorbook.org)
DNA methylation
• They used reduced representation bisulphite sequencing (RRBS)
to profile DNA methylation quantitatively for an average of 1.2
million CpGs in each of 82 cell lines and tissues (8.6% of non-
repetitive genomic CpGs), including CpGs in intergenic
regions, proximal promoters and intragenic regions.
(The ENCODE Project Consortium, 2012)
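Bisulphite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. The quantitative methylation level at a CpG is therefore commonly expressed as the fraction of reads that retain the C. A minimal sketch with made-up read counts follows; the function name is illustrative and this is not ENCODE's pipeline.

```python
def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads supporting methylation at one CpG (C reads / all reads)."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no read coverage at this CpG")
    return methylated_reads / total


# Toy counts: (reads with C retained, reads with C converted to T) per CpG.
cpg_counts = {"cpg_1": (8, 2), "cpg_2": (0, 10), "cpg_3": (5, 5)}
for name, (meth, unmeth) in cpg_counts.items():
    print(name, methylation_level(meth, unmeth))

# Cross-check of the slide's figures: 1.2 million CpGs being 8.6% of the total
# implies roughly 14 million non-repetitive genomic CpGs.
print(round(1.2 / 0.086, 1), "million CpGs implied")
```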
Proteomics
• To assess putative protein products generated from novel RNA transcripts and isoforms, proteins are sequenced and quantified by mass spectrometry and mapped back to their encoding transcripts.
• Protein studies have begun in K562 and GM12878 cells.
(The ENCODE Project Consortium, 2011)
ENCODE chromatin annotations in the HLA
locus
(The ENCODE Project Consortium, 2011)
Accessing ENCODE Data
ENCODE Data Release and Use Policy
• The ENCODE Data Release and Use Policy is described at
http://www.encodeproject.org/ENCODE/terms.html.
• ENCODE data are released for viewing in a publicly accessible
browser (initially at http://genome-preview.ucsc.edu/ENCODE
and, after additional quality checks, at http://encodeproject.org)
Public Repositories
• UCSC Genome Browser database (http://genome.ucsc.edu).
(The ENCODE Project Consortium, 2011)
UCSC Portal
Working with ENCODE Data
Using ENCODE Data in the UCSC Browser
• Many users will want to view and interpret the ENCODE data for
particular genes of interest. At the online ENCODE portal
(http://encodeproject.org), users should follow a “Genome
Browser” link to visualize the data in the context of other genome
annotations.
(The ENCODE Project Consortium, 2011)
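A Genome Browser view can also be opened directly by URL. The sketch below builds such a link. The `hgTracks` path and the `db` and `position` parameters follow the UCSC Genome Browser's URL conventions; the assembly name (`hg19`) and the coordinates are illustrative assumptions, not values taken from the slides.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(chrom: str, start: int, end: int, assembly: str = "hg19") -> str:
    """Build a UCSC Genome Browser link for one genomic region."""
    if start >= end:
        raise ValueError("start must be smaller than end")
    query = urlencode({"db": assembly, "position": f"{chrom}:{start}-{end}"})
    return f"{UCSC_HGTRACKS}?{query}"


# Hypothetical region on chromosome 6, chosen only for illustration.
print(browser_url("chr6", 29_000_000, 33_000_000))
```

Which ENCODE tracks appear in the resulting view depends on the track settings chosen in the browser.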
ENCODE Data Analysis
• The development and implementation of algorithms and pipelines for
processing and analyzing data is a major activity of the ENCODE
Project.
• 1st phase: short sequences are aligned to the reference genome
• 2nd phase: identifying the enriched regions
• 3rd phase: integrating the identified regions of enriched signal with each other and with other data types
(The ENCODE Project Consortium, 2011)
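The three phases above (alignment, identification of enriched regions, integration) can be illustrated with a deliberately simplified sketch. It assumes alignment has already produced read start positions, calls a bin "enriched" when it holds at least a fixed number of reads, and integrates two assays by intersecting their enriched bins. Real peak callers use statistical background models; the threshold, bin size and read positions here are toy assumptions.

```python
from collections import Counter
from typing import Iterable, Set


def enriched_bins(read_positions: Iterable[int], bin_size: int, min_reads: int) -> Set[int]:
    """Phase 2 (toy version): bins whose aligned-read count reaches min_reads."""
    counts = Counter(pos // bin_size for pos in read_positions)
    return {bin_index for bin_index, n in counts.items() if n >= min_reads}


# Phase 1 output (assumed): aligned read start positions on one chromosome.
chip_reads = [105, 110, 130, 180, 950, 1210, 1220, 1250, 1290]
dnase_reads = [101, 150, 160, 199, 1205, 2300, 2310]

chip_bins = enriched_bins(chip_reads, bin_size=100, min_reads=3)
dnase_bins = enriched_bins(dnase_reads, bin_size=100, min_reads=3)

# Phase 3 (toy version): regions enriched in both assays.
shared = sorted(chip_bins & dnase_bins)
print("ChIP-enriched bins:", sorted(chip_bins))
print("DNase-enriched bins:", sorted(dnase_bins))
print("Enriched in both:", shared)
```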
Analysis tools applied by the ENCODE
consortium
(The ENCODE Project Consortium, 2011)
Integrating ENCODE with other projects and the
Scientific Community
1. defining promoter and enhancer regions by combining transcript
mapping and biochemical marks,
2. delineating distinct classes of regions within the genomic
landscape by their specific combinations of biochemical and
functional characteristics, and
3. defining transcription factor co-associations and regulatory
networks.
(The ENCODE Project Consortium, 2011)
• The ENCODE Project aids the interpretation of human genome variation that is associated with disease or quantitative phenotypes.
• Integration with the 1000 Genomes Project shows how SNPs and structural variation may affect transcript, regulatory and DNA methylation data.
• ENCODE data support GWAS and other sequence-variation-driven studies of human phenotypes.
• ENCODE is a major contributor not only of data but also of novel technologies for deciphering the human genome.
(The ENCODE Project Consortium, 2011)
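One basic way to relate sequence variants to functional annotations is to ask which variants fall inside annotated regions. The following is a minimal sketch of that overlap test using binary search; the SNP identifiers, coordinates and region lists are hypothetical, and the approach is a generic illustration rather than the method used in the cited studies.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Region = Tuple[int, int]      # half-open interval [start, end)
Snp = Tuple[str, str, int]    # (identifier, chromosome, position)


def snps_in_regions(snps: List[Snp], regions: Dict[str, List[Region]]) -> List[str]:
    """Return identifiers of SNPs that lie inside any annotated region.

    Regions for each chromosome must be sorted by start and non-overlapping.
    """
    hits = []
    for snp_id, chrom, pos in snps:
        chrom_regions = regions.get(chrom, [])
        starts = [start for start, _ in chrom_regions]
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < chrom_regions[i][1]:
            hits.append(snp_id)
    return hits


# Hypothetical annotated regions and SNPs.
regions = {"chr1": [(100, 200), (500, 650)], "chr6": [(1000, 1200)]}
snps = [("snp_a", "chr1", 150), ("snp_b", "chr1", 300),
        ("snp_c", "chr6", 1000), ("snp_d", "chr2", 150)]
print(snps_in_regions(snps, regions))
```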
Limitations of ENCODE Annotations
• Cell types are physiologically and genetically inhomogeneous.
• Local micro-environments in culture may also vary.
• The use of DNA sequencing to annotate functional genomic features is also constrained.
• There is considerable quantitative variation in signal strength along the genome.
(The ENCODE Project Consortium, 2011)
Challenges
• The adult human body contains several hundred distinct cell types, each of which expresses a unique subset of the 1,800 TFs encoded in the human genome.
• The brain alone contains thousands of types of neurons that are likely to express not only different sets of TFs but also a larger variety of non-coding RNAs.
• A truly comprehensive atlas of human functional elements is not practical with current technologies.
(The ENCODE Project Consortium, 2011)
Outcome
• Improved understanding of the human genome
• The broad coverage of ENCODE annotations enhances our understanding of common diseases with a genetic component and of rare genetic diseases.
• ENCODE has sampled 119 of 1,800 known transcription factors and 13 of more than 60 currently known histone or DNA modifications across 147 cell types.
• Overall, these data reflect a minor fraction of the potential functional information encoded in the human genome.
(The ENCODE Project Consortium, 2012)
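The sampling figures above can be expressed as percentages to show how small the sampled fraction is. A minimal sketch of that arithmetic follows (the function name is illustrative). Because the slide says "more than 60" modifications, the second percentage is an upper bound.

```python
def percent_sampled(sampled: int, known: int) -> float:
    """Percentage of known items that have been sampled."""
    return 100.0 * sampled / known


tf_percent = percent_sampled(119, 1800)   # transcription factors
mod_percent = percent_sampled(13, 60)     # histone or DNA modifications (upper bound)
print(f"Transcription factors sampled: {tf_percent:.1f}%")
print(f"Modifications sampled: at most {mod_percent:.1f}%")
```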
http://www.nature.com/encode/#/threads
13 Threads
1. Transcription factor motifs
2. Chromatin patterns at transcription factor binding sites
3. Characterization of intergenic regions and gene definition
4. RNA and chromatin modification patterns around promoters
5. Epigenetic regulation of RNA processing
6. Non-coding RNA characterization
7. DNA methylation
8. Enhancer discovery and characterization
9. Three-dimensional connections across the genome
10. Characterization of network topology
11. Machine learning approaches to genomics
12. Impact of functional information on understanding variation
13. Impact of evolutionary selection on functional regions
Schematic overview of the functional SNP
approach
(Schaub et al., 2012)
Comparison of GWAS identified loci with
ENCODE data
(Boyle et al., 2012)
Future goal
• Understand the mechanistic processes that generate these elements, and how and where they function.
• Enlarge the data set to additional factors, modifications and cell types, complementing the other related projects.
• Constitute foundational resources for human genomics, allowing a deeper interpretation of the organization of gene and regulatory information and the mechanisms of regulation, and thereby provide important insights into human health and disease.
(The ENCODE Project Consortium, 2012)
The project is still far from complete
Conclusion
For update: https://www.facebook.com/ENCODEProject
Encode – assign word to letter
Thank you:)
Presented by: R. Veera Ranjani

New insights into the human genome by encode 14.12.12

  • 1. New insights into the human genome by ENCODE
  • 2. What is a gene? ENCODE definition: • A union of genomic sequences encoding a coherent set of potentially overlapping functional products. (Gerstein et al., 2007)
  • 3. It's been ten years since scientists sequenced the human genome. But what do all these letters mean?
  • 5. ENCODE, the Encyclopedia of DNA Elements, has answers. It aims to delineate all of the functional elements encoded in the human genome sequence.
  • 6. ENCODE Consortium (The ENCODE Project Consortium, 2011)
  • 8.
  • 9. ENCODE: Major methods • Data production and initial analysis • Accessing ENCODE data • Working with ENCODE data • Data analysis • Limitations • Threads – Nature explorer
  • 10. Major Methods (The ENCODE Project Consortium, 2004)
  • 11. Overall data flow (The ENCODE Project Consortium, 2011)
  • 12. (The ENCODE Project Consortium, 2011)
  • 13. RNA-seq – Isolation of RNA sequences followed by high-throughput sequencing • CAGE – Capture of the methylated cap at the 5′ end of RNA, followed by high-throughput sequencing • RNA-PET – Simultaneous capture of RNAs with both a 5′ methyl cap and a poly(A) tail • ChIP-seq – Chromatin immunoprecipitation followed by sequencing • FAIRE-seq – Formaldehyde-assisted isolation of regulatory elements: crosslinking, phenol extraction, and sequencing of the DNA fragments in the aqueous phase
  • 14. (The ENCODE Project Consortium, 2011)
  • 15. ENCODE cell types (The ENCODE Project Consortium, 2011)
  • 16. ENCODE data production and initial analyses • Since 2007, ENCODE has developed methods and performed a large number of sequence-based studies to map functional elements across the human genome. • The elements mapped (and approaches used) include: ◦ RNA transcribed regions (RNA-seq, CAGE, RNA-PET and manual annotation), ◦ Protein-coding regions (mass spectrometry), ◦ Transcription-factor-binding sites (ChIP-seq and DNase-seq), ◦ Chromatin structure (DNase-seq, FAIRE-seq, histone ChIP-seq), ◦ DNA methylation sites (RRBS assay) (The ENCODE Project Consortium, 2012)
  • 17. Transcribed and protein-coding regions • In total, GENCODE-annotated exons of protein-coding genes cover 2.94% of the genome, or 1.22% for protein-coding exons. • Protein-coding genes span 33.45% of the genome from the outermost start to stop codons, or 39.54% from promoter to poly(A) site. • Additional protein-coding genes remain to be found. • In addition, GENCODE annotated 8,801 automatically derived small RNAs and 9,640 manually curated long non-coding RNA (lncRNA) loci. • GENCODE annotated 11,224 pseudogenes. (The ENCODE Project Consortium, 2012)
  • 18. Process flow of experimental evaluation of pseudogene transcription Experimental validation results showing the transcription of pseudogenes in different tissues (Pei et al., 2012)
  • 19. ENCODE gene and transcript annotations. (The ENCODE Project Consortium, 2011)
  • 20. RNA • They sequenced RNA from different cell lines and multiple subcellular fractions to develop an extensive RNA expression catalogue. • They used CAGE-seq (5′ cap-targeted RNA isolation and sequencing) to identify 62,403 transcription start sites (TSSs) in tier 1 and 2 cell types. (The ENCODE Project Consortium, 2012)
  • 21. A large majority of GENCODE elements are detected by RNA-seq data (Djebali et al., 2012)
  • 22. Protein-bound regions • The binding locations of 119 different DNA-binding proteins and a number of RNA polymerase components were mapped in 72 cell types using ChIP-seq. • Overall, 636,336 binding regions covering 231 megabases (8.1%) of the genome are enriched for regions bound by DNA-binding proteins across all cell types. (The ENCODE Project Consortium, 2012)
  • 23. Occupancy of transcription factors and RNA polymerase 2 on human chromosome 6p as determined by ChIP-seq
  • 24. (The ENCODE Project Consortium, 2011)
  • 25. DNase I hypersensitive sites and footprinting • Chromatin accessibility characterized by DNase I hypersensitivity is the hallmark of regulatory DNA regions. • 2.89 million unique, non-overlapping DNase I hypersensitive sites (DHSs) were identified by DNase-seq in 125 cell types; most lie distal to TSSs. • In tier 1 and tier 2 cell types: 205,109 DHSs per cell type, encompassing an average of 1.0% of the genomic sequence in each cell type, and 3.9% in aggregate. (The ENCODE Project Consortium, 2012)
  • 26. Density of DNase I cleavage sites for selected cell types (Thurman et al., 2012)
  • 27. • On average, 98.5% of the occupancy sites of transcription factors mapped by ENCODE ChIP-seq lie within accessible chromatin defined by DNase I hotspots. • Using genomic DNase I footprinting on 41 cell types, they identified 8.4 million distinct DNase I footprints. (The ENCODE Project Consortium, 2012)
  • 28. Regions of histone modification • They assayed chromosomal locations for up to 12 histone modifications and variants in 46 cell types, across tier 1 and 2. (The ENCODE Project Consortium, 2012) (http://www.factorbook.org)
  • 29. DNA methylation • They used reduced representation bisulphite sequencing (RRBS) to profile DNA methylation quantitatively for an average of 1.2 million CpGs in each of 82 cell lines and tissues (8.6% of non-repetitive genomic CpGs), including CpGs in intergenic regions, proximal promoters and intragenic regions. (The ENCODE Project Consortium, 2012)
  • 30. Proteomics • To assess putative protein products generated from novel RNA transcripts and isoforms, proteins are sequenced and quantified by mass spectrometry and mapped back to their encoding transcripts. • K562 and GM12878: protein study begun. (The ENCODE Project Consortium, 2011)
  • 31. ENCODE chromatin annotations in the HLA locus (The ENCODE Project Consortium, 2011)
  • 32. Accessing ENCODE Data • ENCODE Data Release and Use Policy: the policy is described at http://www.encodeproject.org/ENCODE/terms.html. • ENCODE data are released for viewing in a publicly accessible browser (initially at http://genome-preview.ucsc.edu/ENCODE and, after additional quality checks, at http://encodeproject.org). • Public repositories: UCSC Genome Browser database (http://genome.ucsc.edu). (The ENCODE Project Consortium, 2011)
  • 34. Working with ENCODE Data: Using ENCODE Data in the UCSC Browser • Many users will want to view and interpret the ENCODE data for particular genes of interest. At the online ENCODE portal (http://encodeproject.org), users should follow a “Genome Browser” link to visualize the data in the context of other genome annotations. (The ENCODE Project Consortium, 2011)
  • 35. ENCODE Data Analysis • Developing and implementing algorithms and pipelines for processing and analyzing data is a major activity of the ENCODE Project. • 1st phase: short sequences are aligned to the reference genome. • 2nd phase: the enriched regions are identified. • 3rd phase: the identified regions of enriched signal are integrated with each other and with other data types. (The ENCODE Project Consortium, 2011)
  • 36. Analysis tools applied by the ENCODE consortium (The ENCODE Project Consortium, 2011)
  • 37. Integrating ENCODE with other projects and the scientific community: 1. defining promoter and enhancer regions by combining transcript mapping and biochemical marks, 2. delineating distinct classes of regions within the genomic landscape by their specific combinations of biochemical and functional characteristics, and 3. defining transcription factor co-associations and regulatory networks. (The ENCODE Project Consortium, 2011)
  • 38. • ENCODE Project: interpretation of human genome variation that is associated with disease or quantitative phenotypes • Integration with the 1000 Genomes Project: how SNPs and structural variation may affect transcript, regulatory and DNA methylation data • ENCODE supports GWAS and other sequence-variation-driven studies of human phenotypes • Major contributor not only of data but also of novel technologies for deciphering the human genome (The ENCODE Project Consortium, 2011)
  • 39. Limitations of ENCODE Annotations • Cell types - physiologically and genetically inhomogeneous. • Local micro-environments in culture may also vary • Use of DNA sequencing to annotate functional genomic features is also constrained. • Considerable quantitative variation in the signal strength along the genome (The ENCODE Project Consortium, 2011)
  • 40. Challenges • Adult human body contains several hundred distinct cell types • Each of which expresses a unique subset of the 1,800 TFs encoded in the human genome • Brain alone contains thousands of types of neurons that are likely to express not only different sets of TFs but also a larger variety of non-coding RNAs • A truly comprehensive atlas of human functional elements is not practical with current technologies (The ENCODE Project Consortium, 2011)
  • 41. Outcome • Understanding of the human genome • The broad coverage of ENCODE annotations enhances our understanding of common diseases with a genetic component and of rare genetic diseases • The survey so far covers 119 of 1,800 known transcription factors and 13 of more than 60 currently known histone or DNA modifications across 147 cell types • Overall these data reflect a minor fraction of the potential functional information encoded in the human genome (The ENCODE Project Consortium, 2012)
  • 43. 13 Threads 1. Transcription factor motifs 2. Chromatin patterns at transcription factor binding sites 3. Characterization of intergenic regions and gene definition 4. RNA and chromatin modification patterns around promoters 5. Epigenetic regulation of RNA processing 6. Non-coding RNA characterization 7. DNA methylation 8. Enhancer discovery and characterization 9. Three-dimensional connections across the genome 10. Characterization of network topology 11. Machine learning approaches to genomics 12. Impact of functional information on understanding variation 13. Impact of evolutionary selection on functional regions
  • 44. Schematic overview of the functional SNP approach (Schaub et al., 2012)
  • 45. Comparison of GWAS identified loci with ENCODE data
  • 46.
  • 47. (Boyle et al., 2012)
  • 48. Future goal • Mechanistic processes that generate these elements and how and where they function • Enlarge the data set to additional factors, modifications and cell types, complementing the other related projects • Constitute foundational resources for human genomics, allowing a deeper interpretation of the organization of gene and regulatory information and the mechanisms of regulation, and thereby provide important insights into human health and disease (The ENCODE Project Consortium, 2012)
  • 49. Conclusion: the project is still far from complete. For updates: https://www.facebook.com/ENCODEProject
  • 50. Encode – assign word to letter
  • 51. Thank you:) Presented by: R. Veera Ranjani
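The third analysis phase on slide 35 (integrating regions of enriched signal with each other and with other data types) comes down, at its simplest, to finding overlapping genomic intervals. The following is a minimal Python sketch of that one step, assuming the peaks have already been called and are held as (chrom, start, end) tuples with half-open coordinates. The function name `overlapping_regions` and the toy coordinates are hypothetical and are not taken from any ENCODE pipeline.

```python
from collections import defaultdict


def overlapping_regions(peaks_a, peaks_b):
    """Return pairs (a, b) of regions that share at least one base.

    Regions are (chrom, start, end) tuples with half-open coordinates
    [start, end). Illustrative helper only, not an ENCODE tool.
    """
    # Group the second set by chromosome and sort each group by start.
    by_chrom = defaultdict(list)
    for region in peaks_b:
        by_chrom[region[0]].append(region)
    for regions in by_chrom.values():
        regions.sort(key=lambda r: r[1])

    hits = []
    for a in sorted(peaks_a):
        chrom, a_start, a_end = a
        for b in by_chrom.get(chrom, []):
            _, b_start, b_end = b
            if b_start >= a_end:
                break  # sorted by start, so no later region can overlap
            if b_end > a_start:
                hits.append((a, b))
    return hits


# Toy data: hypothetical ChIP-seq peaks and DNase I hypersensitive sites.
chip_peaks = [("chr6", 100, 200), ("chr6", 500, 600), ("chr5", 50, 80)]
dhs_sites = [("chr6", 150, 250), ("chr6", 600, 700), ("chr5", 10, 60)]

for chip, dhs in overlapping_regions(chip_peaks, dhs_sites):
    print(chip, "overlaps", dhs)
```

With half-open coordinates, regions that merely touch (one ends at 600, the next starts at 600) are not reported as overlapping. The nested scan is adequate for a toy example; genome-scale data would normally call for an interval tree or a dedicated interval tool.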

Editor's Notes

  1. These analyses reveal that the human genome encodes a diverse array of transcripts. For example, in the proto-oncogene TP53 locus, RNA-seq data indicate that, while TP53 transcripts are accurately assigned to the minus strand, those for the oppositely transcribed, adjacent gene WRAP53 emanate from the plus strand (Figure 3). An independent transcript within the first intron of TP53 is also observed in both GM12878 and K562 cells (Figure 3).
  2. The upper portion shows the ChIP-seq signal of five sequence-specific transcription factors and RNA Pol2 throughout the 58.5 Mb of the short arm of human chromosome 6 of the human lymphoblastoid cell line GM12878. Input control signal is shown below the RNA Pol2 data. At this level of resolution, the sites of strongest signal appear as vertical spikes in blue next to the name of each experiment (“BATF,” “EBF,” etc.).
  3. A 116 kb segment of the HLA region is expanded; here, individual sites of occupancy can be seen mapping to specific regions of the three HLA genes shown at the bottom, with asterisks indicating binding sites called by peak calling software. Finally, the lower left region shows a 3,500 bp region around two tandem histone genes, with RNA Pol2 occupancy at both promoters and two of the five transcription factors, BATF and cFos, occupying sites nearby.
  4. They organized all the information associated with each transcription factor including the ChIP-seq peaks, discovered motifs and associated histone modification patterns in FactorBook (http://www.factorbook.org), a public resource that will be updated as the project proceeds.
  5. After curation and review at the Data Coordination Center, all processed ENCODE data are publicly released to the UCSC Genome Browser database (http://genome.ucsc.edu).
  6. Three different types of regulatory data are represented for an area of the genome: motif-based predictions, DNase I hypersensitivity peaks, and ChIP-seq peaks. This region contains six SNPs. SNP1 is associated with a phenotype in a genome-wide association study. SNP3 is an eQTL associated with changes in gene expression in a different study. SNP6 overlaps a predicted motif, a DNase I hypersensitivity peak, and a ChIP-seq peak. There are, therefore, multiple sources of evidence that SNP6 is in a regulatory region. Furthermore, SNP6 is in perfect linkage disequilibrium (r² = 1.0) with SNP1 and SNP3, meaning that there is transitive evidence due to the LD that SNP6 is also associated with the phenotype and is also an eQTL. SNP6 is therefore the most likely functional SNP in this associated region.
  7. Aggregate overlap of phenotypes to selected transcription-factor-binding sites (left matrix) or DHSs in selected cell lines (right matrix), with a count of overlaps between the phenotype and the cell line/factor. Values in blue squares pass an empirical P-value threshold ≤ 0.01 (based on the same analysis of overlaps between randomly chosen, GWAS-matched SNPs and these epigenetic features) and have at least a count of three overlaps. The P value for the total number of phenotype–transcription factor associations is < 0.001.
  8. Several SNPs associated with Crohn’s disease and other inflammatory diseases that reside in a large gene desert on chromosome 5, along with some epigenetic features indicative of function. The SNP (rs11742570) strongly associated to Crohn’s disease overlaps a GATA2 transcription-factor-binding signal determined in HUVECs. This region is also DNase I hypersensitive in HUVECs and T-helper TH1 and TH2 cells. An interactive version of this figure is available in the online version of the paper.
  9. Users are able to interface with our database by entering lists of SNVs or regions to identify common SNVs at http://www.RegulomeDB.org/ (a). They are then presented with a sorted list of the most important SNVs (b). These SNVs can be examined for the evidence used to rank them as well as a citation for the evidence.
  10. Scientists in the Encyclopedia of DNA Elements Consortium have applied 24 experiment types (across) to more than 150 cell lines (down) to assign functions to as many DNA regions as possible — but the project is still far from complete
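The reasoning in note 6 can be sketched in a few lines of Python: count the independent regulatory evidence sources overlapping each SNP, then let association evidence (GWAS, eQTL) pass between SNPs in perfect linkage disequilibrium. Everything named here (`REGULATORY_EVIDENCE`, `ASSOCIATIONS`, `PERFECT_LD`, `rank_functional_snps`) is hypothetical, and the empty annotations for SNP2, SNP4 and SNP5 are an assumption, since the note does not describe them. This is a toy illustration, not the scoring scheme of Schaub et al. or of RegulomeDB.

```python
# Regulatory evidence overlapping each SNP. SNP6 follows note 6; the empty
# sets for the remaining SNPs are an assumption made for this toy example.
REGULATORY_EVIDENCE = {
    "SNP1": set(),
    "SNP2": set(),
    "SNP3": set(),
    "SNP4": set(),
    "SNP5": set(),
    "SNP6": {"motif", "DNase", "ChIP-seq"},
}

# Direct association evidence reported for each SNP (from note 6).
ASSOCIATIONS = {"SNP1": {"GWAS"}, "SNP3": {"eQTL"}}

# Pairs in perfect linkage disequilibrium, r^2 = 1.0 (from note 6).
PERFECT_LD = [("SNP6", "SNP1"), ("SNP6", "SNP3")]


def rank_functional_snps(evidence, associations, ld_pairs):
    """Rank SNPs by regulatory evidence, then by LD-supported associations."""
    # Build a symmetric lookup of LD partners.
    partners = {snp: set() for snp in evidence}
    for a, b in ld_pairs:
        partners[a].add(b)
        partners[b].add(a)

    ranked = []
    for snp, sources in evidence.items():
        # A SNP keeps its own associations and gains those of its direct
        # LD partners (the "transitive evidence" described in note 6).
        supported = set(associations.get(snp, set()))
        for partner in partners[snp]:
            supported |= associations.get(partner, set())
        ranked.append((snp, len(sources), sorted(supported)))

    # Most regulatory evidence first, then most associations, then name.
    ranked.sort(key=lambda row: (-row[1], -len(row[2]), row[0]))
    return ranked


for snp, n_sources, supported in rank_functional_snps(
    REGULATORY_EVIDENCE, ASSOCIATIONS, PERFECT_LD
):
    print(snp, n_sources, supported)
```

In this toy data SNP6 ranks first, matching the note's conclusion that it is the most likely functional SNP. The sketch only follows direct LD pairs and treats every evidence type as equally weighted, both of which are simplifications.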