SlideShare a Scribd company logo
1 of 35
Download to read offline
A short introduction to single cell
RNA-seq analyses
Nathalie Vialaneix
January 17th, 2019 - Biopuces
Unité MIAT, INRA Toulouse
Sources
These slides have been made using previous presentations
from:
• Delphine Labourdette (LISBP) - diaporama
• Cathy Maugis (IMT) - diaporama
• Franck Picard (LBBE, Lyon) - diaporama
2
Simple description of single cell
datasets
3
10x Genomics Chromium
4
A few remarks
• barcoding is used to index each cell
• UMI are used to index each transcript and correct the
amplification bias during library preparation
• droplet technology does not allow for spike-ins (which
would be useful for normalization)
• droplets sometimes include duplicates or triplicates (more
frequent in cancer cells; estimated at ~0-10% of the
droplets, depending on the number of cells, it increases with
the number of cells )
• many other sc technologies (check Delphine’s slides)
5
Single-cell technologies
[Regev et al., 2017]
6
Why single cell?
From a statistical perspective…
From 10x Genomics
7
Standard analyses and tools
• normalization and dimension reduction
• clustering
• differential expression
can be performed using:
• the bioconductor workflow “single cell” (that uses
the packages scatter and scran)
• the all-in-one pipeline “seurat”
8
Description of datasets and
requests from project TregDiab
9
Datasets
• Count dataset (as produced by Claire) with n = 8, 273
cells and p = 27, 998 genes (Unique Molecular Identifier)
• Metadata:
• on cells: barcode (identifies the cell), group (IL15 or
IL2) and genotype (WT or KO)
• on genes: ENSEMBLE gene name and Gene name
• Frequency distribution of conditions over cells:
WT KO
IL15 2452 1609
IL2 2175 2037
The rest of the analysis will focus on cells coming from WT
samples.
10
Questions
1. On the whole population of cells (not taking into account
groups and genotypes), perform a typology of cells
(unsupervised clustering).
2. Identify markers (genes) that are specific of each cell type.
11
Data cleaning and normalization
12
Different steps of the normalization
1. Quality control of the cells: library size distribution, number of
expressed genes distribution, mitocondrial proportion distribution.
⇒ Atypical cells are removed from the analysis.
2. Cell cycle classification.
⇒ Only cells in G1 phase are used for the analysis.
3. Quality control of genes: average count distribution, number of
cells in which the gene is expressed.
⇒ Atypical genes (lowly expressed) are removed from the
analysis.
4. Normalization of cell specific biases: size factor to correct
library sizes are computed after a first (crude) clustering.
What has not been done: Doublet detection 13
Quality control of cells
14
Filtering low quality cells
• remove cells with low library size
• remove cells with a low number of expressed genes
• remove cells with a too large number of mitocondrial
genes
⇒ 4,282 remaining cells (out of 4,627 original cells)
15
Cell cycle classification
Cell cycle classification is performed using cyclone (R
package scran): based on a model that has been trained on
specific markers of cell cycles (for mouse and human) ⇒ only
cells in G1 phase are used in the analyses (to remove mitosis
effects)
IL15 IL2
G1 2027 1871
G2 148 99
S 84 53
16
Gene quality
distribution of high average log
expressed genes expression distribution
17
Filtering atypical genes
Removed non variable genes: 13, 629 with a variance equal to
0 (48.7%).
low expressed genes genes expressed in few cells
⇒ 10,418 remaining genes (out of 27,998 initial genes)
18
Normalization
Normalization is performed after similar cells have been
clustered together (based on the most expressed genes; R
package scater).
⇒ Scaling factors of library size are obtained (similar to
RNA-seq, one can even normalize the library size as in edgeR).
19
Dimension reduction and
clustering
20
Standard approach for exploratory analysis
• dimension reduction (PCA, nearest neighbors graphs…)
• visualization (PCA, or t-SNE based on PCA or on any
other dimension reduction)
• clustering
21
PCA (all genes)
22
t-SNE (perplexity: 50, R package scater)
23
What does t-SNE?
If cell expressions are noted x1, …, xn (n cells, xi is in Rp
), then
• compute a similaritly between samples with:
pi|j =
exp(−γ2
∥xi − xj∥2
)
∑
k̸=j exp(−γ2∥xk − xj∥2)
• search for representation in R2
, y1, …, yn with a similarity
between points in the new representation based on:
qi|j =
exp(−∥yi − yj∥2
)
∑
k̸=j exp(−∥yk − yj∥2)
• based on the minimization KL divergence between p and q
But: the objective function is not convex and the results are
very sensitive to γ (perplexity) and to the initialization
24
t-SNE: remarks
• t-SNE is good at representing local distances but not
global ones (non linear dimension reduction)
• the perplexity can change a lot the representation (no
good values found for this dataset)
• the population of cells seem very homogeneous and
not related to the genotype
(the same is observed on PCA projection)
How could we improve that? Use log / raw expression, base
the algorithm on PCA results, try a wider range of perplexity
values…?
25
Clustering
• extract t-SNE coordinates
• use HAC on those
Other approaches for clustering
• use a NN network + clustering of graph (Louvain
algorithm that optimizes the modularity)
• use other dimension reduction methods and perform any
clustering algorithm
⇒ results are different (visualization can even be extremely
different)
26
Clustering results
27
Conditions in clusters
28
Exploratory analysis of markers
automatic detection prior knowledge
29
Not too bad for some known markers…
30
General overview of sc models in
statistics
31
How bad is the situation in single cell data?
Overdispersion is mainly biological because diversity is high
between cells
32
Expression is a bursty process: zeros are biological
33
sc Differential Expression Analysis with ZINB
[Risso et al., 2018] - package zinbwave
For cell i, gene j in condition r, gene expression is modeled by:
Xijr ∼ πijrδ0 + (1 − πijr)NB(µijr)
Remaining problems:
• We are not really able to discriminate low expression from
no expression
• Estimation is hard (use of a Bayesian framework to
address this issue)
• a similar method exists for PCA [Durif et al., 2018]
34
References
Durif, G., Modolo, L., Mold, J., Lambert-Lacroix, S., and
Picard, F. (2018).
Probabilistic count matrix factorization for single
cell expression data analysis.
In Raphael, B. J., editor, Proceedings of Research in
Computational Biology (RECOMB 2018), volume 10812
of Lecture Notes in Computer Science, pages 254–255,
Paris, France. Springer.
Regev, A., Teichmann, S. A., Lander, E. S., Amit, I.,
Benoist, C., Birney, E., Bodenmiller, B., Campbell, P.,
Carninci, P., Clatworthy, M., Clevers, H., Deplancke, B.,
Dunham, I., Eberwine, J., Eils, R., Enard, W., Farmer, A.,
Fugger, L., Göttgens, B., Hacohen, N., Haniffa, M., 35

More Related Content

What's hot

Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
RNA Sequencing from Single Cell
RNA Sequencing from Single CellRNA Sequencing from Single Cell
RNA Sequencing from Single CellQIAGEN
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingPALANIANANTH.S
 
Next Generation Sequencing of DNA
Next Generation Sequencing of DNANext Generation Sequencing of DNA
Next Generation Sequencing of DNAmaryamshah13
 
Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1
Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1
Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1QIAGEN
 
Next Generation Sequencing - the basics
Next Generation Sequencing - the basicsNext Generation Sequencing - the basics
Next Generation Sequencing - the basicsUSD Bioinformatics
 
Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)Hamza Khan
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingUzma Jabeen
 
FastQC and Prinseqlite
FastQC and PrinseqliteFastQC and Prinseqlite
FastQC and PrinseqliteRavi Gandham
 
Advances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell TechnologyAdvances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell TechnologyQIAGEN
 
RNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the TranscriptomeRNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the TranscriptomeSean Davis
 
Understanding and controlling for sample and platform biases in NGS assays
Understanding and controlling for sample and platform biases in NGS assaysUnderstanding and controlling for sample and platform biases in NGS assays
Understanding and controlling for sample and platform biases in NGS assaysCandy Smellie
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018David Cook
 

What's hot (20)

Rna seq
Rna seqRna seq
Rna seq
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
RNA-Seq
RNA-SeqRNA-Seq
RNA-Seq
 
NGS.pptx
NGS.pptxNGS.pptx
NGS.pptx
 
RNA Sequencing from Single Cell
RNA Sequencing from Single CellRNA Sequencing from Single Cell
RNA Sequencing from Single Cell
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Ngs ppt
Ngs pptNgs ppt
Ngs ppt
 
Basics of Genome Assembly
Basics of Genome Assembly Basics of Genome Assembly
Basics of Genome Assembly
 
Next Generation Sequencing of DNA
Next Generation Sequencing of DNANext Generation Sequencing of DNA
Next Generation Sequencing of DNA
 
Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1
Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1
Single-Cell Analysis - Powered by REPLI-g: Single Cell Analysis Series Part 1
 
Next Generation Sequencing - the basics
Next Generation Sequencing - the basicsNext Generation Sequencing - the basics
Next Generation Sequencing - the basics
 
Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)Exome seuencing (steps, method, and applications)
Exome seuencing (steps, method, and applications)
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
FastQC and Prinseqlite
FastQC and PrinseqliteFastQC and Prinseqlite
FastQC and Prinseqlite
 
Advances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell TechnologyAdvances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell Technology
 
RNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the TranscriptomeRNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the Transcriptome
 
Real Time PCR
Real Time PCRReal Time PCR
Real Time PCR
 
Understanding and controlling for sample and platform biases in NGS assays
Understanding and controlling for sample and platform biases in NGS assaysUnderstanding and controlling for sample and platform biases in NGS assays
Understanding and controlling for sample and platform biases in NGS assays
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
 

Similar to A short introduction to single-cell RNA-seq analyses

20100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_020100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_0Computer Science Club
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedJonathan Eisen
 
NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生ysuzuki-naist
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례mothersafe
 
Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...
Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...
Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...Aditya K G
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsajay301
 
DHC Microbiome Presentation 4-23-19.pptx
DHC Microbiome Presentation 4-23-19.pptxDHC Microbiome Presentation 4-23-19.pptx
DHC Microbiome Presentation 4-23-19.pptxDivyanshGupta922023
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedJonathan Eisen
 
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)r-kor
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
 
How to analyse bulk transcriptomic data using Deseq2
How to analyse bulk transcriptomic data using Deseq2How to analyse bulk transcriptomic data using Deseq2
How to analyse bulk transcriptomic data using Deseq2AdamCribbs1
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017David Cook
 

Similar to A short introduction to single-cell RNA-seq analyses (20)

May workshop
May workshopMay workshop
May workshop
 
May 15 workshop
May 15  workshopMay 15  workshop
May 15 workshop
 
12 arrays
12 arrays12 arrays
12 arrays
 
12 arrays
12 arrays12 arrays
12 arrays
 
20100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_020100509 bioinformatics kapushesky_lecture03-04_0
20100509 bioinformatics kapushesky_lecture03-04_0
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
 
NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례
 
Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...
Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...
Efficient Reduced BIAS Genetic Algorithm for Generic Community Detection Obje...
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Microarrays
MicroarraysMicroarrays
Microarrays
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 
DHC Microbiome Presentation 4-23-19.pptx
DHC Microbiome Presentation 4-23-19.pptxDHC Microbiome Presentation 4-23-19.pptx
DHC Microbiome Presentation 4-23-19.pptx
 
Arrays
ArraysArrays
Arrays
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
 
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
 
How to analyse bulk transcriptomic data using Deseq2
How to analyse bulk transcriptomic data using Deseq2How to analyse bulk transcriptomic data using Deseq2
How to analyse bulk transcriptomic data using Deseq2
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
 

More from tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathstuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquestuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-Ctuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?tuxette
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquestuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeantuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation datatuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 

More from tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Recently uploaded

Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curveAreesha Ahmad
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfrohankumarsinghrore1
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 

Recently uploaded (20)

Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 

A short introduction to single-cell RNA-seq analyses

  • 1. A short introduction to single cell RNA-seq analyses Nathalie Vialaneix January 17th, 2019 - Biopuces Unité MIAT, INRA Toulouse
  • 2. Sources These slides have been made using previous presentations from: • Delphine Labourdette (LISBP) - diaporama • Cathy Maugis (IMT) - diaporama • Franck Picard (LBBE, Lyon) - diaporama 2
  • 3. Simple description of single cell datasets 3
  • 5. A few remarks • barcoding is used to index each cell • UMI are used to index each transcript and correct the amplification bias during library preparation • droplet technology does not allow for spike-ins (which would be useful for normalization) • droplets sometimes include duplicates or triplicates (more frequent in cancer cells; estimated at ~0-10% of the droplets, depending on the number of cells, it increases with the number of cells ) • many other sc technologies (check Delphine’s slides) 5
  • 7. Why single cell? From a statistical perspective… From 10x Genomics 7
  • 8. Standard analyses and tools • normalization and dimension reduction • clustering • differential expression can be performed using: • the bioconductor workflow “single cell” (that uses the packages scatter and scran) • the all-in-one pipeline “seurat” 8
  • 9. Description of datasets and requests from project TregDiab 9
  • 10. Datasets • Count dataset (as produced by Claire) with n = 8, 273 cells and p = 27, 998 genes (Unique Molecular Identifier) • Metadata: • on cells: barcode (identifies the cell), group (IL15 or IL2) and genotype (WT or KO) • on genes: ENSEMBLE gene name and Gene name • Frequency distribution of conditions over cells: WT KO IL15 2452 1609 IL2 2175 2037 The rest of the analysis will focus on cells coming from WT samples. 10
  • 11. Questions 1. On the whole population of cells (not taking into account groups and genotypes), perform a typology of cells (unsupervised clustering). 2. Identify markers (genes) that are specific of each cell type. 11
  • 12. Data cleaning and normalization 12
  • 13. Different steps of the normalization 1. Quality control of the cells: library size distribution, number of expressed genes distribution, mitocondrial proportion distribution. ⇒ Atypical cells are removed from the analysis. 2. Cell cycle classification. ⇒ Only cells in G1 phase are used for the analysis. 3. Quality control of genes: average count distribution, number of cells in which the gene is expressed. ⇒ Atypical genes (lowly expressed) are removed from the analysis. 4. Normalization of cell specific biases: size factor to correct library sizes are computed after a first (crude) clustering. What has not been done: Doublet detection 13
  • 14. Quality control of cells 14
  • 15. Filtering low quality cells • remove cells with low library size • remove cells with a low number of expressed genes • remove cells with a too large number of mitocondrial genes ⇒ 4,282 remaining cells (out of 4,627 original cells) 15
  • 16. Cell cycle classification Cell cycle classification is performed using cyclone (R package scran): based on a model that has been trained on specific markers of cell cycles (for mouse and human) ⇒ only cells in G1 phase are used in the analyses (to remove mitosis effects) IL15 IL2 G1 2027 1871 G2 148 99 S 84 53 16
  • 17. Gene quality distribution of high average log expressed genes expression distribution 17
  • 18. Filtering atypical genes Removed non variable genes: 13, 629 with a variance equal to 0 (48.7%). low expressed genes genes expressed in few cells ⇒ 10,418 remaining genes (out of 27,998 initial genes) 18
  • 19. Normalization Normalization is performed after similar cells have been clustered together (based on the most expressed genes; R package scater). ⇒ Scaling factors of library size are obtained (similar to RNA-seq, one can even normalize the library size as in edgeR). 19
  • 21. Standard approach for exploratory analysis • dimension reduction (PCA, nearest neighbors graphs…) • visualization (PCA, or t-SNE based on PCA or on any other dimension reduction) • clustering 21
  • 23. t-SNE (perplexity: 50, R package scater) 23
  • 24. What does t-SNE? If cell expressions are noted x1, …, xn (n cells, xi is in Rp ), then • compute a similaritly between samples with: pi|j = exp(−γ2 ∥xi − xj∥2 ) ∑ k̸=j exp(−γ2∥xk − xj∥2) • search for representation in R2 , y1, …, yn with a similarity between points in the new representation based on: qi|j = exp(−∥yi − yj∥2 ) ∑ k̸=j exp(−∥yk − yj∥2) • based on the minimization KL divergence between p and q But: the objective function is not convex and the results are very sensitive to γ (perplexity) and to the initialization 24
  • 25. t-SNE: remarks • t-SNE is good at representing local distances but not global ones (non linear dimension reduction) • the perplexity can change a lot the representation (no good values found for this dataset) • the population of cells seem very homogeneous and not related to the genotype (the same is observed on PCA projection) How could we improve that? Use log / raw expression, base the algorithm on PCA results, try a wider range of perplexity values…? 25
  • 26. Clustering • extract t-SNE coordinates • use HAC on those Other approaches for clustering • use a NN network + clustering of graph (Louvain algorithm that optimizes the modularity) • use other dimension reduction methods and perform any clustering algorithm ⇒ results are different (visualization can even be extremely different) 26
  • 29. Exploratory analysis of markers automatic detection prior knowledge 29
  • 30. Not too bad for some known markers… 30
  • 31. General overview of sc models in statistics 31
  • 32. How bad is the situation in single cell data? Overdispersion is mainly biological because diversity is high between cells 32
  • 33. Expression is a bursty process: zeros are biological 33
  • 34. sc Differential Expression Analysis with ZINB [Risso et al., 2018] - package zinbwave For cell i, gene j in condition r, gene expression is modeled by: Xijr ∼ πijrδ0 + (1 − πijr)NB(µijr) Remaining problems: • We are not really able to discriminate low expression from no expression • Estimation is hard (use of a Bayesian framework to address this issue) • a similar method exists for PCA [Durif et al., 2018] 34
  • 35. References Durif, G., Modolo, L., Mold, J., Lambert-Lacroix, S., and Picard, F. (2018). Probabilistic count matrix factorization for single cell expression data analysis. In Raphael, B. J., editor, Proceedings of Research in Computational Biology (RECOMB 2018), volume 10812 of Lecture Notes in Computer Science, pages 254–255, Paris, France. Springer. Regev, A., Teichmann, S. A., Lander, E. S., Amit, I., Benoist, C., Birney, E., Bodenmiller, B., Campbell, P., Carninci, P., Clatworthy, M., Clevers, H., Deplancke, B., Dunham, I., Eberwine, J., Eils, R., Enard, W., Farmer, A., Fugger, L., Göttgens, B., Hacohen, N., Haniffa, M., 35