NGI stockholm
NGI-ChIPseq
Processing ChIP-seq data at the
National Genomics Infrastructure
Phil Ewels
phil.ewels@scilifelab.se
NBIS ChIP-seq tutorial
2017-11-29
SciLifeLab NGI
Our mission is to offer a state-of-the-art infrastructure for massively parallel DNA sequencing and SNP genotyping, available to researchers all over Sweden
National resource
State-of-the-art infrastructure
Guidelines and support
We provide guidelines and support for sample collection, study design, protocol selection and bioinformatics analysis
NGI Organisation
NGI Stockholm NGI Uppsala
Funding
Costs: Staff salaries, Premises and service contracts, Capital equipment, Reagent costs
Sources: Host universities, SciLifeLab, VR, KAW, User fees
Project timeline
• Sample QC
• Library preparation, Sequencing, Genotyping
• Data processing and primary analysis
• Scientific support and project consultation
• Data delivery
Methods offered at NGI
• Exome sequencing
• Nanopore sequencing
• ATAC-seq
• Metagenomics
• ChIP-seq
• Bisulphite sequencing
• RAD-seq
• RNA-seq
• de novo
• Whole Genome seq
Legend: Data analysis included for FREE / Just Sequencing / Accredited methods
ChIP-seq: NGI Stockholm
• You do the ChIP, we do the seq

• Rubicon ThruPlex DNA (NGI Production)

• Min 1 ng input
• Min 10 μl
• 0.2-10 ng/μl
• Insert size 200-800 bp
• 963 kr / prep
• Typically run SE 50bp

• Illumina HiSeq High Output mode v4, SR 1x50bp
• 1226 kr / sample (40M reads)
• Start by organising a planning meeting
https://ngisweden.scilifelab.se
ChIP-seq Pipeline
• Takes raw FastQ sequencing data as input

• Provides range of results

• Alignments (BAM)
• Peaks (optionally filtered)
• Quality Control
• Pipeline in use since early 2017 (on request)
ChIP-seq Pipeline: tools and steps
• FastQC: Sequence QC
• TrimGalore!: Read trimming
• BWA: Alignment
• Samtools, Picard: Sort, index, mark duplicates
• Phantompeakqualtools: Strand cross-correlation QC
• deepTools: Fingerprint, sample correlation
• NGSPlot: TSS / Gene profile plots
• MACS2: Peak calling
• Bedtools: Filtering blacklisted regions
• MultiQC: Reporting
File formats along the pipeline: FastQ, BAM, BED, HTML
Nextflow
• Tool to manage computational pipelines

• Handles interaction with compute infrastructure

• Easy to learn how to run, minimal oversight required
https://www.nextflow.io/
Default: run locally, assuming the software is installed
#!/usr/bin/env nextflow
// fromFilePairs emits (sample name, [read files]) tuples
input = Channel.fromFilePairs( params.reads )
process fastqc {
    input:
    set val(name), file(reads) from input
    output:
    file "*_fastqc.{zip,html}" into results
    script:
    """
    fastqc -q $reads
    """
}
Submit jobs to the SLURM queue and use environment modules (config file):
process {
    executor = 'slurm'
    clusterOptions = { "-A b2017123" }
    cpus = 1
    memory = 8.GB
    time = 2.h
    $fastqc {
        module = ['bioinfo-tools', 'FastQC']
    }
}
Run locally, using a Docker container for all software dependencies (config file):
docker {
    enabled = true
}
process {
    container = 'biocontainers/fastqc'
    cpus = 1
    memory = 8.GB
    time = 2.h
}
NGI-ChIPseq
https://github.com/SciLifeLab/NGI-ChIPseq
Running NGI-ChIPseq
Step 1: Install Nextflow

• Uppmax - load the Nextflow module
module load nextflow
• Anywhere (including Uppmax) - install Nextflow
curl -s https://get.nextflow.io | bash
Step 2: Try running NGI-ChIPseq pipeline

nextflow run SciLifeLab/NGI-ChIPseq --help
Step 3: Choose your reference

• Common organism - use iGenomes
--genome GRCh37
• MACS peak calling config file
--macsconfig config.csv
Step 4: Organise your data

• One (if single-end) or two (if paired-end) FastQ per sample
• Everything in one directory, simple filenames help!
Step 5: Run the pipeline on your data

• Remember to run detached from your terminal
screen / tmux / nohup
Step 6: Check your results

• Read the Nextflow log and check the MultiQC report
Step 7: Delete temporary files

• Delete the ./work directory, which holds all intermediates
Using UPPMAX
nextflow run SciLifeLab/NGI-ChIPseq \
    --project b2017123 \
    --genome GRCh37 --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Default config is for UPPMAX

• Knows about central iGenomes references
• Uses centrally installed software
Using other clusters
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile hebbe \
    --bwaindex ./ref --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Can run just about anywhere

• Supports local, SGE, LSF, SLURM, PBS/Torque,
HTCondor, DRMAA, DNAnexus, Ignite, Kubernetes
Using Docker
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile docker \
    --fasta genome.fa --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Can run anywhere with Docker

• Downloads required software and runs in a container
• Portable and reproducible.
Using AWS
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile aws \
    --genome GRCh37 --macsconfig p.txt \
    --reads "s3://my-bucket/*_{1,2}.fq.gz" \
    --outdir "s3://my-bucket/results/"
• Runs on the AWS cloud with Docker

• Pay-as-you go, flexible computing
• Can launch from anywhere with minimal configuration
Input data
ERROR ~ Cannot find any reads matching: XXXX
NB: Path needs to be enclosed in quotes!
NB: Path requires at least one * wildcard!
If this is single-end data, please specify --singleEnd on the command line.
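Works:
--reads '*_R{1,2}.fastq.gz'
--reads '*.fastq.gz' --singleEnd
Fails:
--reads *_R{1,2}.fastq.gz (path not enclosed in quotes)
--reads '*.fastq.gz' (single-end pattern without --singleEnd)
--reads sample.fastq.gz (no * wildcard, no quotes)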
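To illustrate why a pattern such as '*_R{1,2}.fastq.gz' needs both mates to share a prefix, here is a minimal Python sketch of the pairing idea. It is not the pipeline's or Nextflow's own code; the function name `group_read_pairs` and the example file names are hypothetical.

```python
import re
from collections import defaultdict


def group_read_pairs(filenames, suffix=".fastq.gz"):
    """Group FastQ names by shared prefix, mimicking a '*_R{1,2}<suffix>' pattern.

    Illustrative only: files that do not match the pattern are ignored.
    """
    pattern = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])" + re.escape(suffix) + r"$")
    pairs = defaultdict(dict)
    for name in filenames:
        match = pattern.match(name)
        if match:
            pairs[match.group("sample")][int(match.group("mate"))] = name
    # Return the mates in R1, R2 order for each sample
    return {sample: [mates[k] for k in sorted(mates)] for sample, mates in pairs.items()}


if __name__ == "__main__":
    files = ["a_R1.fastq.gz", "a_R2.fastq.gz", "b_R1.fastq.gz", "notes.txt"]
    print(group_read_pairs(files))
```

A sample with only one matching file, like "b" here, ends up with a single entry, which is the situation the --singleEnd hint in the error message is about.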
Read trimming
• Pipeline runs TrimGalore! to remove adapter
contamination and low quality bases automatically

• Use --notrim to disable this
• Some library preps also include additional adapters

--clip_r1 [int]
--clip_r2 [int]
--three_prime_clip_r1 [int]
--three_prime_clip_r2 [int]
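As a rough illustration of what the clipping options mean, the sketch below removes a fixed number of bases from the 5' and 3' ends of a read. It is a simplified model, not TrimGalore! code; `clip_read` and its argument names are hypothetical, and the real tool applies these clips alongside adapter and quality trimming.

```python
def clip_read(seq, five_prime=0, three_prime=0):
    """Remove `five_prime` bases from the start and `three_prime` bases from the end.

    Returns an empty string if the clips would consume the whole read.
    """
    end = len(seq) - three_prime
    return seq[five_prime:end] if end > five_prime else ""


if __name__ == "__main__":
    # Clip 2 bases from the 5' end and 3 bases from the 3' end of a 10 bp read
    print(clip_read("ACGTACGTAC", five_prime=2, three_prime=3))
```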
Blacklist filtering
• Some parts of the reference genome collect incorrectly
mapped reads

• Good practice to remove these peaks
• Pipeline has ENCODE regions for Human & Mouse

• Can pass own BED file of custom regions

--blacklist_filtering
--blacklist regions.bed
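The filtering idea can be sketched in a few lines of Python: drop every peak that overlaps a blacklisted interval. This is only an illustration under assumptions; the pipeline itself uses Bedtools, the function names here are hypothetical, and treating any overlap as grounds for removal is an assumption about the rule. Coordinates follow the BED convention (0-based, half-open).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_blacklisted(peaks, blacklist):
    """Keep only peaks (chrom, start, end) that overlap no blacklisted region."""
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        peak
        for peak in peaks
        if not any(overlaps(peak[1], peak[2], s, e) for s, e in by_chrom.get(peak[0], []))
    ]


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    blacklist = [("chr1", 150, 160)]
    print(filter_blacklisted(peaks, blacklist))
```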
Broad Peaks
• Some chromatin profiles don't have narrow, sharp peaks

• For example, H3K9me3 & H3K27me3
• MACS2 can call peaks in "broad peak" mode

• Pipeline uses default qvalue cutoff of 0.1
--broad
Extending Read Length
• When using single-end data, sequenced read length is
shorter than the sequence fragment length

• For DeepTools, need to "extend" the read length

• Set to 100bp by default. Use this parameter to
customise this value.
• Value to use: expected fragment length minus sequenced read length
--extendReadsLen [int]
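The arithmetic behind this parameter is a single subtraction, sketched here in Python. The function name is hypothetical, and the 150 bp fragment length is an illustrative assumption chosen so that a 50 bp read reproduces the 100 bp default.

```python
def extend_reads_len(fragment_len, read_len):
    """Number of bases to extend each read by: fragment length minus read length."""
    if read_len > fragment_len:
        raise ValueError("read length cannot exceed fragment length")
    return fragment_len - read_len


if __name__ == "__main__":
    # Assumed 150 bp fragments sequenced as 50 bp single-end reads
    print(extend_reads_len(150, 50))
```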
Saving intermediates
• By default, the pipeline doesn't save some intermediate
files to your final results directory

• Reference genome indices that have been built: --saveReference
• FastQ files from TrimGalore!: --saveTrimmed
• BAM files from BWA (we have BAMs from Picard): --saveAlignedIntermediates
Resuming pipelines
• If something goes wrong, you can resume a stopped
pipeline

• Will use cached versions of completed processes
• NB: Only one hyphen!
-resume
• Can resume specific past runs

• Use nextflow log to find job names
-resume job_name
Customising output
-name: Give a name to your run. Used in logs and reports
--outdir: Specify the directory for saved results
--saturation: Run saturation analysis, subsampling reads from 10% - 100%
--email: Get e-mailed a summary report when the pipeline finishes
Nextflow config files
• Can save a config file with defaults

• Any command-line option with two hyphens is a params value
params {
    email = 'phil.ewels@scilifelab.se'
    project = "b2017123"
}
process.$multiqc.module = []
Config file locations:
./nextflow.config
~/.nextflow/config
-c /path/to/my.config
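The two-hyphen rule can be pictured with a small Python sketch that sorts command-line arguments into pipeline params (two hyphens) and Nextflow's own options (one hyphen). This is an illustration of the convention, not Nextflow's parser; `split_cli_args` is a hypothetical name, and the sketch assumes the list starts with an option and that values never begin with a hyphen.

```python
def split_cli_args(args):
    """Split arguments into (params, nextflow_options) dictionaries.

    '--key value' goes to params, '-key value' to Nextflow options.
    An option with no following value is stored as True.
    """
    params, nf_opts = {}, {}
    i = 0
    while i < len(args):
        arg = args[i]
        target = params if arg.startswith("--") else nf_opts
        key = arg.lstrip("-")
        has_value = i + 1 < len(args) and not args[i + 1].startswith("-")
        target[key] = args[i + 1] if has_value else True
        i += 2 if has_value else 1
    return params, nf_opts


if __name__ == "__main__":
    cli = ["-profile", "docker", "--genome", "GRCh37", "--singleEnd", "-resume"]
    print(split_cli_args(cli))
```

Everything that lands in the first dictionary is something that could equally be set inside a `params { }` block in a config file.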
NGI-ChIPseq config
N E X T F L O W ~ version 0.26.1
Launching `SciLifeLab/NGI-ChIPseq` [deadly_bose] - revision: 28e24c2a2a
=========================================
NGI-ChIPseq: ChIP-Seq Best Practice v1.4
=========================================
Run Name : deadly_bose
Reads : data/*fastq.gz
Data Type : Single-End
Genome : GRCh37
BWA Index : /sw/data/uppnex/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/
MACS Config : data/macsconfig.txt
Saturation analysis : false
MACS broad peaks : false
Blacklist filtering : false
Extend Reads : 100 bp
Current home : /home/phil
Current user : phil
Current path : /home/phil/demo_data/ChIP/Human/test
Working dir : /home/phil/demo_data/ChIP/Human/test/work
Output dir : ./results
R libraries : /home/phil/R/nxtflow_libs/
Script dir : /home/phil/GitHub/NGI-ChIPseq
Save Reference : false
Save Trimmed : false
Save Intermeds : false
Trim R1 : 0
Trim R2 : 0
Trim 3' R1 : 0
Trim 3' R2 : 0
Config Profile : UPPMAX
UPPMAX Project : b2017001
E-mail Address : phil.ewels@scilifelab.se
====================================
Version control
$fastqc.module = ['FastQC/0.7.2']
$trim_galore.module = ['TrimGalore/0.4.1', 'FastQC/0.7.2']
$bwa.module = ['bwa/0.7.8', 'samtools/1.5']
$samtools.module = ['samtools/1.5', 'BEDTools/2.26.0']
$picard.module = ['picard/2.10.3', 'samtools/1.5']
$phantompeakqualtools.module = ['phantompeakqualtools/1.1']
$deepTools.module = ['deepTools/2.5.1']
$ngsplot.module = ['samtools/1.5', 'R/3.2.3', 'ngsplot/2.61']
$macs.module = ['MACS/2.1.0']
$saturation.module = ['MACS/2.1.0', 'samtools/1.5']
$saturation_r.module = ['R/3.2.3']
• Pipeline is always released under a stable version tag

• Software versions and code reproducible

• For full reproducibility, specify version revision when
running the pipeline
nextflow run SciLifeLab/NGI-ChIPseq -r v1.3
Conclusion
• Use NGI-ChIPseq to prepare your data if you want:

• To not have to remember every parameter for every tool
• Extreme reproducibility
• Ability to run on virtually any environment
• Now running for all ChIPseq projects at NGI-Stockholm
https://github.com/SciLifeLab/NGI-RNAseq
https://github.com/SciLifeLab/NGI-MethylSeq
https://github.com/SciLifeLab/NGI-smRNAseq
https://github.com/SciLifeLab/NGI-ChIPseq
MIT Licence
Acknowledgements

Phil Ewels
Chuan Wang
Jakub Westholm
Rickard Hammarén
Max Käller
Denis Moreno
NGI Stockholm Genomics Applications Development Group
support@ngisweden.se
http://opensource.scilifelab.se

 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 

Último (20)

Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 

Processing ChIP-seq data with NGI-ChIPseq

  • 1. NGI stockholm NGI-ChIPseq Processing ChIP-seq data at the National Genomics Infrastructure Phil Ewels phil.ewels@scilifelab.se NBIS ChIP-seq tutorial 2017-11-29
  • 2. SciLifeLab NGI: Our mission is to offer a state-of-the-art infrastructure for massively parallel DNA sequencing and SNP genotyping, available to researchers all over Sweden
  • 3. SciLifeLab NGI: National resource • State-of-the-art infrastructure • Guidelines and support. We provide guidelines and support for sample collection, study design, protocol selection and bioinformatics analysis
  • 4. NGI Organisation: NGI Stockholm • NGI Uppsala
  • 5. NGI Organisation: Funding • Staff salaries • Premises and service contracts • Capital equipment • Host universities • SciLifeLab • VR • KAW • User fees • Reagent costs • NGI Stockholm • NGI Uppsala
  • 6. Project timeline: Sample QC • Library preparation, Sequencing, Genotyping • Data processing and primary analysis • Scientific support and project consultation • Data delivery
  • 7. Methods offered at NGI: Exome sequencing • Nanopore sequencing • ATAC-seq • Metagenomics • ChIP-seq • Bisulphite sequencing • RAD-seq • RNA-seq • de novo • Whole Genome seq • Data analysis included for FREE • Just Sequencing • Accredited methods
  • 8. ChIP-seq: NGI Stockholm
      • You do the ChIP, we do the seq
      • Rubicon ThruPlex DNA (NGI Production)
      • Min 1 ng input
      • Min 10 μl
      • 0.2-10 ng/μl
      • Ins. size 200-800 bp
      • 963 kr / prep
  • 9. ChIP-seq: NGI Stockholm
      • You do the ChIP, we do the seq
      • Rubicon ThruPlex DNA (NGI Production)
      • Typically run SE 50bp
      • Illumina HiSeq High Output mode v4, SR 1x50bp
      • 1226 kr / sample (40M reads)
  • 10. ChIP-seq: NGI Stockholm
      • You do the ChIP, we do the seq
      • Rubicon ThruPlex DNA (NGI Production)
      • Typically run SE 50bp
      • Start by organising a planning meeting: https://ngisweden.scilifelab.se
  • 11. ChIP-seq Pipeline
      • Takes raw FastQ sequencing data as input
      • Provides a range of results: Alignments (BAM), Peaks (optionally filtered), Quality Control
      • Pipeline in use since early 2017 (on request)
  • 12. ChIP-seq Pipeline (NGI-ChIPseq), tool and step:
      • FastQC: Sequence QC
      • TrimGalore!: Read trimming
      • BWA: Alignment
      • Samtools, Picard: Sort, index, mark duplicates
      • Phantompeakqualtools: Strand cross-correlation QC
      • deepTools: Fingerprint, sample correlation
      • NGSPlot: TSS / Gene profile plots
      • MACS2: Peak calling
      • Bedtools: Filtering blacklisted regions
      • MultiQC: Reporting
      File formats shown: FastQ, BAM, HTML, BED
  • 13. Nextflow
      • Tool to manage computational pipelines
      • Handles interaction with compute infrastructure
      • Easy to learn how to run, minimal oversight required
  • 15. Nextflow (https://www.nextflow.io/)
          #!/usr/bin/env nextflow
          input = Channel.fromFilePairs( params.reads )
          process fastqc {
              input:
              file reads from input
              output:
              file "*_fastqc.{zip,html}" into results
              script:
              """
              fastqc -q $reads
              """
          }
  • 16. Nextflow
      Default: Run locally, assume software is installed
          #!/usr/bin/env nextflow
          input = Channel.fromFilePairs( params.reads )
          process fastqc {
              input:
              file reads from input
              output:
              file "*_fastqc.{zip,html}" into results
              script:
              """
              fastqc -q $reads
              """
          }
      Submit jobs to SLURM queue, use environment modules
          process {
              executor = 'slurm'
              clusterOptions = { "-A b2017123" }
              cpus = 1
              memory = 8.GB
              time = 2.h
              $fastqc {
                  module = ['bioinfo-tools', 'FastQC']
              }
          }
  • 17. Nextflow
          #!/usr/bin/env nextflow
          input = Channel.fromFilePairs( params.reads )
          process fastqc {
              input:
              file reads from input
              output:
              file "*_fastqc.{zip,html}" into results
              script:
              """
              fastqc -q $reads
              """
          }
      SLURM and environment modules
          process {
              executor = 'slurm'
              clusterOptions = { "-A b2017123" }
              cpus = 1
              memory = 8.GB
              time = 2.h
              $fastqc {
                  module = ['bioinfo-tools', 'FastQC']
              }
          }
      Run locally, use docker container for all software dependencies
          docker {
              enabled = true
          }
          process {
              container = 'biocontainers/fastqc'
              cpus = 1
              memory = 8.GB
              time = 2.h
          }
  • 20. Running NGI-ChIPseq
      Step 1: Install Nextflow
      • Uppmax - load the Nextflow module
          module load nextflow
      • Anywhere (including Uppmax) - install Nextflow
          curl -s https://get.nextflow.io | bash
      Step 2: Try running NGI-ChIPseq pipeline
          nextflow run SciLifeLab/NGI-ChIPseq --help
  • 21. Running NGI-ChIPseq
      Step 3: Choose your reference
      • Common organism - use iGenomes
          --genome GRCh37
      • MACS peak calling config file
          --macsconfig config.csv
      Step 4: Organise your data
      • One (if single-end) or two (if paired-end) FastQ per sample
      • Everything in one directory, simple filenames help!
  • 22. Running NGI-ChIPseq
      Step 5: Run the pipeline on your data
      • Remember to run detached from your terminal: screen / tmux / nohup
      Step 6: Check your results
      • Read the Nextflow log and check the MultiQC report
      Step 7: Delete temporary files
      • Delete the ./work directory, which holds all intermediates
  • 23. Using UPPMAX
          nextflow run SciLifeLab/NGI-ChIPseq \
              --project b2017123 \
              --genome GRCh37 \
              --macsconfig p.txt \
              --reads "data/*_R{1,2}.fastq.gz"
      • Default config is for UPPMAX
      • Knows about central iGenomes references
      • Uses centrally installed software
  • 24. Using other clusters
          nextflow run SciLifeLab/NGI-ChIPseq \
              -profile hebbe \
              --bwaindex ./ref \
              --macsconfig p.txt \
              --reads "data/*_R{1,2}.fastq.gz"
      • Can run just about anywhere
      • Supports local, SGE, LSF, SLURM, PBS/Torque, HTCondor, DRMAA, DNAnexus, Ignite, Kubernetes
  • 25. Using Docker
          nextflow run SciLifeLab/NGI-ChIPseq \
              -profile docker \
              --fasta genome.fa \
              --macsconfig p.txt \
              --reads "data/*_R{1,2}.fastq.gz"
      • Can run anywhere with Docker
      • Downloads required software and runs in a container
      • Portable and reproducible
  • 26. Using AWS
          nextflow run SciLifeLab/NGI-ChIPseq \
              -profile aws \
              --genome GRCh37 \
              --macsconfig p.txt \
              --reads "s3://my-bucket/*_{1,2}.fq.gz" \
              --outdir "s3://my-bucket/results/"
      • Runs on the AWS cloud with Docker
      • Pay-as-you-go, flexible computing
      • Can launch from anywhere with minimal configuration
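  • 27. Input data
          ERROR ~ Cannot find any reads matching: XXXX
          NB: Path needs to be enclosed in quotes!
          NB: Path requires at least one * wildcard!
          If this is single-end data, please specify --singleEnd on the command line.
      Example arguments shown on the slide:
          --reads '*_R{1,2}.fastq.gz'
          --reads '*.fastq.gz' --singleEnd
          --reads *_R{1,2}.fastq.gz
          --reads '*.fastq.gz'
          --reads sample.fastq.gz
      Judging by the error message, the first two forms follow the rules and the last three each break one (no quotes, no --singleEnd for a single-file pattern, no wildcard).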
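To make the `_R{1,2}` pattern on slide 27 concrete, here is a rough Python sketch of how such a pattern groups files by sample. It is only an illustration, not how Nextflow's `Channel.fromFilePairs` is implemented. The helper name `group_read_pairs` and the file names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a pattern like '*_R{1,2}.fastq.gz':
# everything before _R1/_R2 is treated as the sample name.
PAIR_RE = re.compile(r"^(?P<sample>.+)_R[12]\.fastq\.gz$")


def group_read_pairs(filenames):
    """Group FastQ file names by sample prefix; non-matching names are ignored."""
    groups = defaultdict(list)
    for name in filenames:
        match = PAIR_RE.match(name)
        if match:
            groups[match.group("sample")].append(name)
    # Sort so that R1 comes before R2 within each sample.
    return {sample: sorted(files) for sample, files in groups.items()}


# Made-up file names for illustration only.
files = ["chip_R2.fastq.gz", "chip_R1.fastq.gz", "input_R1.fastq.gz", "notes.txt"]
pairs = group_read_pairs(files)
for sample, reads in sorted(pairs.items()):
    print(sample, reads)
```

A sample that ends up with a single file (like `input` here) has either a missing mate or single-end data, which is the situation the `--singleEnd` hint in the error message addresses.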
  • 28. Read trimming
      • Pipeline runs TrimGalore! to remove adapter contamination and low quality bases automatically
      • Use --notrim to disable this
      • Some library preps also include additional adapters
          --clip_r1 [int]
          --clip_r2 [int]
          --three_prime_clip_r1 [int]
          --three_prime_clip_r2 [int]
  • 29. Blacklist filtering
      • Some parts of the reference genome collect incorrectly mapped reads
      • Good practice to remove these peaks
      • Pipeline has ENCODE regions for Human & Mouse
      • Can pass own BED file of custom regions
          --blacklist_filtering
          --blacklist regions.bed
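The idea behind blacklist filtering can be sketched in a few lines of Python. The pipeline itself uses Bedtools for this step (slide 12), so this is only a plain-Python illustration of the logic. It assumes BED-style half-open intervals, and the function names and coordinates are made up.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_blacklisted(peaks, blacklist):
    """Keep only the peaks that overlap no blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical peaks and blacklist regions, for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 100, 200)]
blacklist = [("chr1", 600, 900)]

kept = filter_blacklisted(peaks, blacklist)
print(kept)
```

This nested loop is fine for a handful of regions. Real peak sets are large enough that a sorted or indexed approach, as in Bedtools, is the sensible choice.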
  • 30. Broad Peaks
      • Some chromatin profiles don't have narrow, sharp peaks
      • For example, H3K9me3 & H3K27me3
      • MACS2 can call peaks in "broad peak" mode
      • Pipeline uses default qvalue cutoff of 0.1
          --broad
  • 31. Extending Read Length
      • When using single-end data, the sequenced read length is shorter than the sequenced fragment length
      • For deepTools, need to "extend" the read length
      • Set to 100bp by default. Use this parameter to customise this value.
      • Expected fragment length - sequence read length
          --extendReadsLen [int]
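As a worked example of the formula on slide 31 (expected fragment length minus sequenced read length), here is a small Python sketch. The helper name and the 150 bp fragment length are assumptions chosen for illustration. Only the 50 bp read length and the 100 bp default come from the slides.

```python
def extend_reads_len(fragment_len, read_len):
    """Value for --extendReadsLen per the slide: fragment length minus read length."""
    if read_len > fragment_len:
        raise ValueError("read length cannot exceed fragment length")
    return fragment_len - read_len


# Hypothetical 150 bp fragments sequenced as SE 50 bp reads.
print(extend_reads_len(150, 50))
```

With these assumed numbers the result matches the pipeline default of 100 bp. For a library with a different fragment size, pass the recomputed value via `--extendReadsLen`.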
  • 32. Saving intermediates
      • By default, the pipeline doesn't save some intermediate files to your final results directory
      • Reference genome indices that have been built
      • FastQ files from TrimGalore!
      • BAM files straight from the aligner (BWA in this pipeline; we have BAMs from Picard)
          --saveReference
          --saveTrimmed
          --saveAlignedIntermediates
  • 33. Resuming pipelines
      • If something goes wrong, you can resume a stopped pipeline
      • Will use cached versions of completed processes
      • NB: Only one hyphen!
          -resume
      • Can resume specific past runs
      • Use nextflow log to find job names
          -resume job_name
  • 34. Customising output
      -name          Give a name to your run. Used in logs and reports
      --outdir       Specify the directory for saved results
      --saturation   Run saturation analysis, subsampling reads from 10% - 100%
      --email        Get e-mailed a summary report when the pipeline finishes
  • 35. Nextflow config files
      • Can save a config file with defaults
      • Anything with two hyphens is a param
          params {
              email = 'phil.ewels@scilifelab.se'
              project = "b2017123"
          }
          process.$multiqc.module = []
      • Config file locations:
          ./nextflow.config
          ~/.nextflow/config
          -c /path/to/my.config
  • 36. NGI-ChIPseq config
          N E X T F L O W ~ version 0.26.1
          Launching `SciLifeLab/NGI-ChIPseq` [deadly_bose] - revision: 28e24c2a2a
          =========================================
          NGI-ChIPseq: ChIP-Seq Best Practice v1.4
          =========================================
          Run Name            : deadly_bose
          Reads               : data/*fastq.gz
          Data Type           : Single-End
          Genome              : GRCh37
          BWA Index           : /sw/data/uppnex/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/
          MACS Config         : data/macsconfig.txt
          Saturation analysis : false
          MACS broad peaks    : false
          Blacklist filtering : false
          Extend Reads        : 100 bp
          Current home        : /home/phil
          Current user        : phil
          Current path        : /home/phil/demo_data/ChIP/Human/test
          Working dir         : /home/phil/demo_data/ChIP/Human/test/work
          Output dir          : ./results
          R libraries         : /home/phil/R/nxtflow_libs/
          Script dir          : /home/phil/GitHub/NGI-ChIPseq
          Save Reference      : false
          Save Trimmed        : false
          Save Intermeds      : false
          Trim R1             : 0
          Trim R2             : 0
          Trim 3' R1          : 0
          Trim 3' R2          : 0
          Config Profile      : UPPMAX
          UPPMAX Project      : b2017001
          E-mail Address      : phil.ewels@scilifelab.se
          ====================================
  • 37. Version control
          $fastqc.module = ['FastQC/0.7.2']
          $trim_galore.module = ['TrimGalore/0.4.1', 'FastQC/0.7.2']
          $bw.module = ['bwa/0.7.8', 'samtools/1.5']
          $samtools.module = ['samtools/1.5', 'BEDTools/2.26.0']
          $picard.module = ['picard/2.10.3', 'samtools/1.5']
          $phantompeakqualtools.module = ['phantompeakqualtools/1.1']
          $deepTools.module = ['deepTools/2.5.1']
          $ngsplot.module = ['samtools/1.5', 'R/3.2.3', 'ngsplot/2.61']
          $macs.module = ['MACS/2.1.0']
          $saturation.module = ['MACS/2.1.0', 'samtools/1.5']
          $saturation_r.module = ['R/3.2.3']
  • 38. Version control
      • Pipeline is always released under a stable version tag
      • Software versions and code reproducible
      • For full reproducibility, specify the version revision when running the pipeline
          nextflow run SciLifeLab/NGI-ChIPseq -r v1.3
  • 39. Conclusion
      • Use NGI-ChIPseq to prepare your data if you want:
          • To not have to remember every parameter for every tool
          • Extreme reproducibility
          • Ability to run on virtually any environment
      • Now running for all ChIPseq projects at NGI-Stockholm
  • 41. Conclusion
      Pipelines on https://github.com/
          SciLifeLab/NGI-RNAseq
          SciLifeLab/NGI-MethylSeq
          SciLifeLab/NGI-smRNAseq
          SciLifeLab/NGI-ChIPseq
      Acknowledgements: Phil Ewels, Chuan Wang, Jakub Westholm, Rickard Hammarén, Max Käller, Denis Moreno, NGI Stockholm Genomics Applications Development Group
      support@ngisweden.se
      http://opensource.scilifelab.se