How to be a bioinformatician
Christian Frech, PhD
St. Anna Children’s Cancer Research Institute, Vienna, Austria
Talk at University of Applied Sciences, Hagenberg, Austria
April 23rd, 2014
What is a bioinformatician?
[Figure: diagram relating the roles Informatician, Statistician, Biologist, and Data scientist. Modified from http://blog.fejes.ca/?p=2418]
Bioinformatician vs. computational biologist

Bioinformatician:
• IT savvy
• Builds & maintains biological databases & Web sites
• Designs & implements clever algorithms
• C/C++, Java, Python
• Grasp of computational subjects: more
• Grasp of biological subjects: less

Computational biologist:
• Asks biological questions
• Analyzes & interprets biological data
• Runs existing programs
• Ad hoc scripting
• Perl, R, Python
• Grasp of computational subjects: less
• Grasp of biological subjects: more

(or vice versa)
Why do we need bioinformaticians?
• The amount of generated biological data requires sophisticated computing for data management and analysis
• Programmers lack biological knowledge
• Biologists don’t program
• The two don’t understand each other
Video: “Biologist talks to statistician” (http://www.youtube.com/watch?v=Hz1fyhVOjr4)
The latest Illumina sequencer configuration, shipped last week (HiSeq v4 reagent kit), outputs 1 terabase (Tb) of data in 6 days [1]!
[1] http://www.illumina.com/products/hiseq-sbs-kit-v4.ilmn
What are bioinformaticians doing?
Word cloud from manuscript titles published in Bioinformatics from Jan 2013 to April 2014
Challenges as a bioinformatician
• Biology is complex, not black and white
• As many exceptions as rules (e.g., define “gene”)
• No single optimal solution to a problem
• Results interpretable in many ways (storytelling, cherry-picking)
• Understanding the biological question
• Field is moving incredibly fast
• Lack of standards, immature/abandoned software
• Standard of today obsolete tomorrow
• Much time spent on collecting/cleaning up data, troubleshooting errors
• Stay flexible, don’t overinvest in a single platform/technology
• Hundreds of software tools and databases out there
• Easy to get lost
• Important to understand their strengths and weaknesses
Which tools should I use?
179 tools
Heard of: 65%
Used: 30%
http://omictools.com/
Things to have in your bioinformatics toolbox

Must haves:
• Linux command line
• Scripting language with associated Bio* library (BioPerl, BioPython, R/Bioconductor, …)
• Basic statistical tests, regression, p-values, maximum likelihood, multiple testing correction
• Sequence alignment (FASTA & BLAST)
• Biological databases
• Regular expressions
• Sequencing technologies
• Web technologies (HTML, XML, …)

Highly recommended:
• Advanced R skills
• Parallel/distributed computing
• DBMS, SQL
• (Semi-)compiled language (C/C++, Java)
• Dimensionality reduction (e.g. PCA)
• Cluster analysis
• Support Vector Machines
• Hidden Markov models
• Web framework (e.g. Django)
• Version control system (e.g. Git)
• Advanced text editor (Emacs, vim)
• IDE (e.g. Eclipse, NetBeans)
What programming language should I learn?
Requirement (the recommended-language column is not preserved in this transcript):
• Speed matters, low-level programming
• Rich-client enterprise application development
• Text file processing (regex)
• Statistical analysis, fancy plots
• Rapid prototyping, readable & maintainable scripts
• Workflow automation
Be a jack of all trades, master of ONE!
Perl on decline, R and Python gaining popularity
• Perl most popular bioinformatics programming language in 2008
• R and Python take the lead in 2014
Sources:
http://computationalproteomic.blogspot.co.uk/2013/10/which-are-best-programming-languages.html
http://openwetware.org/wiki/Image:Most_Popular_Bioinformatics_Programming_Languages.png
Top 10 most common and/or annoying mistakes in bioinformatics
Inspired by “What Are The Most Common Stupid Mistakes In Bioinformatics?” (https://www.biostars.org/p/7126/)
Top-10 most common/annoying mistakes in bioinformatics
#10 - Using genome coordinates with the wrong genome version
(for example, using gene coordinates from human genome version hg18 but a reference sequence from version hg19)
#9 - Forgetting to process the second strand of the DNA sequence
#8 - Processing the second strand of the DNA sequence, but taking the reverse instead of the reverse complement sequence
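The difference between the two operations is easy to see on a short sequence. The following is a minimal Python sketch, not taken from the talk; the function names are hypothetical, and only the unambiguous bases plus N are handled.

```python
# Translation table mapping each base to its complement; N stays N.
# Assumption: only A, C, G, T and N occur (no other IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_only(seq: str) -> str:
    """The mistake described above: reversing without complementing."""
    return seq[::-1]


print(reverse_complement("ATGC"))  # GCAT
print(reverse_only("ATGC"))        # CGTA
```

In practice a Bio* library already provides this operation, so a hand-written version mainly serves as a sanity check.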
#7 - Not accounting for different human chromosome names between UCSC and Ensembl
Example:
UCSC: “chr1”
Ensembl: “1”
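A small conversion helper can guard against this mismatch before two files are joined. This is a sketch under stated assumptions: the function names are hypothetical, and it covers only the primary chromosomes plus the mitochondrial genome (UCSC "chrM", Ensembl "MT"); unplaced scaffolds and patches need a proper mapping table.

```python
def to_ucsc(chrom: str) -> str:
    """Convert an Ensembl-style name ("1", "X", "MT") to UCSC style."""
    if chrom.startswith("chr"):
        return chrom  # already UCSC style
    return "chrM" if chrom == "MT" else "chr" + chrom


def to_ensembl(chrom: str) -> str:
    """Convert a UCSC-style name ("chr1", "chrX", "chrM") to Ensembl style."""
    if not chrom.startswith("chr"):
        return chrom  # already Ensembl style
    name = chrom[3:]
    return "MT" if name == "M" else name


print(to_ucsc("1"))        # chr1
print(to_ensembl("chrM"))  # MT
```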
#6 - Assuming the alphabetical order of chromosome names is “chr1”, “chr2”, “chr3”, … when in fact it is “chr1”, “chr10”, “chr11”, …
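The effect can be reproduced, and avoided, with a custom sort key. The sketch below is illustrative only; the key function and the placement of X, Y and M after the numbered chromosomes are assumptions, not something the slide prescribes.

```python
# Rank for the non-numeric chromosomes; anything unknown sorts last.
_SPECIAL = {"X": 1, "Y": 2, "M": 3, "MT": 3}


def chrom_sort_key(name: str):
    """Sort key giving chr1, chr2, ..., chr10, ..., then X, Y, M."""
    core = name[3:] if name.startswith("chr") else name
    if core.isdigit():
        return (0, int(core), core)  # numbered chromosomes, numeric order
    return (_SPECIAL.get(core, 4), 0, core)


names = ["chr10", "chr2", "chrX", "chr1", "chr11", "chrM", "chrY"]
print(sorted(names))
# ['chr1', 'chr10', 'chr11', 'chr2', 'chrM', 'chrX', 'chrY']
print(sorted(names, key=chrom_sort_key))
# ['chr1', 'chr2', 'chr10', 'chr11', 'chrX', 'chrY', 'chrM']
```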
#5 - Assuming a ‘tab’ field separator when in fact it is ‘blank’ (or vice versa)
(the two look almost identical in a text editor)
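Because tabs and blanks look alike on screen, it helps to inspect a line programmatically before parsing. A minimal sketch, with a hypothetical helper name and made-up example lines:

```python
def guess_separator(line: str) -> str:
    """Return 'tab', 'space', 'mixed' or 'none' for one line of a delimited file."""
    has_tab = "\t" in line
    has_space = " " in line
    if has_tab and has_space:
        return "mixed"
    if has_tab:
        return "tab"
    if has_space:
        return "space"
    return "none"


tab_line = "chr1\t100\t200"
space_line = "chr1 100 200"

# repr() makes the invisible difference visible.
print(repr(tab_line))    # 'chr1\t100\t200'
print(repr(space_line))  # 'chr1 100 200'

# Splitting on the wrong separator silently yields a single field.
print(len(space_line.split("\t")))  # 1
```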
#4 - Assuming a DNA sequence consists of only four letters (A, T, C, G) while in fact there is a fifth
‘N’ for a missing base
(‘X’ for a missing amino acid)
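Ignoring N can skew even simple statistics. The sketch below, with hypothetical function names, reports symbols outside an allowed alphabet and computes GC content over called bases only; treating A, C, G, T and N as the allowed alphabet is an assumption, since real data may also carry other IUPAC ambiguity codes.

```python
def unexpected_symbols(seq: str, allowed: str = "ACGTN") -> list:
    """Return the sorted symbols in seq that are not in the allowed alphabet."""
    return sorted(set(seq.upper()) - set(allowed))


def gc_content(seq: str) -> float:
    """GC fraction over called bases (A, C, G, T) only, ignoring N."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


seq = "ACGTNNNN"
naive = (seq.count("G") + seq.count("C")) / len(seq)
print(naive)                          # 0.25
print(gc_content(seq))                # 0.5
print(unexpected_symbols("ACGTNRY"))  # ['R', 'Y']
```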
#3 - Forgetting to use dos2unix on a Windows text file before processing it under Linux
plus spending 1 hour to debug the problem
plus being tricked by this multiple times
Text file line breaks differ between platforms: Linux (LF); Windows (CR+LF); classic Mac (CR).
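The three conventions listed above can be detected and normalized in a few lines. This is a rough sketch with hypothetical function names; unlike a plain CR+LF to LF conversion, it also converts bare CR characters, which is an extra assumption about what the caller wants.

```python
def detect_newlines(data: bytes) -> str:
    """Name the line-break convention found in raw file content."""
    if b"\r\n" in data:
        return "CR+LF (Windows)"
    if b"\r" in data:
        return "CR (classic Mac)"
    if b"\n" in data:
        return "LF (Linux)"
    return "no line breaks"


def normalize_newlines(data: bytes) -> bytes:
    """Convert CR+LF and bare CR line breaks to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows = b"GENE1\t5\r\nGENE2\t7\r\n"
print(detect_newlines(windows))  # CR+LF (Windows)
# Splitting on LF alone leaves a stray CR at the end of the last field.
print(windows.split(b"\n")[0])   # b'GENE1\t5\r'
print(normalize_newlines(windows).split(b"\n")[0])  # b'GENE1\t5'
```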
#2 - When importing data into MS Excel, letting it auto-convert HUGO gene names into dates and forgetting about it
(e.g., tumor suppressor gene “DEC1” will be converted to “1-DEC” on import)
~30 genes in total
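A gene-name column that has passed through a spreadsheet can be screened for date-like values. The pattern below is an assumption modeled on the “1-DEC” example; how Excel displays a converted date varies with locale and version, so the regular expression and the example values are hypothetical.

```python
import re

_MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
# Matches values such as "1-DEC" or "2-Sep" (day, hyphen, month abbreviation).
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-({_MONTHS})$", re.IGNORECASE)


def looks_date_converted(value: str) -> bool:
    """True if a cell value looks like a gene symbol mangled into a date."""
    return bool(_DATE_LIKE.match(value.strip()))


column = ["TP53", "1-DEC", "BRCA1", "2-Sep"]
suspicious = [value for value in column if looks_date_converted(value)]
print(suspicious)  # ['1-DEC', '2-Sep']
```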
#1 - Off-by-one error
There are only two common problems in bioinformatics: (1) lack of standards, (2) ID conversion, and (3) off-by-one errors
http://en.wikipedia.org/wiki/Off-by-one_error
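One frequent source of off-by-one errors is mixing coordinate conventions: BED files use 0-based, half-open intervals, while formats such as GFF use 1-based, closed intervals. The following sketch is an illustration that is not part of the talk; the function names are hypothetical.

```python
def bed_to_one_based(start0: int, end0: int) -> tuple:
    """0-based half-open [start0, end0) to 1-based closed [start1, end1]."""
    return start0 + 1, end0


def one_based_to_bed(start1: int, end1: int) -> tuple:
    """1-based closed [start1, end1] to 0-based half-open [start0, end0)."""
    return start1 - 1, end1


seq = "ACGTACGT"
start0, end0 = 2, 5              # BED-style interval
print(seq[start0:end0])          # GTA (Python slices are 0-based, half-open)

start1, end1 = bed_to_one_based(start0, end0)
print(start1, end1)              # 3 5
# The interval length is the same, but the formula differs by one.
print(end0 - start0)             # 3
print(end1 - start1 + 1)         # 3
```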
Ten personal recommendations for your future work as a bioinformatician
#1 - Learn Linux!
• Most bioinformatics tools are not available on Windows
• Linux file systems are better for many and/or very large files
• Command line interface (CLI) has advantages over graphical user interface (GUI)
  • Recorded command history (reproducibility)
  • One keystroke to re-run an analysis, instead of repeating 100 mouse clicks
• Linux CLI (shell) is much more powerful than the Windows CLI
#2 - Embrace the “Unix tools philosophy”
• Small programs (“tools”) instead of monolithic applications
  • Designed for simple, specific tasks that are performed well (awk, cat, grep, wc, etc.)
  • Many and well-documented parameters
• Combined with Unix pipes (read from STDIN, write to STDOUT)
  • cut -f 3 myfile.txt | sort | uniq
• Advantages
  • Great flexibility, easy re-use of existing tools
  • Intermediate output can be stored and inspected for troubleshooting
  • Complex tasks can be performed quickly with shell ‘one-liners’
• This paradigm fits bioinformatics well, where often many heterogeneous data files need to be processed in many different ways
http://www.linuxdevcenter.com/lpt/a/302
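The same idea of small, composable steps can be mimicked in a scripting language. The sketch below mirrors the `cut -f 3 myfile.txt | sort | uniq` one-liner with Python generators; the function names and the example rows are hypothetical, and the behavior is only approximately that of the Unix tools (for instance, lines with fewer than three fields are skipped here, and sort order does not depend on the locale).

```python
def cut(lines, field, sep="\t"):
    """Yield the given 1-based field of each line, like 'cut -f'."""
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) >= field:
            yield parts[field - 1]


def uniq(items):
    """Drop adjacent duplicates, like 'uniq' on sorted input."""
    previous = object()  # sentinel that equals no string
    for item in items:
        if item != previous:
            yield item
        previous = item


rows = ["g1\tchr1\texon\n", "g2\tchr1\tintron\n", "g3\tchr2\texon\n"]
# Each stage consumes the output of the previous one, as in a shell pipe.
result = list(uniq(sorted(cut(rows, 3))))
print(result)  # ['exon', 'intron']
```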
Example NGS use case demonstrating the power of the Unix tools philosophy
    samtools mpileup -uf ref.fa aln.bam |
    bcftools view -cg - |
    vcfutils.pl vcf2fq > cns.fq
• Explanation
  • ‘samtools mpileup’ piles up short reads from the input BAM file for each position in the reference genome
  • ‘bcftools view’ calls the variants
  • ‘vcfutils.pl vcf2fq’ computes the consensus sequence
  • The resulting FASTQ sequence is redirected to the output file cns.fq
• By knowing available tools and their parameters, bioinformatics ‘wizards’ can get complex stuff done in almost no time
http://samtools.sourceforge.net/mpileup.shtml
#3 - Don’t reinvent the wheel
• Coding is fun, but look around before you hack into your keyboard
• Don’t write the 29th FASTA file parser if proven solutions are available
  • BioPerl
  • BioPython
  • Bioconductor
#4 - If you happen to invent a wheel, …
• Document source and parameters well
• Use a version control system (git, svn)
• Deposit code in a public repository
  • sourceforge.net
  • github.com
• Write test cases
#5 - Automate pipelines with GNU/Make
• Developed in the 1970s to build executables from source files
• Incredibly useful for data-driven workflows as well
  • Automatic error checking
  • Parallelization (utilize multiple cores)
  • Incremental builds (re-start your pipeline from point of failure)
  • Bug-free
• Get started at http://www.bioinformaticszen.com/post/decomplected-workflows-makefiles/
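Incremental builds rest on one simple rule: a target is rebuilt if it is missing or older than any of its prerequisites. The Python sketch below illustrates that rule only; it is not how Make is implemented, and the function name and file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target: str, prerequisites: list) -> bool:
    """True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "aln.bam")
    output = os.path.join(workdir, "cns.fq")
    open(source, "w").close()
    print(needs_rebuild(output, [source]))  # True: output does not exist yet

    open(output, "w").close()
    os.utime(source, (1000, 1000))  # set explicit timestamps for the demo
    os.utime(output, (2000, 2000))
    print(needs_rebuild(output, [source]))  # False: output is newer

    os.utime(source, (3000, 3000))  # input changed after the output was made
    print(needs_rebuild(output, [source]))  # True: output is stale
```

A pipeline step guarded this way can be skipped when its output is current, which is what lets a failed run resume from the point of failure.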
#6 - Value your time
• Architecture vs. accomplishment
  • “Perfect is the enemy of the good” (Voltaire)
  • OO design and normalized databases are nice, but can be overkill if requirements change from analysis to analysis
• Automate what can be automated
  • Reproducibility
  • Easy to repeat an analysis with slightly changed parameters
• BUT: Don’t spend two days automating a one-time analysis that can be done manually in 10 minutes
#7 - Make use of free online resources to learn about specialized topics
• www.coursera.org
  • Bioinformatics Algorithms (https://www.coursera.org/course/bioinformatics)
  • Computing for Data Analysis (https://www.coursera.org/course/compdata)
  • R Programming (https://www.coursera.org/course/rprog)
• https://www.edx.org/
  • Data Analysis for Genomics (https://www.edx.org/course/harvardx/harvardx-ph525x-data-analysis-genomics-1401#.U1TUbXV52R8)
  • Introduction to Biology (https://www.edx.org/course/mitx/mitx-7-00x-introduction-biology-secret-1768#.U1TVL3V52R8)
• http://rosalind.info/problems/locations/
#8 - Become an expert
• Identify an area of interest and get really good at it
• Work at places where you can learn from the best
• Spend time abroad
  • Great experience
  • Labs/companies will hire you not only for what you know, but also for who you know
#9 - Decide early on if you want to stay in academia or go into industry

Academia:
• PhD highly recommended
• Take your time to find a compatible supervisor
+ Freedom to pursue own ideas
+ Very flexible working hours
+ Work independently
- Steep & competitive career ladder (postdoc >> PI/prof)
- Lower pay
- Publish or perish

Industry:
• PhD beneficial (to get in), but not necessarily required for daily work (e.g. build/maintain databases)
+ More frequent (positive) feedback
+ Higher pay
+ Job security
- More (external) deadlines
- Higher pressure to get things done

See also “Ten Simple Rules for Choosing between Industry and Academia” (David B. Searls, 2009)
#10 - Stay informed & get connected
• Follow literature and blogs
  • http://en.wikipedia.org/wiki/List_of_bioinformatics_journals
  • http://www.homolog.us/blogs/blog/2012/07/27/how-to-stay-current-in-bioinformaticsgenomics/
• Subscribe via RSS feeds
  • http://feedly.com or others
  • Platform independent (e.g. read on your phone)
• Bioinformatics Q&A forums
  • http://www.biostars.org (highly recommended)
  • http://seqanswers.com/ (focus on NGS)
  • http://www.reddit.com/r/bioinformatics/ (student-oriented)
• Other
  • http://bioinformatics.org (fosters collaboration in bioinformatics)
  • http://www.researchgate.net (“Facebook” for researchers)
  • German bioinformatics group on XING (https://www.xing.com/net/pri485482x/bin)
Conclusion
• As a bioinformatician, you will be at the forefront of one of the greatest scientific enterprises of our time
  • Biologists are overwhelmed with massive data sets
  • YOU will get to see exciting results first
• Requires integration of knowledge from many domains
  • IT, biology, medicine, statistics, math, …
• Knowing your informatics toolbox AND understanding the biological question is what makes you very valuable
Thank you!
Christian Frech
frech.christian@gmail.com
Further Reading
• “So you want to be a computational biologist?”
  http://www.nature.com/nbt/journal/v31/n11/full/nbt.2740.html
• “What It Takes to Be a Bioinformatician”
  http://nav4bioinfo.wordpress.com/2013/03/19/what-it-takes-to-be-a-bioinformatician/
• “The alternative ‘what it takes to be a bioinformatician’”
  https://biomickwatson.wordpress.com/2013/03/18/the-alternative-what-it-takes-to-be-a-bioinformatician/
• “So You Want To Be a Computational Biologist, Or A Bioinformatician?”
  http://www.checkmatescientist.net/2013/11/so-you-want-to-be-computational.html
• “Being a bioinformatician is hard”
  http://www.bioinformaticszen.com/post/being-a-bioinformatician-is-hard/
• “How not to be a bioinformatician”
  http://www.scfbm.org/content/7/1/3
• “Ten Simple Rules for Reproducible Computational Research”
  http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003285
• “Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia”
  http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002001;jsessionid=6D5D844E0E2E21C9E565378C7F714D76
• “A Quick Guide for Developing Effective Bioinformatics Programming Skills”
  http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000589
• “What Is Really the Salary of a Bioinformatician/Computational Biologist?”
  http://www.homolog.us/blogs/blog/2014/04/02/what-is-really-the-salary-of-a-bioinformaticiancomputational-biologist/

Let’s Say Someone Did Drop the Bomb. Then What?
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 

How to be a bioinformatician

  • 1. How to be a bioinformatician. Christian Frech, PhD, St. Anna Children’s Cancer Research Institute, Vienna, Austria. Talk at University of Applied Sciences, Hagenberg, Austria, April 23rd, 2014
  • 2. What is a bioinformatician? [Diagram: Informatician, Statistician, Biologist, Data scientist.] Modified from http://blog.fejes.ca/?p=2418
  • 3. Bioinformatician vs. computational biologist
      • Asks biological questions
      • Analyzes & interprets biological data
      • Runs existing programs
      • Ad hoc scripting
      • Perl, R, Python
      • IT savvy
      • Builds & maintains biological databases & Web sites
      • Designs & implements clever algorithms
      • C/C++, Java, Python
      Column labels: Bioinformatician, Computational biologist (or vice versa). Grasp of computational subjects: more, less. Grasp of biological subjects: less, more.
  • 4. Why do we need bioinformaticians?
      • Amount of generated biological data requires sophisticated computing for data management and analysis
      • Programmers lack biological knowledge
      • Biologists don’t program
      • The two don’t understand each other
      Video, “Biologist talks to statistician”: http://www.youtube.com/watch?v=Hz1fyhVOjr4
      Latest Illumina sequencer shipped last week (HiSeq v4 reagent kit) outputs 1 terabase (TB) of data in 6 days! [1]
      [1] http://www.illumina.com/products/hiseq-sbs-kit-v4.ilmn
  • 6. What are bioinformaticians doing? Word cloud from manuscript titles published in Bioinformatics from Jan 2013 to April 2014
  • 7. Challenges as bioinformatician
      • Biology is complex, not black and white
      • As many exceptions as rules (e.g., define “gene”)
      • No single optimal solution to a problem
      • Results interpretable in many ways (storytelling, cherry-picking)
      • Understanding the biological question
      • Field is moving incredibly fast
      • Lack of standards, immature/abandoned software
      • Standard of today obsolete tomorrow
      • Much time spent on collecting/cleaning up data, troubleshooting errors
      • Stay flexible, don’t overinvest in single platform/technology
      • Hundreds of software tools and databases out there
      • Easy to get lost
      • Important to understand their strengths and weaknesses
  • 8. Which tools should I use? 179 tools. Heard of: 65%. Used: 30%
  • 10. Things to have in your bioinformatics toolbox
      Must haves:
      • Linux command line
      • Scripting language with associated Bio* library (BioPerl, BioPython, R/Bioconductor, …)
      • Basic statistical tests, regression, p-values, maximum likelihood, multiple testing correction
      • Sequence alignment (FASTA & BLAST)
      • Biological databases
      • Regular expressions
      • Sequencing technologies
      • Web technologies (HTML, XML, …)
      Highly recommended:
      • Advanced R skills
      • Parallel/distributed computing
      • DBMS, SQL
      • (Semi-)compiled language (C/C++, Java)
      • Dimensionality reduction (e.g. PCA)
      • Cluster analysis
      • Support Vector Machines
      • Hidden Markov models
      • Web framework (e.g. Django)
      • Version control system (e.g. Git)
      • Advanced text editor (Emacs, vim)
      • IDE (e.g. Eclipse, NetBeans)
  • 11. What programming language should I learn? Be a jack of all trades, master of ONE!
      Requirement (entries of the “Recommended Language” column are not present in the transcript):
      • Speed matters, low-level programming
      • Rich-client enterprise application development
      • Text file processing (regex)
      • Statistical analysis, fancy plots
      • Rapid prototyping, readable & maintainable scripts
      • Workflow automation
  • 12. Perl on decline, R and Python gaining popularity
      • Perl most popular bioinformatics programming language in 2008
      • R and Python take the lead in 2014
      Sources: http://computationalproteomic.blogspot.co.uk/2013/10/which-are-best-programming-languages.html ; http://openwetware.org/wiki/Image:Most_Popular_Bioinformatics_Programming_Languages.png
  • 13. Top 10 most common and/or annoying mistakes in bioinformatics. Inspired by “What Are The Most Common Stupid Mistakes In Bioinformatics?” (https://www.biostars.org/p/7126/)
  • 14. Top-10 most common/annoying mistakes in bioinformatics, #10: Using genome coordinates with the wrong genome version (for example, using gene coordinates from human genome version hg18 but reference sequence from version hg19)
  • 15. Top-10 most common/annoying mistakes in bioinformatics, #9: Forgetting to process the second strand of DNA sequence
  • 16. Top-10 most common/annoying mistakes in bioinformatics, #8: Processing the second strand of DNA sequence, but taking the reverse instead of the reverse complement sequence
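The difference between the two operations in mistake #8 can be sketched in a few lines of Python. This is a minimal illustration, not part of the talk; the function names are hypothetical and only the bases A, C, G, T and N are handled.

```python
# Translation table for DNA bases; 'N' (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_only(seq: str) -> str:
    """The mistake from the slide: reversing without complementing."""
    return seq[::-1]


print(reverse_complement("AACG"))  # CGTT
print(reverse_only("AACG"))        # GCAA
```

In practice a Bio* library already provides this, so the sketch is mainly useful for seeing why the two results differ.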
  • 17. Top-10 most common/annoying mistakes in bioinformatics, #7: Not accounting for different human chromosome names between UCSC and Ensembl. Example: UCSC: “chr1”; Ensembl: “1”
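A small conversion helper makes the naming difference explicit. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it only covers the primary chromosomes plus the mitochondrial genome (assumed here to be “chrM” in UCSC and “MT” in Ensembl). Unplaced scaffolds and alternate contigs would need a proper lookup table.

```python
def ucsc_to_ensembl(chrom: str) -> str:
    """Convert a UCSC-style name ('chr1') to an Ensembl-style name ('1')."""
    if chrom == "chrM":  # assumed mitochondrial naming: chrM (UCSC) vs. MT (Ensembl)
        return "MT"
    return chrom[3:] if chrom.startswith("chr") else chrom


def ensembl_to_ucsc(chrom: str) -> str:
    """Convert an Ensembl-style name ('1') to a UCSC-style name ('chr1')."""
    if chrom == "MT":
        return "chrM"
    return chrom if chrom.startswith("chr") else "chr" + chrom


print(ucsc_to_ensembl("chr1"))  # 1
print(ensembl_to_ucsc("X"))     # chrX
```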
  • 18. Top-10 most common/annoying mistakes in bioinformatics, #6: Assuming the alphabetical order of chromosome names is “chr1”, “chr2”, “chr3”, … when in fact it is “chr1”, “chr10”, “chr11”, …
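The sorting behaviour is easy to reproduce, and a custom sort key restores the expected order. This is a minimal sketch; `chrom_sort_key` is a hypothetical helper that puts numeric chromosomes first (in numeric order) and everything else after them alphabetically.

```python
def chrom_sort_key(name: str):
    """Sort key: numeric chromosomes by number, then all others by name."""
    core = name[3:] if name.startswith("chr") else name
    if core.isdigit():
        return (0, int(core), "")
    return (1, 0, core)


names = ["chr10", "chr2", "chrX", "chr1", "chr11"]
print(sorted(names))                      # ['chr1', 'chr10', 'chr11', 'chr2', 'chrX']
print(sorted(names, key=chrom_sort_key))  # ['chr1', 'chr2', 'chr10', 'chr11', 'chrX']
```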
  • 19. Top-10 most common/annoying mistakes in bioinformatics, #5: Assuming ‘tab’ field separator when in fact it is ‘blank’ (or vice versa); the two look almost identical in a text editor
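One way to avoid guessing is to inspect a line programmatically before parsing. The following sketch uses hypothetical helper names and made-up sample lines; `repr()` is the quickest check because it prints tabs as `\t`.

```python
def describe_separators(line: str) -> dict:
    """Count tab and space characters in one line of text."""
    return {"tabs": line.count("\t"), "spaces": line.count(" ")}


def split_fields(line: str, sep: str = "\t") -> list:
    """Split a line on an explicit separator after removing the line break."""
    return line.rstrip("\r\n").split(sep)


tab_line = "chr1\t100\t200\n"
space_line = "chr1 100 200\n"

print(repr(tab_line))                   # tabs show up as \t
print(describe_separators(space_line))  # {'tabs': 0, 'spaces': 2}
print(split_fields(tab_line))           # ['chr1', '100', '200']
print(split_fields(space_line))         # ['chr1 100 200'] (wrong separator, one field)
```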
  • 20. Top-10 most common/annoying mistakes in bioinformatics, #4: Assuming DNA sequence consists of only four letters (A, T, C, G) while in fact there is a fifth: ‘N’ for missing base (‘X’ for missing amino acid)
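A quick sanity check before processing a sequence is to count every character that is not A, C, G or T. This is a minimal sketch with a hypothetical function name; it reports whatever unexpected symbols it finds rather than assuming ‘N’ is the only one.

```python
from collections import Counter


def non_acgt_counts(seq: str) -> Counter:
    """Count characters other than A, C, G, T (case-insensitive)."""
    return Counter(base for base in seq.upper() if base not in "ACGT")


print(non_acgt_counts("ACGTNNacgtn"))  # Counter({'N': 3})
```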
  • 21. Top-10 most common/annoying mistakes in bioinformatics, #3: Forgetting to use dos2unix on a Windows text file before processing it under Linux, plus spending 1 hour to debug the problem, plus being tricked by this multiple times. Text file line breaks differ between platforms: Linux (LF); Windows (CR+LF); classic Mac (CR).
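When dos2unix is not at hand, the same check and conversion can be sketched in Python on raw bytes. The function names are hypothetical, and the sketch assumes the file is small enough to hold in memory.

```python
def has_windows_line_endings(data: bytes) -> bool:
    """True if the data contains at least one CR+LF line break."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CR+LF (Windows) and lone CR (classic Mac) line breaks to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_text = b"gene\tvalue\r\nTP53\t1\r\n"
print(has_windows_line_endings(windows_text))  # True
print(to_unix_line_endings(windows_text))      # b'gene\tvalue\nTP53\t1\n'
```

Working on bytes rather than text avoids Python's own newline translation, which would otherwise hide the problem.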
  • 22. Top-10 most common/annoying mistakes in bioinformatics, #2: When importing data into MS Excel, letting it auto-convert HUGO gene names into dates and forgetting about it (e.g., tumor suppressor gene “DEC1” will be converted to “1-DEC” on import); ~30 genes in total
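A gene list that has passed through a spreadsheet can be screened for date-like values. The sketch below is a heuristic only: the pattern (day number, hyphen, three-letter month) is an assumption based on the “1-DEC” example above, and the function name is hypothetical.

```python
import re

_MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
# Matches values such as '1-DEC' or '2-Sep' (assumed converted form).
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-({_MONTHS})$", re.IGNORECASE)


def looks_like_mangled_gene_name(value: str) -> bool:
    """True if a cell value looks like a gene symbol turned into a date."""
    return _DATE_LIKE.match(value.strip()) is not None


for symbol in ["DEC1", "1-DEC", "TP53", "2-Sep"]:
    print(symbol, looks_like_mangled_gene_name(symbol))
```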
  • 23. Top-10 most common/annoying mistakes in bioinformatics, #1: Off-by-one error. There are only two common problems in bioinformatics: (1) lack of standards, (2) ID conversion, and (3) off-by-one errors. http://en.wikipedia.org/wiki/Off-by-one_error
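A typical source of off-by-one errors is mixing coordinate conventions. The talk does not name specific formats; as an illustration, the sketch below assumes one file uses 0-based, half-open intervals (as BED does) and another uses 1-based, closed intervals (as GFF does). All function names are hypothetical.

```python
def half_open_to_closed(start: int, end: int) -> tuple:
    """0-based, half-open [start, end) -> 1-based, closed [start, end]."""
    return start + 1, end


def closed_to_half_open(start: int, end: int) -> tuple:
    """1-based, closed [start, end] -> 0-based, half-open [start, end)."""
    return start - 1, end


def length_half_open(start: int, end: int) -> int:
    return end - start


def length_closed(start: int, end: int) -> int:
    return end - start + 1


# The first ten bases of a chromosome in both conventions.
print(half_open_to_closed(0, 10))  # (1, 10)
print(length_half_open(0, 10))     # 10
print(length_closed(1, 10))        # 10
```

Using the wrong length formula for a given convention yields a result that is off by exactly one, which is why such bugs often go unnoticed.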
  • 24. Ten personal recommendations for your future work as bioinformatician
  • 25. #1 - Learn Linux!
      • Most bioinformatics tools not available on Windows
      • Linux file systems better for many and/or very large files
      • Command line interface (CLI) has advantages over graphical user interface (GUI)
      • Recorded command history (reproducibility)
      • Key stroke to re-run analysis, instead of repeating 100 mouse clicks
      • Linux CLI (Shell) much more powerful than Windows CLI
  • 26. #2 - Embrace the “Unix tools philosophy”
      • Small programs (“tools”) instead of monolithic applications
      • Designed for simple, specific tasks that are performed well (awk, cat, grep, wc, etc.)
      • Many and well documented parameters
      • Combined with Unix pipes (read from STDIN, write to STDOUT)
      • cut -f 3 myfile.txt | sort | uniq
      • Advantages:
      • Great flexibility, easy re-use of existing tools
      • Intermediate output can be stored and inspected for troubleshooting
      • Complex tasks can be performed quickly with shell ‘one-liners’
      • This paradigm fits bioinformatics well, where often many heterogeneous data files need to be processed in many different ways
      http://www.linuxdevcenter.com/lpt/a/302
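For comparison, here is roughly what the one-liner `cut -f 3 myfile.txt | sort | uniq` does, written out as a Python sketch. The function name and sample rows are hypothetical, every line is assumed to have at least three tab-separated fields, and Python sorts by code point whereas `sort` follows the shell's locale.

```python
def unique_sorted_field(lines, field: int = 3, sep: str = "\t") -> list:
    """Return the sorted unique values of one (1-based) column."""
    values = {
        line.rstrip("\r\n").split(sep)[field - 1]
        for line in lines
        if line.strip()  # skip empty lines
    }
    return sorted(values)


rows = [
    "geneA\tchr1\texon\n",
    "geneB\tchr2\tintron\n",
    "geneC\tchr1\texon\n",
]
print(unique_sorted_field(rows))  # ['exon', 'intron']
```

The shell version is shorter and each stage can be inspected on its own, which is the point the slide makes.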
  • 27. Example NGS use case demonstrating the power of the Unix tools philosophy
      samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
      Explanation:
      • ‘samtools mpileup’ piles up short reads from the input BAM file for each position in the reference genome
      • ‘bcftools view’ calls the variants
      • ‘vcfutils.pl vcf2fq’ computes the consensus sequence
      • The resulting consensus sequence (FASTQ format) is redirected to the output file cns.fq
      • By knowing available tools and their parameters, bioinformatics ‘wizards’ can get complex stuff done in almost no time
      http://samtools.sourceforge.net/mpileup.shtml
  • 28. #3 - Don’t reinvent the wheel
      • Coding is fun, but look around before you hack into your keyboard
      • Don’t write the 29th FASTA file parser if proven solutions are available
      • BioPerl
      • BioPython
      • Bioconductor
  • 29. #4 - If you happen to invent a wheel, …
      • Document source and parameters well
      • Use version control system (git, svn)
      • Deposit code in public repository
      • sourceforge.net
      • github.com
      • Write test cases
  • 30. #5 - Automate pipelines with GNU/Make
      • Developed in 1970s to build executables from source files
      • Incredibly useful for data-driven workflows as well
      • Automatic error checking
      • Parallelization (utilize multiple cores)
      • Incremental builds (re-start your pipeline from point of failure)
      • Bug-free
      • Get started at http://www.bioinformaticszen.com/post/decomplected-workflows-makefiles/
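The idea behind incremental builds can be sketched in Python: a target is rebuilt only if it is missing or older than one of its inputs. This is a simplified illustration of the timestamp rule, not a replacement for Make; the function name and file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target: str, sources: list) -> bool:
    """True if the target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "reads.txt")
    out = os.path.join(tmp, "counts.txt")
    open(src, "w").close()
    print(needs_rebuild(out, [src]))  # True: target does not exist yet

    open(out, "w").close()
    os.utime(src, (1000, 1000))       # set explicit timestamps for the demo
    os.utime(out, (2000, 2000))
    print(needs_rebuild(out, [src]))  # False: target is newer than its input

    os.utime(src, (3000, 3000))
    print(needs_rebuild(out, [src]))  # True: input changed after the target
```

Because finished steps are skipped, a pipeline that failed halfway resumes at the failed step when it is run again.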
  • 31. #6 - Value your time
      • Architecture vs. accomplishment
      • “Perfect is the enemy of the good” (Voltaire)
      • OO design and normalized databases are nice, but can be overkill if requirements change from analysis to analysis
      • Automate what can be automated
      • Reproducibility
      • Easy to repeat analysis with slightly changed parameters
      • BUT: Don’t spend two days automating a one-time analysis that can be done manually in 10 minutes
  • 32. #7 - Make use of free online resources to learn about specialized topics
      • www.coursera.org
      • Bioinformatics Algorithms (https://www.coursera.org/course/bioinformatics)
      • Computing for Data Analysis (https://www.coursera.org/course/compdata)
      • R Programming (https://www.coursera.org/course/rprog)
      • https://www.edx.org/
      • Data Analysis for Genomics (https://www.edx.org/course/harvardx/harvardx-ph525x-data-analysis-genomics-1401#.U1TUbXV52R8)
      • Introduction to Biology (https://www.edx.org/course/mitx/mitx-7-00x-introduction-biology-secret-1768#.U1TVL3V52R8)
      • http://rosalind.info/problems/locations/
  • 33. #8 - Become an expert
      • Identify an area of interest and get really good at it
      • Work at places where you can learn from the best
      • Spend time abroad
      • Great experience
      • Labs/companies will hire you not only for what you know, but also for who you know
  • 34. #9 - Decide early on if you want to stay in academia or go into industry
      Academia:
      • PhD highly recommended
      • Take your time to find compatible supervisor
      + Freedom to pursue own ideas
      + Very flexible working hours
      + Work independently
      - Steep & competitive career ladder (postdoc >> PI/prof)
      - Lower pay
      - Publish or perish
      Industry:
      • PhD beneficial (to get in), but not necessarily required for daily work (e.g. build/maintain databases)
      + More frequent (positive) feedback
      + Higher pay
      + Job security
      - More (external) deadlines
      - Higher pressure to get things done
      See also “Ten Simple Rules for Choosing between Industry and Academia” (David B. Searls, 2009)
  • 35. #10 - Stay informed & get connected
      • Follow literature and blogs
      • http://en.wikipedia.org/wiki/List_of_bioinformatics_journals
      • http://www.homolog.us/blogs/blog/2012/07/27/how-to-stay-current-in-bioinformaticsgenomics/
      • Subscribe via RSS feeds
      • http://feedly.com or others
      • Platform independent (e.g. read on your phone)
      • Bioinformatics Q&A forums
      • http://www.biostars.org (highly recommended)
      • http://seqanswers.com/ (focus on NGS)
      • http://www.reddit.com/r/bioinformatics/ (student-oriented)
      • Other
      • http://bioinformatics.org – fosters collaboration in bioinformatics
      • http://www.researchgate.net – “Facebook” for researchers
      • German bioinformatics group on XING (https://www.xing.com/net/pri485482x/bin)
  • 36. Conclusion
      • As bioinformatician, you will be at the forefront of one of the greatest scientific enterprises of our time
      • Biologists overwhelmed with massive data sets
      • YOU will get to see exciting results first
      • Requires integration of knowledge from many domains
      • IT, biology, medicine, statistics, math, …
      • Knowing your informatics toolbox AND understanding the biological question is what makes you very valuable
  • 38. Further Reading
      • “So you want to be a computational biologist?” http://www.nature.com/nbt/journal/v31/n11/full/nbt.2740.html
      • “What It Takes to Be a Bioinformatician” http://nav4bioinfo.wordpress.com/2013/03/19/what-it-takes-to-be-a-bioinformatician/
      • “The alternative ‘what it takes to be a bioinformatician’” https://biomickwatson.wordpress.com/2013/03/18/the-alternative-what-it-takes-to-be-a-bioinformatician/
      • “So You Want To Be a Computational Biologist, Or A Bioinformatician?” http://www.checkmatescientist.net/2013/11/so-you-want-to-be-computational.html
      • “Being a bioinformatician is hard” http://www.bioinformaticszen.com/post/being-a-bioinformatician-is-hard/
      • “How not to be a bioinformatician” http://www.scfbm.org/content/7/1/3
      • “Ten Simple Rules for Reproducible Computational Research” http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003285
      • “Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia” http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002001;jsessionid=6D5D844E0E2E21C9E565378C7F714D76
      • “A Quick Guide for Developing Effective Bioinformatics Programming Skills” http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000589
      • “What Is Really the Salary of a Bioinformatician/Computational Biologist?” http://www.homolog.us/blogs/blog/2014/04/02/what-is-really-the-salary-of-a-bioinformaticiancomputational-biologist/

Editor’s Notes

  1. Version 5
  2. Funny rant about bioinformatics, not to be taken literally: http://madhadron.com/posts/2012-03-26-a-farewell-to-bioinformatics.html