SQL, noSQL or no database at all?
Are databases still a core skill?
Neil Saunders
COMPUTATIONAL INFORMATICS
www.csiro.au
Databases: Slide 2 of 24
alternative title: should David Lovell learn databases?
Databases: Slide 3 of 24
actual recent email request
Hi Neil,
I was wondering if you could help me with something. I am trying to put
together a table but it is rather slow by hand. Do you know if you can
help me with this task with a script? If it is too much of your time,
don’t worry about it. Just thought I’d ask before I start.
The task is:
The targets listed in A tab need to be found in B tab then the entire row
copied into C tab. Then the details in column C of C tab then need to be
matched with the details in D tab so that the patients with the mutations
are listed in row AG and AH of C tab.
Again, if this isn’t an easy task for you then don’t worry about it.
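The request boils down to a filter followed by a lookup. A minimal sketch in Ruby, assuming each tab has been exported to a list of rows; the column names (`target`, `mutation`, `patient`) and all values are hypothetical, since the email does not show the real layout:

```ruby
# Hypothetical stand-ins for the spreadsheet tabs described in the email.
targets = ["T1", "T3"]                          # tab A: targets of interest
tab_b = [                                       # tab B: one row per target
  { target: "T1", gene: "GENE1", mutation: "m1" },
  { target: "T2", gene: "GENE2", mutation: "m2" },
  { target: "T3", gene: "GENE3", mutation: "m3" }
]
tab_d = [                                       # tab D: patients and their mutations
  { patient: "P1", mutation: "m1" },
  { patient: "P2", mutation: "m3" },
  { patient: "P3", mutation: "m1" }
]

# Index tab D by mutation so each lookup is a hash access, not a scan.
patients_by_mutation = Hash.new { |hash, key| hash[key] = [] }
tab_d.each { |row| patients_by_mutation[row[:mutation]] << row[:patient] }

# Tab C: rows of B whose target is listed in A, plus the matching patients.
tab_c = tab_b
  .select { |row| targets.include?(row[:target]) }
  .map { |row| row.merge(patients: patients_by_mutation[row[:mutation]]) }

tab_c.each { |row| puts row.values.flatten.join("\t") }
```

Seen this way, the task is a join between tables, which is why it reads like a database problem.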
Databases: Slide 4 of 24
sounds like a database to me (c. 2004)
Databases: Slide 5 of 24
database design is a profession in itself
-- KEGG_DB schema
CREATE TABLE ec2go (
ec_no VARCHAR(16) NOT NULL, -- EC number (with "EC:" prefix)
go_id CHAR(10) NOT NULL -- GO ID
);
CREATE TABLE pathway2gene (
pathway_id CHAR(8) NOT NULL, -- KEGG pathway long ID
gene_id VARCHAR(20) NOT NULL -- Entrez Gene or ORF ID
);
CREATE TABLE pathway2name (
path_id CHAR(5) NOT NULL UNIQUE, -- KEGG pathway short ID
path_name VARCHAR(80) NOT NULL UNIQUE -- KEGG pathway name
);
-- Indexes.
CREATE INDEX Ipathway2gene ON pathway2gene (gene_id);
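One design detail in this schema is that pathways carry a long ID in one table and a short ID in another, so relating the two takes a conversion step. A rough sketch in Ruby, using toy rows rather than the real database, and assuming the long ID is an organism prefix followed by the short ID:

```ruby
# Toy rows shaped like the pathway2gene and pathway2name tables above.
# The values are illustrative, not an extract from the real database.
pathway2gene = [
  { pathway_id: "hsa00030", gene_id: "2821" },
  { pathway_id: "hsa00030", gene_id: "2539" },
  { pathway_id: "hsa00010", gene_id: "2821" }
]
pathway2name = [
  { path_id: "00010", path_name: "Glycolysis / Gluconeogenesis" },
  { path_id: "00030", path_name: "Pentose phosphate pathway" }
]

# Assumption: the last five characters of pathway_id correspond to path_id.
name_for = pathway2name.map { |row| [row[:path_id], row[:path_name]] }.to_h

# Names of all pathways that contain the given gene.
def pathways_for_gene(gene_id, pathway2gene, name_for)
  pathway2gene
    .select { |row| row[:gene_id] == gene_id }
    .map { |row| name_for[row[:pathway_id][-5, 5]] }
    .sort
end

puts pathways_for_gene("2821", pathway2gene, name_for)
```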
Databases: Slide 6 of 24
know your ORM from your MVC
(do you DSL?)
http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
Databases: Slide 7 of 24
my one tip for today: use ORM
= object relational mapping
#!/usr/bin/ruby
require 'sequel'
# connect to UCSC Genomes MySQL server
DB = Sequel.connect(:adapter => "mysql", :host => "genome-mysql.cse.ucsc.edu",
:user => "genome", :database => "hg19")
# instead of "SELECT count(*) FROM knownGene"
DB.from(:knownGene).count
# => 82960
# instead of "SELECT name, chrom, txStart FROM knownGene LIMIT 1"
DB.from(:knownGene).select(:name, :chrom, :txStart).first
# => {:name=>"uc001aaa.3", :chrom=>"chr1", :txStart=>11873}
# instead of "SELECT name FROM knownGene WHERE chrom = 'chrM'"
DB.from(:knownGene).select(:name).where(:chrom => "chrM").all
# => [{:name=>"uc004coq.4"}, {:name=>"uc022bqo.2"}, {:name=>"uc004cor.1"}, {:name=>"uc004cos.5"},
# {:name=>"uc022bqp.1"}, {:name=>"uc022bqq.1"}, {:name=>"uc022bqr.1"}, {:name=>"uc031tga.1"},
# {:name=>"uc022bqs.1"}, {:name=>"uc011mfi.2"}, {:name=>"uc022bqt.1"}, {:name=>"uc022bqu.2"},
# {:name=>"uc004cov.5"}, {:name=>"uc031tgb.1"}, {:name=>"uc004cow.2"}, {:name=>"uc004cox.4"},
# {:name=>"uc022bqv.1"}, {:name=>"uc022bqw.1"}, {:name=>"uc022bqx.1"}, {:name=>"uc004coz.1"}]
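To show what a DSL of this kind does underneath, here is a toy query builder in plain Ruby. It is only an illustration of the idea of chaining method calls into SQL: the class name `ToyDataset` is made up, it is not how Sequel is implemented, and it does no quoting or escaping.

```ruby
# A toy query builder: each call returns a new object, and #sql renders
# the accumulated state as a SQL string.
class ToyDataset
  def initialize(table, columns = ["*"], conditions = {}, limit = nil)
    @table = table
    @columns = columns
    @conditions = conditions
    @limit = limit
  end

  def select(*columns)
    ToyDataset.new(@table, columns.map(&:to_s), @conditions, @limit)
  end

  def where(conditions)
    ToyDataset.new(@table, @columns, @conditions.merge(conditions), @limit)
  end

  def limit(n)
    ToyDataset.new(@table, @columns, @conditions, n)
  end

  def sql
    query = "SELECT #{@columns.join(', ')} FROM #{@table}"
    unless @conditions.empty?
      clauses = @conditions.map { |column, value| "#{column} = '#{value}'" }
      query += " WHERE #{clauses.join(' AND ')}"
    end
    query += " LIMIT #{@limit}" if @limit
    query
  end
end

puts ToyDataset.new(:knownGene).select(:name).where(:chrom => "chrM").sql
```

A real ORM adds connection handling, escaping and result mapping on top of this, which is the reason to use one rather than write your own.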
Databases: Slide 8 of 24
don’t want to CREATE? you still might want to SELECT
Question: How to map a SNP to a gene around +/- 60KB ?
I am looking at a bunch of SNPs. Some of them are part of genes,
but other are not. I am interested to look up +60KB or -60KB of
those SNPs to get details about some nearby genes. Please share
your experience in dealing with such a situation or thoughts on
any methods that can do this. Thanks in advance.
http://www.biostars.org/p/413/
Databases: Slide 9 of 24
example SELECT
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
select
  K.proteinID, K.name, S.name,
  S.avHet, S.chrom, S.chromStart,
  K.txStart, K.txEnd
from snp130 as S
left join knownGene as K on
  (S.chrom = K.chrom and not(K.txEnd + 60000 < S.chromStart or
  S.chromEnd + 60000 < K.txStart))
where
  S.name in ("rs25","rs100","rs75","rs9876","rs101")
'
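The join condition is an interval test: a gene is kept unless it ends more than 60 kb before the SNP or starts more than 60 kb after it. A small Ruby sketch of the same predicate, assuming SNP and genes are on the same chromosome; the names and coordinates are invented for illustration:

```ruby
WINDOW = 60_000

# True when the gene lies within `window` bases of the SNP, mirroring
# not(K.txEnd + 60000 < S.chromStart or S.chromEnd + 60000 < K.txStart).
def near?(snp_start, snp_end, tx_start, tx_end, window = WINDOW)
  !(tx_end + window < snp_start || snp_end + window < tx_start)
end

# Hypothetical coordinates on one chromosome.
snp = { name: "rsExample", start: 500_000, end: 500_001 }
genes = [
  { name: "geneA", tx_start: 100_000, tx_end: 430_000 },  # ends 70 kb before the SNP
  { name: "geneB", tx_start: 400_000, tx_end: 450_000 },  # ends 50 kb before the SNP
  { name: "geneC", tx_start: 490_000, tx_end: 520_000 },  # contains the SNP
  { name: "geneD", tx_start: 570_000, tx_end: 600_000 }   # starts about 70 kb after the SNP
]

nearby = genes.select { |g| near?(snp[:start], snp[:end], g[:tx_start], g[:tx_end]) }
puts nearby.map { |g| g[:name] }.join(", ")
```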
Databases: Slide 10 of 24
example SELECT result
Databases: Slide 11 of 24
let’s talk about noSQL
http://www.infoivy.com/2013/07/nosql-database-comparison-chart-only.html
Databases: Slide 12 of 24
(potentially) a good fit for biological data
Databases: Slide 13 of 24
many data sources are “key-value ready”
(or close enough)
http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json
[
{
"2821": "GPI; glucose-6-phosphate isomerase [KO:K01810] [EC:5.3.1.9]",
"2539": "G6PD; glucose-6-phosphate dehydrogenase [KO:K00036] [EC:1.1.1.49]",
"25796": "PGLS; 6-phosphogluconolactonase [KO:K01057] [EC:3.1.1.31]",
...
"5213": "PFKM; phosphofructokinase, muscle [KO:K00850] [EC:2.7.1.11]",
"5214": "PFKP; phosphofructokinase, platelet [KO:K00850] [EC:2.7.1.11]",
"5211": "PFKL; phosphofructokinase, liver [KO:K00850] [EC:2.7.1.11]"
}
]
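Each key here is a gene ID and each value packs several fields into one string. A sketch of pulling those fields apart in Ruby, using a shortened copy of the response; the field layout is inferred from the sample shown, not from a published format description:

```ruby
require "json"

# A shortened copy of the response shown above (the "..." entries are left out).
body = <<~JSON
  [
    {
      "2821": "GPI; glucose-6-phosphate isomerase [KO:K01810] [EC:5.3.1.9]",
      "2539": "G6PD; glucose-6-phosphate dehydrogenase [KO:K00036] [EC:1.1.1.49]",
      "5213": "PFKM; phosphofructokinase, muscle [KO:K00850] [EC:2.7.1.11]"
    }
  ]
JSON

# Split each value into symbol, name and the bracketed KO and EC annotations.
genes = JSON.parse(body).first.map do |gene_id, text|
  symbol, rest = text.split("; ", 2)
  {
    id: gene_id,
    symbol: symbol,
    name: rest.sub(/\s*\[.*\z/, ""),
    ko: text[/\[KO:([^\]]+)\]/, 1],
    ec: text[/\[EC:([^\]]+)\]/, 1]
  }
end

genes.each { |g| puts g.values_at(:id, :symbol, :ec).join("\t") }
```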
Databases: Slide 14 of 24
schema-free: save first, worry later
(= agile)
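#!/usr/bin/ruby
require "mongo"
require "json/pure"
require "open-uri"

# connect to a local MongoDB server; use database "kegg", collection "genes"
db = Mongo::Connection.new.db('kegg')
col = db.collection('genes')

# fetch the gene list for pathway hsa00030 as JSON
j = JSON.parse(open("http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json").read)

# save one document per gene: gene ID as _id, description as desc
j.each do |g|
  gene = Hash.new
  g.each_pair do |key, val|
    gene[:_id] = key
    gene[:desc] = val
    col.save(gene)
  end
end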
Ruby code to save JSON from the TogoWS REST service
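The "save first, worry later" idea can be sketched without a database server. The class below is an in-memory stand-in, not the MongoDB driver: `ToyCollection` is a made-up name, and it assumes that saving a document with an existing `_id` replaces the stored one.

```ruby
# An in-memory stand-in for a document collection keyed by :_id.
class ToyCollection
  def initialize
    @docs = {}
  end

  # Insert the document, or replace the one that has the same :_id.
  def save(doc)
    @docs[doc.fetch(:_id)] = doc.dup
  end

  # Return all documents whose fields match every key/value in criteria.
  def find(criteria = {})
    @docs.values.select { |doc| criteria.all? { |key, value| doc[key] == value } }
  end
end

col = ToyCollection.new
col.save({ _id: "2821", desc: "GPI; glucose-6-phosphate isomerase" })
col.save({ _id: "2539", desc: "G6PD; glucose-6-phosphate dehydrogenase" })

# "Worry later": a field added afterwards needs no schema change.
col.save({ _id: "2821", desc: "GPI; glucose-6-phosphate isomerase", ec: "5.3.1.9" })

puts col.find.size
puts col.find({ ec: "5.3.1.9" }).map { |doc| doc[:_id] }.inspect
```

Documents with and without the `ec` field sit side by side, which is the flexibility a fixed table schema does not give you for free.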
Databases: Slide 15 of 24
example application - PMRetract
ask later if interested
http://pmretract.heroku.com/
https://github.com/neilfws/PubMed/tree/master/retractions
Databases: Slide 16 of 24
when rows + columns != database
- sometimes a database is overkill
Databases: Slide 17 of 24
example 1 - R/IRanges
Databases: Slide 18 of 24
example 2 - bedtools
http://bedtools.readthedocs.org/en/latest/
Databases: Slide 19 of 24
example 3 - unix join (and the shell in general)
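For readers who have not used `join`: it pairs up lines from two files that are sorted on a shared key. The Ruby sketch below shows roughly that merge step on two small sorted lists. It assumes keys are unique within each input, and the data are made up for illustration.

```ruby
# Merge-join two arrays of [key, value] pairs that are already sorted by key.
def merge_join(left, right)
  joined = []
  i = 0
  j = 0
  while i < left.size && j < right.size
    case left[i][0] <=> right[j][0]
    when -1 then i += 1   # left key is smaller: no partner, advance left
    when 1  then j += 1   # right key is smaller: no partner, advance right
    else
      joined << [left[i][0], left[i][1], right[j][1]]
      i += 1
      j += 1
    end
  end
  joined
end

genes    = [["2539", "G6PD"], ["2821", "GPI"], ["5213", "PFKM"]]
pathways = [["2821", "hsa00010"], ["5213", "hsa00010"], ["9999", "hsa00030"]]

merge_join(genes, pathways).each { |row| puts row.join("\t") }
```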
Databases: Slide 20 of 24
when are databases good?
- when data are updated frequently
- when multiple users do the updating
- when queries are complex or ever-changing
- as backends to web applications
Databases: Slide 21 of 24
when are databases not/less good?
- for basic “set operations”
- for sequence data [1]
(?)
[1] no time to discuss BioSQL, GBrowse/Bio::DB::GFF, BioDAS etc.
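As an illustration of the "basic set operations" point, most scripting languages handle these directly on in-memory lists. A short Ruby example with two made-up gene lists:

```ruby
# Array#&, Array#| and Array#- give intersection, union and difference.
list_a = %w[GPI G6PD PGLS PFKM]
list_b = %w[PFKM PFKP PFKL GPI]

shared = list_a & list_b
either = list_a | list_b
only_a = list_a - list_b

puts "shared: #{shared.join(', ')}"
puts "either: #{either.join(', ')}"
puts "only in a: #{only_a.join(', ')}"
```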
Databases: Slide 22 of 24
so how did I answer that email?
options(java.parameters = "-Xmx4g")
library(XLConnect)
wb <- loadWorkbook("~/Downloads/NGS Target list Tumour for Neil.xlsx")
s1 <- readWorksheet(wb, sheet = 1, startCol = 1, endCol = 1, header = FALSE)
s2 <- readWorksheet(wb, sheet = 2, startCol = 1, endCol = 32, header = TRUE)
s4 <- readWorksheet(wb, sheet = 4, startCol = 1, endCol = 3, header = TRUE)
# then use gsub, match, %in% etc. to clean and join the data
# ...
Read spreadsheet into R using the XLConnect package, then “munge”

development of diagnostic enzyme assay to detect leuser virus
 
Terpineol and it's characterization pptx
Terpineol and it's characterization pptxTerpineol and it's characterization pptx
Terpineol and it's characterization pptx
 

SQL, noSQL or no database at all? Are databases still a core skill?

  • 1. SQL, noSQL or no database at all? Are databases still a core skill? Neil Saunders COMPUTATIONAL INFORMATICS www.csiro.au
  • 2. Databases: Slide 2 of 24 alternative title: should David Lovell learn databases?
  • 3. actual recent email request
       Hi Neil,
       I was wondering if you could help me with something. I am trying to put together a table but it is rather slow by hand. Do you know if you can help me with this task with a script? If it is too much of your time, don’t worry about it. Just thought I’d ask before I start.
       The task is:
       The targets listed in A tab need to be found in B tab then the entire row copied into C tab. Then the details in column C of C tab then need to be matched with the details in D tab so that the patients with the mutations are listed in row AG and AH of C tab.
       Again, if this isn’t an easy task for you then don’t worry about it.
  • 4. sounds like a database to me (c. 2004)
  • 5. database design is a profession in itself
       -- KEGG_DB schema
       CREATE TABLE ec2go (
         ec_no VARCHAR(16) NOT NULL,            -- EC number (with "EC:" prefix)
         go_id CHAR(10) NOT NULL                -- GO ID
       );
       CREATE TABLE pathway2gene (
         pathway_id CHAR(8) NOT NULL,           -- KEGG pathway long ID
         gene_id VARCHAR(20) NOT NULL           -- Entrez Gene or ORF ID
       );
       CREATE TABLE pathway2name (
         path_id CHAR(5) NOT NULL UNIQUE,       -- KEGG pathway short ID
         path_name VARCHAR(80) NOT NULL UNIQUE  -- KEGG pathway name
       );
       -- Indexes.
       CREATE INDEX Ipathway2gene ON pathway2gene (gene_id);
  • 6. know your ORM from your MVC (do you DSL?) http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
  • 7. my one tip for today: use ORM = object relational mapping
       #!/usr/bin/ruby
       require 'sequel'
       # connect to UCSC Genomes MySQL server
       DB = Sequel.connect(:adapter => "mysql", :host => "genome-mysql.cse.ucsc.edu",
                           :user => "genome", :database => "hg19")
       # instead of "SELECT count(*) FROM knownGene"
       DB.from(:knownGene).count
       # => 82960
       # instead of "SELECT name, chrom, txStart FROM knownGene LIMIT 1"
       DB.from(:knownGene).select(:name, :chrom, :txStart).first
       # => {:name=>"uc001aaa.3", :chrom=>"chr1", :txStart=>11873}
       # instead of "SELECT name FROM knownGene WHERE chrom = 'chrM'"
       DB.from(:knownGene).select(:name).where(:chrom => "chrM").all
       # => [{:name=>"uc004coq.4"}, {:name=>"uc022bqo.2"}, {:name=>"uc004cor.1"}, {:name=>"uc004cos.5"},
       #     {:name=>"uc022bqp.1"}, {:name=>"uc022bqq.1"}, {:name=>"uc022bqr.1"}, {:name=>"uc031tga.1"},
       #     {:name=>"uc022bqs.1"}, {:name=>"uc011mfi.2"}, {:name=>"uc022bqt.1"}, {:name=>"uc022bqu.2"},
       #     {:name=>"uc004cov.5"}, {:name=>"uc031tgb.1"}, {:name=>"uc004cow.2"}, {:name=>"uc004cox.4"},
       #     {:name=>"uc022bqv.1"}, {:name=>"uc022bqw.1"}, {:name=>"uc022bqx.1"}, {:name=>"uc004coz.1"}]
  • 8. don’t want to CREATE? you still might want to SELECT
       Question: How to map a SNP to a gene around +/- 60KB ?
       I am looking at a bunch of SNPs. Some of them are part of genes, but other are not. I am interested to look up +60KB or -60KB of those SNPs to get details about some nearby genes. Please share your experience in dealing with such a situation or thoughts on any methods that can do this. Thanks in advance.
       http://www.biostars.org/p/413/
  • 9. example SELECT
       mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
         select K.proteinID, K.name, S.name, S.avHet, S.chrom, S.chromStart, K.txStart, K.txEnd
         from snp130 as S
         left join knownGene as K on
           (S.chrom = K.chrom and not(K.txEnd + 60000 < S.chromStart or S.chromEnd + 60000 < K.txStart))
         where S.name in ("rs25","rs100","rs75","rs9876","rs101")
       '
  • 10. example SELECT result
  • 11. let’s talk about noSQL http://www.infoivy.com/2013/07/nosql-database-comparison-chart-only.html
  • 12. (potentially) a good fit for biological data
  • 13. many data sources are “key-value ready” (or close enough)
       http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json
       [
         {
           "2821": "GPI; glucose-6-phosphate isomerase [KO:K01810] [EC:5.3.1.9]",
           "2539": "G6PD; glucose-6-phosphate dehydrogenase [KO:K00036] [EC:1.1.1.49]",
           "25796": "PGLS; 6-phosphogluconolactonase [KO:K01057] [EC:3.1.1.31]",
           ...
           "5213": "PFKM; phosphofructokinase, muscle [KO:K00850] [EC:2.7.1.11]",
           "5214": "PFKP; phosphofructokinase, platelet [KO:K00850] [EC:2.7.1.11]",
           "5211": "PFKL; phosphofructokinase, liver [KO:K00850] [EC:2.7.1.11]"
         }
       ]
  • 14. schema-free: save first, worry later (= agile)
       #!/usr/bin/ruby
       require "mongo"
       require "json/pure"
       require "open-uri"
       db = Mongo::Connection.new.db('kegg')
       col = db.collection('genes')
       j = JSON.parse(open("http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json").read)
       j.each do |g|
         gene = Hash.new
         g.each_pair do |key, val|
           gene[:_id] = key
           gene[:desc] = val
           col.save(gene)
         end
       end
       Ruby code to save JSON from the TogoWS REST service
  • 15. example application - PMRetract (ask later if interested) http://pmretract.heroku.com/ https://github.com/neilfws/PubMed/tree/master/retractions
  • 16. when rows + columns != database
       - sometimes a database is overkill
  • 17. example 1 - R/IRanges
  • 18. example 2 - bedtools http://bedtools.readthedocs.org/en/latest/
  • 19. example 3 - unix join (and the shell in general)
  • 20. when are databases good?
       - when data are updated frequently
       - when multiple users do the updating
       - when queries are complex or ever-changing
       - as backends to web applications
  • 21. when are databases not/less good?
       - for basic “set operations”
       - for sequence data [1] (?)
       [1] no time to discuss BioSQL, GBrowse/Bio::DB::GFF, BioDAS etc.
  • 22. so how did I answer that email?
       options(java.parameters = "-Xmx4g")
       library(XLConnect)
       wb <- loadWorkbook("~/Downloads/NGS Target list Tumour for Neil.xlsx")
       s1 <- readWorksheet(wb, sheet = 1, startCol = 1, endCol = 1, header = F)
       s2 <- readWorksheet(wb, sheet = 2, startCol = 1, endCol = 32, header = T)
       s4 <- readWorksheet(wb, sheet = 4, startCol = 1, endCol = 3, header = T)
       # then use gsub, match, %in% etc. to clean and join the data
       # ...
       Read spreadsheet into R using the XLConnect package, then “munge”
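
The "clean and join" step from the email request can be sketched without any database, in the spirit of the "sometimes a database is overkill" slides. This is a rough illustration in plain Ruby, not the code used to answer the email: the method name `join_tabs`, the column names (`:target`, `:mutation`, `:patient`) and all the data values are invented for the example, and real spreadsheet data would first need to be read and cleaned.

```ruby
require "set"

# Hypothetical helper (not from the slides): mimics the A -> B -> C -> D
# workflow described in the email using in-memory rows.
def join_tabs(targets, tab_b, tab_d)
  # Step 1: keep the rows of tab B whose target is listed in tab A.
  # This plays the role of "tab C".
  wanted = targets.to_set
  tab_c = tab_b.select { |row| wanted.include?(row[:target]) }

  # Step 2: index tab D by mutation so each lookup is a single hash access.
  patients_by_mutation = Hash.new { |hash, key| hash[key] = [] }
  tab_d.each { |row| patients_by_mutation[row[:mutation]] << row[:patient] }

  # Step 3: attach the matching patients (possibly none) to each row of tab C.
  tab_c.map do |row|
    row.merge(patients: patients_by_mutation.fetch(row[:mutation], []))
  end
end

# Invented toy data standing in for the spreadsheet tabs.
TARGETS = ["GENE1", "GENE3"]
TAB_B = [
  { target: "GENE1", mutation: "m1" },
  { target: "GENE2", mutation: "m2" },
  { target: "GENE3", mutation: "m3" }
]
TAB_D = [
  { mutation: "m1", patient: "P01" },
  { mutation: "m1", patient: "P02" },
  { mutation: "m2", patient: "P03" }
]

join_tabs(TARGETS, TAB_B, TAB_D).each do |row|
  puts [row[:target], row[:mutation], row[:patients].join(",")].join("\t")
end
```

Indexing tab D in a hash first means each row of tab C is matched with one lookup rather than a scan of the whole tab, which is roughly what `match` and `%in%` do in the R version.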