1. Building a Web Application to Monitor PubMed Retraction Notices
Neil Saunders
CSIRO Mathematics, Informatics and Statistics
Building E6B, Macquarie University Campus
North Ryde
December 1, 2011
3. Project Aims
Monitor PubMed for retractions
Retrieve retraction data and store it locally for analysis
Develop a web application to display retraction data
8. EInfo example script
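#!/usr/bin/ruby
require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'

# NCBI asks E-utilities users to supply a contact email address
Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db="

# einfo with no arguments returns the names of all Entrez databases
ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  # one output file per database, one line per searchable field
  File.open("#{db}.txt", "w") do |f|
    # URI.open comes from open-uri; older Rubies also accepted Kernel#open
    doc = Hpricot(URI.open(url + db))
    (doc/'//fieldlist/field').each do |field|
      name        = (field/'/name').inner_html
      fullname    = (field/'/fullname').inner_html
      description = (field/'description').inner_html
      # plain comma-joined line (not quoted CSV)
      f.write("#{name},#{fullname},#{description}\n")
    end
  end
end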
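The EInfo script builds its URL by string concatenation, which works for database names but not for search terms containing spaces or brackets, such as the [ptyp] and [dp] queries used later. A minimal sketch using Ruby's standard URI.encode_www_form to escape every parameter; the helper eutils_url is hypothetical, and the base URL is the one from the script (NCBI now serves E-utilities over HTTPS, so a current version would use https://).

```ruby
require 'uri'

# Base URL as used in the EInfo script; eutils_url is a hypothetical helper.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Build an E-utilities URL for the given tool (einfo, esearch, ...),
# percent-encoding parameter values (spaces become "+", [ ] become %5B %5D).
def eutils_url(tool, params)
  "#{EUTILS_BASE}#{tool}.fcgi?#{URI.encode_www_form(params)}"
end

puts eutils_url("einfo", { "db" => "pubmed" })
puts eutils_url("esearch", { "db" => "pubmed",
                             "term" => "Retraction of Publication[ptyp] 2010[dp]" })
```

The second call yields a query string ending in term=Retraction+of+Publication%5Bptyp%5D+2010%5Bdp%5D, which is safe to pass to URI.open.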
9. EInfo script - output
ALL,All Fields,All terms from all searchable fields
UID,UID,Unique number assigned to publication
FILT,Filter,Limits the records
TITL,Title,Words in title of publication
WORD,Text Word,Free text associated with publication
MESH,MeSH Terms,Medical Subject Headings assigned to publication
MAJR,MeSH Major Topic,MeSH terms of major importance to publication
AUTH,Author,Author(s) of publication
JOUR,Journal,Journal abbreviation of publication
AFFL,Affiliation,Author’s institutional affiliation and address
...
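Each output line is comma-joined text rather than quoted CSV, so a description that contains a comma would produce an extra column under a naive split. A minimal sketch for reading the lines back, assuming only the description field can contain commas; the first three lines are copied from the output above, while the XMPL line is invented for the demonstration.

```ruby
# Three lines from the EInfo output, plus one made-up line (XMPL)
# whose description contains a comma.
lines = [
  "TITL,Title,Words in title of publication",
  "MESH,MeSH Terms,Medical Subject Headings assigned to publication",
  "JOUR,Journal,Journal abbreviation of publication",
  "XMPL,Example,a description, with a comma"
]

# split(",", 3) breaks on the first two commas only, so any commas
# inside the description stay in the third field.
fields = {}
lines.each do |line|
  name, fullname, description = line.chomp.split(",", 3)
  fields[name] = { fullname: fullname, description: description }
end

puts fields["MESH"][:fullname]     # prints MeSH Terms
puts fields["XMPL"][:description]  # prints a description, with a comma
```

This still breaks if a full name contains a comma; writing the file with proper CSV quoting would avoid the problem entirely.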
10. MongoDB Overview
MongoDB is a so-called “NoSQL” database
Key features:
Document-oriented
Schema-free
Documents stored in collections
http://www.mongodb.org/
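To make "document-oriented" and "schema-free" concrete: a collection holds self-describing documents that need not share fields, and a query selects documents by example. The sketch below is a toy in-memory stand-in written in plain Ruby (an array of hashes with made-up values), not the MongoDB driver; find_matching is a hypothetical helper that handles only equality matches.

```ruby
# Toy stand-in for a collection: documents (hashes) with made-up values.
# Nothing forces the documents to share the same fields.
collection = [
  { "_id" => 2010, "year" => 2010, "total" => 1000, "retracted" => 2 },
  { "_id" => 2011, "year" => 2011, "total" => 1200 },      # no "retracted"
  { "_id" => "t1", "date" => "2011-06-01", "count" => 3 }  # different shape
]

# Simplified query-by-example: keep documents whose fields equal every
# key/value pair in the selector; a document missing a key does not match.
# (Real MongoDB selectors also support operators such as $gt.)
def find_matching(collection, selector)
  collection.select do |doc|
    selector.all? { |key, value| doc.key?(key) && doc[key] == value }
  end
end

puts find_matching(collection, { "year" => 2011 }).size         # prints 1
puts find_matching(collection, { "count" => 3 }).first["date"]  # prints 2011-06-01
```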
11. Saving to a database collection: ecount
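#!/usr/bin/ruby
require "rubygems"
require "bio"
require "mongo"

# Mongo::Connection is the 1.x Ruby driver API (Mongo::Client in 2.x+)
db  = Mongo::Connection.new.db('pubmed')
col = db.collection('ecount')

Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

1977.upto(Time.now.year) do |year|
  # all PubMed records with this publication date
  all  = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
  # retraction notices (publication type) with this publication date
  term = ncbi.esearch_count("Retraction of Publication[ptyp] #{year}[dp]",
                            {"db" => "pubmed"})
  # the year doubles as _id, so re-running the script replaces each
  # year's record via save() instead of adding a duplicate
  record = {"_id" => year, "year" => year, "total" => all,
            "retracted" => term, "updated_at" => Time.now}
  col.save(record)
  puts "#{year}..."
end
puts "Saved #{col.count} records."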
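Storing both total and retracted per year makes a normalised rate the obvious first analysis, since raw retraction counts grow with PubMed itself. A minimal sketch, assuming records shaped like those saved above; the values are made up, and in practice they would come from querying the ecount collection. Note that the numerator counts retraction notices by the notice's own publication date, not by the date of the retracted article.

```ruby
# Made-up records in the shape saved by the ecount script
# (only the fields used here).
records = [
  { "year" => 2010, "total" => 250_000, "retracted" => 75 },
  { "year" => 2009, "total" => 200_000, "retracted" => 40 },
  { "year" => 1977, "total" => 100_000, "retracted" => 0 }
]

# Retraction notices per 100,000 PubMed records, sorted by year.
# A zero total yields 0.0 rather than a division error.
def retractions_per_100k(records)
  records.sort_by { |r| r["year"] }.map do |r|
    rate = r["total"].zero? ? 0.0 : 100_000.0 * r["retracted"] / r["total"]
    [r["year"], rate.round(2)]
  end
end

retractions_per_100k(records).each { |year, rate| puts "#{year}: #{rate}" }
```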
21. Sinatra Application Code - main.rb
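# main.rb
require 'rubygems'
require 'sinatra'
require 'haml'
require 'mongo'

configure do
  # a bunch of config stuff goes here
  # DB = connection to MongoDB database

  # timeline: read once at startup and cached as [date, count] pairs,
  # available to routes and views as settings.data
  timeline = DB.collection('timeline')
  set :data, timeline.find.to_a.map { |e| [e['date'], e['count']] }
end

# views
get "/" do
  haml :index
end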
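The configure block reshapes each timeline document into a [date, count] pair. If the index view hands those pairs to a JavaScript charting library (an assumption; the slides do not show the view), they need to be in date order and serialised. A minimal sketch with made-up documents, assuming date is stored as an ISO "YYYY-MM-DD" string so that string order matches chronological order.

```ruby
require 'json'

# Made-up documents shaped like the 'timeline' collection in main.rb;
# the date string format is an assumption.
docs = [
  { "_id" => 2, "date" => "2011-02-01", "count" => 5 },
  { "_id" => 1, "date" => "2011-01-01", "count" => 3 }
]

# Same mapping as the configure block, then sorted by date so a chart
# draws its points in order.
data = docs.map { |e| [e["date"], e["count"]] }.sort_by(&:first)

# Compact JSON suitable for embedding in a page or serving from a route.
puts data.to_json  # prints [["2011-01-01",3],["2011-02-01",5]]
```

Inside the Sinatra app the cached array is reachable as settings.data, so the same to_json call could be made from a route or view.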