This document summarizes the agenda and presentations for the workshops at CrossRef's 2010 Annual Member Meeting in London (15 November 2010). The workshops cover a system update on the query system and the new deposit system, CrossMark, CrossCheck, metadata quality, cited-by linking, DOI workflow issues, and books, plus a "boot camp" session. On the system rewrite: the query system is nearly complete, while the deposit system rewrite will begin in January 2011, with development expected to run until mid-year, followed by extensive testing. The goals are better performance and extensibility, and greater control over and insight into the systems for CrossRef.
1. 1
CrossRef 2010 Annual Member Meeting - London
Page 1
CrossRef Annual Meeting – London
Workshops
15 November 2010
Workshops Agenda
9:30-10:00 Coffee & Tea
10:00-11:30 System Update: Andrew Gilmartin, Senior Software Developer; Chuck Koscher, Director of Technology
11:30-12:00 CrossMark: Geoff Bilder, Director of Strategic Initiatives
12:00-12:30 CrossCheck: Kirsty Meddings, Product Manager
12:30-1:15 Lunch
1:15-2:15 Metadata Quality: Patricia Feeney, Product Support Manager
2:15-2:45 Cited-by Linking: Carol Anne Meyer, Business Development and Marketing Manager; Chuck Koscher
2:45-3:00 Break
3:00-4:00 DOI Workflow Issues, Working with Vendors: Carol Anne Meyer
4:00-4:45 Boot Camp: Carol Anne Meyer; Tim Pickard, System Support Analyst/Administrator
4:45-5:15 Books: Carol Anne Meyer
System Update
System status
Rewrite review
Rewrite implementation
Discussion
System status
Deposit processing
Suspended for 2+ weekends for the Oracle DB upgrade (to 11g)
Processing times remain the same (50% under 5 min, a further 30% under 1 hour)
Large re-deposits (Elsevier plans for 2011)
Schema relatively unchanged in 2+ years (keep adding MIME types)
Deposit focus areas for 2011 (other than the rewrite)
Investigating a PDF upload option (for depositing a DOI and the article’s references)
Modify WebDeposit to allow users to edit an existing DOI’s metadata
Maintenance on NLM DTD deposit tool
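The processing-time figures are cumulative bands: 50% of deposits finish in under 5 minutes and a further 30% in under an hour, so roughly 80% finish within the hour and about 20% take longer. A minimal Python sketch of how such bands could be computed, assuming per-deposit processing times are available in minutes; the function name and the sample data are hypothetical, not taken from CrossRef's system.

```python
from dataclasses import dataclass


@dataclass
class TurnaroundBands:
    under_5_min: float   # fraction of deposits processed in under 5 minutes
    under_1_hour: float  # cumulative fraction processed in under 60 minutes
    over_1_hour: float   # fraction taking 60 minutes or more


def turnaround_bands(minutes: list[float]) -> TurnaroundBands:
    """Bucket per-deposit processing times (hypothetical input, in minutes)."""
    if not minutes:
        raise ValueError("need at least one processing time")
    total = len(minutes)
    fast = sum(1 for m in minutes if m < 5)
    within_hour = sum(1 for m in minutes if m < 60)
    return TurnaroundBands(
        under_5_min=fast / total,
        under_1_hour=within_hour / total,
        over_1_hour=(total - within_hour) / total,
    )


# Hypothetical sample of ten deposits shaped like the slide's figures.
sample = [1, 2, 3, 4, 4.5, 10, 20, 45, 90, 300]
bands = turnaround_bands(sample)
print(bands)  # TurnaroundBands(under_5_min=0.5, under_1_hour=0.8, over_1_hour=0.2)
```

Reporting the hour band cumulatively keeps "a further 30%" and "80% within the hour" consistent with each other.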
System rewrite
The Query System (QS), where are we?
It's taking longer than we thought.
QS is 99% ready and has been in service periodically since mid-September.
Last vexing problem (database connection deadlock) solved?
Performance improvement is very encouraging.
Metrics and measurement capability greatly improved.
The Deposit System (DS), where are we?
Initial design discussions have been held, documentation is under way.
Implementation to start in January
Development will take until mid-year, then lots of testing
Data cleanup will be part of the migration process (mainly titles)
System rewrite
Rewrite 2 Working Group – Final report November 2008
⋅ Modularity of design
⋅ Utility of APIs where possible
⋅ Data stores that enable XML capabilities
⋅ Minimize dependency on proprietary systems
• That CrossRef should ultimately own the intellectual property in the software at the heart of its operations
• That CrossRef should not risk or jeopardize the reliability and throughput offered by the existing system
• That CrossRef should remain free to develop further applications for other purposes which need to interface to the reference-linking systems and/or its data
System rewrite
Rewrite 2 Working Group – Final report November 2008
Now
• Solve NFS issue
• Federate architecture
• Database redesign
• Redesign event notification model (replace email)
Soon
• Unit testing (regression testing)
• Scriptable data ingestion workflow
• Improved title management and control
• Better publisher/member management model
• Daily testing/monitoring (data integrity)
• Built-in health and status monitoring
• Performance improvements and queue management
Later
• Richer metadata querying capability
• Integrated data harvesting capabilities
• Dealing with references using other character sets
• Crawling of content to ingest it vs. making deposits
• Depositing of non-journal content
• Matching unstructured references using full text of equiv
• Querying of non-journal content
• Real-time cited-by queries with data-driven APIs
• More content types, including language variants
• More granular typing of journal articles
• Improved reporting facilities
• More useful user interface for members
System rewrite
Technical Objectives
Rework a 9-year-old system
Address declining performance
Improve administrative aspects (better control and reporting)
Facilitate extensibility
Staff better able to respond thanks to operational insight
Business Objectives
Develop internal capabilities (CrossRef currently pays for every change Atypon makes)
Secure an independent path (continuity)
Benefit of being on a ‘shared’ platform nearing zero
Maintain access to technical expertise
System rewrite
Late 2010 through mid 2011
[Architecture diagram. Labeled components: HTTP traffic; HAProxy; FrontEnd QS (Spring, Tomcat); MySQL, Lucene and BerkeleyDB data stores; Deposit System (old Atypon EDS); BackEnd Services; ActiveMQ (messaging); external messaging (email, etc.); Oracle (prime) and Oracle (active-standby) with constant replication, grouped as "Oracle Group"; a "New System" grouping.]
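The architecture places ActiveMQ messaging alongside external messaging (email, etc.), and the Rewrite 2 priorities include redesigning the event notification model to replace email. A minimal in-process sketch of the publish/subscribe idea in Python, assuming events are plain dictionaries; the `EventBus` class and the topic name are hypothetical, and a `queue.Queue` per topic merely stands in for a real broker such as ActiveMQ.

```python
import queue
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Toy stand-in for a message broker: one FIFO queue per topic."""

    def __init__(self) -> None:
        self._queues: defaultdict[str, queue.Queue] = defaultdict(queue.Queue)
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Publishers only enqueue; they do not know who (if anyone) consumes.
        self._queues[topic].put(event)

    def dispatch(self, topic: str) -> int:
        """Deliver every queued event on `topic` to all subscribers; return the count."""
        delivered = 0
        pending = self._queues[topic]
        while True:
            try:
                event = pending.get_nowait()
            except queue.Empty:
                break
            for handler in self._handlers[topic]:
                handler(event)
            delivered += 1
        return delivered


bus = EventBus()
internal_log: list[dict] = []
outbound_email: list[str] = []
bus.subscribe("deposit.completed", internal_log.append)
# Email becomes one more subscriber rather than the notification mechanism itself.
bus.subscribe("deposit.completed", lambda e: outbound_email.append(f"Deposit {e['batch']} done"))
bus.publish("deposit.completed", {"batch": "batch-001"})
bus.publish("deposit.completed", {"batch": "batch-002"})
print(bus.dispatch("deposit.completed"))  # 2
```

The design point is decoupling: services that produce events never call the mail system directly, so new consumers (reports, monitoring) can be added without touching the producers.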
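System rewrite
Q3 2011
[Architecture diagram. Labeled components: HTTP traffic; HAProxy; FrontEnd QS (Spring, Tomcat); MySQL, Lucene and BerkeleyDB data stores; FrontEnd DS (Spring, Tomcat) handling file upload and deposit reports; Deposit Processing; BackEnd Services; ActiveMQ (messaging); external messaging (email, etc.); Oracle (prime) and Oracle (active-standby) with constant replication, grouped as "Oracle Group"; a "New System" grouping. The old Atypon EDS deposit system no longer appears.]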
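In the Q3 2011 picture, HAProxy sits in front of two front ends, one for queries (QS) and one for deposits (DS). A minimal Python sketch of the kind of decision a proxy makes there, assuming path-prefix routing plus round-robin within each pool; the path prefix, pool names and server names are all hypothetical, and in HAProxy itself this would be expressed in its configuration rather than in application code.

```python
import itertools

# Hypothetical server pools behind the proxy; round-robin within each pool.
POOLS = {
    "frontend-qs": itertools.cycle(["qs-1", "qs-2"]),
    "frontend-ds": itertools.cycle(["ds-1"]),
}

# Hypothetical routing rule: deposit traffic goes to the DS front end.
ROUTES = [("/deposit", "frontend-ds")]
DEFAULT_POOL = "frontend-qs"


def choose_pool(path: str) -> str:
    """Pick a pool by path prefix, matching whole path segments only."""
    for prefix, pool in ROUTES:
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL


def route(path: str) -> str:
    """Return the next server (round-robin) in the pool chosen for `path`."""
    return next(POOLS[choose_pool(path)])
```

Everything not explicitly routed falls through to the query front end, which mirrors query traffic being the default case.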
System rewrite
[Database layout diagram. Labeled components: New Deposit System; Database Updater; Primary Datacenter with Deposit DB (prime) and Query DB (prime); Recovery Datacenter; Oracle replication between Deposit DB (prime) and Deposit DB (standby), and between Query DB (prime) and Query DB (secondary); "Oracle Group".]
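The database layout separates a deposit database from a query database, with a database updater between them; the deposit system feature list likewise calls for separating deposit metadata organization from query metadata organization. A minimal Python sketch of that updater pattern using two in-memory SQLite databases from the standard library, assuming the updater copies new deposits past a sequence high-water mark and normalizes them for lookup; the table names, columns and normalization are hypothetical, and Oracle replication is not modeled.

```python
import sqlite3


def make_deposit_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Deposit side: append-only log of what members submitted, in arrival order.
    db.execute(
        "CREATE TABLE deposit ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " doi TEXT NOT NULL,"
        " title TEXT NOT NULL)"
    )
    return db


def make_query_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Query side: one current row per DOI, organized for lookup.
    db.execute("CREATE TABLE doi_lookup (doi TEXT PRIMARY KEY, title_key TEXT NOT NULL)")
    return db


def run_updater(deposit_db: sqlite3.Connection, query_db: sqlite3.Connection, last_seq: int) -> int:
    """Copy deposits with seq > last_seq into the query store; return the new high-water mark."""
    rows = deposit_db.execute(
        "SELECT seq, doi, title FROM deposit WHERE seq > ? ORDER BY seq", (last_seq,)
    ).fetchall()
    for seq, doi, title in rows:
        # DOIs are case-insensitive, so store a lower-cased key; collapse whitespace in titles.
        query_db.execute(
            "INSERT OR REPLACE INTO doi_lookup (doi, title_key) VALUES (?, ?)",
            (doi.lower(), " ".join(title.lower().split())),
        )
        last_seq = seq
    query_db.commit()
    return last_seq


deposits, lookup = make_deposit_db(), make_query_db()
deposits.execute(
    "INSERT INTO deposit (doi, title) VALUES (?, ?)",
    ("10.1034/j.1399-3011.1999.00076.x", "Journal of Peptide Research"),
)
mark = run_updater(deposits, lookup, 0)
print(mark)  # 1
```

Because the query store is rebuilt from the deposit log, a re-deposit of the same DOI simply replaces its lookup row, and the query side can be reorganized without changing what members deposit.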
Query system feature changes
Tweaks to the matching logic (discoveries made while porting the code)
Fixed some nagging quirks
Aggregate email notices for alerts
Implement HTTP free-text matching (still needs work, ‘alpha’)
Process free-text references for cited-by (done, stable, uses refXpress)
Establish better user model:
1. Usernames & passwords for members (query and deposit)
2. Registered email addresses for non-members (query only)
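Aggregating email notices for alerts means sending each recipient one message that covers many alerts, rather than one email per alert. A minimal Python sketch, assuming each alert carries a recipient address and a line of text; the `Alert` type and the digest format are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    recipient: str  # email address the alert is for
    message: str    # one line of alert text


def build_digests(alerts: list[Alert]) -> dict[str, str]:
    """Group alerts by recipient and render one digest body per recipient."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for alert in alerts:
        grouped[alert.recipient].append(alert.message)
    return {
        recipient: f"{len(messages)} alert(s):\n" + "\n".join(f"- {m}" for m in messages)
        for recipient, messages in grouped.items()
    }


alerts = [
    Alert("a@example.org", "first"),
    Alert("b@example.org", "second"),
    Alert("a@example.org", "third"),
]
digests = build_digests(alerts)
```

Alerts keep their original order within each digest, so a recipient sees events in the sequence they occurred.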
System rewrite
Non-member registration flow: Use Registration Form → Receive Email → Use Validation Form
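The user model distinguishes members, who authenticate with a username and password and may query and deposit, from non-members, who register an email address, confirm it through the emailed validation step, and may then query only. A minimal Python sketch of that model, assuming an in-memory store; salted PBKDF2 hashing via `hashlib` is a swapped-in choice, and the class, method names and token handling are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Optional

ITERATIONS = 100_000  # assumed PBKDF2 work factor


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


@dataclass
class UserStore:
    members: dict[str, bytes] = field(default_factory=dict)  # username -> salt + hash
    pending: dict[str, str] = field(default_factory=dict)    # email -> validation token
    registered: set[str] = field(default_factory=set)        # validated non-member emails

    def add_member(self, username: str, password: str) -> None:
        salt = secrets.token_bytes(16)
        self.members[username] = salt + _hash(password, salt)

    def check_member(self, username: str, password: str) -> bool:
        stored = self.members.get(username)
        if stored is None:
            return False
        salt, digest = stored[:16], stored[16:]
        return hmac.compare_digest(_hash(password, salt), digest)

    def register_email(self, email: str) -> str:
        """Registration form: issue a token (a real system would email it, not return it)."""
        token = secrets.token_urlsafe(16)
        self.pending[email] = token
        return token

    def validate_email(self, email: str, token: str) -> bool:
        """Validation form: confirm the emailed token and register the address."""
        expected = self.pending.get(email)
        if expected is None or not hmac.compare_digest(expected.encode(), token.encode()):
            return False
        del self.pending[email]
        self.registered.add(email)
        return True


def allowed_actions(store: UserStore, username: Optional[str] = None,
                    password: Optional[str] = None, email: Optional[str] = None) -> set[str]:
    """Members may query and deposit; validated non-member emails may only query."""
    if username is not None and password is not None and store.check_member(username, password):
        return {"query", "deposit"}
    if email is not None and email in store.registered:
        return {"query"}
    return set()
```

An address that has registered but not yet validated gets no access, which is what makes the emailed validation step meaningful.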
Uses refXpress to break free text into XML suitable for running a metadata query
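refXpress itself is not described here. As a rough illustration of what breaking free text into query XML involves, here is a deliberately naive Python sketch that pulls a year, volume and page range out of a citation string with regular expressions and emits a small XML fragment. It is not refXpress: the patterns only handle the `(year) ... volume, first-last` shape of the example citation later in this deck, the `<query>` wrapper element is hypothetical, and the child element names are borrowed from the metadata records shown in this deck.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

YEAR = re.compile(r"\((\d{4})\s*\)")                      # e.g. "(1999 )"
VOL_PAGES = re.compile(r"(\d+)\s*,\s*(\d+)\s*-+\s*(\d+)")  # e.g. "53 , 383 -392"


def parse_citation(text: str) -> Optional[dict[str, str]]:
    """Extract year, volume and pages; return None if either pattern is missing."""
    year = YEAR.search(text)
    vol_pages = VOL_PAGES.search(text)
    if year is None or vol_pages is None:
        return None
    volume, first_page, last_page = vol_pages.groups()
    return {"year": year.group(1), "volume": volume,
            "first_page": first_page, "last_page": last_page}


def to_query_xml(fields: dict[str, str]) -> str:
    """Render the parsed fields as a small XML fragment (hypothetical <query> wrapper)."""
    root = ET.Element("query")
    for name in ("volume", "first_page", "year"):
        ET.SubElement(root, name).text = fields[name]
    return ET.tostring(root, encoding="unicode")


cite = ("53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T. "
        "Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic "
        "acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .")
fields = parse_citation(cite)
print(to_query_xml(fields))
```

Real reference parsers handle many more citation styles; the point here is only that free text must be turned into named fields before a metadata query can run.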
Uses QS Formatted Citation Parse to break free text into XML suitable for running a metadata query; if that fails, uses QS Formatted Citation Search (with a high threshold) to search the Lucene index for a DOI.
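The matching order for HTTP free-text queries (parse the citation and run a metadata query, and if that fails, fall back to a full-text index search that is accepted only above a high threshold) can be sketched as a small Python function. The parser, metadata query and search are passed in as hypothetical callables; this sketch treats either a failed parse or an empty metadata-query result as "that fails", and the 0.9 threshold is an assumed value, not QS's actual setting.

```python
from typing import Callable, Optional

Fields = dict[str, str]


def match_free_text(
    citation: str,
    parse: Callable[[str], Optional[Fields]],
    metadata_query: Callable[[Fields], Optional[str]],
    fulltext_search: Callable[[str], list[tuple[str, float]]],
    threshold: float = 0.9,  # assumed "high threshold"
) -> Optional[str]:
    """Return a DOI for a free-text citation, or None if nothing is trustworthy."""
    fields = parse(citation)
    if fields is not None:
        doi = metadata_query(fields)
        if doi is not None:
            return doi
    # Fallback: full-text search over an index, accepted only above the threshold.
    hits = fulltext_search(citation)
    if not hits:
        return None
    best_doi, best_score = max(hits, key=lambda hit: hit[1])
    return best_doi if best_score >= threshold else None
```

Returning `None` below the threshold is the conservative choice: an unmatched reference is less harmful than a link to the wrong article.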
But be careful!
<citation key="b53_366">
<unstructured_citation>
53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T.
Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic
acid-
and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .
</unstructured_citation>
</citation>
Still yields this:
<doi type="journal_article">
10.1034/j.1399-3011.1999.00077.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
<contributor sequence="first" contributor_role="author">
<given_name>O.S.</given_name>
<surname>Gudmundsson</surname>
</contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>403</first_page>
<last_page>413</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>
The effect of conformation of the acyloxyalkoxy-based cyclic
prodrugs of opioid peptides on their membrane permeability
</article_title>
But the correct answer is this:
<doi type="journal_article">
10.1034/j.1399-3011.1999.00076.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
<contributor sequence="first" contributor_role="author">
<given_name>O.S.</given_name>
<surname>Gudmundsson</surname>
</contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>383</first_page>
<last_page>392</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>
The effect of conformation on the membrane permeation of
coumarinic acid- and phenylpropionic acid-based cyclic
prodrugs of opioid peptides
</article_title>
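The lesson of this example is that a close textual match can still point at the wrong article: the citation gives volume 53, pages 383-392, which agrees with 10.1034/j.1399-3011.1999.00076.x and not with 10.1034/j.1399-3011.1999.00077.x (pages 403-413). One possible safeguard, sketched in Python under the assumption that the citation and each candidate have already been reduced to simple field dictionaries, is to reject any candidate whose volume, first page or year contradicts the citation. The function names are hypothetical and this is not a description of how QS ranks candidates.

```python
from typing import Optional

CHECKED_FIELDS = ("volume", "first_page", "year")


def consistent(citation: dict[str, str], candidate: dict[str, str]) -> bool:
    """True unless a field present on both sides disagrees."""
    return all(
        citation[name] == candidate[name]
        for name in CHECKED_FIELDS
        if name in citation and name in candidate
    )


def first_consistent(citation: dict[str, str], ranked: list[dict[str, str]]) -> Optional[str]:
    """Walk candidates in ranked order; return the DOI of the first that survives."""
    for candidate in ranked:
        if consistent(citation, candidate):
            return candidate["doi"]
    return None


cited = {"volume": "53", "first_page": "383", "year": "1999"}
ranked = [  # suppose a text-similarity ranking put the wrong record first
    {"doi": "10.1034/j.1399-3011.1999.00077.x", "volume": "53", "first_page": "403", "year": "1999"},
    {"doi": "10.1034/j.1399-3011.1999.00076.x", "volume": "53", "first_page": "383", "year": "1999"},
]
print(first_consistent(cited, ranked))  # 10.1034/j.1399-3011.1999.00076.x
```

Fields missing on either side are not treated as conflicts, so sparse citations are not rejected outright; only explicit contradictions are.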
Deposit system feature changes
Parse the XML prior to accepting the upload
Process XML and register DOIs regardless of metadata ingestion problems
Provide aggregated deposit reports (daily?)
Integrate Schematron checks into the deposit process
Robust title ownership model, not based on prefix, with shared ownership options
Separate deposit metadata organization from query metadata organization (e.g., allow title substitution)
System rewrite
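Several of these deposit items fit together in one acceptance path: reject XML that does not parse before accepting the upload, check ownership by title rather than by DOI prefix (allowing several owners per title), register each DOI, and let metadata ingestion problems produce warnings instead of blocking registration. A minimal Python sketch of that flow; the `<deposit>`/`<record>` wrapper elements, the ownership table, the member IDs and the single year check standing in for metadata ingestion are all hypothetical, and Schematron validation is not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical ownership table keyed by title rather than DOI prefix;
# a title may have more than one owner (shared ownership).
TITLE_OWNERS: dict[str, set[str]] = {
    "Journal of Peptide Research": {"member-a", "member-b"},
}


@dataclass
class DepositResult:
    accepted: bool
    registered: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def ingest_metadata(record: ET.Element) -> None:
    """Stand-in for metadata ingestion: a single hypothetical sanity check."""
    year = (record.findtext("year") or "").strip()
    if not year.isdigit():
        raise ValueError(f"non-numeric year {year!r}")


def process_deposit(xml_text: str, depositor: str) -> DepositResult:
    # 1. Parse before accepting: malformed XML is rejected outright.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return DepositResult(accepted=False, warnings=[f"not accepted: {exc}"])

    result = DepositResult(accepted=True)
    for record in root.iter("record"):
        doi = (record.findtext("doi") or "").strip()
        title = (record.findtext("journal_title") or "").strip()
        if not doi:
            result.warnings.append("record without a DOI skipped")
            continue
        # 2. Ownership is checked per title, so a shared title works for any listed owner.
        if depositor not in TITLE_OWNERS.get(title, set()):
            result.warnings.append(f"{doi}: {depositor} does not own {title!r}")
            continue
        # 3. Register the DOI first; 4. metadata problems become warnings, not failures.
        result.registered.append(doi)
        try:
            ingest_metadata(record)
        except ValueError as exc:
            result.warnings.append(f"{doi}: metadata not ingested ({exc})")
    return result
```

For example, a well-formed record with a garbled year from an owning member comes back with `accepted=True`, its DOI in `registered`, and one warning; the same deposit from a non-owner registers nothing.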