BHL Tech Report

Technical Report to the Biodiversity Heritage Library Institutional Council on 22 Mar 2010 at American Museum of Natural History

Published in: Technology

  1. 1. Technology Review: BHL Institutional Council Mtg 22 Mar 2010
  2. 2. Stats
  3. 3. Now online
      • 40,000 titles
      • 76,000 volumes
      • 28.7 million pages
      • 70 million name strings
      • 58 million confirmed names
      • 1.4 million unique names
  4. 5. Size of BHL content *today*
  5. 6. Bigger than a breadbox, smaller than a sperm whale http://biodiversitylibrary.org/page/5225013
  6. 7. Usage
  7. 8. 1.1 million visits from 231 countries since launch
  8. 9. Referrers: 2008 - 2009
  9. 10. Referrers: Jan 1 – Mar 15, 2010
  10. 11. Stats unique to our tools
  11. 12. PDF Articlizing stats
  12. 13. # Items by Library
       Items   Institution Name
      11,476   University of California Libraries (archive.org)
      11,244   MBLWHOI Library
       9,537   Smithsonian Institution Libraries
       6,461   New York Botanical Garden
       5,129   Harvard University, MCZ, Ernst Mayr Library
       4,932   Gerstein - University of Toronto (archive.org)
       3,882   Natural History Museum, London
       3,350   Missouri Botanical Garden
       2,821   Library of Congress (archive.org)
       2,509   University of Illinois Urbana Champaign
       2,029   American Museum of Natural History Library
       1,996   NCSU Libraries (archive.org)
       1,692   UMass Amherst Libraries (archive.org)
       1,296   Webster Family Library of Veterinary Medicine (archive.org)
       1,216   Robarts - University of Toronto (archive.org)
       1,100   Canadiana.org (archive.org)
         621   Boston Public Library (archive.org)
         579   University of New Hampshire Library (archive.org)
         516   Montana State Library (archive.org)
         282   Prelinger Library (archive.org)
  13. 14. # Names by Library
           Names   Institution Name
      14,109,080   MBLWHOI Library
      12,241,186   Smithsonian Institution Libraries
       9,105,969   New York Botanical Garden
       7,860,553   Missouri Botanical Garden
       5,323,730   University of California Libraries (archive.org)
       4,818,365   Harvard University, MCZ, Ernst Mayr Library
       4,776,527   Gerstein - University of Toronto (archive.org)
       3,050,242   Natural History Museum, London
       2,387,731   American Museum of Natural History Library
       2,292,570   NCSU Libraries (archive.org)
       2,106,182   UMass Amherst Libraries (archive.org)
       1,836,281   University of Illinois Urbana Champaign
         532,635   Earth Sciences - University of Toronto (archive.org)
         518,695   Robarts - University of Toronto (archive.org)
         225,357   Canadiana.org (archive.org)
         177,283   Boston Public Library (archive.org)
          97,663   Library of Congress (archive.org)
          83,089   Prelinger Library (archive.org)
          75,113   University of Connecticut Libraries (archive.org)
          71,512   The Field Museum
  14. 15. “Taxonomic Density” by Library
      Simple: avg. # names / item
      Tax. Density        Names    Items   Institution Name
           2,346.4    7,860,553    3,350   Missouri Botanical Garden
           1,409.4    9,105,969    6,461   New York Botanical Garden
           1,283.5   12,241,186    9,537   Smithsonian Institution Libraries
           1,254.8   14,109,080   11,244   MBLWHOI Library
           1,244.8    2,106,182    1,692   UMass Amherst Libraries (archive.org)
           1,176.8    2,387,731    2,029   American Museum of Natural History Library
           1,148.6    2,292,570    1,996   NCSU Libraries (archive.org)
             968.5    4,776,527    4,932   Gerstein - University of Toronto (archive.org)
             939.4    4,818,365    5,129   Harvard University, MCZ, Ernst Mayr Library
             785.7    3,050,242    3,882   Natural History Museum, London
             731.9    1,836,281    2,509   University of Illinois Urbana Champaign
             463.9    5,323,730   11,476   University of California Libraries (archive.org)
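The “Taxonomic Density” figures follow directly from the slide's own definition, the average number of names per item. A minimal Python sketch of that calculation, using three rows copied from the table; the function name `taxonomic_density` is illustrative only and is not part of any BHL tool.

```python
def taxonomic_density(names: int, items: int) -> float:
    """Average number of name strings per item, rounded to one decimal place."""
    if items <= 0:
        raise ValueError("items must be positive")
    return round(names / items, 1)


# (names, items) pairs copied from the "Taxonomic Density" slide
rows = {
    "Missouri Botanical Garden": (7_860_553, 3_350),
    "MBLWHOI Library": (14_109_080, 11_244),
    "University of California Libraries (archive.org)": (5_323_730, 11_476),
}

densities = {name: taxonomic_density(n, i) for name, (n, i) in rows.items()}

# Print in descending order of density, as on the slide
for name, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{density:>8,.1f}  {name}")
```

The ranking differs sharply from the raw item counts: the University of California Libraries contribute the most items but have the lowest density of the libraries listed.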
  15. 16. Q. How many species have been reported only once? [Taxacom]
      • As of March 1, 2010, BHL had identified more than 70 million potential name strings across its 28 million digitized pages using uBio's TaxonFinder. 58 million of those name strings were confirmed as a name with a NameBankID. Of that set, 1,491,000 name strings were unique. 329,000 of those unique names were found on a single page in BHL.
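The counts quoted in the Taxacom answer can be turned into proportions with a few lines of Python. This is a rough sketch: "more than 70 million" is treated as exactly 70 million, so the confirmed share is an upper bound, and the variable names are illustrative.

```python
# Figures quoted on the slide (as of March 1, 2010)
candidate_strings = 70_000_000   # "more than 70 million"; assumed exactly 70M here
confirmed_strings = 58_000_000   # confirmed as a name with a NameBankID
unique_names = 1_491_000         # unique names among the confirmed strings
single_page_names = 329_000      # unique names found on a single page in BHL

confirmed_share = confirmed_strings / candidate_strings
single_page_share = single_page_names / unique_names

print(f"Confirmed: {confirmed_share:.1%} of candidate name strings")
print(f"Found on a single page: {single_page_share:.1%} of unique names")
```

Under these assumptions, at most about 83% of candidate strings were confirmed, and about 22% of unique names appear on only one page.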
  16. 17. Application / Portal
  17. 18. New since November
      • New color scheme
      • IA / CDL content
          • + names indexing
      • APIs
      • OAI interface
      • Work on Darwin’s Library annotations
      • Primary / Secondary titles enhancements
      • Started testing solutions for “orange bag problem”
      • Working with EOL on nomenclatural acts service
  18. 19. Consumers
      • EarthCape
      • BioGuid
      • BioSTOR
      • JSTOR – in discussion
      • Research projects
      • BREC - NSF
      • Conjecturator - NSF
      • Darwin’s Library – NEH/JISC
      • Hong Cui @ University of AZ - NSF
  19. 20. OCR correction using WikiSource http://biostor.org/wiki/Page:Spixiana1999zool.djvu/293
  20. 21. Partnership Statement
      • What, if anything, do we need as an agreement between parties for use of BHL materials?
          • Always open access; more a service agreement
      • Consider: What is the true value of the $50 we paid to scan BookX when it is inserted into other research?
  21. 22. Terms of Use / Privacy Policy
      • Need resolution to move forward on publishing APIs
  22. 23. Hardware / Infrastructure
  23. 24. WH cluster
      • Transferred 28,000 volumes from IA
          • 22TB
      • 44,000 more in the queue
          • Started Friday
      • Complete BHL + IA/CDL by May
      • Need to discuss implications with BHL-Europe
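The WH cluster figures allow a back-of-the-envelope storage estimate for the queued transfer. The sketch below assumes, without support from the slides, that the 44,000 queued volumes average the same size as the 28,000 already transferred, and it uses decimal units (1 TB = 1,000,000 MB).

```python
transferred_volumes = 28_000
transferred_tb = 22.0
queued_volumes = 44_000

# Assumption: queued volumes have the same average size as transferred ones.
tb_per_volume = transferred_tb / transferred_volumes
queued_tb = queued_volumes * tb_per_volume
total_tb = transferred_tb + queued_tb

print(f"Average size: {tb_per_volume * 1_000_000:.0f} MB per volume")
print(f"Estimated queue: {queued_tb:.1f} TB")
print(f"Estimated total: {total_tb:.1f} TB")
```

If the assumption holds, the queue adds roughly 35 TB, for a total of roughly 57 TB once all 72,000 volumes are transferred.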
  24. 25. Cluster ~$17,000 USD
  25. 26. DuraCloud
      • Pilot has added partners
          • BHL
          • NYPL
          • WGBH
          • More to come
      • 10TB of content uploaded
          • Good test set, not complete, not intended to be
      • Test download speeds with BHL-E & BHL-Au
      • June 30 deadline for uploading without $$
  26. 27. Global BHL
  27. 30. BHL-Europe
  28. 31. BHL-Europe
      • http://biodiversitylibrary.eu
      • Hiring WP2 leader
      • Moving bidlist to Vienna
      • Building infrastructure
      • Getting content
      • Submitting metadata to Europeana
  29. 32. BHL-China
  30. 33. BHL-China
      • http://bhl-china.org
      • Still working out issues for scanning
      • Plan to scan 48,000 books / year
          • 2 shifts
          • 10 Scribes
      • Excited about Global Tech meeting
          • Will come prepared with ideas for change
  31. 35. BHL-Australia
  32. 36. BHL-Australia
      • http://ec2-75-101-224-221.compute-1.amazonaws.com/
          • Took code & easily ran in EC2
      • Offered usability assistance
      • Planning workshop in Au in May
      • September 2010 relaunch of ALAu
      • Ready to go
  33. 37. BHL-Brasil
  34. 38. BHL-Brasil
      • SciELO content ready for import
          • Can automate ingest into CiteBank
  35. 39. CiteBank
  36. 40. Ingesting content from Publishers
      • Big publishers - auto ingest
          • Machine to machine
          • Set up, configure & go
      • Small publishers - need help
          • Niche content
          • Likely to provide some assistance, but will require it
      • Individual users – need help
          • Need a lot of individual attention
          • Big community & opportunity, but takes tending
      Publishing platform also important
  37. 41. Similar missions / Staffing issues
      • PubMed Central
      • PLoS
      • JSTOR
          • All with multiple staff to handle ingest, inquiries
  38. 42. CiteBank Possibilities
      • Need 2 years of developer work to make it bigger
      • Or…
      • Need 2 years of content assistance to make it better
          • fill data into existing structure
      • Biblio is a good start, but needs some tuning for biodiversity literature
  39. 43. TL3: GRIB
      • Taxonomic Literature 3: The Global Reference Index to Biodiversity
      • Critical, yet absent: comprehensive list of biodiversity literature, complete with all variants in spelling, known identifiers over time, bibliographic descriptions, and recommendations on how to cite each work.
      • *Big* job, but doable
          • Modeled on & worked in association with Taxonomic Literature 2
  40. 44. Fearing Magic Unicorn Syndrome
  41. 45. Reallocation
  42. 46. Posted to Twitter http://twitter.com/chrisfreeland/status/10575364681
      • Radical question: If #BHL could offer you more content or more services, which would you choose? "Both" not an option in this experiment.
  43. 47. “CONTENT!”
      • @chrisfreeland Given that I make my own services, content is what I want #bhl #allyourdataarebelongtome
      • @chrisfreeland at this point of time more people will benefit from more content than more services. unless we treat indexing as service