An invited talk at the American Museum of Natural History, given as part of the Richard Gilder Graduate School Program. New York, U.S.A. November 24, 2008.
Small pieces loosely joined: towards a unified theory of biodiversity for the web
1. Small pieces loosely joined
Towards a unified theory of
biodiversity for the web
Vincent S. Smith
2. Macro taxonomy
The big picture of taxonomic research
Goal…
• Inventory the Earth’s species
• Document their relationships
• “Publish” these data
Data set…
• 1.8 M described spp. (10M names)
• 300M pages (over last 250 years)
• 1.5-3B specimens
People…
• 4-6,000 scientists
• 30-40,000 “pro-amateurs”
• Many more citizen scientists?
3. Micro taxonomy
The practice of taxonomic research
Sociology…
• Parochial
• Specialized experts
• Fragmented & distributed
Methodology…
• Different (domain specific)
• Communities of practice
• Non transferable skills
Output…
• Heterogeneous & scattered
• High volume, low impact
• Hard to find (use)
How do we integrate micro &
macro taxonomy for the Web?
5. What is a Scratchpad?
A website for you & your community
1. Your data
2. Uploaded & tagged
3. Published & reviewed on your site
6. What is a Scratchpad?
A website for you & your community
1. Your data (Fast)
2. Uploaded & tagged (Intuitive)
3. Published & reviewed on your site (Fit for use)
7. What can Scratchpads do?
Import, manage, search & browse:
Specimens
DNA & Phylogenies
Literature
Images
8. What can Scratchpads do?
Integration & connectivity within & between sites
Specimens
DNA & Phylogenies
Taxonomy
Literature
Images
9. What can Scratchpads do?
In summary:
+Administration
-Change your site information
-Change your front page
-Change your logo
-Activity and access logs
+Backup
-Backing up your data
-Restoring your data
+Bibliography
-Creating a record
-Importing from a ref. manager
-Exporting to a reference manager
+Blog
-Creating and adding a blog
+Custom Content
-Defining a CCK
-Importing from a spreadsheet
-Creating a custom view
+Fileshare
-Creating and using a fileshare
+Forum
-Altering the forum settings
-Creating a container for a forum
-Creating a new forum
-Creating a new topic inside a forum
+Groups
-Creating a group
-Subscribing to a group
+Image
-Uploading & basic annotation
-Linking image & location records
-Linking image & specimen records
-Linking image & publication records
-Overlay annotations on images
+Layout
-Change your theme
-Menus
-Blocks and sidebars
+Locations
-Creating a record
-Importing from a spreadsheet
+Pages
-Creating, editing, cloning & deleting
-Configuring the panels template
+Panels
-Adding & configuring content
-Creating a new panel
-Citing a Panels page
+Phylogeny
-Adding a phylogenetic tree
+Specimens
-Creating a record
-Importing from a spreadsheet
-Linking specimen & location records
-Linking specimen & pub. records
+Tasks
-Creating a tasklist
+Taxonomy
-Importing from a spreadsheet
-Importing from ClassificationBank
-Starting from scratch
-Taxonomy manager
-Displaying a classification
-Adding names
-Deleting names
-Taxonomy & panels
+Users
-Your settings
-Adding a new user
-User roles and permissions
-Adding and editing user profile fields
-Logging in
+Webform
-Creating and using webforms
11. Current Scratchpads
Sites: 70+
Users: 850+
Pages: 130k
Since March 2007
Ants
Bees
Beetles
Big-headed flies
Birds
Blackflies
Ciliates
Cockroaches
Dragon Trees
Dung Beetles
False Buttonweed
Flat worms
Flies
Foraminifera
Fossil Insects
Fungus Gnats
Holometabola
Leaf-miner Flies
Lice
Lichens of Bermuda
Malvaceae
Megalastrum ferns
Milichiid flies
Mosquitoes
Mosses
Nannotax fossils
Nepticuloid moths
Palms
Pearl oysters
Polychaete worms
Scaleworms
Stick insects
Sulawesi Ferns
Termites
Triticid grasses
Weevils
Wood Ferns
12. Scratchpad visitors
Tracking visitors across sites
Key monthly statistics
- 50,000 page views
- 6,000 visitors
- 8 minutes on site
- 50% returning visits
(monthly average, 2008)
13. Scratchpad applications
A multipurpose, flexible technology
eBooks
4th Edition Howard & Moore, Birds of the world
(fact checking, data compilation, 2010, funding)
14. Scratchpad applications
A multipurpose, flexible technology
eJournals
European Mosquito Bulletin (ISSN 1460-6127), Phasmid Studies (ISSN 0966-0011)
(submission, review, & dissemination of articles)
15. Scratchpad applications
A multipurpose, flexible technology
Image galleries
Nanno fossils, Cockroaches, Stick insects, Flatworms, Grasses, Lichens & many more…
(rapid upload, annotation, & display of images)
16. How do Scratchpads work?
Getting a Scratchpad
Requirements
• Biological focus
• Agree to T&Cs (click-through)
• CC license “by-nc-sa”
Application
http://scratchpads.eu/apply
• Maintainer
• Scope/Mission/API Keys
• (Sub)domain name
Content
• Unrestricted (overlapping)
• No branding (focus on authors)
• Value added
17. How do Scratchpads work?
Using a Scratchpad
Management
• User categories (maintainer, editor, contributor)
• Public / private content (flexible groups)
• Admin. page (site settings & behavior)
Data Input
• Content types (biblio, maps, “page” etc)
• Forms, managers, Excel, EndNote etc
• Custom content (add or extend data types)
Tagging (indexing)
• Taxonomy terms (2M +)
• Multiple classifications
• Auto-tagging
18. Autotagging
Indexing data to make it findable
1. Create content (e.g. reference): journal citation mentions taxon name
2. Find terms (Autotag)
3. Submit (Index)
19. Autotagging
Indexing data to make it findable
1. Create content (e.g. reference)
2. Find terms (Autotag): matches taxonomy term (Drag & Drop)
3. Submit (Index)
20. Autotagging
Indexing data to make it findable
1. Create content (e.g. reference)
2. Find terms (Autotag)
3. Submit (Index): page tagged (indexed) with taxon name
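The three autotagging steps can be sketched in a few lines of Python. This is a minimal illustration, not the Scratchpads implementation: the `autotag` function, the tiny vocabulary, and the whole-word matching rule are all assumptions made for the example. The citation is the one used later in the talk.

```python
import re

# Hypothetical mini-vocabulary; a real site would use its own classification.
VOCABULARY = ["Naubates", "Phthiraptera", "Philopteridae", "Insecta"]

def autotag(text, vocabulary):
    """Return the vocabulary terms that occur in text as whole words."""
    found = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

# Step 1: create content (here, a bibliographic reference).
citation = ("Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus "
            "Naubates (Insecta: Phthiraptera: Philopteridae). "
            "J. R. Soc. N.Z. 32:7-60.")

# Step 2: find terms that match the taxonomy.
tags = autotag(citation, VOCABULARY)

# Step 3: submit, storing the tags with the content so it is indexed.
record = {"type": "biblio", "body": citation, "tags": tags}
print(record["tags"])
```

Matching against a controlled vocabulary, rather than guessing names from the text alone, is what lets the same tag connect content of different types.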
21. How do Scratchpads work?
Indexing data to make it findable
• Tagged data can be
presented differently
• For example as part of
a traditional bibliography
• Or as small windows
or “panels” of data
22. How do Scratchpads work?
Integrating data & “publishing” in a Scratchpad
Types of Scratchpad Panel…
Built with “tagged data”
• Personalized instructions
• Common names
• Bibliographic literature
• Taxonomic hierarchies
• Files and documents
• Photographs & illustrations
• Specimen records
• Customized content
• Phylogenetic trees
23. How do Scratchpads work?
Integrating data & “publishing” in a Scratchpad
Dynamically built species pages
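One way to picture a dynamically built species page is as a query over tagged content, grouped by content type into panels. The following is a minimal sketch, assuming content items are simple records with a `type`, a `title`, and a list of `tags`. The record layout, the sample items, and the `build_species_page` helper are illustrative only, not the Scratchpads data model.

```python
from collections import defaultdict

def build_species_page(taxon, items):
    """Group every item tagged with `taxon` into panels keyed by content type."""
    panels = defaultdict(list)
    for item in items:
        if taxon in item["tags"]:
            panels[item["type"]].append(item["title"])
    return dict(panels)

# Hypothetical content; titles and types are made up for illustration.
items = [
    {"type": "biblio", "title": "A revision of the genus Naubates", "tags": ["Naubates"]},
    {"type": "image", "title": "Habitus photograph", "tags": ["Naubates"]},
    {"type": "specimen", "title": "Specimen record 1", "tags": ["Philopteridae"]},
]

page = build_species_page("Naubates", items)
print(page)
```

Because the page is computed from tags at request time, newly tagged content appears on the relevant page without anyone editing that page by hand.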
24. How do Scratchpads work?
Integrating data & “publishing” in a Scratchpad
Browsed through a taxonomy
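Browsing through a taxonomy suggests that a page for a higher taxon can also surface content tagged with its descendants. A minimal sketch of that idea, assuming the classification is stored as a child-to-parent mapping. The `PARENT` table, the sample items, and both helper functions are hypothetical.

```python
# Hypothetical classification stored as child -> parent.
PARENT = {
    "Naubates": "Philopteridae",
    "Philopteridae": "Phthiraptera",
    "Phthiraptera": "Insecta",
}

def lineage(taxon):
    """Return the taxon followed by all of its ancestors, nearest first."""
    chain = [taxon]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def tagged_under(taxon, items):
    """Return titles of items tagged with `taxon` or any of its descendants."""
    return [item["title"] for item in items
            if any(taxon in lineage(tag) for tag in item["tags"])]

# Made-up content items for illustration.
items = [
    {"title": "A revision of the genus Naubates", "tags": ["Naubates"]},
    {"title": "Louse checklist", "tags": ["Phthiraptera"]},
]

print(lineage("Naubates"))
print(tagged_under("Phthiraptera", items))
```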
25. How do Scratchpads work?
Integrating data & “publishing” in a Scratchpad
Including 3rd party content
26. How do Scratchpads work?
Integrating data & “publishing” in a Scratchpad
With data curation tools
27. How do Scratchpads work?
Integrating data & “publishing” in a Scratchpad
Listing all “authors”
28. How do Scratchpads work?
Integrating data & “publishing” in a Scratchpad
Dated, permanent & citable
29. How do Scratchpads work?
Adjusting the panels layout
Choose which panels to display
30. How do Scratchpads work?
An example based on the Catalogue of Life classification
2 million taxon pages
Open curation at http://catlife.myspecies.info
33. A unified theory of biodiversity?
BHL, EOL and scholarly journals
Biodiversity Heritage Library
• Digitising heritage literature
Encyclopedia of Life
• A web page for every species
Scholarly Journals
• Traditional publishing
34. Biodiversity Heritage Library
“Digitizing biodiversity literature”
• Biodiversity publications since 1469
- 5.4 million books
- 800,000 monographs
- 40,000 periodicals
• Held by Natural History libraries
E.g., NHM holds more than 1M books, 250k
monographs & periodicals, 0.5M artworks
• BHL partnership of 10 Nat. Hist. libraries
• Sharing the digitisation of contents
• Focus on out of copyright materials
• Partnership with “Internet Archive”
• Make the contents “findable”
35. Biodiversity Heritage Library
“Digitizing biodiversity literature”
1. Scan (photograph)
2. Extract text (OCR)
3. Find keywords
- Taxonomic names
- Author names
- Citations
- Collection data
- Morphological data
- Descriptions
- Identification keys
- Illustrations
- Photographs
1 scribe machine, 3,500 pages per shift per day
34 scribe machines now in operation
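As a rough back-of-envelope check on these figures, the sketch below multiplies the per-machine rate by the number of machines and compares the result with the 300M pages of literature quoted on the macro taxonomy slide. It assumes one shift per machine per day and scanning on every day of the year, neither of which the slides state.

```python
PAGES_PER_MACHINE_PER_DAY = 3_500   # from the slide (one shift per day assumed)
MACHINES = 34                       # from the slide
TOTAL_PAGES = 300_000_000           # "300M pages" from the macro taxonomy slide

pages_per_day = PAGES_PER_MACHINE_PER_DAY * MACHINES
days_needed = TOTAL_PAGES / pages_per_day
years_needed = days_needed / 365    # assumes scanning every day of the year

print(f"{pages_per_day:,} pages per day")
print(f"about {years_needed:.1f} years to scan {TOTAL_PAGES:,} pages")
```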
36. Biodiversity Heritage Library
“Digitizing biodiversity literature”
1. Scan
2. Extract text (OCR)
3. Find keywords
- Taxonomic names
- Author names
- Citations
- Collection data
- Morphological data
- Descriptions
- Identification keys
- Illustrations
- Photographs
Example: Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.
37. Biodiversity Heritage Library
“Digitizing biodiversity literature”
1. Scan
2. Extract text (OCR)
3. Find keywords
- Taxonomic names
- Author names
- Citations
- Collection data
- Morphological data
- Descriptions
- Identification keys
- Illustrations
- Photographs
4. Index
5. Put on the web
6. 10M pp. to date
Example: Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.
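Steps 3 and 4 (find keywords, index) can be illustrated with a small inverted index that maps each taxonomic name to the pages mentioning it. This is a sketch under stated assumptions, not BHL's pipeline: the OCR text, the page identifiers, and the name list are made up, and real name finding in OCR output is much harder than exact string matching.

```python
from collections import defaultdict

# Hypothetical OCR output keyed by page identifier.
ocr_pages = {
    "page-7": "A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae).",
    "page-8": "Naubates is treated further on the following pages.",
}

# Hypothetical list of names to look for.
KNOWN_NAMES = ["Naubates", "Philopteridae"]

def build_index(pages, names):
    """Map each name to the sorted list of page ids whose text contains it."""
    index = defaultdict(list)
    for page_id in sorted(pages):
        for name in names:
            if name in pages[page_id]:
                index[name].append(page_id)
    return dict(index)

index = build_index(ocr_pages, KNOWN_NAMES)
print(index)
```

Once such an index exists, "put on the web" amounts to serving lookups by name, which is what makes scanned pages findable.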
38. Scratchpads and BHL
Creating a community built virtual taxonomic library
Not
Yes
Yet?
Scratchpads as a tool to add articles (and markup) to BHL?
39. Encyclopedia of Life
“A web page for every species”
• A web page for all 1.8M species
• $25m funding (5 years)
- MacArthur and Sloan Foundations
• Multiple audiences
- Science & outreach
• Megascience mashup
- Aggregating data from the web
• 10 years to complete
- First draft 2008, “finished” 2017!
• Struggling to find an identity?
- Competition, vetting, growth, credit
• A possible publishing platform?
- LifeDesks / Scratchpads
40. Journal Articles
Scholarly communication in taxonomy & systematics
• Fragmented
• Mostly commercial
• Data poor
• Fixed audience
- Hard to repurpose
• Possible role for EoL?
- Web publishing platform (cf Wikipedia)
• Zootaxa
- 15% n. spp; 50 spp. a week!
• Scratchpads / EoL / Zootaxa
- MS Word Template (markup)
- Simultaneous publication
Biodiversity Journals
41. Summary
“Small pieces loosely joined”
1. Bringing data together
Biodiversity studies are data rich, poorly archived & ever changing
2. Bringing people together
Biodiversity researchers are few in number, fragmented & highly distributed
3. Bringing science together
Biodiversity science demands a different approach to addressing BIG questions
BIG IS DIFFERENT
New opportunities & new challenges!
42. Thanks…
Simon Rycroft Dave Roberts Kehan Harman
Ben Scott Edward Baker Irina Brake Vladimir Blagoderov