The "names and taxa" information space is often thought of as being composed of three layers:
Taxonomic concepts
Code governed nomenclatural acts
Name occurrences
In many circumstances the distinction between these layers is blurred, leading to confusion and inefficiencies in information management. To date, IPNI has been mainly concerned with the middle layer, comprising ICBN-governed nomenclatural acts. It is formed of three key components: curated data, information services to expose this data, and dedicated editorial staff to provide nomenclatural expertise.
IPNI will be advanced from its current state to better connect to the layers above (taxonomic concepts) and below (name occurrences). This will require the expansion of data holdings, improved linkages, and the development of information services and associated workflows. These will be offered to key actors including name authors, publishers, taxonomists and managers of biodiversity information.
Advancing the International Plant Names Index (IPNI)
1. Advancing the International Plant Names Index (IPNI) Nicky Nicolson, Alan Paton, Jim Croft, James Macklin, Paul Morris, Greg Whitbread, Kanchi Gandhi
28. Where IPNI data are placed. Any name occurrence: e.g. specimens, reports, literature citation. Concepts. Standard form of name.
31. Links to the Concept Layer Example The Plant List
Editor's Notes
Electronic publication. LSID. Plant List example.
Sample IPNI record
Standardised author
Standardised publication title and collation
Distribution
Type specimen information
Links to other IPNI records
Code annotation (on linked record)
Full record history
Resolvable persistent identifier (LSID), returns structured data in a standard format.
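As a minimal sketch of what handling such an identifier might look like, the snippet below splits an LSID into its parts, assuming the general LSID URN layout `urn:lsid:authority:namespace:object[:revision]`. The function name `parse_lsid` and the sample identifier are illustrative assumptions, not a real IPNI record or API.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # e.g. the issuing organisation's domain
    namespace: str           # e.g. the kind of record (names, authors, ...)
    object_id: str           # identifier of the record within the namespace
    revision: Optional[str]  # optional version of the record


def parse_lsid(text: str) -> Lsid:
    """Split an LSID URN into its components (hypothetical helper)."""
    parts = text.strip().split(":")
    # Expect: urn, lsid, authority, namespace, object, and optionally revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Illustrative identifier only; the object id is made up.
example = parse_lsid("urn:lsid:ipni.org:names:12345-1")
print(example.authority, example.namespace, example.object_id)
```

Resolving the identifier to structured data is a separate step that depends on the resolution service in use, so it is left out here.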
Could mention issues with this here: names aren't entered until the hard copy arrives at the K / HUH library, and we estimate a 2-year time lag between publication date and entry into IPNI. Stats are derived from 2004 onwards. IK editors' discussion: could do some analysis on this with a sandwich student.
Orange: authors (88.6%). Green: publication titles.
Author standardisation: a 1% rise requires the creation of over 25,000 links.
Checking is intensive: the non-standard, unlinked abbreviations are often ambiguous. For example, the un-standardised string 'Henr.' was found to be:
Henrickson in this string: (Henr.) S.L.Welsh & Crompton
Henrard in this string: (Henr.) Clayton
These are shown as the number of epithets modified per month. July 2010 is when we did a big OCR fix.
Screenshot showing propagation of errors on next slide
OCR error translated mc -> rne; it dates from the IK digitisation. The old version persists in many datasets that have been derived from IPNI. Linking (via persistent identifier, as described in a later slide) would ensure that derived datasets benefit from this kind of curation.
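A toy illustration of the point about linking, under assumed data: one derived dataset copies the name string, the other stores only the identifier. The registry, the identifier and the specimen keys are all hypothetical, and the misspelt name is used purely as a stand-in for an OCR-style error.

```python
# Hypothetical central registry keyed by persistent identifier.
RECORD_ID = "urn:lsid:example.org:names:1"  # made-up identifier
registry = {RECORD_ID: "Plectranthus macrophylius"}  # contains an error

# Derived dataset A copies the string at the time of download.
copied_dataset = {"specimen-1": registry[RECORD_ID]}

# Derived dataset B stores only the identifier and resolves it on use.
linked_dataset = {"specimen-1": RECORD_ID}


def name_from_linked(specimen: str) -> str:
    """Look up the current name string via the stored identifier."""
    return registry[linked_dataset[specimen]]


# An editor later corrects the record in the registry.
registry[RECORD_ID] = "Plectranthus macrophyllus"

print(copied_dataset["specimen-1"])    # still carries the old spelling
print(name_from_linked("specimen-1"))  # picks up the correction
```

The copied string stays wrong until someone re-imports it, whereas the linked dataset benefits from the correction without any extra work.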
Can mention the GTI work here
The stats page on the site at http://www.ipni.org/stats.html contains these tables from 2004 onwards. This is the data for the most recent full year (2010).
BUT the response to user queries has very little visibility - point to point email, only visible to participants, even though the issues discussed may be of wider relevance
NN/AP: Perhaps we should add the average number of searches per day to the stats page.
Division of labour between nomenclature and taxonomy: IPNI handles citation of name, reference and authorship, and objective links such as combination – basionym. Checklists handle taxonomic synonymy and references supporting the assertion of concepts. Referencing datasets benefit from ongoing curation of IPNI data.
This string translation is of higher value than a purely lexical approach because an editor has checked it. Edit distance: the number of single-character edits (insertions, deletions or substitutions) required to modify one string into another.
NN 2011-07-14: here is another example which might better explain what I am on about: Plectranthus macrophylius -> Plectranthus macrophyllus (i -> l) and Plectranthus microphyllus -> Plectranthus macrophyllus (i -> a) have the same edit distance (1 character), BUT the former is high value, checked by an editor, while the latter is programmatically derived and a much more dangerous assumption to make.
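To make the contrast concrete, here is a rough sketch that computes Levenshtein edit distance and labels where each suggested correction comes from. The `EDITOR_CHECKED` table, the `resolve` function and the provenance labels are assumptions for illustration, not how IPNI actually stores or serves corrections.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical table of corrections that an editor has checked.
EDITOR_CHECKED = {"Plectranthus macrophylius": "Plectranthus macrophyllus"}


def resolve(name, known_names):
    """Return (suggested name, provenance) for a name string."""
    if name in EDITOR_CHECKED:
        return EDITOR_CHECKED[name], "editor-checked"
    if name in known_names:
        return name, "exact"
    # Purely lexical fallback: accept only a single candidate at distance 1.
    near = [k for k in known_names if edit_distance(name, k) == 1]
    if len(near) == 1:
        return near[0], "lexical-guess"
    return None, "unresolved"


known = {"Plectranthus macrophyllus"}  # assumed reference list
print(resolve("Plectranthus macrophylius", known))
print(resolve("Plectranthus microphyllus", known))
```

Both inputs are one edit away from the same target, but only the first is backed by an editor's check; keeping the provenance label lets a consumer treat the lexical guess with more caution.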
Data structure is 10 years old and needs re-engineering to deal with requirements. NN: Moved the data structure point to the notes, as the crux is not just the data structure but the idea that we have a single data structure. Technically I'd like us to split between a top-copy version for editing and multiple (dumber, flatter) slaves to service API calls etc.; these can be hit as hard as we like without impacting the editors' workflow. Faceting: different routes to the data.