View a video recording here: https://vimeo.com/195024485
Franz & Sterner @ #TDWG16 - "A new power balance is needed for trustworthy biodiversity data". Talk # 1134, Friday, December 09, 2016, 11:30 am. Session Contributed Papers 05: Data Gaps, Trust, Knowledge Acquisition. See https://mbgserv18.mobot.org/ocs/index.php/tdwg/tdwg2016/schedConf/program
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity data
1. A new power balance is needed
for trustworthy biodiversity data
Please
@taxonbytes
Nico Franz1 & Beckett W. Sterner1
With contributions by Edward Gilbert1, Andrew Johnston1,
Guanyang Zhang1, Bertram Ludäscher2 & Alan Weakley3
1 School of Life Sciences, Arizona State University
2 iSchool, University of Illinois at Urbana-Champaign
3 Herbarium, University of North Carolina at Chapel Hill
TDWG 2016 – Biodiversity Information Standards
December 09, 2016 – Instituto Tecnológico de Costa Rica (#TDWG16)
@ http://www.slideshare.net/taxonbytes/franz-sterner-tdwg-2016-new-power-balance-needed-for-trustworthy-biodiversity-data
2. Largely derived from doi:10.3897/rio.2.e10610
3. Premise: We agree that there are significant data quality issues
Aggregated Australian millipede data 'taken to the cleaners'
4. Premise: We agree that there are significant data quality issues
Aggregated Australian millipede data 'taken to the cleaners'
Aggregators respond to the charges
5. Premise: We agree that there are significant data quality issues
Aggregated Australian millipede data 'taken to the cleaners'
Aggregators respond to the charges
But this leaves open the question(s):
Who (exactly) is responsible for
how much of each particular issue?
6. We seem to disagree on the question of responsibility assignment(s)
Source: Belbin et al. 2013. A specialist's audit […]: An 'aggregator's' perspective. doi:10.3897/zookeys.305.5438, p. 73
7. Often enough, aggregators respond by:
• Acknowledging the general issues and their relevance.
• Pointing to many issues that effectively reside "with the sources".
• Calling for more collaboration across all levels; as well as new tools and
annotation options that "motivate and empower" the research community.
Source: Belbin et al. 2013. A specialist's audit […]: An 'aggregator's' perspective. doi:10.3897/zookeys.305.5438, p. 74
8. Thesis: For taxonomy integration, this is both wrong and self-defeating
• Many aggregators are designed to impose a single taxonomic hierarchy –
one at a time – onto all taxonomically annotated records.
9. Thesis: For taxonomy integration, this is both wrong and self-defeating
• Many aggregators are designed to impose a single taxonomic hierarchy –
one at a time – onto all taxonomically annotated records.
• By design, these "backbones" are rarely attributable to individual (expert)
authors, but instead are newly created systematic theories that only appear
at the system level.
10. Thesis: For taxonomy integration, this is both wrong and self-defeating
• Many aggregators are designed to impose a single taxonomic hierarchy –
one at a time – onto all taxonomically annotated records.
• By design, these "backbones" are rarely attributable to individual (expert)
authors, but instead are newly created systematic theories that only appear
at the system level.
• Data are aggregated accordingly; yet backbone-driven modifications may
newly disrupt the original integrity of submitted data packages.
11. Thesis: For taxonomy integration, this is both wrong and self-defeating
• Many aggregators are designed to impose a single taxonomic hierarchy –
one at a time – onto all taxonomically annotated records.
• By design, these "backbones" are rarely attributable to individual (expert)
authors, but instead are newly created systematic theories that only appear
at the system level.
• Data are aggregated accordingly; yet backbone-driven modifications may
newly disrupt the original integrity of submitted data packages.
• By deflecting on responsibilities, aggregators may cause additional self-harm.
Ultimately, the power balance – as presently built in – must shift to bring
experts back into the process of licensing succinct, trustworthy data packages.
13. Taxonomic views of a frequently revised organismal lineage
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")
14. Snapshot of a more frequently revised organismal lineage
• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")
• Vertical sections identify taxonomic concept regions
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
15. Snapshot of a more frequently revised organismal lineage
• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")
• Vertical sections identify taxonomic concept regions
• Colors identify lineages of taxonomic names (epithets) in use
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
16. Snapshot of a more frequently revised organismal lineage
• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids)
• Vertical sections identify taxonomic concept regions
• Colors identify lineages of taxonomic names (epithets) in use
• There is no consensus! Five incongruent schemata are used concurrently
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
17. Further diagnosis:
If incongruent taxonomies are endorsed
– locally, provisionally, and democratically –
then what is the impact for
aggregated biodiversity data?
18. Further diagnosis:
Taxonomy becomes a variable
that we need to represent,
and thereby control for
(at the system level)
19. The 'consensus'
• Query: "Where do these orchid
species occur?"
• Same set of 250 orchid specimens,
according to 4 taxonomies.
"Controlling the taxonomic variable" Example: the Cleistes use case
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
20. The 'consensus' The 'bible'
"Controlling the taxonomic variable"
• Query: "Where do these orchid
species occur?"
• Same set of 250 orchid specimens,
according to 4 taxonomies.
Example: the Cleistes use case
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
21. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
"Controlling the taxonomic variable"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
22. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
The 'best', latest
regional flora
"Controlling the taxonomic variable"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
23. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
The 'best', latest
regional flora
"Controlling the taxonomic variable"
Expert views
are in conflict
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
24. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
The 'best', latest
regional flora
"Controlling the taxonomic variable"
Expert views
are in conflict
"Just bad"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
25. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
The 'best', latest
regional flora
Impact:
Name-based aggregation has created
a novel synthesis that nobody believes in
"Controlling the taxonomic variable"
"Just bad"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
26. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
The 'best', latest
regional flora
"Controlling the taxonomic variable"
"Just
bad"
Expert views
are in conflict
Solution:
Instead of aggregating
an artificial 'consensus',
…
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
27. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
The 'best', latest
regional flora
"Controlling the taxonomic variable"
"Just
bad"
Expert views
are reconciled
Solution:
Instead of aggregating
an artificial 'consensus',
build translation services
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
28. Challenges:
How can we redesign aggregation to yield
high-quality biodiversity data packages?
29. Challenges:
How can we redesign aggregation to yield
high-quality biodiversity data packages?
What does this mean for Darwin Core1
and how we use this aggregation standard?
1 Wieczorek et al. 2012. Darwin Core: an evolving […]. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
30. Preview of solution with eight steps
• DwC is insufficient, and part of the problem
31. # 1: Represent only taxonomic concept labels (TCLs) 1
• Syntax (TCL): taxonomic name [author, year, page] sec. source
1 Multi-taxonomy input/alignment visualizations generated with Euler/X toolkit: https://github.com/EulerProject/EulerX
Cleistes divaricata
sec. Gregg & Catling 1993
Pogonia
sec. Brown & Wunderlin 1997
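A minimal sketch of how the "name sec. source" syntax above could be held as data, written in Python. The `TCL` class and the `parse_tcl` helper are hypothetical names introduced here for illustration; they are not part of Darwin Core or the Euler/X toolkit, and the optional [author, year, page] part is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCL:
    """A taxonomic concept label: a name as used according to (sec.) a source."""
    name: str
    source: str

    def __str__(self) -> str:
        return f"{self.name} sec. {self.source}"


def parse_tcl(label: str) -> TCL:
    """Split 'name sec. source' into its two parts; reject bare names."""
    name, sep, source = label.partition(" sec. ")
    if not sep or not name.strip() or not source.strip():
        raise ValueError(f"not a taxonomic concept label: {label!r}")
    return TCL(name.strip(), source.strip())


# The two labels shown on the slide.
a = parse_tcl("Cleistes divaricata sec. Gregg & Catling 1993")
b = parse_tcl("Pogonia sec. Brown & Wunderlin 1997")
print(a.name, "|", a.source)
print(b)
```

Rejecting a bare name such as "Cleistes divaricata" is the point of the design: a name without a source does not say whose circumscription is meant.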
32. # 1: DwC score keeping: TCLs are optional; < 1% realized?
• TCL ~ DwC: nameAccordingTo
• SCAN: 19,722 of nearly 9 million records have TCLs (0.2%)
• Because TCL use is not enforced, the standard is less big-data-ready
"Who authors GBIF's Backbone?"
https://storify.com/taxonbytes/who-authors-gbif-s-backbone
33. # 2: Represent each source coherently (Parent-Child relationships)
• Syntax (PC): TCL1 is a child/parent of TCL2 [where TCL1/2 = same source]
Cleistesiopsis bifaria sec. Pans. & de Barr. 2008
is a child of
Cleistesiopsis sec. Pans. & de Barr. 2008
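The same-source constraint on parent-child relationships can be sketched as follows. This is an illustration in Python under stated assumptions: `TCL`, `add_child`, and `lineage` are hypothetical names, and a plain dictionary stands in for whatever store a real system would use.

```python
from typing import Dict, List, NamedTuple


class TCL(NamedTuple):
    name: str
    source: str


def add_child(parents: Dict[TCL, TCL], child: TCL, parent: TCL) -> None:
    """Record 'child is a child of parent', but only within one source."""
    if child.source != parent.source:
        raise ValueError("parent-child relationships must stay within one source")
    parents[child] = parent


def lineage(parents: Dict[TCL, TCL], tcl: TCL) -> List[TCL]:
    """Return the chain of parents above a TCL, nearest parent first."""
    chain = []
    while tcl in parents:
        tcl = parents[tcl]
        chain.append(tcl)
    return chain


# The example from the slide: both TCLs share the same source.
source = "Pans. & de Barr. 2008"
genus = TCL("Cleistesiopsis", source)
species = TCL("Cleistesiopsis bifaria", source)

parents: Dict[TCL, TCL] = {}
add_child(parents, species, genus)
print(lineage(parents, species))
```

Because every node in the hierarchy is itself a TCL, each source's classification stays internally coherent and separate from the others.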
34. # 2: DwC score keeping: Not (adequately) represented
• PC ~ DwC: genus, family, order (etc.; higherClassification)
• However, higher-level names in DwC are not modeled as TCLs
• Taxonomic coherence of sources cannot be preserved with DwC alone
DwC record with higherClassification
(BDJ)
35. # 3: Do not force a single hierarchy onto all tip-level TCLs
• Syntax (PC): Tip-level TCL1 , TCL2 , etc. [where TCL1/2 = different sources]
36. # 3: DwC score keeping: Optional; not (ever?) practiced
• No PC ~ DwC: infra-/specificEpithet only
• Typically, a single, 'unitary' higher-level classification is represented
• Combinations of algorithmic and social practices achieve the single hierarchy
"Who authors GBIF's Backbone?"
https://storify.com/taxonbytes/who-authors-gbif-s-backbone
37. # 4: Link TCLs via expert-provided RCC–5 articulations
• Syntax (RCC–5): TCL1 {==, >, <, ><, !} TCL2 [where TCL1/2 = diff. sources]
• RCC–5 = Region Connection Calculus
• 14 articulations provided by: http://tinyurl.com/Weakley-Flora-2015
Cleistes bifaria "Coastal Populations" sec. Smith et al. 2004
== (is congruent with)
Cleistesiopsis oricamporum sec. Brown & Pans. 2009
==
38. Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
Region Connection Calculus (semantics: set constraints)
== < > >< !
• Two regions N, M are either:
• congruent (N == M)
• properly inclusive (N < M)
• inversely properly inclusive (N > M)
• overlapping (N >< M)
• exclusive of each other (N ! M)
• RCC–5 articulations answer the query: "can we join regions N and M?"
• Taxonomies have multiple RCC–5 alignable components: nodes (parents,
children), node-associated traits, even node-anchoring specimens
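Under the set-constraint reading above, the five relations can be sketched directly with Python sets. This is only an illustration: it assumes the full membership of each region is known (for example, as sets of specimen identifiers), whereas expert-provided articulations are typically asserted without enumerating members. The function name `rcc5` and the integer members are hypothetical.

```python
def rcc5(n: set, m: set) -> str:
    """Return the RCC-5 relation between two non-empty regions N and M."""
    if not n or not m:
        raise ValueError("regions must be non-empty")
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # N is properly included in M
    if n > m:
        return ">"    # N properly includes M
    if n & m:
        return "><"   # overlapping: shared and unshared members on both sides
    return "!"        # exclusive of each other


print(rcc5({1, 2}, {1, 2}))     # ==
print(rcc5({1}, {1, 2}))        # <
print(rcc5({1, 2, 3}, {1, 2}))  # >
print(rcc5({1, 2}, {2, 3}))     # ><
print(rcc5({1, 2}, {3, 4}))     # !
```

Exactly one of the five relations holds for any pair of non-empty regions, which is what makes the vocabulary suitable for answering "can we join regions N and M?".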
40. Oscillating meanings of the epithet hyalites – 1911 to 2003
Phenotypic diversity
Type-anchored name identity relations
Source: Vane-Wright. 2003. Indifferent philosophy versus […]. Syst. Biodiv. 1: 3–11. doi:10.1017/S1477200003001063
41. # 5: Identify occurrence records only to TCLs
Records:
EKY39235
MTSU003611
NCSC00040204
…
Records:
BOON8098
CLEMS0061133
WILLI39399
…
Records:
GMUF-0039355
IBE006808
USCH58399
…
Records:
CONV0006268
MDKY00006482
NCU00038930
…
Records:
BRYV0023582, BRYV0023584
KHD00032030, MISS0016604
MMNS000227, NCSC00040206
USMS_000002923, USMS_000002924
VSC0053223, VSC0065528
…
Records:
ARIZ393087
DBG39049
USCH51217
…
Records:
NCU00040710
USCH96248
VSC0053218
…
Records:
CLEMS0012881
FUGR0003293
GA023130
…
Records:
BOON8100
NCSC00040210
SJNM45487
…
Records:
GA023144
LSU00012494
MISS0016608
…
Records:
IBE006810, IND-0012374, MMNS000227
Records:
NY8654
• Syntax (ID): Occurrence / organism is identified to TCL
"CLEMS0012881"
is identified to
Cleistes divaricata sec. Smith et al. 2004
[additional ID metadata]
42. # 5: DwC score keeping: ID metadata optional; > 50% realized
DwC record with Identification metadata (BDJ)
• ID ~ DwC: Identification, (date)identified(By), identificationReference
• SCAN: 4,715,277 of nearly 9 million records have ID metadata (52.5%)
• Enforcement…still also require use of TCLs
43. # 6: Generate comprehensive, consistent RCC–5 alignments
• Euler/X is a toolkit that infers logically consistent RCC–5 alignments
44. # 6: Generate comprehensive, consistent RCC–5 alignments
• Value added: MIR, the set of Maximally Informative Relations containing
the RCC–5 articulation for every possible TCL pair (scalability)
Reasoner inference
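To show what a set of Maximally Informative Relations contains, here is a toy sketch in Python. It is not how Euler/X works: the toolkit infers the relations by logic reasoning from input articulations and each source's constraints, while this sketch simply assumes the members of every concept are known. The labels (`A sec. T1`, `B sec. T2`, and so on), the integer members, and the function names are all hypothetical.

```python
from itertools import product
from typing import Dict, Set, Tuple


def rcc5(n: Set[int], m: Set[int]) -> str:
    """RCC-5 relation between two non-empty sets."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def mir(t1: Dict[str, Set[int]], t2: Dict[str, Set[int]]) -> Dict[Tuple[str, str], str]:
    """One relation for every pair of concepts drawn from two taxonomies."""
    return {(a, b): rcc5(t1[a], t2[b]) for a, b in product(t1, t2)}


# Two toy taxonomies over the same four specimens, split differently.
t1 = {"A sec. T1": {1, 2, 3, 4}, "A1 sec. T1": {1, 2}, "A2 sec. T1": {3, 4}}
t2 = {"B sec. T2": {1, 2, 3, 4}, "B1 sec. T2": {1}, "B2 sec. T2": {2, 3, 4}}

relations = mir(t1, t2)
for (a, b), rel in relations.items():
    print(f"{a} {rel} {b}")
```

With 3 concepts on each side the result has 9 entries; the pair count grows as the product of the two taxonomy sizes, which is why the slide raises scalability.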
46. The 'consensus' The 'bible'
The (formerly)
federal 'standard'
The 'best', latest
regional flora
"Controlling the taxonomic variable"
Impact:
"Please select your preference (A – D);
we can perform all translations"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
47. # 8: "Do you trust us now?" Aggregation as a translational service
• We can now respond to queries such as:
• "Show all specimens identified to the taxonomic name Cleistes divaricata"
• Returns many records; resolves an incongruent lineage of name usages
48. # 8: "Do you trust us now?" Aggregation as a translational service
• We can now respond to queries such as:
• "Show all specimens identified to the taxonomic name Cleistes divaricata"
• Returns many records; resolves an incongruent lineage of name usages
• "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015"
• Returns record subset resolving only one narrowly circumscribed concept
49. # 8: "Do you trust us now?" Aggregation as a translational service
• We can now respond to queries such as:
• "Show all specimens identified to the taxonomic name Cleistes divaricata"
• Returns many records; resolves an incongruent lineage of name usages
• "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015"
• Returns record subset resolving only one narrowly circumscribed concept
• "Now show specimens identified to the TCL Cleistes divaricata sec. RAB 1968,
yet translated into the more granular TCLs sec. Weakley 2015"
• Returns (again) many records, yet represents and contrasts two treatments,
as opposed to providing the ambiguous lineage view (above)
• "Show all specimens with ambiguous 2010/2015 TCL identifications…" (etc.)
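The difference between the first three query types can be sketched in Python as below. Everything in the data is an assumption made for illustration: the record identifiers (`rec-1` to `rec-3`) are invented, and the single articulation (the RAB 1968 concept properly including the Weakley 2015 concept) is assumed here rather than taken from the published alignment. The function names are hypothetical.

```python
from typing import List, Tuple

TCL = Tuple[str, str]  # (taxonomic name, source)

# Hypothetical occurrence records, each identified to a TCL.
IDENTIFICATIONS: List[Tuple[str, str, str]] = [
    ("rec-1", "Cleistes divaricata", "RAB 1968"),
    ("rec-2", "Cleistesiopsis divaricata", "Weakley 2015"),
    ("rec-3", "Cleistes divaricata", "Smith et al. 2004"),
]

# Assumed articulation, for illustration only: (TCL 1, RCC-5 relation, TCL 2).
ARTICULATIONS: List[Tuple[TCL, str, TCL]] = [
    (("Cleistes divaricata", "RAB 1968"), ">",
     ("Cleistesiopsis divaricata", "Weakley 2015")),
]


def by_name(name: str) -> List[str]:
    """Name-based query: ignores the source, so concepts are mixed."""
    return sorted(r for r, n, _ in IDENTIFICATIONS if n == name)


def by_tcl(name: str, source: str) -> List[str]:
    """Concept-based query: only records identified to exactly this TCL."""
    return sorted(r for r, n, s in IDENTIFICATIONS if (n, s) == (name, source))


def translate(name: str, source: str, target_source: str) -> List[Tuple[str, TCL]]:
    """List (relation, target TCL) pairs from one TCL into another source."""
    return [(rel, t) for s, rel, t in ARTICULATIONS
            if s == (name, source) and t[1] == target_source]


print(by_name("Cleistes divaricata"))
print(by_tcl("Cleistesiopsis divaricata", "Weakley 2015"))
print(translate("Cleistes divaricata", "RAB 1968", "Weakley 2015"))
```

The name-based query returns records from two different treatments, while the translation returns the relation alongside the target concept, so a ">" result tells the user that a record identified to the broader concept cannot be assigned to the narrower one without re-examination.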
50. Conclusion – designing trusted biodiversity data services
• The Darwin Core standard for aggregating biodiversity data:
(1) Has under-utilized options for better representing taxonomic expertise
(2) Is part of a design paradigm that undermines the plurality of expertise
51. Conclusion – designing trusted biodiversity data services
• The Darwin Core standard for aggregating biodiversity data:
(1) Has under-utilized options for better representing taxonomic expertise
(2) Is part of a design paradigm that undermines the plurality of expertise
• Solutions are in development that realize data aggregation via translational
services – not as disenfranchising "backbones" – and without disrupting the
formation of expert-licensed, high-quality biodiversity data packages
52. Conclusion – designing trusted biodiversity data services
• The Darwin Core standard for aggregating biodiversity data:
(1) Has under-utilized options for better representing taxonomic expertise
(2) Is part of a design paradigm that undermines the plurality of expertise
• Solutions are in development that realize data aggregation via translational
services – not as disenfranchising "backbones" – and without disrupting the
formation of expert-licensed, high-quality biodiversity data packages
• All of us – not just aggregators – "own" the responsibility of designing
systems where the plurality of taxonomic expertise is fairly accommodated
53. Acknowledgments & links to products
• Cleistes use case: Alan Weakley (UNC)
• Euler/X toolkit: Shizhuo Yu (UC Davis)
• Other data issues, discussions: Andrew Johnston, Guanyang Zhang
• NSF DEB–1155984, DBI–1342595 (PI Franz)
• NSF IIS–118088, DBI–1147273 (PI Ludäscher)
• Euler/X code @ https://github.com/EulerProject/EulerX
• Franz et al. 2016. Two influential primate classifications logically aligned.
Systematic Biology 65(4): 561–582.
The more one looks, the more complicated it gets. Notice also the node labeling, or lack thereof.
The simple semantics of RCC-5 makes this a rather generic vocabulary for representing advancement in phylogenetic knowledge. At the same time, the onus is on the phylogeneticists to apply the articulations in such ways that the desired query services are actually obtained.