Implementing persistent and globally unique identifiers for specimens held in natural history collections worldwide will open up new opportunities for referring to these physical resources in an interlinked digital context such as the Internet. Here, we describe the approach for persistent identification of collection specimens developed and implemented at the Natural History Museum in Oslo (NHM-UiO) by the Norwegian participant node to the Global Biodiversity Information Facility (GBIF-Norway). The Norwegian university museums are invited to use our resolver service at "http://purl.org/gbifnorway/id/<uuid>" when publishing biodiversity data to GBIF. All occurrence records published through GBIF-Norway, with appropriate PURL-UUID identifiers mapped to the Darwin Core occurrenceID, will automatically be added to our resolver service and kept updated.
4. What is an identifier:
“Each identifier refers to one and only one thing” (Coyle 2006).
“An association between a string and a thing” (Kunze 2003).
“A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context” (Campbell 2007).
6. When is the identifier “good enough”?
Unique and persistent - within a given context.
“The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context” (Coyle 2006).
Expanding context:
• Within one museum collection (catalogue number).
• Within a network between museum collections (collection code + catalogue number).
• Within a biodiversity information network (institution code + collection/dataset code + catalogue number).
• On the Internet (e.g. HTTP URI, DOI, LSID, etc.)
• … larger contexts are possible to imagine in the future!
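The expanding-context idea can be illustrated with a small sketch. The three records and their codes below are hypothetical, invented only to show how a catalogue number that is unique inside one collection collides once the context grows, and how adding the collection and institution codes restores uniqueness.

```python
from collections import Counter

# Hypothetical records: (institution code, collection code, catalogue number).
records = [
    ("NHM-A", "Birds", "1001"),
    ("NHM-A", "Fish", "1001"),
    ("NHM-B", "Birds", "1001"),
]


def duplicates(keys):
    """Return the keys that occur more than once, in sorted order."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


# Context 1: catalogue number alone (unique only inside one collection).
by_number = duplicates(cat for _, _, cat in records)

# Context 2: collection code + catalogue number (a network of collections).
by_collection = duplicates((col, cat) for _, col, cat in records)

# Context 3: institution code + collection code + catalogue number.
by_institution = duplicates(records)
```

In this toy data the bare catalogue number and the collection-qualified key both collide, while the full triplet does not. A still larger context could of course break the triplet as well, which is the argument for globally unique identifiers.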
8. Identify the thing that you care about
• The specimen itself (the physical entity)
• Image of the specimen
• Description of the specimen
• Location where the specimen was captured
• The occurrence event when the specimen was captured
• …
10. Term name: occurrenceID
Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID
Class: http://rs.tdwg.org/dwc/terms/Occurrence
Definition: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.
Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]". Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732". For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
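The fallback described in the term's comment can be sketched as follows. This is a minimal illustration, not part of Darwin Core: the function name `occurrence_id`, the use of a plain dict for a record, and the rule that the codes may not contain a colon are all assumptions made for the example.

```python
def occurrence_id(record):
    """Return an occurrenceID for a record given as a plain dict.

    An existing globally unique identifier is preferred. Otherwise the
    "urn:catalog:" form from the Darwin Core comment is constructed.
    """
    existing = record.get("occurrenceID")
    if existing:
        return existing
    keys = ("institutionCode", "collectionCode", "catalogNumber")
    parts = [record[k] for k in keys]
    # Assumption for this sketch: empty parts or parts containing ':'
    # would make the URN ambiguous, so they are rejected.
    if any(not p or ":" in p for p in parts):
        raise ValueError("codes must be non-empty and must not contain ':'")
    return "urn:catalog:" + ":".join(parts)


fallback = occurrence_id(
    {"institutionCode": "FMNH", "collectionCode": "Mammal", "catalogNumber": "145732"}
)
kept = occurrence_id({"occurrenceID": "urn:lsid:nhm.ku.edu:Herps:32"})
```

With the values from the slide, `fallback` reproduces the example "urn:catalog:FMNH:Mammal:145732", and a record that already carries an identifier keeps it unchanged.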
11. Identifiers for museum collections
The longevity of museums leads to: “The need to use identifiers from our past in the current highly-networked digital systems” (Coyle 2006 [talking about libraries]).
Specify a namespace for the identifiers?
• URI – uniform resource identifier (unique in the context of the web).
• URN – uniform resource name (name not tied to location).
• URL – uniform resource locator (network location as identifier).
• PURL – persistent URL (commitment to service longevity).
Something else…?
• DOI – digital object identifier
• ARK – archival resource key
• UUID – universally unique identifier
15.
• Globally unique
• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Solution for the generation of new IDs
  – Central generation, PID issuer
  – Distributed generation at source
16.
• A UUID is a 16-octet (128-bit) number, written as a 36-character string.
• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
• The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs.
• Allows for easy generation at source in a distributed network.
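The properties listed above can be checked with Python's standard `uuid` module. The sketch assumes random (version 4) UUIDs, which carry 122 random bits, and uses the usual birthday-bound approximation n ≈ sqrt(2 · ln 2 · 2^122) for the number of UUIDs at which the chance of at least one duplicate reaches about 50%. The variable names are illustrative only.

```python
import math
import uuid

# Generate a random (version 4) UUID locally, with no central issuer.
pid = uuid.uuid4()
octets = len(pid.bytes)      # size of the raw value in octets
chars = len(str(pid))        # 32 hex digits plus 4 hyphens

# The example from the slide parses as a version 4 UUID.
example = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")
example_version = example.version

# Birthday bound: number of random UUIDs needed for a ~50% chance
# of at least one duplicate, given 122 random bits per UUID.
random_bits = 122
n_half = math.sqrt(2 * math.log(2) * 2 ** random_bits)
```

Under these assumptions `n_half` comes out at roughly 2.7 × 10^18 UUIDs in total. Divided over a world population of several billion, that is a few hundred million UUIDs per person, the same order of magnitude as the figure quoted on the slide.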
17. [Diagram labels: Identifier, Resolver, Specimen, Location]
The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes.
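A toy in-memory resolver shows the principle. The class name `Resolver`, its methods and the two example.org URLs are hypothetical and do not describe the NHM-UiO service; the point is only that the identifier stays fixed while the registered location is updated.

```python
class Resolver:
    """Toy in-memory resolver: identifier -> current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        """Record (or update) where the identified thing can be found."""
        self._locations[identifier] = location

    def resolve(self, identifier):
        """Return the current location, or None if the identifier is unknown."""
        return self._locations.get(identifier)


resolver = Resolver()
pid = "c37e3f9b-bcaf-4479-8eb7-3346a2db2373"  # example UUID from the slides

resolver.register(pid, "http://old.example.org/specimen/42")  # hypothetical URL
first = resolver.resolve(pid)

# The record moves: only the resolver entry changes, the identifier does not.
resolver.register(pid, "http://new.example.org/objects/42")  # hypothetical URL
second = resolver.resolve(pid)
```

Anyone who stored `pid` before the move still reaches the record afterwards, because they ask the resolver rather than remembering the location.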
18. PURL technology provides a robust resolution service ready for the future - and a stable solution that is working well right now.
PURL for the NHM-resolver: http://purl.org/nhmuio/id/[PID]
The NHM-PURL redirects here: http://gbif.no/resolver/[PID]
With few modifications it could redirect elsewhere, e.g. here: http://gbif.org/resolver/[PID]
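The redirect can be pictured as a prefix rewrite. The sketch below only manipulates strings and makes no network request. The two prefixes are taken from the slide, while the function name `redirect_target` and the assumption that the PID is appended unchanged to the target prefix are illustrative.

```python
PURL_PREFIX = "http://purl.org/nhmuio/id/"   # from the slide
TARGET_PREFIX = "http://gbif.no/resolver/"   # from the slide


def redirect_target(purl, target_prefix=TARGET_PREFIX):
    """Map a PURL to the URL it would redirect to.

    Assumes a simple prefix rewrite: whatever follows the PURL prefix
    (the PID) is appended unchanged to the target prefix.
    """
    if not purl.startswith(PURL_PREFIX):
        raise ValueError("not an NHM PURL: " + purl)
    pid = purl[len(PURL_PREFIX):]
    return target_prefix + pid


purl = PURL_PREFIX + "c37e3f9b-bcaf-4479-8eb7-3346a2db2373"
current = redirect_target(purl)
moved = redirect_target(purl, target_prefix="http://gbif.org/resolver/")
```

The published PURL never changes. Pointing it at a different resolver is a change to the target prefix only, which is what makes the move to another host a small modification.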
24.
• Quick Response Code (QR code).
• A type of matrix barcode (or two-dimensional code).
• Popular due to its fast readability and large storage capacity.
• The use of QR Codes is free of any license.
• The QR Code is clearly defined and published as an ISO standard.
• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
26. UUID QR codes for museum objects at NHM-UiO provide:
• Machine-readable identifiers (readable with a simple smartphone or a barcode reader).
• New and efficient workflows for collection management.
• Deployment of stable identifiers appropriate for databasing.
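On the reading side of such a workflow, a scanned label has to be turned back into an identifier. The sketch below assumes that the QR payload is either a bare UUID or a URL whose last path segment is a UUID. Both payload shapes and the function name `uuid_from_scan` are assumptions, not a description of the labels used at NHM-UiO.

```python
import uuid


def uuid_from_scan(payload):
    """Extract the UUID from a scanned label payload.

    Accepts a bare UUID or a URL ending in a UUID (assumed payload shapes).
    Returns the canonical lowercase form; uuid.UUID raises ValueError
    for anything that is not a well-formed UUID.
    """
    candidate = payload.strip().rstrip("/").rsplit("/", 1)[-1]
    return str(uuid.UUID(candidate))


from_url = uuid_from_scan(
    "http://purl.org/nhmuio/id/C37E3F9B-BCAF-4479-8EB7-3346A2DB2373"
)
from_bare = uuid_from_scan("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373\n")
```

Normalising to one canonical form before a database lookup means that labels printed in upper case and records stored in lower case still match.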
29.
• Peer review option for biodiversity data sets.
• Authors get scientific credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Towards: each data set published through GBIF accompanied by a data paper…?
30. Why publish your data
• Citable publication
• Establish scientific priority
• Increase collaboration
• Link data to bigger network
• Re-use and multiply effect
• Respond to funding requirements
http://biodiversitydatajournal.com/
Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. DOI: 10.3897/BDJ.1.e995
32. Status 27 August 2014
GBIF enables free and open access to biodiversity data online. We are an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.
33. GBIF provides a data discovery system that is dependent on resolvable stable identifiers for efficient functionality
• Global registry
• Data portal
35. Slide 1: Image source: TU GRAZ, Austria, http://campusonline.tugraz.at/organisation/campusonline. Fair use rationale: The image is used to illustrate the principle of stable and persistent identifiers forming the glue to connect data objects.
Slide 3: George: George Orwell, George Harrison, George Bush, George Bush jr, George Soros, George Washington, Boy George, George (Seinfeld), George Lucas, George Clooney, Prince George of Cambridge, King George III of England, George Armstrong Custer, Georges Enescu, Curious George, St George in New Brunswick, George Coleman, George Eliot. Fair use rationale: Images of people and places named George from an Internet search. These images are used here to illustrate the weakness of using a human-friendly identifier/name: in the global society context, many people and places are named George, leading to a name ambiguity problem. We cannot know which George is being referred to.
Slide 5: Photo: Sancya/AP. Published: 03/31/2009 3:58:00, http://www.nydailynews.com/news/money/pile-unsold-cars-graveyards-gallery-1.45144 Fair use rationale: The image is used to illustrate the principle of uniqueness of identifiers within a given context - such as here car license number plates. The car license number is unlikely to be globally unique in a larger context such as e.g. the Internet.
Slide 6: Illustration retrieved from http://www.hypnosisinmelbourne.com.au/index.php?p=49. Fair use rationale: The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.
Slide 7: Fair use rationale: The image is of unknown source, retrieved from an Internet search. The image is used to illustrate the principle of expanding context that stable identifiers can be subject to. An identifier used in a particular context, such as the Internet, could be exposed to a larger context at a later time.
Slide 14: Image: This is Cape Canaveral (M. Sasek, 1963), http://blog.miroslavsasek.com/wp-content/uploads/2009/05/moon-birdwatchers-400.jpg by Miroslav Šašek (1916-1980), http://www.miroslavsasek.com/, http://www.ilike.org.uk/2009/05/this_is_m_sasek.html. Fair use rationale: The image is used here to illustrate the principle of aiming at naming an observed organism re-using common existing persistent identifiers.
Slide 23: Photo: J.Schulzki. Fair use rationale: The image is used to illustrate the principle of machine-readable labels. The handling of luggage in an airport context (or the handling of parcels and letters in a postal service context) could serve as an inspiration for developing robotized handling of museum specimens - if these specimens are given machine-readable labels.
Slide 34: Image: Gary Larson, The Far Side Observer, October 1987, http://i227.photobucket.com/albums/dd202/tomcat600/gary-larson-oct-1987.gif. Fair use rationale: This drawing is assumed to be copyrighted by Gary Larson and used here under a fair use claim. The image is used to illustrate the principle of naming all things using persistent identifiers.
The images are used for an educational, not-for-profit, non-commercial purpose.