Presentation of the Global Biodiversity Information Facility (GBIF) and GBIF Norway for the Department of Technical and Scientific Conservation (CONSERV) at the Natural History Museum, University of Oslo. Tøyen, Oslo, 7 November 2012.
Global Biodiversity Information Facility (GBIF) - 2012
1. Section meeting:
Department of Technical and Scientific Conservation (CONSERV)
Global Biodiversity Information Facility
GBIF Norway
Dag Endresen and Christian Svindseth
GBIF Norway, NHM-UiO
Natural History Museum, University of Oslo (NHM-UiO)
Global Biodiversity Information Facility (GBIF)
7 November 2012
2. Topics
• What is GBIF?
• GBIF data portal
• Darwin Core (DwC), DwC archive
• Persistent identifiers (UUID)
• Data paper, citation of data sets
3. GBIF enables free and open access to
biodiversity data online.
We are an international government-initiated
and funded initiative focused on making
biodiversity data available to all and anyone,
for scientific research, conservation and
sustainable development.
Status data portal
October 2012
4. OECD Global Science Forum recommendation (1999):
“[E]stablish and support a distributed system of interlinked and interoperable modules (databases, software and networking tools, search engines, analytical algorithms, etc.) that together will form a Global Biodiversity Information Facility (GBIF)”.
5. 1. Information infrastructure – an
Internet-based index of a globally
distributed network of
interoperable databases that
contain primary biodiversity data.
2. Community-developed tools,
standards and protocols – the
tools data providers need to format
and share their data.
3. Capacity-building and training –
and access to a global expert
community.
7. GBIF portal: 16,064,074 records with coordinates from a total of 17,268,452 records.
GBIF Norway: 11,777,738 records are provided FROM Norwegian data publishers.
10. GBIF’s unique role
• Registry of biodiversity data resources
• Tools and support for biodiversity data publication
• Network development at national, regional and global levels
• Global virtual natural history collection
• Cross-domain linkage between data from collections, ecology and genomics
• Access to biodiversity data for GIS analysis and environmental monitoring
  – Aggregated presence data
  – Site-based survey data (samples, presence/absence)
Slide developed by Donald Hobern, 2012
11. Improving fitness-for-use
• Progressive improvement
  – Data indexes
    • Centralised discovery
    • Standardisation of persistent identifiers
    • Consistent metadata
  – Data quality
    • Inconsistencies within records
    • Validation against metadata
    • Outlier detection
    • Metrics per record and per data set
  – Expert curation
    • Interface with taxon expert groups
    • Incorporate findings of data users
    • Need efficient researcher-friendly tools
Diagram labels: Aggregate, Data Indexes, Data Quality, Expert Curation
Slide developed by Donald Hobern, 2012
12. Organisational partnerships
• Some potential data collaborations
  – GBIF-mediated occurrence data
    • Maps, lists of countries recorded
    • Localise content in EOL, etc.
  – BHL literature
    • User annotations to extract occurrence records
    • Link original (and other) descriptions to taxonomy
  – EOL species information
    • Support EOL as global species information aggregator
    • Include EOL summary box on each GBIF species page
  – Catalogue of Life
    • IPT to publish global and regional species databases
    • GBIF infrastructure to support construction of CoL
Slide developed by Donald Hobern, 2012
13. Unifying species data
Diagram labels: Collections, Ecological, Genomics, Monitoring
Darwin Core
Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record
Presence only
Slide developed by Donald Hobern, 2012
14. Unifying species data
Diagram labels: Collections, Ecological, Genomics, Monitoring
Darwin Core
Integrated access for records of the occurrence of any species:
• What?
• When?
• Where?
• What evidence?
• Data owner?
• Link to full record
Presence only
Darwin Core + Core Survey Fields (Sample Id, Method Id, Relative abundance, ...)
Fully compatible with existing Darwin Core data, plus:
• Which species were recorded together?
• Which sets of data are directly comparable?
• Which species were most abundant in each sample?
Presence/absence
Slide developed by Donald Hobern, 2012
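The questions on the two slides above can be read as fields of an occurrence record. The sketch below is only an illustration of that idea: `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `basisOfRecord` and `institutionCode` are Darwin Core term names, while `sampleId` and `relativeAbundance` are hypothetical stand-ins for the "core survey fields" named on the slide. All record values are invented.

```python
from collections import defaultdict

# Invented example records. sampleId and relativeAbundance are hypothetical
# names for the survey fields; the other keys are Darwin Core terms.
records = [
    {"scientificName": "Betula pubescens",   # What?
     "eventDate": "2012-06-14",              # When?
     "decimalLatitude": 59.92,               # Where?
     "decimalLongitude": 10.77,
     "basisOfRecord": "HumanObservation",    # What evidence?
     "institutionCode": "EXAMPLE",           # Data owner?
     "sampleId": "plot-1", "relativeAbundance": 0.7},
    {"scientificName": "Picea abies", "eventDate": "2012-06-14",
     "decimalLatitude": 59.92, "decimalLongitude": 10.77,
     "basisOfRecord": "HumanObservation", "institutionCode": "EXAMPLE",
     "sampleId": "plot-1", "relativeAbundance": 0.3},
    {"scientificName": "Pinus sylvestris", "eventDate": "2012-06-15",
     "decimalLatitude": 60.01, "decimalLongitude": 10.65,
     "basisOfRecord": "HumanObservation", "institutionCode": "EXAMPLE",
     "sampleId": "plot-2", "relativeAbundance": 1.0},
]


def species_by_sample(rows):
    """Which species were recorded together? Group names by sample."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["sampleId"]].add(row["scientificName"])
    return dict(grouped)


def most_abundant(rows):
    """Which species was most abundant in each sample?"""
    best = {}
    for row in rows:
        current = best.get(row["sampleId"])
        if current is None or row["relativeAbundance"] > current["relativeAbundance"]:
            best[row["sampleId"]] = row
    return {sample: row["scientificName"] for sample, row in best.items()}
```

With presence-only data just the first six keys would be available; the two grouping functions only become possible once records carry a shared sample identifier.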
15. Darwin Core – a vocabulary of terms
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and
Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard.
PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
17. Semantic MediaWiki: a forum for discussion and development of terminology.
http://terms.gbif.org/
18. Darwin Core Archive (DwC-A)
• A DwC-A publishes DwC records, including terms from DwC-A extensions.
• Simple text-based format.
• Zipped single-file archive.
Germplasm.txt
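To make the "zipped, simple text-based" idea concrete, here is a minimal sketch that builds a tiny archive in memory and reads it back. It assumes the usual DwC-A layout of a `meta.xml` descriptor plus a tab-delimited core file; the file name `occurrence.txt`, the two records and the helper names `write_archive` and `read_core` are made up for illustration, and extension files are left out.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}

# Descriptor telling a reader which column holds which Darwin Core term.
META_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="{DWC}Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="{DWC}scientificName"/>
    <field index="2" term="{DWC}country"/>
  </core>
</archive>
"""

# Invented example data, one record per line.
OCCURRENCE_TXT = (
    "id\tscientificName\tcountry\n"
    "1\tBetula pubescens\tNO\n"
    "2\tPicea abies\tNO\n"
)


def write_archive():
    """Return the bytes of a zip file holding the descriptor and the data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", META_XML)
        archive.writestr("occurrence.txt", OCCURRENCE_TXT)
    return buffer.getvalue()


def read_core(archive_bytes):
    """Read the core file, naming columns after the terms in meta.xml."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        core = ET.fromstring(archive.read("meta.xml")).find("dwca:core", NS)
        location = core.find("dwca:files/dwca:location", NS).text
        columns = {int(core.find("dwca:id", NS).get("index")): "id"}
        for field in core.findall("dwca:field", NS):
            columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]
        text = archive.read(location).decode(core.get("encoding"))
    # Simplification: the delimiter is hard-coded as a tab rather than
    # parsed from the fieldsTerminatedBy attribute.
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    rows = rows[int(core.get("ignoreHeaderLines")):]
    return [{columns[i]: value for i, value in enumerate(row)} for row in rows]


archive_records = read_core(write_archive())
```

A real archive would normally also carry a metadata document and may add extension files, such as the `Germplasm.txt` shown on the slide, each described by its own block in `meta.xml`.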
20. Controlled value vocabularies
• Country codes
• Language
• Basis of record
• Taxonomic rank
• Nomenclatural status
• Life form
• Life stage
• Geological time periods
  • chronostratigraphy
  • magnetostratigraphy
• Species interactions
  • saproxylic interactions
  • pollinators
• …
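One way controlled vocabularies are typically put to work is as a validation step before data are published. The sketch below is an assumption about how that could look, not a description of any GBIF tool: `countryCode`, `basisOfRecord` and `taxonRank` are Darwin Core term names, but the value sets are small illustrative subsets and the `validate` helper is hypothetical.

```python
# Illustrative subsets only; the real vocabularies are much longer.
VOCABULARIES = {
    "countryCode": {"NO", "SE", "DK", "FI"},  # ISO 3166-1 alpha-2 codes
    "basisOfRecord": {
        "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
        "HumanObservation", "MachineObservation",
    },
    "taxonRank": {"kingdom", "family", "genus", "species", "subspecies"},
}


def validate(record):
    """Return (term, value) pairs whose value is not in the vocabulary.

    Terms missing from the record are skipped, not reported.
    """
    problems = []
    for term, allowed in VOCABULARIES.items():
        value = record.get(term)
        if value is not None and value not in allowed:
            problems.append((term, value))
    return problems


# A free-text country name fails the check; the code "NO" would pass.
issues = validate({"countryCode": "Norway", "taxonRank": "species"})
```

Flagging values instead of silently rewriting them leaves the decision about the correct term with the data publisher.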
21. • Persistent identifiers (UUID, QR code)
• Data set metadata descriptions (data paper)
• Data rescue, scientific reports and student work
• Continue digitization efforts
• Biodiversity literature (BHL)
23. • Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Generation of IDs
  – Central generation, PID issuer
  – Distributed generation at source
24. • A UUID is a 16-octet (128-bit) number.
• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
• The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
• Allows for easy generation at source in a distributed network.
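Generation at source needs nothing more than a standard library call. The sketch below uses Python's `uuid` module to create a random (version 4) UUID and to parse the example from the slide. It also applies the usual birthday approximation to the duplicate estimate, assuming 122 random bits per version 4 UUID; the helper names are made up for illustration.

```python
import math
import uuid

# Generate a random (version 4) UUID locally, with no central issuer.
new_id = uuid.uuid4()

# Parse the example from the slide; parsing is case-insensitive.
example = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")
example_urn = example.urn  # URN form, handy as a record identifier


def collision_probability(n, random_bits=122):
    """Birthday approximation: chance of at least one duplicate among n IDs."""
    return 1.0 - math.exp(-n * n / (2.0 * 2.0 ** random_bits))


def ids_for_probability(p, random_bits=122):
    """Approximate number of IDs needed to reach collision probability p."""
    return math.sqrt(2.0 * 2.0 ** random_bits * math.log(1.0 / (1.0 - p)))


# Total number of version 4 UUIDs at which a duplicate becomes a coin flip.
half_chance = ids_for_probability(0.5)
```

Under these assumptions the 50% point lies at roughly 2.7 × 10^18 identifiers in total, the same order of magnitude as the slide's figure of hundreds of millions of UUIDs per person; the exact per-person number depends on the population assumed.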
25. • Quick Response Code (QR code).
• A type of matrix barcode (or two-dimensional code).
• Popular due to its fast readability and large storage capacity.
• The use of QR Codes is free of any license.
• The QR Code is clearly defined and published as an ISO standard.
• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
26. UUID: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373
QR code for all museum objects at
NHM-UiO would provide:
• Machine-readable using an
ordinary smart phone (or PDA).
• Allows for new and efficient
workflows for collection
management.
• Deployment of stable identifiers suitable for use in databases.
27. • Peer review option for biodiversity data.
• Authors get credit for data publication.
• Meeting concerns over data quality.
• Meeting concerns over data citation mechanism.
• Metadata formats: Ecological Metadata Language (EML), Dublin Core, Darwin Core, Natural Collections Descriptions (NCD)…
• Towards: each data set published through GBIF accompanied by a data paper…?
29. Data rescue activity:
Many species occurrence data are
“hidden” in reports and
documents produced by
universities, research institutes,
public agencies and the university
museums.
Collaboration project with Artsdatabanken
Photo by: Niklas Bildhauer
30. 270 years of literature
- since Carl Linnaeus and his Systema Naturae (1735)
And a potential source of biodiversity data
Biodiversity Heritage Library
a consortium of natural history and botanical libraries
http://www.biodiversitylibrary.org/
→ BHL Norway…?
31. A book scanner at the Internet Archive headquarters in San Francisco, California
Photo by: Dvortygirl
32. The Millennium Ecosystem Assessment showed that human actions
often lead to irreversible losses in the diversity of life, and these losses
have been more rapid in the past 50 years than ever before in human
history.
Biological diversity is key to resilience – the ability of natural and social
systems to adapt to change, and is essential for nearly every aspect of
human well-being.
Because human threats to biodiversity occur across large spatial and
temporal scales, biodiversity and ecosystem monitoring, forecasting,
and risk assessments require data to be organised in a globally-
accessible, integrated infrastructure.
GBIF’s Data Portal provides this infrastructure.
34. Furthermore, I
think that we
need persistent
identifiers!
Cato the Elder ended all his speeches in
the senate of Rome with: "Ceterum
autem censeo Carthaginem esse
delendam" (English: "Furthermore, I
think Carthage must be destroyed").