Presentation on the Darwin Core germplasm extension for the "1st International e-Conference on Germplasm Data Interoperability: Session 2", 11th December 2013 (https://sites.google.com/site/germplasminteroperability/). Publishing germplasm information on plant genetic resources and their traits using the Darwin Core standard and the germplasm extension for genebanks.
2. Why did we make a germplasm extension for Darwin Core?
→ Upgrade germplasm data pathways to use web services
The objective was to enable sharing of germplasm information using the standard web-service based biodiversity data publishing toolkits maintained by the Global Biodiversity Information Facility (GBIF) and the Biodiversity Information Standards (TDWG).
→ Upgrade data types to include trait data
The objective was to expand on the germplasm data types published to germplasm data portal from basic passport data to include in particular crop trait information.
3. Potential of the GBIF technology
2,106,765 records of germplasm data (status 2013)
http://data.gbif.org/datasets/network/2
http://www.gbif.org/network/ae3a42e4-5829-4210-8d8a-84b0cbda47bc
Using GBIF/TDWG technology (and contributing to its development), the PGR community can more easily establish specific PGR networks without duplicating GBIF's work.
The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community (TDWG, GBIF).
4. Multiple-purpose data export services
[Diagram labels: Genebank dataset; European Crop Databases; European EURISCO Catalog; GBIF; Global Crop Registries]
5. genesys-pgr.org
2,348,549 records of germplasm accessions
The GENESYS gateway to genetic resources provides access to information on more than 2.3 million genebank accessions, http://www.genesys-pgr.org/
6. 1,074,136 records of germplasm accessions
The European Genetic Resources Search Catalogue (EURISCO) receives data from the National Inventories (NI) and provides access to all ex situ PGR accessions in Europe, http://eurisco.ecpgr.org
7. (10 databases) (8 databases) (10 databases) (6 databases) (8 databases) (22 databases)
A total of 64 ECPGR Central Crop Databases have been established by individual institutes and the ECPGR Working Groups. The databases hold passport data and, to varying degrees, characterization and primary evaluation data of the major collections of the respective crops in Europe, http://www.ecpgr.cgiar.org/germplasm_databases/central_crop_databases.html
8. Possible Upgraded PGR Network Model
• The National Inventory (NI) endorses all national gene banks for EURISCO.
• ECPGR Crop databases can access passport data from EURISCO and additional crop specific data from the gene bank IPT interface.
• Each dataset is shared from the holding gene bank.
• Standard data sharing tools ensure that the genebank dataset is available to other relevant decentralized thematic, regional or global networks.
Illustration from the GBIF annual report 2009, page 47.
14. Mapping of MCPD → ABCD v2.06 was required before using BioCASE
National Inventory Code
Institute Code
Accession Number
Collecting Number
Collecting Institute Code
Genus
Species
Species Authority
„Subtaxa“
„Subtaxa“ Authority
Common Crop Name
Accession Name
Acquisition Date
Country of Origin
Location of Collection Site
Latitude of CS
Longitude of CS
Elevation of CS
Collecting Date of Sample
Breeding Institute Code
Biological Status of Accession
Ancestral Data
Collecting/Acquisition Source
Donor Institute Code
Donor Accession Number
Other Identification (Number) associated with the accession
Location of Safety Duplicates
Type of Germplasm Storage
Remarks
Decoded Collecting Institute
Decoded Breeding Institute
Decoded Donor Institute
Decoded Safety Duplication Location
Accession URL
Color key for the descriptor list: green is a good match, orange an acceptable match, red no match (was included as PGR extension in ABCD v2.06).
Helmut Knüpffer, IPK Gatersleben
Walter Berendsohn, BGBM, Berlin
Berendsohn, W. and H. Knüpffer (2004-2006). Draft mapping of Eurisco descriptors to ABCD 2.06. Available at http://www.bgbm.org/tdwg/codata/Schema/Mappings/EURISCO-2-ABCD.pdf
15. 2005: BioCASE demo
Genebank/germplasm extension to the ABCD v2.06
17. Mapping of MCPD → Darwin Core was required before using the GBIF IPT
The Darwin Core germplasm extension was required for meaningful description of germplasm data sets using Darwin Core and the GBIF IPT.
A mapping of MCPD terms to Darwin Core. Plus some additional terms to describe germplasm:
• breeding/cultivation event (source: MCPD),
• crop trait experiments (source: EPGRIS3/ECPGR),
• and international crop treaty regulations.
The first DRAFT version was released in August 2009.
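A minimal sketch of what such a term mapping looks like in practice. The MCPD column names and the Darwin Core term names are real, but the pairing shown here is an illustrative assumption and not the published mapping; the accession values are hypothetical. Columns without a Darwin Core counterpart (here the biological status, SAMPSTAT) are the ones that call for extension terms.

```python
# Illustrative pairing of MCPD columns with Darwin Core terms.
# ASSUMPTION: this table is a sketch, not the official MCPD-to-DwC mapping.
MCPD_TO_DWC = {
    "INSTCODE": "institutionCode",
    "ACCENUMB": "catalogNumber",
    "GENUS": "genus",
    "SPECIES": "specificEpithet",
    "SPAUTHOR": "scientificNameAuthorship",
}


def mcpd_to_dwc(record):
    """Rename mapped MCPD columns and collect the ones with no mapping."""
    dwc = {}
    unmapped = {}
    for column, value in record.items():
        if column in MCPD_TO_DWC:
            dwc[MCPD_TO_DWC[column]] = value
        else:
            unmapped[column] = value
    return dwc, unmapped


# Hypothetical accession record (values invented for the example).
accession = {
    "INSTCODE": "XXX001",
    "ACCENUMB": "ACC 1234",
    "GENUS": "Hordeum",
    "SPECIES": "vulgare",
    "SAMPSTAT": "300",  # biological status: needs a germplasm-specific term
}

dwc_record, leftover = mcpd_to_dwc(accession)
```

Whatever ends up in `leftover` is a candidate for a term in the germplasm extension rather than in Darwin Core itself.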
19. Darwin Core
“The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information.”
• a well-defined standard core vocabulary
• a flexible framework to maximize re-usability
• approved as TDWG standard 2009
http://rs.tdwg.org/dwc/
Wieczorek J., D. Bloom, R. Guralnick, S. Blum, M. Döring, R. Giovanni, T. Robertson, D. Vieglais (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
20. Darwin Core star schema
Can relate elements one-to-one or one-to-many.
[Diagram labels: Germplasm; Breeder; Trait; Audubon core; relations marked 1:many (four times) and 1:1]
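The star schema can be sketched as a core table whose rows are referenced by any number of extension rows through a shared identifier. The example below is a minimal illustration under assumptions: the field names `id`, `coreid`, `trait` and `value`, and all record values, are hypothetical and chosen only to show the one-to-many relation.

```python
from collections import defaultdict

# Core records: one row per accession (hypothetical values).
core_rows = [
    {"id": "acc-1", "scientificName": "Hordeum vulgare"},
    {"id": "acc-2", "scientificName": "Triticum aestivum"},
]

# Extension records: zero or more rows per core record, linked by "coreid".
trait_rows = [
    {"coreid": "acc-1", "trait": "plant height", "value": "95"},
    {"coreid": "acc-1", "trait": "days to heading", "value": "62"},
]


def attach_extension(core, extension, name):
    """Return copies of the core rows with their extension rows nested under `name`."""
    by_core = defaultdict(list)
    for row in extension:
        # Drop the link column; it is implied by the nesting.
        by_core[row["coreid"]].append(
            {key: value for key, value in row.items() if key != "coreid"}
        )
    return [dict(row, **{name: by_core.get(row["id"], [])}) for row in core]


joined = attach_extension(core_rows, trait_rows, "traits")
```

A core record without extension rows simply gets an empty list, which is what makes the one-to-many relation optional.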
21. Darwin Core Archive (DwC-A)
• DwC-A publishes Darwin Core records including extensions
• Simple text based format
• Zipped single file archive
Germplasm.txt
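As a rough sketch of the archive layout: a DwC-A is a zip file holding delimited text files plus a `meta.xml` descriptor that maps each column to a term URI and links extension rows to core rows. The example below builds a tiny archive in memory. The file names, the record values, and the germplasm row type and term names (`HypotheticalRowType`, `hypotheticalTerm`) are assumptions made for illustration; only the namespaces are taken from the standard and from this presentation.

```python
import io
import zipfile

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
GERMPLASM_NS = "http://purl.org/germplasm/germplasmTerm#"

# Descriptor: one <core> file, one <extension> file linked through <coreid>.
# ASSUMPTION: the extension rowType and term name are placeholders.
META_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="{DWC_NS}Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="{DWC_NS}catalogNumber"/>
    <field index="2" term="{DWC_NS}scientificName"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
             ignoreHeaderLines="1" rowType="{GERMPLASM_NS}HypotheticalRowType">
    <files><location>germplasm.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="{GERMPLASM_NS}hypotheticalTerm"/>
  </extension>
</archive>
"""


def tsv(rows):
    """Serialize rows as tab-delimited text with newline line endings."""
    return "".join("\t".join(row) + "\n" for row in rows)


def build_archive():
    """Return the bytes of a minimal zipped archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", META_XML)
        archive.writestr("occurrence.txt", tsv([
            ("id", "catalogNumber", "scientificName"),
            ("acc-1", "ACC 1234", "Hordeum vulgare"),  # hypothetical record
        ]))
        archive.writestr("germplasm.txt", tsv([
            ("coreid", "hypotheticalTerm"),
            ("acc-1", "example value"),
        ]))
    return buffer.getvalue()


archive_bytes = build_archive()
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = sorted(archive.namelist())
```

A real archive would normally also carry a dataset metadata document; it is left out here to keep the sketch short.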
22. Darwin Core extension for genebanks
The Darwin Core extension for genebanks is an extension to the Darwin Core standard. It provides a mapping of MCPD terms and Darwin Core terms, and it includes additional terms required for describing germplasm resources that were missing in Darwin Core.
• Endresen, D., S. Gaiji, and T. Robertson (2009). Darwin Core Germplasm extension and deployment in the GBIF infrastructure. Proceedings of TDWG 2009, Montpellier, France. Biodiversity Information Standards (TDWG).
• Endresen, D.T.F. and H. Knüpffer (2012). The Darwin Core extension for genebanks opens up new opportunities for sharing genebank data sets. Biodiversity Informatics 8:11-29.
23. Darwin Core extension for genebanks
Namespace (SKOS/RDF) (stable version)
http://purl.org/germplasm/germplasmTerm#
Code repository (stable version)
http://code.google.com/p/darwincore-germplasm
Community discussion (development version)
http://terms.tdwg.org/wiki/Germplasm
37. Some proposed additions
In situ conservation (proposed)
IUCNCategory, numberOfSeeds, bioRegion, inSituCountry,
inSituRecoveryDateStarted, inSituRecoveryInstitute,
inSituRecoveryRemarks
Germplasm distribution
Perhaps add new terms to facilitate the reporting of germplasm
distribution and Standard Material Transfer Agreements (SMTA) under the
International Treaty on Plant Genetic Resources for Food and Agriculture
(ITPGRFA).
Germplasm management
The Millennium Seed Bank (Kew) contributed feedback to the DwC-G
modeling and proposed to include terminology for seed management.
• Seed processing terms
• Seed cleaning
• Seed germination testing
41. Work-flow for Vocabulary management
1. Mint and maintain concepts and terms, in domain-expert working groups.
2. Release final version as a Concept Vocabulary.
3. Publish at the GBIF Resources Repository.
REUSE terms from published concept vocabularies and ontologies when designing new application schema such as DwC-A controlled term and value vocabularies.
[Diagram labels: (1) Term Wiki, for vocabulary development, http://terms.tdwg.org/wiki/; (2) Concept Vocabulary (rdf, skos); (3) Resources Repository, http://rs.gbif.org/terms/]
43. Example: master SKOS/RDF resource
Languages shown: en, es, zh, ja
http://rs.gbif.org/terms/dwc/dwc_translations.rdf
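To illustrate how one master SKOS/RDF resource can carry labels in several languages, here is a minimal sketch that reads language-tagged `skos:prefLabel` values from an inline RDF/XML snippet. The snippet and its labels are invented for the example and are not copied from the GBIF translations file; only the RDF, SKOS and XML namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# ASSUMPTION: sample content written for this sketch, not the real resource.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://rs.tdwg.org/dwc/terms/country">
    <skos:prefLabel xml:lang="en">Country</skos:prefLabel>
    <skos:prefLabel xml:lang="es">País</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def labels_by_language(rdf_xml):
    """Map each concept URI to a {language: preferred label} dictionary."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        result[uri] = {
            label.get(XML_LANG): label.text
            for label in concept.findall(f"{{{SKOS}}}prefLabel")
        }
    return result


labels = labels_by_language(SAMPLE)
```

Because every label hangs off the same term URI, adding a language means adding one more `prefLabel` rather than minting a new term.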
44. Vocabularies/ontologies
• Provide a shared understanding of what we mean when describing biodiversity entities.
• What kind of thing or property.
• A list of things we as a community can agree upon the meaning of.
• “Concept repository” with terms identified by URIs.
TDWG Technical Roadmap 2008 (convened by Roger Hyam).
Photo CC-by-3.0 by Hannes Grobe/AWI. Palaeoclimate archives.
46. “Things can happen in a band, or any type of collaboration, that would not otherwise happen” (Jim Coleman, Jazz-musician).
GBIF, Global Biodiversity Information Facility
http://www.gbif.org
TDWG, Biodiversity Information Standards
http://www.tdwg.org
BioCASE, The Biological Collection Access Service for Europe
http://www.biocase.org
Bioversity International
http://www.bioversityinternational.org
NordGen, The Nordic Genetic Resources Center
http://www.nordgen.org