Tdwg 1-remsen
1. Taxonomic Databases Working Group Annual Meeting 2011 GBIF: The challenges of intra- and inter-operability at large scales David Remsen Senior Programme Officer Global Biodiversity Information Facility (GBIF) TDWG 2011
5. Growth (chart). Axis labels: 2007, 2008, 2009, 2010, Today. Values: 70 million, 147 million, 180 million, 201 million, 302 million.
6. Growth (same chart as slide 5), with the annotation: Need for a new standard identified
17. Empower Users through interoperability Enable scientific research that has never before been possible
18. Change in suitability for cultivating common bean across the world, from present to 2020, showing a global loss in suitability, especially in Africa. Using biodiversity data: Ecological Niche Modeling
24. BiodiverCities Programme. The goal of the BiodiverCities Programme is to guide, support, capacitate and motivate local governments and their partners to integrate biodiversity and ecosystem-based planning into all aspects of policy, decision making and implementation activities to result in enhanced biodiversity conservation and more sustainable local economic development. Acknowledgement of accountability and responsibility for the health and well-being of communities and recognition of biodiversity and essential ecosystem services as the foundation of our existence are core components of the goal. Diagram labels: Global Partnership; BiodiverCities Advisory Committee (high-level coordination group; on invitation: outstanding cities and selected organisations); BiodiverCities Technical Reference Group; Entry Points for Cities: LAB Pioneer, LAB Pioneer Biodiversity & Climate Change, LAB Pioneer Biodiversity & CEPA, Cities in Biodiversity Hotspots, URBIS & more to come; Services: Advocacy, Technical Support, Policy Consultation, Profiling; Tools & Resources: LAB Guidebook, TEEB Report & Manual, Guidelines & Case Studies, LBSAPs, Durban Commitment and more... www.iclei.org/biodiversity
This presentation provides an overview of the GBIF network in three contexts: improvements in intra-operability (the integration of data within the network), a recent focus on the sustainability and use of the network, and the bridging of gaps in local, national and international uptake of the network.
The challenge of intra-operability within GBIF has its roots in our broad mission statement and the federated nature of the GBIF network. For those not familiar with GBIF, our mission is to facilitate free and open access to primary biodiversity data worldwide, that is, data pertaining to the occurrence of a species in nature at a specific place and time. To this end, the GBIF network provides access to over 300 million records originating in nearly 9300 different databases.
Achieving integration and intra-operability of this federated dataset is required to effectively assess and deliver these data for scientific use. Interoperability is facilitated through the use of a limited set of standard data formats and protocols. The most recent addition to these is the Darwin Core Archive format, a data export format that follows the Darwin Core text guidelines and uses plain HTTP as the transfer protocol. This format has been integrated into a suite of data publishing solutions that simplify the data publication process. In addition, Darwin Core itself has been expanded to enable the publication of taxonomic data through the same text guidelines, enabling a new data type to be mobilised through GBIF that directly impacts taxonomic interoperability.
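To make the format concrete, here is a minimal sketch of reading the core table of a Darwin Core Archive: a zip file holding a `meta.xml` descriptor plus one or more delimited text files. The archive is built in memory, and the file name `occurrence.csv`, the sample rows and the helper names `build_sample_archive` and `read_core` are all hypothetical. Real archives often use tab-delimited files and declare escapes such as `\t` in `meta.xml`, which this sketch does not decode.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the Darwin Core text guidelines descriptor (meta.xml).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"

# Hypothetical descriptor: one core file, comma-delimited, one header line.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
        fieldsTerminatedBy="," ignoreHeaderLines="1">
    <files><location>occurrence.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/countryCode"/>
  </core>
</archive>
"""


def build_sample_archive():
    """Create a tiny in-memory archive with made-up records."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr(
            "occurrence.csv",
            "id,scientificName,countryCode\n"
            "1,Puma concolor,US\n"
            "2,Parus major,DE\n",
        )
    return buf.getvalue()


def read_core(archive_bytes):
    """Return the core rows as dicts keyed by the short Darwin Core term name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
        core = root.find(DWC_TEXT_NS + "core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
        delimiter = core.get("fieldsTerminatedBy", ",")
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> name; the <id> column is simply called "id".
        columns = {int(core.find(DWC_TEXT_NS + "id").get("index")): "id"}
        for field in core.findall(DWC_TEXT_NS + "field"):
            columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]
        text = zf.read(location).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


records = read_core(build_sample_archive())
print(records)
```

Because the descriptor carries the column-to-term mapping, a consumer can read the data file without guessing what each column means, which is what keeps the publishing side this simple.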
This graph shows the growth of the GBIF occurrence index since 2007.
We realised in 2008 that we were hitting practical limits of indexing latency and scale, and deployed the Darwin Core Archive format in 2009. Uptake of this format has been primarily responsible for a 50% increase in mobilised data in the past year.
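The 50% figure can be checked against the numbers on the growth slide. The sketch below assumes the slide's five values pair with its five axis labels in order (2007, 2008, 2009, 2010, Today); that pairing is inferred from the slide text, not stated in it, and the names `index_size` and `yearly_growth` are illustrative.

```python
# Index size in millions of records, as read from the growth slide.
# ASSUMPTION: values are paired with axis labels in ascending order.
index_size = {"2007": 70, "2008": 147, "2009": 180, "2010": 201, "today": 302}


def yearly_growth(sizes):
    """Relative growth of each label compared with the one before it."""
    labels = list(sizes)
    return {
        labels[i]: (sizes[labels[i]] - sizes[labels[i - 1]]) / sizes[labels[i - 1]]
        for i in range(1, len(labels))
    }


growth = yearly_growth(index_size)
for label, rate in growth.items():
    print(f"{label}: {rate:.0%}")
```

Under that pairing the last step, from 201 million to 302 million, is (302 - 201) / 201, or about 50%, which matches the increase quoted above.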
This outreach extends to a new suite of data publishing guides and tools that provide details on data formats, checklist metadata, and checklist publishing tools.
We have improved geo-referencing processes that enable us to better match records to their intended country of origin. This shows raw data originating in the United States.
This is how the data look after improved interpretation. We can now recognise international waters and offshore islands.
Without access to sufficient authoritative taxonomic data, we have been forced to rely on less accurate classification data originating in occurrence datasets. These datasets often contain errors such as the one illustrated here, where a European bird species was mistakenly placed in the hummingbird family.
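One generic way to assign a record to a country from its coordinates is a point-in-polygon test. The sketch below uses ray casting on toy polygons; it is not a description of GBIF's actual interpretation pipeline, and the country code `XX`, the shapes and the function names are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray heading east crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Toy boundaries: a mainland and an offshore island of the same country.
BOUNDARIES = [
    ("XX", [(0, 0), (10, 0), (10, 10), (0, 10)]),      # mainland
    ("XX", [(14, 4), (16, 4), (16, 6), (14, 6)]),      # offshore island
]


def interpret_country(lon, lat):
    """Return the matching country code, or flag the point as open sea."""
    for code, polygon in BOUNDARIES:
        if point_in_polygon(lon, lat, polygon):
            return code
    return "international waters"


print(interpret_country(5, 5))     # mainland point
print(interpret_country(15, 5))    # island point
print(interpret_country(12, 5))    # between mainland and island
```

Treating the island as its own polygon under the same code is what lets an offshore record resolve to the right country instead of falling into the sea category.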
In 2011 the number of taxonomic authority files published through the network has doubled thanks to promotional efforts within the GBIF network and partnerships that include other taxonomic initiatives.
With access to a wider array of authoritative taxonomic sources, we are able to match more taxa and improve the taxonomic backbone used to organise all species data records.
This improved taxonomic reconciliation extends to the resolution of homonyms – names for different taxa that are spelled alike. Relying solely on taxonomic information within occurrence data sources provides a confusing array of possible homonyms. Relying on taxonomic authority files reveals there are exactly two genera with this name and includes a common name to help distinguish them.
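A minimal sketch of homonym detection against an authority file: group records by name and keep the names that map to more than one distinct classification. The record layout and the function name `find_homonyms` are assumptions. The sample uses Oenanthe, a genus name shared by wheatears (birds) and water dropworts (plants), purely as an illustration; it is not necessarily the name shown on the slide.

```python
from collections import defaultdict

# Hypothetical authority-file rows; field names are illustrative.
AUTHORITY = [
    {"genus": "Oenanthe", "kingdom": "Animalia", "family": "Muscicapidae",
     "common_name": "wheatears"},
    {"genus": "Oenanthe", "kingdom": "Plantae", "family": "Apiaceae",
     "common_name": "water dropworts"},
    {"genus": "Parus", "kingdom": "Animalia", "family": "Paridae",
     "common_name": "tits"},
]


def find_homonyms(records):
    """Return names that occur under more than one (kingdom, family) pair."""
    by_name = defaultdict(list)
    for rec in records:
        by_name[rec["genus"]].append(rec)
    return {
        name: recs
        for name, recs in by_name.items()
        if len({(r["kingdom"], r["family"]) for r in recs}) > 1
    }


homonyms = find_homonyms(AUTHORITY)
for name, recs in homonyms.items():
    for rec in recs:
        print(f'{name} ({rec["family"]}, {rec["kingdom"]}): {rec["common_name"]}')
```

Carrying the common name alongside each match is what lets a user pick the intended taxon when the scientific name alone is ambiguous.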
GBIF's strategy for ensuring sustainability has been an increased focus on demonstrating the scientific utility of the data mobilised through the GBIF network. We now have a work programme dedicated to assessing and reporting on this use.
One output of this activity has been a documented increase in the scientific use of data. Last year 152 papers cited the use of GBIF-mediated data and this year we anticipate that number to be even higher.
We have also been proactive in communicating exactly how these data contribute to scientific processes through our communications portal.
GBIF's future sustainability and relevance as a scientific support network depend on its ability to enable scientific research that would otherwise not be possible.
Using GBIF data in ecological niche modelling is one of the most common uses. Species occurrence data is geo-spatially integrated with additional data types such as climatic data to create an ecological profile for the species. In the example illustrated here, the model outputs project changes in distribution of a crop species based on possible climate change scenarios.
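As a rough illustration of the idea, the sketch below fits a simple rectangular climate envelope (the minimum and maximum temperature and precipitation observed at occurrence points) and compares which grid cells fall inside it under current and shifted climate. This is a deliberately crude stand-in, not the model behind the slide, and every number, cell name and function name is made up.

```python
def fit_envelope(samples):
    """samples: (temperature, precipitation) pairs observed at occurrence points."""
    temps = [t for t, _ in samples]
    precs = [p for _, p in samples]
    return (min(temps), max(temps)), (min(precs), max(precs))


def suitable(envelope, temp, precip):
    """A cell is suitable if both variables fall inside the observed ranges."""
    (tmin, tmax), (pmin, pmax) = envelope
    return tmin <= temp <= tmax and pmin <= precip <= pmax


# Hypothetical climate at known occurrences: (mean temp in C, annual precip in mm).
occurrence_climate = [(18, 800), (22, 1200), (20, 1000), (24, 900)]
envelope = fit_envelope(occurrence_climate)

# Hypothetical grid cells with current climate.
grid_now = {"A": (19, 850), "B": (23, 1100), "C": (24, 820),
            "D": (16, 1000), "E": (30, 500)}

# Hypothetical scenario: 2 C warmer and 10% drier everywhere.
grid_future = {cell: (t + 2, p * 0.9) for cell, (t, p) in grid_now.items()}

current = {c for c, (t, p) in grid_now.items() if suitable(envelope, t, p)}
future = {c for c, (t, p) in grid_future.items() if suitable(envelope, t, p)}

print("suitable now:   ", sorted(current))
print("suitable later: ", sorted(future))
print("lost:           ", sorted(current - future))
print("gained:         ", sorted(future - current))
```

The set difference between the two runs is the kind of "change in suitability" a map like the one on the slide visualises, here reduced to a handful of cells.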
In this example, occurrence data from the GBIF network has been geospatially joined with world protected area boundaries to generate provisional species lists and data distribution summaries for the protected area.
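A minimal sketch of such a spatial join, assuming each protected area is approximated by a bounding box rather than its true boundary polygon. The area names, coordinates, records and the function name `summarise` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical protected areas as (min_lon, min_lat, max_lon, max_lat).
AREAS = {
    "Reserve North": (10.0, 50.0, 12.0, 52.0),
    "Reserve South": (10.0, 40.0, 12.0, 42.0),
}

# Hypothetical occurrence records: (species, lon, lat).
OCCURRENCES = [
    ("Parus major", 11.0, 51.0),
    ("Parus major", 11.5, 51.2),
    ("Erithacus rubecula", 10.5, 50.5),
    ("Lacerta agilis", 11.0, 41.0),
    ("Parus major", 20.0, 60.0),   # falls outside every area
]


def summarise(occurrences, areas):
    """Provisional species list and record count for each area."""
    species = defaultdict(set)
    counts = defaultdict(int)
    for name, lon, lat in occurrences:
        for area, (min_lon, min_lat, max_lon, max_lat) in areas.items():
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                species[area].add(name)
                counts[area] += 1
    return {
        area: {"species": sorted(species[area]), "records": counts[area]}
        for area in areas
    }


summary = summarise(OCCURRENCES, AREAS)
for area, info in summary.items():
    print(area, info)
```

The lists are provisional in the same sense as in the talk: they reflect only which records happen to fall inside the boundary, not a vetted inventory.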
Occurrence data has been combined with IUCN species range maps both to validate the distribution and identify potential gaps in coverage.
The Wallace Initiative provides a site where interested parties can access biodiversity and climate data, both current and predicted, with respect to various global climate models.
Researchers at Lancaster University have used GBIF data mining tools and the occurrence index to extract over 65,000 species names from the US and world patent indices and to determine the distribution of these species among the world's nations, in order to inform the Access and Benefit Sharing processes demanded by developing countries.
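The two steps described here, finding species names in text and then looking up where those species occur, can be sketched as follows. This is not the researchers' actual tooling: the regular expression is a naive binomial matcher, and the patent text, name list, occurrence rows and function names are all invented for illustration.

```python
import re
from collections import Counter

# Naive candidate pattern: a capitalised word followed by a lowercase word.
BINOMIAL = re.compile(r"\b[A-Z][a-z]+ [a-z]+\b")


def names_in_text(text, known_names):
    """Keep only candidates that appear in a list of accepted names."""
    return sorted({m for m in BINOMIAL.findall(text) if m in known_names})


def countries_for(names, occurrences):
    """Count how many of the named species have records in each country."""
    tally = Counter()
    for name in names:
        for country in sorted({c for n, c in occurrences if n == name}):
            tally[country] += 1
    return dict(tally)


# Hypothetical inputs.
KNOWN_NAMES = {"Curcuma longa", "Azadirachta indica", "Parus major"}
PATENT_TEXT = (
    "A composition comprising an extract of Curcuma longa and oil of "
    "Azadirachta indica. The extract is dried before use."
)
OCCURRENCE_INDEX = [
    ("Curcuma longa", "IN"),
    ("Curcuma longa", "LK"),
    ("Azadirachta indica", "IN"),
    ("Azadirachta indica", "IN"),
]

found = names_in_text(PATENT_TEXT, KNOWN_NAMES)
by_country = countries_for(found, OCCURRENCE_INDEX)
print(found)
print(by_country)
```

Filtering candidates against a name list is what discards false hits such as "The extract", which fits the pattern but is not a species.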
At the local level, GBIF is partnering with ICLEI – Local Governments for Sustainability to provide access to biodiversity data that falls within urban areas.
Primary biodiversity data was integrated with urban polygon data for over 40 major metropolitan areas.
The occurrence data was integrated with urban geospatial data, and taxonomic data was extracted to provide a provisional species list for each city.
The GBIF network supports the development of enriched national data portals. BISON is an effort within the US to develop a national data portal. It starts by undertaking advanced geospatial processing of GBIF data originating within the United States to provide county-level resolution. It utilises OpenLayers to enable integration with other nationally relevant thematic layers such as climate, soil and demographic data.
At the international level, GBIF is building bridges to networks such as the International Association for Impact Assessment that offer new sources of biodiversity data collected through the environmental impact assessment (EIA) process. This includes the provision of data publishing tools and best practices that insert the GBIF network into the EIA documentation process.