Pangaea providing access to geoscientific data using apache lucene java
1. PANGAEA - Providing access to
geoscientific data using Apache
Lucene Java
Uwe Schindler
PANGAEA / SD DataSolutions GmbH, uschindler@pangaea.de
2. My Background
My main focus is on development of Lucene Java.
I implemented fast numerical search and maintain the new
attribute-based text analysis API.
I studied physics at the University of Erlangen-Nuremberg and
work as a consultant and software architect for PANGAEA
(Publishing Network for Geoscientific & Environmental Data)
in Bremen, Germany, where I implemented the portal's
geospatial retrieval functions with Lucene Java.
I give talks about Lucene at international conferences such as
ApacheCon EU/US, Lucene Eurocon, and Berlin Buzzwords, and at
various local meetups.
I am a committer and PMC member of Apache Lucene and Solr.
3. About PANGAEA
Since 1993
Information system for earth system science data hosted by AWI &
MARUM
2001
Mandate of the International Council for Science (ICSU):
World Data Center for Marine Environmental Sciences (WDC-MARE)
2007
Mandate of the World Meteorological Organization (WMO):
World Radiation Monitoring Center (WRMC)
2010 (certification in progress)
Mandate of the World Meteorological Organization (WMO):
Data Collection and Processing Center (DCPC)
4. Network of World Data Centers
Geophysical Year 1957
Nuclear Radiation: Tokyo, Japan
WDC Co-ordination Offices: Washington DC, USA; Beijing, China
Meteorology: Asheville NC, USA; Beijing, China; Obninsk, Russia
Oceanography: Obninsk, Russia; Silver Spring MD, USA; Tianjin, China
Paleoclimatology: Boulder CO, USA
Marine Geology and Geophysics: Boulder CO, USA; Moscow, Russia
Remotely Sensed Land Data: Sioux Falls SD, USA
Renewable Resources and Environment: Beijing, China
Recent Crustal Movements: Ondrejov, Czech Republic
Airglow: Mitaka, Japan
Astronomy: Beijing, China
Atmospheric Trace Gases: Oak Ridge TN, USA
Aurora: Tokyo, Japan
Cosmic Rays: Toyokawa, Japan
Geology: Beijing, China
Human Interactions in the Environment: Palisades NY, USA
Ionosphere: Tokyo, Japan
Earth Tides: Brussels, Belgium
Geomagnetism: Copenhagen, Denmark; Edinburgh, UK; Kyoto, Japan; Colaba, India
Glaciology: Boulder CO, USA; Cambridge, UK; Lanzhou, China
Marine Environmental Sciences: Bremen, Germany (2001)
Rotation of the Earth: Obninsk, Russia; Washington DC, USA
Satellite Information: Greenbelt MD, USA
Rockets and Satellites: Obninsk, Russia
Seismology: Denver CO, USA; Beijing, China
Solar Radio Emission: Nagano, Japan
Space Science: Beijing, China
Space Science Satellites: Kanagawa, Japan
Solar Activity: Meudon, France
Soils: Wageningen, The Netherlands
Sunspot Index: Brussels, Belgium
Solar Terrestrial Physics: Boulder CO, USA; Didcot Oxon, UK; Moscow, Russia; Haymarket, Australia
Solid Earth Geophysics: Beijing, China; Boulder CO, USA; Moscow, Russia
5. Why do we need Data Libraries?
- Good scientific practice
- Needed for verification of scientific
work
- Good availability of data for large-scale
and complex scientific approaches
-
than reproduction
10. Archiving and publication of
scientific data
Data acquisition
Quality assurance
Long-term availability and access
11. Long term archive
Open access & non-restricted data
o Creative Commons license
Data accepted from individual scientists,
institutes, and science projects
Long term funding for basic operation
o hardware, software, system management &
organisation
Long term preservation of data
o Technical: security, migration of media,
o Usability: preserving the integrity & semantics of
data sets
17. Indexing contents from relational
database with dynamic updates
Data Set
Staffs
Projects
Data Series
Events
Update Log
XML Data Set Description (Metadata)
18. Indexed Information
Textual metadata: citation (authors, title),
abstract, measurement parameters,
methods, associated projects, comments,
documentation (including field info for all
XML schema element types)
Fulltext data set contents
Geographical information:
latitude/longitude/BBOX/track, dates,
geological age, depth/elevation
[NumericField/NumericRangeQuery]
Soon: Fulltext of attached external documentation
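The numeric range idea behind the geographical fields can be illustrated with a minimal sketch. It deliberately uses only the Java standard library, not Lucene's NumericField/NumericRangeQuery: a sorted map on latitude stands in for an indexed numeric field, and a bounding-box search is a range scan on latitude followed by a filter on longitude. The class name, the data set ids, and the coordinates are hypothetical, and the sketch assumes a box that does not cross the date line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative stand-in for a numeric range search; this is not Lucene code.
class BBoxSearch {
    // Hypothetical record: a data set id with a single point location.
    private static final class Point {
        final String id;
        final double lon;
        Point(String id, double lon) { this.id = id; this.lon = lon; }
    }

    // Sorted index on latitude, playing the role of an indexed numeric field.
    private final TreeMap<Double, List<Point>> byLat = new TreeMap<>();

    public void add(String id, double lat, double lon) {
        byLat.computeIfAbsent(lat, k -> new ArrayList<>()).add(new Point(id, lon));
    }

    // Inclusive bounding box: range scan on latitude, then filter on longitude.
    // Assumes minLat <= maxLat and a box that does not cross the date line.
    public List<String> search(double minLat, double maxLat, double minLon, double maxLon) {
        List<String> hits = new ArrayList<>();
        for (List<Point> bucket : byLat.subMap(minLat, true, maxLat, true).values()) {
            for (Point p : bucket) {
                if (p.lon >= minLon && p.lon <= maxLon) {
                    hits.add(p.id);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BBoxSearch index = new BBoxSearch();
        index.add("ds-1", 53.08, 8.80);    // made-up point in northern Germany
        index.add("ds-2", -70.67, -8.27);  // made-up point in Antarctica
        index.add("ds-3", 54.18, 7.89);    // made-up point in the North Sea
        System.out.println(index.search(50.0, 60.0, 0.0, 10.0)); // prints [ds-1, ds-3]
    }
}
```

In an actual Lucene index the same query would presumably be expressed as one numeric range clause per coordinate; the sketch only shows why a sorted numeric structure makes such ranges cheap.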
21. Apache Lucene
as fast Key-Value Store
Lucene is used for almost every query on the
web-client
of keyword terms indexed for quick
retrieval of data sets
Example: Lookup of datasets related to
publications using DOI. PANGAEA is hit by
hundreds of DOI lookup queries per second
from scientific publishers.
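The key-value usage can be pictured as an exact-term lookup: each publication DOI is one untokenized keyword term that points to the related data sets. The following is a minimal sketch of that idea in plain Java, with a hash map standing in for the term dictionary. It is not Lucene code and not PANGAEA's implementation; the class name, the DOI values, and the data set ids are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative stand-in for an exact-match keyword field; this is not Lucene code.
class DoiLookup {
    // Term -> posting list of data set ids.
    private final Map<String, List<String>> postings = new HashMap<>();

    // DOI names are case-insensitive, so normalize before indexing and lookup.
    static String normalize(String doi) {
        return doi.trim().toLowerCase(Locale.ROOT);
    }

    public void index(String publicationDoi, String dataSetId) {
        postings.computeIfAbsent(normalize(publicationDoi), k -> new ArrayList<>())
                .add(dataSetId);
    }

    // Returns the related data sets in insertion order, or an empty list if none.
    public List<String> lookup(String publicationDoi) {
        List<String> hits = postings.get(normalize(publicationDoi));
        return hits == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(hits);
    }

    public static void main(String[] args) {
        DoiLookup store = new DoiLookup();
        store.index("10.1000/example.1", "dataset-1"); // hypothetical DOI and ids
        store.index("10.1000/example.1", "dataset-2");
        System.out.println(store.lookup("10.1000/EXAMPLE.1")); // prints [dataset-1, dataset-2]
    }
}
```

Because each lookup is a single term match with no scoring or text analysis, this access pattern stays cheap even at hundreds of requests per second.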