PANGAEA - Providing access to
geoscientific data using Apache
Lucene Java
Uwe Schindler
PANGAEA / SD DataSolutions GmbH, uschindler@pangaea.de
My Background
My main focus is on the development of Lucene Java.
I implemented fast numerical search and maintain the new
attribute-based text analysis API.
I studied Physics at the University of Erlangen-Nuremberg and
work as a consultant and software architect for PANGAEA
(Publishing Network for Geoscientific & Environmental Data)
in Bremen, Germany, where I implemented the portal's
geospatial retrieval functions with Lucene Java.
I give talks about Lucene at international conferences such as
ApacheCon EU/US, Lucene Eurocon, and Berlin Buzzwords, and at
various local meetups.
I am a committer and PMC member of Apache Lucene and Solr.
About PANGAEA
since 1993: Information system for earth system science data, hosted by AWI & MARUM
2001: Mandate of the International Council for Science (ICSU): World Data Center for Marine Environmental Sciences (WDC-MARE)
2007: Mandate of the World Meteorological Organisation (WMO): World Radiation Monitoring Center (WRMC)
2010 (certification in progress): Mandate of the World Meteorological Organisation (WMO): Data Collection and Processing Center (DCPC)
Network of World Data Centers
Geophysical Year 1957
Nuclear Radiation: Tokyo, Japan
WDC Co-ordination Offices: Washington DC, USA; Beijing, China
Meteorology: Asheville NC, USA; Beijing, China; Obninsk, Russia
Oceanography: Obninsk, Russia; Silver Spring MD, USA; Tianjin, China
Paleoclimatology: Boulder CO, USA
Marine Geology and Geophysics: Boulder CO, USA; Moscow, Russia
Remotely Sensed Land Data: Sioux Falls SD, USA
Renewable Resources and Environment: Beijing, China
Recent Crustal Movements: Ondrejov, Czech Republic
Airglow: Mitaka, Japan
Astronomy: Beijing, China
Atmospheric Trace Gases: Oak Ridge TN, USA
Aurora: Tokyo, Japan
Cosmic Rays: Toyokawa, Japan
Geology: Beijing, China
Human Interactions in the Environment: Palisades NY, USA
Ionosphere: Tokyo, Japan
Earth Tides: Brussels, Belgium
Geomagnetism: Copenhagen, Denmark; Edinburgh, UK; Kyoto, Japan; Colaba, India
Glaciology: Boulder CO, USA; Cambridge, UK; Lanzhou, China
Marine Environmental Sciences: Bremen, Germany (2001)
Rotation of the Earth: Obninsk, Russia; Washington DC, USA
Satellite Information: Greenbelt MD, USA
Rockets and Satellites: Obninsk, Russia
Seismology: Denver CO, USA; Beijing, China
Solar Radio Emission: Nagano, Japan
Space Science: Beijing, China
Space Science Satellites: Kanagawa, Japan
Solar Activity: Meudon, France
Soils: Wageningen, The Netherlands
Sunspot Index: Brussels, Belgium
Solar Terrestrial Physics: Boulder CO, USA; Didcot Oxon, UK; Moscow, Russia; Haymarket, Australia
Solid Earth Geophysics: Beijing, China; Boulder CO, USA; Moscow, Russia
Why do we need Data Libraries?
- Good scientific practice
- Needed for verification of scientific work
- Good availability of data for large-scale and complex scientific approaches
-
than reproduction
Geosciences before 1900
Turin papyrus,
~1160 BC
William Smith, 1815
Glomar Challenger, 1875
ENIAC, 1944
Technical Improvements
Magnetometer
Development of the global climate
[Figure: panels "The last 1300 years" and "Thousands of years before present"]
Information increase in empirical sciences
[Figure: "Publications" and "Data" plotted over 1970 to 2010]
Archiving and publication of
scientific data
Data acquisition
Quality assurance
Long-term availability and access
Long-term archive
Open access & non-restricted data
o Creative Commons license
Data accepted from individual scientists, institutes, and science projects
Long-term funding for basic operation
o hardware, software, system management & organisation
Long-term preservation of data
o Technical: security, migration of media
o Usability: preserving the integrity & semantics of data sets
Contents
Data Types in PANGAEA
[Figure: profiles for cores PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1 against Age (kyr), max. 233.55 kyr; parameters per core: IRD (grav/10 cm3), Sand (%), CaCO3 (%), TOC (%), Radio (%/sand), Smect (%/clay)]
[Figure: map covering 54°0' to 55°30' and 11° to 15°, scale 1:2695194 at latitude 0°; layers: World vector shore line, Geochemistry, Grain size class KOLP A, KOLP B, KOLP DIN, KOEHN, KOEHN2]
Source: Baltic Sea Research Institute, Warnemünde.
Profiles => doi:10.1594/PANGAEA.701299
Time series => doi:10.1594/PANGAEA.323487
Sea bed photos => doi:10.1594/PANGAEA.319877
Distributed samples => doi:10.1594/PANGAEA.51749
Complex data => doi:10.1594/PANGAEA.108079
Air photos => doi:10.1594/PANGAEA.323540
Audio record => doi:10.1594/PANGAEA.339110
Statistics (9/2010)
Total number of data sets: ~1 million
Data items: ~8 billion
[Chart categories: unclassified, Sediment, Water, Corals, Atmosphere, Ice]
Now the technical details :-)
PANGAEA - Architecture
[Diagram labels: Sybase ASE; RDB; Harddisk + tape (silo); Middleware; Webserver; Editorial system; PANGAEA search engine; Apache Lucene; Google Maps / Earth]
Indexing contents from relational database with dynamic updates
[Diagram labels: Data Set; Staffs; Projects; Data Series; Events; Update Log; XML Data Set Description (Metadata)]
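One way to read the diagram above: an update log records which data sets changed, and only those get their metadata document rebuilt and re-indexed. The sketch below illustrates that reading in plain Python and is not PANGAEA's actual code. The log layout `(entry_id, dataset_id, action)`, the `load_metadata` callback, and the dict standing in for the index are all hypothetical.

```python
def apply_update_log(index, update_log, load_metadata, last_seen=0):
    """Replay log entries newer than `last_seen` against `index`.

    index         -- dict mapping dataset_id -> metadata document (stand-in for a search index)
    update_log    -- iterable of (entry_id, dataset_id, action), ordered by entry_id
    load_metadata -- callback that builds the metadata document for one data set
    Returns the id of the last log entry that was applied.
    """
    for entry_id, dataset_id, action in update_log:
        if entry_id <= last_seen:
            continue  # already processed in an earlier run
        if action == "delete":
            index.pop(dataset_id, None)
        else:
            # Insert and update are treated alike: replace the whole document.
            index[dataset_id] = load_metadata(dataset_id)
        last_seen = entry_id
    return last_seen


# Hypothetical usage: the metadata is just a string here.
index = {}
log = [(1, "ds1", "insert"), (2, "ds2", "insert"), (3, "ds1", "delete")]
last = apply_update_log(index, log, lambda ds: "<metadata for %s>" % ds)
```

Replacing the whole document on every change keeps the indexer stateless apart from the `last_seen` marker, at the cost of rebuilding metadata that may have changed only slightly.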
Indexed Information
Textual metadata: citation (authors, title),
abstract, measurement parameters,
methods, associated projects, comments,
documentation (including field info for all
XML schema element types)
Full text of data set contents
Geographical information:
latitude/longitude/BBOX/track, dates,
geological age, depth/elevation
[NumericField/NumericRangeQuery]
Soon: full text of attached external documentation
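The slide names Lucene's NumericField and NumericRangeQuery for the numeric fields. The idea behind them is to index each value at several precisions, so that a range breaks into a few coarse prefix terms plus some fine terms at the edges. The following is a plain-Python sketch of that idea and not Lucene's implementation. `split_range`, `TrieIndex`, the 16-bit value size, and the step of 4 bits are illustrative assumptions, and values are assumed to be non-negative integers (a latitude or depth would first need mapping to such an integer).

```python
from collections import defaultdict


def split_range(lo, hi, val_size=16, step=4):
    """Split inclusive [lo, hi] into (shift, first_prefix, last_prefix) pieces.

    Coarse pieces (large shift) cover the middle of the range;
    fine pieces (shift 0) cover only the ragged edges.
    """
    pieces = []
    shift = 0
    while True:
        diff = 1 << (shift + step)
        mask = ((1 << step) - 1) << shift
        has_lower = (lo & mask) != 0     # lower edge not aligned at this level
        has_upper = (hi & mask) != mask  # upper edge not aligned at this level
        next_lo = ((lo + diff) if has_lower else lo) & ~mask
        next_hi = ((hi - diff) if has_upper else hi) & ~mask
        if shift + step >= val_size or next_lo > next_hi:
            # Nothing left for a coarser level: emit the remainder here.
            pieces.append((shift, lo >> shift, hi >> shift))
            break
        if has_lower:
            pieces.append((shift, lo >> shift, (lo | mask) >> shift))
        if has_upper:
            pieces.append((shift, (hi & ~mask) >> shift, hi >> shift))
        lo, hi = next_lo, next_hi
        shift += step
    return pieces


class TrieIndex:
    """Toy inverted index storing every value at each precision level."""

    def __init__(self, val_size=16, step=4):
        self.val_size = val_size
        self.step = step
        self.postings = defaultdict(set)  # (shift, prefix) -> set of doc ids

    def add(self, doc_id, value):
        for shift in range(0, self.val_size, self.step):
            self.postings[(shift, value >> shift)].add(doc_id)

    def range_query(self, lo, hi):
        hits = set()
        for shift, first, last in split_range(lo, hi, self.val_size, self.step):
            for prefix in range(first, last + 1):
                hits |= self.postings.get((shift, prefix), set())
        return hits


# Hypothetical data: water depth in metres for three data sets.
depths = TrieIndex()
depths.add("ds1", 20)
depths.add("ds2", 350)
depths.add("ds3", 4200)
print(sorted(depths.range_query(100, 5000)))  # ['ds2', 'ds3']
```

A smaller step means fewer terms to visit per query but more terms stored per value; the step of 4 here is an arbitrary choice for the sketch.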
Geo-Retrieval with Lucene
Using scored queries
with KML regions as filters
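As a rough illustration of combining scored hits with a region filter, the sketch below keeps only hits whose coordinates fall inside a polygon and then orders them by score. It is plain Python, not Lucene code. The KML region is simplified to a list of (lon, lat) vertices, and `point_in_polygon`, `filter_hits`, and the hit tuples are hypothetical names and data.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside `polygon`.

    polygon -- list of (lon, lat) vertices; the last vertex connects to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_hits(hits, polygon):
    """Keep hits inside the region, best score first.

    hits -- list of (score, dataset_id, lon, lat)
    """
    kept = [h for h in hits if point_in_polygon(h[2], h[3], polygon)]
    return sorted(kept, key=lambda h: h[0], reverse=True)


# Hypothetical region and hits; the box is only an example.
region = [(11.0, 54.0), (15.0, 54.0), (15.0, 55.5), (11.0, 55.5)]
hits = [(0.4, "ds1", 13.0, 54.5), (0.9, "ds2", 10.0, 54.5), (0.7, "ds3", 14.2, 55.0)]
print([h[1] for h in filter_hits(hits, region)])  # ['ds3', 'ds1']
```

The filter only decides which hits survive; the scores themselves are untouched, so ranking stays the job of the text query.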
Apache Lucene as fast Key-Value Store
Lucene is used for almost every query on the web client
of keyword terms indexed for quick retrieval of data sets
Example: lookup of datasets related to publications using DOI.
PANGAEA is hit by hundreds of DOI lookup queries per second
from scientific publishers.
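The key-value use described above comes down to indexing an exact, untokenized keyword (here a publication DOI) and mapping it straight to the related data sets. The sketch below shows that in plain Python; it is an assumption about the general technique, not PANGAEA's code. `KeywordIndex`, `normalize_doi`, the field name `pub_doi`, and the DOI and data set ids are all hypothetical.

```python
from collections import defaultdict


def normalize_doi(raw):
    """Reduce common DOI spellings to one lookup key (DOI names are case-insensitive)."""
    doi = raw.strip().lower()
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


class KeywordIndex:
    """Exact-match index: (field, keyword) -> list of data set ids."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, field, keyword, dataset_id):
        self._postings[(field, keyword)].append(dataset_id)

    def lookup(self, field, keyword):
        # A single hash lookup: no scoring and no text analysis involved.
        return list(self._postings.get((field, keyword), []))


# Hypothetical publication DOI linked to two hypothetical data sets.
index = KeywordIndex()
index.add("pub_doi", normalize_doi("doi:10.1000/EXAMPLE.123"), "dataset-A")
index.add("pub_doi", normalize_doi("10.1000/example.123"), "dataset-B")
print(index.lookup("pub_doi", normalize_doi("https://doi.org/10.1000/Example.123")))
```

Normalizing at both index time and query time is what lets differently formatted requests from publishers hit the same key.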
Live Presentation
Contact
Uwe Schindler
PANGAEA - Publishing Network for Geoscientific &
Environmental Data
MARUM, Leobener Str., 28359 Bremen, Germany
uschindler@pangaea.de
SD DataSolutions GmbH
Wätjenstr. 49, 28213 Bremen, Germany
uschindler@sd-datasolutions.de
Thank you!
Learn more about Apache Lucene at
www.lucidimagination.com

PANGAEA - Providing access to geoscientific data using Apache Lucene Java

  • 1. PANGAEA - Providing access to geoscientific data using Apache Lucene Java Uwe Schindler PANGAEA / SD DataSolutions GmbH, uschindler@pangaea.de
  • 2. My Background My main focus is on development of Lucene Java. Implemented fast numerical search and maintaining the new attribute-based text analysis API. Studied Physics at the University of Erlangen-Nuremberg and work as consultant and software architect for PANGAEA (Publishing Network for Geoscientific & Environmental Data) in Bremen, Germany, where I implemented the portal's geospatial retrieval functions with Lucene Java. Talks about Lucene at various international conferences like ApacheCon EU/US, Lucene Eurocon, Berlin Buzzwords and various local meetups. I am a committer and PMC member of Apache Lucene and Solr.
  • 3. About PANGAEA. Since 1993: Information system for earth system science data hosted by AWI & MARUM; 2001: Mandate of the International Council for Science (ICSU): World Data Center for Marine Environmental Sciences (WDC-MARE); 2007: Mandate of the World Meteorological Organisation (WMO): World Radiation Monitoring Center (WRMC); 2010 (certification in progress): Mandate of the World Meteorological Organisation (WMO): Data Collection and Processing Center (DCPC)
  • 4. Network of World Data Centers (Geophysical Year 1957). Nuclear Radiation: Tokyo, Japan. WDC Co-ordination Offices: Washington DC, USA; Beijing, China. Meteorology: Asheville NC, USA; Beijing, China; Obninsk, Russia. Oceanography: Obninsk, Russia; Silver Spring MD, USA; Tianjin, China. Paleoclimatology: Boulder CO, USA. Marine Geology and Geophysics: Boulder CO, USA; Moscow, Russia. Remotely Sensed Land Data: Sioux Falls SD, USA. Renewable Resources and Environment: Beijing, China. Recent Crustal Movements: Ondrejov, Czech Republic. Airglow: Mitaka, Japan. Astronomy: Beijing, China. Atmospheric Trace Gases: Oak Ridge TN, USA. Aurora: Tokyo, Japan. Cosmic Rays: Toyokawa, Japan. Geology: Beijing, China. Human Interactions in the Environment: Palisades NY, USA. Ionosphere: Tokyo, Japan. Earth Tides: Brussels, Belgium. Geomagnetism: Copenhagen, Denmark; Edinburgh, UK; Kyoto, Japan; Colaba, India. Glaciology: Boulder CO, USA; Cambridge, UK; Lanzhou, China. Marine Environmental Sciences: Bremen, Germany (2001). Rotation of the Earth: Obninsk, Russia; Washington DC, USA. Satellite Information: Greenbelt MD, USA. Rockets and Satellites: Obninsk, Russia. Seismology: Denver CO, USA; Beijing, China. Solar Radio Emission: Nagano, Japan. Space Science: Beijing, China. Space Science Satellites: Kanagawa, Japan. Solar Activity: Meudon, France. Soils: Wageningen, The Netherlands. Sunspot Index: Brussels, Belgium. Solar Terrestrial Physics: Boulder CO, USA; Didcot Oxon, UK; Moscow, Russia; Haymarket, Australia. Solid Earth Geophysics: Beijing, China; Boulder CO, USA; Moscow, Russia.
  • 5. Why do we need Data Libraries? - Good scientific practice - Needed for verification of scientific work - Good availability of data for large-scale and complex scientific approaches - than reproduction
  • 6. Geosciences before 1900: Turin papyrus, ~1160 BC; William Smith, 1815; Glomar Challenger, 1875
  • 8. Development of the global climate [figures: the last 1300 years; time axes in thousands of years before present]
  • 9. Information increase in empirical sciences [chart, 1970 to 2010: Publications vs. Data (?)]
  • 10. Archiving and publication of scientific data: data acquisition; quality assurance; long-term availability and access
  • 11. Long-term archive. Open access & non-restricted data (Creative Commons license). Data accepted from individual scientists, institutes, and science projects. Long-term funding for basic operation (hardware, software, system management & organisation). Long-term preservation of data: technical (security, migration of media) and usability (preserving the integrity & semantics of data sets).
  • 13. Data Types in PANGAEA. Profiles => doi:10.1594/PANGAEA.701299; Time series => doi:10.1594/PANGAEA.323487; Sea bed photos => doi:10.1594/PANGAEA.319877; Distributed samples => doi:10.1594/PANGAEA.51749; Complex data => doi:10.1594/PANGAEA.108079; Air photos => doi:10.1594/PANGAEA.323540; Audio record => doi:10.1594/PANGAEA.339110. [Figures: sediment core profiles for PS1389-3, PS1390-3, PS1431-1, PS1640-1 and PS1648-1 plotting IRD (grav/10 cm3), Sand (%), CaCO3 (%), TOC (%), Radio (%/sand) and Smect (%/clay) against Age (kyr), max. 233.55 kyr; map of geochemistry and grain size classes (KOLP A, KOLP B, KOLP DIN, KOEHN, KOEHN2) with world vector shore line, axes 54°0' to 55°30' and 11° to 15°, scale 1:2695194 at latitude 0°, source: Baltic Sea Research Institute, Warnemünde.]
  • 14. Statistics (9/2010). Total number of data sets: ~1 million. Data items: ~8 billion. [chart categories: unclassified, Sediment, Water, Corals, Atmosphere, Ice]
  • 15. Now the technical details :-)
  • 17. Indexing contents from relational database with dynamic updates [diagram: Data Set, Staffs, Projects, Data Series, Events, Update Log, XML Data Set Description (Metadata)]
  • 18. Indexed Information. Textual metadata: citation (authors, title), abstract, measurement parameters, methods, associated projects, comments, documentation (including field info for all XML schema element types). Full text of data set contents. Geographical information: latitude/longitude/BBOX/track, dates, geological age, depth/elevation [NumericField/NumericRangeQuery]. Soon: full text of attached external documentation.
  • 20. Using scored queries with KML regions as filters
  • 21. Apache Lucene as fast Key-Value Store. Lucene is used for almost every query on the web client; keyword terms are indexed for quick retrieval of data sets. Example: lookup of data sets related to publications using DOI. PANGAEA is hit by hundreds of DOI lookup queries per second from scientific publishers.
  • 24. Contact: Uwe Schindler. PANGAEA - Publishing Network for Geoscientific & Environmental Data, MARUM, Leobener Str., 28359 Bremen, Germany, uschindler@pangaea.de. SD DataSolutions GmbH, Wätjenstr. 49, 28213 Bremen, Germany, uschindler@sd-datasolutions.de
  • 25. Thank you! Learn more about Apache Lucene at www.lucidimagination.com
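Slide 18 says that geographical information such as latitude, longitude and bounding boxes is indexed for numeric range queries (NumericField/NumericRangeQuery). The slides do not show how a bounding-box search is built from those ranges, so the following is only a minimal sketch of one common approach: a box intersection expressed as four inclusive range tests, one per stored edge. It uses plain Java rather than Lucene's API, and the class `BBoxRangeSketch`, the `DataSet` record and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain Java, not Lucene's NumericField/NumericRangeQuery API. */
final class BBoxRangeSketch {

    /** A data set reduced to an id and its bounding box in decimal degrees (hypothetical record). */
    static final class DataSet {
        final String id;
        final double west;
        final double east;
        final double south;
        final double north;

        DataSet(String id, double west, double east, double south, double north) {
            this.id = id;
            this.west = west;
            this.east = east;
            this.south = south;
            this.north = north;
        }
    }

    /** Inclusive numeric range test, the per-field building block of a range query. */
    static boolean inRange(double value, double min, double max) {
        return value >= min && value <= max;
    }

    /**
     * Two boxes intersect when every stored edge passes one range test:
     * west <= qEast, east >= qWest, south <= qNorth, north >= qSouth.
     * Boxes that cross the antimeridian are not handled in this sketch.
     */
    static boolean intersects(DataSet d, double qWest, double qEast, double qSouth, double qNorth) {
        return inRange(d.west, -180.0, qEast)
            && inRange(d.east, qWest, 180.0)
            && inRange(d.south, -90.0, qNorth)
            && inRange(d.north, qSouth, 90.0);
    }

    /** Returns the ids of all data sets whose box intersects the query box. */
    static List<String> search(List<DataSet> dataSets,
                               double qWest, double qEast, double qSouth, double qNorth) {
        List<String> hits = new ArrayList<>();
        for (DataSet d : dataSets) {
            if (intersects(d, qWest, qEast, qSouth, qNorth)) {
                hits.add(d.id);
            }
        }
        return hits;
    }
}
```

In an index, each of the four tests would presumably become one range clause on its own numeric field, combined as a conjunction; the linear scan here only stands in for that lookup.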
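Slide 20 mentions scored queries that use KML regions as filters. How PANGAEA implements this is not shown in the transcript, so the sketch below only illustrates the general idea under stated assumptions: hits arrive already ordered by score, each hit has a single representative position, and the region is one polygon ring of {lon, lat} pairs such as could be read from a KML polygon. The class `RegionFilterSketch` and its members are hypothetical, and the ray-casting test is a standard technique swapped in for illustration, not a Lucene filter.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a region filter applied to already-scored hits, in plain Java. */
final class RegionFilterSketch {

    /** A scored hit with one representative position (hypothetical record). */
    static final class Hit {
        final String id;
        final double score;
        final double lon;
        final double lat;

        Hit(String id, double score, double lon, double lat) {
            this.id = id;
            this.score = score;
            this.lon = lon;
            this.lat = lat;
        }
    }

    /**
     * Ray-casting point-in-polygon test. The ring is an array of {lon, lat} pairs
     * and is closed implicitly. Coordinates are treated as planar, which ignores
     * the poles and the antimeridian.
     */
    static boolean contains(double[][] ring, double lon, double lat) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            // The edge is only tested when it straddles the point's latitude,
            // so yj - yi is never zero in the division below.
            boolean crosses = (yi > lat) != (yj > lat)
                && lon < (xj - xi) * (lat - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Keeps the hits inside the region; the incoming score order is preserved. */
    static List<Hit> filter(List<Hit> scoredHits, double[][] ring) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : scoredHits) {
            if (contains(ring, h.lon, h.lat)) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```

Because the filter only removes hits and never reorders them, relevance scoring and the spatial restriction stay independent of each other.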
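Slide 21 describes Lucene serving as a fast key-value store, with lookup of data sets by publication DOI as the example. Conceptually this is an exact-match lookup on an untokenized key term. The sketch below models that with an in-memory map; it is not Lucene code, the class `DoiLookupSketch` and its methods are hypothetical, and lower-casing the key is an assumption made here because DOI names are case-insensitive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: exact-term lookup from a publication DOI to data set ids. */
final class DoiLookupSketch {

    /** Maps one normalized DOI term to the ids of all data sets that reference it. */
    private final Map<String, List<String>> postings = new HashMap<>();

    /** Trims and lower-cases the DOI so that lookups ignore case and padding. */
    static String normalize(String doi) {
        return doi.trim().toLowerCase(Locale.ROOT);
    }

    /** Registers a data set id under the DOI of a related publication. */
    void add(String publicationDoi, String dataSetId) {
        postings.computeIfAbsent(normalize(publicationDoi), k -> new ArrayList<>())
                .add(dataSetId);
    }

    /** Returns the data set ids for a DOI in insertion order, or an empty list. */
    List<String> lookup(String publicationDoi) {
        List<String> ids = postings.get(normalize(publicationDoi));
        if (ids == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(ids);
    }
}
```

A single term lookup of this kind needs no scoring or text analysis, which is one plausible reason such queries can be answered at a high rate.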