GeoServer on Steroids
1. GeoServer on steroids
All you wanted to know about how to make GeoServer faster
but you never asked (or you did and no one answered)
Ing. Andrea Aime, GeoSolutions
Ing. Simone Giannecchini, GeoSolutions
FOSS4G 2011, Denver
12th-16th September 2011
2. GeoSolutions
Founded in Italy in late 2006
Expertise
• Image Processing, GeoSpatial Data Fusion
• Java, Java Enterprise, C++, Python
• JPEG2000, JPIP, Advanced 2D visualization
Supporting/Developing FOSS4G projects
GeoTools, GeoServer
GeoBatch, GeoNetwork
Clients
Public Agencies
Private Companies
http://www.geo-solutions.it
4. Raster Data Checklist
Objectives
Fast extraction of a subset of the data
Fast extraction of overviews
Checklist
Avoid having to open a large number of files per request
Avoid parsing complex structures
Avoid on-the-fly reprojection (if possible)
Get to know your bottlenecks
CPU vs disk access time vs memory
Experiment with
Format, compression, different color models, tile size, overviews, configuration (in GeoServer, of course)
5. Problematic Formats
PNG/JPEG direct serving
Bad formats (especially in Java)
No tiling (or rarely supported)
Chew a lot of memory and CPU for decompression
Mitigate with external overviews
NetCDF/GRIB1 and similar formats
Complex formats (often with many subdatasets)
Often contain uncalibrated data
Usually need multiple dimensions
Use ImageMosaic
Usually need to massage the data before serving
e.g. transpose X,Y
6. Problematic Formats
ASCII Grid, GTOPO30, IDRISI and similar formats are bad
ASCII formats are bad
No internal tiling, no compression, no internal overviews
JPEG2000 (with Kakadu)
Extensible and rich, not (always) fast
Can be difficult to tune for performance (might require specific encoding options)
ECW and MrSID
Why bother? They're proprietary
7. Choosing Formats and Layouts
To remember: GeoTiff is a Swiss Army knife
But you don't want to cut down a tree with it!
Tremendously flexible, a good fit for most (not all) use cases
BigTiff pushes the GeoTiff limits further
Single File vs Mosaic vs Pyramids
Use a single GeoTiff when
The file, including overviews and tiling, stays within 4 GB
There are no additional dimensions
Consider BigTiff for very large files (> 4 GB)
Support for tiling
Support for overviews
Can be inefficient with very large files + small tiling
8. Choosing Formats and Layouts
Use ImageMosaic when:
A single file gets too big (inefficient seeks, too much metadata to read, etc.)
Multiple dimensions (time, elevation, others)
Avoid mosaics made of many very small files
Single granules can be large
Use tiling + overviews + compression on granules
Use ImagePyramid when:
Tremendously large dataset
Too many files / too large files
Need to serve at all scales
Especially low resolution
For single granules (< 2 GB) GeoTiff is generally a good fit
9. Choosing Formats and Layouts
Examples:
Small dataset: single 2 GB GeoTiff file
Medium dataset: single 40 GB BigTiff
Large dataset: 400 GB mosaic made of 10 GB BigTiff files
Extra large: 4 TB of imagery, built as a pyramid of mosaics of BigTiff/GeoTiff files to keep the file count low
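The rules of thumb from the last three slides can be condensed into a small decision helper. This is a minimal sketch, not GeoServer logic: the function and field names are hypothetical, and apart from the 4 GB classic TIFF limit the size cut-offs (100 GB for a single BigTiff, 1 TB before switching to a pyramid) are assumptions loosely read off the examples above.

```python
from dataclasses import dataclass

GB = 1024 ** 3
CLASSIC_TIFF_LIMIT = 4 * GB  # classic TIFF uses 32-bit offsets


@dataclass
class Dataset:
    size_bytes: int                  # imagery plus overviews
    extra_dimensions: bool = False   # time, elevation, ...


def choose_layout(ds: Dataset,
                  single_file_max: int = 100 * GB,     # assumed cut-off
                  mosaic_max: int = 1024 * GB) -> str:  # assumed cut-off
    """Suggest a layout following the rules of thumb on the slides."""
    if ds.extra_dimensions:
        return "ImageMosaic"  # extra dimensions are handled by ImageMosaic
    if ds.size_bytes < CLASSIC_TIFF_LIMIT:
        return "single GeoTiff"
    if ds.size_bytes <= single_file_max:
        return "single BigTiff"
    if ds.size_bytes <= mosaic_max:
        return "ImageMosaic of BigTiff granules"
    return "ImagePyramid of ImageMosaics"
```

With these assumed thresholds the four examples above map to a single GeoTiff, a single BigTiff, a mosaic and a pyramid respectively; the cut-offs are the knobs to benchmark, not fixed values.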
10. GeoTiff preparation
STEP 0: get to know your data
The gdalinfo utility is your friend
Checklist
Missing CRS
Add a .prj file
Fix with gdal_translate
Missing georeferencing
Add a world file
Fix with gdal_translate
Bad tiling
Fix with gdal_translate
Missing overviews
Use gdaladdo
Compression
Use gdal_translate
11. GeoTiff preparation
STEP 1: fix and optimize with gdal_translate
Inner tiling
gdal_translate -co "TILED=YES" -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512" in.tif out.tif
Check also the GDAL GeoTiff driver creation options
CRS and georeferencing
gdal_translate -a_srs "EPSG:32619" -a_ullr 285409.2 2014405.2 287536.8 2011947.6 in.tif out.tif
STEP 2: add overviews with gdaladdo
Leverages TIFF support for multi-page files and reduced-resolution pages
gdaladdo -r cubic output.tif 2 4 8 16 32 64 128
Choose the resampling algorithm wisely
Choose the tile size and compression wisely (use GDAL_TIFF_OVR_BLOCKSIZE)
Consider external overviews
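One way to pick the gdaladdo levels is to keep halving until the smallest overview fits in a single tile, so a fully zoomed-out request reads just one tile. The sketch below assumes that stopping rule (the slide itself uses a fixed 2 ... 128 list) and only builds the command rather than running it; the function names are hypothetical.

```python
import math
from typing import List


def overview_factors(width: int, height: int, tile: int = 512) -> List[int]:
    """Power-of-two decimation factors, stopping once an overview fits in one tile."""
    if max(width, height) <= tile:
        return []  # already fits in a single tile, no overviews needed
    factors = []
    factor = 2
    while True:
        factors.append(factor)
        if math.ceil(max(width, height) / factor) <= tile:
            return factors
        factor *= 2


def gdaladdo_command(path: str, width: int, height: int,
                     resampling: str = "cubic", tile: int = 512) -> List[str]:
    """Build (but do not run) a gdaladdo invocation for the given image size."""
    levels = [str(f) for f in overview_factors(width, height, tile)]
    return ["gdaladdo", "-r", resampling, path, *levels]
```

For a 10000x10000 image with 512-pixel tiles this yields `gdaladdo -r cubic output.tif 2 4 8 16 32`. The list can be passed to `subprocess.run(cmd, check=True)`; GDAL_TIFF_OVR_BLOCKSIZE, like other GDAL configuration options, can also be supplied as an environment variable through its `env` mapping.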
13. GeoTiff preparation
Compression
Consider it when disk speed/space is an issue
Control it with gdal_translate and creation options
GeoTiff tiles can be compressed
LZW/Deflate are good for lossless compression
JPEG is good for visually lossless compression
From experience
Use LZW/Deflate on geophysical data (DEM, acquisitions)
Use visually lossless JPEG with photometric interpretation set to YCbCr for RGB imagery
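The compression advice can be captured as creation options for gdal_translate. A sketch that only assembles the argument list: TILED, BLOCKXSIZE, BLOCKYSIZE, COMPRESS, PHOTOMETRIC and JPEG_QUALITY are GDAL GeoTiff creation options, while the function name, the quality value of 85 and the choice of DEFLATE for non-RGB data are assumptions to benchmark against your own data.

```python
from typing import List


def gdal_translate_command(src: str, dst: str, rgb: bool,
                           block: int = 512, jpeg_quality: int = 85) -> List[str]:
    """Assemble gdal_translate arguments for a tiled, compressed GeoTiff."""
    options = ["TILED=YES", f"BLOCKXSIZE={block}", f"BLOCKYSIZE={block}"]
    if rgb:
        # Visually lossless JPEG; YCbCr lets the JPEG codec subsample chroma
        options += ["COMPRESS=JPEG", "PHOTOMETRIC=YCBCR", f"JPEG_QUALITY={jpeg_quality}"]
    else:
        # Lossless compression for geophysical data (DEMs, acquisitions)
        options += ["COMPRESS=DEFLATE"]
    command = ["gdal_translate"]
    for option in options:
        command += ["-co", option]
    return command + [src, dst]
```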
14. Time, Elevation and other dimensions
Use cases:
MetOc data (support for time, elevation)
Data with additional independent dimensions
Workflow
Split into multiple GeoTiff files
Optimize the files individually
Use ImageMosaic
Use a DBMS for indexing granules
Use file-name-based property collectors to turn properties into DB row attributes
Filter by time, elevation and other attributes via OGC and CQL filters
Check the backup slides for more info!
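The idea behind file-name-based property collectors can be sketched in a few lines: regular expressions pull time and elevation out of each granule name, each match becomes an attribute of an index row, and filtering on those attributes picks the granules for a request. In GeoServer this is configured through the regex and indexer property files rather than written by hand; the naming convention, regexes and functions below are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, List

# Hypothetical naming convention: <variable>_<yyyymmddTHHMMSS>_<elevation>.tif
TIME_REGEX = re.compile(r"[0-9]{8}T[0-9]{6}")
ELEVATION_REGEX = re.compile(r"_([0-9]+)\.tif$")


def collect_properties(filename: str) -> Dict[str, object]:
    """Turn one granule file name into the attributes of an index row."""
    time_match = TIME_REGEX.search(filename)
    elevation_match = ELEVATION_REGEX.search(filename)
    if time_match is None or elevation_match is None:
        raise ValueError(f"unexpected granule name: {filename}")
    return {
        "location": filename,
        "time": datetime.strptime(time_match.group(0), "%Y%m%dT%H%M%S"),
        "elevation": int(elevation_match.group(1)),
    }


def matching_granules(rows: List[Dict[str, object]], when: datetime,
                      elevation: int) -> List[str]:
    """Equivalent of filtering the index on the time and elevation attributes."""
    return [r["location"] for r in rows
            if r["time"] == when and r["elevation"] == elevation]
```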
15. Time, Elevation and other dimensions
Indexing multiple dimensions with DB support
datastore.properties
timeregex.properties
stringregex.properties
indexer.properties
16. Time, Elevation and other dimensions
17. Proper Mosaic Preparation
ImageMosaic stitches single granules together with basic processing
Filtered selection
Overviews/decimation on read
Over/downsampling in memory
ColorMask (optional)
Mosaic/stitch
ColorMask again (optional)
Optimize files as if you were serving them individually
Keep a balance between the number and the dimensions of granules
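Filtered selection, the first step above, boils down to picking the granules whose footprint intersects the requested area; every extra granule selected means another file to open and decode, which is why the number and size of granules need balancing. A sketch with a linear scan over a hypothetical in-memory index (a real mosaic index would rely on a spatial index in the shapefile or DBMS backing it).

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float

    def intersects(self, other: "BBox") -> bool:
        # Strict comparisons: boxes that only share an edge do not count
        return (self.minx < other.maxx and other.minx < self.maxx and
                self.miny < other.maxy and other.miny < self.maxy)


class Granule(NamedTuple):
    location: str
    bounds: BBox


def select_granules(index: List[Granule], request: BBox) -> List[str]:
    """Only granules overlapping the request get opened and stitched."""
    return [g.location for g in index if g.bounds.intersects(request)]
```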
18. Proper Mosaic Configuration
STEP 0: Configure Coverage Access (see the Coverage Options Configuration slides)
STEP 1: Configure Mosaic Parameters
ALLOW_MULTITHREADING
Load data from different granules in parallel
Needs USE_JAI_IMAGEREAD set to false (immediate mode)
Use a proper tile size
In-memory processing, must not be too large
Disk tiling should be larger
If memory is scarce:
USE_JAI_IMAGEREAD to true
USE_MULTITHREADING to false*
Otherwise
USE_JAI_IMAGEREAD to false
ALLOW_MULTITHREADING to true
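ALLOW_MULTITHREADING amounts to reading the selected granules concurrently instead of one after the other, at the price of holding more decoded data in memory at once, hence the advice to turn it off when memory is scarce. This is a conceptual sketch with Python threads and a simulated read; read_granule and load_granules are hypothetical and do not reflect how ImageMosaic is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def read_granule(location: str) -> bytes:
    """Stand-in for decoding one granule: simulated I/O wait plus fake pixels."""
    time.sleep(0.05)
    return location.encode()


def load_granules(locations: List[str], allow_multithreading: bool,
                  threads: int = 4) -> Dict[str, bytes]:
    """Read all granules, either sequentially or with a thread pool."""
    if not allow_multithreading:
        return {loc: read_granule(loc) for loc in locations}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map keeps the input order, so results line up with locations
        return dict(zip(locations, pool.map(read_granule, locations)))
```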
19. Proper Mosaic Configuration
Optional (Advanced): Configure Mosaic Parameters Directly
Caching
Load the index in memory (using a JTS STRtree)
Super fast granule lookup, good for shapefiles
Bad if you have additional dimensions to filter on
Based on soft references, controlled via the Java switch -XX:SoftRefLRUPolicyMSPerMB
ExpandToRGB
Expand colormapped imagery to RGB in memory
Trade performance for quality
SuggestedSPI
Default ImageIO decoder class to use
Don't touch unless you are an expert
20. Proper Pyramid Preparation
Use gdal_retile.py for creating the pyramid
Prepare the list of tiles to be retiled
Create the pyramid with gdal_retile.py (grab a coffee!)
Chunks should not be too small (here 2048x2048)
Too many files is bad anyway
Use internal tiling for larger chunk sizes
If the input dataset is huge, use the -useDirForEachRow option
Too many files in a directory is bad practice
Make sure the number of levels is consistent
Too few levels: bad performance at large scale denominators (zoomed out)
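A consistent number of levels can be worked out up front: halve the resolution until a whole level fits in a single chunk. The sketch below assumes 2048x2048 chunks as on the slide and also reports how many files each level produces, which helps judge whether -useDirForEachRow is needed; the function is hypothetical, and the resulting count still has to be mapped onto gdal_retile.py's own -levels option.

```python
from typing import List, Tuple


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def pyramid_levels(width: int, height: int,
                   chunk: int = 2048) -> List[Tuple[int, int, int, int]]:
    """(level, width, height, chunk files) per halving step, base level first.

    Stops at the first level that fits in a single chunk.
    """
    levels = []
    level = 0
    while True:
        files = ceil_div(width, chunk) * ceil_div(height, chunk)
        levels.append((level, width, height, files))
        if files == 1:
            return levels
        width, height = ceil_div(width, 2), ceil_div(height, 2)
        level += 1
```

For a 100000x100000 base image this gives the base plus 6 reduced levels, going from 2401 chunk files at full resolution down to a single file.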
21. Proper Pyramid Configuration
STEP 0: Configure Coverage Access (see the Coverage Options Configuration slides)
STEP 1: Configure Pyramid Parameters
ImagePyramid relies on ImageMosaic
ALLOW_MULTITHREADING
Load data from different granules in parallel
Needs USE_JAI_IMAGEREAD set to false (immediate mode)
Use a proper tile size
In-memory processing, must not be too large
Disk tiling should be larger
If memory is scarce:
USE_JAI_IMAGEREAD to true
USE_MULTITHREADING to false*
Otherwise
USE_JAI_IMAGEREAD to false
ALLOW_MULTITHREADING to true
22. Proper Pyramid Configuration
Optional (Advanced): Configure Mosaic Parameters Directly
Caching
Load the index in memory (using a JTS STRtree)
Super fast granule lookup, good for shapefiles
Bad if you have additional dimensions to filter on
Based on soft references, controlled via the Java switch -XX:SoftRefLRUPolicyMSPerMB
ExpandToRGB
Expand colormapped imagery to RGB in memory
Trade performance for quality
SuggestedSPI
Default ImageIO decoder class to use
Don't touch unless you are an expert
23. Proper GDAL Formats Configuration
Fix missing/improper CRS with a PRJ file or the coverage configuration
Fix missing georeferencing with a world file
Make sure GDAL_DATA is properly configured
Use a proper tile size
In-memory processing, must not be too large
Fundamental for striped data! (JNI overhead)
Disk tiling should be larger
If memory is scarce:
USE_JAI_IMAGEREAD to true
USE_MULTITHREADING to true*
Otherwise
USE_JAI_IMAGEREAD to false
USE_MULTITHREADING is ignored
24. Proper JPEG2000 Kakadu Configuration
Fix missing/improper CRS with a PRJ file or the coverage configuration
Fix missing georeferencing with a world file
Make sure the Kakadu native library (DLL/.so) is properly loaded
Use a proper tile size
In-memory processing
Must not be too large
Disk tiling should be larger
If memory is scarce:
USE_JAI_IMAGEREAD to true
USE_MULTITHREADING to true*
Otherwise
USE_JAI_IMAGEREAD to false
USE_MULTITHREADING is ignored
25. Proper GeoServer Coverage Options Configuration
Make sure native JAI and ImageIO are installed
Enable ImageIO native acceleration
Enable JAI mosaicking native acceleration
Give JAI enough memory
Don't raise the JAI memory threshold too high
Rule of thumb: use 2 x #cores tile threads (check next slide)
Enable tile recycling only on trunk
Enable tile recycling if memory is not a problem
26. Proper GeoServer Coverage Options Configuration
Multithreaded Granule Loading
Allows fine-tuning multithreading for ImageMosaic
Orthogonal to JAI tile threads
Rule of thumb: use 2 x #cores threads
Perform testing to fine-tune depending on layer configuration as well as on typical requests
ImageIO cache threshold
Decides when to switch to the disk cache (very large WCS requests)
27. Reprojection Performance vs Quality
GeoServer 2.1.x reprojects raster data using a piecewise-linear algorithm
The area is divided into rectangular blocks, each having its own affine transform
How closely the linear blocks must follow the full trigonometric expressions is driven by a tolerance (default value 0.333)
Larger values make reprojection faster, but lower the quality
-Dorg.geotools.referencing.resampleTolerance=0.5
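The piecewise-linear idea can be illustrated with a simplified recursive subdivision: fit an affine transform on the corners of a block, measure how far it strays from the exact transform at a few sample points, and split the block in four while the error exceeds the tolerance. This is only an illustration, not the GeoTools implementation: the toy Mercator transform, the 100 px per degree scale, the 3x3 sampling and the error measured in output pixels are all assumptions.

```python
import math
from typing import Callable, List, Tuple

Transform = Callable[[float, float], Tuple[float, float]]


def toy_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """'Full trigonometric' transform: degrees to pixels (100 px per degree)."""
    y = math.degrees(math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)))
    return 100.0 * lon, 100.0 * y


def affine_error(f: Transform, x0: float, y0: float, x1: float, y1: float) -> float:
    """Max pixel distance between f and the affine map fitted on three corners."""
    p00, p10, p01 = f(x0, y0), f(x1, y0), f(x0, y1)
    worst = 0.0
    for u in (0.0, 0.5, 1.0):
        for v in (0.0, 0.5, 1.0):
            ax = p00[0] + u * (p10[0] - p00[0]) + v * (p01[0] - p00[0])
            ay = p00[1] + u * (p10[1] - p00[1]) + v * (p01[1] - p00[1])
            ex, ey = f(x0 + u * (x1 - x0), y0 + v * (y1 - y0))
            worst = max(worst, math.hypot(ex - ax, ey - ay))
    return worst


def split_blocks(f: Transform, x0: float, y0: float, x1: float, y1: float,
                 tolerance: float, depth: int = 0,
                 max_depth: int = 10) -> List[Tuple[float, float, float, float]]:
    """Subdivide until each block's affine approximation is within tolerance."""
    if depth >= max_depth or affine_error(f, x0, y0, x1, y1) <= tolerance:
        return [(x0, y0, x1, y1)]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    blocks = []
    for bx0, by0, bx1, by1 in ((x0, y0, xm, ym), (xm, y0, x1, ym),
                               (x0, ym, xm, y1), (xm, ym, x1, y1)):
        blocks += split_blocks(f, bx0, by0, bx1, by1, tolerance, depth + 1, max_depth)
    return blocks
```

A larger tolerance accepts blocks earlier in the recursion, so it never produces more blocks than a smaller one. Fewer blocks means less work, which is the speed-for-quality trade the system property controls.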
29. Vector data checklist
What do we want from vector data:
Binary data
No complex parsing of data structures
Fast extraction of a geographic subset
Fast filtering on the most commonly used attributes
30. Choosing a format
Slow formats:
WFS
GML
DXF
Good formats, local and indexable:
Shapefile
Directory of shapefiles
SDE
Spatial databases: PostGIS, Oracle Spatial, DB2, MySQL*, SQL Server*
31. Shapefiles vs DBMS
Speed comparison vs spatial extent depicted:
Shapefile is very fast when rendering the full dataset
Database is faster when extracting a small subset of a very large dataset
Shapefile
No attribute indexing: avoid it if filtering on attributes is important (filtering == reading less data, not applying symbols)
Database
Rich support for complex native filters
Use connection pooling (preferably via JNDI)
Validate connections (with proper pooling)
32. Shapefile preparation
Remove the .qix file if present and let GeoServer 2.1.x rebuild it (more efficient)
If there are large DBF attributes that are not in use, get rid of them using ogr2ogr, e.g.:
ogr2ogr -select FULLNAME,MTFCC arealm.shp tl_2010_08013_arealm.shp
If on Linux, enable memory mapping: faster and more scalable (but it will kill Windows)
33. Shapefile filtering
Stuck with shapefiles and have scale-dependent rules like the following?
Show highways first
Show all streets when zoomed in
Use ogr2ogr to build two shapefiles, one with just the highways, one with everything, and build two layers, e.g.:
ogr2ogr -sql "SELECT * FROM tl_2010_08013_roads WHERE MTFCC in ('S1100', 'S1200')" primaryRoads.shp tl_2010_08013_roads.shp
34. PostGIS specific hints
Out of the box, PostgreSQL is configured for very small hardware:
http://wiki.postgresql.org/wiki/Performance_Optimization
Make sure to run ANALYZE after data imports (updates optimizer stats)
As usual, avoid large joins in SQL views; consider materialized views
If the dataset is massive, CLUSTER on the spatial index:
http://postgis.refractions.net/documentation/manual-1.3/ch05.html
Careful with prepared statements (bad performance)
36. Use scale dependencies
Never show too much data
The map should be readable, not a graphic blob. Rule of thumb: 1000 features max in the display
37. Labeling
Labeling conflict resolution is expensive, limit it to the innermost zoom levels
Halo is important for readability, but adds significant overhead
Careful with maxDisplacement: it results in multiple label placement attempts
38. FeatureTypeStyle
GeoServer uses SLD FeatureTypeStyle objects as Z layers for painting
Each one allocates its own rendering surface (which can use a lot of memory), use as few as possible
39. Use translucency sparingly
Translucent display is expensive, use it sparingly
40. Scale dependent rules
Too often forgotten or little used, yet very important:
Hide layers when too zoomed in (raster/vector example)
Progressively show details
Add more expensive rendering when there are fewer features
Key to any high performance / good looking map
42. Hide as you zoom in
Add a MinScaleDenominator to the rule
This will make the layer disappear when zooming in past 1:75000 (towards 1:1)
43. Alternative rendering
Simple rendering when zoomed out (scale denominator 2000 and above)
More complex rendering when zoomed in (scale denominator below 2000, i.e. 1:1999 and larger scales)
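In SLD and Symbology Encoding a rule is active when MinScaleDenominator <= current scale denominator < MaxScaleDenominator (the minimum is inclusive, the maximum exclusive). A small hypothetical helper makes the two slides above concrete: the hide-as-you-zoom-in rule uses a minimum of 75000, and the two alternative renderings split at 2000 so that exactly one of them is active at any scale.

```python
from typing import Optional


def rule_applies(scale_denominator: float,
                 min_scale: Optional[float] = None,
                 max_scale: Optional[float] = None) -> bool:
    """Active when min_scale <= scale_denominator < max_scale (bounds optional)."""
    if min_scale is not None and scale_denominator < min_scale:
        return False
    if max_scale is not None and scale_denominator >= max_scale:
        return False
    return True
```

Using the same value as the MinScaleDenominator of one rule and the MaxScaleDenominator of the other avoids both gaps and double rendering at the boundary.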
45. Point symbols
• 600 lines of code for 6 different point types
• Painful…
46. Prepare data
alter table pointlm add column image varchar;
update pointlm set image = 'shop_supermarket.p.16.png' where MTFCC = 'C3081' and (FULLNAME like '%Shopping%' or FULLNAME like '%Mall%');
update pointlm set image = 'peak.png' where MTFCC = 'C3022';
update pointlm set image = 'amenity_prison.p.20.png' where MTFCC = 'K1236';
update pointlm set image = 'museum.p.16.png' where MTFCC = 'K2165';
update pointlm set image = 'airport.p.16.png' where MTFCC = 'K2451';
update pointlm set image = 'school.png' where MTFCC = 'K2543';
update pointlm set image = 'christian3.p.14.png' where MTFCC = 'K2582';
update pointlm set image = 'gate2.png' where MTFCC = 'K3066';
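The same data preparation can be written table-driven, so that adding a point type means adding a dictionary entry rather than another statement. This sketch uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostGIS. The table and column names and the icon file names come from the SQL above, while the assign_icons function is hypothetical.

```python
import sqlite3

# MTFCC code -> icon, copied from the update statements above
ICON_BY_MTFCC = {
    "C3022": "peak.png",
    "K1236": "amenity_prison.p.20.png",
    "K2165": "museum.p.16.png",
    "K2451": "airport.p.16.png",
    "K2543": "school.png",
    "K2582": "christian3.p.14.png",
    "K3066": "gate2.png",
}


def assign_icons(conn: sqlite3.Connection) -> None:
    """Add the image column and fill it from the MTFCC code (and name)."""
    conn.execute("ALTER TABLE pointlm ADD COLUMN image VARCHAR")
    conn.executemany("UPDATE pointlm SET image = ? WHERE mtfcc = ?",
                     [(icon, code) for code, icon in ICON_BY_MTFCC.items()])
    # Shopping centers need an extra condition on the feature name
    conn.execute("UPDATE pointlm SET image = 'shop_supermarket.p.16.png' "
                 "WHERE mtfcc = 'C3081' "
                 "AND (fullname LIKE '%Shopping%' OR fullname LIKE '%Mall%')")
    conn.commit()
```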
56. WMS request limits
Max memory per request: avoids large requests and lets you size the server memory (max concurrent requests * max memory per request)
Max time per request: avoids requests taking too much time (e.g., using a custom style provided with dynamic SLD in the request)
Max errors: the renderer is best effort, but handling errors takes time
57. WFS request limits
Max features returned, configured as a global limit
Return feature bbox: disable it to reduce the amount of generated GML
Per-layer max feature count
59. Control flow
Control how many requests are executed in parallel, queue
others:
Increase throughput
Control memory usage
Enforce fairness
More info in the GeoServer control flow module documentation
60. Control flow
$GEOSERVER_DATA_DIR/controlflow.properties
# don't allow more than 16 GetMap requests in parallel
ows.wms.getmap=16
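Conceptually, a control-flow limit is a counting semaphore in front of the request handler: up to N requests run and the rest wait in line. The sketch reads the limit from a properties line like the one above and enforces it with threading.BoundedSemaphore. It illustrates the queuing idea, not the control-flow module's internals; RequestGate, fake_getmap, serve_all and the 64 MB per-request figure in the check below are hypothetical. Multiplying the limit by the per-request memory cap from the WMS request limits gives a worst-case memory bound.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def parse_limit(properties_text: str, key: str) -> int:
    """Read one integer limit from controlflow.properties-style text."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, _, value = line.partition("=")
        if name.strip() == key:
            return int(value.strip())
    raise KeyError(key)


class RequestGate:
    """Lets at most `limit` requests run at once; the others wait in line."""

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest concurrency observed

    def run(self, work, *args):
        with self._slots:  # blocks while all slots are busy
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return work(*args)
            finally:
                with self._lock:
                    self.running -= 1


def fake_getmap(i: int) -> int:
    time.sleep(0.01)  # pretend to render a map
    return i


def serve_all(gate: RequestGate, n_requests: int, client_threads: int = 16):
    """Fire more concurrent clients than the gate allows."""
    with ThreadPoolExecutor(max_workers=client_threads) as pool:
        return list(pool.map(lambda i: gate.run(fake_getmap, i), range(n_requests)))


def worst_case_memory_mb(limit: int, max_mb_per_request: int) -> int:
    """Upper bound: every running request hits the per-request memory cap."""
    return limit * max_mb_per_request
```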
61. Auditing
Log each and every request
Log contents driven by customizable template
Summarize and analyze requests with offline tools
More info in the GeoServer monitoring extension documentation
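Offline analysis can start as simply as grouping the log by service and layer and averaging response times. The log layout below is a made-up CSV, since the real one depends on the template you configure; adapt the column names and the parsing accordingly. The summarize function is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical audit log rendered by a custom template
SAMPLE_LOG = """service,request,layer,elapsed_ms
WMS,GetMap,roads,120
WMS,GetMap,roads,180
WMS,GetMap,ortho,950
WFS,GetFeature,parcels,400
"""


def summarize(log_text: str) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Request count and mean response time per (service, layer)."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(log_text)):
        groups[(row["service"], row["layer"])].append(int(row["elapsed_ms"]))
    return {key: (len(times), mean(times)) for key, times in groups.items()}
```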
62. JVM and deploy configuration
63. Premise
The options discussed here are not going to help visibly if you did not prepare the data and the styles
They are finishing touches that can get performance up once the major data bottlenecks have been dealt with
Check the “Running in production” instructions in the GeoServer user manual
64. JVM settings
-server: enables the server JIT compiler
-Xms2048m -Xmx2048m: sets the JVM to use two gigabytes of memory
-XX:+UseParallelOldGC -XX:+UseParallelGC: enables multi-threaded garbage collection, useful if you have more than two cores
-XX:NewRatio=2: sizes the young generation at one third of the heap, suited to a high number of short-lived objects
-XX:+AggressiveOpts: enables experimental optimizations that will be defaults in future versions of the JVM
65. Native JAI and JDK
Install native JAI and use a recent Sun JDK!
Benchmark over a small data set (the effect is not as
visible on larger ones)
66. Set up a local cluster
Java2D locks when drawing antialiased vectors
Limits scalability severely
Use Apache mod_proxy_balancer and set up one GeoServer every 2-4 cores
(Diagram: mod_proxy_balancer in front of several GeoServer instances)
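The balancing idea can be sketched as sizing the number of instances from the core count and handing requests to them in turn. This is a plain round-robin illustration with hypothetical backend URLs and function names, not how mod_proxy_balancer schedules requests (it has its own load-balancing methods).

```python
import itertools
from typing import Iterator, List


def instance_count(cores: int, cores_per_instance: int = 2) -> int:
    """One GeoServer every 2-4 cores, as suggested above (at least one)."""
    return max(1, cores // cores_per_instance)


def backends(n: int) -> List[str]:
    # Hypothetical local instances listening on consecutive ports
    return [f"http://localhost:{8081 + i}/geoserver" for i in range(n)]


def round_robin(nodes: List[str]) -> Iterator[str]:
    """Hand each incoming request to the next instance in turn."""
    return itertools.cycle(nodes)
```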
67. Clustering advantage
FOSS4G 2010 vector benchmarks (roads/buildings/isolines and so on, over the whole of Spain)
GeoServer was benchmarked without local clustering
69. Using JMeter
Good benchmarking tool
Allows setting up multiple thread groups, with different parallelism and request counts, to ramp up the load
Can use CSV files to generate semi-randomized requests
Reports results in a simple table
http://jakarta.apache.org/jmeter/
70. Using JMeter
Thread group: how many threads
Loop: how many requests
HTTP sampler: the request
CSV: read request params from CSV
Summary table
71. Generating the CSV
Simple randomized generation tool built during the WMS shootouts, wms_request.py
Generate a CSV with the bbox and width/height to be used in JMeter scripts:
./wms_request.py -count 1200 -region -180 -90 180 90 -minres 0.002 -maxres 0.1 -minsize 256 256 -maxsize 1024 1024
Get it here along with a corresponding JMeter script:
http://demo1.geo-solutions.it/share/jmeter_2011.zip
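For readers who want to roll their own, a generator in the same spirit might pick a random image size and ground resolution, then place a bbox of matching extent inside the region. This is not wms_request.py. The function name, the output layout (semicolon-separated width, height and a comma-separated bbox, so the bbox commas don't clash with the field separator) and the fixed seed are assumptions, and JMeter's CSV Data Set Config would need its delimiter set to match.

```python
import csv
import io
import random
from typing import Tuple


def generate_requests(count: int, region: Tuple[float, float, float, float],
                      minres: float, maxres: float,
                      minsize: Tuple[int, int], maxsize: Tuple[int, int],
                      seed: int = 42) -> str:
    """Return CSV text with one width;height;bbox line per random GetMap request."""
    rng = random.Random(seed)  # fixed seed: repeatable benchmark runs
    minx, miny, maxx, maxy = region
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for _ in range(count):
        width = rng.randint(minsize[0], maxsize[0])
        height = rng.randint(minsize[1], maxsize[1])
        # Ground units per pixel, shrunk if the map would not fit in the region
        res = min(rng.uniform(minres, maxres),
                  (maxx - minx) / width, (maxy - miny) / height)
        x0 = rng.uniform(minx, maxx - width * res)
        y0 = rng.uniform(miny, maxy - height * res)
        bbox = f"{x0:.6f},{y0:.6f},{x0 + width * res:.6f},{y0 + height * res:.6f}"
        writer.writerow([width, height, bbox])
    return out.getvalue()
```

Calling it with the same arguments as the command line above (1200 requests over the whole world, resolutions 0.002 to 0.1, sizes 256 to 1024) produces a file of the same shape.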
72. Checking results
Results table
Run the benchmarks 2-3 times, let the results stabilize
Save the results, check other optimizations, compare the results
75. Raster data
The whole of Italy at 50 cm per pixel
Over 4 TB, fully updated every 3 years (old data still available for historical access)
Custom pyramid
100 m per pixel: one image
20 m per pixel: mosaic of 20 tiles
4 m per pixel: mosaic of a few hundred tiles
0.5 m per pixel: 9000 tiles
Each tile is 10000x10000 pixels, with overviews
76. Vector data
Cadastral data for the whole of Italy, with full history (interval of validity for each parcel)
100 million polygons
A query extracts the subset relative to a certain time interval and to the area the user is allowed to see
No data from this table is ever shown below 1:50000 (SLD scale dependencies)
Physical table-level partitioning (Oracle style) based on geographic area to parallelize and cluster data loading, plus spatial indexing and indexes on commonly filtered attributes
77. The End
Questions?
andrea.aime@geo-solutions.it
simone.giannecchini@geo-solutions.it