2. Processing large datasets
• Amalgamate datasets from 32 local authorities
• Complex spatial operations on very large datasets:
– Can be slow
3. Example
• Clipping a polygon dataset to local authority boundaries throughout Scotland
• Complexity added by the huge number of polygons
– Islands (Shetland, Orkney, Eilean Siar, Argyll & Bute)
5. Performance improvements?
• Clipper transformer
– 'Clippers first': features are processed as soon as possible rather than sitting unnecessarily in memory
• Batch processing
– WorkspaceRunner
• Spatial database
– make use of spatial indices
10. Using a spatial database with SQLExecutor
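-- Clip each input feature to the boundary of its own local authority.
SELECT a.temp_key,
       a.local_auth,
       CASE
         -- Entirely inside the boundary: keep the geometry unchanged.
         WHEN ST_CoveredBy(a.geom, b.geom) THEN a.geom
         -- Otherwise keep only the part that falls inside the boundary.
         ELSE ST_Intersection(a.geom, b.geom)
       END AS geom
FROM input_table AS a,
     clipper_table AS b
WHERE ST_Intersects(a.geom, b.geom)
  AND a.local_auth = b.local_auth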
17. WorkspaceRunner
• Huge potential for performance gains by batching processes and running them in parallel.
• Creating ‘master’ workbenches which can be run by altered ‘runner’ workbenches.
– Use like a custom transformer: updates to the master flow through to every workspace that calls it.
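The batching idea can be sketched outside FME in plain Python. This is only an analogue of what the WorkspaceRunner does, not the transformer itself: it assumes the master workspace can be launched from the command line as `fme <workspace> --NAME value`, and the workspace name `clip_master.fmw` and the published parameter `LOCAL_AUTH` are hypothetical names chosen for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(workspace, local_auth, fme_exe="fme"):
    """Build the command line for one run of the master workspace.

    LOCAL_AUTH is a hypothetical published parameter that selects
    which local authority this run should process.
    """
    return [fme_exe, workspace, "--LOCAL_AUTH", local_auth]


def run_batch(workspace, local_auths, max_workers=4, run=subprocess.call):
    """Run the master workspace once per local authority, in parallel.

    Returns one exit code per authority, in the same order as the input.
    `run` is injectable so the batching logic can be tried without FME.
    """
    commands = [build_command(workspace, la) for la in local_auths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


# Example call (needs an FME installation and the hypothetical workspace):
# exit_codes = run_batch("clip_master.fmw", ["Shetland", "Orkney", "Eilean Siar"])
```

Threads are enough here because each worker only waits on an external process; `max_workers` would be tuned to the available cores and licences.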
18. FeatureReader
• What if you don’t know which table to read?
– E.g. read the most recent file in a directory
• FeatureReader can help!
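As a rough illustration of the "most recent file" case, the lookup itself might look like the Python sketch below. The function name `most_recent_file` and the example path are hypothetical, and it assumes that "most recent" means the latest modification time. The slides do not say how the path reaches the FeatureReader; one plausible route is to store it in an attribute and use that attribute as the dataset.

```python
from pathlib import Path


def most_recent_file(directory, pattern="*"):
    """Return the most recently modified file matching `pattern`.

    Returns None when the directory holds no matching file.
    """
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not files:
        return None
    # Compare by modification time; the newest file wins.
    return max(files, key=lambda p: p.stat().st_mtime)


# Example call (hypothetical directory):
# latest = most_recent_file("/data/incoming", "*.shp")
```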