This document discusses real-time classification of point cloud data. It begins by introducing point clouds and their applications in digital terrain mapping, forestry management, and infrastructure monitoring. It then describes the issues raised by large point cloud datasets, such as processing speed and memory usage, and summarizes a streaming approach to classification that processes the data one raster cell at a time, using only local neighborhoods of points, to get faster results. Feature engineering is discussed as central to the machine learning models. Current results show good classification for some classes but room for improvement on others. The streaming approach, implemented in Julia, can classify several million points per minute.
6. Applications
Digital Terrain Maps (DTM)
Forestry management
Forests in Indonesia
• DTM, CHM (canopy height model), water detection, canal depth detection
Resilient infrastructure
• Anywhere
• Roads
7. Issues
GBs of data (AHN2 is several TB)
Larger than memory (requires tiling)
Classification is required for derived products
• Filtering for ground
• Normalizing for tree height
We want faster results, achievable on a laptop
8. Solution?
Streaming approach
• Process only a few points at a time
• Skip tiling
• Only local operations
Machine learning
• Training on existing datasets
• Classification is fast once the model is trained
• Does it generalize?
9. Machine learning
All about the data (features)
AHN2 dataset
• Ground, buildings and water
• No roads or trees (up to 50%!)
Vaihingen 3D labeled dataset
• ISPRS reference dataset
• Many classes, but small
Semantic3D dataset
• TLS (terrestrial laser scanning) reference dataset for neural networks
• Very large, with point density issues
11. Features
Derived features
• Height above ground
• Principal Component Analysis
• K nearest neighbours
• Geometric distribution
• Different scales (radius)
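A minimal sketch of derived features at several scales, written in Python rather than Julia. It assumes a vertical cylinder neighbourhood (horizontal radius only) and uses the lowest point within each radius as a crude proxy for the ground; both are illustrative assumptions, not the method from the slides. Brute-force distances are used, which is only reasonable for the small point sets of a single cell.

```python
import numpy as np

def multiscale_features(points, radii):
    """Per-point features at several radii (brute force, O(n^2)).

    points: (n, 3) array of x, y, z. radii: search radii (units assumed to match x, y).
    Returns an (n, 2 * len(radii)) array: for each radius, the neighbour count and
    the height above the lowest neighbour (assumed stand-in for height above ground).
    """
    pts = np.asarray(points, dtype=float)
    # Horizontal (x, y) distances between all pairs of points.
    diff = pts[:, None, :2] - pts[None, :, :2]
    dist2d = np.sqrt((diff ** 2).sum(axis=-1))
    columns = []
    for r in radii:
        mask = dist2d <= r                      # includes the point itself
        count = mask.sum(axis=1)
        # Lowest z among neighbours within r; non-neighbours are masked with +inf.
        z = np.where(mask, pts[None, :, 2], np.inf)
        lowest = z.min(axis=1)
        columns += [count, pts[:, 2] - lowest]
    return np.column_stack(columns)
```

Each extra radius adds two columns and another pass over the distance matrix, which is the compute-time trade-off raised under feature selection.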
12. Features
PCA
• Three values?
• Omnivariance: $\sqrt[3]{l_1 l_2 l_3}$
• 0.98, 0.01, 0.01 → 0.05 (1D)
• 0.48, 0.48, 0.04 → 0.21 (2D)
• 0.33, 0.33, 0.33 → 0.33 (3D)
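The omnivariance examples can be checked with a short Python sketch. It assumes the eigenvalues are taken from the 3×3 covariance of a point's neighbourhood and scaled to sum to 1, as the example values suggest; the function names are hypothetical.

```python
import numpy as np

def normalized_eigenvalues(neighbours):
    """Eigenvalues of the neighbourhood covariance, sorted descending, scaled to sum to 1."""
    pts = np.asarray(neighbours, dtype=float)
    cov = np.cov(pts, rowvar=False)            # 3x3 covariance matrix
    eig = np.linalg.eigvalsh(cov)[::-1]        # eigvalsh returns ascending order
    eig = np.clip(eig, 0.0, None)              # remove tiny negative round-off
    return eig / eig.sum()

def omnivariance(l):
    """Cube root of the product of the three normalized eigenvalues."""
    l1, l2, l3 = l
    return (l1 * l2 * l3) ** (1.0 / 3.0)
```

For example, `omnivariance((0.98, 0.01, 0.01))` gives about 0.046 and `omnivariance((0.48, 0.48, 0.04))` about 0.21, so a single number separates linear, planar and volumetric neighbourhoods.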
13. Feature selection before training
Which scales to use?
• More radii
• Compute time
Importance analysis
• Pairplots
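Pairplots show visually whether two features (for example the same feature at two radii) carry the same information. As an assumed numeric companion, not the deck's method, a sketch that flags highly correlated feature pairs; a scale whose features are nearly duplicates of another scale costs compute time without adding much.

```python
import numpy as np

def redundant_pairs(features, names, threshold=0.95):
    """List feature pairs whose absolute Pearson correlation exceeds `threshold`.

    features: (n_points, n_features) array; names: one label per column.
    Constant columns yield NaN correlations and are never flagged.
    """
    corr = np.corrcoef(np.asarray(features, dtype=float), rowvar=False)
    pairs = []
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs
```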
14. Feature selection after training
PCA is not enough
• Flat surfaces? (ground and flat roofs look alike to PCA)
Importance analysis
• Number of times each feature is used in the trees
• Confusion matrices
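A minimal confusion matrix sketch in Python, assuming class labels are integers 0 to n_classes - 1. Rows are the true classes and columns the predicted classes, so the per-class recall shows directly which classes are classified well and which need work.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Increment cm[true, predicted] once per point (handles repeated index pairs).
    np.add.at(cm, (np.asarray(true_labels), np.asarray(predicted_labels)), 1)
    return cm

def per_class_recall(cm):
    """Fraction of each true class classified correctly (NaN for absent classes)."""
    totals = cm.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.diag(cm) / totals
```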
18. Streaming approach
First pass
• Determine raster based on bounding box
• Determine raster cells for all points (index)
> [1,1,2,2,2,3,4,3,4,4,4]
• Determine number of points for each cell
> 2, 3, 2, 4 points in cells 1 to 4; 0 in empty cells
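The first pass can be sketched in Python as below. For brevity it assumes all x, y coordinates fit in memory; a real first pass would read them in chunks, or take the bounding box from the file header. Cells are numbered row-major from 0 here, unlike the 1-based example above, and the cell size is an assumed parameter.

```python
import numpy as np

def first_pass(xy, cell_size):
    """Assign each point to a raster cell and count points per cell.

    xy: (n, 2) array of x, y coordinates; cell_size: raster resolution.
    Returns (cell index per point, points-per-cell counts, raster shape).
    """
    xy = np.asarray(xy, dtype=float)
    xmin, ymin = xy.min(axis=0)                       # bounding box lower corner
    xmax, ymax = xy.max(axis=0)
    ncols = int(np.floor((xmax - xmin) / cell_size)) + 1
    nrows = int(np.floor((ymax - ymin) / cell_size)) + 1
    col = np.floor((xy[:, 0] - xmin) / cell_size).astype(int)
    row = np.floor((xy[:, 1] - ymin) / cell_size).astype(int)
    index = row * ncols + col                          # flat cell index per point
    counts = np.bincount(index, minlength=nrows * ncols)
    return index, counts, (nrows, ncols)
```

Only one integer per point and one count per cell are kept, which is far smaller than the point cloud itself.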
19. Streaming approach
Second pass
• Start storing points in memory
• Until one cell is complete (all points counted for it in the first pass have been read)
• Process all points in cell and classify
• Write results to disk
• Remove points from memory
• Relies on spatial coherence (point ordering in the file) to keep few points in memory
> (figure: point numbers accumulating in raster cells in the order they are read)
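A minimal Python sketch of the second pass, assuming the cell index per point and the per-cell counts from the first pass. `process_cell` is a hypothetical caller-supplied callback standing in for "classify and write results"; the function also reports the peak number of buffered points, which is where spatial coherence shows up.

```python
from collections import defaultdict

def second_pass(point_stream, cell_index, counts, process_cell):
    """Stream points once, buffering per raster cell, and flush each cell when complete.

    point_stream: iterable of points in file order; cell_index[i] is the cell of the
    i-th point and counts[c] the number of points in cell c (both from the first pass).
    process_cell(cell, points): hypothetical callback that classifies and writes results.
    Returns the peak number of points held in memory.
    """
    buffers = defaultdict(list)
    buffered = peak = 0
    for i, point in enumerate(point_stream):
        cell = cell_index[i]
        buffers[cell].append(point)
        buffered += 1
        peak = max(peak, buffered)
        if len(buffers[cell]) == counts[cell]:       # every point of this cell has been read
            process_cell(cell, buffers.pop(cell))     # classify and write results
            buffered -= counts[cell]                  # points leave memory
    return peak
```

With cell order `[0, 0, 1, 1]` the peak is 2 points, while the interleaved order `[0, 1, 0, 1]` reaches 3: the worse the spatial ordering, the more points stay in memory.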
20. Streaming approach
Process all points in cell and classify
• For each point, take nearest neighbors
• Calculate new attribute(s) based on these points
• Normalize attribute(s)
• Classify based on new attribute(s)
• One small k-d tree per raster cell instead of one large tree
• Classification done using gradient boosted trees
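The per-cell steps can be sketched in Python as follows. Assumptions: brute-force kNN replaces the per-cell k-d tree, the attributes are the normalized PCA eigenvalues, normalization is a z-score with `mean` and `std` taken from the training data, and `classify` is a hypothetical callable such as the predict function of a trained gradient boosted trees model.

```python
import numpy as np

def process_cell(points, k, mean, std, classify):
    """Classify the points of one raster cell from local kNN eigenvalue features.

    points: (n, 3) array with n >= 2; k: neighbourhood size (includes the point itself).
    mean, std: per-feature statistics, assumed to come from the training data.
    classify: hypothetical callable mapping an (n, 3) feature array to n labels.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points per cell")
    k = max(2, min(k, n))
    # Brute-force kNN within the cell (a stand-in for the per-cell k-d tree).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :k]
    feats = np.empty((n, 3))
    for i in range(n):
        cov = np.cov(pts[knn[i]], rowvar=False)
        eig = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)
        total = eig.sum()
        # Normalized eigenvalues l1 >= l2 >= l3 (all zero for coincident points).
        feats[i] = eig / total if total > 0 else eig
    # Normalize the attributes with the training statistics (z-score).
    feats = (feats - np.asarray(mean)) / np.asarray(std)
    return classify(feats)
```

Using training statistics rather than per-cell statistics keeps the features comparable between cells.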
21. Streaming approach
Implemented in Julia ✓
Performance dependent on
• Number of scales
• Size of each scale
Current results dataset
• Worst case: 40,000 points/s
• Several million points per minute (40,000 × 60 = 2.4 million)
22. Lessons
• Build a workflow that supports many iterations
• Data preparation (feature engineering) is important
• Training data has biases
• Generalizing is hard
• Near-real-time classification of point clouds is possible