DSD-INT 2018 Realtime classification of lidar pointclouds - Pronk

This document discusses real-time classification of point cloud data. It begins by introducing point clouds and their applications in digital terrain mapping, forestry management, and infrastructure monitoring. Issues with large point cloud datasets like processing speed and memory usage are described. The document then summarizes a streaming approach to point cloud classification that processes local neighborhoods of points individually to enable faster results. Feature engineering from the point cloud data is discussed as important for machine learning models. Current results show good classification of some classes but room for improvement on others. The streaming approach implemented in Julia can classify several million points per minute.

Realtime classification of pointclouds
DSD-INT 2018
Maarten Pronk

Index
• Pointclouds
• Applications
• Issues
• Classification
• Feature engineering
• “Realtime”
• Streaming algorithms

Pointclouds
• X Y Z
• Millions, billions
• LiDAR
• ALS, TLS

Applications
Digital Terrain Models (DTM)
Forestry management
Forests in Indonesia
• DTM, CHM, water detection, canal depth detection
Resilient infrastructure
• Anywhere
• Roads

Issues
GBs of data (AHN2 is several TB)
Larger than memory (tiling)
Classification is required for derived products
• Filtering for ground
• Normalizing for tree height
We want faster results, doable on your laptop

Solution?
Streaming approach
• Process only a few points at a time
• Skip tiling
• Only local operations
Machine learning
• Training on existing datasets
• Classification is done instantly
• Does it generalize?

Machine learning
All about the data (features)
AHN2 dataset
• Ground, buildings and water
• No roads or trees (up to 50%!)
Vaihingen 3D labeled dataset
• ISPRS reference dataset
• Many classifications but small
Semantic 3D dataset
• TLS reference datasets for NN
• Very large, density issues

Features
• XYZ
• Intensity
• Return number | total number of returns
Not enough to do multi-label classification

Features
Derived features
• Height above ground
• Principal Component Analysis
• K nearest neighbours
• Geometric distribution
• Different scales (radius)

Features
PCA
• Three values?
• Omnivariance: $\sqrt[3]{l_1 l_2 l_3}$
• 0.98, 0.01, 0.01 → 0.05 (1D)
• 0.48, 0.48, 0.04 → 0.21 (2D)
• 0.33, 0.33, 0.33 → 0.33 (3D)
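
A minimal sketch of the omnivariance calculation, assuming it is the cube root of the product of three normalized PCA eigenvalues (values that sum to roughly 1). The function name `omnivariance` and the labels are illustrative, and Python is used here only for the example; the talk's own implementation is in Julia.

```python
def omnivariance(l1, l2, l3):
    """Cube root of the product of three normalized PCA eigenvalues."""
    return (l1 * l2 * l3) ** (1.0 / 3.0)

# Eigenvalue triples from the slide: line-like, plane-like, volume-like.
examples = {
    "1D": (0.98, 0.01, 0.01),
    "2D": (0.48, 0.48, 0.04),
    "3D": (0.33, 0.33, 0.33),
}

for label, eigenvalues in examples.items():
    print(f"{label}: {omnivariance(*eigenvalues):.2f}")
```

A low value means the neighbourhood is stretched along one direction, while a value near 0.33 means the points spread evenly in all three directions.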

Feature selection before training
Which scales to use?
• More radii
• Compute time
Importance analysis
• Pairplots

Feature selection after training
PCA is not enough
• Flat surfaces?
Importance analysis
• Times used in tree
• Confusion matrices
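
A small sketch of how a confusion matrix can expose weak classes after training. The labels and predictions below are made up for illustration and are not results from the talk; `confusion_matrix` and `recall` are hypothetical helper names.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def recall(matrix, label):
    """Fraction of points with this true label that were predicted correctly."""
    total = sum(n for (true, _), n in matrix.items() if true == label)
    return matrix[(label, label)] / total if total else 0.0

# Made-up example data: two cars and one fence, mostly confused with road.
truth = ["road", "road", "car", "car", "fence", "road"]
predicted = ["road", "road", "road", "car", "road", "road"]

matrix = confusion_matrix(truth, predicted)
for label in sorted(set(truth)):
    print(label, recall(matrix, label))
```

Off-diagonal counts such as `matrix[("car", "road")]` show which classes are mistaken for each other, which helps decide where extra features are needed.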

Current results
Good:
• Roads
• Shrubs
• Roofs
• Facades
Ok:
• Low vegetation
• Trees
Bad:
• Cars
• Fences

Streaming approach
First pass
• Determine raster based on bounding box
• Determine raster cells for all points (index)
> [1,1,2,2,2,3,4,3,4,4,4]
• Determine number of points for each cell
> 2 3 2 0
0 .. 4 ..
.. ..
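
The first pass can be sketched as follows, assuming a regular raster with square cells numbered row by row starting at 1. The coordinates, the 10 m cell size and the name `cell_index` are assumptions chosen so that the index matches the example array on the slide.

```python
from collections import Counter

def cell_index(x, y, xmin, ymin, cell_size, ncols):
    """Map a point to a 1-based, row-major raster cell number."""
    col = int((x - xmin) // cell_size)
    row = int((y - ymin) // cell_size)
    return row * ncols + col + 1

# Hypothetical (x, y) coordinates in metres.
points = [(1, 1), (2, 3), (11, 2), (12, 4), (15, 5), (3, 12),
          (13, 14), (4, 15), (12, 12), (16, 17), (18, 11)]
cell_size = 10

# Raster derived from the bounding box of the points.
xmin = min(x for x, _ in points)
ymin = min(y for _, y in points)
xmax = max(x for x, _ in points)
ncols = int((xmax - xmin) // cell_size) + 1

# Raster cell for every point, then the number of points per cell.
index = [cell_index(x, y, xmin, ymin, cell_size, ncols) for x, y in points]
counts = Counter(index)

print(index)
print(sorted(counts.items()))
```

Counting the example index gives 2, 3, 2 and 4 points for cells 1 to 4. Only the index and the counts need to be kept for the second pass, not the points themselves.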

Streaming approach
Second pass
• Start storing points in memory
• Until one cell is completely full
• Process all points in cell and classify
• Write results to disk
• Remove points from memory
• Spatial coherence (ordering)
1,2 3,4,5 5,7 ..
.. .. 6,8,9,
…
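
A minimal sketch of the second pass, assuming the per-point cell index and per-cell counts from the first pass are available. Points are buffered per cell and released as soon as a cell has received all of its points. The name `stream_cells` is hypothetical, and point numbers stand in for real point records.

```python
from collections import defaultdict

def stream_cells(points, index, counts):
    """Yield (cell, points) as soon as a cell has received all its points."""
    buffers = defaultdict(list)
    for point, cell in zip(points, index):
        buffers[cell].append(point)
        if len(buffers[cell]) == counts[cell]:
            # Cell is complete: hand it over and drop it from memory.
            yield cell, buffers.pop(cell)

index = [1, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4]
counts = {1: 2, 2: 3, 3: 2, 4: 4}
points = list(range(1, len(index) + 1))  # stand-ins for point records

completed = list(stream_cells(points, index, counts))
for cell, cell_points in completed:
    print(cell, cell_points)
```

The better the spatial ordering of the input, the sooner cells fill up and the fewer points sit in memory at once.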

Streaming approach
Process all points in cell and classify
• For each point, take nearest neighbors
• Calculate new attribute(s) based on these points
• Normalize attribute(s)
• Classify based on new attribute(s)
• Several small k-d trees for number of raster cells
• Classification done using gradient boosted trees
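
The per-cell steps could look roughly like this. It is a sketch under several assumptions: neighbours are found by brute force rather than with a k-d tree, the attribute (height spread among the neighbours) is only an illustrative choice, and the final classification step is left out because it needs a trained model.

```python
import math

def k_nearest(points, query, k):
    """Return the k points closest to query (brute force, includes query)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def height_range(neighbours):
    """Illustrative attribute: spread in z among the neighbours."""
    zs = [p[2] for p in neighbours]
    return max(zs) - min(zs)

def normalize(values):
    """Min-max scale to [0, 1]; constant input maps to 0."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]

# Hypothetical cell: three flat points and three stacked points (x, y, z).
cell = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (0.0, 1.0, 0.0),
        (5.0, 5.0, 0.0), (5.0, 5.5, 3.0), (5.5, 5.0, 6.0)]

raw = [height_range(k_nearest(cell, p, k=3)) for p in cell]
features = normalize(raw)
print(features)
```

In this toy cell the flat points get a normalized value of 0 and the stacked points a value of 1. A classifier such as the gradient boosted trees mentioned on the slide would take feature vectors like these as input.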

Streaming approach
Implemented in Julia ✓
Performance dependent on
• Number of scales
• Size of each scale
Current result dataset
• Worst case
• 40000 points/s
• Several million per minute
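
As a quick check on the quoted rate, the arithmetic can be written out. The dataset size of 24 million points is a made-up example, and `minutes_to_classify` is a hypothetical helper.

```python
POINTS_PER_SECOND = 40_000  # worst-case rate quoted on the slide

def minutes_to_classify(n_points, rate=POINTS_PER_SECOND):
    """Estimated runtime in minutes at a constant classification rate."""
    return n_points / rate / 60

per_minute = POINTS_PER_SECOND * 60
print(per_minute)                        # 2400000 points per minute
print(minutes_to_classify(24_000_000))   # 10.0 minutes
```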

Lessons
• Workflow for high number of iterations
• Data preparation (feature engineering) is important
• Training data has biases
• Generalizing is hard
• Near realtime classification of pointclouds is possible
