This document describes a new method for georegistering and stabilizing aerial video over mountainous terrain using LIDAR data. The method registers images to high-resolution digital elevation models by generating predicted images from the DEM and sensor model, registering these to the actual images, and correcting the sensor model. Examples show the method stabilizes shaky video, tracks moving objects, produces orthorectified video draped over DEMs, and aligns video and thermal infrared mosaics with map graphics in Google Earth. The method processes images in about 1 second and achieves absolute geolocation accuracy of 1-2 meters.
1. Stabilization and Georegistration of Aerial Video Over Mountain Terrain by Means of LIDAR
IGARSS 2011, Vancouver, Canada
July 24-29, 2011
Mark Pritt, PhD, Lockheed Martin, Gaithersburg, Maryland, mark.pritt@lmco.com
Kevin LaTourette, Lockheed Martin, Goodyear, Arizona, kevin.j.latourette@lmco.com
2. Problem: Georegistration
Georegistration is the assignment of 3-D geographic
coordinates to the pixels of an image.
It is required for many geospatial applications:
Fusion of imagery with other sensor data
Alignment of imagery with GIS and map graphics
Accurate 3-D geolocation
Inaccurate georegistration can be a major problem:
[Figure: GIS map graphics overlaid on imagery, with panels labeled "Correctly aligned" and "Misaligned"]
3. Solution
Our solution is image registration to a high-resolution
digital elevation model (DEM):
A DEM post spacing of 1 or 2 meters yields good results.
It also works with 10-meter post spacing.
Works with terrain data derived from many sources:
LIDAR: BuckEye, ALIRT, Commercial
Stereo Photogrammetry: Socet Set® DSM
SAR: Stereo and Interferometry
USGS DEMs
4. Methods
Create predicted images from the DEM, illumination
conditions, sensor model estimates and actual images.
Register the images while refining the sensor model.
Iterate.
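As a rough illustration of how illumination conditions can turn a DEM into a predicted image, the sketch below shades a DEM grid with a simple Lambertian model. This is an assumption for illustration only: the slides do not state the shading model, and the sketch omits the perspective projection through the sensor model as well as the occlusion and shadow handling. The function name `hillshade` and its parameters are hypothetical.

```python
import numpy as np

def hillshade(dem, post_spacing, sun_azimuth_deg, sun_elevation_deg):
    """Lambertian shaded-relief image of a DEM, with values in [0, 1].

    Assumes row 0 is the northern edge of the grid and columns run west to east.
    Azimuth is measured clockwise from north, elevation above the horizon.
    """
    az = np.radians(sun_azimuth_deg)
    el = np.radians(sun_elevation_deg)
    # Unit vector pointing toward the sun (x = east, y = north, z = up).
    sun = np.array([np.sin(az) * np.cos(el), np.cos(az) * np.cos(el), np.sin(el)])
    # Height gradients per meter along rows and columns.
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), post_spacing)
    # Rows increase southward, so negate the row gradient to get d/d(north).
    dz_dx, dz_dy = dz_dcol, -dz_drow
    # The surface normal of z = f(x, y) is (-dz/dx, -dz/dy, 1), normalized.
    norm = np.sqrt(dz_dx**2 + dz_dy**2 + 1.0)
    shade = (-dz_dx * sun[0] - dz_dy * sun[1] + sun[2]) / norm
    # Facets turned away from the sun receive no direct light.
    return np.clip(shade, 0.0, 1.0)
```

On flat terrain the result equals the sine of the sun elevation, and a 45-degree slope facing a sun at 45 degrees elevation is fully lit, which gives two quick sanity checks for the geometry.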
[Figure: diagram labeled Aerial Video, Sensor, Illumination, Occlusion, Shadow, Scene, and Predicted Images]
5. Methods (cont)
The algorithm identifies tie points between the predicted and the actual images by means of NCC (normalized cross correlation) with RANSAC outlier removal.
[Figure: Predicted Image from DEM, Aerial Image, and Registration Tie Point Detections]
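The tie-point step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it searches a small window for the offset that maximizes NCC, then uses a RANSAC loop with a pure-translation model to reject outlier offsets. The translation model, the patch and search sizes, and the names `ncc`, `match_patch`, and `ransac_translation` are all assumptions made for this example.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equal-size patches, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_patch(predicted, actual, row, col, half=4, search=3):
    """Return (score, dr, dc): the offset within +/-search that maximizes NCC.

    The template around (row, col) must lie fully inside `predicted`.
    """
    tmpl = predicted[row - half:row + half + 1, col - half:col + half + 1]
    best = (-2.0, 0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = row + dr, col + dc
            if r - half < 0 or c - half < 0:
                continue  # window would run off the top or left edge
            win = actual[r - half:r + half + 1, c - half:c + half + 1]
            if win.shape != tmpl.shape:
                continue  # window would run off the bottom or right edge
            score = ncc(tmpl, win)
            if score > best[0]:
                best = (score, dr, dc)
    return best

def ransac_translation(offsets, tol=1.0, iters=100, seed=0):
    """Keep the largest set of offsets that agree with one sampled offset."""
    rng = np.random.default_rng(seed)
    offsets = np.asarray(offsets, dtype=float)
    best_inliers = np.zeros(len(offsets), dtype=bool)
    for _ in range(iters):
        candidate = offsets[rng.integers(len(offsets))]
        inliers = np.linalg.norm(offsets - candidate, axis=1) <= tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final estimate: mean of the consensus set.
    return offsets[best_inliers].mean(axis=0), best_inliers
```

A real registration would fit a richer model than a translation inside the RANSAC loop, but the structure (sample, count inliers, keep the best consensus set) stays the same.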
6. Methods (cont)
The algorithm uses the refined sensor model as the
initial guess for the next video frame:
Initial Camera
• Estimate camera model
• Use camera focal length and platform GPS if available
Register
• Predict images from DEM and camera
• Register images with NCC
Refine
• Compose registration function and camera
• Least-squares (LS) fit for better camera estimate
• Iterate
Next Frame
• Register to previous frame
• Compose with camera of previous frame for initial camera estimate
Iterate
• Iterate for each video frame
Finish
• Trajectory
• Propagate geo data from DEM
• Resample images for orthomosaic
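To make the least-squares step concrete, the sketch below fits a 2-D affine registration function to tie points with `numpy.linalg.lstsq`. The slides do not say what form the registration function takes or how the camera parameters are fit, so the affine model and the names `fit_affine` and `apply_affine` are assumptions chosen to keep the example short.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map such that dst ~ [x, y, 1] @ params.

    src, dst: N x 2 arrays of matching tie points (N >= 3, not collinear).
    Returns a 3 x 2 parameter matrix.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.column_stack([src, np.ones(len(src))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params

def apply_affine(params, pts):
    """Map N x 2 points through the fitted affine function."""
    pts = np.asarray(pts, dtype=float)
    return np.column_stack([pts, np.ones(len(pts))]) @ params
```

With more tie points than unknowns, the fit averages out detection noise, which is the reason for preferring a least-squares solution over an exact solve from a minimal point set.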
The refined sensor model enables georegistration.
Exterior orientation: Platform position and rotation angles
Interior orientation: Focal length, pixel aspect ratio, principal point
and radial distortion
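The parameters listed above can be read as the inputs of a pinhole projection. The sketch below is one common way to write such a model, not necessarily the one used in this work: the angle names `omega`, `phi`, `kappa`, the rotation order, the single radial distortion coefficient `k1`, and the convention that the camera looks along its +z axis are all assumptions.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """World-to-camera rotation from angles (radians) about the x, y, z axes."""
    cx, sx = np.cos(omega), np.sin(omega)
    cy, sy = np.cos(phi), np.sin(phi)
    cz, sz = np.cos(kappa), np.sin(kappa)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

def project(points, position, angles, focal_px, aspect, principal_point, k1):
    """Project N x 3 world points to N x 2 pixel coordinates.

    Exterior orientation: position (3,) and angles (omega, phi, kappa).
    Interior orientation: focal length in pixels, pixel aspect ratio,
    principal point (u0, v0), and one radial distortion coefficient k1.
    """
    pts = np.asarray(points, dtype=float)
    pos = np.asarray(position, dtype=float)
    # Rotate world offsets into the camera frame (camera looks along +z).
    cam = (pts - pos) @ rotation_matrix(*angles).T
    x = cam[:, 0] / cam[:, 2]
    y = cam[:, 1] / cam[:, 2]
    # Radial distortion scales the normalized coordinates by (1 + k1 * r^2).
    d = 1.0 + k1 * (x**2 + y**2)
    u = focal_px * d * x + principal_point[0]
    v = focal_px * aspect * d * y + principal_point[1]
    return np.column_stack([u, v])
```

For example, a camera 1000 m above the origin and rotated by pi about x looks straight down, so the ground point directly beneath it projects to the principal point.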
7. Example 1: Aerial Motion Imagery
Inputs:
Aerial Motion Imagery over Arizona, U.S.
16 Mpix, 3.3 fps, panchromatic
1/3 Arc-second USGS DEM
Area: 64 km²
Post Spacing: 10 m
8. Example 1 (cont)
Problem: Too shaky to find moving objects
Zoomed to full resolution (1 m)
9. Example 1: Results
Outputs:
Sensor camera models
Images georegistered to DEM
Platform trajectory
10. Example 1 Results (cont)
Video is now stabilized, and as a result, moving objects are easily detected.
[Figure: stabilized frame with moving objects labeled ATV, Vehicle, Human, and Pickup Truck]
11. Example 2: Oblique Motion Imagery
Inputs:
Oblique Motion Imagery Over Arizona, U.S.
16 Mpix, 3.4 fps, pan
LIDAR DEM
Area: 24 km²
Post Spacing: 1 m
12. Example 2: Results
[Figure: orthorectified video over a background LIDAR DEM with a stabilized video inset; callouts for target tracking, map coordinates, and aligned map graphics]
13. Example 2 Results (cont)
How fast does the algorithm converge?

IMAGE 1: tie point residuals (image pixels) by camera iteration
Camera iteration    1       2       3
Num tie points      319     318     282
RMSE                17.4    4.8     2.9
Mean Δx             1.4     -0.7    0.1
Mean Δy             -3.8    -0.1    0
Sigma Δx            15.8    4       2.5
Sigma Δy            6       2.6     1.5
[Plot: Tie Point Residuals; RMSE, mean, and sigma in image pixels vs. camera iteration]
The initial error is high, but it decreases after only several iterations.

IMAGE 591: tie point residuals (image pixels) by camera iteration
Camera iteration    1       2       3
Num tie points      681     687     681
RMSE                2.7     0.6     0.3
Mean Δx             1       0       0
Mean Δy             0.9     0       0
Sigma Δx            2.1     0.5     0.3
Sigma Δy            0.9     0.2     0.1
[Plot: Tie Point Residuals; RMSE, mean, and sigma in image pixels vs. camera iteration]
Subsequent frames have better initial sensor model estimates and require only 2 iterations.
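The residual statistics in these tables can be computed from per-tie-point offsets as sketched below. The slides do not define RMSE, so the radial (combined x and y) definition here is an assumption; it is, however, consistent with the tabulated values, since for population statistics RMSE² = mean(Δx)² + mean(Δy)² + sigma(Δx)² + sigma(Δy)². The function names are hypothetical.

```python
import numpy as np

def residual_stats(dx, dy):
    """Summary statistics of tie point residuals, in pixels."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    return {
        "num_tie_points": len(dx),
        "rmse": float(np.sqrt(np.mean(dx**2 + dy**2))),
        "mean_dx": float(dx.mean()),
        "mean_dy": float(dy.mean()),
        "sigma_dx": float(dx.std()),  # population standard deviation
        "sigma_dy": float(dy.std()),
    }

def rmse_from_moments(mean_dx, mean_dy, sigma_dx, sigma_dy):
    """Radial RMSE implied by the per-axis means and standard deviations."""
    return float(np.sqrt(mean_dx**2 + mean_dy**2 + sigma_dx**2 + sigma_dy**2))

# Image 1, iteration 1: the tabulated moments give about 17.4 pixels,
# matching the tabulated RMSE.
print(round(rmse_from_moments(1.4, -3.8, 15.8, 6.0), 1))
```

The same check gives about 4.8 and 2.9 pixels for iterations 2 and 3 of Image 1, so most of the initial error is spread (sigma) rather than bias (mean).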
14. Example 3: Aerial Video
Inputs:
Aerial Video Over Arizona, U.S.
720 x 480, Color, 30 fps
LIDAR DEM
Area: 24 km²
Post Spacing: 1 m
15. Example 3: Results
[Figure: orthorectified video on a background image draped over the DEM; callouts for map coordinates and aligned map graphics]
16. Example 3 Results (cont)
Map Graphics Stay Aligned with Features in Video
17. Example 4: Thermal Infrared Video
Inputs:
MWIR Video Over White Tank Mountains in Arizona
1 Mpix, 3.3 fps
Commercial LIDAR DEM
Post Spacing: 2 m
18. Example 4: Results
Video Mosaic Georegistered and Draped Over Mountains in Google Earth
[Figure: video mosaic over a background LIDAR DEM; inset shows the original video with map graphics overlay]
20. Conclusion
We have introduced a new method for aerial video
georegistration and stabilization.
It registers images to high-resolution DEMs by:
Generating predicted images from the DEM and sensor model;
Registering these predicted images to the actual images;
Correcting the sensor model estimates with the registration results.
Processing speed is 1 sec per 16-Mpix image on a PC.
Absolute geospatial accuracy is about 1-2 meters.
We are developing a rigorous error propagation model to quantify
the accuracy.
Applications:
Video stabilization and mosaics
Cross-sensor registration
Alignment with GIS map graphics