The NASA Vision Workbench: Reflections on Image Processing in C++

Matt Hancher & Michael Broxton
Intelligent Robotics Group
January 7, 2009, Willow Garage
Intelligent Systems Division, NASA Ames Research Center

Talk Overview

• Overview and Background

• Introduction to the Vision Workbench

• Vision Workbench Modules and Applications

• Under the Hood: Templates, Views, and Lazy Evaluation

• Lessons Learned and Future Directions


NASA Ames Research Center

• NASA’s Silicon Valley research center
• Small spacecraft
• Supercomputers
• Lunar & Planetary Science
• Intelligent Systems
• Human Factors
• Thermal protection systems
• Aeronautics
• Astrobiology


Intelligent Robotics Group (IRG)
• Areas of expertise
• Applied computer vision
• Human-robot interaction
• Instrument deployment & placement
• Interactive 3D visualization
• Robot software architectures

• Science-driven exploration
• Instrument placement, resource mapping, analysis support
• Low-speed, deliberative operation

• Fieldwork-driven operations
• Precursor missions (site survey, deployment, etc.)
• Manned missions (human-paced interaction, inspection, etc.)


The NASA Vision Workbench

• Open-source image processing and machine vision library in C++

• Developed as a foundation for unifying image processing work at NASA Ames

• A “second-generation” C++ image processing library, drawing on lessons learned by VXL, GIL, VIGRA, etc.

• Designed for easy, expressive coding of efficient image processing algorithms


Obtaining the Vision Workbench

• Available under the NASA Open Source Agreement (NOSA), an OSI-approved non-viral open source license.

• VW version 2.0 alpha snapshots are currently being released for the brave. (We use it.)

http://ti.arc.nasa.gov/visionworkbench/


Image Module Basics


API Philosophy

• Simple, natural, mathematical, expressive

• Treat images as first-class mathematical data types whenever possible
• Example: IIR filtering for background subtraction
background += alpha * ( image - background );

• Direct, intuitive function calls
• Example: A Gaussian smoothing filter
result = gaussian_filter( image, 3.0 );


The Core Image Type

ImageView<PixelT>

• Stores a reference-counted array of pixels.
• Templatized on the pixel type; e.g.
ImageView<PixelRGB<uint8> >

• Supports an arbitrary number of image planes.
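A toy container can make the reference-counting idea concrete. The sketch below is plain C++ and not the Vision Workbench source: the class name SimpleImage and its plane-major memory layout are assumptions made for illustration. Copying a SimpleImage shares the pixel buffer through a std::shared_ptr, so both copies see the same pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical SimpleImage: a toy analogue of ImageView<PixelT>, not the VW class.
template <class PixelT>
class SimpleImage {
public:
  SimpleImage() = default;
  SimpleImage(int cols, int rows, int planes = 1) { set_size(cols, rows, planes); }

  // Allocates a fresh buffer; existing copies keep the old one.
  void set_size(int cols, int rows, int planes = 1) {
    assert(cols >= 0 && rows >= 0 && planes >= 0);
    cols_ = cols; rows_ = rows; planes_ = planes;
    data_ = std::make_shared<std::vector<PixelT>>(
        static_cast<std::size_t>(cols) * rows * planes);
  }

  int cols() const { return cols_; }
  int rows() const { return rows_; }
  int planes() const { return planes_; }

  // Plane-major layout: all of plane 0, then all of plane 1, and so on.
  // Returns a mutable reference even on a const object: constness of the
  // handle does not protect the shared pixels (shallow semantics).
  PixelT& operator()(int col, int row, int plane = 0) const {
    assert(col >= 0 && col < cols_ && row >= 0 && row < rows_ &&
           plane >= 0 && plane < planes_);
    return (*data_)[(static_cast<std::size_t>(plane) * rows_ + row) * cols_ + col];
  }

  // Number of SimpleImage objects currently sharing this pixel buffer.
  long use_count() const { return data_.use_count(); }

private:
  int cols_ = 0, rows_ = 0, planes_ = 0;
  std::shared_ptr<std::vector<PixelT>> data_;
};
```

Assigning one SimpleImage to another copies only the handle; a deep copy would need an explicit pixel-by-pixel loop.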


The ImageView Public Interface

Constructing
ImageView<...> img;
ImageView<...> img(cols,rows);
ImageView<...> img(cols,rows,planes);

Changing dimensions
img.set_size(cols,rows);
img.set_size(cols,rows,planes);

Getting dimensions
img.cols()
img.rows()
img.planes()

Accessing pixels
img(col,row)
img(col,row,plane)

STL iterator
ImageView<...>::iterator
img.begin()
img.end()

Pixel accessor
ImageView<...>::pixel_accessor
img.origin()


Built-In Pixel Types

Grayscale
PixelGray<float32>
PixelGrayA<uint8>

RGB
PixelRGB<double>
PixelRGBA<int16>

Color spaces
PixelHSV<float32>
PixelXYZ<float32>
PixelLuv<float32>
PixelLab<float32>

Unitless (e.g. kernels)
float32, float64, and 8-, 16-, 32-, and 64-bit signed and unsigned integers

Vectors
Vector<float64,4>

Masked pixels
PixelMask<float>
PixelMask<PixelRGBA<uint8> >

• Try something like this at the top of your code:
typedef ImageView<PixelRGB<double> > Image;


Simple ImageView Operations
• Operations like these are inexpensive: they are “shallow” or “lazy,” returning views of the original pixels rather than copying them.
transpose(img) rotate_180(img)

flip_vertical(img) flip_horizontal(img)

rotate_90cw(img) rotate_90ccw(img)

crop(img,x,y,cols,rows)

subsample(img,factor)
subsample(img,xfactor,yfactor)

• Use copy() to make a deep copy if you need one.
copy(img)
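The sketch below illustrates what “shallow” means for an operation like transpose(): the view stores only a reference to its source and remaps coordinates on every access, while a copy-like function walks the view once and writes real pixels. The names Dense, TransposeView, transpose_view, and deep_copy are hypothetical stand-ins, not VW classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense image with value semantics (a stand-in for ImageView).
struct Dense {
  int cols = 0, rows = 0;
  std::vector<float> px;
  Dense(int c, int r) : cols(c), rows(r) {
    assert(c >= 0 && r >= 0);
    px.resize(static_cast<std::size_t>(c) * r);
  }
  float& operator()(int c, int r) { return px[static_cast<std::size_t>(r) * cols + c]; }
  float operator()(int c, int r) const { return px[static_cast<std::size_t>(r) * cols + c]; }
};

// A lazy view: no pixels are copied; each access is remapped to the source.
template <class ImageT>
class TransposeView {
  const ImageT& img_;  // the view must not outlive the image it refers to
public:
  explicit TransposeView(const ImageT& img) : img_(img) {}
  int cols() const { return img_.rows; }
  int rows() const { return img_.cols; }
  float operator()(int c, int r) const { return img_(r, c); }
};

template <class ImageT>
TransposeView<ImageT> transpose_view(const ImageT& img) { return TransposeView<ImageT>(img); }

// Deep copy: evaluate any view with cols()/rows()/operator() into new storage.
template <class ViewT>
Dense deep_copy(const ViewT& view) {
  Dense out(view.cols(), view.rows());
  for (int r = 0; r < out.rows; ++r)
    for (int c = 0; c < out.cols; ++c)
      out(c, r) = view(c, r);
  return out;
}
```

Changes to the source image show through the lazy view immediately, but not through the deep copy.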


Slicing and Dicing
• Select an individual plane or channel “slice”:
select_plane(img,plane)

select_channel(img,channel)

• Interpret pixel channels as image planes:
channels_to_planes(img)

• Example: making a PixelRGBA<float32> image opaque:
fill( select_channel(img,3), 1.0 );


ImageView Filtering Operations

convolution_filter(img,kernel)

separable_convolution_filter(img,xkernel,ykernel)

gaussian_filter(img,sigma)

derivative_filter(img,xderiv,yderiv)

laplacian_filter(img)

threshold_filter(img,thresh,hi,lo)

...

• Optional arguments control behaviors such as edge extension:
img = gaussian_filter(img, 3.0, ZeroEdgeExtension());
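A Gaussian filter is separable, so a 2D blur can be computed as a 1D pass along rows followed by a 1D pass along columns. The sketch below shows that idea with zero edge extension. It is an illustration under stated assumptions (a kernel radius of 3 sigma, row-major double buffers, hypothetical function names), not the implementation behind gaussian_filter().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sampled, normalized 1D Gaussian. The 3-sigma radius is an assumption of this sketch.
std::vector<double> gaussian_kernel(double sigma) {
  assert(sigma > 0.0);
  const int radius = static_cast<int>(std::ceil(3.0 * sigma));
  std::vector<double> k(2 * radius + 1);
  double sum = 0.0;
  for (int i = -radius; i <= radius; ++i) {
    k[i + radius] = std::exp(-(i * i) / (2.0 * sigma * sigma));
    sum += k[i + radius];
  }
  for (double& v : k) v /= sum;  // weights sum to 1
  return k;
}

// Filter a row-major (cols x rows) image with a 1D kernel along x or along y.
// Pixels outside the image count as zero ("zero edge extension").
// The Gaussian kernel is symmetric, so correlation and convolution coincide.
std::vector<double> convolve_1d(const std::vector<double>& img, int cols, int rows,
                                const std::vector<double>& k, bool along_x) {
  const int radius = static_cast<int>(k.size()) / 2;
  std::vector<double> out(img.size(), 0.0);
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c) {
      double acc = 0.0;
      for (int i = -radius; i <= radius; ++i) {
        const int cc = along_x ? c + i : c;
        const int rr = along_x ? r : r + i;
        if (cc < 0 || cc >= cols || rr < 0 || rr >= rows) continue;  // zero outside
        acc += k[i + radius] * img[static_cast<std::size_t>(rr) * cols + cc];
      }
      out[static_cast<std::size_t>(r) * cols + c] = acc;
    }
  return out;
}

// Separable Gaussian smoothing: one pass along x, then one along y.
std::vector<double> gaussian_smooth(const std::vector<double>& img, int cols, int rows,
                                    double sigma) {
  const std::vector<double> k = gaussian_kernel(sigma);
  return convolve_1d(convolve_1d(img, cols, rows, k, true), cols, rows, k, false);
}
```

Two 1D passes cost about 2(2r+1) multiplies per pixel instead of (2r+1)^2 for a full 2D kernel, which is why separable filtering is listed as its own operation.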


Some Simple Filtering Examples
[Figure: example results, with panels Original, Gaussian, X Derivative, and Laplacian]


ImageView Operators
• Mathematical operators on images work as you’d like.
• Add, subtract, multiply, and divide images (per-pixel).
• Add or subtract a constant pixel value offset.
• Multiply or divide by scalars.
• Example: IIR filtering for background subtraction.
bkg_img += 0.02 * (src_img - bkg_img);

• Operators are the best way to do image arithmetic
with the Vision Workbench.
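Spelled out as an explicit loop, the background-subtraction expression performs one exponential moving-average step per pixel. The sketch below assumes single-channel float images stored in std::vector; update_background is a hypothetical name. With a small alpha such as 0.02, the background adapts slowly, with an effective memory of roughly 1/alpha frames.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One step of the running-average (first-order IIR) background model,
//   bkg += alpha * (src - bkg),
// written as the per-pixel loop that the operator expression stands for.
void update_background(std::vector<float>& bkg, const std::vector<float>& src, float alpha) {
  assert(bkg.size() == src.size());
  assert(alpha > 0.0f && alpha <= 1.0f);
  for (std::size_t i = 0; i < bkg.size(); ++i)
    bkg[i] += alpha * (src[i] - bkg[i]);
}
```

Calling it once per incoming frame moves each background pixel a fraction alpha of the way toward the current frame.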


More ImageView Math
• Most standard math functions work on images too.
abs exp log

sqrt pow hypot

sin cos tan

asin acos atan

sinh cosh tanh

asinh acosh atanh

...and more!

• Example: Computing gradient orientation.
orientation = atan2(grad_y, grad_x);


ImageView Math Examples
[Figure: panels showing Gradient Orientation, Gradient Magnitude, Absolute Difference of Gaussians, and Logarithmic Map]


Per-Pixel ImageView Operations
• Cast to a new pixel type or channel type:
pixel_cast<NewPixelT>(img)

channel_cast<NewChannelT>(img)

• Explicit casts are generally not needed to convert
between color spaces.

• Apply an arbitrary function to each pixel, or to each
channel of each pixel:
per_pixel_filter(img,func)

per_pixel_channel_filter(img,func)
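A per-pixel filter is essentially a map over pixels whose output pixel type is whatever the function returns. The sketch below mimics that behavior for images stored as std::vector; apply_per_pixel and GammaFunctor are hypothetical names, not the VW per_pixel_filter implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for per_pixel_filter(img, func): applies func to every pixel
// and returns a new image whose pixel type is deduced from func's return type.
template <class PixelT, class FuncT>
auto apply_per_pixel(const std::vector<PixelT>& img, FuncT func)
    -> std::vector<std::decay_t<decltype(func(img[0]))>> {
  std::vector<std::decay_t<decltype(func(img[0]))>> out;
  out.reserve(img.size());
  for (const PixelT& p : img) out.push_back(func(p));
  return out;
}

// Example functor: a gamma curve on intensities in [0, 1].
struct GammaFunctor {
  double gamma;
  double operator()(double v) const { return std::pow(v, gamma); }
};
```

Because the output type comes from the functor, the same helper can turn double pixels into bytes by passing a lambda that returns unsigned char.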


Example: Color Detection
• Useful, e.g., in color fiducial tracking and object tracking:
ImageView<PixelRGB<double> > input = ...;
double hue_ref = 0.54;

ImageView<PixelHSV<double> > hsv_im = gaussian_filter( input, 1.0 );

ImageView<double> hue = select_channel( hsv_im, 0 );
ImageView<double> sat = select_channel( hsv_im, 1 );

ImageView<double> match_im = ( 1.0 - 20.0*abs(hue-hue_ref) ) * sat*sat;


Image Transformation
• Arbitrary image transformations via transform “functors” that define a mapping.
warped = transform( image, my_txform );

• Simple wrappers for common cases.
resample(img,xscale,yscale) resize(img,xsize,ysize)

translate(img,xoff,yoff) rotate(img,angle)

• Customizable interpolation and image edge
extension via optional arguments.
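A common way to implement a transform like this is reverse mapping: for each output pixel, the functor reports where that pixel came from in the source image, and the source is interpolated at that location. The sketch below shows the pattern with bilinear interpolation and zero edge extension. Gray, bilinear, RotateAboutCenter, and warp are hypothetical names; the actual VW transform interface may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major grayscale image.
struct Gray {
  int cols, rows;
  std::vector<double> px;
  Gray(int c, int r) : cols(c), rows(r), px(static_cast<std::size_t>(c) * r, 0.0) {}
  double at(int c, int r) const {  // zero edge extension
    if (c < 0 || c >= cols || r < 0 || r >= rows) return 0.0;
    return px[static_cast<std::size_t>(r) * cols + c];
  }
  void set(int c, int r, double v) {
    assert(c >= 0 && c < cols && r >= 0 && r < rows);
    px[static_cast<std::size_t>(r) * cols + c] = v;
  }
};

// Bilinear interpolation at a real-valued location.
double bilinear(const Gray& img, double x, double y) {
  const int x0 = static_cast<int>(std::floor(x)), y0 = static_cast<int>(std::floor(y));
  const double fx = x - x0, fy = y - y0;
  return (1 - fx) * (1 - fy) * img.at(x0, y0) + fx * (1 - fy) * img.at(x0 + 1, y0) +
         (1 - fx) * fy * img.at(x0, y0 + 1) + fx * fy * img.at(x0 + 1, y0 + 1);
}

// Transform functor: rotation by theta radians about (cx, cy) in pixel coordinates.
// reverse() maps an output location back to its source location.
struct RotateAboutCenter {
  double cx, cy, theta;
  void reverse(double x, double y, double& sx, double& sy) const {
    const double dx = x - cx, dy = y - cy, c = std::cos(theta), s = std::sin(theta);
    sx = cx + c * dx + s * dy;  // apply the inverse rotation (by -theta)
    sy = cy - s * dx + c * dy;
  }
};

// Generic warp: ask the functor where each output pixel came from, then interpolate.
template <class TxT>
Gray warp(const Gray& in, const TxT& tx) {
  Gray out(in.cols, in.rows);
  for (int r = 0; r < out.rows; ++r)
    for (int c = 0; c < out.cols; ++c) {
      double sx, sy;
      tx.reverse(c, r, sx, sy);
      out.set(c, r, bilinear(in, sx, sy));
    }
  return out;
}
```

Reverse mapping guarantees that every output pixel gets exactly one value, whereas pushing source pixels forward can leave holes. Swapping in a different functor (a homography, a lens model) reuses the same warp loop.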


Transformation Examples
[Figure: panels showing Rotation, Homography, Radial Distortion, and Arbitrary Transformation]


Modules & Applications


Interest Point & Alignment Module


[Figure: original images and aligned images]


Mosaic Module Basics


CTX Polar Mosaic

• Based on pre-release polar data captured by CTX on Mars Reconnaissance Orbiter
• Two weeks of development time

• Stats:
• 1610 source images
• 305 GB of source imagery
• 40.3 gigapixels


Cartography Module


High Dynamic Range Module
• Merge multiple exposures of the same scene to increase
dynamic range.

• Closely related to photometric calibration of orbital
imagery.

[Figure: LDR vs. HDR comparison]


HDR Module


Application: Image Matching
• Problem: Given an image, ﬁnd others like it.

Example database: Apollo Metric Camera images


Texture-Based Image Matching
Pipeline stages, from model image to matched image:
• Filtering: texture bank filtering (Gaussian 1st derivative and LoG)
• Output representation: grouping to remove orientation; energy in a window
• Segmentation: E-M Gaussian mixture model; iterative tryouts, MDL
• Post-processing: max vote
• Summarization: grouping; mean energy in segment
• Vector comparison: Euclidean distance


Texture Matching Filter Bank


Image Matching: Results


Stereo Module
[Figure: a template region from the left image is compared against a search area in the right image; labels Right Image, Template Region (from Left Image), Search Area, Candidate Disparity (dx, dy), Discrete Correlation, Sub-pixel Correlation]

1. Discrete Correlation
• Find the integer offset (disparity) that minimizes the sum of absolute differences between the template region (from the left image) and the right image.
• For speed:
• Coarse-to-fine processing.
• Disparity search sub-regioning [1]
• Box filter-optimized correlator.

2. Sub-pixel Refinement
• Fit a 2D convex quadratic surface to the nine nearest points in correlation fitness space.

3. Consistency Checks
• Left/Right Cross Check
• Median Filtering

Other methods to be added soon:
• Epipolar, photometric, continuity/smoothness constraints.
• Robust Cost Function

[1] Changming Sun. Rectangular Subregioning and 3-D Maximum-Surface Techniques for Fast Stereo Matching. In Proceedings of the IEEE Workshop on Stereo and Multi-Baseline Vision (2001).
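The first two stereo steps can be illustrated in one dimension. The sketch below searches integer disparities along a single scanline by minimizing the sum of absolute differences, then refines the winner with a parabola through the best cost and its two neighbors, the 1D analog of the 2D quadratic fit. It is a deliberate simplification: the module itself searches (dx, dy) in 2D with the speedups listed, and the names sad and match_pixel are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Sum of absolute differences between a window of `left` centered at x and the
// same window of `right` shifted by -d (1D simplification of template matching).
double sad(const std::vector<double>& left, const std::vector<double>& right,
           int x, int d, int half) {
  double cost = 0.0;
  for (int i = -half; i <= half; ++i) cost += std::fabs(left[x + i] - right[x + i - d]);
  return cost;
}

// Returns a sub-pixel disparity for pixel x, or NaN if the search leaves the data.
double match_pixel(const std::vector<double>& left, const std::vector<double>& right,
                   int x, int half, int max_d) {
  assert(right.size() == left.size());
  const int n = static_cast<int>(left.size());
  if (x - half - max_d < 0 || x + half >= n) return std::numeric_limits<double>::quiet_NaN();

  // 1. Discrete correlation: exhaustive search over integer disparities.
  std::vector<double> cost(max_d + 1);
  int best = 0;
  for (int d = 0; d <= max_d; ++d) {
    cost[d] = sad(left, right, x, d, half);
    if (cost[d] < cost[best]) best = d;
  }

  // 2. Sub-pixel refinement: vertex of the parabola through (best-1, best, best+1).
  if (best == 0 || best == max_d) return best;  // missing a neighbor on one side
  const double cm = cost[best - 1], c0 = cost[best], cp = cost[best + 1];
  const double denom = cm - 2.0 * c0 + cp;
  if (denom <= 0.0) return best;  // not convex: keep the integer estimate
  return best + 0.5 * (cm - cp) / denom;
}
```

The refinement offset always lies within half a pixel of the integer winner when the three costs form a convex triple, so it only adjusts, never overrides, the discrete search.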


Improved Stereo Matching:
Affine-adaptive Sub-pixel Correlation
• Foreshortening is the geometric effect that gives rise to stereo processing. However, the change in perspective on a sloped surface can confuse an area-based stereo correlator.

• The solution is to use an iterative algorithm to adapt the correlation window (e.g., with an affine warp).

[Figure: AS15-M-1134 and AS15-M-1135]


Handling “Noise”

• The occasional speck of dust or lint on the Apollo scans can throw off our stereo correlator.

• We have shown that we can mitigate this effect somewhat by using robust statistics.

[Figure: dust and lint on AS15-M-1134]

[Figure: DEM (note error due to dust)]

[Figure: DEM with the error corrected using Cauchy robust weighting]
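One way to see how Cauchy weighting suppresses dust-like outliers is a robust location estimate computed by iteratively reweighted least squares (IRLS). In the sketch below, each sample is weighted by 1 / (1 + (r/c)^2), where r is its residual and c is a scale parameter, so a large residual gets only a small weight. This is a generic illustration, not the stereo correlator's code; the function names and the choice of c are assumptions. The Cauchy cost is not convex, so IRLS can settle on a poor estimate if outliers dominate the starting point.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cauchy weight: residuals much larger than the scale c get weight ~ (c/r)^2.
double cauchy_weight(double r, double c) {
  const double u = r / c;
  return 1.0 / (1.0 + u * u);
}

// IRLS estimate of a location parameter, starting from the ordinary mean.
// Each pass recomputes weights from the current residuals and re-averages.
double robust_mean(const std::vector<double>& x, double c, int iterations = 20) {
  assert(!x.empty() && c > 0.0);
  double mu = 0.0;
  for (double v : x) mu += v;
  mu /= static_cast<double>(x.size());
  for (int it = 0; it < iterations; ++it) {
    double wsum = 0.0, wxsum = 0.0;
    for (double v : x) {
      const double w = cauchy_weight(v - mu, c);
      wsum += w;
      wxsum += w * v;
    }
    mu = wxsum / wsum;
  }
  return mu;
}
```

With five samples near 10 and one “dust” sample at 100, the plain mean is 25, while the Cauchy-weighted estimate with c = 1 settles very close to 10.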


The Ames Stereo Pipeline
• Problem: Given multiple images, compute the 3D terrain.

[Figure: earlier Ames surface reconstruction work: Mars Pathfinder & MarsMap; Mars Exploration Rovers (MER) & Viz; Mars Polar Lander & Viz]

NASA Ames has been developing surface reconstruction techniques for planetary exploration since the mid-1990s.


Architectural Overview
• The Stereo Pipeline is a relatively thin application built upon the open source ARC Vision Workbench and USGS ISIS toolkits.

Vision Workbench Overview
• Modular, extensible, C++ machine vision and image processing library (Linux, OS X, Win32)
• Developed as a framework for unifying image processing work at NASA Ames
• Designed for easy, expressive coding of efficient image processing algorithms.

Vision Workbench Modules
• Core (abstract datatypes & utilities)
• Camera (models & calibration)
• Cartography (geospatial images)
• GPU (HW accelerated processing)
• HDR (high-dynamic range images)
• Interest Point (tracking & matching)
• Mosaic (composite & blend huge images)
• Stereo Processing (high-quality DEMs & 3D models)

[Figure: layered architecture diagram with Mission Specific Code on top of the Stereo Pipeline, which builds on the Vision Workbench and ISIS. Diagram labels: Camera, VW Camera Models, ISIS Camera Models, Image, Image Processing, Stereo, Dense Stereo Correlation, Stereo Camera Geometry, DEM Generation, FileIO, Image File I/O, ISIS File I/O, Georeferenced File I/O, Cartography, InterestPoint, Image Alignment]


Mars Stereo: MOC NA
MGS MOC-Narrow Angle
• Malin Space Science Systems
• Altitude: 388.4 km (typical)
• Line Scan Camera: 2048 pixels
• Focal length: 3.437 m
• Resolution: 1.5-12 m/pixel
• FOV: 0.5 deg


Galaxius Fluctus Channel

This VRML model was generated from MOC image pair M01-00115 and E02-01461 (34.66°N, 141.29°E). The complete stereo reconstruction process takes approximately five minutes on a 3.0 GHz workstation (1024x8064 pixels). This model is shown without vertical elevation exaggeration.


Warrego Vallis System

Lower Left: This 3D model was generated from MOC-NA images E01-02032 and M07-02071 (42.66°S, 93.55°E).
Upper Right: Ortho-image overlay. Areas of interpolated data are colored red.


NE Terra Meridiani


Upper Left: This DTM was generated from MOC images E04-01109 and M20-01357 (2.38°N, 6.40°E). The contour lines (20 m spacing) overlay an ortho-image generated from the 3D terrain model. Lower Right: An oblique view of the corresponding VRML model.


Lunar Stereo: Apollo Orbiter Cameras

ITEK Panoramic Camera
• Focal length: 610 mm (24 in)
• Optical bar camera
• Apollo 15, 16, 17 Scientific Instrument Module (SIM)
• Film image: 1.149 x 0.1149 m
• Resolution: 108-135 lines/mm


Apollo 17 Landing Site

Top: Stereo reconstruction

Right: Handheld photo taken by an
orbiting Apollo 17 astronaut


Public Outreach: Haydn Planetarium


Recent Developments:
Processing Large Satellite Imagery
• The Vision Workbench handles arbitrarily large images via intelligent caching and a flexible abstraction of an image called an “image view.”
• Image operations are evaluated lazily, allowing for optimization down the line.
• Processing occurs one tile at a time, and is usually driven by the output operation (i.e. writing a tile to disk).
• DiskImageView, BlockCacheView, ImageViewRef, block_rasterize(), and block-savvy write_image()/FileIO
• Scalable performance on multi-threaded machines (soon to include Columbia, NASA’s supercomputer)
• Thread and ThreadPool/WorkQueue objects
• Specifically targeting the stereo correlator, outlier rejection, and stereo intersection algorithms.

Nominal resolutions for various imagers (all sizes given in pixels):
Apollo Metric Camera: 16,000 x 16,000
HiRISE: 20,000 x 40,000
LROC: 10,000 x 50,000
HRSC: 5,184 x 16,000
CTX: 5,064 x 16,000
MOC-NA: 2,048 x 4,800
MER: 1,024 x 1,024
The Apollo Panoramic Camera is not shown (25,400 x 244,000 pixels)!


Recent Developments:
Least Squares Bundle Adjustment

Refining Apollo SPICE Kernels

• The camera positions and poses in the “historical” SPICE kernels provided by ASU are a good initial solution, but they will require refinement.

• Incorporate new Apollo Metric Camera tie-points into ULCN 2005, or tie these points to the preliminary LOLA control network in late 2009.

• This work will be carried out as part of a USGS/ARC LASER proposal during FY09/FY10.

Automating Bundle Adjustment

• Automate tie-point matching using the SIFT and SURF algorithms.

• Experimenting with reducing sensitivity to outliers using robust statistics (i.e., error models with “heavy-tailed” probability distributions).

[Figure: Top: partial view of the Orbit 33 stereo reconstruction; note the discontinuities in the colored, hillshaded terrain. Bottom: KSU “Bundlevis” visualization of bundle adjustment for AS15-M-113[5-7].]


A Peek Under the Hood


Problem: Intermediate Results
• What happens when you chain operations?
result = image1 + image2 + image3;

result = transpose( crop(image,x,y,31,31) );

• Normally those would be the same as these:
Image tmp = image1 + image2;
result = tmp + image3;

Image tmp = crop(image,x,y,31,31);
result = transpose(tmp);

• That would be terribly inefficient! Computing each intermediate result requires an extra pass over the data and an extra image's worth of memory.


Solution: Lazy Evaluation
• The + operator returns a special image sum object.
• The actual computation is only performed when you
set an ImageView equal to one of these objects.

• The entire operation is performed in the inner loop,
once per pixel.

• No intermediate image is needed!
• No second pass over the data is needed, either!
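The technique behind this is usually called expression templates. The sketch below is a toy version, not VW's classes: operator+ returns a lightweight SumView that only remembers its operands, and the single loop runs inside the assignment. Img, ViewBase, and SumView are hypothetical names. Because this sketch holds operands by reference, a sum object should be assigned within the same expression that creates it; a production design has to manage operand lifetimes more carefully.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base so operator+ only matches our view types.
template <class Derived>
struct ViewBase {
  const Derived& impl() const { return static_cast<const Derived&>(*this); }
};

struct Img : ViewBase<Img> {
  std::vector<float> px;
  explicit Img(std::size_t n, float v = 0.0f) : px(n, v) {}
  std::size_t size() const { return px.size(); }
  float operator[](std::size_t i) const { return px[i]; }

  // Assignment from any view: the one and only pass over the data happens here.
  template <class V>
  Img& operator=(const ViewBase<V>& view) {
    const V& v = view.impl();
    assert(v.size() == px.size());
    for (std::size_t i = 0; i < px.size(); ++i) px[i] = v[i];
    return *this;
  }
};

// The sum object stores references to its operands and computes nothing up front.
template <class A, class B>
struct SumView : ViewBase<SumView<A, B>> {
  const A& a;
  const B& b;
  SumView(const A& a_, const B& b_) : a(a_), b(b_) {}
  std::size_t size() const { return a.size(); }
  float operator[](std::size_t i) const { return a[i] + b[i]; }
};

template <class A, class B>
SumView<A, B> operator+(const ViewBase<A>& a, const ViewBase<B>& b) {
  return SumView<A, B>(a.impl(), b.impl());
}
```

For `result = a + b + c;` the right-hand side has type SumView<SumView<Img, Img>, Img>, and each result pixel is computed as a[i] + b[i] + c[i] inside one loop, with no temporary image.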


Generalizing the View Concept
• An image view is any object that you can access just like a regular old ImageView object.

Type definitions
Image::pixel_type
Image::result_type

Getting dimensions
img.cols()
img.rows()
img.planes()

Accessing pixels
img(col,row)
img(col,row,plane)

Pixel accessor
Image::pixel_accessor
img.origin()

Rasterization
Image::prerasterize_type
prerasterize(bbox)
template <class DestT> rasterize(dest,bbox)

• The data can be anywhere, or it can be computed.

The Pixel Accessor Public Interface
• Pixel accessors are the most efficient way to move around the pixels in an image, and are typically used to implement rasterization functions.

• They behave somewhat like standard C++ iterators.

Iteration
acc.prev_col()
acc.next_col()
acc.prev_row()
acc.next_row()
acc.prev_plane()
acc.next_plane()

Advancement
acc.advance(cols,rows)
acc.advance(cols,rows,planes)

Pixel access
*acc
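A pixel accessor can be modeled as a pointer plus per-axis strides. The sketch below assumes a contiguous, plane-major buffer (column stride 1, row stride cols, plane stride cols*rows); the class name Accessor and the helper sum_plane are hypothetical, though the method names follow the interface listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pixel accessor: a cursor that knows how far to step along each axis.
template <class PixelT>
class Accessor {
  PixelT* ptr_;
  std::ptrdiff_t cstride_, rstride_, pstride_;
public:
  Accessor(PixelT* p, std::ptrdiff_t cstride, std::ptrdiff_t rstride, std::ptrdiff_t pstride)
      : ptr_(p), cstride_(cstride), rstride_(rstride), pstride_(pstride) {}
  Accessor& next_col() { ptr_ += cstride_; return *this; }
  Accessor& prev_col() { ptr_ -= cstride_; return *this; }
  Accessor& next_row() { ptr_ += rstride_; return *this; }
  Accessor& prev_row() { ptr_ -= rstride_; return *this; }
  Accessor& next_plane() { ptr_ += pstride_; return *this; }
  Accessor& prev_plane() { ptr_ -= pstride_; return *this; }
  Accessor& advance(std::ptrdiff_t cols, std::ptrdiff_t rows, std::ptrdiff_t planes = 0) {
    ptr_ += cols * cstride_ + rows * rstride_ + planes * pstride_;
    return *this;
  }
  PixelT& operator*() const { return *ptr_; }
};

// Typical rasterization-style loop: one accessor per row, stepped along columns.
template <class PixelT>
PixelT sum_plane(Accessor<PixelT> origin, int cols, int rows) {
  assert(cols >= 0 && rows >= 0);
  PixelT total = PixelT();
  Accessor<PixelT> row = origin;
  for (int r = 0; r < rows; ++r, row.next_row()) {
    Accessor<PixelT> col = row;
    for (int c = 0; c < cols; ++c, col.next_col()) total += *col;
  }
  return total;
}
```

The inner loop touches only pointer increments and dereferences, which is why accessors are the preferred tool for writing rasterizers.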


Views, Views, Everywhere!

• None of the functions we’ve seen so far actually compute anything.
• Instead, they immediately return view objects that represent processed views of the underlying data.

• Nested function calls produce nested view types.
• The computation happens in either the assignment operator or the constructor of the destination.

• We call this final step the “rasterization” of one view into another view.


Block Rasterization

• Ultra-large (larger than memory) images are easily supported.
• All image views natively support block-by-block computation (“rasterization”).
• write_image() computes per block or per scanline.
• QuadTreeGenerator computes per block.
• BlockCacheView allows you to manually control block computation in a nested view.
template <class DestT> void Image::rasterize(DestT const& dest, BBox2i const& bbox);
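Block-by-block rasterization can be sketched with a procedural view and a driver loop that requests one tile at a time. Everything here is hypothetical: BBox stands in for BBox2i, and rasterize_by_blocks for the VW machinery. The point is that the full image never has to exist in memory, only the current tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct BBox { int x, y, width, height; };  // hypothetical stand-in for BBox2i

// A procedural view: pixel values are computed on demand, never stored.
struct GradientView {
  int cols, rows;
  float operator()(int c, int r) const { return static_cast<float>(c + r); }
  // Rasterize only the pixels inside bbox into a bbox-sized, row-major buffer.
  void rasterize(std::vector<float>& dest, const BBox& b) const {
    dest.resize(static_cast<std::size_t>(b.width) * b.height);
    for (int r = 0; r < b.height; ++r)
      for (int c = 0; c < b.width; ++c)
        dest[static_cast<std::size_t>(r) * b.width + c] = (*this)(b.x + c, b.y + r);
  }
};

// Drive rasterization tile by tile, as a tiled file writer might.
// sink(tile_pixels, bbox) consumes each tile; one tile buffer is reused throughout.
template <class ViewT, class SinkT>
void rasterize_by_blocks(const ViewT& view, int block_size, SinkT sink) {
  assert(block_size > 0);
  std::vector<float> tile;
  for (int y = 0; y < view.rows; y += block_size)
    for (int x = 0; x < view.cols; x += block_size) {
      const BBox b{x, y, std::min(block_size, view.cols - x), std::min(block_size, view.rows - y)};
      view.rasterize(tile, b);
      sink(tile, b);
    }
}
```

Edge tiles are clipped to the image bounds, so a 10 x 7 image with 4-pixel blocks yields six tiles of varying sizes.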


A Trivial First Example
• SLOG: Sign of Laplacian of Gaussian

Image slog =
threshold_filter( laplacian_filter( gaussian_filter( img, 1.5 ) ) );


Generic View Types Can be Complicated!

• The type of the resulting view object becomes complex very
quickly.

Image slog =
threshold_filter( laplacian_filter( gaussian_filter( img, 1.5 ) ) );

UnaryPerPixelView<ConvolutionView<SeparableConvolutionView<ImageView<PixelRGB<float> >,
                                                           double, ConstantEdgeExtension>,
                                  double, ConstantEdgeExtension>,
                  UnaryCompoundFunctor<ChannelThresholdFunctor<PixelRGB<float> > > >


Other Advantages to Views
• Generalized views emerged as the solution to several
problems at once.

• On-disk images can be supported cleanly.
• Procedurally-generated images can be, too.
• If you only want a small number of processed pixel
values, e.g. near interest points, make the view and just
ask it for those values.

• Lazy evaluation permits more sophisticated algorithmic
optimizations down the road.

Naïve Laziness can be Very Bad™

• What happens when you chain convolutions?
result = convolution_filter(convolution_filter(image,kern1),kern2);

• Now the intermediate result is an important cache:
Image tmp = convolution_filter(image,kern1);
result = convolution_filter(tmp,kern2);

• Without this cache, performance will be terrible.
• In the Vision Workbench, intermediate results are
computed and cached when necessary.
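A small experiment shows why the cache matters. The sketch below counts how many times source pixels are read when two lazy 1D box filters of width w are nested: every outer sample re-evaluates w inner samples, each of which reads w source pixels, so the work grows as w squared per output pixel. Materializing the intermediate once cuts the source reads to about w per pixel. CountingSource, LazyBox, and materialize are hypothetical names, and the sketch does not model how VW decides when to cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 1D source that counts reads, to expose repeated work. Zero outside its range.
struct CountingSource {
  std::vector<double> px;
  mutable long reads = 0;
  std::size_t size() const { return px.size(); }
  double operator()(long i) const {
    if (i < 0 || i >= static_cast<long>(px.size())) return 0.0;
    ++reads;
    return px[static_cast<std::size_t>(i)];
  }
};

// Lazy 1D box filter of odd width: every access recomputes from its child.
template <class ChildT>
struct LazyBox {
  const ChildT& child;
  int width;
  std::size_t size() const { return child.size(); }
  double operator()(long i) const {
    if (i < 0 || i >= static_cast<long>(size())) return 0.0;  // zero edge extension
    double acc = 0.0;
    for (long k = -width / 2; k <= width / 2; ++k) acc += child(i + k);
    return acc / width;
  }
};

// Evaluate a view once into plain storage: this is the "cache".
template <class ViewT>
std::vector<double> materialize(const ViewT& v) {
  std::vector<double> out(v.size());
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = v(static_cast<long>(i));
  return out;
}
```

With 100 pixels and w = 5, evaluating the nested lazy filters reads the source 2450 times, while building the cached intermediate reads it only 494 times; both routes produce identical output.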


Generic vs. Abstract Views

• Views could be either template-based (generic) or
virtual-function-based (abstract).

• Because pixel access often appears in tight inner
loops, the template-based solution performs better.

• Templates are also more flexible. Virtualization can only recover one hidden type at a time.

• Alas, keeping track of complex types can be annoying.
Fortunately, the end user usually doesn’t have to.


Virtualizing Image Views
• Sometimes the abstract base class approach is better.
• Run-time polymorphism.
• Hiding complex types altogether.

• The ImageViewRef class wraps an arbitrary view in a veil
of abstraction.
• Templatized only on the pixel type.
• Contains a pointer to a special abstract base class.
• Has reference semantics (but re-bindable).

ImageViewRef<float> img_ref = My(Complex(Image(View(Type(img)))));

• Great for keeping a lazy view around if you only want
to evaluate it at select points.
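The ImageViewRef idea is type erasure: a class template on the pixel type that stores any concrete view behind a virtual interface. The sketch below is a minimal version under assumptions: ViewRef, Ramp, and Scaled are hypothetical, wrapped views are stored by value, and copies share one underlying view through std::shared_ptr. Each pixel access costs one virtual call, which is the performance trade-off discussed above.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical ViewRef<PixelT>: hides any view type behind a virtual interface.
template <class PixelT>
class ViewRef {
  struct Base {
    virtual ~Base() = default;
    virtual int cols() const = 0;
    virtual int rows() const = 0;
    virtual PixelT at(int c, int r) const = 0;
  };
  template <class ViewT>
  struct Holder : Base {
    ViewT view;  // stored by value; views are assumed cheap to copy
    explicit Holder(ViewT v) : view(std::move(v)) {}
    int cols() const override { return view.cols(); }
    int rows() const override { return view.rows(); }
    PixelT at(int c, int r) const override { return view(c, r); }
  };
  std::shared_ptr<const Base> impl_;  // reference semantics: copies share one view
public:
  template <class ViewT>
  ViewRef(ViewT view) : impl_(std::make_shared<Holder<ViewT>>(std::move(view))) {}
  int cols() const { return impl_->cols(); }
  int rows() const { return impl_->rows(); }
  PixelT operator()(int c, int r) const { return impl_->at(c, r); }  // one virtual call
};

// Two unrelated view types with the same pixel type.
struct Ramp {
  int cols() const { return 4; }
  int rows() const { return 2; }
  float operator()(int c, int r) const { return static_cast<float>(c + 10 * r); }
};

template <class ViewT>
struct Scaled {
  ViewT inner;
  float factor;
  int cols() const { return inner.cols(); }
  int rows() const { return inner.rows(); }
  float operator()(int c, int r) const { return factor * inner(c, r); }
};
```

Assigning a different view type to an existing ViewRef rebinds it, since the converting constructor builds a new wrapper and the assignment replaces the shared pointer.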

Image Resources
• Image resources, such as image files on disk, may have unknown pixel/channel types.

Getting type info
PixelFormatEnum pixel_format()
ChannelTypeEnum channel_type()

Getting dimensions
int32 img.cols()
int32 img.rows()
int32 img.planes()

Accessing pixel data
void read( ImageBuffer buf, BBox2i bbox )
void write( ImageBuffer buf, BBox2i bbox )

Other
Vector2i native_block_size()
void flush()

• ImageBuffer is a simple struct describing a block of
contiguous pixels in memory.

• Read/write functions call helper functions to
convert to/from the desired pixel type.
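The conversion step for dynamically typed data can be sketched as a switch on a runtime channel tag that produces a statically typed image. Buffer, ChannelType, and read_as_float are hypothetical stand-ins for ImageBuffer, ChannelTypeEnum, and read(); normalizing integer channels to [0, 1] is an assumption of this sketch, not a documented VW rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ChannelType { UInt8, UInt16, Float32 };  // hypothetical subset

// Hypothetical stand-in for ImageBuffer: untyped contiguous pixels plus a type tag.
struct Buffer {
  const void* data;
  ChannelType type;
  int cols, rows;  // single channel, tightly packed, row-major
};

// Convert one channel value to float: integers are scaled to [0, 1], floats pass through.
inline float channel_to_float(const Buffer& b, std::size_t i) {
  switch (b.type) {
    case ChannelType::UInt8:
      return static_cast<const std::uint8_t*>(b.data)[i] / 255.0f;
    case ChannelType::UInt16:
      return static_cast<const std::uint16_t*>(b.data)[i] / 65535.0f;
    case ChannelType::Float32:
      return static_cast<const float*>(b.data)[i];
  }
  assert(false && "unknown channel type");
  return 0.0f;
}

// "read": copy a buffer of any supported channel type into a float image.
inline std::vector<float> read_as_float(const Buffer& b) {
  std::vector<float> out(static_cast<std::size_t>(b.cols) * b.rows);
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = channel_to_float(b, i);
  return out;
}
```

Only this one function needs to know about every channel type; code downstream of it sees a single static pixel type, which avoids the combinatorial template instantiation discussed later.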

Lessons Learned and Thoughts for the Future


Templates and Laziness Revisited
• The image view framework currently serves
multiple purposes:

• Lazy evaluation of pixels on demand

• Block rasterization of gigantic images

• Eliminating unwanted temporaries

• This sometimes results in confused design.

• Lazy views need not be fully statically defined: that is a premature optimization that complicates design.


Example: Image Transformation
• This simple expression:
rotate( image, 45*M_PI/180 )

• Returns this complex type (assuming an RGB8 image):
TransformView< InterpolationView< EdgeExtensionView< ImageView<PixelRGB<uint8> >,
ZeroEdgeExtension >,
BilinearInterpolation >,
RotateTransform >

• Nested views are very powerful, but the resulting view is
needlessly complex.

• Virtualizing the edge extension step has negligible impact
on performance. Virtualizing the interpolation step is
impossible.


Template Pitfalls
• A common and frustratingly terrible idiom for
supporting multiple pixel types:
template <class PixelT>
int do_something_useful(/* ... */) {
  // Your actual program code
}

int main(int argc, char *argv[]) {
  // Parse the arguments...

  DiskImageResource *resource = DiskImageResource::open(image_filename);
  ChannelTypeEnum channel_type = resource->channel_type();
  PixelFormatEnum pixel_format = resource->pixel_format();

  // One template instantiation per (pixel format, channel type) pair.
  switch(pixel_format) {
  case VW_PIXEL_GRAY:
    switch(channel_type) {
    case VW_CHANNEL_UINT8:  return do_something_useful<PixelGray<uint8> >(/* ... */);
    case VW_CHANNEL_UINT16: return do_something_useful<PixelGray<uint16> >(/* ... */);
    // And so on...
    }
    // And so on...
  }
}

• Annoying to write, takes forever to compile, and
results in huge executables.

A More Pythonic Way
• Process an image using its native pixel type, as long as it’s a standard type:
>>> import vw
>>> input = vw.read_image( 'my_image.jpg' )
>>> filtered = vw.gaussian_filter( input, 3 )
>>> vw.write_image( 'filtered_image.jpg', filtered )

• Coercion to a specific pixel type:
>>> input = vw.read_image( 'my_image.jpg', ptype=vw.PixelRGB, ctype=vw.uint8 )

• Successfully implemented in the Python bindings.

• It’s great to use, and terrible to implement.

• Results in huge Python bindings, especially due to SWIG
limitations on multiple compilation units.

Proliferation of Image Concepts
• ImageView : Static pixel type, pixels stored
contiguously in memory.
• ImageViewRef : Static pixel type, abstracts
arbitrary block image computation.
• ImageResource : Dynamic pixel type, block
image access with conversion.
• ImageBuffer : Dynamic pixel type, pixels stored
in a block in memory.

• A dynamically typed version of ImageViewRef?

A Dynamic View Abstraction?
• ImageView needs to be templatized on the pixel type for fast and
easy pixel access, but this does not prevent it from also adhering
to a dynamically typed view abstraction.

• Automatic pixel type casting/coercion is needed to avoid a
combinatorial explosion.

• The existing ImageResource interface may be close... (for 3.0?)

• Currently exploring an intermediate solution (essentially a dynamic version of ImageViewRef) for the 2.0 release.

Getting type info
PixelFormatEnum pixel_format()
ChannelTypeEnum channel_type()

Getting dimensions
int32 img.cols()
int32 img.rows()
int32 img.planes()

Rasterization
void rasterize( ImageBuffer buf, BBox2i bbox )


An OpenCV – VW Bridge?
• OpenCV contains many algorithms that Vision
Workbench users would love to use.
• The simplest approach would be a direct bridge
between ImageView and IplImage.
• A more powerful approach would be to
produce Vision Workbench views whose
rasterizers invoke OpenCV algorithms.
• This would automatically support applying many OpenCV algorithms to gigantic images, and fit naturally into the VW view ecosystem.


Questions / Discussion


