PARALLEL IMPLEMENTATION OF GEODESIC DISTANCE TRANSFORM WITH APPLICATION IN SUPERPIXEL SEGMENTATION
Tuan Q. Pham
Canon Information Systems Research Australia (CiSRA)
1 Thomas Holt drive, North Ryde, NSW 2113, Australia.
tuan.pham@cisra.canon.com.au
ABSTRACT
This paper presents a parallel implementation of the geodesic distance transform using OpenMP. We show how a sequential chamfer distance algorithm can be executed on parallel processing units with shared memory, such as multiple cores on a modern CPU. Experimental results show that a speedup of 2.6 times on a quad-core machine can be achieved without loss in accuracy. This work forms part of a C implementation for geodesic superpixel segmentation of natural images.
Index Terms— geodesic distance transform, OpenMP,
superpixel segmentation
1. INTRODUCTION
Due to the raster-order organisation of pixels in an image, many image processing algorithms operate in a sequential fashion. This sequential processing is suitable for running on a single-processor system. However, even Personal Computers (PCs) now have multiple processing cores. In fact, the number of
cores on a chip is likely to double every 18 months to sustain Moore’s law [23]. As a result, there is a strong need to
parallelise existing image processing algorithms to run more
efficiently on multi-core hardware.
OpenMP (Open Multi-Processing) is a powerful yet
simple-to-use application programming interface that supports many functionalities for parallel programming. OpenMP
uses a shared-memory model, in which all threads share a
common address space. Each thread can have additional private data under explicit user control. This shared-memory
model simplifies the task of programming because it avoids
the need to synchronise memory across different processors
on a distributed system. The shared-memory model also fits
well with the multi-core architecture of modern CPUs.
Parallel programming using OpenMP has gained significant interest in the image processing community in recent years. In 2010, the IEEE Signal Processing Society dedicated a whole issue of its flagship publication, the IEEE Signal Processing Magazine, to signal processing on multiple-core platforms. In this issue, Slabaugh et al. demonstrated a 2- to 4-times speedup of several popular image processing algorithms on a quad-core machine using OpenMP [25]. The demonstrated algorithms involve either pixel-wise processing (image warping, image normalisation) or small neighbourhood-wise processing (binary morphology, median filtering). All
of these algorithms generate the output at each pixel independently of those at other output pixels. As a result, they
are naturally extendable to parallel implementation. This type
of data-independent task parallelisation can even be done automatically by a compiler [11]. Parallel implementation of
sequential-based image processing algorithms, however, still
requires manual adaptation by an experienced programmer.
In this paper, we present a parallel implementation of
Geodesic Distance Transform (GDT) using OpenMP. GDT
accepts a greyscale cost image together with a set of seed
points. It outputs a distance transform image whose intensity
at each pixel is the geodesic distance from that pixel to a
nearest seed point. The geodesic distance between two points
is the sum of pixel costs along a minimum-cost path connecting these two points. The nearest seed mapping forms an
over-segmentation of the input image [18, 29]. Fast image
segmentation is the main reason why a parallel implementation of GDT is desirable [10, 2, 7, 28]. There are two main
approaches to GDT estimation: a chamfer distance propagation algorithm [15] and a wavefront propagation algorithm
[27]. Both algorithms are sequential in nature, i.e. they are
not directly parallelisable. The chamfer algorithm was selected for parallelisation in this paper due to its simple raster
scan access over the image data.
The rest of the paper is organised as follows. Section 2
provides some background on GDT and the chamfer distance
propagation algorithm. Section 3 reviews previous attempts
in the literature to parallelise (Euclidean) distance transform.
Our proposed parallel implementation of GDT is presented
in Section 4. Section 5 evaluates the speed and accuracy of
our parallel implementation on different images and different computers. Section 6 presents an application of GDT in
superpixel segmentation of images. Section 7 concludes the
paper.
Fig. 1. Minimum-cost path versus straight path on an uneven cost surface generated by the membrane function in Matlab: (a) cost image f(x, y); (b) geodesic distance transform (minimum path, cost = 1.7; straight path, cost = 11.1).
2. BACKGROUND ON GEODESIC DISTANCE
TRANSFORM
Geodesic distance, or topographical distance [16], is a grey-weighted distance between two points on a greyscale cost surface. The geodesic distance is calculated as the sum of pixel costs along a minimum-cost path joining the two points. An example is illustrated in Figure 1a, where the image intensities f(x, y) represent the cost of traversing each pixel. Two
different paths from a source point in the middle of the image
to a destination point at the top-right corner are drawn. The
minimum cost path in dotted cyan line, despite being a longer
path, integrates over a smaller total cost than the straight path
in magenta (1.7 versus 11.1). The cost image f can be seen
as a terrain surface, where the red blob corresponds to a high
mountain. Figure 1a basically illustrates that going across a
steep mountain incurs a much higher cost than going around
its flat base to reach the other side. Figure 1b shows the GDT
of the image in Figure 1a given one seed point at the centre of
the image. The intensity of each pixel represents the geodesic
distance from that pixel to the central seed point.
2.1. Chamfer distance propagation algorithm
GDT can be estimated efficiently using chamfer distance
propagation [21]. The path between two pixels is approximated by discrete line segments of 1- or √2-pixel length
connecting a pixel with one of its eight immediate neighbours. Initially, the distance transform at every pixel is set to
infinity except at locations of the seed points where the distance transform is zero. The distance transform at every pixel
is then updated by an iterative distance propagation process.
Each iteration comprises two passes over the image. A forward pass scans the image rows from top to bottom, each row
is scanned from left to right (Figure 2a). A backward pass
scans the image rows from bottom up, each row is scanned
from right to left (Figure 2b).
The forward pass propagates the distance transform of the four causal neighbours (shaded grey in Figure 2a) to the current pixel P(x, y) according to equation (1):

    d(x, y) = min { d(x−1, y−1) + b·f(x, y),
                    d(x, y−1)   + a·f(x, y),
                    d(x+1, y−1) + b·f(x, y),
                    d(x−1, y)   + a·f(x, y) }        (1)

where a = 0.9619 ≈ 1 and b = 1.3604 ≈ √2 are the optimal chamfer coefficients for a 3×3 neighbourhood [4].

Fig. 2. One iteration of distance propagation comprises a forward pass followed by a backward pass.

Similarly, the backward pass propagates the distance transform
from four anti-causal neighbours (shaded grey in Figure 2b)
to the current pixel P(x, y) according to equation (2):

    d(x, y) = min { d(x+1, y+1) + b·f(x, y),
                    d(x, y+1)   + a·f(x, y),
                    d(x−1, y+1) + b·f(x, y),
                    d(x+1, y)   + a·f(x, y) }        (2)
Equations (1) and (2) apply to pixels which have a full set of 8 immediate neighbours. Pixels at the image border need different treatment because some of their neighbours are out of bound. These out-of-bound neighbours are ignored in the
distance propagation equations (1) and (2).
2.2. Example
An example of GDT with more than one seed point is given
in Figure 3. Figure 3a-b show an input image and its gradient
energy, respectively. The gradient energy is used as a nonnegative cost image, from which the GDT is computed. Four
seed points are shown as circles of different colours in Figure 3b. Figure 3c-d show intermediate distance transforms
after a first forward and a first backward pass through the
cost image (blue=low distance, red=high distance). In the
first forward pass, the top-left region of the distance transform is not updated because these pixels do not have a seed
in their causal path. After the first backward pass, the distance transform gradually settles into its final form before
converging at the twentieth iteration (which looks very similar
to the GDT after 10 iterations in Figure 3e). Many iterations are required because the minimum-cost paths are usually not straight; they require multiple distance propagations from different directions. Fortunately, fewer iterations are required if there are more seeds, because the geodesic paths generally become shorter and hence contain fewer twists and turns.

Fig. 3. Geodesic distance transform and nearest seed label computed from the gradient energy image with 4 seed points: (a) input (320×240); (b) gradient energy; (c) intermediate GDT after the 1st forward pass; (d) intermediate GDT after the 1st backward pass; (e) GDT after 10 iterations; (f) nearest seed label after the 1st forward pass; (g) nearest seed label after the 1st backward pass; (h) nearest seed label after 10 iterations.

Fig. 4. Image partitioning strategy for a parallel chamfer distance transform on a distributed system [24] (the distances of shaded pixels are transmitted across processors).
The last row of Figure 3 shows the corresponding nearest seed labels of the intermediate distance transforms in the
second row. Each coloured segment corresponds to a set of
pixels with a common nearest seed point. Pixels with the
same coloured label should be connected because they are
connected to the common seed point via some geodesic paths.
Fragmentation happens in Figure 3g because this is an intermediate result. After the GDT converges, the segmentation boundaries generally trace out strong edges in the scene (Figure 3h). This leads to the geodesic image segmentation algorithm presented in Section 6.
3. LITERATURE SURVEY ON PARALLEL
DISTANCE TRANSFORM
Most previous techniques on parallel distance transform compute the Euclidean Distance Transform (EDT) instead of GDT. EDT accepts a binary image and returns the Euclidean distance from each pixel to a nearest nonzero pixel in the binary image. EDT is a special case of GDT where the cost image is constant and positive. A squared Euclidean distance r² can be decomposed into two components x² + y², each of which can be estimated independently using a Voronoi diagram of the nonzero pixels in the binary image [6]. A parallel implementation of EDT using OpenMP on a 24-core system achieves an 18-times speedup [14]. A parallel implementation of
the chamfer EDT was presented by Shyu et al. in [24]. This
method computes the EDT on a distributed system. As a result, the intermediate results across different processors have
to be synchronised using Message Passing Interface (MPI).
Similar to the original chamfer algorithm in [21], Shyu et
al.’s implementation requires two passes over the image: a
forward pass to propagate the distance transform from causal
neighbours, followed by a backward pass to propagate the distance transform from anti-causal neighbours.
To parallelise these sequential passes, Shyu et al. partition the input image into bands; the distance computation of each band is assigned to a processor. At each processor, the image band is further partitioned into parallelograms. The label of each parallelogram in Figure 4 specifies its order of processing (partitions n and n′ are processed concurrently). Due to the propagation of causal information, the parallelogram labelled 3 on the second band must wait for the result of the parallelogram labelled 2 on the first band. The EDT of the last row of parallelogram 2 (shaded grey) must be transmitted to the next processor before parallelogram 3 can be processed. After this first data transmission, processors 1 and 2 can work in parallel on their partitions 3 and 3′, respectively. This process of local distance propagation followed by data transmission repeats for partitions 4 and 4′, and so on.
4. PARALLEL GEODESIC DISTANCE TRANSFORM
This section presents our parallel implementation of GDT using OpenMP. Our implementation is motivated by the parallel
implementation of the chamfer distance transform in [24].
Shyu et al.’s implementation, however, targets distributed
memory systems, in which data need to be synchronised
across processors by message passing. Using the shared
memory model present in multicore CPUs, we avoid the need
to synchronise data.
The iterative nature of GDT also allows a simpler image partitioning strategy. Unlike EDT, GDT requires more than one iteration of forward+backward passes. As a result, the GDT can be propagated from one image band to the next in a subsequent iteration, rather than within the current pass as in [24]. Our implementation therefore only uses a band-based
image partitioning across different processors. This fits well
with the parallel for construct in OpenMP.
Algorithm 1. Parallel chamfer distance transform (shaded rows are compiler directives to enable parallel computation).

    for (iter = 0; iter < 10; iter++)
    {
        ...
        // Forward propagation
        forwardPropagationFirstRow( ... );
        #pragma omp parallel for private( ... private variable declarations ... )
        for (i = 1; i < height; i++)
            { fwdProp( ... ); }

        // Backward propagation
        backwardPropagationLastRow( ... );
        #pragma omp parallel for private( ... private variable declarations ... )
        for (i = height - 2; i >= 0; i--)
            { bwdProp( ... ); }
    } // End of iterative chamfer distance propagation

Fig. 5. Band-based image partitioning strategy for parallel implementation of geodesic distance transform in OpenMP (shaded pixels are visited in the current propagation iteration).
Figure 5 illustrates our band-based image partitioning
strategy for a forward propagation of the GDT. The first image row is processed by the master thread outside any parallel
processing block. The first row is treated differently from
the rest because pixels on the first row have only one causal
neighbour. The remaining image rows are partitioned into
non-overlapping bands of equal height (called chunk size in
OpenMP terminology). Each band is processed concurrently
by a different thread. If there are more bands than the total
number of threads, the unprocessed bands will be assigned to
threads in a round-robin fashion (static scheduling) or to the
next available thread (dynamic scheduling).
A pseudo code of the parallel implementation of GDT
in OpenMP is given in Algorithm 1. Details of the distance propagation are handled in the functions fwdProp(),
forwardPropagationFirstRow(),
bwdProp(),
and backwardPropagationLastRow(). This pseudo
code differs from a non-parallel implementation of GDT only
in the shaded lines, where a compiler directive appears just
before a standard for loop in C. This omp parallel
for directive tells the master thread to create a team of parallel threads to process the for loop iterations. When the team
of threads completes the statements in the for loop, they
synchronise and terminate, leaving only the master thread
running. This process is known as the fork-join model of
parallel execution [5].
One important requirement in parallel programming is that the parallel region must be thread-safe. In other words, each iteration of the for loop should be executable independently, without interaction across different threads (e.g.,
no data dependencies). In GDT, this means the distance propagation within one band should not wait for the result of the
previous band. Thread 2 in Figure 5, for example, should not wait until Thread 1 finishes the computation of band 1. This
means the GDT of band 1 is not propagated to band 2 within
the current iteration (it will be in the next iteration). To avoid
data dependencies and race conditions, private variables
undergoing change within each thread should be declared in
the private clause of the parallel for directive.
Because the computed distances from one thread are not
used by other threads within the current iteration, it may
take longer for the GDT to propagate distances from the top
band to the bottom band and vice versa. However, given a
dense sampling of seed points, each seed point only has a
limited spatial range of influence. In other words, the distance transform at one pixel is never propagated for more
than a few bands away. The range of influence depends on
seed density and chunk size. In general, a few iterations of
forward+backward propagation (fewer than 30) are sufficient
for most cases.
5. EVALUATION
We compare three different implementations of the chamfer-based geodesic distance transform: non-parallel, parallel using OpenMP with static scheduling (i.e. round-robin assignment of threads to iterations), and parallel using OpenMP with dynamic scheduling (tasks are assigned to the next available thread). Given an input image, the cost image is computed from the gradient energy plus a constant regularisation offset (e.g., the median gradient energy value), and the seeds from local gradient minima. Low-amplitude random noise is added to the cost image to produce evenly distributed local minima even in flat image regions.
5.1. Task scheduling model and chunk size
OpenMP allows two main types of task scheduling: static scheduling, where blocks of iterations are assigned to threads in a round-robin fashion, and dynamic scheduling, where the next block of iterations is assigned to the next available thread. The size of each block, a.k.a. the chunk size, is configurable. For static scheduling, the default chunk size is the
number of iterations (i.e. number of image rows in our case)
divided by the number of threads.
To compare different scheduling methods and chunk sizes, we ran GDT on a 1936×1288 cost image (the gradient energy of the image in Figure 9) with 1017 evenly distributed seeds and measured the runtimes. The seeds were selected as local minima of the cost image using non-maximum suppression (NMS) [19] with a suppression radius (i.e. minimum separation distance) of 20 pixels. The GDT converges in 30 to 31 iterations for all runs with chunk sizes greater than 10.

Fig. 6. Runtime as a function of chunk size for different parallel implementations of GDT on a 2MP image with 1017 seeds and roughly 30 iterations of distance propagation: (a) 2.8GHz quad-core (8 threads); (b) 2.4GHz dual-core (2 threads).
The same experiment was carried out on two different machines: an Intel Xeon 2.8 GHz quad-core processor with 12
GB of RAM and Microsoft Visual Studio 2010 compiler, and
an Intel Core 2 Duo P9400 2.4 GHz dual-core processor with
4 GB of RAM and Microsoft Visual Studio 2005 compiler.
The runtimes on these two machines are plotted in Figure 6 for different chunk sizes, where each data point is averaged over ten repeated runs.
Several conclusions can be drawn from Figure 6. There
is little difference in the runtimes of static and dynamic
scheduling (the red and blue lines). Both parallel implementations are significantly faster than the non-parallel implementation (green line). The speedup factor of parallel
versus non-parallel reaches a maximum of 2.6 times on a
quad-core machine and 1.3 times on a dual-core one. This
maximum speedup occurs at the default chunk size, which
is 1288/8=161 for the quad-core and 1288/2=644 for the
dual-core machine (there are eight threads on a quad-core
processor due to Intel’s hyper-threading technology). The
highest speed gain is also achieved at integer fractions (i.e.
1/2, 1/3, 1/4, ...) of the default chunk size. This is when the
total number of iterations (1288 image rows) is evenly distributed amongst all threads. In short, static scheduling with
default chunk size works best for GDT. This default chunk
size will therefore be used in all subsequent experiments.
5.2. Number of iterations until convergence
We now show that the number of distance propagation iterations depends on the density of the seed points. As stated earlier,
the seed points are selected as local minima of the cost image
using non-maximum suppression. We varied the NMS radius
from 5 to 100 pixels, which results in a number of seed points
ranging from 14000 down to 30, respectively.
Fig. 7. Number of iterations until convergence and speedup factor as a function of the number of seed points on a 2MP image: (a) number of GDT iterations; (b) speedup on a quad-core CPU.

Figure 7a plots the number of distance propagation iterations versus the number of seed points for the same 2MP image used in the previous experiment. As the seeds get denser,
the minimum geodesic paths become shorter. Fewer iterations
are therefore required to propagate the GDT. If the seeds are sparsely sampled (e.g. fewer than 1000 seeds for a 2MP image), the parallel implementations require more iterations to complete the GDT compared to the non-parallel one. The reason for this was mentioned at the end of Section 4. For more than 500 seeds per megapixel, there is no difference in the number of iterations between the parallel and non-parallel implementations.
Because seed density affects the number of iterations, it
also affects the speedup factor. Figure 7b plots the speedup
factor of two parallel implementations over the non-parallel
one as a function of the number of seeds. As in the previous experiment, the runtimes are averaged over ten identical runs to smooth out glitches caused by the processors being pre-empted for high-priority operating system tasks. OpenMP implementations on a quad-core machine speed up GDT by a factor of between 1.7 and 2.5. The maximum speedup is achieved when there are 500 seeds per megapixel (i.e. one seed for every 50×50 image block). The speedup factor reduces slightly when there are more than 500 seeds per megapixel.
5.3. Runtime for different image sizes
This subsection investigates the runtime and speedup factor
of parallel GDT for different image sizes given the same seed
selection strategy. Ten images of different sizes ranging from
0.4 to 10 MP were chosen. For each image, the number of
seeds is set to a default value equal to the square root of the number of pixels. Adaptive NMS (c_robust = 1) [3] is used on a negated cost image to produce an exact number of seed points.
The runtime results are plotted in Figure 8, where the x-axis
specifies the square root of the total number of pixels in the
image (which is also the number of seed points or the image
width for square images).
Fig. 8. Runtime and speedup factor for images of different sizes on a 2.8GHz quad-core machine with 12GB of RAM: (a) runtime; (b) speedup factor.

Figure 8a shows that it takes less than half a second to compute the GDT for a 3MP image. For a 10MP image, the
runtime increases to 1.5 seconds. The runtime is linearly proportional to the number of pixels in the image (quadratically
proportional to the image width as shown in Figure 8a). However, the runtime is image-content dependent as suggested by
the two data points around an image width of 1500. Despite
having a similar number of pixels, a 1936×1288 image took
0.28 seconds to compute its GDT, while a 1842×1380 image
took 0.42 seconds (under static scheduling).
Figure 8b shows the speedup factor of two parallel implementations over the non-parallel one. Once again, the
speedup is image-content dependent. For 0.5MP images, the
speedup factor ranges from 1 to 3 times. As the images get bigger, the speedup factor range shrinks to between 2 and 2.5 times. This variation is due to the different complexity of
edges in each image.
6. APPLICATION: SUPERPIXEL SEGMENTATION
A superpixel is a group of connected pixels sharing some
common properties such as intensity, colour or texture [20]. A
useful superpixel segmentation partitions the image into regularly sized and shaped superpixels (i.e. close to round) that respect scene boundaries. This type of segmentation facilitates
edge-preserving image processing because the processing can
be done on individual superpixels, which do not include pixels across differently textured regions.
As mentioned earlier, GDT produces a label image, in
which each pixel is associated with its nearest seed label
(nearest in terms of geodesic distance). Pixels with a common
nearest seed are connected; together they form a superpixel.
Using the strategy mentioned at the beginning of Section 5,
where the cost image is the input image’s gradient energy
plus a small offset and the seed points are its local minima,
the input image can be segmented into geodesic superpixels.
To make the superpixels' shapes more regular, we moved each seed point to its superpixel centroid [8] and reran the
geodesic distance transform. An example of segmentation of
a 2MP image into 1000 superpixels using 3 iterations of seed
recentroiding, each with 10 iterations of distance propagation, is given in Figure 9. Cyan lines denote the superpixel boundaries, and yellow dots denote the recentroided seed points.

Fig. 9. 1000 geodesic superpixels on a 1936×1288 image.
The superpixel boundaries closely follow strong edges in
the image. Note that these superpixels are not designed to
cover every edge in the image, especially edges in highly textured areas. This is because geodesic superpixels are grown
from well-separated seed points. They do not shrink to fit
arbitrarily small regions commonly found in fine textures.
We compared our superpixel segmentation result on a
968×644 image in Figure 10 against eight other segmentation methods:
• Watershed [16] with shallow region removal using
Mathworks’ Image Processing Toolbox (watershed
and imhmin) and small region removal using our own
Matlab implementation
• FH, i.e. graph-based segmentation [9], using a C implementation from the authors 1
• Quickshift [26] using a C implementation from VLFeat2
• Entropy rate [13] using C/MEX code from the authors3
• Centroidal Voronoi Tessellation (CVT) [8] using our
own Matlab implementation
• Superpixel lattices [17] using a C/MEX implementation from the authors 4
• SLIC superpixels [1] using a command line Windows
executable from the authors 5
1 FH: http://people.cs.uchicago.edu/~pff/segment/
2 Quickshift: http://www.vlfeat.org/index.html
3 Entropy rate: http://www.umiacs.umd.edu/~mingyliu/
4 Superpixel lattices: http://web4.cs.ucl.ac.uk/research/vis/pvl/index.php?option=com_content&view=article&id=76:superpixel-lattices-code&catid=49:downloads&Itemid=62
5 SLIC: http://ivrg.epfl.ch/supplementary_material/RK_SLICSuperpixels/index.html
Fig. 10. Results of 9 different superpixel segmentation methods on a 968×644 image (images are ordered as in the table; # denotes the number of superpixels returned by each method).

Fig. 11. Comparison of 3 superpixel segmentation methods (runtime measured on the full 2MP image in Figure 9): (a) SLIC superpixels (4.6 seconds); (b) geodesic superpixels (0.64 seconds); (c) TurboPixels (207 seconds).
• TurboPixels [12] using a Matlab implementation from
the authors 6
Default parameters were used for all methods, except for:
• FH: min area for region merging was tuned (=22) to
produce a desired number of segments
• Quickshift: maxdist was tuned (=13) to produce a
desired number of segments
• SLIC: spatial weight = 5 was chosen instead of
10 (default) for better edge-following superpixels
The results in Figure 10 show that only SLIC, TurboPixels and our method produce regular superpixels that follow scene boundaries. Watershed produces a good edge-following segmentation that rivals the recent graph-based and mean-shift techniques. Entropy rate superpixel segmentation produces irregular segments around flat image areas. CVT is regular but does not follow image edges. Superpixel lattices produce a blocky segmentation.
A close-up comparison of the three methods that produce the most edge-following regular superpixels is given in Figure 11. SLIC superpixels follow edges well but have jaggy
boundaries around textured areas. Our method produces the
most regular and edge-following superpixels visually. TurboPixels produces more regular superpixels than SLIC but it
misses some strong edges. Geodesic superpixel segmentation
is also the fastest method amongst the three presented. Ours is one order of magnitude faster than SLIC and two orders of magnitude faster than TurboPixels, using executables from the corresponding authors. This speed advantage is partially due to the parallel GDT implementation on a quad-core machine.

6 TurboPixels: http://www.cs.toronto.edu/~babalex/research.html
We also evaluate all nine superpixel methods using two
measures of superpixel regularity. To measure size regularity,
the standard deviation of all superpixels' areas is used. We normalised the standard deviation by the average superpixel area to yield a unit-free measure. The smaller the normalised
standard deviation of superpixel size is, the better. To measure
shape regularity, we used a modified version of the isoperimetric quotient in [22]. The isoperimetric quotient is inverted so that a smaller measure means a more regular shape. This inverted isoperimetric quotient is computed as the ratio of the superpixel perimeter over the square root of its area (P/√A). We averaged this ratio over all superpixels to obtain a single shape measure per method. The P/√A ratio has a theoretical lower bound of 2√π ≈ 3.54 for a circular segment. However, this lower bound is never achieved since circles by themselves cannot form a 2D tessellation. Known tessellations such as the hexagonal and square grids have average P/√A ratios of √(8√3) ≈ 3.72 and 4, respectively.
Figure 12 compares the size and shape regularity of the
superpixels shown in Figure 10 over the whole image. As
expected, CVT produces the smallest area deviation and average ratio. Irregular segmentation methods such as Watershed, FH and QuickShift, on the other hand, produce large
values for both measures. Of the three edge-following superpixel methods, SLIC produces the most regularly sized but least regularly shaped superpixels, while TurboPixels produces the most regularly shaped but least regularly sized superpixels. Our geodesic method achieves a balance between size and shape regularity.
7. CONCLUSION
We have shown that the sequential chamfer algorithm for
computing geodesic distance transform can be modified
for parallel implementation on multicore processors using OpenMP. The parallel implementations yield an exact
GDT using a slightly higher number of iterations than a
non-parallel implementation. However, the overall speed is
increased when the parallel implementations are run on a multicore processor. A speedup factor of 1.3 is achieved for a
dual-core machine and 2.6 for a quad-core machine. When applied to a gradient energy image with evenly distributed seeds, GDT can segment an image into regularly sized and shaped superpixels. Our geodesic superpixel segmentation produces regular, edge-following superpixels at a faster speed than many state-of-the-art methods.

Fig. 12. Comparison of superpixel regularity from different methods (smaller is better).
8. ACKNOWLEDGMENT
The author would like to thank Khanh Doan and Ernest Wan
for reviewing an earlier version of this paper.
9. REFERENCES
[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” PAMI, 34(11):2274–2282, 2012.
[2] X. Bai, and G. Sapiro, “A geodesic framework for fast interactive image and video segmentation and matting,” in Proc. of
ICCV, 2007, pp. 510–517.
[3] M. Brown, R. Szeliski, and S. Winder, “Multi-image matching
using multi-scale oriented patches,” in Proc. of CVPR, 2005,
pp. 510–517.
[4] M.A. Butt and P. Maragos, “Optimum design of chamfer distance transforms,” IEEE Trans. on Image Processing,
7(10):1477–1484, 1998.
[5] B. Chapman, G. Jost, and R. van der Pas, Using OpenMP:
Portable Shared Memory Parallel Programming, The MIT
Press, 2007.
[6] D. Coeurjolly and A. Montanvert, “Optimal separable algorithms to compute the reverse Euclidean distance transformation and discrete medial axis in arbitrary dimension,” PAMI,
29(3):437–448, Mar. 2007.
[7] A. Criminisi, T. Sharp, and A. Blake, “GeoS: Geodesic image
segmentation,” in Proc. of ECCV, 2008, pp. 99–112.
[8] Q. Du, V. Faber, and M. Gunzburger, “Centroidal Voronoi
tessellations: Applications and algorithms,” SIAM Review,
41(4):637–676, Dec. 1999.
[9] P.F. Felzenszwalb and D.P. Huttenlocher, “Efficient graphbased image segmentation,” IJCV, 59(2):167–181, 2004.
[10] L. Grady, “Random walks for image segmentation,” PAMI,
28(11):1768–1783, 2006.
[11] Intel, “Automatic parallelization with Intel compilers,” in Intel
guide for developing multithreaded application. Intel Corporation, 2011.
[12] A. Levinshtein, A. Stere, K.N. Kutulakos, D.J. Fleet, S.J. Dickinson, and K. Siddiqi, “TurboPixels: Fast superpixels using
geometric flows,” PAMI, 31(12):2290–2297, 2009.
[13] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa, “Entropy rate superpixel segmentation,” in Proc. of CVPR, 2011,
pp. 2097–2104.
[14] D. Man, K. Uda, H. Ueyama, Y. Ito, and K. Nakano, “Implementations of parallel computation of Euclidean distance
map in multicore processors and GPUs,” in Proc. of the First
Int’l Conf. on Networking and Computing, 2010, ICNC ’10,
pp. 120–127.
[15] P. Maragos and M.A. Butt, “Curve evolution, differential morphology, and distance transforms applied to multiscale and
eikonal problems,” Fundamenta Informaticae, 41(1-2):91–
129, Jan. 2000.
[16] F. Meyer, “Topographic distance and watershed lines,” Signal
Processing, 38(1):113–125, July 1994.
[17] A.P. Moore, S. Prince, J. Warrell, U. Mohammed, and G. Jones,
“Superpixel lattices,” in Proc. of CVPR, 2008.
[18] G. Peyré, M. Péchaud, R. Keriven, and L.D. Cohen, “Geodesic methods in computer vision and graphics,” Foundations and Trends in Computer Graphics, 5(3-4):197–397, 2010.
[19] T.Q. Pham, “Non-maximum suppression using fewer than two
comparisons per pixel,” in Proc. ACIVS, 2010, pp. 438–451.
[20] X. Ren and J. Malik, “Learning a classification model for segmentation,” in Proc. of ICCV, 2003.
[21] A. Rosenfeld and J.L. Pfaltz, “Distance functions on digital
pictures,” Pattern Recognition, 1(1):33–61, 1968.
[22] A. Schick, M. Fischer, and R. Stiefelhagen, “Measuring and
evaluating the compactness of superpixels,” in Proc. of ICPR,
2012, pp. 930–934.
[23] J. Shalf, J. Bashor, D. Patterson, K. Asanovic, K. Yelick,
K. Keutzer, and T. Mattson, “The manycore revolution: Will
HPC lead or follow?,” SciDAC Review, 14:40–49, 2009.
[24] S.J. Shyu, T.W. Chou, and T.L. Chia, “Distance transformation
in parallel,” J. of Informatics & Electronics, 1(1):43–54, 2006.
[25] G. Slabaugh, R. Boyes, and X. Yang, “Multicore image
processing with OpenMP,” Signal Processing Magazine,
27(2):134–138, 2010.
[26] A. Vedaldi and S. Soatto, “Quick shift and kernel methods for
mode seeking,” in Proc. of ECCV (4), 2008, pp. 705–718.
[27] B.J. Verwer, P.W. Verbeek, and S.T. Dekker, “An efficient uniform cost algorithm applied to distance transforms,” PAMI,
11(4):425–429, 1989.
[28] P. Wang, G. Zeng, R. Gan, J. Wang, and H. Zha, “Structuresensitive superpixels via geodesic distance,” IJCV, 103(1):1–
21, 2013.
[29] G. Zeng, P. Wang, J. Wang, R. Gan, and H. Zha, “Structuresensitive superpixels via geodesic distance,” in Proc. of ICCV,
2011.