Moving Cast Shadow Detection using Physics-based Features

Jia-Bin Huang and Chu-Song Chen
Institute of Information Science, Academia Sinica, Taipei, Taiwan
jbhuang0604@gmail.com, song@iis.sinica.edu.tw



Abstract

Cast shadows induced by moving objects often cause serious problems to many vision applications. We present in this paper an online statistical learning approach to model the background appearance variations under cast shadows. Based on the bi-illuminant (i.e., direct light sources and ambient illumination) dichromatic reflection model, we derive physics-based color features under the assumptions of constant ambient illumination and light sources with common spectral power distributions. We first use one Gaussian Mixture Model (GMM) to learn the color features, which are constant regardless of the background surfaces or illuminant colors in a scene. Then, we build up one pixel-based GMM for each pixel to learn the local shadow features. To overcome the slow convergence rate of conventional GMM learning, we update the pixel-based GMMs through confidence-rated learning. The proposed method can rapidly learn model parameters in an unsupervised way and adapt to illumination conditions or environment changes. Furthermore, we demonstrate that our method is robust to scenes with few foreground activities and to videos captured at low or unsteady frame rates.

1. Introduction

Extracting moving objects from video sequences is at the core of various vision applications, including visual surveillance, content-based video coding, and human-computer interaction. One of the most challenging problems in extracting moving objects is detecting and removing moving cast shadows. When performing background subtraction, cast shadows are often misclassified as parts of foreground objects, distorting the estimation of the shape and color properties of target objects. The distortion caused by cast shadows may hinder subsequent vision algorithms, such as tracking and recognition.

Cast shadows are caused by the occlusion of light sources. When foreground objects cast shadows on background surfaces, the light sources are partially or entirely blocked, and thus the total energy incident at the background regions is reduced. Hence, shadow points are expected to have lower luminance but similar chromaticity values.

There have been many works dedicated to detecting cast shadows. Most of them are based on the assumption that shadow pixels should have lower luminance and the same chrominance as the corresponding background (i.e., the RGB values of shadow pixels fall on the line between the illuminated value and the origin in the RGB color space). This linear attenuation property has been employed in different color spaces such as RGB [2], HSV [1], YUV [12], and c1c2c3 [11]. Besides, other shadow-induced features like edge or gradient information extracted from the spatial domain have also been used to detect cast shadows [14, 16]. The major limitation of these algorithms is that they often require explicit tuning of a large set of parameters for each new scene. Thus, they are inappropriate for online applications.

To adapt to environment changes, statistical learning-based approaches have been developed to learn and remove cast shadows [9, 5, 4]. However, the linear proportionality assumption may not always hold in a real-world environment. For instance, in an outdoor scene, the light sources may consist of direct sunlight, diffused light scattered by the sky, and other colored light from nearby surfaces (i.e., color bleeding). These light sources may have different spectral power distributions (SPDs). Therefore, the RGB values of a shadow pixel may not attenuate linearly.

Little attention has been paid to this non-proportional attenuation problem before. Nadimi and Bhanu [8] addressed the non-linearity by using a dichromatic reflection model to account for both the sun and the sky illuminations in an outdoor environment. Recently, a more general shadow model was presented in [6], which introduced an ambient illumination term that determines the direction in the RGB color space along which the shaded background values can be found. Since the ambient term may have a different SPD from the incident light sources, the values of shadow pixels may not decrease proportionally. Nonparametric density estimation was used to model surface variation under cast shadows in an unsupervised way. By providing a better description of cast shadows, the shadow model in [6] achieved improved performance over the previous approaches, which used linear models.

However, these learning-based approaches [9, 5, 4, 6] may suffer from insufficient training samples since the statistical models are learned from background surface variation under cast shadows. Unlike background modeling, where samples are obtained in every frame, shadows may not appear at the same pixel in each frame. A single pixel must be shadowed many times before its estimated parameters converge, and the illumination conditions must remain stable in the meantime. Therefore, this kind of pixel-based shadow model requires a longer training period when foreground activities are rare. The problem becomes more serious when video sequences are captured at a low or unsteady frame rate that depends on the transmission conditions.

In this paper, we characterize cast shadows with "global" parameters for a scene. Based on the bi-illuminant dichromatic reflection (BIDR) model [7], we first derive the normalized spectral ratio as our color feature under the assumptions of constant ambient illumination and direct light sources with a common SPD. The normalized spectral ratio remains constant regardless of different background surfaces and illumination conditions. We then model the color features extracted from all moving pixels using a single Gaussian Mixture Model (GMM). To further improve the ability to differentiate cast shadows from foreground with colors similar to the background, we use a pixel-based GMM to describe the gradient intensity distortion at each pixel. We update the pixel-based GMMs using the confidence predicted by the global GMM through confidence-rated learning to accelerate convergence. Our contributions are twofold. First, with the global shadow model learned from physics-based features, our approach does not require numerous foreground activities or high frame rates to learn the shadow model parameters. This makes the proposed method more practical than existing works using only pixel-based models. Second, the proposed confidence-rated learning can be used for fast learning of local features in pixel-based models. We provide a principled scheme for the local and global features to collaborate with each other.

The remainder of this paper is organized as follows. Section 2 briefly describes the dichromatic reflection model [13] and its extension, the BIDR model. Section 3 presents the proposed learning approach. The posterior probabilities of cast shadows and foreground are developed in Section 4. Both visual and quantitative results are shown in Section 5 to verify the performance of our method and its robustness to few foreground activities. Section 6 concludes this paper.

2. Physics-Based Shadow Model

2.1. Bi-illuminant Dichromatic Reflection Model

There are three terms in Shafer's model [13]: body reflection, surface reflection, and a constant ambient term. Each of the two reflection types can be decomposed into chromatic and achromatic parts: 1) composition: a relative SPD c_b or c_s that depends only on wavelength; 2) magnitude: a geometric scale factor m_b or m_s that depends only on geometry. Given a scene geometry, the radiance in the direction (θ_e, φ_e) can be expressed as

    I(θ_e, φ_e, λ) = m_b(θ_e, φ_e) c_b(λ) + m_s(θ_e, φ_e) c_s(λ) + c_d(λ),   (1)

where c_d is the constant ambient term.

While this model includes a term to account for ambient illumination, it does not separate that term into body and surface reflection. Recently, Maxwell et al. [7] proposed the BIDR model, which contains four terms: two types of reflection for both the direct light sources and the ambient illumination. The BIDR model is of the form

    I(θ_e, φ_e, λ) = m_b(θ_e, φ_e, θ_i, φ_i) c_b(λ) l_d(θ_L, φ_L, λ)
                   + m_s(θ_e, φ_e, θ_i, φ_i) c_s(λ) l_d(θ_L, φ_L, λ)
                   + c_b(λ) ∫_{θ_i, φ_i} m_b(θ_e, φ_e, θ_i, φ_i) l_a(θ_L, φ_L, λ) dθ_i dφ_i
                   + c_s(λ) ∫_{θ_i, φ_i} m_s(θ_e, φ_e, θ_i, φ_i) l_a(θ_L, φ_L, λ) dθ_i dφ_i,   (2)

where (θ_L, φ_L) is the direction of the direct light source relative to the local surface normal, and (θ_e, φ_e) and (θ_i, φ_i) are the angles of emittance and incidence, respectively. In this reflection model, the values of m_b, m_s, c_b, and c_s all lie in [0, 1] (we refer the reader to [7] for further details of the derivation of the BIDR model).

With a specific geometry, and representing the two ambient integrals as M_ab(λ) and M_as(λ), we can simplify the BIDR model to

    I(λ) = c_b(λ)[m_b l_d(λ) + M_ab(λ)] + c_s(λ)[m_s l_d(λ) + M_as(λ)].   (3)

Considering only matte surfaces, we can ignore the latter part of (3). To describe the appearance of cast shadows on a background surface, we multiply the direct illumination by an attenuation factor α ∈ [0, 1], which indicates the unoccluded proportion of the direct light. We assume that all direct light sources have a common SPD with different power factors and that the ambient illumination is constant over lit and shaded regions (see Fig. 1). This gives the simplified form of the BIDR model

    I(λ) = α m_b c_b(λ) l_d(λ) + c_b(λ) M_ab(λ).   (4)
Figure 1. The contribution of all direct light sources and ambient illuminance. The shadow values SD are expected to be observed along the line between the background value BG and the constant ambient term BG_A. Note that the shadow values are not necessarily proportional to the direction of the background values.

The camera sensor response g_i at the pixel level can be obtained through the spectral projection g_i = ∫ F_i(λ) I(λ) dλ, where F_i(λ), i ∈ {R, G, B}, is the sensor spectral sensitivity and λ denotes the wavelength. By applying the linearity of the spectral projection, we have

    g_i = α F_i m_b c_b^i l_d^i + F_i c_b^i M_ab^i,   i ∈ {R, G, B}.   (5)

The formulation of g_i defines a line segment in the RGB color space between two ends: the fully shadowed pixel (α = 0) and the fully lit pixel (α = 1).
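To make the geometry of Eqs. (4)-(5) concrete, the following minimal numerical sketch (our illustration, not the paper's code; the per-channel direct and ambient terms are made-up values) traces the segment that shadow values follow. It points at the origin only when the two illuminants happen to share the same SPD:

```python
import numpy as np

# Assumed per-channel terms for one background surface point:
# direct[i]  ~ F_i * m_b * c_b^i * l_d^i   (direct-light contribution)
# ambient[i] ~ F_i * c_b^i * M_ab^i        (ambient contribution)
direct = np.array([120.0, 110.0, 80.0])    # made-up, sunlight-like
ambient = np.array([20.0, 25.0, 40.0])     # made-up, bluish sky light

def pixel_value(alpha):
    """Eq. (5): sensor response when a fraction alpha of the direct light remains."""
    return alpha * direct + ambient

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, pixel_value(alpha))
# The printed values trace a line segment from the ambient point BG_A
# (alpha = 0) to the lit value BG (alpha = 1); the segment aims at the
# origin only if ambient is proportional to direct, i.e., only if the
# two illuminants have matching SPDs.
```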
Figure 2. The color feature value distribution in various environments and illumination conditions. (a)-(d) Frames with cast shadows. (e)-(h) The corresponding feature value distributions in the S_R S_G S_B space. Note that the feature values extracted from different background surfaces generally follow a line.

2.2. Extracting Useful Features

2.2.1 Spectral Ratio

To extract color features that are constant and independent of different background surfaces, we need to identify measurements that are invariant to the illumination attenuation factor α, the geometric shading factor m_b, and the chromatic aspect of body reflection c_b^i. We compute the ratio of the two illuminants as the spectral ratio S = [S_R, S_G, S_B]^T:

    S_i = SD_i / (BG_i − SD_i)
        = (α F_i m_b c_b^i l_d^i + F_i c_b^i M_ab^i) / ((1 − α) F_i m_b c_b^i l_d^i)
        = α/(1 − α) + M_ab^i / ((1 − α) m_b l_d^i),   i ∈ {R, G, B}.   (6)

If the shaded regions received only the ambient illumination (i.e., all direct light sources are blocked: α = 0), the first term in (6) disappears and S_i = M_ab^i / (m_b l_d^i). We can then derive features invariant to m_b by normalizing S_i with its length |S|, since the m_b term can be extracted from the normalization constant. However, this assumption does not hold in real-world environments. Take an indoor scene as an example, where there are usually multiple light sources. When a foreground object occludes one or some of the light sources, there is still energy from the remaining light sources incident on the surface. Consequently, assuming the attenuation factor to be zero will induce bias in estimating the ratio between the two illuminants.

To address this problem, we introduce γ = [γ_R, γ_G, γ_B]^T by subtracting α/(1 − α) from each element of S:

    γ_i = S_i − α/(1 − α) = M_ab^i / ((1 − α) m_b l_d^i).   (7)

We can then obtain the normalized spectral ratio γ̂ with higher accuracy by factoring out (1 − α) m_b through normalization:

    γ̂_i = M_ab^i / ((1 − α) m_b l_d^i) · (1/|γ|),   (8)

    |γ| = (1 / ((1 − α) m_b)) √( (M_ab^R / l_d^R)² + (M_ab^G / l_d^G)² + (M_ab^B / l_d^B)² ).   (9)

We validate the proposed physics-based color feature by observing its distributions over cast shadows in various environments and illumination conditions. Fig. 2(a) and its reference background image are courtesy of Maxwell et al. [7]. Fig. 2(b)-(d) show frames from the benchmark sequences provided in Prati et al.'s survey paper [10]. We manually label shadow pixels in the given images, and then extract the spectral ratio S_i, i ∈ {R, G, B}, for each pixel in the shadow regions using the given image and the reference background models. The resulting feature distributions are presented in Fig. 2(e)-(h). The feature values extracted from different background surfaces roughly follow a straight line in the S_R S_G S_B space. Therefore, we can use the direction of this line as our color feature, which is roughly the same for all shadow pixels. The direction of the line in the S_R S_G S_B space can be characterized by two angles, the zenith and azimuth in spherical coordinates, which correspond to the normalized spectral ratio. From the feature distributions in Fig. 2(e)-(h), we also observe that larger feature values tend to be unstable and deviate from the major direction of most feature values. This is because there is not a sufficient difference between the shaded and lit pixel values to robustly measure the orientation. In addition, the value of α/(1 − α) in the scene can be easily estimated by intersecting the fitted line with the line passing through (1, 1, 1) and the origin.
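The feature extraction of Eqs. (6)-(8) can be sketched in a few lines (our Python illustration; the illuminant values, reflectances, and α below are hypothetical). Under the model's assumptions, two different surfaces lit by the same pair of illuminants yield the same normalized spectral ratio:

```python
import numpy as np

def normalized_spectral_ratio(bg, sd, alpha_ratio):
    """Eqs. (6)-(8): unit-length illuminant-ratio feature for one pixel.

    bg, sd      : lit background and shadowed RGB values
    alpha_ratio : scene-wide estimate of alpha / (1 - alpha), see Sec. 3.5
    """
    S = sd / (bg - sd)                    # Eq. (6); bg > sd by the weak detector
    gamma = S - alpha_ratio               # Eq. (7): remove the offset term
    return gamma / np.linalg.norm(gamma)  # Eq. (8): factor out (1 - alpha) m_b

# Hypothetical illuminants and two surfaces with different reflectances.
direct = np.array([120.0, 110.0, 80.0])
ambient = np.array([20.0, 25.0, 40.0])
alpha = 0.2                               # 20% of the direct light remains
for reflectance in (1.0, 0.5):
    bg = reflectance * (direct + ambient)
    sd = reflectance * (alpha * direct + ambient)
    print(normalized_spectral_ratio(bg, sd, alpha / (1.0 - alpha)))
# Both surfaces print the same unit vector: the feature depends only on
# the ratio of ambient to direct illumination, not on the surface.
```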
2.2.2 Gradient Intensity Distortion

We have now derived color features that are invariant to different background surfaces. These low-dimensional (2D) features, however, might fail to distinguish foreground whose colors are similar to the background from cast shadows. Thus, other shadow-induced properties, like edge or gradient information, may be used to further describe the background appearance variation under cast shadows. In this paper, we use a simple gradient intensity distortion as our local feature to demonstrate the improvement gained by incorporating additional local features.

For a given pixel p, we define the gradient intensity distortion ω_p as

    ω_p = |∇(BG)_p| − |∇(F)_p|,   (10)

where BG and F are the luminance channels of the background image and the current frame, and ∇(·) is the gradient operator.
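A dense version of Eq. (10) can be written directly with numpy (a sketch; the central-difference gradient here stands in for whatever discrete operator is used):

```python
import numpy as np

def gradient_intensity_distortion(bg_lum, frame_lum):
    """Eq. (10): omega = |grad BG| - |grad F|, computed for every pixel.

    bg_lum, frame_lum : 2-D luminance images as float arrays.
    """
    gy_b, gx_b = np.gradient(bg_lum)
    gy_f, gx_f = np.gradient(frame_lum)
    grad_bg = np.hypot(gx_b, gy_b)   # gradient magnitude of the background
    grad_f = np.hypot(gx_f, gy_f)    # gradient magnitude of the current frame
    return grad_bg - grad_f

# Under a cast shadow the surface texture is largely preserved, so omega
# stays near a small, stable value; a foreground object typically changes
# the local gradients and produces a different distribution.
```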
3. Learning Cast Shadows

In this section, we show how to build models for cast shadows in an unsupervised way. Here, we use GMMs to learn the background surface variation over time. It is also possible to use other statistical learning methods such as kernel density estimation.

Figure 3. The weak shadow detector. An observation is considered a potential shadow point if it falls into the gray area. The weak shadow detector has three parameters: the maximum allowed color shift, and the minimal and maximal illumination attenuation.

3.1. Weak Shadow Detector

To model the cast shadows, impossible shadow samples that belong to the background or foreground (e.g., color values that are the same as or brighter than the background values) should be excluded. Therefore, we apply a weak shadow detector that evaluates every moving pixel to filter out such impossible samples. Since cast shadows reduce the luminance values, potential shadow values should fall into a conic volume around the corresponding background color. The weak shadow detector is illustrated in Fig. 3, where the values of cast shadows are expected to fall into the gray conic region. Pixel values that fall into this region are considered potential shadow samples. These samples are then used to learn the global shadow model for the scene and the local shadow model for each pixel.
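A minimal sketch of such a detector is given below (the parameter names and threshold values are our assumptions; the paper specifies the three parameters of Fig. 3 but not their settings):

```python
import numpy as np

def weak_shadow_detector(bg, x, max_angle_deg=5.0, att_min=0.2, att_max=0.95):
    """Keep x as a potential shadow sample for background value bg.

    Accepts x if (i) its luminance attenuation lies in [att_min, att_max]
    (shadows darken, never brighten) and (ii) its color direction deviates
    from bg by at most max_angle_deg, i.e., x falls inside a conic volume
    around the background color, as in Fig. 3.
    """
    norm_x, norm_bg = np.linalg.norm(x), np.linalg.norm(bg)
    ratio = norm_x / (norm_bg + 1e-8)
    if not (att_min <= ratio <= att_max):
        return False
    cos_angle = np.dot(x, bg) / (norm_x * norm_bg + 1e-8)
    return cos_angle >= np.cos(np.deg2rad(max_angle_deg))
```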
3.2. Global Shadow Model

Using the background-surface-invariant color features, a global shadow model is learned for the whole scene. Here, we model the background color information by the well-known GMM [15] in the RGB color space. For every frame, we obtain potential shadow points by applying the weak shadow detector to the moving pixels, which are identified via background subtraction. We then use one GMM to learn the normalized spectral ratio r̂ = [γ̂_R, γ̂_G]^T over the scene. Note that we use only two of the three feature dimensions because the third component is redundant: γ̂_R² + γ̂_G² + γ̂_B² = 1. The normalized spectral ratio over the whole scene is modeled by K Gaussian distributions with mean vectors μ_k and full covariance matrices Σ_k. The probability of the normalized spectral ratio r̂ is then given by

    p(r̂ | μ, Σ) = ∑_{k=1}^{K} π_k G_k(r̂, μ_k, Σ_k),   (11)

where μ, Σ denote all parameters of the K Gaussians, π_k is the mixing weight, and G_k is the kth Gaussian probability distribution. We use the Expectation-Maximization (EM) algorithm to estimate the parameters of the GMM. The estimated parameters in the current frame are propagated to the next frame, so that the EM algorithm can converge quickly. Since the light sources are usually stable in the scene, we find that it is sufficient for the estimated parameters to converge with a single EM iteration at each frame.
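One such EM iteration on the per-frame batch of potential shadow features might look as follows (a sketch; K, the initialization, and the small covariance regularizer are our own choices):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(features, pi, mu, cov):
    """One EM iteration for the global GMM of Eq. (11).

    features : (N, 2) normalized spectral ratios from the current frame
    pi       : (K,) mixing weights; mu : (K, 2) means; cov : (K, 2, 2)
    """
    K = len(pi)
    # E-step: responsibility of each Gaussian for each sample.
    resp = np.stack([pi[k] * multivariate_normal.pdf(features, mu[k], cov[k])
                     for k in range(K)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and full covariance matrices.
    Nk = resp.sum(axis=0)
    pi = Nk / Nk.sum()
    mu = (resp.T @ features) / Nk[:, None]
    for k in range(K):
        d = features - mu[k]
        cov[k] = (resp[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(2)
    return pi, mu, cov  # carried over to the next frame as initialization
```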
3.3. Local Shadow Model

Besides using color features in the global shadow model, we build a GMM for each pixel to learn the gradient intensity distortion under cast shadows, similar to background modeling [15]. For a given pixel p, its gradient feature value ω_p is sampled and learned whenever p is a potential shadow pixel. However, as mentioned before, pixel-based models often suffer from insufficient training data, because samples are not available at every pixel in every frame.

To address this problem, we adopt confidence-rated learning to improve the convergence rate of the local model parameters. The basic idea is that each sample is weighted with a different importance computed from the global shadow model. For example, if we update the model using a potential shadow point whose color features match the global shadow model, then we consider this sample relatively more important than others. In this way, the learning process of the local shadow model is guided by the global shadow model.
3.4. Confidence-Rated Gaussian Mixture Learning

We present an effective Gaussian mixture learning algorithm to overcome some drawbacks of the conventional GMM learning approach. Let ρ_π and ρ_G be the learning rates for the mixing weights and the Gaussian parameters (means and covariances) in the local shadow model, respectively. The updating scheme follows the formulation of the combination of incremental EM learning and recursive filtering [3]:

    ρ_π = C(γ̂) · (1 − ρ_default) / (∑_{j=1}^{K} c_j) + ρ_default,   (12)

    ρ_G = C(γ̂) · (1 − ρ_default) / c_k + ρ_default,   (13)

where c_k is the number of matches of the kth Gaussian state, and ρ_default is a small constant, set to 0.005 in our experiments. The two types of learning rates are controlled by a confidence value C(γ̂), which indicates how confident we are that the sample belongs to a shadow. Observations with higher confidence will thus converge faster than those with lower confidence.
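Eqs. (12)-(13) translate directly into code (a sketch; the variable names are ours):

```python
def learning_rates(confidence, match_counts, k, rho_default=0.005):
    """Eqs. (12)-(13): confidence-modulated learning rates for the local GMM.

    confidence   : C(gamma_hat) in [0, 1], predicted by the global model
    match_counts : list of c_j, the match count of each Gaussian state
    k            : index of the matched Gaussian state
    """
    rho_pi = confidence * (1.0 - rho_default) / sum(match_counts) + rho_default
    rho_g = confidence * (1.0 - rho_default) / match_counts[k] + rho_default
    return rho_pi, rho_g

# A high-confidence shadow sample yields rates close to the incremental-EM
# schedule 1/sum(c_j) and 1/c_k, so the matched state converges quickly;
# a low-confidence sample falls back to the slow recursive-filter rate.
```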
3.5. Attenuation Factor Estimation

From the RGB values observed in the current frame and the reference background image, we can only compute the value of S, which may introduce bias in estimating the normalized spectral ratios. Therefore, estimation of the attenuation factor α is required for accurate shadow modeling. We can see from (7) that the value of γ_i, i ∈ {R, G, B}, is obtained by subtracting α/(1 − α) from S_i. From the feature distribution of S, our aim is to find the location of a point t that lies on both the line passing through the origin with direction vector (1, 1, 1) and the line that fits the observations S. This estimation can be achieved using a robust fitting method that is less sensitive than ordinary least squares to large deviations (outliers). In addition, we perform recursive linear regression to update the estimated α/(1 − α) value adaptively. For simplicity, the attenuation factor α is assumed to be the same for every pixel.
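A sketch of this estimation step is given below (the median-centered SVD fit is our stand-in for the unspecified robust fitter):

```python
import numpy as np

def estimate_alpha_ratio(S_points):
    """Estimate alpha / (1 - alpha) as in Sec. 3.5 (a sketch).

    S_points : (N, 3) spectral-ratio observations from Eq. (6).
    Fits a line to the observations and intersects it with the line
    through the origin in direction (1, 1, 1); by Eq. (7), the
    intersection's coordinate value is the alpha/(1 - alpha) offset.
    """
    center = np.median(S_points, axis=0)   # robust anchor for the line
    _, _, vt = np.linalg.svd(S_points - center, full_matrices=False)
    d = vt[0]                              # dominant direction of the fit
    # Solve center + t * d = s * (1, 1, 1) in the least-squares sense;
    # the unknowns are (t, s).
    A = np.stack([d, -np.ones(3)], axis=1)
    (t, s), *_ = np.linalg.lstsq(A, -center, rcond=None)
    return s
```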
4. Cast Shadow and Foreground Posterior Probabilities

In this section, we present how to derive the posterior probabilities of cast shadows and foreground given the observed sample x_p in the RGB color space, using the proposed global and local shadow models.

4.1. Cast Shadow Posterior

The shadow posterior is first computed by decomposing P(SD | x_p) over the (BG, FS) domain, where FS indicates moving pixels (real foreground and cast shadows). Since P(SD | x_p, BG) = 0, the decomposition gives

    P(SD | x_p) = P(SD | x_p, FS) P(FS | x_p),   (14)

where P(FS | x_p) = 1 − P(BG | x_p) can be directly computed from the background model. Second, we remove pixels that are definitely foreground (i.e., pixels that are rejected by the weak shadow detector) and consider only potential shadow points (PS): P(SD | x_p, FS) = P(SD | x_p, FS, PS). Then, we decompose P(SD | x_p, FS, PS) into two parts, N_a and N_na, which stand for color features that are or are not associated with the normalized spectral ratio, respectively. If the color features are not associated with the working states of the GMM, then the probability of belonging to shadow equals zero. Therefore, we have

    P(SD | x_p, FS, PS) = P(SD | x_p, FS, PS, N_a) · P(N_a | x_p, FS, PS).   (15)

Here, the gradient intensity distortion ω_p and the color feature γ̂_p are the sufficient statistics for x_p in the first and second parts of (15), respectively. The posterior probability of cast shadow can thus be computed.

4.2. Foreground Posterior

Computing the foreground posterior probability is much easier. Given a pixel p, we first compute the background posterior P(BG | x_p) from the background model. Then, we obtain the shadow posterior probability P(SD | x_p) using the learned shadow models. Based on probability theory, the foreground posterior can be obtained as

    P(FG | x_p) = 1 − P(BG | x_p) − P(SD | x_p).   (16)
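Combining Eqs. (14) and (16) for a single pixel (a sketch; the probability values below are made up):

```python
def pixel_posteriors(p_bg, p_sd_given_fs):
    """Eqs. (14) and (16): combine the three posteriors for one pixel.

    p_bg          : P(BG|x_p) from the background model
    p_sd_given_fs : P(SD|x_p, FS) from the learned shadow models
    """
    p_fs = 1.0 - p_bg               # moving pixel: foreground or shadow
    p_sd = p_sd_given_fs * p_fs     # Eq. (14)
    p_fg = 1.0 - p_bg - p_sd        # Eq. (16)
    return p_bg, p_sd, p_fg

# A moving pixel that matches the shadow models well:
print(pixel_posteriors(p_bg=0.1, p_sd_given_fs=0.8))  # ~ (0.1, 0.72, 0.18)
```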
4.3. Summary

Algorithms 1 and 2 summarize in pseudocode the learning and detection processes of the proposed method. Our method can be attached to other moving object detection programs as an independent module. The shadow detection process is applied only to moving pixels detected by the background model, and the learning process occurs only when these moving pixels are considered shadow candidates. Consequently, the proposed algorithm is practical: it does not introduce a heavy computational burden and can work effectively to detect shadows.
Algorithm 1: Learning Process
At time t,
for each pixel p in the frame do
    if P(BG | x_t(p)) < 0.5 then
        if pixel p satisfies the shadow property then
            - Compute the normalized spectral ratio γ̂_p
            - Compute the gradient intensity distortion ω_p
            - Update the local shadow model at pixel p with confidence value C(γ̂) through confidence-rated learning
        end
    end
end
Run one EM iteration to estimate the parameters of the global shadow model using the collected color features.
Algorithm 2: Detection Process
At time t,
for each pixel p ∈ P do
    - Obtain the background posterior P(BG | x_p) from background modeling
    - Compute the shadow posterior P(SD | x_p) (Eq. (14))
    - Compute the foreground posterior P(FG | x_p) (Eq. (16))
    if P(FG | x_p) > P(SD | x_p) and P(FG | x_p) > P(BG | x_p) then
        Label pixel p as foreground
    else
        Label pixel p as background
    end
end
5. Experimental Results

We present visual results on challenging video sequences captured in various environments, including both indoor and outdoor scenes. We also compare the quantitative accuracy of the proposed method on several videos with other approaches where their results are available. Previous approaches using statistical models have higher success rates in detecting cast shadows when numerous foreground activities are present. However, we show that our method can also deal with situations where cast shadows first appear in complex scenes under unknown illumination conditions, as well as with rare foreground activity.

5.1. Qualitative Results

In Figure 4, we show sample cast shadow detection results from four video sequences. The first three sequences, Laboratory, Intelligent Room, and Highway I, are part of the benchmark sequences for validating shadow detection algorithms. The last one, Hallway, is taken from [6]. Figure 4(a) shows one frame selected from each video, where cast shadows are present in the scene. The background posterior probability is presented in Figure 4(b), where dark regions indicate a lower probability of belonging to the background. Figures 4(c) and 4(d) show the confidence map of the global shadow model and the posterior probability of cast shadows, respectively. In Figure 4(e) we show the probability of belonging to foreground objects. We can see that in these video sequences, the proposed algorithm is capable of detecting cast shadows without misclassifying foreground as shadows. Note that in the second video, Intelligent Room, the man walks through the room only once; thus, there is no chance for a purely pixel-based shadow model to learn its parameters. The use of the global shadow model enables us to detect shadows that appear in the scene for the first time.

To verify the effectiveness of the proposed method, the results presented here are raw and without any post-processing. Binary results can be obtained simply by thresholding the foreground posterior values P(FG | x_p). The posterior probabilities can also be incorporated with a context model that uses spatial and temporal coherence to improve the segmentation accuracy.

5.2. Quantitative Results

The quantitative evaluation follows the method proposed by Prati et al. [10]. Two metrics are defined for evaluating the performance of a cast shadow detection algorithm: the shadow detection rate η and the shadow discrimination rate ξ. The two metrics are formulated as

    η = TP_S / (TP_S + FN_S);   ξ = TP_F / (TP_F + FN_F),   (17)

where the subscript S stands for shadow and F for foreground, and TP and FN denote true positives and false negatives, respectively. TP_F is the number of ground-truth points of the foreground objects minus the number of points detected as shadows but belonging to foreground objects.
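Eq. (17) in code, with hypothetical counts:

```python
def shadow_metrics(tp_s, fn_s, tp_f, fn_f):
    """Eq. (17): shadow detection rate (eta) and discrimination rate (xi)."""
    eta = tp_s / (tp_s + fn_s)  # fraction of shadow pixels correctly found
    xi = tp_f / (tp_f + fn_f)   # fraction of foreground not lost to shadows
    return eta, xi

# Made-up counts: 850 of 1200 shadow pixels detected, and 4500 of 5000
# foreground pixels not mislabeled as shadow.
print(shadow_metrics(850, 350, 4500, 500))  # -> (0.7083..., 0.9)
```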
Figure 4. Sample visual results of detecting cast shadows in various environments. (a) Frame from the video sequence. (b) Background posterior probability P(BG | x_p). (c) Confidence map predicted by the global shadow model. (d) Shadow posterior probability P(SD | x_p). (e) Foreground posterior probability P(FG | x_p).




Figure 5. The effect of confidence-rated Gaussian mixture learning. The mean maps of the local shadow model are taken at the 100th frame (first row) and the 1000th frame (second row). (a)(d) The background image. (b)(e) The mean map of the most important Gaussian in the mixture with confidence-rated learning. (c)(f) The mean map without confidence-rated learning.

Figure 6. Quantitative results on the Intelligent Room sequence. The shadow detection and shadow discrimination rates are calculated under different frame-rate settings.
We show the quantitative results in Table 1. Note that the results of the other approaches are taken directly from [6] and [4].
Table 1. Quantitative results on surveillance sequences

Sequence    | Highway I      | Highway II     | Hallway
Method      | η%      ξ%     | η%      ξ%     | η%      ξ%
Proposed    | 70.83   82.37  | 76.50   74.51  | 82.05   90.47
Kernel [6]  | 70.50   84.40  | 68.40   71.20  | 72.40   86.70
LGf [4]     | 72.10   79.70  | -       -      | -       -
GMSM [5]    | 63.30   71.30  | 58.51   44.40  | 60.50   87.00


5.3. Fast Learning of the Local Shadow Model

We demonstrate the effect of confidence-rated learning in Figure 5. In this experiment, we learn the local shadow model in two traffic scenes: Highway I and Highway II. Figures 5(a) and 5(d) show the background models of these two outdoor scenes. With confidence-rated learning, we obtain the mean value of the most important Gaussian (i.e., the one with the highest mixing weight and smallest variance) in Figures 5(b) and 5(e). We can see that the local models of gradient intensity distortion under cast shadows are well constructed. On the other hand, if we learn the local shadow model following the conventional Gaussian mixture learning method, we obtain the results in Figures 5(c) and 5(f), in which the models are still not built due to the long training time and the disturbance by foreground objects.

5.4. Handling Scenes with Few Foreground Activities

Here we use the benchmark sequence Intelligent Room to demonstrate the robustness of our approach to videos captured at low frame rates and to scenes with few foreground activities. Downsampled sequences with lower frame rates are obtained by keeping one frame out of every M ∈ {2, 3, 4} frames of the original sequence; together with the original sequence, this yields frame rates of 10, 5, 3.33, and 2.5 fps, respectively. Figure 6 shows the quantitative results on these sequences. The performance on the lower-frame-rate sequences degrades only slightly, demonstrating that our approach can still learn and remove cast shadows even at such low frame rates.
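Temporal downsampling here simply keeps every M-th frame (a sketch; the 10 fps original rate is inferred from the reported numbers):

```python
def downsample(frames, M):
    """Keep every M-th frame to simulate a lower capture rate."""
    return frames[::M]

original_fps = 10.0                       # assumed rate of Intelligent Room
for M in (1, 2, 3, 4):
    print(M, round(original_fps / M, 2))  # -> 10.0, 5.0, 3.33, 2.5 fps
```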
6. Conclusion

In this paper, we have presented a novel algorithm capable of detecting cast shadows in various scenes. Qualitative and quantitative evaluations of the physics-based shadow model validate that our approach is more effective in describing background surface variation under cast shadows. The physics-based color features can be used to learn a global shadow model for a scene; therefore, our method does not suffer from the insufficient-training-data problem of pixel-based shadow models. Moreover, with the aid of the global shadow model, we can update the local shadow models through confidence-rated learning, which is significantly faster than conventional online updating. To further improve the detection accuracy, more discriminative features or spatial and temporal smoothness constraints can be incorporated into the detection process in the future.
Acknowledgement

This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grant NSC95-2221-E-001-028-MY3.

References

[1] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans. PAMI, 25(10):1337–1342, 2003.
[2] T. Horprasert, D. Harwood, and L. S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In ICCV Frame-rate Workshop, 1999.
[3] D. S. Lee. Effective Gaussian mixture learning for video background subtraction. IEEE Trans. PAMI, 27(5):827–832, 2005.
[4] Z. Liu, K. Huang, T. Tan, and L. Wang. Cast shadow removal combining local and global features. In CVPR, pages 1–8, 2007.
[5] N. Martel-Brisson and A. Zaccarin. Learning and removing cast shadows through a multidistribution approach. IEEE Trans. PAMI, 29(7):1133–1146, 2007.
[6] N. Martel-Brisson and A. Zaccarin. Kernel-based learning of cast shadows from a physical model of light sources and surfaces for low-level segmentation. In CVPR, June 2008.
[7] B. Maxwell, R. Friedhoff, and C. Smith. A bi-illuminant dichromatic reflection model for understanding images. In CVPR, pages 1–8, June 2008.
[8] S. Nadimi and B. Bhanu. Physical models for moving shadow and object detection in video. IEEE Trans. PAMI, 26(8):1079–1087, 2004.
[9] F. Porikli and J. Thornton. Shadow flow: A recursive method to learn moving cast shadows. In ICCV, pages 891–898, Oct. 2005.
[10] A. Prati, I. Mikic, M. Trivedi, and R. Cucchiara. Detecting moving shadows: Algorithms and evaluation. IEEE Trans. PAMI, 25(7):918–923, 2003.
[11] E. Salvador, A. Cavallaro, and T. Ebrahimi. Cast shadow segmentation using invariant color features. CVIU, 95(2):238–259, 2004.
[12] O. Schreer, I. Feldmann, U. Golz, and P. A. Kauff. Fast and robust shadow detection in videoconference applications. In IEEE Int'l Symp. Video/Image Processing and Multimedia Communications, pages 371–375, 2002.
[13] S. Shafer. Using color to separate reflection components. Color Research & Applications, pages 210–218, 1985.
[14] J. Stander, R. Mech, and J. Ostermann. Detection of moving cast shadows for object segmentation. IEEE Trans. Multimedia, 1(1):65–76, Mar. 1999.
[15] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In CVPR, volume 2, pages 246–252, 1999.
[16] W. Zhang, X. Fang, X. Yang, and Q. M. J. Wu. Moving cast shadows detection using ratio edge. IEEE Trans. Multimedia, 9(6):1202–1214, 2007.

A Survey on Single Image Dehazing ApproachesIRJET Journal
 
Simulations of Strong Lensing
Simulations of Strong LensingSimulations of Strong Lensing
Simulations of Strong LensingNan Li
 
New microsoft power point presentation
New microsoft power point presentationNew microsoft power point presentation
New microsoft power point presentationAzad Singh
 
Narrow Band Active Contour
Narrow Band Active ContourNarrow Band Active Contour
Narrow Band Active ContourMohammad Sabbagh
 
Abstract of project 2
Abstract of project 2Abstract of project 2
Abstract of project 2Vikram Mandal
 

Semelhante a Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009) (20)

PapersWeLove - Rendering Synthetic Objects Into Real Scenes - Paul Debevec.pdf
PapersWeLove - Rendering Synthetic Objects Into Real Scenes - Paul Debevec.pdfPapersWeLove - Rendering Synthetic Objects Into Real Scenes - Paul Debevec.pdf
PapersWeLove - Rendering Synthetic Objects Into Real Scenes - Paul Debevec.pdf
 
SHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOS
SHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOSSHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOS
SHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOS
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
 
Image Denoising Based On Sparse Representation In A Probabilistic Framework
Image Denoising Based On Sparse Representation In A Probabilistic FrameworkImage Denoising Based On Sparse Representation In A Probabilistic Framework
Image Denoising Based On Sparse Representation In A Probabilistic Framework
 
Ai32237241
Ai32237241Ai32237241
Ai32237241
 
Structlight
StructlightStructlight
Structlight
 
Lm342080283
Lm342080283Lm342080283
Lm342080283
 
Em molnar2015
Em molnar2015Em molnar2015
Em molnar2015
 
Radar reflectance model for the extraction of height from shape from shading ...
Radar reflectance model for the extraction of height from shape from shading ...Radar reflectance model for the extraction of height from shape from shading ...
Radar reflectance model for the extraction of height from shape from shading ...
 
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight P...
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight P...Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight P...
Google Research Siggraph Whitepaper | Total Relighting: Learning to Relight P...
 
Mc0086 digital image processing
Mc0086  digital image processingMc0086  digital image processing
Mc0086 digital image processing
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
Mc0086 digital image processing
Mc0086  digital image processingMc0086  digital image processing
Mc0086 digital image processing
 
A Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing ApproachesA Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing Approaches
 
Simulations of Strong Lensing
Simulations of Strong LensingSimulations of Strong Lensing
Simulations of Strong Lensing
 
New microsoft power point presentation
New microsoft power point presentationNew microsoft power point presentation
New microsoft power point presentation
 
Narrow Band Active Contour
Narrow Band Active ContourNarrow Band Active Contour
Narrow Band Active Contour
 
F045073136
F045073136F045073136
F045073136
 
Abstract of project 2
Abstract of project 2Abstract of project 2
Abstract of project 2
 

Mais de Jia-Bin Huang

How to write a clear paper
How to write a clear paperHow to write a clear paper
How to write a clear paperJia-Bin Huang
 
Research 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXResearch 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXJia-Bin Huang
 
Computer Vision Crash Course
Computer Vision Crash CourseComputer Vision Crash Course
Computer Vision Crash CourseJia-Bin Huang
 
Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Jia-Bin Huang
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialJia-Bin Huang
 
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Jia-Bin Huang
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015Jia-Bin Huang
 
Writing Fast MATLAB Code
Writing Fast MATLAB CodeWriting Fast MATLAB Code
Writing Fast MATLAB CodeJia-Bin Huang
 
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Jia-Bin Huang
 
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...Jia-Bin Huang
 
Real-time Face Detection and Recognition
Real-time Face Detection and RecognitionReal-time Face Detection and Recognition
Real-time Face Detection and RecognitionJia-Bin Huang
 
Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Jia-Bin Huang
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang
 
Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual trackingJia-Bin Huang
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression EnhancementJia-Bin Huang
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionJia-Bin Huang
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucJia-Bin Huang
 
Static and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionStatic and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionJia-Bin Huang
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionJia-Bin Huang
 

Mais de Jia-Bin Huang (20)

How to write a clear paper
How to write a clear paperHow to write a clear paper
How to write a clear paper
 
Research 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXResearch 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeX
 
Computer Vision Crash Course
Computer Vision Crash CourseComputer Vision Crash Course
Computer Vision Crash Course
 
Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
 
Writing Fast MATLAB Code
Writing Fast MATLAB CodeWriting Fast MATLAB Code
Writing Fast MATLAB Code
 
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
 
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
 
Real-time Face Detection and Recognition
Real-time Face Detection and RecognitionReal-time Face Detection and Recognition
Real-time Face Detection and Recognition
 
Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum Vitae
 
Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual tracking
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression Enhancement
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure Extraction
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiuc
 
Static and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionStatic and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture Recognition
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes Recognition
 

Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)

1. Introduction

Extracting moving objects from video sequences is at the core of various vision applications, including visual surveillance, content-based video coding, and human-computer interaction. One of the most challenging problems in extracting moving objects is detecting and removing moving cast shadows. When performing background subtraction, cast shadows are often misclassified as parts of foreground objects, distorting the estimated shape and color properties of the target objects. This distortion may hinder subsequent vision algorithms, such as tracking and recognition.

Cast shadows are caused by the occlusion of light sources: when foreground objects cast shadows on background surfaces, the light sources are partially or entirely blocked, and the total energy incident at the background regions is reduced.

Such light sources may have different spectral power distributions (SPDs), so the RGB values of a shadow pixel may not attenuate linearly.

Little attention has been paid to this non-proportional attenuation problem. Nadimi and Bhanu [8] addressed the non-linearity by using a dichromatic reflection model that accounts for both the sun and the sky illumination in an outdoor environment. Recently, a more general shadow model was presented in [6], which introduced an ambient illumination term that determines the direction in the RGB color space along which the shaded background values can be found. Since the ambient term may have an SPD different from that of the incident light sources, the values of shadow pixels may not decrease proportionally. Nonparametric density estimation was used to model the surface variation under cast shadows in an unsupervised way. By providing a better description of cast shadows, the model in [6] improved upon previous approaches that used linear models.
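To make the non-proportionality concrete, the following toy computation (with invented per-channel contributions, not values from the paper) shows that when the ambient illumination has an SPD different from the direct light, the fully shadowed value no longer shares the chromaticity of the lit background:

    import numpy as np

    # Invented per-channel contributions, for illustration only.
    direct = np.array([0.9, 0.8, 0.5])    # warm direct light reflected by the surface
    ambient = np.array([0.2, 0.3, 0.5])   # bluish ambient (e.g., sky) contribution

    bg = direct + ambient                 # fully lit background value
    sd = ambient                          # fully shadowed value (direct light blocked)

    # Under linear attenuation, sd would be a scalar multiple of bg.
    print(bg / np.linalg.norm(bg))        # chromaticity direction of the lit pixel
    print(sd / np.linalg.norm(sd))        # a different direction: not proportional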
However, these learning-based approaches [9, 5, 4, 6] may suffer from insufficient training samples, since the statistical models are learned from background surface variation under cast shadows. Unlike background modeling, where samples are obtained in every frame, shadows may not appear at the same pixel in each frame. A single pixel must be shadowed many times, under stable illumination conditions, before its estimated parameters converge. Such pixel-based shadow models therefore require a long training period when foreground activities are rare. The problem becomes more serious when video sequences are captured at a low or unsteady frame rate that depends on the transmission conditions.

In this paper, we characterize cast shadows with "global" parameters for a scene. Based on the bi-illuminant dichromatic reflection (BIDR) model [7], we first derive the normalized spectral ratio as our color feature under the assumptions of constant ambient illumination and direct light sources with a common SPD. The normalized spectral ratio remains constant regardless of the background surfaces and illumination conditions. We then model the color features extracted from all moving pixels using a single Gaussian Mixture Model (GMM). To further improve the ability to distinguish cast shadows from foreground with colors similar to the background, we use a pixel-based GMM to describe the gradient intensity distortion at each pixel. We update the pixel-based GMMs using the confidence predicted by the global GMM through confidence-rated learning, which accelerates convergence. The contributions are twofold. First, with the global shadow model learned from physics-based features, our approach does not require numerous foreground activities or high frame rates to learn the shadow model parameters, making it more practical than existing works that use only pixel-based models. Second, the proposed confidence-rated learning enables fast learning of local features in pixel-based models; it provides a principled scheme for the local and global features to collaborate with each other.

The remainder of this paper is organized as follows. Section 2 briefly describes the dichromatic reflection model [13] and its extension, BIDR. Section 3 presents the proposed learning approach. The posterior probabilities of cast shadows and foreground are developed in Section 4. Visual and quantitative results are shown in Section 5 to verify the performance of our method and its robustness to few foreground activities. Section 6 concludes the paper.

2. Physics-Based Shadow Model

2.1. Bi-illuminant Dichromatic Reflection Model

There are three terms in Shafer's model [13]: body reflection, surface reflection, and a constant ambient term. Each of the two reflection types can be decomposed into a chromatic and an achromatic part: 1) composition: a relative SPD c_b or c_s that depends only on wavelength; 2) magnitude: a geometric scale factor m_b or m_s that depends only on geometry. Given a scene geometry, the radiance in the direction (\theta_e, \phi_e) can be expressed as

    I(\theta_e, \phi_e, \lambda) = m_b(\theta_e, \phi_e)\, c_b(\lambda) + m_s(\theta_e, \phi_e)\, c_s(\lambda) + c_d(\lambda),   (1)

where c_d is the constant ambient term.

While this model includes a term to account for ambient illumination, it does not separate that term into body and surface reflection. Recently, Maxwell et al. [7] proposed the BIDR model, which contains four terms: the two types of reflection for both the direct light sources and the ambient illumination. The BIDR model is of the form

    I(\theta_e, \phi_e, \lambda) = m_b(\theta_e, \phi_e, \theta_i, \phi_i)\, c_b(\lambda)\, l_d(\theta_L, \phi_L, \lambda)
                                 + m_s(\theta_e, \phi_e, \theta_i, \phi_i)\, c_s(\lambda)\, l_d(\theta_L, \phi_L, \lambda)
                                 + c_b(\lambda) \int_{\theta_i, \phi_i} m_b(\theta_e, \phi_e, \theta_i, \phi_i)\, l_a(\theta_L, \phi_L, \lambda)\, d\theta_i\, d\phi_i
                                 + c_s(\lambda) \int_{\theta_i, \phi_i} m_s(\theta_e, \phi_e, \theta_i, \phi_i)\, l_a(\theta_L, \phi_L, \lambda)\, d\theta_i\, d\phi_i,   (2)

where (\theta_L, \phi_L) is the direction of the direct light source relative to the local surface normal, and (\theta_e, \phi_e) and (\theta_i, \phi_i) are the angles of emittance and incidence, respectively. In this reflection model, m_b, m_s, c_b, and c_s all lie in [0, 1] (we refer the reader to [7] for the derivation of the BIDR model).

With a specific geometry, and writing the two ambient integrals as M_{ab}(\lambda) and M_{as}(\lambda), the BIDR model simplifies to

    I(\lambda) = c_b(\lambda)\,[m_b\, l_d(\lambda) + M_{ab}(\lambda)] + c_s(\lambda)\,[m_s\, l_d(\lambda) + M_{as}(\lambda)].   (3)

Considering only matte surfaces, we can ignore the latter part of (3). To describe the appearance of cast shadows on a background surface, we multiply the direct illumination by an attenuation factor \alpha \in [0, 1], which indicates the unoccluded proportion of the direct light. We assume that all direct light sources have a common SPD with different power factors and that the ambient illumination is constant over lit and shaded regions (see Fig. 1). This gives the simplified form of the BIDR model

    I(\lambda) = \alpha\, m_b\, c_b(\lambda)\, l_d(\lambda) + c_b(\lambda)\, M_{ab}(\lambda).   (4)
Figure 1. The contribution of all direct light sources and ambient illuminance. The shadow values SD are expected to lie along the line between the background value BG and the constant ambient term BG_A. Note that the shadow values are not necessarily proportional to the direction of the background values.

The camera sensor response g_i at the pixel level can be obtained through the spectral projection g_i = \int F_i(\lambda)\, I(\lambda)\, d\lambda, where F_i(\lambda), i \in \{R, G, B\}, are the sensor spectral sensitivities and \lambda denotes the wavelength. By applying the linearity of the spectral projection, we have

    g_i = \alpha F_i m_b c_b^i l_d^i + F_i c_b^i M_{ab}^i, \quad i \in \{R, G, B\}.   (5)

This formulation of g_i defines a line segment in the RGB color space between two ends: the fully shadowed pixel (\alpha = 0) and the fully lit pixel (\alpha = 1).

2.2. Extracting Useful Features

2.2.1 Spectral Ratio

To extract color features that are constant across different background surfaces, we need measurements that are invariant to the illumination attenuation factor \alpha, the geometry shading factor m_b, and the chromatic aspect of body reflection c_b^i. We compute the ratio of the two illuminants as the spectral ratio S = [S_R, S_G, S_B]^T:

    S_i = \frac{SD_i}{BG_i - SD_i} = \frac{\alpha F_i m_b c_b^i l_d^i + F_i c_b^i M_{ab}^i}{(1-\alpha) F_i m_b c_b^i l_d^i} = \frac{\alpha}{1-\alpha} + \frac{M_{ab}^i}{(1-\alpha) m_b l_d^i}, \quad i \in \{R, G, B\}.   (6)

If the shaded regions received only the ambient illumination (i.e. all direct light sources are blocked, \alpha = 0), the first term in (6) would disappear and S_i = M_{ab}^i / (m_b l_d^i). We could then derive features invariant to m_b by normalizing S_i by its length |S|, since the m_b term can be extracted from the normalization constant. However, this assumption does not hold in real-world environments. Take an indoor scene as an example, where there are usually multiple light sources. When a foreground object occludes one or some of the light sources, there is still energy from the remaining light sources incident on the surface. Consequently, assuming the attenuation factor to be zero induces a bias in estimating the ratio between the two illuminants.

To address this problem, we introduce \gamma = [\gamma_R, \gamma_G, \gamma_B]^T by subtracting \alpha / (1-\alpha) from each element of S:

    \gamma_i = S_i - \frac{\alpha}{1-\alpha} = \frac{M_{ab}^i}{(1-\alpha) m_b l_d^i}.   (7)

Similarly, we can obtain the normalized spectral ratio \hat{\gamma} with higher accuracy by factoring out (1-\alpha) m_b through normalization:

    \hat{\gamma}_i = \frac{1}{|\gamma|} \cdot \frac{M_{ab}^i}{(1-\alpha) m_b l_d^i},   (8)

    |\gamma| = \frac{1}{(1-\alpha) m_b} \sqrt{ \left(\frac{M_{ab}^R}{l_d^R}\right)^2 + \left(\frac{M_{ab}^G}{l_d^G}\right)^2 + \left(\frac{M_{ab}^B}{l_d^B}\right)^2 }.   (9)

We validate the proposed physics-based color feature by observing its distribution over cast shadows in various environments and illumination conditions. Fig. 2 (a) and its reference background image are courtesy of Maxwell et al. [7]. Fig. 2 (b)-(d) show frames from the benchmark sequences provided in the survey of Prati et al. [10]. We manually label shadow pixels in the given images and then extract the spectral ratio S_i, i \in \{R, G, B\}, for each pixel in the shadow regions, using the given image and the reference background model.

Figure 2. The color feature value distribution in various environments and illumination conditions. (a)-(d) Frames with cast shadows. (e)-(h) The corresponding feature value distributions in the S_R S_G S_B space. Note that the feature values extracted from different background surfaces generally follow a line.
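A compact sketch of the feature extraction in (6)-(9), assuming bg and sd are N x 3 arrays of corresponding lit and shadowed RGB values and alpha_ratio is the scene estimate of \alpha/(1-\alpha) described later in Sec. 3.5:

    import numpy as np

    def normalized_spectral_ratio(bg, sd, alpha_ratio):
        """Map lit/shadowed RGB pairs to the normalized spectral ratio."""
        S = sd / np.maximum(bg - sd, 1e-6)        # spectral ratio, eq. (6)
        gamma = S - alpha_ratio                   # remove the alpha term, eq. (7)
        norm = np.linalg.norm(gamma, axis=1, keepdims=True)
        return gamma / np.maximum(norm, 1e-6)     # gamma_hat, eqs. (8)-(9)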
The resultant feature distributions are presented in Fig. 2 (e)-(h). The feature values extracted from different background surfaces roughly follow a straight line in the S_R S_G S_B space. We can therefore use the direction of this line as our color feature, which is roughly the same for all shadow pixels. The direction of the line can be characterized by two angles, the zenith and azimuth in spherical coordinates, which correspond to the normalized spectral ratio. From the feature distributions in Fig. 2 (e)-(h), we also observe that larger feature values tend to be unstable and deviate from the major direction of most feature values; this is because the difference between the shaded and lit pixel values is then too small to measure the orientation robustly. In addition, the value of \alpha / (1-\alpha) in the scene can be easily estimated by intersecting the fitted line with the line passing through (1, 1, 1) and the origin.

2.2.2 Gradient Intensity Distortion

We have now derived color features that are invariant to different background surfaces. These low-dimensional (2-D) features, however, might fail to distinguish foreground whose color is similar to the background from cast shadows. Other shadow-induced properties, such as edge or gradient information, may thus be used to further describe the background appearance variation under cast shadows. In this paper, we use a simple gradient intensity distortion as our local feature to demonstrate the improvement gained by incorporating additional local features. For a given pixel p, we define the gradient intensity distortion \omega_p as

    \omega_p = |\nabla (BG)_p| - |\nabla (F)_p|,   (10)

where BG and F are the luminance channels of the background image and the current frame, and \nabla(\cdot) is the gradient operator.

3. Learning Cast Shadows

In this section, we show how to build models for cast shadows in an unsupervised way. We use GMMs to learn the background surface variation over time; other statistical learning methods, such as kernel density estimation, could also be used.

3.1. Weak Shadow Detector

To model cast shadows, impossible shadow samples that belong to the background or foreground (e.g. color values that are the same as or brighter than the background values) should be excluded. We therefore apply a weak shadow detector that evaluates every moving pixel to filter out such impossible samples. Since cast shadows reduce the luminance values, potential shadow values should fall into a conic volume around the corresponding background color. The weak shadow detector is illustrated in Fig. 3: pixel values that fall into the gray conic region are considered potential shadow samples. These samples are then used to learn the global shadow model for the scene and the local shadow model for each pixel.

Figure 3. The weak shadow detector. An observation is considered a potential shadow point if it falls into the gray area. The detector has three parameters: the maximum allowed color shift, and the minimal and maximal illumination attenuation.

3.2. Global Shadow Model

Using the background-surface-invariant color features, a global shadow model is learned for the whole scene. Here, we model the background color information by the well-known GMM [15] in the RGB color space. For every frame, we obtain potential shadow points by applying the weak shadow detector to moving pixels, which are identified via background subtraction. We then use one GMM to learn the normalized spectral ratio \hat{r} = [\hat{\gamma}_R, \hat{\gamma}_G]^T in the scene. We use only two of the three feature dimensions because the third component is redundant: \hat{\gamma}_R^2 + \hat{\gamma}_G^2 + \hat{\gamma}_B^2 = 1. The normalized spectral ratio in the whole scene is modeled by K Gaussian distributions with mean vectors \mu_k and full covariance matrices \Sigma_k. The probability of the normalized spectral ratio \hat{r} is then given by

    p(\hat{r} \mid \mu, \Sigma) = \sum_{k=1}^{K} \pi_k\, G_k(\hat{r}, \mu_k, \Sigma_k),   (11)

where \mu, \Sigma denote the parameters of the K Gaussians, \pi_k is the mixing weight, and G_k is the kth Gaussian probability distribution. We use the Expectation-Maximization (EM) algorithm to estimate the parameters of the GMM. The parameters estimated in the current frame are propagated to the next frame so that the EM algorithm converges quickly. Since the light sources in a scene are usually stable, we find that a single EM iteration per frame is sufficient for the estimated parameters to converge.
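The per-frame update of Sections 3.1-3.2 might look as follows. This is a minimal sketch: the detector thresholds (attenuation range and color-shift angle) are illustrative stand-ins for the three parameters of Fig. 3, and scikit-learn's GaussianMixture with warm_start=True and max_iter=1 approximates the single propagated EM iteration.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def weak_shadow_detector(f, bg, att_min=0.3, att_max=0.95, max_angle_deg=8.0):
        """Conic-volume test of Sec. 3.1 on HxWx3 float images (illustrative
        parameter values): bounded luminance attenuation plus a small angular
        color shift relative to the background color direction."""
        atten = f.sum(-1) / np.maximum(bg.sum(-1), 1e-6)
        cos = (f * bg).sum(-1) / np.maximum(
            np.linalg.norm(f, axis=-1) * np.linalg.norm(bg, axis=-1), 1e-6)
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        return (atten > att_min) & (atten < att_max) & (angle < max_angle_deg)

    # Global model of Sec. 3.2: one GMM over the 2-D normalized spectral ratio,
    # warm-started so each new frame runs a single EM iteration.
    global_gmm = GaussianMixture(n_components=3, covariance_type='full',
                                 warm_start=True, max_iter=1)
    # Per frame (gamma_hat from eqs. (6)-(9), moving mask from background subtraction):
    # ps = moving & weak_shadow_detector(frame, background)
    # global_gmm.fit(gamma_hat[ps][:, :2])   # third component is redundant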
3.3. Local Shadow Model

Besides the color features in the global shadow model, we build a GMM for each pixel to learn the gradient intensity distortion under cast shadows, similar to background modeling [15]. For a given pixel p, its gradient feature value \omega_p is sampled and learned whenever p is a potential shadow pixel. However, as mentioned above, pixel-based models often suffer from insufficient training data because samples are not available at every pixel in every frame.

To address this problem, we adopt confidence-rated learning to improve the convergence rate of the local model parameters. The basic idea is that each sample is weighted by an importance computed from the global shadow model. For example, if we update the model using a potential shadow point whose color features match the global shadow model, we consider this sample relatively more important than others. In this way, the learning process of the local shadow model is guided by the global shadow model.

3.4. Confidence-Rated Gaussian Mixture Learning

We present an effective Gaussian mixture learning algorithm that overcomes some drawbacks of the conventional GMM learning approach. Let \rho_\pi and \rho_G be the learning rates for the mixing weights and the Gaussian parameters (means and covariances) in the local shadow model, respectively. The updating scheme follows the combination of incremental EM learning and a recursive filter [3]:

    \rho_\pi = C(\hat{\gamma}) \cdot \frac{1 - \rho_{default}}{\sum_{j=1}^{K} c_j} + \rho_{default},   (12)

    \rho_G = C(\hat{\gamma}) \cdot \frac{1 - \rho_{default}}{c_k} + \rho_{default},   (13)

where c_k is the number of matches of the kth Gaussian state and \rho_{default} is a small constant (0.005 in our experiments). The two learning rates are controlled by a confidence value C(\hat{\gamma}), which indicates how confidently the sample belongs to the shadow. Observations with higher confidence thus converge faster than those with lower confidence.

3.5. Attenuation Factor Estimation

From the RGB values observed in the current frame and the reference background image, we can only compute the value of S, which may introduce bias into the estimated normalized spectral ratios. Estimating the attenuation factor \alpha is therefore required for accurate shadow modeling. From (7), the value of \gamma_i, i \in \{R, G, B\}, is obtained by subtracting \alpha/(1-\alpha) from S_i. Given the feature distribution of S, our aim is to find the point t that lies on both the line through the origin with direction vector (1, 1, 1) and the line fitted to the observations S. This estimation is achieved with a robust fitting method that is less sensitive than ordinary least squares to large deviations (outliers). In addition, we perform recursive linear regression to update the estimated \alpha/(1-\alpha) value adaptively. For simplicity, the attenuation factor \alpha is assumed to be the same for every pixel.

4. Cast Shadow and Foreground Posterior Probabilities

In this section, we derive the posterior probabilities of cast shadows and foreground given an observed sample x_p in the RGB color space, using the proposed global and local shadow models.

4.1. Cast Shadow Posterior

The shadow posterior is first computed by decomposing P(SD | x_p) over the (BG, FS) domain, where FS indicates moving pixels (real foreground and cast shadows). Since P(SD | x_p, BG) = 0, the decomposition gives

    P(SD \mid x_p) = P(SD \mid x_p, FS)\, P(FS \mid x_p),   (14)

where P(FS | x_p) = 1 - P(BG | x_p) can be computed directly from the background model. Second, we remove pixels that are definitely foreground (i.e. pixels rejected by the weak shadow detector) and consider only potential shadow points (PS): P(SD | x_p, FS) = P(SD | x_p, FS, PS). We then decompose P(SD | x_p, FS, PS) into two parts, N_a and N_{na}, which stand for color features that are or are not associated with the normalized spectral ratio, respectively. If the color features are not associated with the working states of the GMM, the probability of belonging to shadow equals zero. Therefore, we have

    P(SD \mid x_p, FS, PS) = P(SD \mid x_p, FS, PS, N_a)\, P(N_a \mid x_p, FS, PS).   (15)

Here, the gradient intensity distortion \omega_p and the color feature \hat{\gamma}_p are the sufficient statistics for x_p in the first and second parts of (15), respectively. The posterior probability of cast shadow can thus be computed.

4.2. Foreground Posterior

Computing the foreground posterior probability is much easier. Given a pixel p, we first compute the background posterior P(BG | x_p) from the background model. Then, we obtain the shadow posterior probability P(SD | x_p) using the learned shadow models.
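A sketch of how (14)-(15) combine per-pixel quantities; the argument names are hypothetical, and the two likelihood terms would come from the global (color) and local (gradient) models respectively:

    def shadow_posterior(p_bg, is_ps, p_omega, p_gamma):
        """Eqs. (14)-(15) for one pixel, with probabilities in [0, 1]:
        p_bg    -- P(BG|x) from the background model
        is_ps   -- 1.0 if the weak detector accepts the pixel (PS), else 0.0
        p_omega -- P(SD|x, FS, PS, Na): gradient-distortion term (local model)
        p_gamma -- P(Na|x, FS, PS): color-feature match term (global model)
        """
        p_fs = 1.0 - p_bg                          # moving pixel: foreground or shadow
        p_sd_given_fs = is_ps * p_omega * p_gamma  # eq. (15)
        return p_sd_given_fs * p_fs                # eq. (14)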
Based on probability theory, the foreground posterior can then be obtained as

    P(FG \mid x_p) = 1 - P(BG \mid x_p) - P(SD \mid x_p).   (16)

4.3. Summary

Algorithms 1 and 2 summarize in pseudocode the learning and detection processes of the proposed method. Our method can be attached to other moving-object detection programs as an independent module. The shadow detection process is applied only to moving pixels detected by the background model, and the learning process occurs only when these moving pixels are considered shadow candidates. Consequently, the proposed algorithm is practical: it does not introduce a heavy computational burden and works effectively to detect shadows.

Algorithm 1: Learning Process
    At time t,
    for each pixel p in the frame do
        if P(BG | x_t(p)) < 0.5 then
            if pixel p satisfies the shadow property then
                - Compute the normalized spectral ratio \hat{\gamma}_p
                - Compute the gradient intensity distortion \omega_p
                - Update the local shadow model at pixel p using the confidence value C(\hat{\gamma}) through confidence-rated learning
            end
        end
    end
    Run one EM iteration to estimate the parameters of the global shadow model using the collected color features.

Algorithm 2: Detection Process
    At time t,
    for each pixel p ∈ P do
        - Obtain the background posterior P(BG | x_p) from background modeling
        - Compute the shadow posterior P(SD | x_p) (eq. 14)
        - Compute the foreground posterior P(FG | x_p) (eq. 16)
        if P(FG | x_p) > P(SD | x_p) and P(FG | x_p) > P(BG | x_p) then
            Label pixel p as foreground
        else
            Label pixel p as background
        end
    end

5. Experimental Results

We present visual results on challenging video sequences captured in various environments, including both indoor and outdoor scenes. We also compare the quantitative accuracy of the proposed method with other approaches on several videos for which results are available. Previous approaches using statistical models have a higher success rate in detecting cast shadows when numerous foreground activities are present. We show, however, that our method can also handle situations in which cast shadows first appear in complex scenes under unknown illumination conditions, as well as rare foreground activity.

5.1. Qualitative Results

Figure 4 shows sample cast shadow detection results from four video sequences. The first three, Laboratory, Intelligent Room, and Highway I, are part of the benchmark sequences for validating shadow detection algorithms; the last, Hallway, is taken from [6]. Figure 4(a) shows one frame selected from each video in which cast shadows are present. The background posterior probability is presented in Figure 4(b), where dark regions indicate a low probability of belonging to the background. Figures 4(c) and 4(d) show the confidence map of the global shadow model and the posterior probability of cast shadows, respectively. Figure 4(e) shows the probability of belonging to foreground objects. In these video sequences, the proposed algorithm detects cast shadows without misclassifying foreground as shadow. Note that in the second video, Intelligent Room, the man walks into the room only once, so the pixel-based shadow model has no chance to learn its parameters; the global shadow model is what enables us to detect shadows when they first appear in the scene.

To verify the effectiveness of the proposed method, the results presented here are raw outputs without any post-processing. Binary results can be obtained simply by thresholding the foreground posterior values P(FG | x_p). The posterior probabilities can also be combined with a context model that uses spatial and temporal coherence to improve segmentation accuracy.

5.2. Quantitative Results

The quantitative evaluation follows the method proposed by Prati et al. [10], which defines two metrics for evaluating the performance of a cast shadow detection algorithm: the shadow detection rate \eta and the shadow discrimination rate \xi,

    \eta = \frac{TP_S}{TP_S + FN_S}; \quad \xi = \frac{TP_F}{TP_F + FN_F},   (17)

where the subscript S stands for shadow and F for foreground, and TP and FN denote true positives and false negatives, respectively. TP_F is the number of ground-truth foreground points minus the number of points detected as shadow but belonging to foreground objects.
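The two rates of (17) reduce to simple counts; a minimal helper:

    def shadow_rates(tp_s, fn_s, tp_f, fn_f):
        """Shadow detection rate (eta) and discrimination rate (xi), eq. (17),
        from true-positive and false-negative pixel counts for the shadow (S)
        and foreground (F) classes."""
        eta = tp_s / float(tp_s + fn_s)
        xi = tp_f / float(tp_f + fn_f)
        return eta, xi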
Figure 4. Sample visual results of detecting cast shadows in various environments. (a) Frame from the video sequence. (b) Background posterior probability P(BG | x_p). (c) Confidence map predicted by the global shadow model. (d) Shadow posterior probability P(SD | x_p). (e) Foreground posterior probability P(FG | x_p).

Figure 5. The effect of confidence-rated Gaussian mixture learning. The mean maps of the local shadow model are taken at the 100th frame (first row) and the 1000th frame (second row). (a)(d) The background image. (b)(e) The mean map of the most important Gaussian in the mixture with confidence-rated learning. (c)(f) The mean map without confidence-rated learning.

Figure 6. Quantitative results on the Intelligent Room sequence. The shadow detection and shadow discrimination rates are calculated under different frame-rate settings.

We show the quantitative results in Table 1. Note that the results of the other approaches are taken directly from [6] and [4].

5.3. Fast Learning of the Local Shadow Model

We demonstrate the effect of confidence-rated learning in Figure 5. In this experiment, we learn the local shadow model in two traffic scenes: Highway I and Highway II. Figures 5(a)(d) show the background models of the two outdoor scenes. With confidence-rated learning, the mean values of the most important Gaussian (i.e. the one with the highest mixing weight and smallest variance) are shown in Figures 5(b)(e): the local models of the gradient intensity distortion under cast shadows are well constructed. If, on the other hand, we learn the local shadow model following the conventional Gaussian mixture learning method, we obtain the results in Figures 5(c)(f), in which the models are still not built, owing to the long training time and the disturbance caused by foreground objects.
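For reference, one step of the confidence-rated update behind Figure 5 could be sketched as follows (eq. (13) for the matched state k; the mixing-weight update of eq. (12) is analogous, and the array shapes are assumptions):

    import numpy as np

    def update_matched_state(mu, var, counts, k, omega, conf, rho_default=0.005):
        """Confidence-rated update of state k in a pixel's local GMM.
        mu, var, counts -- per-state means, variances, and match counts
        omega           -- new gradient-distortion sample, eq. (10)
        conf            -- C(gamma_hat) predicted by the global shadow model
        """
        counts[k] += 1
        rho_g = conf * (1.0 - rho_default) / counts[k] + rho_default  # eq. (13)
        mu[k] = (1.0 - rho_g) * mu[k] + rho_g * omega
        var[k] = (1.0 - rho_g) * var[k] + rho_g * (omega - mu[k]) ** 2
        return mu, var, counts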
Table 1. Quantitative results on surveillance sequences.

                  Highway I        Highway II       Hallway
    Method        η%     ξ%        η%     ξ%        η%     ξ%
    Proposed      70.83  82.37     76.50  74.51     82.05  90.47
    Kernel [6]    70.50  84.40     68.40  71.20     72.40  86.70
    LGf [4]       72.10  79.70     -      -         -      -
    GMSM [5]      63.30  71.30     58.51  44.40     60.50  87.00

5.4. Handling Scenes with Few Foreground Activities

Here we use the benchmark sequence Intelligent Room to demonstrate the robustness of our approach to videos captured at low frame rates and to scenes with few foreground activities. Downsampled sequences with lower frame rates are obtained by keeping one of every M ∈ {2, 3, 4} frames of the original sequence, giving frame rates of 10 (original), 5, 3.33, and 2.5 frames per second. Figure 6 shows the quantitative results on these sequences. The performance degrades only slightly at the lower frame rates, demonstrating that our approach can still learn and remove cast shadows even at such low frame rates.

6. Conclusion

In this paper, we have presented a novel algorithm capable of detecting cast shadows in various scenes. Qualitative and quantitative evaluations of the physics-based shadow model validate that our approach is more effective in describing background surface variation under cast shadows. The physics-based color features can be used to learn a global shadow model for a scene, so our method does not suffer from the insufficient-training-data problem of pixel-based shadow models. Moreover, with the aid of the global shadow model, we can update the local shadow models through confidence-rated learning, which is significantly faster than conventional online updating. To further improve detection accuracy, more discriminative features or spatial and temporal smoothness constraints could be incorporated into the detection process in the future.

Acknowledgement

This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grant NSC95-2221-E-001-028-MY3.

References

[1] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans. PAMI, 25(10):1337-1342, 2003.
[2] T. Horprasert, D. Harwood, and L. S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In ICCV Frame-rate Workshop, 1999.
[3] D. S. Lee. Effective Gaussian mixture learning for video background subtraction. IEEE Trans. PAMI, 27(5):827-832, 2005.
[4] Z. Liu, K. Huang, T. Tan, and L. Wang. Cast shadow removal combining local and global features. In CVPR, pages 1-8, 2007.
[5] N. Martel-Brisson and A. Zaccarin. Learning and removing cast shadows through a multidistribution approach. IEEE Trans. PAMI, 29(7):1133-1146, 2007.
[6] N. Martel-Brisson and A. Zaccarin. Kernel-based learning of cast shadows from a physical model of light sources and surfaces for low-level segmentation. In CVPR, June 2008.
[7] B. Maxwell, R. Friedhoff, and C. Smith. A bi-illuminant dichromatic reflection model for understanding images. In CVPR, pages 1-8, June 2008.
[8] S. Nadimi and B. Bhanu. Physical models for moving shadow and object detection in video. IEEE Trans. PAMI, 26(8):1079-1087, 2004.
[9] F. Porikli and J. Thornton. Shadow flow: a recursive method to learn moving cast shadows. In ICCV, pages 891-898, Oct. 2005.
[10] A. Prati, I. Mikic, M. Trivedi, and R. Cucchiara. Detecting moving shadows: algorithms and evaluation. IEEE Trans. PAMI, 25(7):918-923, 2003.
[11] E. Salvador, A. Cavallaro, and T. Ebrahimi. Cast shadow segmentation using invariant color features. CVIU, 95(2):238-259, 2004.
[12] O. Schreer, I. Feldmann, U. Golz, and P. Kauff. Fast and robust shadow detection in videoconference applications. In IEEE Int'l Symp. Video/Image Processing and Multimedia Communications, pages 371-375, 2002.
[13] S. Shafer. Using color to separate reflection components. Color Research and Application, pages 210-218, 1985.
[14] J. Stander, R. Mech, and J. Ostermann. Detection of moving cast shadows for object segmentation. IEEE Trans. Multimedia, 1(1):65-76, Mar. 1999.
[15] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In CVPR, volume 2, pages 246-252, 1999.
[16] W. Zhang, X. Fang, X. Yang, and Q. M. J. Wu. Moving cast shadows detection using ratio edge. IEEE Trans. Multimedia, 9(6):1202-1214, 2007.