SlideShare uma empresa Scribd logo
1 de 8
Baixar para ler offline
Homography Normalization for Robust Gaze Estimation in Uncalibrated Setups
                        Dan Witzner Hansen∗                                   Javier San Agustin†                        Arantxa Villanueva‡
                     IT University, Copenhagen                            IT University, Copenhagen                  Public University of Navarra


Abstract                                                                                          ubiquitous, and convenient for the general public. So far, it has not
                                                                                                  been possible to meet these constraints concurrently.
Homography normalization is presented as a novel gaze estimation
                                                                                                  Many gaze models require a fully calibrated setup and detailed eye
method for uncalibrated setups. The method applies when head
                                                                                                  models (a strong prior model) to be able to minimize user calibra-
movements are present but without any requirements to camera cal-
                                                                                                  tion and maintain high accuracy. A major limitation of fully cal-
ibration or geometric calibration. The method is geometrically and
                                                                                                  ibrated setups is that they require exact knowledge of the relative
empirically demonstrated to be robust to head pose changes and
                                                                                                  positions of the camera, light sources and monitor. Geometric cal-
despite being less constrained than cross-ratio methods, it consis-
                                                                                                  ibration is usually tedious and time consuming to perform and au-
tently performs favorably by several degrees on both simulated data
                                                                                                  tomated techniques are sparse [Brolly and Mulligan 2004]. Slight
and data from physical setups. The physical setups include the use
                                                                                                  unintentional movement of a system part or change in focal length
of off-the-shelf web cameras with infrared light (night vision) and
                                                                                                  may result in a significant drop in accuracy when relying on a cali-
standard cameras with and without infrared light. The benefits of
                                                                                                  brated setup. The accuracy is therefore difficult to maintain unless
homography normalization and uncalibrated setups in general are
                                                                                                  the hardware is placed in a rigid setup. Such requirements add to
also demonstrated through obtaining gaze estimates (in the visible
                                                                                                  the cost of the system. Gaze models may alternatively use mul-
spectrum) using only the screen reflections on the cornea.
                                                                                                  tiple calibration points in order to be less dependent on prior as-
                                                                                                  sumptions (e.g. using polynomial approximations [Hansen and Ji
Keywords: Eye tracking, Gaze estimation, Homography normal-                                       2010]). Models employing a weak prior model have not been able
ization, Gaussian process, Uncalibrated setup, HCI                                                to demonstrate head pose invariance to date.
                                                                                                  This paper will both geometrically and empirically demonstrate that
1     Introduction                                                                                it is possible to obtain robust gaze estimation in the presence of
                                                                                                  head movements when using a weak prior model of the geometric
Eye and gaze tracking have a long history but only recently have                                  setup. The model relies on homography normalization and does
gaze trackers become robust enough for use outside laboratories.                                  not require any direct measurements of the relative position of the
The precision of current gaze trackers is sufficient for many types                                screen, camera and light source, nor does it need camera calibra-
of applications, but are we really satisfied with their current capa-                              tion. This means that it is possible to obtain a highly flexible eye
bilities?                                                                                         tracker that can be made compact, mobile and suit individual needs.
Both research and commercial gaze trackers have been driven by                                    Besides, the method is very simple to implement. Homography nor-
the urge to obtain high accuracy gaze position data while simpli-                                 malization is shown to consistently provide higher accuracies than
fying user calibration, often by reducing the number of points nec-                               cross-ratio-based methods on both simulated data (section 4) and
essary for calibrating an individual user to the system. Both high                                data recorded from a physical setup (section 5). One reason for
accuracy and few calibration points are desirable properties of a                                 considering uncalibrated setups is to facilitate the general public
gaze tracker, but they are not necessarily the only parameters which                              with affordable and flexible gaze trackers that are robust with regard
should be optimized [Scott and Findlay 1993]. Price is obviously                                  to head movements. In section 5.2 this is shown to be achievable
an issue, but may be partially resolved with technological devel-                                 through purely off-the-shelf components. It is additionally shown
opments. Today even cheap web cameras are of sufficient quality                                    possible to use screen reflections on the cornea as an alternative
for reliable gaze tracking. In some situations, however, it would be                              to IR glints (section 5.3). Through this paper we intend to show
convenient if light sources, cameras and monitors could be placed                                 that flexible, mobile and low cost gaze trackers are indeed feasible
according to particular needs rather than being constrained by man-                               without sacrificing significant accuracy.
ufacturer specifications. Avoiding external light sources or allow-
ing the user to change the zoom of the camera to suit their particular                            2 Related Work
needs would be desirable. Gaze models that support flexible setups
eliminate the need for rigid frames that keep individual components                               The primary task of a gaze tracker is to determine gaze, where gaze
in place and allow for more compact, lightweight, adaptable and                                   may either be a gaze direction or the point of regard (PoR). Gaze
perhaps cheap eye trackers. If the models employed in the gaze                                    modeling consequently focuses on the relations between the image
trackers only required a few calibration targets and could maintain                               data and gaze. A comprehensive review of eye and gaze models is
accuracy while avoiding the need for light sources, then eye track-                               provided in Hansen & Ji [2010].
ing technology would take an important step towards being flexible,
                                                                                                  All gaze estimation methods need to determine a set of parame-
    ∗ e-mail: witzner@itu.dk                                                                      ters through calibration. Some parameters may be estimated for
    † e-mail: javier@itu.dk                                                                       each session by letting the user look at a set of predefined targets
    ‡ e-mail: avilla@unavarra.es                                                                  on the screen, others need only be calculated once (e.g. human spe-
                                                                                                  cific parameters) and yet other parameters are estimated prior to use
Copyright © 2010 by the Association for Computing Machinery, Inc.                                 (e.g. camera parameters, geometric and physical parameters such
Permission to make digital or hard copies of part or all of this work for personal or
                                                                                                  as angles and location between camera and monitor). A system
classroom use is granted without fee provided that copies are not made or distributed
for commercial advantage and that copies bear this notice and the full citation on the
                                                                                                  where the camera parameters and the geometry are a priori known
first page. Copyrights for components of this work owned by others than ACM must be               is termed fully calibrated [Hansen and Ji 2010].
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on
servers, or to redistribute to lists, requires prior specific permission and/or a fee.            This paper focuses primarily on feature-based methods but alterna-
Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail              tive methods based on appearance also exist [Hansen and Ji 2010].
permissions@acm.org.
ETRA 2010, Austin, TX, March 22 – 24, 2010.
© 2010 ACM 978-1-60558-994-7/10/0003 $10.00

                                                                                             13
Feature-based methods explore the characteristics of the human eye             screen positions). Coutinho and Morimoto [2006] extend the model
to identify a set of distinctive and informative features around the           of Yoo et al. [2005], by using the offset between visual and optical
eyes that are less sensitive to variations in illumination and view-           axes as an argument to learn a constant on-screen offset. They ad-
point. Ensuring head pose invariance is a common problem often                 ditionally perform an elaborate evaluation of the consequences of
solved through the use of external light sources and their reflections          changing the calibration of the virtual calibration parameter (α).
(glints) on the cornea. Besides the glints, the pupil is the most com-         Based on this, they argue that a simpler model can be made by
mon feature to use, since it is easy to extract in IR spectrum images.         learning a single α value rather than four different values as orig-
The image measurements (e.g. the pupil) however, are influenced                 inally proposed. Where calibration in [Yoo and Chung 2005] can
by refraction [Guestrin and Eizenman 2006]. The limbus is less                 only be done by looking at the light sources in the screen corners,
influenced by refraction, but since its boundary may be partially               the method of [Coutinho and Morimoto 2006] may use multiple
occluded, it may be more difficult to obtain reliable measurements.             on-screen targets.

Two types of feature-based gaze estimation approaches exist: the               Since the cross-ratio is defined on projective planes and is invariant
interpolation-based (regression-based) and the model-based (geo-               to any projective transformation, scale changes will not influence
metric) Using a single camera, the 2D regression methods model                 the cross-ratio. The method is therefore not directly applicable to
the optical properties, geometry and the eye physiology indirectly             depth translations. Coutinho and Morimoto [2006] show signifi-
and may, therefore, be considered as approximate models which                  cant accuracy improvements compared to the original paper, pro-
may not strictly guarantee head pose invariance. They are, how-                vided the user does not change their distance to the camera and
ever, simple to implement, do not require camera or geometric cal-             monitor. The advantage of the method, compared to methods based
ibration (a.k.a weak prior model) and may still provide good re-               on calibrated setups, is that full hardware calibration is needless.
sults under conditions of small head movements. More recent 2D                 The method only requires light source position data relative to the
regression-based methods attempt to improve performance under                  screen. One limitation is that the light sources should be placed
larger head movements through compensation, or by adding addi-                 right on the corners of the screen. In practice the method is highly
tional cameras [Hansen and Ji 2010]. The 3D model-based meth-                  sensitive to the individual eye and formal analysis of the method is
ods, on the other hand, directly compute the gaze direction from               presented by Kang et al. [2008]. They identified two main sources
the eye features based on a geometric model of the eye. Most 3D                of errors: (1) the angular offset between visual and optical axes and
model-based (or geometric) approaches rely on metric information               (2) the offset between pupil and glint planes. Depending on the
and thus require camera calibration and a global geometric model               point configuration, the cross-ratio is also known for not being par-
(external to the eye) of light sources, camera and monitor position            ticularly robust to noise, since small changes in point positions can
and orientation. Gaze direction is modeled either as the optical axis          result in large variations in the cross-ratio.
or the visual axis. The optical axis is the line connecting the pupil
center, cornea center and the eyeball center. The line connecting              3 Homography Normalization for Gaze Esti-
the fovea and the center of the cornea is the visual axis. The visual
axis is presumably the true direction of gaze. The visual and optical
                                                                                 mation
axes intersect at the cornea center with subject dependent angular
                                                                               This section presents the fundamental model for a robust point of
offsets. In a typical adult, the fovea is located about 4 − 5◦ horizon-
                                                                               regard estimation method in uncalibrated setups (a priori unknown
tally and about 1.5◦ below the point of the optic axis and the retina
                                                                               geometry and camera parameters). The components of the model
and may vary up to 3◦ vertically between subjects. Much of the the-
                                                                               are illustrated in figure 1.
ory behind geometric models using fully calibrated setups, has been
formalized by Guestrin and Eizenman [2006]. Their model covers
                                                                                    L2          L1
a variable number of light sources and cameras, human specific pa-
rameters, light source positions, refraction, and camera parameters
but is limited by only applying to fully calibrated setups. Methods
                                                                                                                                               Cornea
relying on fully calibrated setups are most common in commercial                    L3                            l1
and research-based systems but are limited for public use unless                                                 l2
placed in a rigid setup. Any change (e.g. placing the camera dif-
ferently or changing the zoom of the camera) requires a tedious                                                 l3
recalibration.                                                                                                                            Πc
                                                                               L4
                                                                                                                l4                             Pupil
An alternative to the fully calibrated systems while allowing for
head movements is to use projective invariants and multiple light                          Πs                                           fc p
sources [Yoo and Chung 2005; Coutinho and Morimoto 2006].                                                                                    c         C
Contrary to the previous methods, Yoo et al. [2005] describe a
method which is capable of determining the point of regard based                                Camera          Πi
solely on the availability of light source position information (e.g.                            Center
no camera calibration or prior knowledge of rigid transformations
between hardware units) by exploiting the cross-ratio of four points           Figure 1: Geometric model of the human eye, light sources, screen,
(light sources) in projective space. Yoo et al. [2005] use two cam-            camera and projections (dashed line). The pupil is depicted as an
eras and four IR light sources placed around the screen to project             ellipse with center pc and the cornea as a hemisphere with center
these corners on the corneal surface, but only one camera is needed            C. The corneal-reflection plane, Πc , and its projection in the image
for gaze estimation. When looking at the screen the pupil center               are shown by quadrilaterals. Both Πc and the cornea focal point,
should ideally be within the four glint area. A fifth IR light emitter          fc , are displaced relative to each other and to the pupil center for
is placed on-axis to produce bright pupil images and to be able to             illustration purposes.
account for non-linear displacements (modeled by four αi parame-
ters) of the glints. The method of Yoo et al. [2005] was shown to be           The cornea is approximately spherical and has a radius, Rc , about
prone to large person specific errors [Coutinho and Morimoto 2006]              7.8mm. The cornea reflects light similarly to a convex mirror and
and can only use the light sources for calibration (e.g. not on other          has a focal point, fc , located halfway between the corneal surface



                                                                          14
and the center of corneal curvature (fc = Rc ≈ 3.9 mm). Re-
                                            2
                                                                                                                             n      n
                                                                                (normalized plane) spanned by four points g1 . . . g4 . Πn represents
flections on the cornea consequently appear further away than the                the (unknown) corneal-reflection plane given up to a homography.
                                                                                      n                                                            n
corneal surface (a.k.a virtual reflections).                                     Let gj (j = 1..4) be the corners of the unit square and define Hi
                                                                                            n      n
                                                                                such that gj = Hi gj . Notice, using the screen corners to span the
Denote the screen plane Πs and four (virtual) reflection on the                  normalized space would be equally viable. The basic idea is that the
          c       c
cornea (g1 . . . g4 ). The reflections may come from any point in 3D                                                                    n
                                                                                pupil is mapped to the normalized space through Hi to normalize
space, for example external light sources (Li ) or the corners of a             the effects of head pose prior to any calibration or gaze estimation
screen reflected on the cornea. The issue of screen projections will                            s
                                                                                procedure (Fn , in figure 2). The mapping of the reflections from
be addressed in section 5.3. For the sake of simplicity and with-                                                                      s     s
                                                                                the image Πi to the screen Πs via Πn is therefore Hi = Hn ◦ Hi .   n
                                                               c     c
out loss of generality, the following description assumes (g1 . . . g4 )                                   s                              s
                                                                                That is, a homography Hn is a sufficient model for Fn when the
come from point light sources. Provided the eye is stationary then              pupil and Πc coincide.
any location of a light source, Li , on li with same direction produce
                                                                                  s
the same point of reflection on the cornea. The light sources can                Hi can be found through a user calibration consisting of a min-
therefore and interchangeably be assumed located on e.g. the screen             imum of 4 calibration targets, t1 . . . tN on the screen. Denote the
plane Πs or at infinity as depicted in figure 1. Projected points at              general principle of normalizing eye data (pupil center, pupil or lim-
infinity lie in the focal plane of the convex mirror. With four light            bus contours) with respect to the reflections by homography nor-
                                                                                                                         s      s
source there will exist a plane Πc (in fact a family of planes related          malization. The method of using Fn = Hn in connection with
by homographies), spanned by the lines li . This plane is denoted               homography normalization is referred to as (Hom).
the corneal-reflection plane and is close to fc when Li at infin-
ity. When considering the reflection laws (e.g. not a projection) the            The cross-ratio method do not model the visual axis well [Kang
corneal reflections may only be approximately planar.                            et al. 2008]. Homography normalization, on the other hand, does
                                                                                model the offset between the optical and visual axes to a much
Without loss of generality suppose the light sources are located on             higher degree. Points in normalized space are based on the pupil
                                   c        c
Πs . The quadrilateral of glints (g1 . . . g4 ) is consequently related         center i.e. a model of the optical axis without the interference of
                                      i        i
to the corresponding quadrilateral (g1 . . . g4 ) in the image via a ho-        head movements. However, as offsets between the optical and vi-
               i
mography, Hc , from the cornea (Πc ) to the image (Πi ) [Hartley                sual axes correspond to translations in normalized space, the visual
                                                                                                                                       s       s
and Zisserman 2004]. Similarly, the mapping from the cornea to                  and optical axis offset is modeled implicitly through Fn = Hn .
                                                   s
the screen is also given by a homography Hc . The homography
                                  s           s       c
from the image to the screen Hi = Hc ◦ Hi via the Πc will                       3.1   Model Error from Planarity Assumption
therefore exist regardless of the location of the cornea, provided
the geometric setup does not change. These arguments also apply                 The previous section describes a generalized approach for head
to cross-ratio-based methods [Coutinho and Morimoto 2006; Yoo                   pose invariant PoR estimation under the assumption that the pupil
and Chung 2005].                                                                and Πc coincide. If the pupil had been located on Πc , it would
                                                                                be a head pose invariant gaze estimation method that models the
The pupil center is located about 4.2 mm from the cornea center
                                                                                visual and optical axis offset. Euclidean information is not avail-
but its location vary between subjects and over time for a particu-
                                                                                able in uncalibrated settings. Using metric information (e.g. be-
lar subject [Guestrin and Eizenman 2006]. However, the pupil is
                                                                                tween the pupil and the Πc ) does therefore not apply in this setting.
located approximately 0.3 mm (| Rc − 4.2|) from the corneal focal
                                    2                                           This section provides an analysis of the model error and section
point, fc , and thus also close to Πc . In the following suppose that
                                                                                3.2 discusses an approach to accommodate the errors. Figure 3 il-
Πc and the pupil coincide. The pupil may under these assumptions
                        s                                                       lustrates two different gaze directions and the associated modeling
be mapped through Hi from the image to the screen via the corneal
                                                                                error measured from the camera.
reflections.
                                                                                                Camera center
Image space                     Normalized space        Screen
                      Pupil
            gi                  gn
                                              gn
 gi           2                  1
                                        pc
                                         n     2                                                 Camera
  1                                                                                             optical axis
                         n
                        H   i                      Fs
                                                    n
 gi           gi                 gn           gn                 PoR                                                                   Gaze
                                                                                                                                    direction 1
  3               4                 3          4                                                           Gaze
                                                                                                        direction 2
      pci
                                                                                                                                       Πc
                                                                                                                                X
                                                                                                                           X
                                                                                                                               e1
Figure 2: (left) Reflection points (crosses) and the pupil (gray el-                                                   e2
lipse) are observed in the image and (middle) the pupil mapped to                                       Pupil position 2            Pupil position 1

the normalized space using the four reflection points. (right) from
the normalized space the pupil is mapped to the point of regard.
                                                                                Figure 3: Projected differences between pupil and the correspond-
These basic observations are sufficient to describe the fundamen-                ing point on Πc for two gaze directions. Πc is kept constant for
tal and simple algorithm for PoR estimation in an uncalibrated set-             clarity.
ting. The method is illustrated in figure 2 and is based on locating
                                 i      i
and tracking four reflections (g1 . . . g4 ) (e.g. glints) and the pupil
in the image. The pupil center, pc , will be used in the following              When the user looks away from the camera (’gaze direction 1’) it is
description. However, the presented method may alternatively use                evident that the error in the image plane is related to the projected
the limbus center or the pupil/limbus ellipse contours directly in the          line segment (between the point on Πc and the actual location of
mapping since homographies allow for mappings of points, lines                  the pupil), el , onto the image plane. A gaze vector directed to-
and conics.                                                                     wards the camera (’gaze direction 2’) yields a point and therefore
                                                                                no error. Hence equal angular offsets from the optical axis of the
It is convenient, though not necessary, to define a virtual plane, Πn ,          camera generate offset vectors ∆c (i, j) with the same magnitude



                                                                           15
when viewed from the camera. The largest magnitude of errors oc-                                                                          seen for single or dual glint systems [Morimoto and Mimica 2005].
cur when the gaze direction is perpendicular to the optical axis of                                                                       One of the limitation when using polynomials is that any increase
the camera. The magnitude field |∆c (i, j)| in camera coordinates                                                                          of the order of the polynomial would require additional calibration
consequently consists of elliptic iso-contours, centered around the                                                                       targets in order to estimate the parameters of the polynomial. A cu-
optical axis of the camera. However, it is the error, ∆s , in screen                                                                      bic polynomial seem to be a good approximation for ∆i [Cerrolaza
coordinates, that is of interest. The true point of regard in screen co-                                                                  et al. 2008], however it would require at least 10 calibration targets.
ordinates, ρ∗ = ρs + ∆s is a function of the estimated gaze ρs and
            s     ˆ                                               ˆ                                                                       Different from the ’weight space’ approach of polynomials is the
the error ∆s . That is ρ∗ = Hi (pc + ∆i ) = Hi pc + Hi ∆i , hence
                         s
                                 s                 s         s
                                                                                                                                          function view approach of Gaussian processes (GP). Gaussian pro-
                                  s
errors on the screen ∆s = Hi ∆i are merely errors in the camera                                                                           cess (GP) interpolation method is used to estimate ∆i by using a
propagated to the screen through the homography. An example of                                                                            squared exponential covariance function [Rasmussen and Williams
the error vector field, ∆s , using a simulator and the corresponding                                                                       2006]:
vector magnitudes is shown in Figure 4.
                                                                                                                                                                                 1 |xp − xq |
                                                                                                                                                   cov(xp , xq ) = k1 ∗ exp(−           2
                                                                                                                                                                                              ) + k3 σ 2
                                           Calibration Targets                                                                                                                   2     k2
              Vector field of PoR errors                  Magnitudes of PoR error vector field
  16



  14
                                                                                                                                          where xp and xq are data points and ki are weights. GP’s have
  12
                                                                                                                                          several innate properties that make them highly suited for gaze es-
  10
                                                                                                                                16
                                                                                                                                          timation. Gaussian processes do not model weights directly and
   8
                                                                                                                           12
                                                                                                                               14
                                                                                                                                          thus there are no requirements on the minimum number of calibra-
   6                                          0.015
                                                                                                                          10              tion targets needed to infer model parameters. Each additional cal-
   4
                                               0.01
                                                                                                                  6
                                                                                                                      8
                                                                                                                                          ibration target provides additional information that will be used to
   2
                                              0.005


                                                  0
                                                                                                              4                           increase accuracy. Each estimate also comes with an error measure-
                                                                                                                                          ment which, via the covariance function, is related to the distance
                                                                                                          2
                                                      0
                                                               5
   0                                                                            10                    0
       0         5                   10           15                                             15

                                                                                                                                          from the input data to the calibration data. This information can
                                           Camera location                                                                                potentially be used to regularize output data. The exponential co-
                                                                                                                                          variance function has been adopted since it is highly smooth (like
                                                                                                                                          ∆i ) and it makes it possible to account for noise directly in the co-
Figure 4: (left) Error vector field and (right) corresponding mag-                                                                         variance function through k3 σ 2 . In the following we denote with
nitudes obtained from simulated data. Crosses indicate calibration                                                                                               s
                                                                                                                                          (GP) the method of Fn that use (Hom) together with Gaussian pro-
targets and the circles the projection of the camera center.                                                                              cess modeling of ∆i .

To argue for the characteristics of ∆s it is without loss of general-                                                                     4 Assessment on Simulated Data
ity and for the sake of simplicity assumed that only four calibration
points, (t1 . . . t4 ), are used (crosses in figure 4). When estimat-
                            s                                                                                                             Head pose, head position, the offset between visual and optical
ing the homography, Hi , through user calibration, the errors in the
                                                                                                                                          axes, refraction, measurement noise, relative position of hardware
calibration targets, ∆s (ti) = 0, are minimized to zero and there
                                                                                                                                          and camera parameters are factors that mostly influence the accu-
will therefore be 5 points (calibration targets and the camera opti-
                                                                                                                                          racy of gaze estimation methods. We will in the following sec-
cal axis) where the ∆s is zero.
                                                                                                                                          tions evaluate the homography normalization methods ((Hom) and
One way of thinking of a homography is that it generates a linear                                                                         (GP)) to the cross-ratio methods ((Yoo)[Yoo and Chung 2005] and
                                            s                                                                                             (Cou)[Coutinho and Morimoto 2006]). These methods have been
vector field of displacements. ∆s = Hi ∆i is therefore a compo-
sition of two vector fields (∆s = Vh + ∆i ), a linear vector field                                                                          chosen since they operate under similar premises as homography
corresponding to the homography (Vh ) and an ellipsoidal vector                                                                           normalization (e.g. uncalibrated/semi-calibrated setup). Simulated
field ∆i . Since ∆s (ti ) = 0 then Vh (ti ) = −∆s (ti ). Vh (ti ) is                                                                       data is used in this section to be able to asses the effects of potential
consequently defined through the negative error vectors of ∆i (ti ).                                                                       noise-factors separately. The simulator [B¨ hme et al. 2008] allows
                                                                                                                                                                                         o
It is worth noting that as the camera location is unknown due to the                                                                      for detailed modeling of the different components of the setup and
uncalibrated setup assumption and the location of the maximum er-                                                                         eye specific parameters. The evaluation is divided according to the
ror depends on the location of the camera, it would be impossible                                                                         presence of head movements and the number of calibration targets
to determine the extremal location without additional information.                                                                        (N). Notice the methods, except (Yoo), allow for multiple on-screen
However, despite of this, it is be shown in the following sections                                                                        calibration targets. The effects of eye specific parameters such as
that it is possible through homography normalization to obtain re-                                                                        refraction and offset between the visual and optical axis as well
sults quite similar to fully calibrated setups.                                                                                           as the effect of the number of calibration targets and errors asso-
                                                                                                                                          ciated with the model assumptions are evaluated when the head is
3.2        Modeling Error Vectors                                                                                                         fixed (section 4.2). The methods are examined with respect to head
                                                                                                                                          movements in section 4.3. In some experiments the (GP) method
This section discusses one approach of modeling the error caused                                                                          has been left out since it is a derivative of (Hom) and would not alter
by the non-coplanarity of Πc and the pupil. Even though the loca-                                                                         the inherent properties of using homography normalization, it only
tion of the largest errors cannot be determined (a priori) due to the                                                                     makes a difference to the accuracy when the number of calibration
uncalibrated setup, it may be worthwhile to accommodate the er-                                                                           targets is larger than four (N > 4).
rors to the extent possible. That is to estimate a vector field similar
to figure 4. When the camera is placed outside the screen area, the                                                                        4.1   Setup
error due to the homography is zero in 5 points (e.g. the calibration
targets and the camera projection center) and non-zero elsewhere.                                                                         The camera is located slightly below and to the right of the cen-
                     s
After estimating Hi it is possible to measure the error due to the ho-                                                                    ter of the screen as to simulate a realistic setup (e.g. users do not
mography for each additional calibration target. Since the error vec-                                                                     place the components in an exact position). All tests have been
tor field is smooth, a simplified yet effective approach would be to                                                                        conducted with the same camera focal length. The cornea is mod-
model the error through polynomials in a similar way as previously                                                                        eled as a sphere with radius 7.98 mm. Four light sources are placed



                                                                                                                                     16
at the corners of a planar surface (screen) to be able compare ho-                                                                                                                                                    offset, γ ( with β = 0), has a significant effect on the accuracies of
mography and cross-ratio methods. In the following denote with                                                                                                                                                        the cross-ratio methods but not on homography normalization. The
N the number of calibration targets. γ and β correspond to the an-                                                                                                                                                    reason is that homography normalization models the optical visual
gular offsets between the visual and optical axes in horizontal and                                                                                                                                                   offset to a much higher degree.
vertical directions, respectively.

4.2                                            Stationary Head                                                                                                                                                                        Accuracy with variable optical/visual−axis offset
                                                                                                                                                                                                                                                       3.5
                                                                                                                                                                                                                                                                                                               Yoo
                                                                                                                                                                                                                                                                                                               Cou
Basic Settings and Refraction       In this section the methods are                                                                                                                                                                                     3                                                      Hom

evaluated as if the head is kept still while gazing at a uniformly




                                                                                                                                                                                                                               On−screen error (deg)
distributed set of 64 × 64 targets. Figure 5 shows the mean ac-                                                                                                                                                                                        2.5

curacy (degrees) with error-bars (variance) in the hypothetical eye
                                                                                                                                                                                                                                                        2
model, where there is no offset between visual and optical axes
E0 = {γ = β = 0} and a more realistic setting with eye model
                                                                                                                                                                                                                                                       1.5
E1 ={γ = 4.5, β = 1.5}. Each sub-figure shows the cases where
refraction is included and when it is not. E0 is a physically infea-                                                                                                                                                                                    1
sible setup since the optical and visual axis are different, but the
model avoids eye specific biases. It is clear from figure 5 that the                                                                                                                                                                                     0.5
methods exhibit similar accuracies in E0 , but the offset between vi-
sual and optical axes in E1 makes a notable difference between the                                                                                                                                                                                      0
                                                                                                                                                                                                                                                        −5   −3.9 −2.8 −1.7 −0.6   0.6   1.7   2.8   3.9   5
methods. Refraction has only a minor effect on the methods.                                                                                                                                                                                                             Offset (degrees)

                                                   Influence of refraction with eye model 0
                                                                                                                                                           Influence of refraction with eye model 1
                                                                                                                                                                                                                            Figure 7: Accuracy as a function of the angular offset.
                                   0.8
                                                                                         Refraction                                        3.5
                                                                                         No refraction                                                                                           Refraction
                                   0.7                                                                                                                                                           No refraction
                                                                                                                                               3
                                   0.6
           Error magnitude (deg)




                                                                                                                                           2.5
                                                                                                                   Error magnitude (deg)




                                   0.5                                                                                                                                                                                4.3   Head Movements
                                                                                                                                               2
                                   0.4
                                                                                                                                           1.5
                                   0.3

                                   0.2                                                                                                         1                                                                      Gaze trackers should ideally be head pose invariant. This section
                                   0.1                                                                                                     0.5
                                                                                                                                                                                                                      evaluates the methods in scenarios where the eye location changes
                                       0                                                                                                       0
                                                                                                                                                                                                                      in space (±300 mm in both x and y directions from the camera
                                                   Yoo                  Cou              Hom                                                               Yoo                Cou                Hom
                                                                      Method                                                                                                 Method                                   center) but the target location remains fixed on the screen.

Figure 5: Comparison of methods (with/without refraction) when
the head is kept still using eye model (left) E0 =(γ = β = 0) and                                                                                                                                                     Influence of N and γ        Figure 8 shows the accuracies of using
(right) eye model E1 =(γ = 4.5, β = 1.5) and N = 4 calibration                                                                                                                                                        a variable number of calibration targets and eye parameters in the
targets.                                                                                                                                                                                                              presence of head movements. The results show similarities to the
                                                                                                                                                                                                                      head still experiments by also revealing that the offset between the
                                                                                                                                                                                                                      optical and visual axes makes a significant difference to the cross-
Changing N       The previous test is based on a minimum number                                                                                                                                                       ratio methods, but not to the homography-based methods. The
of calibration targets. However, the methods may, besides (Yoo),                                                                                                                                                      number of calibration targets has only a minor effect on accuracy.
improve accuracy as the N uniformly distributed calibration targets                                                                                                                                                   Non-linear modeling improves accuracy and especially the differ-
increase. Figure 6 shows accuracy of the methods as a function of                                                                                                                                                     ence between 4 and 9 calibration targets makes a significant dif-
N for both eye models. (GP) exhibit a rapid increase of accuracy                                                                                                                                                      ference. When considering the nuisance of calibration and the ob-
when increasing N . Both (Hom) and (Cou) may be improved by                                                                                                                                                           tained accuracy, it is task dependent whether the rather small in-
increasing N , but large N implies a accuracy decrease for (Cou).                                                                                                                                                     crease in accuracy between 9 and 16 calibration targets is worth-
The accuracy for (Yoo) is as expected.                                                                                                                                                                                while.

                                   Varying the number of calibration targets eye model 0                                                   Varying the number of calibration targets eye model 1
                      0.8                                                                                                     3.5
                                                                                                Yoo
                                                                                                Cou
                                                                                                                                                                                                           Yoo
                                                                                                                                                                                                           Cou        Depth Translation       The methods analyzed here are all using
                      0.7                                                                       Hom                                                                                                        Hom
                                                                                                GP
                                                                                                                                           3
                                                                                                                                                                                                           GP         properties on projective planes. Movements in depth is therefore
                      0.6
                                                                                                                              2.5                                                                                     not an inherent property to the methods. The influence of head
 Accuracy (deg)




                                                                                                         Accuracy (deg)




                      0.5
                                                                                                                                           2                                                                          movements will therefore be examined by evaluating head move-
                      0.4
                                                                                                                              1.5
                                                                                                                                                                                                                      ments as translations parallel to the screen plane (or equivalently
                      0.3

                                                                                                                                           1
                                                                                                                                                                                                                      Πc ) as depicted in figure 9 and movements in depth (figure 10). A
                      0.2
                                                                                                                                                                                                                      single depth is used for calibration. The results show that none of
                                                                                                                              0.5
                      0.1
                                                                                                                                                                                                                      the methods are invariant to neither depth or in-plane translations,
                                   0
                                           4   9   16   25       36         49
                                                    Number of calibration targets
                                                                                    64
                                                                                                                                           0
                                                                                                                                                   4   9   16   25       36         49
                                                                                                                                                            Number of calibration targets
                                                                                                                                                                                            64                        but that the homography normalization-based methods have better
                                                                                                                                                                                                                      performance. For depth changes larger than 150 mm (see figure 10)
Figure 6: Changing the number of calibration targets, N , for E0                                                                                                                                                      the (GP) method does not perform as well as (Hom). The reason is
(left) E1 (right).                                                                                                                                                                                                    that the learned offsets in (GP) are only valid for a single scale.

                                                                                                                                                                                                                      The graphs in figure 10 show the accuracy as a function of depth
Offset between Visual and Optical Axes      There is a noticeable                                                                                                                                                     changes (from the calibration depth) when using different eye pa-
accuracy difference when using E0 and E1 in the previous experi-                                                                                                                                                      rameters (E0 and E1 ) and with a variable number of calibration
ments. Figure 7 shows that the influence of the angular horizontal                                                                                                                                                     targets, N .



                                                                                                                                                                                                                 17
Hansen Homography Normalization For Robust Gaze Estimation In Uncalibrated Setups
Hansen Homography Normalization For Robust Gaze Estimation In Uncalibrated Setups
Hansen Homography Normalization For Robust Gaze Estimation In Uncalibrated Setups

Mais conteúdo relacionado

Mais procurados

EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...
EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...
EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...IJNSA Journal
 
Virtual viewpoint three dimensional panorama
Virtual viewpoint three dimensional panoramaVirtual viewpoint three dimensional panorama
Virtual viewpoint three dimensional panoramaijcseit
 
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural TasksRyan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural TasksKalle
 
Portable Multispectral Fundus Camera
Portable Multispectral Fundus CameraPortable Multispectral Fundus Camera
Portable Multispectral Fundus CameraPetteriTeikariPhD
 
AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...
AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...
AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...cscpconf
 
An exploration of periocular region with reduced region for authentication re...
An exploration of periocular region with reduced region for authentication re...An exploration of periocular region with reduced region for authentication re...
An exploration of periocular region with reduced region for authentication re...csandit
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Time-resolved biomedical sensing through scattering medium
Time-resolved biomedical sensing through scattering mediumTime-resolved biomedical sensing through scattering medium
Time-resolved biomedical sensing through scattering mediumPetteriTeikariPhD
 
A maskless exposure device for rapid photolithographic prototyping of sensor ...
A maskless exposure device for rapid photolithographic prototyping of sensor ...A maskless exposure device for rapid photolithographic prototyping of sensor ...
A maskless exposure device for rapid photolithographic prototyping of sensor ...Dhanesh Rajan
 
Droege Pupil Center Detection In Low Resolution Images
Droege Pupil Center Detection In Low Resolution ImagesDroege Pupil Center Detection In Low Resolution Images
Droege Pupil Center Detection In Low Resolution ImagesKalle
 
New refractive optical MIOL concepts _ 
Comparison of different optical systems
New refractive optical MIOL concepts _ 
Comparison of different optical systemsNew refractive optical MIOL concepts _ 
Comparison of different optical systems
New refractive optical MIOL concepts _ 
Comparison of different optical systemsBreyer, Kaymak & Klabe Augenchirurgie
 

Mais procurados (16)

EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...
EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...
EFFECTIVENESS OF FEATURE DETECTION OPERATORS ON THE PERFORMANCE OF IRIS BIOME...
 
Dohyoung lee icassp2012_poster
Dohyoung lee icassp2012_posterDohyoung lee icassp2012_poster
Dohyoung lee icassp2012_poster
 
Advanced Retinal Imaging
Advanced Retinal ImagingAdvanced Retinal Imaging
Advanced Retinal Imaging
 
Virtual viewpoint three dimensional panorama
Virtual viewpoint three dimensional panoramaVirtual viewpoint three dimensional panorama
Virtual viewpoint three dimensional panorama
 
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural TasksRyan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
 
Portable Multispectral Fundus Camera
Portable Multispectral Fundus CameraPortable Multispectral Fundus Camera
Portable Multispectral Fundus Camera
 
AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...
AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...
AN EXPLORATION OF PERIOCULAR REGION WITH REDUCED REGION FOR AUTHENTICATION : ...
 
An exploration of periocular region with reduced region for authentication re...
An exploration of periocular region with reduced region for authentication re...An exploration of periocular region with reduced region for authentication re...
An exploration of periocular region with reduced region for authentication re...
 
P-iris
P-irisP-iris
P-iris
 
Defending
DefendingDefending
Defending
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Time-resolved biomedical sensing through scattering medium
Time-resolved biomedical sensing through scattering mediumTime-resolved biomedical sensing through scattering medium
Time-resolved biomedical sensing through scattering medium
 
A maskless exposure device for rapid photolithographic prototyping of sensor ...
A maskless exposure device for rapid photolithographic prototyping of sensor ...A maskless exposure device for rapid photolithographic prototyping of sensor ...
A maskless exposure device for rapid photolithographic prototyping of sensor ...
 
Droege Pupil Center Detection In Low Resolution Images
Droege Pupil Center Detection In Low Resolution ImagesDroege Pupil Center Detection In Low Resolution Images
Droege Pupil Center Detection In Low Resolution Images
 
Multibiometrics ver5
Multibiometrics ver5Multibiometrics ver5
Multibiometrics ver5
 
New refractive optical MIOL concepts _ 
Comparison of different optical systems
New refractive optical MIOL concepts _ 
Comparison of different optical systemsNew refractive optical MIOL concepts _ 
Comparison of different optical systems
New refractive optical MIOL concepts _ 
Comparison of different optical systems
 

Destaque

Stroke Victim
Stroke VictimStroke Victim
Stroke Victimodorox
 
Manual nokia e75 1
Manual nokia e75 1Manual nokia e75 1
Manual nokia e75 1rockorolas
 
Guía de matrícula 2011 2012 Universidad de Burgos
Guía de matrícula 2011 2012 Universidad de BurgosGuía de matrícula 2011 2012 Universidad de Burgos
Guía de matrícula 2011 2012 Universidad de BurgosOscar Herrera
 
Cala webinar on mbh august por 2012 - v02 pt-br-1
Cala webinar on mbh   august por 2012 - v02 pt-br-1Cala webinar on mbh   august por 2012 - v02 pt-br-1
Cala webinar on mbh august por 2012 - v02 pt-br-1Rafael Junquera
 
Manual alarma or bi-t6-userman-spn
Manual alarma or bi-t6-userman-spnManual alarma or bi-t6-userman-spn
Manual alarma or bi-t6-userman-spnmarlenebelgica2012
 
Futureofthequalitymanagementprinciples 1
Futureofthequalitymanagementprinciples 1Futureofthequalitymanagementprinciples 1
Futureofthequalitymanagementprinciples 1Paul Robere
 
Tecnología de la comunicación
Tecnología de la comunicaciónTecnología de la comunicación
Tecnología de la comunicaciónAna Caballero
 
Presentación Equipo Ciclista Cartucho.es 2016
Presentación Equipo Ciclista Cartucho.es 2016Presentación Equipo Ciclista Cartucho.es 2016
Presentación Equipo Ciclista Cartucho.es 2016EC Cartucho es
 
IBM Lotus Mobile Strategy
IBM Lotus Mobile StrategyIBM Lotus Mobile Strategy
IBM Lotus Mobile StrategyDvir Reznik
 
Mapa Visual Web 2.0
Mapa Visual Web 2.0Mapa Visual Web 2.0
Mapa Visual Web 2.0MarcusC
 
ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...
ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...
ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...pharmaindexing
 
Fluid Layouting Techniques - Over The Air 2009
Fluid Layouting Techniques - Over The Air 2009Fluid Layouting Techniques - Over The Air 2009
Fluid Layouting Techniques - Over The Air 2009Daniel Herzog
 
254 2014 15-international postgraduate admission_uou_2014
254 2014 15-international postgraduate admission_uou_2014254 2014 15-international postgraduate admission_uou_2014
254 2014 15-international postgraduate admission_uou_2014Dinh Quyet
 
Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...
Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...
Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...La French Team
 
Punteros y elementos dinámicos en c++
Punteros y elementos dinámicos en c++Punteros y elementos dinámicos en c++
Punteros y elementos dinámicos en c++Tensor
 
Product list of fancy ladies bags
Product list of fancy ladies bagsProduct list of fancy ladies bags
Product list of fancy ladies bagsArijit Chatterjee
 
Towards User-defined Cross-Device Interaction
Towards User-defined Cross-Device InteractionTowards User-defined Cross-Device Interaction
Towards User-defined Cross-Device InteractionAudrey Sanctorum
 

Destaque (20)

Stroke Victim
Stroke VictimStroke Victim
Stroke Victim
 
Manual nokia e75 1
Manual nokia e75 1Manual nokia e75 1
Manual nokia e75 1
 
Guía de matrícula 2011 2012 Universidad de Burgos
Guía de matrícula 2011 2012 Universidad de BurgosGuía de matrícula 2011 2012 Universidad de Burgos
Guía de matrícula 2011 2012 Universidad de Burgos
 
Cala webinar on mbh august por 2012 - v02 pt-br-1
Cala webinar on mbh   august por 2012 - v02 pt-br-1Cala webinar on mbh   august por 2012 - v02 pt-br-1
Cala webinar on mbh august por 2012 - v02 pt-br-1
 
Manual alarma or bi-t6-userman-spn
Manual alarma or bi-t6-userman-spnManual alarma or bi-t6-userman-spn
Manual alarma or bi-t6-userman-spn
 
Futureofthequalitymanagementprinciples 1
Futureofthequalitymanagementprinciples 1Futureofthequalitymanagementprinciples 1
Futureofthequalitymanagementprinciples 1
 
Tecnología de la comunicación
Tecnología de la comunicaciónTecnología de la comunicación
Tecnología de la comunicación
 
Presentación Equipo Ciclista Cartucho.es 2016
Presentación Equipo Ciclista Cartucho.es 2016Presentación Equipo Ciclista Cartucho.es 2016
Presentación Equipo Ciclista Cartucho.es 2016
 
IBM Lotus Mobile Strategy
IBM Lotus Mobile StrategyIBM Lotus Mobile Strategy
IBM Lotus Mobile Strategy
 
Mapa Visual Web 2.0
Mapa Visual Web 2.0Mapa Visual Web 2.0
Mapa Visual Web 2.0
 
ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...
ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...
ANTIUROLITHIATIC ACTIVITY OF ACTIVITY OF TECOMA STANS LEAF EXTRACT *1Anil kum...
 
Fluid Layouting Techniques - Over The Air 2009
Fluid Layouting Techniques - Over The Air 2009Fluid Layouting Techniques - Over The Air 2009
Fluid Layouting Techniques - Over The Air 2009
 
254 2014 15-international postgraduate admission_uou_2014
254 2014 15-international postgraduate admission_uou_2014254 2014 15-international postgraduate admission_uou_2014
254 2014 15-international postgraduate admission_uou_2014
 
Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...
Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...
Les friches culturelles d’hier à aujourd’hui : entre fabriques d’art et démar...
 
Presentación precios trabajo-final-6-a-mkt
Presentación precios trabajo-final-6-a-mktPresentación precios trabajo-final-6-a-mkt
Presentación precios trabajo-final-6-a-mkt
 
Punteros y elementos dinámicos en c++
Punteros y elementos dinámicos en c++Punteros y elementos dinámicos en c++
Punteros y elementos dinámicos en c++
 
Product list of fancy ladies bags
Product list of fancy ladies bagsProduct list of fancy ladies bags
Product list of fancy ladies bags
 
SGM Glesmann Bio2
SGM Glesmann Bio2SGM Glesmann Bio2
SGM Glesmann Bio2
 
Presentacion asLAN 2013 Jazztel Empresas_v1
Presentacion asLAN 2013 Jazztel Empresas_v1Presentacion asLAN 2013 Jazztel Empresas_v1
Presentacion asLAN 2013 Jazztel Empresas_v1
 
Towards User-defined Cross-Device Interaction
Towards User-defined Cross-Device InteractionTowards User-defined Cross-Device Interaction
Towards User-defined Cross-Device Interaction
 

Semelhante a Hansen Homography Normalization For Robust Gaze Estimation In Uncalibrated Setups

Skovsgaard.2011.evaluation of a remote webcam based eye tracker
Skovsgaard.2011.evaluation of a remote webcam based eye trackerSkovsgaard.2011.evaluation of a remote webcam based eye tracker
Skovsgaard.2011.evaluation of a remote webcam based eye trackermrgazer
 
Gait analysis report
Gait analysis reportGait analysis report
Gait analysis reportconoranthony
 
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...Kalle
 
ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...
ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...
ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...cscpconf
 
Robust Human Tracking Method Based on Apperance and Geometrical Features in N...
Robust Human Tracking Method Based on Apperance and Geometrical Features in N...Robust Human Tracking Method Based on Apperance and Geometrical Features in N...
Robust Human Tracking Method Based on Apperance and Geometrical Features in N...csandit
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)PetteriTeikariPhD
 
CVGIP 2010 Part 3
CVGIP 2010 Part 3CVGIP 2010 Part 3
CVGIP 2010 Part 3Cody Liu
 
Wireless Vision based Real time Object Tracking System Using Template Matching
Wireless Vision based Real time Object Tracking System Using Template MatchingWireless Vision based Real time Object Tracking System Using Template Matching
Wireless Vision based Real time Object Tracking System Using Template MatchingIDES Editor
 
Image Fusion of Video Images and Geo-localization for UAV Applications
Image Fusion of Video Images and Geo-localization for UAV ApplicationsImage Fusion of Video Images and Geo-localization for UAV Applications
Image Fusion of Video Images and Geo-localization for UAV ApplicationsIDES Editor
 
Coutinho A Depth Compensation Method For Cross Ratio Based Eye Tracking
Coutinho A Depth Compensation Method For Cross Ratio Based Eye TrackingCoutinho A Depth Compensation Method For Cross Ratio Based Eye Tracking
Coutinho A Depth Compensation Method For Cross Ratio Based Eye TrackingKalle
 
IRJET - A Review on Gradient Histograms for Texture Enhancement and Objec...
IRJET -  	  A Review on Gradient Histograms for Texture Enhancement and Objec...IRJET -  	  A Review on Gradient Histograms for Texture Enhancement and Objec...
IRJET - A Review on Gradient Histograms for Texture Enhancement and Objec...IRJET Journal
 
The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...
The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...
The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...CSCJournals
 
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robotIn tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robotSudhakar Spartan
 
Deep VO and SLAM IV
Deep VO and SLAM IVDeep VO and SLAM IV
Deep VO and SLAM IVYu Huang
 
Impact of quality based fusion techniques for video-based iris recognition at...
Impact of quality based fusion techniques for video-based iris recognition at...Impact of quality based fusion techniques for video-based iris recognition at...
Impact of quality based fusion techniques for video-based iris recognition at...I3E Technologies
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Jia-Bin Huang
 
A biologically inspired cmos image sensor
A biologically inspired cmos image sensorA biologically inspired cmos image sensor
A biologically inspired cmos image sensorSpringer
 
A biologically inspired cmos image sensor
A biologically inspired cmos image sensorA biologically inspired cmos image sensor
A biologically inspired cmos image sensorSpringer
 

Semelhante a Hansen Homography Normalization For Robust Gaze Estimation In Uncalibrated Setups (20)

Skovsgaard.2011.evaluation of a remote webcam based eye tracker
Skovsgaard.2011.evaluation of a remote webcam based eye trackerSkovsgaard.2011.evaluation of a remote webcam based eye tracker
Skovsgaard.2011.evaluation of a remote webcam based eye tracker
 
Gait analysis report
Gait analysis reportGait analysis report
Gait analysis report
 
X36141145
X36141145X36141145
X36141145
 
V01 i010405
V01 i010405V01 i010405
V01 i010405
 
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
 
ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...
ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...
ROBUST HUMAN TRACKING METHOD BASED ON APPEARANCE AND GEOMETRICAL FEATURES IN ...
 
Robust Human Tracking Method Based on Apperance and Geometrical Features in N...
Robust Human Tracking Method Based on Apperance and Geometrical Features in N...Robust Human Tracking Method Based on Apperance and Geometrical Features in N...
Robust Human Tracking Method Based on Apperance and Geometrical Features in N...
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)
 
CVGIP 2010 Part 3
CVGIP 2010 Part 3CVGIP 2010 Part 3
CVGIP 2010 Part 3
 
Wireless Vision based Real time Object Tracking System Using Template Matching
Wireless Vision based Real time Object Tracking System Using Template MatchingWireless Vision based Real time Object Tracking System Using Template Matching
Wireless Vision based Real time Object Tracking System Using Template Matching
 
Image Fusion of Video Images and Geo-localization for UAV Applications
Image Fusion of Video Images and Geo-localization for UAV ApplicationsImage Fusion of Video Images and Geo-localization for UAV Applications
Image Fusion of Video Images and Geo-localization for UAV Applications
 
Coutinho A Depth Compensation Method For Cross Ratio Based Eye Tracking
Coutinho A Depth Compensation Method For Cross Ratio Based Eye TrackingCoutinho A Depth Compensation Method For Cross Ratio Based Eye Tracking
Coutinho A Depth Compensation Method For Cross Ratio Based Eye Tracking
 
IRJET - A Review on Gradient Histograms for Texture Enhancement and Objec...
IRJET -  	  A Review on Gradient Histograms for Texture Enhancement and Objec...IRJET -  	  A Review on Gradient Histograms for Texture Enhancement and Objec...
IRJET - A Review on Gradient Histograms for Texture Enhancement and Objec...
 
The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...
The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...
The Biometric Algorithm based on Fusion of DWT Frequency Components of Enhanc...
 
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robotIn tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
 
Deep VO and SLAM IV
Deep VO and SLAM IVDeep VO and SLAM IV
Deep VO and SLAM IV
 
Impact of quality based fusion techniques for video-based iris recognition at...
Impact of quality based fusion techniques for video-based iris recognition at...Impact of quality based fusion techniques for video-based iris recognition at...
Impact of quality based fusion techniques for video-based iris recognition at...
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
 
A biologically inspired cmos image sensor
A biologically inspired cmos image sensorA biologically inspired cmos image sensor
A biologically inspired cmos image sensor
 
A biologically inspired cmos image sensor
A biologically inspired cmos image sensorA biologically inspired cmos image sensor
A biologically inspired cmos image sensor
 

Mais de Kalle

Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of HeatmapsBlignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of HeatmapsKalle
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Kalle
 
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...Kalle
 
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...Kalle
 
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze ControlUrbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze ControlKalle
 
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...Kalle
 
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic TrainingTien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic TrainingKalle
 
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...Kalle
 
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...Kalle
 
Skovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze AloneSkovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze AloneKalle
 
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze TrackerSan Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze TrackerKalle
 
Rosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem SolvingRosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem SolvingKalle
 
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual SearchQvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual SearchKalle
 
Prats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement StudyPrats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement StudyKalle
 
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...Kalle
 
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...Kalle
 
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...Kalle
 
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...Kalle
 
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...Kalle
 
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...Kalle
 

Mais de Kalle (20)

Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of HeatmapsBlignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
 
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
 
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
 
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze ControlUrbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
 
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
 
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic TrainingTien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
 
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
 
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
 
Skovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze AloneSkovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze Alone
 
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze TrackerSan Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
 
Rosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem SolvingRosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem Solving
 
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual SearchQvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
 
Prats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement StudyPrats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement Study
 
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
 
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
 
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
 
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
 
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
 
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
 

Hansen Homography Normalization For Robust Gaze Estimation In Uncalibrated Setups

  • 1. Homography Normalization for Robust Gaze Estimation in Uncalibrated Setups Dan Witzner Hansen∗ Javier San Agustin† Arantxa Villanueva‡ IT University, Copenhagen IT University, Copenhagen Public University of Navarra Abstract ubiquitous, and convenient for the general public. So far, it has not been possible to meet these constraints concurrently. Homography normalization is presented as a novel gaze estimation Many gaze models require a fully calibrated setup and detailed eye method for uncalibrated setups. The method applies when head models (a strong prior model) to be able to minimize user calibra- movements are present but without any requirements to camera cal- tion and maintain high accuracy. A major limitation of fully cal- ibration or geometric calibration. The method is geometrically and ibrated setups is that they require exact knowledge of the relative empirically demonstrated to be robust to head pose changes and positions of the camera, light sources and monitor. Geometric cal- despite being less constrained than cross-ratio methods, it consis- ibration is usually tedious and time consuming to perform and au- tently performs favorably by several degrees on both simulated data tomated techniques are sparse [Brolly and Mulligan 2004]. Slight and data from physical setups. The physical setups include the use unintentional movement of a system part or change in focal length of off-the-shelf web cameras with infrared light (night vision) and may result in a significant drop in accuracy when relying on a cali- standard cameras with and without infrared light. The benefits of brated setup. The accuracy is therefore difficult to maintain unless homography normalization and uncalibrated setups in general are the hardware is placed in a rigid setup. Such requirements add to also demonstrated through obtaining gaze estimates (in the visible the cost of the system. Gaze models may alternatively use mul- spectrum) using only the screen reflections on the cornea. tiple calibration points in order to be less dependent on prior as- sumptions (e.g. using polynomial approximations [Hansen and Ji Keywords: Eye tracking, Gaze estimation, Homography normal- 2010]). Models employing a weak prior model have not been able ization, Gaussian process, Uncalibrated setup, HCI to demonstrate head pose invariance to date. This paper will both geometrically and empirically demonstrate that 1 Introduction it is possible to obtain robust gaze estimation in the presence of head movements when using a weak prior model of the geometric Eye and gaze tracking have a long history but only recently have setup. The model relies on homography normalization and does gaze trackers become robust enough for use outside laboratories. not require any direct measurements of the relative position of the The precision of current gaze trackers is sufficient for many types screen, camera and light source, nor does it need camera calibra- of applications, but are we really satisfied with their current capa- tion. This means that it is possible to obtain a highly flexible eye bilities? tracker that can be made compact, mobile and suit individual needs. Both research and commercial gaze trackers have been driven by Besides, the method is very simple to implement. Homography nor- the urge to obtain high accuracy gaze position data while simpli- malization is shown to consistently provide higher accuracies than fying user calibration, often by reducing the number of points nec- cross-ratio-based methods on both simulated data (section 4) and essary for calibrating an individual user to the system. Both high data recorded from a physical setup (section 5). One reason for accuracy and few calibration points are desirable properties of a considering uncalibrated setups is to facilitate the general public gaze tracker, but they are not necessarily the only parameters which with affordable and flexible gaze trackers that are robust with regard should be optimized [Scott and Findlay 1993]. Price is obviously to head movements. In section 5.2 this is shown to be achievable an issue, but may be partially resolved with technological devel- through purely off-the-shelf components. It is additionally shown opments. Today even cheap web cameras are of sufficient quality possible to use screen reflections on the cornea as an alternative for reliable gaze tracking. In some situations, however, it would be to IR glints (section 5.3). Through this paper we intend to show convenient if light sources, cameras and monitors could be placed that flexible, mobile and low cost gaze trackers are indeed feasible according to particular needs rather than being constrained by man- without sacrificing significant accuracy. ufacturer specifications. Avoiding external light sources or allow- ing the user to change the zoom of the camera to suit their particular 2 Related Work needs would be desirable. Gaze models that support flexible setups eliminate the need for rigid frames that keep individual components The primary task of a gaze tracker is to determine gaze, where gaze in place and allow for more compact, lightweight, adaptable and may either be a gaze direction or the point of regard (PoR). Gaze perhaps cheap eye trackers. If the models employed in the gaze modeling consequently focuses on the relations between the image trackers only required a few calibration targets and could maintain data and gaze. A comprehensive review of eye and gaze models is accuracy while avoiding the need for light sources, then eye track- provided in Hansen & Ji [2010]. ing technology would take an important step towards being flexible, All gaze estimation methods need to determine a set of parame- ∗ e-mail: witzner@itu.dk ters through calibration. Some parameters may be estimated for † e-mail: javier@itu.dk each session by letting the user look at a set of predefined targets ‡ e-mail: avilla@unavarra.es on the screen, others need only be calculated once (e.g. human spe- cific parameters) and yet other parameters are estimated prior to use Copyright © 2010 by the Association for Computing Machinery, Inc. (e.g. camera parameters, geometric and physical parameters such Permission to make digital or hard copies of part or all of this work for personal or as angles and location between camera and monitor). A system classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the where the camera parameters and the geometry are a priori known first page. Copyrights for components of this work owned by others than ACM must be is termed fully calibrated [Hansen and Ji 2010]. honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. This paper focuses primarily on feature-based methods but alterna- Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail tive methods based on appearance also exist [Hansen and Ji 2010]. permissions@acm.org. ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00 13
  • 2. Feature-based methods explore the characteristics of the human eye screen positions). Coutinho and Morimoto [2006] extend the model to identify a set of distinctive and informative features around the of Yoo et al. [2005], by using the offset between visual and optical eyes that are less sensitive to variations in illumination and view- axes as an argument to learn a constant on-screen offset. They ad- point. Ensuring head pose invariance is a common problem often ditionally perform an elaborate evaluation of the consequences of solved through the use of external light sources and their reflections changing the calibration of the virtual calibration parameter (α). (glints) on the cornea. Besides the glints, the pupil is the most com- Based on this, they argue that a simpler model can be made by mon feature to use, since it is easy to extract in IR spectrum images. learning a single α value rather than four different values as orig- The image measurements (e.g. the pupil) however, are influenced inally proposed. Where calibration in [Yoo and Chung 2005] can by refraction [Guestrin and Eizenman 2006]. The limbus is less only be done by looking at the light sources in the screen corners, influenced by refraction, but since its boundary may be partially the method of [Coutinho and Morimoto 2006] may use multiple occluded, it may be more difficult to obtain reliable measurements. on-screen targets. Two types of feature-based gaze estimation approaches exist: the Since the cross-ratio is defined on projective planes and is invariant interpolation-based (regression-based) and the model-based (geo- to any projective transformation, scale changes will not influence metric) Using a single camera, the 2D regression methods model the cross-ratio. The method is therefore not directly applicable to the optical properties, geometry and the eye physiology indirectly depth translations. Coutinho and Morimoto [2006] show signifi- and may, therefore, be considered as approximate models which cant accuracy improvements compared to the original paper, pro- may not strictly guarantee head pose invariance. They are, how- vided the user does not change their distance to the camera and ever, simple to implement, do not require camera or geometric cal- monitor. The advantage of the method, compared to methods based ibration (a.k.a weak prior model) and may still provide good re- on calibrated setups, is that full hardware calibration is needless. sults under conditions of small head movements. More recent 2D The method only requires light source position data relative to the regression-based methods attempt to improve performance under screen. One limitation is that the light sources should be placed larger head movements through compensation, or by adding addi- right on the corners of the screen. In practice the method is highly tional cameras [Hansen and Ji 2010]. The 3D model-based meth- sensitive to the individual eye and formal analysis of the method is ods, on the other hand, directly compute the gaze direction from presented by Kang et al. [2008]. They identified two main sources the eye features based on a geometric model of the eye. Most 3D of errors: (1) the angular offset between visual and optical axes and model-based (or geometric) approaches rely on metric information (2) the offset between pupil and glint planes. Depending on the and thus require camera calibration and a global geometric model point configuration, the cross-ratio is also known for not being par- (external to the eye) of light sources, camera and monitor position ticularly robust to noise, since small changes in point positions can and orientation. Gaze direction is modeled either as the optical axis result in large variations in the cross-ratio. or the visual axis. The optical axis is the line connecting the pupil center, cornea center and the eyeball center. The line connecting 3 Homography Normalization for Gaze Esti- the fovea and the center of the cornea is the visual axis. The visual axis is presumably the true direction of gaze. The visual and optical mation axes intersect at the cornea center with subject dependent angular This section presents the fundamental model for a robust point of offsets. In a typical adult, the fovea is located about 4 − 5◦ horizon- regard estimation method in uncalibrated setups (a priori unknown tally and about 1.5◦ below the point of the optic axis and the retina geometry and camera parameters). The components of the model and may vary up to 3◦ vertically between subjects. Much of the the- are illustrated in figure 1. ory behind geometric models using fully calibrated setups, has been formalized by Guestrin and Eizenman [2006]. Their model covers L2 L1 a variable number of light sources and cameras, human specific pa- rameters, light source positions, refraction, and camera parameters but is limited by only applying to fully calibrated setups. Methods Cornea relying on fully calibrated setups are most common in commercial L3 l1 and research-based systems but are limited for public use unless l2 placed in a rigid setup. Any change (e.g. placing the camera dif- ferently or changing the zoom of the camera) requires a tedious l3 recalibration. Πc L4 l4 Pupil An alternative to the fully calibrated systems while allowing for head movements is to use projective invariants and multiple light Πs fc p sources [Yoo and Chung 2005; Coutinho and Morimoto 2006]. c C Contrary to the previous methods, Yoo et al. [2005] describe a method which is capable of determining the point of regard based Camera Πi solely on the availability of light source position information (e.g. Center no camera calibration or prior knowledge of rigid transformations between hardware units) by exploiting the cross-ratio of four points Figure 1: Geometric model of the human eye, light sources, screen, (light sources) in projective space. Yoo et al. [2005] use two cam- camera and projections (dashed line). The pupil is depicted as an eras and four IR light sources placed around the screen to project ellipse with center pc and the cornea as a hemisphere with center these corners on the corneal surface, but only one camera is needed C. The corneal-reflection plane, Πc , and its projection in the image for gaze estimation. When looking at the screen the pupil center are shown by quadrilaterals. Both Πc and the cornea focal point, should ideally be within the four glint area. A fifth IR light emitter fc , are displaced relative to each other and to the pupil center for is placed on-axis to produce bright pupil images and to be able to illustration purposes. account for non-linear displacements (modeled by four αi parame- ters) of the glints. The method of Yoo et al. [2005] was shown to be The cornea is approximately spherical and has a radius, Rc , about prone to large person specific errors [Coutinho and Morimoto 2006] 7.8mm. The cornea reflects light similarly to a convex mirror and and can only use the light sources for calibration (e.g. not on other has a focal point, fc , located halfway between the corneal surface 14
  • 3. and the center of corneal curvature (fc = Rc ≈ 3.9 mm). Re- 2 n n (normalized plane) spanned by four points g1 . . . g4 . Πn represents flections on the cornea consequently appear further away than the the (unknown) corneal-reflection plane given up to a homography. n n corneal surface (a.k.a virtual reflections). Let gj (j = 1..4) be the corners of the unit square and define Hi n n such that gj = Hi gj . Notice, using the screen corners to span the Denote the screen plane Πs and four (virtual) reflection on the normalized space would be equally viable. The basic idea is that the c c cornea (g1 . . . g4 ). The reflections may come from any point in 3D n pupil is mapped to the normalized space through Hi to normalize space, for example external light sources (Li ) or the corners of a the effects of head pose prior to any calibration or gaze estimation screen reflected on the cornea. The issue of screen projections will s procedure (Fn , in figure 2). The mapping of the reflections from be addressed in section 5.3. For the sake of simplicity and with- s s the image Πi to the screen Πs via Πn is therefore Hi = Hn ◦ Hi . n c c out loss of generality, the following description assumes (g1 . . . g4 ) s s That is, a homography Hn is a sufficient model for Fn when the come from point light sources. Provided the eye is stationary then pupil and Πc coincide. any location of a light source, Li , on li with same direction produce s the same point of reflection on the cornea. The light sources can Hi can be found through a user calibration consisting of a min- therefore and interchangeably be assumed located on e.g. the screen imum of 4 calibration targets, t1 . . . tN on the screen. Denote the plane Πs or at infinity as depicted in figure 1. Projected points at general principle of normalizing eye data (pupil center, pupil or lim- infinity lie in the focal plane of the convex mirror. With four light bus contours) with respect to the reflections by homography nor- s s source there will exist a plane Πc (in fact a family of planes related malization. The method of using Fn = Hn in connection with by homographies), spanned by the lines li . This plane is denoted homography normalization is referred to as (Hom). the corneal-reflection plane and is close to fc when Li at infin- ity. When considering the reflection laws (e.g. not a projection) the The cross-ratio method do not model the visual axis well [Kang corneal reflections may only be approximately planar. et al. 2008]. Homography normalization, on the other hand, does model the offset between the optical and visual axes to a much Without loss of generality suppose the light sources are located on higher degree. Points in normalized space are based on the pupil c c Πs . The quadrilateral of glints (g1 . . . g4 ) is consequently related center i.e. a model of the optical axis without the interference of i i to the corresponding quadrilateral (g1 . . . g4 ) in the image via a ho- head movements. However, as offsets between the optical and vi- i mography, Hc , from the cornea (Πc ) to the image (Πi ) [Hartley sual axes correspond to translations in normalized space, the visual s s and Zisserman 2004]. Similarly, the mapping from the cornea to and optical axis offset is modeled implicitly through Fn = Hn . s the screen is also given by a homography Hc . The homography s s c from the image to the screen Hi = Hc ◦ Hi via the Πc will 3.1 Model Error from Planarity Assumption therefore exist regardless of the location of the cornea, provided the geometric setup does not change. These arguments also apply The previous section describes a generalized approach for head to cross-ratio-based methods [Coutinho and Morimoto 2006; Yoo pose invariant PoR estimation under the assumption that the pupil and Chung 2005]. and Πc coincide. If the pupil had been located on Πc , it would be a head pose invariant gaze estimation method that models the The pupil center is located about 4.2 mm from the cornea center visual and optical axis offset. Euclidean information is not avail- but its location vary between subjects and over time for a particu- able in uncalibrated settings. Using metric information (e.g. be- lar subject [Guestrin and Eizenman 2006]. However, the pupil is tween the pupil and the Πc ) does therefore not apply in this setting. located approximately 0.3 mm (| Rc − 4.2|) from the corneal focal 2 This section provides an analysis of the model error and section point, fc , and thus also close to Πc . In the following suppose that 3.2 discusses an approach to accommodate the errors. Figure 3 il- Πc and the pupil coincide. The pupil may under these assumptions s lustrates two different gaze directions and the associated modeling be mapped through Hi from the image to the screen via the corneal error measured from the camera. reflections. Camera center Image space Normalized space Screen Pupil gi gn gn gi 2 1 pc n 2 Camera 1 optical axis n H i Fs n gi gi gn gn PoR Gaze direction 1 3 4 3 4 Gaze direction 2 pci Πc X X e1 Figure 2: (left) Reflection points (crosses) and the pupil (gray el- e2 lipse) are observed in the image and (middle) the pupil mapped to Pupil position 2 Pupil position 1 the normalized space using the four reflection points. (right) from the normalized space the pupil is mapped to the point of regard. Figure 3: Projected differences between pupil and the correspond- These basic observations are sufficient to describe the fundamen- ing point on Πc for two gaze directions. Πc is kept constant for tal and simple algorithm for PoR estimation in an uncalibrated set- clarity. ting. The method is illustrated in figure 2 and is based on locating i i and tracking four reflections (g1 . . . g4 ) (e.g. glints) and the pupil in the image. The pupil center, pc , will be used in the following When the user looks away from the camera (’gaze direction 1’) it is description. However, the presented method may alternatively use evident that the error in the image plane is related to the projected the limbus center or the pupil/limbus ellipse contours directly in the line segment (between the point on Πc and the actual location of mapping since homographies allow for mappings of points, lines the pupil), el , onto the image plane. A gaze vector directed to- and conics. wards the camera (’gaze direction 2’) yields a point and therefore no error. Hence equal angular offsets from the optical axis of the It is convenient, though not necessary, to define a virtual plane, Πn , camera generate offset vectors ∆c (i, j) with the same magnitude 15
  • 4. when viewed from the camera. The largest magnitude of errors oc- seen for single or dual glint systems [Morimoto and Mimica 2005]. cur when the gaze direction is perpendicular to the optical axis of One of the limitation when using polynomials is that any increase the camera. The magnitude field |∆c (i, j)| in camera coordinates of the order of the polynomial would require additional calibration consequently consists of elliptic iso-contours, centered around the targets in order to estimate the parameters of the polynomial. A cu- optical axis of the camera. However, it is the error, ∆s , in screen bic polynomial seem to be a good approximation for ∆i [Cerrolaza coordinates, that is of interest. The true point of regard in screen co- et al. 2008], however it would require at least 10 calibration targets. ordinates, ρ∗ = ρs + ∆s is a function of the estimated gaze ρs and s ˆ ˆ Different from the ’weight space’ approach of polynomials is the the error ∆s . That is ρ∗ = Hi (pc + ∆i ) = Hi pc + Hi ∆i , hence s s s s function view approach of Gaussian processes (GP). Gaussian pro- s errors on the screen ∆s = Hi ∆i are merely errors in the camera cess (GP) interpolation method is used to estimate ∆i by using a propagated to the screen through the homography. An example of squared exponential covariance function [Rasmussen and Williams the error vector field, ∆s , using a simulator and the corresponding 2006]: vector magnitudes is shown in Figure 4. 1 |xp − xq | cov(xp , xq ) = k1 ∗ exp(− 2 ) + k3 σ 2 Calibration Targets 2 k2 Vector field of PoR errors Magnitudes of PoR error vector field 16 14 where xp and xq are data points and ki are weights. GP’s have 12 several innate properties that make them highly suited for gaze es- 10 16 timation. Gaussian processes do not model weights directly and 8 12 14 thus there are no requirements on the minimum number of calibra- 6 0.015 10 tion targets needed to infer model parameters. Each additional cal- 4 0.01 6 8 ibration target provides additional information that will be used to 2 0.005 0 4 increase accuracy. Each estimate also comes with an error measure- ment which, via the covariance function, is related to the distance 2 0 5 0 10 0 0 5 10 15 15 from the input data to the calibration data. This information can Camera location potentially be used to regularize output data. The exponential co- variance function has been adopted since it is highly smooth (like ∆i ) and it makes it possible to account for noise directly in the co- Figure 4: (left) Error vector field and (right) corresponding mag- variance function through k3 σ 2 . In the following we denote with nitudes obtained from simulated data. Crosses indicate calibration s (GP) the method of Fn that use (Hom) together with Gaussian pro- targets and the circles the projection of the camera center. cess modeling of ∆i . To argue for the characteristics of ∆s it is without loss of general- 4 Assessment on Simulated Data ity and for the sake of simplicity assumed that only four calibration points, (t1 . . . t4 ), are used (crosses in figure 4). When estimat- s Head pose, head position, the offset between visual and optical ing the homography, Hi , through user calibration, the errors in the axes, refraction, measurement noise, relative position of hardware calibration targets, ∆s (ti) = 0, are minimized to zero and there and camera parameters are factors that mostly influence the accu- will therefore be 5 points (calibration targets and the camera opti- racy of gaze estimation methods. We will in the following sec- cal axis) where the ∆s is zero. tions evaluate the homography normalization methods ((Hom) and One way of thinking of a homography is that it generates a linear (GP)) to the cross-ratio methods ((Yoo)[Yoo and Chung 2005] and s (Cou)[Coutinho and Morimoto 2006]). These methods have been vector field of displacements. ∆s = Hi ∆i is therefore a compo- sition of two vector fields (∆s = Vh + ∆i ), a linear vector field chosen since they operate under similar premises as homography corresponding to the homography (Vh ) and an ellipsoidal vector normalization (e.g. uncalibrated/semi-calibrated setup). Simulated field ∆i . Since ∆s (ti ) = 0 then Vh (ti ) = −∆s (ti ). Vh (ti ) is data is used in this section to be able to asses the effects of potential consequently defined through the negative error vectors of ∆i (ti ). noise-factors separately. The simulator [B¨ hme et al. 2008] allows o It is worth noting that as the camera location is unknown due to the for detailed modeling of the different components of the setup and uncalibrated setup assumption and the location of the maximum er- eye specific parameters. The evaluation is divided according to the ror depends on the location of the camera, it would be impossible presence of head movements and the number of calibration targets to determine the extremal location without additional information. (N). Notice the methods, except (Yoo), allow for multiple on-screen However, despite of this, it is be shown in the following sections calibration targets. The effects of eye specific parameters such as that it is possible through homography normalization to obtain re- refraction and offset between the visual and optical axis as well sults quite similar to fully calibrated setups. as the effect of the number of calibration targets and errors asso- ciated with the model assumptions are evaluated when the head is 3.2 Modeling Error Vectors fixed (section 4.2). The methods are examined with respect to head movements in section 4.3. In some experiments the (GP) method This section discusses one approach of modeling the error caused has been left out since it is a derivative of (Hom) and would not alter by the non-coplanarity of Πc and the pupil. Even though the loca- the inherent properties of using homography normalization, it only tion of the largest errors cannot be determined (a priori) due to the makes a difference to the accuracy when the number of calibration uncalibrated setup, it may be worthwhile to accommodate the er- targets is larger than four (N > 4). rors to the extent possible. That is to estimate a vector field similar to figure 4. When the camera is placed outside the screen area, the 4.1 Setup error due to the homography is zero in 5 points (e.g. the calibration targets and the camera projection center) and non-zero elsewhere. The camera is located slightly below and to the right of the cen- s After estimating Hi it is possible to measure the error due to the ho- ter of the screen as to simulate a realistic setup (e.g. users do not mography for each additional calibration target. Since the error vec- place the components in an exact position). All tests have been tor field is smooth, a simplified yet effective approach would be to conducted with the same camera focal length. The cornea is mod- model the error through polynomials in a similar way as previously eled as a sphere with radius 7.98 mm. Four light sources are placed 16
  • 5. at the corners of a planar surface (screen) to be able compare ho- offset, γ ( with β = 0), has a significant effect on the accuracies of mography and cross-ratio methods. In the following denote with the cross-ratio methods but not on homography normalization. The N the number of calibration targets. γ and β correspond to the an- reason is that homography normalization models the optical visual gular offsets between the visual and optical axes in horizontal and offset to a much higher degree. vertical directions, respectively. 4.2 Stationary Head Accuracy with variable optical/visual−axis offset 3.5 Yoo Cou Basic Settings and Refraction In this section the methods are 3 Hom evaluated as if the head is kept still while gazing at a uniformly On−screen error (deg) distributed set of 64 × 64 targets. Figure 5 shows the mean ac- 2.5 curacy (degrees) with error-bars (variance) in the hypothetical eye 2 model, where there is no offset between visual and optical axes E0 = {γ = β = 0} and a more realistic setting with eye model 1.5 E1 ={γ = 4.5, β = 1.5}. Each sub-figure shows the cases where refraction is included and when it is not. E0 is a physically infea- 1 sible setup since the optical and visual axis are different, but the model avoids eye specific biases. It is clear from figure 5 that the 0.5 methods exhibit similar accuracies in E0 , but the offset between vi- sual and optical axes in E1 makes a notable difference between the 0 −5 −3.9 −2.8 −1.7 −0.6 0.6 1.7 2.8 3.9 5 methods. Refraction has only a minor effect on the methods. Offset (degrees) Influence of refraction with eye model 0 Influence of refraction with eye model 1 Figure 7: Accuracy as a function of the angular offset. 0.8 Refraction 3.5 No refraction Refraction 0.7 No refraction 3 0.6 Error magnitude (deg) 2.5 Error magnitude (deg) 0.5 4.3 Head Movements 2 0.4 1.5 0.3 0.2 1 Gaze trackers should ideally be head pose invariant. This section 0.1 0.5 evaluates the methods in scenarios where the eye location changes 0 0 in space (±300 mm in both x and y directions from the camera Yoo Cou Hom Yoo Cou Hom Method Method center) but the target location remains fixed on the screen. Figure 5: Comparison of methods (with/without refraction) when the head is kept still using eye model (left) E0 =(γ = β = 0) and Influence of N and γ Figure 8 shows the accuracies of using (right) eye model E1 =(γ = 4.5, β = 1.5) and N = 4 calibration a variable number of calibration targets and eye parameters in the targets. presence of head movements. The results show similarities to the head still experiments by also revealing that the offset between the optical and visual axes makes a significant difference to the cross- Changing N The previous test is based on a minimum number ratio methods, but not to the homography-based methods. The of calibration targets. However, the methods may, besides (Yoo), number of calibration targets has only a minor effect on accuracy. improve accuracy as the N uniformly distributed calibration targets Non-linear modeling improves accuracy and especially the differ- increase. Figure 6 shows accuracy of the methods as a function of ence between 4 and 9 calibration targets makes a significant dif- N for both eye models. (GP) exhibit a rapid increase of accuracy ference. When considering the nuisance of calibration and the ob- when increasing N . Both (Hom) and (Cou) may be improved by tained accuracy, it is task dependent whether the rather small in- increasing N , but large N implies a accuracy decrease for (Cou). crease in accuracy between 9 and 16 calibration targets is worth- The accuracy for (Yoo) is as expected. while. Varying the number of calibration targets eye model 0 Varying the number of calibration targets eye model 1 0.8 3.5 Yoo Cou Yoo Cou Depth Translation The methods analyzed here are all using 0.7 Hom Hom GP 3 GP properties on projective planes. Movements in depth is therefore 0.6 2.5 not an inherent property to the methods. The influence of head Accuracy (deg) Accuracy (deg) 0.5 2 movements will therefore be examined by evaluating head move- 0.4 1.5 ments as translations parallel to the screen plane (or equivalently 0.3 1 Πc ) as depicted in figure 9 and movements in depth (figure 10). A 0.2 single depth is used for calibration. The results show that none of 0.5 0.1 the methods are invariant to neither depth or in-plane translations, 0 4 9 16 25 36 49 Number of calibration targets 64 0 4 9 16 25 36 49 Number of calibration targets 64 but that the homography normalization-based methods have better performance. For depth changes larger than 150 mm (see figure 10) Figure 6: Changing the number of calibration targets, N , for E0 the (GP) method does not perform as well as (Hom). The reason is (left) E1 (right). that the learned offsets in (GP) are only valid for a single scale. The graphs in figure 10 show the accuracy as a function of depth Offset between Visual and Optical Axes There is a noticeable changes (from the calibration depth) when using different eye pa- accuracy difference when using E0 and E1 in the previous experi- rameters (E0 and E1 ) and with a variable number of calibration ments. Figure 7 shows that the influence of the angular horizontal targets, N . 17