appearances, and the cluttered (uncontrolled) backgrounds. Most research on vision-based pedestrian recognition has taken a learning-based approach, bypassing a pose recovery step altogether and describing human appearance in terms of simple low-level features from a region of interest (ROI). One line of research has dealt specifically with scenes involving people walking laterally to the viewing direction, with recognition by either using the periodicity cue [2,3] or learning the characteristic lateral gait pattern [4].
A crucial factor determining the success of learning methods is the availability of a good foreground region. Unlike with applications such as surveillance, where the camera is stationary, standard background subtraction techniques are of little avail here because of the moving camera. Independent motion detection techniques can help [3], but they are difficult to develop. Yet, given a correct initial foreground, we can shift some of the burden to tracking [4–9].
A complementary problem is to recognize pedestrians in single images; this is particularly relevant for pedestrians standing still. One general approach involves shifting windows of various sizes over the image, extracting low-level texture features, and using standard pattern classification techniques to determine a pedestrian’s presence. For example, Constantine Papageorgiou and Tomaso Poggio combine wavelet features with a support vector machine classifier [10]. More recently, Anuj Mohan and his colleagues have extended this research to involve a component-based approach [11].
However, this approach’s performance-speed trade-off is currently unfavorable for use in vehicles. The Chamfer System addresses this through two-step object recognition [12]. The first step applies hierarchical template matching using contour features to efficiently lock onto candidate solutions. Matching is based on correlation with distance-transformed images. By capturing the object’s shape variability through a template hierarchy and by using a combined coarse-to-fine approach in shape and parameter space, this step achieves large speedups compared to an equivalent brute-force method. The second step reverts to texture-based pattern classification of the candidate solutions that the first step provided.
Another powerful technique to establish ROIs is stereo vision. Uwe Franke and his colleagues combine stereo vision with texture-based pattern classification. I describe two other stereo vision-based approaches later.
Lately, interest has been increasing in video sensors that operate outside the visible spectrum. Having long been used exclusively in the military domain, infrared sensors are finding their way into civilian applications owing to the advent of cheaper, uncooled cameras. The principle of detecting pedestrians by the heat their bodies emit is appealing (Takayuki Tsuji and his colleagues provide one example [13]). Yet pedestrians are not the only heat sources in a traffic environment; vehicles generate heat too. Even the pavement can appear hotter on a summer day than a pedestrian’s body. So, rather than offering the solution for pedestrian detection per se, infrared sensors provide a means to simplify the segmentation problem. Pattern recognition techniques are still necessary.

Figure 1. A typical dangerous situation: a child suddenly steps into a street.

Active-sensor approaches
Video sensors do not directly provide depth information; stereo vision derives depth by establishing feature correspondence and performing triangulation. On the other hand, active sensors measure distances directly.

Radar
Some commercial vehicles already employ radar for adaptive cruise control (for example, the Distronic System on Mercedes-Benz S-Class cars). For near-distance applications, such as pedestrian detection, ongoing investigations focus on 24-GHz radar technology [14]. Radar-based systems can enhance object localization by placing multiple sensors on the vehicle’s relevant parts and applying triangulation-based techniques. They can classify objects (that is, distinguish pedestrians from other objects such as cars and trees) by examining the power spectral-density plot of the reflected signals. In this context, we consider an object’s spectral content and reflectivity. Objects with smaller spatial extents, such as pedestrians, have narrower peaks in the plot than, say, cars. The material properties of the object’s surface determine the strength of reflected radar signals. Vehicles’ metallic parts reflect much better than human tissue, by at least an order of magnitude. Human tissue, in turn, reflects much better than nonconductive materials, such as the wood in trees.

Laser range finders
The main appeal of eye-safe laser range finders lies in their fast, precise depth measurement and their large field of view. For example, Martin Kunert, Ulrich Lages, and I describe a laser range finder that has a depth accuracy of ±5 cm and a range of 40 m for objects with at least 5 percent reflectivity (this includes most, if not all, relevant targets) [14]. Furthermore, its horizontal scans cover a 180-degree field of view in increments of 0.5 degree at 20 Hz, making the sensor especially suitable for covering the area just in front of the vehicle.
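
To make the laser range finder's scan geometry concrete, the following is a minimal sketch, not the sensor's actual interface. It uses only the figures quoted above (180-degree field of view, 0.5-degree increments, 40 m range). The function names, the inclusion of both end beams, and the coordinate frame (x forward, y to the left) are assumptions made for illustration.

```python
import math

FOV_DEG = 180.0      # horizontal field of view (from the text)
STEP_DEG = 0.5       # angular increment between beams (from the text)
MAX_RANGE_M = 40.0   # quoted range for targets with >= 5 percent reflectivity


def beam_angles_deg():
    """Beam angles from -90 to +90 degrees; assumes both end beams are fired."""
    n_beams = int(round(FOV_DEG / STEP_DEG)) + 1
    return [-FOV_DEG / 2 + i * STEP_DEG for i in range(n_beams)]


def scan_to_points(ranges_m):
    """Convert one scan (one range per beam, None = no return) to (x, y) points.

    Assumed frame: x points forward from the sensor, y points to the left.
    Returns beyond the quoted maximum range are discarded.
    """
    angles = beam_angles_deg()
    if len(ranges_m) != len(angles):
        raise ValueError("expected one range reading per beam")
    points = []
    for angle_deg, r in zip(angles, ranges_m):
        if r is None or r > MAX_RANGE_M:
            continue
        a = math.radians(angle_deg)
        points.append((r * math.cos(a), r * math.sin(a)))
    return points


def lateral_spacing_m(range_m):
    """Approximate gap between adjacent beams at a given range (arc length)."""
    return range_m * math.radians(STEP_DEG)


if __name__ == "__main__":
    print(len(beam_angles_deg()))              # 361
    print(round(lateral_spacing_m(40.0), 2))   # 0.35
```

Under these assumptions, adjacent beams are about 0.35 m apart at the 40 m limit, so a narrow target at maximum range may be hit by only one or two beams per scan, while the sampling is much denser close to the vehicle.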
78 computer.org/intelligent IEEE INTELLIGENT SYSTEMS
Current systems
At least three pedestrian recognition systems have been integrated on demonstration vehicles. Those I describe here are video-based and employ a two-step detection–verification framework for efficient pedestrian recognition; stereo vision provides the ROI.

Figure 2. DaimlerChrysler’s Urban Traffic Assistant demonstrator.

At Carnegie Mellon University’s NavLab, Liang Zhao and Charles Thorpe developed a system that combines stereo vision with neural-network pattern classification [15]. It obtains the texture features for classification by applying a high-pass filter to the ROI and normalizing for size. The system, running at 3 to 12 Hz, aims to assist bus drivers in urban traffic. The researchers plan to expand it to cover the sides of the bus
and, eventually, to provide full 360-degree coverage.
The University of Pavia system, implemented in the ARGO experimental autonomous vehicle, combines stereo vision with template matching for detecting pedestrian head and shoulder shapes [16]. The system searches for vertical symmetry to verify candidate regions. The authors report good detection results in the range of 10 to 40 meters.
At DaimlerChrysler, we have been working on pedestrian recognition as part of our multiyear effort to extend driver assistance beyond the highway scenario into the complex urban environment [4,12,17,18]. Of particular interest is the Intelligent Stop&Go system on our Urban Traffic Assistant demonstrator (see Figure 2). Intelligent Stop&Go lets the UTA autonomously follow a lead vehicle, while being aware of relevant elements of the traffic infrastructure (for example, road lanes, traffic signs, and traffic lights) and other traffic participants.
Our most recent pedestrian detection system consists of stereo vision-based obstacle detection and fine localization within the stereo ROI using the Chamfer System (see Figure 3) [12]. The system tracks detected objects over time and aggregates single-frame results. At the same time, a time-delay neural network with local receptive fields [19] constantly evaluates successive ROIs, searching for the characteristic temporal patterns of (lateral) human gait. Visit www.gavrila.net/Computer_Vision/computer_vision.html for a few video clips.
Other systems will soon join these three. The EU has recently begun a major initiative for pedestrian protection under the Fifth Framework project Protector [14,20]. The project brings together major vehicle manufacturers, sensor suppliers, and research institutions to develop intelligent systems on vehicles for reducing accidents involving pedestrians, bicyclists, and other unprotected traffic participants. Among the completed tasks are the analysis of accident statistics and the definition of relevant traffic scenarios. The project is investigating three sensor technologies: radar, laser range finder, and video, which we will implement on two passenger cars (Fiat and DaimlerChrysler) and one truck (MAN). Sometime in 2002 we will evaluate the final systems on a test track under standardized and realistic conditions (that is, using dummies). User interface and user acceptance studies will conclude this project.

The road ahead
A pedestrian safety system’s success or failure, from a technical viewpoint, will depend largely on the rate of correct detections versus false alarms that it produces, at a certain processing rate and on a particular processor platform. But what rate will we need for actual deployment of a sensor-based pedestrian system? This question is difficult to answer because the desired rate will depend on the final system concept. If, for example, the system concept involves only a warning function, performance criteria will likely be less stringent than for a concept that involves active vehicle control.
Perhaps we can more easily establish where we currently stand regarding performance. Consider a (fictional) video-based pedestrian detection system that involves a succession of three components: stereo-based obstacle detection, template-based shape matching, and texture-based pattern classification. Assume that each component’s performance is independent of that of the others.
We conservatively estimate that, to detect every pedestrian in urban traffic, the stereo component produces one pedestrian ROI every 10 seconds. (In lieu of hard experimental data, we use a value derived from our experience.) We assume that the stereo component accomplishes this by employing simple heuristics regarding the sizes and locations of the rectangular regions it detects as obstacles. Because we cannot expect the pedestrian ROI to exactly outline the pedestrian, we assume that we need 10 probes to extract the pedestrian correctly. For the shape-based and texture-based components, we estimate a detection rate of 95 percent at a false-positive rate on the order of 10^-3 and 10^-1 per candidate region, respectively [10,12,15]. All in all, we arrive, in this best-case scenario, at a false-positive rate of 1 per 10^4 seconds or 1 per 2.8 hours, for a detection rate of 90 percent. (One ROI every 10 seconds times 10 probes gives one candidate region per second; multiplying by 10^-3 and 10^-1 gives 10^-4 false positives per second, and 0.95 × 0.95 ≈ 0.90.)
Integrating the results over time by tracking will improve this figure somewhat. However, this improvement will be offset by the lower filter ratios of the shape and texture components, which, in practice, are not independent. On the basis of this, we can fairly say that we’ll need to reduce the false-positive rate by at least one order of magnitude to obtain a viable pedestrian system, while maintaining the same detection rate.
Fortunately, several ways exist to significantly reduce the false-positive rate. Improved multicue video algorithms (combin-
NOVEMBER/DECEMBER 2001 computer.org/intelligent 79
ing distance, shape, texture, and motion cues) could successively decimate the false alarm rate, as the description of our fictional system illustrates. Sensor fusion (for example, combining video and laser range finder approaches) will probably also produce large benefits. Finally, telematics concepts, involving communication between pedestrians and vehicles combined with GPS-based localization, could close any remaining performance gap. Although we can’t realistically expect people to buy special-purpose pedestrian protection devices, pedestrian safety systems could piggyback on the pervasiveness of the future communication infrastructure (for example, the UMTS [Universal Mobile Telecommunications System] and Bluetooth).

Figure 3. Pedestrian detection results (shown in white) from the Chamfer System. Besides showing correct detections, the figure illustrates typical shortcomings, such as false detections in heavily textured image areas (for example, the left image in the bottom row) or missing detections in areas of low contrast, occlusion, or both (for example, the right image in the bottom row).

Challenges remain even after we solve the pedestrian detection problem. After all, we’ll need to assess the danger of a particular traffic situation. This assessment will consider the pedestrians’ and vehicles’ position and speed. But with a larger look ahead, beyond the precrash range, prediction quickly becomes unreliable; pedestrians can easily change direction. Furthermore, accurate risk assessment will increasingly require good scene understanding. For example, the danger associated with a pedestrian heading toward the street will depend largely on the placement of the road boundaries, whether a traffic light exists, and, if so, whether it is green. This suggests that, in the long run, a reliable, anticipatory pedestrian system must be aware of several types of infrastructural elements, through either perception or telematics approaches. We might reduce at least some complexity by limiting a pedestrian protection system’s scope to cover only specific traffic scenarios; this will represent a good intermediate solution.
Difficult technical challenges lie ahead, but this domain’s progress over the past few years warrants optimism. Considering the potential for saving lives and increasing safety, the goal certainly appears worthwhile.

References
1. D.M. Gavrila, “The Visual Analysis of Human Movement: A Survey,” Computer Vision and Image Understanding, vol. 73, no. 1, Jan. 1999, pp. 82–98.
2. R. Cutler and L. Davis, “Real-Time Periodic Motion Detection, Analysis and Applications,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 326–331.
3. R. Polana and R. Nelson, “Low Level Recognition of Human Motion,” Proc. IEEE Workshop Motion of Non-rigid and Articulated Objects, IEEE CS Press, Los Alamitos, Calif., 1994, pp. 77–82.
4. B. Heisele and C. Wöhler, “Motion-Based Recognition of Pedestrians,” Proc. 14th Int’l Conf. Pattern Recognition, IEEE CS Press, Los Alamitos, Calif., 1998, pp. 1325–1330.
5. A. Baumberg and D. Hogg, “Learning Flexible Models from Image Sequences,” Proc. European Conf. Computer Vision, Lecture Notes in Computer Science, vol. 800, Springer-Verlag, Heidelberg, 1994, pp. 299–308.
6. T. Cootes et al., “Active Shape Models: Their Training and Applications,” Computer Vision and Image Understanding, vol. 61, no. 1, Jan. 1995, pp. 38–59.
7. C. Curio et al., “Walking Pedestrian Recognition,” IEEE Trans. Intelligent Transportation Systems, vol. 1, no. 3, Nov. 2000, pp. 155–163.
8. V. Philomin, R. Duraiswami, and L. Davis, “Quasi-random Sampling for Condensation,” Proc. European Conf. Computer Vision, vol. 2, Lecture Notes in Computer Science, vol. 1843, Springer-Verlag, Heidelberg, Germany, 2000, pp. 134–149.
9. G. Rigoll, B. Winterstein, and S. Müller, “Robust Person Tracking in Real Scenarios with Non-stationary Background Using a Statistical Computer Vision Approach,” Proc. 2nd IEEE Int’l Workshop Visual Surveillance, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 41–47.
10. C. Papageorgiou and T. Poggio, “A Trainable System for Object Detection,” Int’l J. Computer Vision, vol. 38, no. 1, June 2000, pp. 15–33.
11. A. Mohan, C. Papageorgiou, and T. Poggio, “Example-Based Object Detection in Images by Components,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 4, Apr. 2001, pp. 349–361.
12. D.M. Gavrila, “Pedestrian Detection from a Moving Vehicle,” Proc. European Conf. Computer Vision, vol. 2, Lecture Notes in Computer Science, vol. 1843, Springer-Verlag, Heidelberg, Germany, 2000, pp. 37–49.
13. T. Tsuji et al., “Development of Night Vision System,” Proc. IEEE Int’l Conf. Intelligent Vehicles, IEEE Press, Piscataway, N.J., 2001, pp. 133–140.
14. D.M. Gavrila, M. Kunert, and U. Lages, “A Multi-sensor Approach for the Protection of Vulnerable Traffic Participants: The PROTECTOR Project,” Proc. IEEE Instrumentation and Measurement Technology Conf., vol. 3, IEEE Press, Piscataway, N.J., 2001, pp. 2044–2048.
15. L. Zhao and C. Thorpe, “Stereo- and Neural Network-Based Pedestrian Detection,” IEEE Trans. Intelligent Transportation Systems, vol. 1, no. 3, Nov. 2000, pp. 148–154.
16. A. Broggi et al., “Shape-Based Pedestrian Detection,” Proc. IEEE Intelligent Vehicles Symp., IEEE Press, Piscataway, N.J., 2000, pp. 215–220.
17. U. Franke et al., “From Door to Door: Principles and Applications of Computer Vision for Driver Assistant Systems,” Intelligent Vehicle Technologies, L. Vlacic, F. Harashima, and M. Parent, eds., Butterworth Heinemann, Oxford, UK, 2001, pp. 131–188.
18. U. Franke et al., “Autonomous Driving Goes Downtown,” IEEE Intelligent Systems, vol. 13, no. 6, Nov./Dec. 1998, pp. 40–48.
19. C. Wöhler and J. Anlauf, “An Adaptable Time-Delay Neural-Network Algorithm for Image Sequence Analysis,” IEEE Trans. Neural Networks, vol. 10, no. 6, Nov. 1999, pp. 1531–1536.
20. P. Carrea and G. Sala, “Short Range Area Monitoring for Pre-crash and Pedestrian Protection: The Chameleon and Protector Projects,” Proc. 9th Aachener Colloquium Automobile and Engine Technology, Institut für Kraftfahrwesen Aachen (Aachen Inst. for Automotive Eng.) and Verbrennungs Kraftmaschinen Aachen (Aachen Inst. for Internal Combustion Engines), Aachen, Germany, 2000, pp. 629–639.

Dariu M. Gavrila is a research scientist with DaimlerChrysler Research’s Image Understanding Group in Ulm, Germany. His research interests include vision systems for detecting human presence and activity, with applications in surveillance, virtual reality, and intelligent human–machine interfaces. He works on real-time vision systems for driver assistance and intelligent cruise control. He is currently responsible for the European Union’s Protector project for pedestrian protection. He received his MS in computer science cum laude from the Free University in Amsterdam and his PhD in computer science from the University of Maryland at College Park. Contact him at Image Understanding Systems, DaimlerChrysler Research, Ulm 89081, Germany; dariu.gavrila@daimlerchrysler.com; www.gavrila.net.