A REAL TIME 3D STATIC HAND GESTURE
RECOGNITION SYSTEM USING HCI FOR
RECOGNITION OF NUMBERS

First Author#, Second Author*, Third Author#

# First-Third Department, First-Third University
Address
1 first.author@first-third.edu
3 third.author@first-third.edu

* Second Company
Address Including Country Name
2 second.author@second.com
Abstract— In this paper, we introduce a static gesture recognition system to recognize numbers from 0 to 9. The system uses a single camera, without any marker or glove on the hand. This work proposes an easy-to-use and inexpensive approach to recognizing single-handed static gestures accurately. The system helps millions of deaf people communicate with hearing people. It describes a hand gesture recognition system (HGRS) which recognizes hand gestures in a vision-based setup that includes capturing an image using a webcam. The template matching algorithm is used for hand gesture recognition. The system is divided into the following stages: image capturing, image pre-processing, region extraction, feature extraction and matching, and gesture recognition. The image is first captured in RGB format. The image pre-processing module transforms the raw image into the desired feature vector, which mainly involves converting the colour image into an HSV image and reducing noise. The region extraction module extracts the skin region from the whole image and eliminates the forearm region, giving the region of interest. The feature extraction module extracts a set of distinct parameters to represent each gesture and distinguish between the different gestures. Finally, the features are matched and the corresponding gesture is recognized. 100 images for each hand gesture representing the different numbers are used to train the system, which is then tested on a different set of images. Images for the training set are taken keeping the hand at a distance of 15 inches from a 10-megapixel camera.

Keywords— Region Extraction, Feature Extraction, Gesture Recognition, Single Handed Gestures.

I. INTRODUCTION
Hand gesture recognition has various applications such as computer games, machinery control and thorough mouse replacement. One of the most structured sets of hand gestures belongs to sign language. The proposed static hand gesture recognition system makes use of Human Computer Interaction (HCI) and computer vision. Human-computer interaction (HCI) involves the study, planning, and design of the interaction between people (users) and computers. A basic goal of HCI is to improve the interactions between users and computers by making computers more usable and receptive to the user's needs.
Computer vision is a field that includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information. Computer vision is also described as the enterprise of automating and integrating a wide range of processes and representations for vision perception.
Nowadays, the majority of human-computer interaction is based on devices such as the keyboard and mouse. Physically challenged people may have difficulties with such input devices and may require a new means of entering commands or data into the computer. Gesture, speech and touch inputs are a few possible means of addressing such users' needs. Using computer vision, a computer can recognize and perform the user's gesture command, thus alleviating the need for a keyboard.
Sign language is the most natural and expressive way of communication for the hearing impaired. The Indian Sign Language was proposed by the Government of India so that there is a uniform sign language that can be used by all the deaf and mute people in the country. Automatic sign language recognition offers enhanced communication capabilities for the speech and hearing impaired. It promises improved social opportunities and integration into society for these people [1].
J. Pansare [2] proposed a method in which the minimum Euclidean distance determines the best-matching gesture from the training dataset. This method gave high accuracy at a comparatively low cost. Panwar [3] presented a real-time system for hand gesture recognition based on the detection of meaningful shape-based features such as orientation, centre of mass, status of fingers, thumb in terms of raised or folded fingers, and their respective locations in the image. This approach is simple, easy to implement and does not require a significant amount of training or post-processing, providing a high recognition rate with minimal computation time.
Dardas and Georganas proposed a system which included detecting and tracking bare hands in a cluttered background using skin detection and hand posture contour comparison after face subtraction, recognizing hand gestures via a bag-of-features model and multiclass support vector machines, and building a grammar that generates gesture commands to control an application [4].
F. Ullah [5] presented a hand gesture recognition system that uses an evolutionary programming technique called Cartesian Genetic Programming (CGP), which is faster than conventional Genetic Programming.
A hand gesture based human-computer interaction system is proposed in [6] which uses a robust method to detect and recognize single-stroke gestures traced with fingertips; the fingertips are tracked in the air by the camera using the 'Camshift' tracker and the strokes are then translated into actions.
Fernando [7] presented a low-cost approach to developing a computer vision based sign language recognition application in a real-time context with motion recognition.
In [8] a vision based human-computer interface system was proposed that can interpret a user's gestures in real time to manipulate games and windows using a 3D depth camera, which is more robust than methods using a general camera.
A new sub-gesture modelling approach was proposed in [9] which represents each gesture as a sequence of fixed sub-gestures and performs gesture spotting, where the gesture boundaries are identified. It outperforms state-of-the-art Hidden Conditional Random Fields (HCRF) based methods and baseline gesture spotting techniques.
In [10] a hand gesture recognition system using a stereo camera was implemented in real time. It performed hand detection using a depth map, detected the region of interest (ROI) using a convex hull, calculated the depth of the object in the ROI to obtain more accurate hand images, and used a blob labelling method to obtain a clean hand image. Finally, it used Zhang and Suen's thinning algorithm to obtain the feature points used to recognize the gestures.
Zhang and Yun [11] use skin colour segmentation and a distance distribution feature to realize gesture recognition, adding a colour marker to remove independent regions. This method is robust and can detect and recognize hand gestures efficiently under varying illumination conditions, hand distances and hand angles.
Choras [12] proposed a method for the recognition of hand gestures using geometrical and Radon Transform (RT) features. The hand gesture recognition is realized based on the gesture blob and texture parameters extracted from the blocks, the RT image and invariant moments, giving a detection rate of 94%.
In [13] a hand gesture recognition system was proposed to translate hand gestures into Urdu alphabets using colour segmentation and a comprehensive classification scheme.
Liu, Gan and Sun [14] proposed an algorithm based on Hu moments and Support Vector Machines (SVM). First, Hu invariant moments are used to obtain feature vectors, and then an SVM is used to find a decision border between the integrated hand and the defected hand. It achieves a 3.5% error rate in identifying the hand.
A simple recognition algorithm that uses three shape-based features of a hand to identify the gesture it conveys is proposed in [15]. This algorithm takes an input image of a hand gesture and calculates three features of the image, two based on compactness and one based on radial distance.

II. PROPOSED SYSTEM
1) Image Capturing
The captured image is in RGB format. RGB images do not use a palette; the colour of each pixel is determined by the combination of the red, green, and blue intensities stored in each colour plane at the pixel's location.
Normalization of Image
Normalization is a process that changes the range of pixel intensity values for images with poor contrast due to glare. Normalization transforms an n-dimensional grayscale image with intensity values in the range (Min, Max) into a new image with intensity values in the range (newMin, newMax).
Conversion from normalized RGB to YCbCr
YCbCr is a family of colour spaces in which Y is the luma component and Cb and Cr are the blue-difference and red-difference chroma components. YCbCr is not an absolute colour space; rather, it is a way of encoding RGB information. It is used because, unlike RGB, it is insensitive to luminance and can hence detect all shades of skin.
2) Gray Thresholding
The graythresh function uses Otsu's method, which chooses the threshold that minimizes the intraclass variance of the black and white pixels when reducing a graylevel image to a binary image.
3) Noise Removal
Median filtering is a nonlinear operation often used in image processing to reduce "salt and pepper" noise.
Extraction of Region of Interest
In order to extract the hand from the image, we use the concept of largest blob detection. Blob detection refers to visual modules aimed at detecting points and/or regions in the image that differ from their surroundings in properties such as brightness or colour.
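The largest-blob idea can be sketched as breadth-first connected-component labelling over the binary skin mask. The function name and the choice of 4-connectivity are assumptions for illustration; the paper does not specify its labelling routine.

```python
import numpy as np
from collections import deque

def largest_blob(mask):
    """Return a mask keeping only the largest 4-connected component."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    sizes = {}
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if mask[si, sj] and labels[si, sj] == 0:
                next_label += 1
                labels[si, sj] = next_label
                q = deque([(si, sj)])
                count = 0
                while q:                          # BFS flood fill
                    i, j = q.popleft()
                    count += 1
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < h and 0 <= nj < w and
                                mask[ni, nj] and labels[ni, nj] == 0):
                            labels[ni, nj] = next_label
                            q.append((ni, nj))
                sizes[next_label] = count
    if not sizes:
        return np.zeros_like(mask)
    biggest = max(sizes, key=sizes.get)
    return labels == biggest
```

Keeping only the largest component discards small skin-coloured blobs (face fragments, noise) so that the remaining region is the hand.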
4) Edge Detection
Edge detection aims at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities.
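The paper does not name a specific edge operator; as one concrete possibility, a Sobel gradient-magnitude sketch marks such brightness discontinuities (the threshold value is illustrative):

```python
import numpy as np

def sobel_edges(gray, thresh=100.0):
    """Mark pixels whose Sobel gradient magnitude exceeds `thresh`."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float64)   # horizontal gradient
    ky = kx.T                                        # vertical gradient
    g = gray.astype(np.float64)
    h, w = g.shape
    mag = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            win = g[i - 1:i + 2, j - 1:j + 2]
            gx = (win * kx).sum()
            gy = (win * ky).sum()
            mag[i, j] = np.hypot(gx, gy)             # gradient magnitude
    return mag > thresh
```

On a binary hand mask this leaves only the hand contour, which is the shape information the later stages rely on.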
5) Histogram Calculation
We calculate a histogram of the image, displayed above a grayscale colour bar. The number of bins in the histogram is determined by the image type: a grayscale image uses a default of 256 bins, while a binary image uses two bins. In our system, we use a binary image.
6) Pattern Recognition Using Euclidean Distance
The histogram of the test image is compared with the histograms of the images in the training set using the Euclidean distance to recognize the gesture:

Euclidean distance = sqrt((x1 - y1)^2 + (x2 - y2)^2)

where x = (x1, x2) and y = (y1, y2) are the two-bin histograms of the test and training images.
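Putting the last two steps together, the two-bin histogram and nearest-neighbour matching can be sketched as below. The function names and the sample data are illustrative, not the authors' code or dataset.

```python
import numpy as np

def binary_histogram(binary):
    """Two-bin histogram of a binary image: (#black pixels, #white pixels)."""
    white = int(binary.sum())
    return np.array([binary.size - white, white], dtype=np.float64)

def recognize(test_hist, train_hists, labels):
    """Return the label whose training histogram is nearest to the test
    histogram in Euclidean distance: d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    dists = [np.sqrt(((test_hist - h) ** 2).sum()) for h in train_hists]
    return labels[int(np.argmin(dists))]
```

In the full system each of the ten number gestures contributes 100 training histograms, and the test image is assigned the label of its minimum-distance match.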