Dept. of Electronics and Communication Engineering,
VIT UNIVERSITY,
CHENNAI CAMPUS
“Hand Gesture Recognition using Neural Network”
A Report submitted for PBL in Neural Network and Fuzzy Control (ECE407)
by
1. Akash Gupta (11BEC1038)
2. Lokesh Jindal (11BEC1043)
3. Bhagwat Singh (11BEC1070)
4. Lovish Jindal (11BEC1129)
Under the guidance and supervision of
Prof. Menaka R
Dept. of SENSE
VIT UNIVERSITY,
CHENNAI CAMPUS
Acknowledgement
We have put a great deal of effort into this project. However, it would not have been possible
without the kind support and help of many individuals and organizations. We would like to extend
our sincere thanks to all of them.
We are highly indebted to Prof. Menaka R for her guidance and constant supervision, for providing
the necessary information regarding the project, and for her support in completing it.
We would also like to express our gratitude towards VIT for providing us with this opportunity.
1. Introduction
Not much has changed since the introduction of the most common computer input devices, probably
because the existing devices are adequate. Yet computers are now so tightly integrated with
everyday life that new applications and hardware are constantly being introduced.
Recently, there has been a surge of interest in recognizing human hand gestures. Hand gesture
recognition has various applications such as computer games, machinery control (e.g. a crane), and
complete mouse replacement. One of the most structured sets of gestures belongs to sign language,
in which each gesture has an assigned meaning (or meanings). Computer recognition of hand gestures
may provide a more natural human-computer interface, allowing people to point at, or rotate, a CAD
model by rotating their hands. Hand gestures can be classified into two categories: static and
dynamic. A static gesture is a particular hand configuration and pose, represented by a single
image. A dynamic gesture is a moving gesture, represented by a sequence of images. We will focus
on the recognition of static images. Interactive applications pose particular challenges. The
response time should be very fast. The user should sense no appreciable delay between when he
or she makes a gesture or motion and when the computer responds. The computer vision
algorithms should be reliable and work for different people.
1.1 American Sign Language
American Sign Language is the language of choice for most deaf people in the United States. It is
part of the “deaf culture” and includes its own system of puns, inside jokes, etc. ASL is, however,
only one of many sign languages of the world. Just as an English speaker would have trouble
understanding someone speaking Japanese or Hindi, a speaker of ASL would have trouble
understanding Swedish Sign Language. ASL also has its own grammar, which is different from
English. ASL consists of approximately 6000 gestures of common words with finger spelling used
to communicate obscure words or proper nouns. Finger spelling uses one hand and 26 gestures to
communicate the 26 letters of the alphabet. Some of the signs can be seen in Figure 1 below.
Figure 1 ASL Examples
Another interesting characteristic, which this project ignores, is the ability ASL offers to
describe a person, place or thing and then point to a place in space to store it temporarily for
later reference.
ASL uses facial expressions to distinguish between statements, questions and directives. The
eyebrows are raised for a question, held normal for a statement, and furrowed for a directive.
Although there has been considerable work and research on facial feature recognition, facial
features will not be used to aid recognition in the task addressed here; they would, however, be
useful in a full real-time ASL dictionary.
2. Approach
2.1 Image Database
The starting point of the project was the creation of a database with all the images that would be
used for training and testing. The image database can have different formats: images can be hand
drawn, digitized photographs, or renderings of a 3-D hand model. Photographs were used, as they are
the most realistic approach.
The construction of such a database clearly depends on the application. If the application is, for
example, a crane controller operated by the same person for long periods, the algorithm does not
have to be robust to images of different people; it does, however, have to tolerate noise and
motion blur.
The training images are shown in Figure 2 and the testing images in Figure 3.
(A) (B) (C)
Figure 2. Training Images
(A) (B) (C)
Figure 3. Testing Images
Figure 4. Pattern Recognition System
2.2 Orientation Histogram:
One can calculate the local orientation using image gradients. We used two 3-tap derivative
filters, one in the x direction and one in the y direction; their outputs are dx and dy. The
gradient direction is then arctan(dy/dx). We decided to use the edge orientation as the only
feature presented to the neural network, because a good enough edge detector would allow the
network to be tested with images from different databases.
Another feature that could have been extracted from the image would be the gradient magnitude
using the formula below:
√(dx² + dy²)
However, this would restrict testing of the algorithm to images similar to the training set. In
addition, before resizing, the hands in the images should be of approximately the same size; this
refers to the size of the hand itself within the canvas, not the size of the canvas. Once an image
has been processed, the output is a single vector whose number of elements equals the number of
bins of the orientation histogram.
Figure 5 shows the orientation histogram calculation for a simple image. Blurring can be used to
allow neighboring orientations to sense each other.
Figure 5. Orientation Histogram
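To make the steps above concrete, the following minimal sketch computes an orientation histogram
for a single grayscale image. The file name handimage.tif, the use of atan2 for the gradient
direction, and the histc binning are illustrative assumptions; the full code in Section 3 instead
uses atan of the ratio dy/dx and explicit 10-degree bins.
% Minimal sketch: orientation histogram for one grayscale image.
% 'handimage.tif' is a placeholder file name; atan2/histc are illustrative choices.
I = double(imread('handimage.tif'));     % grayscale input image
I = imresize(I, [150, 140]);             % bring the hand to a common size
dx = convn(I, [0 -1 1], 'same');         % 3-tap x-derivative
dy = convn(I, [0 1 -1]', 'same');        % 3-tap y-derivative
mag = sqrt(dx.^2 + dy.^2);               % gradient magnitude (not used as a feature here)
theta = atan2(dy, dx) * 180/pi;          % gradient direction in degrees, -180..180
edges = linspace(-180, 180, 20);         % 19 equal-width orientation bins
h = histc(theta(:), edges);              % counts per bin (last entry: theta == 180 exactly)
feature = h(1:19).';                     % the single row vector fed to the neural network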
3. MATLAB Code:
Part-1. Image Processing Code:
% Select the test or training database and load the images into a cell array.
F = menu('Choose a database set','Test Set','Train Set');
if F==1
    K = menu('Choose a file','Test A');
    if K == 1
        loop = 3;                               % number of images in the set
        for i = 1:loop
            string = ['testa' num2str(i) '.tif'];
            Rimages{i} = imread(string);
        end
    end
end
if F==2
    loop = 3;                                   % number of images in the set
    L = menu('Choose a file','Train A');
    if L == 1
        for i = 1:loop
            string = ['traina' num2str(i) '.tif'];
            Rimages{i} = imread(string);
        end
    end
end
% For every image: resize, compute x/y derivatives, convert the gradient
% direction to degrees and accumulate a 19-bin orientation histogram.
for i = 1:loop
    T_{i} = double(imresize(Rimages{i},[150,140]));   % common size; double needed for convn
    x = [0 -1 1];                                     % 3-tap x-derivative filter
    y = [0 1 -1]';                                    % 3-tap y-derivative filter
    dx_{i} = convn(T_{i},x,'same');
    dy_{i} = convn(T_{i},y,'same');
    gradient_{i} = dy_{i}./dx_{i};                    % where dx is zero this is Inf/NaN; atan maps Inf to pi/2
    theta_{i} = atan(gradient_{i});                   % local orientation in radians, -pi/2..pi/2
    cl{i} = im2col(theta_{i},[1 1],'sliding');        % flatten to a single row vector
    N{i} = (cl{i}*180)/pi;                            % orientation in degrees

    % Bin edges of the 19 orientation bins (ten on the positive side, nine on
    % the negative side), kept identical to the original implementation.
    low  = [0 10.0001 20.0001 30.0001 40.0001 50.0001 60.0001 70.0001 80.0001 90.0001 ...
            -89.9 -80.0001 -70.0001 -60.0001 -50.0001 -40.0001 -30.0001 -20.0001 -10.0001];
    high = [10 20 30 40 50 60 70 80 90 100 ...
            -80 -70 -60 -50 -40 -30 -20 -10 -0.0001];
    D{i} = zeros(1,19);
    for b = 1:19
        D{i}(b) = sum((N{i}>low(b)) & (N{i}<high(b)));   % count pixels falling in bin b
    end
end
D1 = D.';                                             % one 19-element histogram per cell entry
Part-2. Neural Network Code:
% Load the feature vectors produced by Part-1 and the target values.
fid = fopen('train.txt','rt');
P1 = fscanf(fid,'%f',[19,inf]);    % 19 histogram bins per training image
fclose(fid);
P = P1.';
P = dct(P);                        % discrete cosine transform of the feature matrix
fid = fopen('test.txt','rt');
TS1 = fscanf(fid,'%f',[19,inf]);   % 19 histogram bins per test image
fclose(fid);
Z = TS1.';
Z = dct(Z);
fid = fopen('target.txt','rt');
T1 = fscanf(fid,'%f',[1,inf]);     % target value for each training image
fclose(fid);
T = T1.';
fid = fopen('a.txt','rt');
Y1 = fscanf(fid,'%f',[1,inf]);     % values read from a.txt
fclose(fid);
Y = Y1.';
The network is then created, trained, validated and tested with the Neural Network Toolbox. The
toolbox GUI is started with the command: nnstart
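Alternatively, the network can be created and trained from the command line rather than through the
GUI. The sketch below is a minimal example using the Neural Network Toolbox functions
feedforwardnet and train; the hidden-layer size of 10 is an illustrative choice, not the value used
in this project.
% Command-line training sketch (hidden-layer size of 10 is an illustrative choice)
net = feedforwardnet(10);            % one hidden layer with 10 neurons
[net, tr] = train(net, P.', T.');    % samples as columns: 19 features x N images
Ytrain = net(P.');                   % network response on the training set
Ytest = net(Z.');                    % network response on the unseen test set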
4. Results:
After training, the neural network gave 100% correct outputs on the training set, which indicates
that the network was trained successfully.
The network was then tested with the test images and gave 100% correct results after 3 iterations.
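If the target labels for the test images are available (for example, the values read from a.txt
into Y), the test accuracy can also be checked programmatically. The sketch below assumes binary
0/1 labels and a trained network object net; the rounding step would need to be adapted for a
different label coding.
% Test-accuracy sketch (assumes binary 0/1 targets in Y and a trained net)
Ytest = net(Z.');                             % network outputs for the test images
predicted = round(Ytest);                     % threshold the outputs to hard labels
accuracy = mean(predicted(:) == Y(:)) * 100;  % percentage of correct labels
fprintf('Test accuracy: %.1f %%\n', accuracy);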
5. References:
- http://matlabsproj.blogspot.in/2012/06/hand-gesture-recognition-using-neural.html
- http://matlabsproj.blogspot.in/search?q=neural
- http://www.technogallery.in/2013/08/gesture-recognition-using-neural.html
- Klimis Symeonidis, "Hand Gesture Recognition Using Neural Networks".