by Oraya Sawettanusorn (1), Yasutaka Senda (2), Shinjiro Kawato (3), Nobuji Tetsutani (4), and Hironori Yamauchi (5)
(1, 2, 5) Ritsumeikan University, Shiga, Japan
(3, 4) ATR Media Information Science Laboratories, Kyoto, Japan
Abstract:
In this paper, we propose a real-time face detection algorithm that uses a Six-Segmented Rectangular (SSR) filter, distance information, and template matching. The point between the eyes (Between-the-Eyes) is chosen to represent the face because its appearance is common to most people and it remains visible over a wide range of face orientations. First, a rectangle of a given size, divided into six segments, is scanned over the face image. The bright-dark relations among the segments are then tested to decide whether the center of the rectangle can be a Between-the-Eyes candidate. Next, distance information obtained from a stereo camera and template matching are applied to find the true Between-the-Eyes among the candidates. We implemented the system on a PC with a 2.2 GHz Xeon CPU. It runs in real time at 30 frames/sec with a detection rate of 92%.
1. Introduction
Recent advances in computer technology have enabled a variety of applications in human-computer interfaces. Face and gesture recognition is part of this field and can be applied in areas such as robotics, security systems, driver monitoring, and video coding.
Because the human face is a dynamic object with a high degree of variability, many detection techniques have been proposed. The survey by Hjelmas and Low classifies face detection techniques into two categories: feature-based approaches and image-based approaches. Techniques in the first category make use of apparent properties of the face such as face geometry, skin color, and motion. Although feature-based techniques can achieve high detection speed, they suffer from poor reliability under varying lighting conditions. Techniques in the second category, the image-based approaches, take advantage of recent advances in pattern recognition theory. Most image-based approaches detect faces with a window-scanning technique, which requires a large amount of computation. An image-based approach alone is therefore not well suited to real-time applications.
To achieve a fast and reliable face detection system, we propose a method that combines the feature-based and image-based approaches to detect the point between the eyes (hereafter Between-the-Eyes) using a Six-Segmented Rectangular filter (SSR filter). The proposed SSR filter, a rectangle divided into six segments, exploits the bright-dark relations around the Between-the-Eyes area. We select Between-the-Eyes to represent the face because it is common to most people and easy to find over a wide range of face orientations. Between-the-Eyes has dark parts (eyes and eyebrows) on both sides and comparatively bright parts above (forehead) and below (nose and cheekbones). This characteristic is stable for any facial expression.
In this paper, we use an intermediate image representation called the "integral image", from the work of Viola and Jones, to calculate the sums of pixel values in each segment of the SSR filter. First, the SSR filter is scanned over the image and the average gray level of each segment is calculated from the integral image. Then the bright-dark relations between the segments are tested to see whether the filter center can be a candidate point for Between-the-Eyes. Next, the stereo camera is used to obtain distance information and the suitable Between-the-Eyes template size. The candidates are then evaluated by matching against an average Between-the-Eyes template (obtained from 400 images of 40 people in the ORL face database). Finally, the true Between-the-Eyes is detected.
Because the proposed technique uses only gray-level information, it is more reliable under changes of lighting conditions. Moreover, the method is not affected by beards, mustaches, hair, or nostril visibility, since only the information around the eyes, eyebrows, and nose is required. We implemented the system on a PC with a 2.2 GHz Xeon CPU. It runs at 30 frames/sec with a detection rate of 92%.
Section 2 describes the concept of the integral image, and Section 3 explains how the SSR filter is used to extract Between-the-Eyes candidates. Section 4 explains candidate selection using the stereo camera and average Between-the-Eyes template matching. Section 5 presents the complete real-time face detection system. Experimental results are given in Section 6, and Section 7 concludes the paper.
2. Integral Image
The SSR filter is computed using an intermediate image representation called the "integral image". For the original image i(x, y), the integral image is defined as
ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y (1)
The integral image can be computed in one pass over the original image by the following pair of recurrences.
s(x, y) = s(x, y - 1) + i(x, y) (2)
ii(x, y) = ii(x - 1, y) + s(x, y) (3)
where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0.
Using the integral image, the sum of pixels within rectangle D (sr) can be computed quickly with four array references, as shown in Fig. 1. Here (x, y) is the lower-right corner of D, and W and L are its width and height.
sr = (ii(x, y) + ii(x - W, y - L)) - (ii(x - W, y) + ii(x, y - L)) (4)
Figure 1. Integral Image
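The recurrences (2) and (3) and the four-reference sum (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names integral_image and rect_sum and the list-of-rows image layout (img[y][x]) are assumptions made here, not part of the system described in this paper.

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img[y'][x'] over all x' <= x, y' <= y."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for x in range(width):
        s = 0  # running sum s(x, y) accumulated along y, with s(x, -1) = 0
        for y in range(height):
            s += img[y][x]                       # recurrence (2)
            left = ii[y][x - 1] if x > 0 else 0  # ii(-1, y) = 0
            ii[y][x] = left + s                  # recurrence (3)
    return ii


def rect_sum(ii, x, y, w, l):
    """Sum of the w-by-l rectangle whose lower-right corner is (x, y), as in (4)."""
    def at(xx, yy):
        # References that fall outside the image on the top or left count as 0.
        return ii[yy][xx] if xx >= 0 and yy >= 0 else 0
    return (at(x, y) + at(x - w, y - l)) - (at(x - w, y) + at(x, y - l))


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 2, 2, 2, 2))  # sum of the lower-right 2x2 block: 5 + 6 + 8 + 9
```

Once ii has been built in one pass, every rectangle sum costs four look-ups regardless of the rectangle size, which is what makes scanning the filter over the whole image inexpensive.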
3. SSR filter
3.1 SSR filter
First, a rectangle is scanned over the input image. This rectangle is divided into six segments, as shown in Fig. 2 (a).
Figure 2. SSR Filter
We denote the total sum of pixel values in each segment (B1 to B6) as Sb1 to Sb6. The proposed SSR filter is used to detect Between-the-Eyes based on two characteristics of face geometry.
(1) The nose area (Sn) is brighter than the right and left eye areas (Ser and Sel, respectively), as shown in Fig. 2 (b), where
Sn = Sb2 + Sb5
Ser = Sb1 + Sb4
Sel = Sb3 + Sb6
Sn > Ser (5)
Sn > Sel (6)
(2) The eye area (both eyes and eyebrows) (Se) is relatively darker than the cheekbone area (including the nose) (Sc), as shown in Fig. 2 (c), where
Se = Sb1 + Sb2 + Sb3
Sc = Sb4 + Sb5 + Sb6
Se < Sc (7)
When expressions (5), (6), and (7) are all satisfied, the center of the rectangle is a candidate for Between-the-Eyes.
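The candidate test of expressions (5) to (7) can be sketched as follows. The segment layout (B1, B2, B3 in the upper row and B4, B5, B6 in the lower row) is inferred from the definitions of Se and Sc above, and the function names are assumptions for illustration. For brevity the segment sums are computed by direct summation here; in the actual system they would come from the integral image.

```python
def segment_sums(img, left, top, width, height):
    """Return [Sb1, ..., Sb6] for a width x height filter placed at (left, top).

    Assumed layout: B1 B2 B3 in the upper row, B4 B5 B6 in the lower row.
    """
    seg_w, seg_h = width // 3, height // 2
    sums = []
    for row in range(2):
        for col in range(3):
            x0, y0 = left + col * seg_w, top + row * seg_h
            sums.append(sum(img[y][x]
                            for y in range(y0, y0 + seg_h)
                            for x in range(x0, x0 + seg_w)))
    return sums


def is_candidate(sb):
    """Test expressions (5), (6), and (7) on the six segment sums."""
    sb1, sb2, sb3, sb4, sb5, sb6 = sb
    sn = sb2 + sb5           # nose column
    ser = sb1 + sb4          # right-eye column
    sel = sb3 + sb6          # left-eye column
    se = sb1 + sb2 + sb3     # eye row
    sc = sb4 + sb5 + sb6     # cheekbone row
    return sn > ser and sn > sel and se < sc


# Toy 6x2 image: dark "eyes" on both sides of a bright "nose bridge",
# above a uniformly bright lower row.
toy = [[10, 10, 200, 200, 10, 10],
       [220, 220, 230, 230, 220, 220]]
print(is_candidate(segment_sums(toy, 0, 0, 6, 2)))
```

A uniformly gray region fails the test because the inequalities are strict, so flat background does not produce candidates.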
Figure 3. Between-the-Eyes candidates from SSR filter
In Fig. 3 (b), the Between-the-Eyes candidate areas are shown in white and the non-candidate areas in black. Applying a labeling process to Fig. 3 (b) gives the Between-the-Eyes candidates detected by the SSR filter, shown in Fig. 3 (a).
3.2 Filter Size Estimation
To find the most suitable filter size, we use 400 facial images of 40 people (10 images each) from the ORL face database. The images were taken at different times, under various lighting conditions, with different gestures, and with and without eyeglasses. Each image is 92×112 pixels with 256 gray levels.
We estimated the filter size manually for all 400 facial images to find a standard filter size that covers both eyes, both eyebrows, and the cheekbone area (including the nose). The result is a rectangle of 60×30 pixels. In the experiment, we counted a detection as successful when the true Between-the-Eyes point was among the candidates or in their vicinity. Table 1 shows the true-candidate detection rate and the number of candidates obtained when the standard 60×30 filter size is varied in steps of 20%.
The standard filter size of 60×30 achieves a 92% detection rate, which shows that this filter size is effective. In contrast, the detection rate drops to 52% with a filter of 84×42 pixels, because a large filter may include unnecessary parts of the face such as hair or beard. Because expressions (5), (6), and (7) use sums of pixel values, the 24×12 and 36×18 filters shown in Fig. 4 achieve unexpectedly high Between-the-Eyes detection rates even though they do not completely cover both eye areas: the parts of the eyes that they do cover are still darker than the nose area.
Fig. 5 shows examples of successful Between-the-Eyes detection, and Fig. 6 shows some failures. These detection errors may be caused by illumination. The failure in the middle image of Fig. 6 is mainly due to reflection on the eyeglasses.
Figure 4. Various sizes of SSR filter
Figure 5. Examples of successful Between-the-Eyes detection
Figure 6. Examples of failures in Between-the-Eyes detection
Moreover, Fig. 7 shows an example of successful Between-the-Eyes detection in an image where horizontal illumination strikes one side of the face. In this case the SSR filter still works even though one side of the face is in shadow. The SSR filter can therefore detect Between-the-Eyes under varying lighting conditions.
Table 1. Detection results of various SSR filter sizes (from 400 face images)
Figure 7. Example of successful Between-the-Eyes detection for a face illuminated from one side
According to Table 1, rectangles of 0.6 to 1.2 times the standard size (60×30) can be used to detect Between-the-Eyes candidates. Equivalently, a filter of fixed size tolerates faces scaled by the reciprocals of these factors, 1/1.2 ≈ 0.83 and 1/0.6 ≈ 1.67. Therefore, face images from 0.83 to 1.67 times the size of the standard image (92×112 pixels) can be detected by the proposed SSR filter.
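The conversion from the usable filter-size range to the detectable face-size range is a reciprocal relation, which the short sketch below works through. The function name is an assumption for illustration only.

```python
def detectable_face_scales(min_filter_scale, max_filter_scale):
    """Face-size range covered by a fixed filter, given the usable filter-size range.

    If a filter scaled by k still works on a standard-size face, then the
    standard filter works on a face scaled by 1 / k.
    """
    return 1.0 / max_filter_scale, 1.0 / min_filter_scale


smallest, largest = detectable_face_scales(0.6, 1.2)
print(round(smallest, 2), round(largest, 2))  # prints "0.83 1.67"
```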
4. Candidate Selection
4.1 Stereo camera
In real situations, the face size varies with the distance from the face to the cameras. We use two cameras to construct a binocular stereo system that provides distance information, so that a suitable Between-the-Eyes template size can be estimated for the template matching discussed in Section 4.2. Since stereo camera processing is a standard technique, its detailed explanation is omitted in this paper.
We performed experiments to find the suitable Between-the-Eyes template size from the difference between the right and left images, based on the principle of the binocular stereo camera system. First, we manually measured the horizontal difference in pixels (disparity) between the Between-the-Eyes positions in the face images obtained from the right and left cameras. Then we manually measured the width between the right and left temples, which should correspond to the width of the Between-the-Eyes template.
The relation between disparity and suitable Between-the-Eyes template size is shown in Fig. 8. Based on this relation, we can select an appropriate template size according to the disparity measured in an actual scene. This is why the proposed technique is applicable to faces at distances of 0.5 to 3.5 m from the cameras.
From the experiments and the relation in Fig. 8, we obtain the relations among SSR filter size, disparity, and Between-the-Eyes template size shown in Table 2. Only two filter sizes, 40×20 and 24×12, are used, since they are flexible enough to detect faces within the pre-defined range. For example, for a face with a disparity of 20, the 40×20 SSR filter is used and the Between-the-Eyes template size is 48×24 pixels. This template is then scaled to match the average Between-the-Eyes template size for template matching. A face whose disparity lies outside the range shown in Table 2 is assumed to be undetectable.
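The scaling step, for instance from a 48×24 region down to the 32×16 average template size used in Section 4.2, could be done with a simple nearest-neighbor resampling such as the sketch below. The paper does not state which interpolation is used, so nearest-neighbor sampling and the function name resize_nearest are assumptions.

```python
def resize_nearest(patch, out_w, out_h):
    """Resample a 2-D patch (list of rows) to out_w x out_h by nearest-neighbor sampling."""
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[(y * in_h) // out_h][(x * in_w) // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


# A synthetic 48x24 region scaled to the 32x16 template size.
region = [[x + 100 * y for x in range(48)] for y in range(24)]
scaled = resize_nearest(region, 32, 16)
print(len(scaled[0]), len(scaled))
```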
Figure 8. The relation between the horizontal differences in pixel (disparity) and the Between-the-Eyes template size
Table 2. Filter size, disparity, and related Between-the-Eyes template size
4.2 Average Between-the-Eyes Template Matching
Because the SSR filter extracts not only the true Between-the-Eyes but also some false candidates, we use average Between-the-Eyes template matching to select the true one. The average Between-the-Eyes pattern used in this paper was obtained, in the same manner as in previous work, from 400 face images of 40 people in the ORL face database.
Figure 9. Average Between-the-Eyes template and its variance pattern
Fig. 9 shows the average Between-the-Eyes template and its variance pattern, each of size 32×16. The gray levels of each sample were normalized to have an average of zero and a variance of one. We then calculated the average pattern and the variance at each pixel. Next, the gray levels were converted to have an average of 128 and a standard deviation of 64, which gives the average pattern as an image. To obtain the variance pattern as an image, each value was multiplied by 255. Both the average and variance patterns are symmetric.
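The normalization and averaging steps described above can be sketched as follows. This is a minimal illustration, assuming population statistics (division by the number of values) and a linear rescaling; the function names are hypothetical and the paper does not specify these details.

```python
import math


def normalize(pattern, mean=0.0, std=1.0):
    """Linearly rescale a 2-D gray-level pattern to the given mean and standard deviation."""
    values = [v for row in pattern for v in row]
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    if s == 0:  # flat pattern: there is no contrast to rescale
        return [[mean for _ in row] for row in pattern]
    return [[(v - m) / s * std + mean for v in row] for row in pattern]


def average_and_variance(samples):
    """Per-pixel average and variance over samples normalized to mean 0, variance 1."""
    normed = [normalize(s) for s in samples]
    n, h, w = len(normed), len(normed[0]), len(normed[0][0])
    avg = [[sum(s[y][x] for s in normed) / n for x in range(w)] for y in range(h)]
    var = [[sum((s[y][x] - avg[y][x]) ** 2 for s in normed) / n for x in range(w)]
           for y in range(h)]
    return avg, var


def as_images(avg, var):
    """Convert the average to mean 128 / std 64 and multiply the variance by 255."""
    return normalize(avg, 128.0, 64.0), [[v * 255 for v in row] for row in var]
```

Normalizing every sample before averaging removes differences in overall brightness and contrast between the source images, so the average reflects the bright-dark structure only.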
To avoid the influence of unbalanced illumination, we evaluate the right and left parts of the face separately, because lighting conditions are likely to differ between the right and left halves of the face. We also avoid the effect of hair and beard, and reduce the calculation load, by discarding the top three rows from the calculation. As a result, a pattern of 16×13 pixels (for each side) is used in template matching.
We denote the average Between-the-Eyes template and its variance for the left side of the face as tl_ij and vl_ij, and for the right side as tr_ij and vr_ij (i = 0, ..., 15; j = 3, ..., 15). tl_ij and tr_ij have an average value of 128 and a standard deviation of 64, while vl_ij and vr_ij are represented with a maximum gray level of 255.
To evaluate a candidate, we define its Between-the-Eyes pattern as p_mn (m = 0, ..., 31; n = 0, ..., 15). The right and left halves of p_mn are then redefined separately as pr_ij and pl_ij (i = 0, ..., 15; j = 3, ..., 15), respectively, each converted to have an average value of 128 and a standard deviation of 64.
Then the left mismatching value (Dl) and the right mismatching value (Dr) are calculated by using the following equation.
Only a candidate with both Dl and Dr below a pre-defined threshold (D) is counted as a true candidate. When more than one candidate has both Dl and Dr below the threshold, the candidate with the smallest mismatching value is judged to be the true Between-the-Eyes.
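The selection rule can be illustrated with the sketch below. The exact form of the mismatching value is not reproduced in this text, so the sketch assumes a variance-weighted sum of squared differences over rows j = 3 to 15; this form, the use of Dl + Dr to rank candidates that pass the threshold, and the function names are all assumptions rather than the definition used in this paper.

```python
def mismatch(p, t, v, first_row=3, eps=1e-6):
    """Assumed mismatching value: sum of (p - t)^2 / v over rows first_row and below.

    p, t, v are 2-D lists indexed [j][i] (row j, column i) for one half of the face.
    """
    total = 0.0
    for j in range(first_row, len(t)):
        for i in range(len(t[j])):
            diff = p[j][i] - t[j][i]
            total += diff * diff / max(v[j][i], eps)  # eps guards against zero variance
    return total


def select_true_candidate(scores, threshold):
    """Return the index of the best candidate, or None if none passes.

    scores is a list of (Dl, Dr) pairs. A candidate passes only if both values
    are below the threshold; ties among passing candidates are broken here by
    the smallest Dl + Dr (an assumption).
    """
    best_index, best_value = None, None
    for index, (dl, dr) in enumerate(scores):
        if dl < threshold and dr < threshold:
            value = dl + dr
            if best_value is None or value < best_value:
                best_index, best_value = index, value
    return best_index
```

Requiring both halves to pass separately means that a good match on a well-lit side cannot compensate for a poor match on the other side.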
4.3 Detection of Eye-Like Points
Since Between-the-Eyes lies midway along the line joining the left and right eyes, we detect both eyes to confirm the location of the true Between-the-Eyes. When the locations of both eyes have been extracted from the selected face area, Between-the-Eyes is re-registered as the midpoint between them.
We search for the eye areas within the Between-the-Eyes template obtained in Section 4.1. Eye detection is done in a simple way, similar to a previously reported technique. To avoid the influence of illumination, we search for the right eye and the left eye independently. First, the rectangular areas on both sides of the Between-the-Eyes candidate where the eyes should be found are extracted. In this paper, for the selected Between-the-Eyes area of size 32×16, we avoid the effect of eyebrows, hair, and beard by ignoring 1 pixel at the border. Each eye area is then assumed to be a region of 12×14 pixels on each side of the face (the three pixels in the middle of the Between-the-Eyes template are neglected as the nose area).
Next, we find a threshold level for each area to binarize the image. The threshold level is determined as the level at which the total number of pixels of all components, except the border, exceeds a pre-defined value (10 in this paper). In some cases, the eyebrows have almost the same gray level as the eyes, so we select the area within a certain size range (5 to 25 pixels) that has the lowest position.
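One possible reading of the threshold search is sketched below: the threshold is raised from the darkest level until more than 10 non-border pixels fall at or below it. This simplifies the description above (it counts dark pixels rather than labeled components and simply skips the one-pixel border), so it should be taken as an assumption-laden illustration; find_threshold is a hypothetical name.

```python
def find_threshold(area, min_dark_pixels=10):
    """Lowest gray level at which more than min_dark_pixels interior pixels are dark.

    area is a 2-D list of gray levels (0 to 255). Pixels on the one-pixel
    border are ignored, which is a simplification of the text.
    """
    h, w = len(area), len(area[0])
    interior = [area[y][x] for y in range(1, h - 1) for x in range(1, w - 1)]
    for level in range(256):
        if sum(1 for v in interior if v <= level) > min_dark_pixels:
            return level
    return 255


def binarize(area, level):
    """Mark pixels at or below the threshold as 1 (dark) and the rest as 0."""
    return [[1 if v <= level else 0 for v in row] for row in area]
```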
To resolve the similarity in gray level between eyes and eyebrows, a search based on the alignment of the left and right eyes is performed. This search focuses on the 3×3 pixels in the middle of each eye area. The distance between the located eyes (De) and the angle at the Between-the-Eyes candidate (Ae) are then tested using the following expressions, both of which were obtained from experiments.
15 < De < 21 (10)
115° < Ae < 180° (11)
Only a candidate whose eye relation satisfies both conditions is re-registered as the true Between-the-Eyes. Otherwise, the Between-the-Eyes and eye areas cannot be determined.
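Conditions (10) and (11) can be checked as in the sketch below. It assumes that De is the Euclidean distance in pixels between the two eye positions and that Ae is the angle formed at the Between-the-Eyes candidate by the directions to the two eyes; the paper does not spell out these definitions, and the function name is hypothetical.

```python
import math


def eye_relation_ok(left_eye, right_eye, center):
    """Test 15 < De < 21 and 115 deg < Ae < 180 deg for (x, y) points in pixels."""
    de = math.hypot(left_eye[0] - right_eye[0], left_eye[1] - right_eye[1])
    ax, ay = left_eye[0] - center[0], left_eye[1] - center[1]
    bx, by = right_eye[0] - center[0], right_eye[1] - center[1]
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0 or nb == 0:
        return False  # an eye coincides with the candidate point
    cos_ae = max(-1.0, min(1.0, (ax * bx + ay * by) / (na * nb)))
    ae = math.degrees(math.acos(cos_ae))
    return 15 < de < 21 and 115 < ae < 180


print(eye_relation_ok((7, 7), (25, 8), (16, 8)))   # eyes nearly level with the candidate
print(eye_relation_ok((7, 2), (25, 2), (16, 8)))   # both "eyes" far above: angle too small
```

The angle condition rejects pairs in which both dark regions lie well above or below the candidate, which is typical when an eyebrow has been mistaken for an eye.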
5. Real-Time Face Detection System
The processing flow of the real-time face detection system is shown in Fig. 10.
Figure 10. Processing Flow of Real-Time Face Detection
We implemented the system on a PC with a 2.2 GHz Xeon CPU. The experiment uses two commercial NTSC video cameras, a multi-video composer, and a video capture board, without any special hardware. The two NTSC cameras form a binocular stereo system. The multi-video composer combines four NTSC video signals into one NTSC signal, of which we use only two. Each video image becomes one half of the original size, so the captured image size for each camera is 320×240. However, to avoid the interlaced scanning problem for moving objects, we use only the even-line data. Consequently, the image size is 320×120 for each camera. The resulting horizontal resolution is double the vertical one, as shown in the bottom two images in Fig. 11. We keep this non-uniform resolution to obtain as accurate a disparity as possible.
On the other hand, we need a regular image with uniform resolution for Between-the-Eyes template matching. We therefore reconstruct a smaller image by sub-sampling, as shown in the uppermost-left image of Fig. 11.
Fig. 11 shows a face detection result from an experiment performed in the laboratory with an unspecified background. The uppermost-left image is a monochrome image from the right camera, using only the green component. Between-the-Eyes detection is applied to this 160×120 monochrome image. The lower image is obtained from the right camera, and the lowest image from the left camera.
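The image preparation described above can be sketched as follows: keep only the even lines of a 320×240 frame to obtain 320×120, then take every second column and the green component to obtain the 160×120 monochrome image. The factor-of-two horizontal sub-sampling is inferred from the stated image sizes, and the frame layout (rows of (R, G, B) tuples) and function names are assumptions.

```python
def even_lines(frame):
    """Keep only the even-numbered lines to avoid interlacing artifacts (240 -> 120 rows)."""
    return frame[0::2]


def matching_image(field):
    """Green component of every second column (320 -> 160 columns), as a 2-D gray image."""
    return [[pixel[1] for pixel in row[0::2]] for row in field]


# Synthetic 320x240 RGB frame whose green value encodes the line number.
frame = [[(x % 256, y, 0) for x in range(320)] for y in range(240)]
field = even_lines(frame)        # 320x120, kept at full width for disparity measurement
small = matching_image(field)    # 160x120, used for Between-the-Eyes detection
print(len(field[0]), len(field), len(small[0]), len(small))
```

Keeping the full-width field for the stereo step preserves horizontal precision, which matters because disparity is measured along the horizontal axis.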
Figure 11. Face Detection Result
The detection result from the SSR filter is shown in the uppermost-right image. In its upper corner is the Between-the-Eyes candidate area after cutting and scaling to match the average template. The binarized image of the detected eyes and eyebrows after the eye detection process is displayed below it. Since the SSR filter uses no information about the inclination of the face, this technique cannot detect faces inclined by more than 10°. In the case of strong reflection on eyeglasses, the proposed technique also occasionally fails to detect the true Between-the-Eyes. In actual operation, the system runs at 30 frames/sec, which achieves real-time processing speed.
7. Conclusion
We have proposed a real-time face detection system that consists of three major components: the SSR filter, the stereo camera system, and the average Between-the-Eyes template matching unit. First, the SSR filter is scanned over the image, and the bright-dark relations of the average gray levels of its segments are tested to decide whether its center can be a Between-the-Eyes candidate. Here we used the "integral image" proposed by Viola and Jones in the SSR filter calculation to scan the filter over the image in real time. Since only gray-level information is used, the proposed technique is more reliable under changes of lighting conditions than skin color extraction methods. Next, the stereo camera system provides distance information so that a suitable Between-the-Eyes template size can be estimated. This reduces the calculation load and allows faces of different sizes to be detected. We then performed average Between-the-Eyes template matching to select the true candidate, followed by detection of both eye areas to verify the result. We implemented the system on a PC with a 2.2 GHz Xeon CPU. The system ran at 30 frames/sec, which satisfies real-time processing speed. However, the proposed technique is still limited in the face orientations it can handle, and further development is needed to solve this problem.
References
- E. Hjelmas and B. K. Low, "Face Detection: A Survey," Computer Vision and Image Understanding, 83(3), pp. 236-274, 2001.
- S. Kawato and N. Tetsutani, "Real-time Detection of Between-the-Eyes with a Circle Frequency Filter," ACCV2002: The 5th Asian Conference on Computer Vision, 23-25 January 2002, Melbourne, Australia.
- P. Viola and M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," Proc. of IEEE Conf. CVPR, 1, pp. 511-518, 2001.
- AT&T Laboratories Cambridge, "The ORL face database," http://www.uk.research.att.com/facedatabase.html
- J. Yang and A. Waibel, "A Real-Time Face Tracker," Proc. 3rd IEEE Workshop on Application of Computer Vision, pp. 142-147, 1996.
- S. Kawato and J. Ohya, "Two-step Approach for Real-Time Eye Tracking with a New Filtering Technique," SMC2000: IEEE Int. Conf. on Systems, Man & Cybernetics, pp. 1366-1371, 08-11 Oct. 2000, Nashville, Tennessee, USA.