NRC-IIT Facial Video Database
Location: /db/video/faces/cvglab
User agreement: The data from this database is freely available for research use, provided that one of the following papers is cited.
- Dmitry O. Gorodnichy, Video-based framework for face recognition in video. Second Workshop on Face Processing in Video (FPiV'05), in Proceedings of the Second Canadian Conference on Computer and Robot Vision (CRV'05), pp. 330-338, Victoria, BC, Canada, 9-11 May, 2005. ISBN 0-7695-2319-6. NRC 48216. [Abstract & Pdf]
- Dmitry O. Gorodnichy, Associative neural networks as means for low-resolution video-based recognition. International Joint Conference on Neural Networks (IJCNN'05), Montreal, Quebec, Canada, July 31-August 4, 2005. NRC 48217. [Abstract & Pdf]
More about this research: FRiV Technical Website (publications).
Preface & Goal: This video-based face database has been created to provide performance evaluation criteria for techniques developed, and yet to be developed, for face recognition in video (FRiV). It is also intended for studying how recognition performance is affected by the many factors and parameters in the long chain from capturing the video to naming the person in it.
The Driving Premise: With the help of this and other databases we are collecting, we aim to show that computers, like humans, can recognize faces in video quite adequately, as long as the faces have a resolution of at least 12 pixels between the eyes, which we call the nominal face resolution.
Description: This database contains pairs of short video clips, each showing the face of a computer user sitting in front of the monitor and exhibiting a wide range of facial expressions and orientations, as captured by an Intel webcam mounted on the computer monitor.
The video capture resolution is kept to 160 x 120. With the face occupying 1/4 to 1/8 of the image (measured by width), this corresponds to a situation commonly observed on a TV screen, where the face of a person in a TV show occupies 1/8 to 1/16 of the screen.
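The arithmetic behind these fractions can be sketched as follows. The 320-pixel comparison width is an assumption introduced here purely for illustration (the text does not say which TV resolution it has in mind); it is simply the width at which the same face would span 1/8 to 1/16 of the screen. The function and constant names are likewise hypothetical.

```python
from fractions import Fraction

FRAME_WIDTH = 160  # capture width in pixels, as stated in the description


def face_width_px(frame_width, fraction_of_width):
    """Face width in pixels when the face spans the given fraction of the frame width."""
    return frame_width * fraction_of_width


widest = face_width_px(FRAME_WIDTH, Fraction(1, 4))     # face spans 1/4 of the width
narrowest = face_width_px(FRAME_WIDTH, Fraction(1, 8))  # face spans 1/8 of the width

# Hypothetical comparison screen width (an assumption, not stated in the text).
ASSUMED_SCREEN_WIDTH = 320
tv_fractions = (widest / ASSUMED_SCREEN_WIDTH, narrowest / ASSUMED_SCREEN_WIDTH)

print(widest, narrowest, tv_fractions)
```

Under that assumption, a face 20 to 40 pixels wide in the 160 x 120 clips covers the same share of a 320-pixel-wide picture as the TV faces mentioned above.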
Setup: For this database, two video clips of each person are taken one after another under approximately the same illumination conditions (no sunlight, only ceiling light evenly distributed over the room), with the same setup and almost the same background for all persons in the database.
This database is thus best suited for testing recognition performance with respect to factors inherent to video-based recognition, such as:
- low resolution,
- motion blur,
- out-of-focus factor,
- facial expression variation,
- facial orientation variation,
- occlusions.
We are also in the process of creating other databases with a) the same individuals recorded in a different office (with sunlight) a few months later with the same type of webcam and b) the same individuals captured by a hand-held VHS video camera. These can be used to test the recognition performance with respect to illumination and hairstyle variation.
Characteristics:
- Resolution: 160x120.
- Average file size: 1.5 MB (video: 900 KB; audio: 600 KB, not used).
- Average duration: 10-20 s. Average total number of frames in a clip: 300.
- Video type and compression: colour AVI, 20.0 fps (unless indicated otherwise), compressed at 481 Kbps with the codec provided with the Intel webcam.
Number of registered individuals: 11
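As a rough consistency check on these figures, the stated frame rate and bitrate can be related to the average clip duration and video size. This is only a sketch: it assumes 1 Kbps = 1000 bits per second and 1 KB = 1000 bytes, neither of which is specified above, and the function names are introduced here for illustration.

```python
FPS = 20.0          # frames per second, from the characteristics list
AVG_FRAMES = 300    # average number of frames per clip
BITRATE_KBPS = 481  # codec bitrate in kilobits per second


def clip_duration_s(frames, fps):
    """Duration of a clip in seconds."""
    return frames / fps


def video_size_kb(bitrate_kbps, duration_s):
    """Approximate video stream size in kilobytes (decimal units assumed)."""
    return bitrate_kbps * duration_s / 8  # 8 bits per byte


duration = clip_duration_s(AVG_FRAMES, FPS)
size_kb = video_size_kb(BITRATE_KBPS, duration)
print(duration, size_kb)
```

With these assumptions, an average clip lasts 15 s, which falls inside the stated 10-20 s range, and the video stream comes to roughly 900 KB, in line with the listed average.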
Available for download:
- All 11*2=22 video sequences can be downloaded as one 28 MB zip file, individually from the /avi directory, or by clicking on the snapshots below.
- Executable files developed for the project are available from our FRiV Technical Website.
- Log files showing our recognition results are also available. See also Readme for Results and Logs.
Snapshots and representative frame-based, association-based recognition results
To give an idea of the quality, amount and uniqueness of the facial data in each video pair (and also to serve as reference data for other face recognition techniques tested on this database), representative recognition results obtained by our technology are given for each person in the recognition results column.
For an explanation of the numbers, see Remarks below.
| ID | clips used to memorize a person | clips used to test recognition of a person | most frequent neural outcome | recognition results ****: 10 | 11 | 01 (02) | 00 |
| 0 | 228. 77/121 * | 249. 131/63 * | 00000000000 | 0 | 0 | 73 (15) | 112 |
| 1 | 237. 96/1 | 329. 54/0 | 01000000000 | 49 | 4 | 0 | 1 |
| 2 | 257. 176/3 | 339. 184/2 | 00100000000 | 175 | 0 | 3 | 8 |
| 3 | 286. 232/2 | 357. 308/2 | 00010000000 | 288 | 1 | 2 | 19 |
| 4 | 353. 172/1 | 404. 276/3 | 00001000000 | 163 | 1 | 11 | 98 |
| 5 | 78. 78/0 | 128. 121/1 | 00000100000 | 84 | 2 | 3 | 36 |
| 6 | 324. 231/10 | 353. 237/1 | 000000100000 | 202 | 2 | 3 | 15 |
| 7 | 258. 206/5 | 328. 239/1 | 000000010000 | 208 | 3 | 12 | 17 |
| 8 | 346. 351/12 | 426. 401/1 | 000000001000 | 353 | 3 | 8 | 38 |
| 9 | 318. 300/1 | 388. 273/26 | | 191 | 8 | 30 | 62 |
| 10 | 338. 184/81 | 378. 274/36 | 00000000001, 00000100001, 00010010000 | 259 | 0 | 10 (17) | 24 |
Remarks:
The numbers given for each clip (N. Y/Z) indicate the number of frames in the clip (N) and the number of those frames in which exactly one face-like region (Y) or more than one (Z) was detected using the Haar-like feature filters available in Intel OpenCV. Note that a face (or face-like region) is not detected in every frame. Also note that sometimes more than one face is detected, i.e. part of the scene is erroneously detected as a face. For the results presented, we made sure, using video processing techniques based on motion and colour tracking, that these false "faces" are not taken into account.
* There are many falsely detected "faces" (in the shelf area) in the clips of ID=0. Therefore, these clips are used to test the performance on an unknown face and on false "faces".
** In some video clips (e.g. ID 10), false "faces" are not completely eliminated and/or faces appear in orientations that are extremely challenging for recognition, which lowers the individual-frame-based recognition rate.
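For readers who want to process the per-clip annotations programmatically, the N. Y/Z notation described in the remarks above can be parsed as sketched below. The names `ClipCounts` and `parse_counts` are hypothetical helpers introduced here; they are not part of the project's executables.

```python
import re
from typing import NamedTuple


class ClipCounts(NamedTuple):
    frames: int        # N: number of frames in the clip
    one_region: int    # Y: frames with exactly one face-like region detected
    multi_region: int  # Z: frames with more than one face-like region detected


# Matches annotations such as "249. 131/63" (a trailing "*" marker is ignored).
_PATTERN = re.compile(r"^\s*(\d+)\s*\.\s*(\d+)\s*/\s*(\d+)")


def parse_counts(annotation: str) -> ClipCounts:
    """Parse an 'N. Y/Z' annotation into its three integer counts."""
    match = _PATTERN.match(annotation)
    if match is None:
        raise ValueError(f"not an 'N. Y/Z' annotation: {annotation!r}")
    return ClipCounts(*(int(group) for group in match.groups()))


counts = parse_counts("249. 131/63 *")
print(counts)
```

The example string is the test-clip annotation listed for ID 0; the large Z value reflects the many falsely detected "faces" noted for that clip.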
*** The final recognition decision is based not on an individual frame but on a sequence of frames. Thus the actual recognition rate on an entire video clip is much higher than that based on any individual frame of the clip (see, e.g., this log file and ReadMe-logs).
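One simple way to turn per-frame outcomes into a clip-level decision is a majority vote over the frames. The sketch below is an illustrative assumption, not the decision rule used to produce the logs; the function name and the convention that `None` marks a frame without an unambiguous identification are introduced here.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_decision(frame_ids: Sequence[Optional[int]]) -> Optional[int]:
    """Return the ID named by the most frames, or None if there is no clear winner.

    Each element is the person ID decided for one frame, or None when the
    frame gave no unambiguous identification and is therefore ignored.
    """
    votes = Counter(i for i in frame_ids if i is not None)
    if not votes:
        return None  # no frame produced a usable decision
    (top_id, top_count), *rest = votes.most_common(2)
    if rest and rest[0][1] == top_count:
        return None  # tie between the two most frequent IDs
    return top_id


print(majority_decision([3, 3, None, 5, 3]))
```

A single wrong frame (ID 5 in the example) is outvoted by the surrounding frames, which is the effect described in the remark above.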
**** The numbers in the recognition results column show the total number of single-frame-based (i.e. image-based) results, i.e. results obtained on each individual video frame without taking the time domain into account. For each video pair, the following five statistics, denoted S10, S11, S01, S00 and S02, are computed; they are derived from the neuro-biological, associative-memory-based treatment of the recognition process.
10 - the number of frames in the 2nd video clip of the pair in which the face is associated with the correct person (i.e. the one seen in the 1st video clip of the pair), without any association with other seen persons - best (non-hesitant) case
11 - ... in which the face is associated not with one individual but with several individuals, one of which is the correct one - good (hesitating) case
01 - ... in which the face is associated with someone else - worst case
02 - ... in which the face is associated with several individuals (none of which is correct) - wrong but hesitating case
00 - ... in which the face is not associated with any of the seen faces - not bad case
S10: the neuron corresponding to the correct person's ID fired (+1), while all other neurons remained at rest (-1). This is the best-case performance: no hesitation in saying the person's name from a single video frame.
S11: the neuron corresponding to the correct person's ID fired (+1), but at least one other neuron also fired. This "hesitating" performance can also be considered good, as it can be taken into account when making the final decision based on the average (or majority) of decisions over several video frames. This result can also be used to disregard the frame as "confusing".
S01: the neuron corresponding to the correct person's ID did not fire (-1), while another name-tag neuron, corresponding to a different person, fired (+1). This is the worst-case result. It is, however, not always bad either. First, when this happens there are often other neurons that fire too, indicating an inconsistent decision; this is denoted as the S02 result. Second, unless this result persists over several consecutive frames (which in most cases it does not; see the log file), it can also be identified as an invalid result and thus discarded.
S00: none of the name-tag neurons fired. This result can also be considered a good one, as it indicates that the network does not recognize anybody. This is, in fact, what we want the network to produce when it examines a face that has not been seen previously, or when it examines a part of the video image that has been erroneously classified as a face by the video processing modules, which, as our experiments show, also happens.
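The five outcome categories can be expressed as a small classification routine over the name-tag neuron outputs. This is a minimal sketch under stated assumptions: outputs are taken to be +1 (fired) or -1 (at rest), with one neuron per person ID. The text does not say whether the S01 counts in the table include the S02 cases, so the sketch treats them as separate labels. The function name is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_outcome(outputs: Sequence[int], correct_id: Optional[int]) -> str:
    """Label one frame's neuron outputs as S10, S11, S01, S02 or S00.

    outputs[i] is +1 if the name-tag neuron of person i fired, -1 otherwise.
    correct_id is the index of the correct person, or None for an unknown face.
    """
    fired = [i for i, value in enumerate(outputs) if value > 0]
    if not fired:
        return "S00"  # nobody recognized
    if correct_id in fired:
        # Correct neuron fired: alone (S10) or together with others (S11).
        return "S10" if len(fired) == 1 else "S11"
    # Correct neuron did not fire: one wrong neuron (S01) or several (S02).
    return "S01" if len(fired) == 1 else "S02"


# Tally the categories over a few made-up frames for person ID 1.
frames = [[-1, 1, -1], [1, 1, -1], [1, -1, -1], [1, -1, 1], [-1, -1, -1]]
tally = Counter(classify_outcome(frame, correct_id=1) for frame in frames)
print(tally)
```

The example frames are invented for illustration and do not come from the database; each one triggers a different category, so the tally contains one count per label.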