|United States Patent||6,219,640|
|Basu, et al.||April 17, 2001|
Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification. In a supervised utterance verification embodiment, the processed video signal is compared with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.
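As an illustration only, the score combination approach named above can be sketched as a weighted sum of per-speaker audio and video scores. The abstract does not specify the combination rule, so the weighted sum, the function names (`combine_scores`, `identify`, `verify`), the `audio_weight` parameter, and the decision threshold are all assumptions made for this sketch and are not taken from the patent.

```python
from typing import Dict, Tuple


def combine_scores(audio_scores: Dict[str, float],
                   video_scores: Dict[str, float],
                   audio_weight: float = 0.5) -> Dict[str, float]:
    """Fuse per-speaker scores with a weighted sum (assumed rule, not from the patent)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    # Only speakers scored by both modalities can be fused.
    common = audio_scores.keys() & video_scores.keys()
    return {
        speaker: audio_weight * audio_scores[speaker]
        + (1.0 - audio_weight) * video_scores[speaker]
        for speaker in common
    }


def identify(audio_scores: Dict[str, float],
             video_scores: Dict[str, float],
             audio_weight: float = 0.5) -> Tuple[str, float]:
    """Identification: return the speaker with the highest fused score."""
    fused = combine_scores(audio_scores, video_scores, audio_weight)
    if not fused:
        raise ValueError("no speaker was scored by both modalities")
    best = max(fused, key=fused.get)
    return best, fused[best]


def verify(claimed: str,
           audio_scores: Dict[str, float],
           video_scores: Dict[str, float],
           threshold: float,
           audio_weight: float = 0.5) -> bool:
    """Verification: accept the claimed identity if its fused score meets the threshold."""
    fused = combine_scores(audio_scores, video_scores, audio_weight)
    return fused.get(claimed, float("-inf")) >= threshold


if __name__ == "__main__":
    # Hypothetical scores for two enrolled speakers.
    audio = {"alice": 0.9, "bob": 0.4}
    video = {"alice": 0.3, "bob": 0.8}
    print(identify(audio, video, audio_weight=0.7))
    print(verify("bob", audio, video, threshold=0.6, audio_weight=0.7))
```

In this sketch a larger `audio_weight` favors the audio stream, which might suit clean speech with poor video, while a smaller value favors the visual stream.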
|Inventors:||Basu; Sankar (Tenafly, NJ); Beigi; Homayoon S. M. (Yorktown Heights, NY); Maes; Stephane Herman (Danbury, CT); Ghislain Maison; Benoit Emmanuel (White Plains, NY); Neti; Chalapathy Venkata (Yorktown Heights, NY); Senior; Andrew William (New York, NY)|
|Assignee:||International Business Machines Corporation (Armonk, NY)|
|Filed:||August 6, 1999|
|Current U.S. Class:||704/246; 704/231; 704/273|
|Intern'l Class:||G10L 015/00|
|Field of Search:||382/115,118 379/88.02 704/273,246,231,251,275|
|4449189||May, 1984||Feix et al.||704/272.|
|5412738||May, 1995||Brunelli et al.||382/115.|
|5602933||Feb., 1997||Blackwell et al.||382/116.|
|5897616||Apr., 1999||Kanevsky et al.||704/246.|
C. Neti et al., "Audio-Visual Speaker Recognition For Video Broadcast News", Proceedings of the ARPA HUB4 Workshop, Washington, D.C., pp. 1-3, Mar. 1999.
A.W. Senior, "Face and Feature Finding For a Face Recognition System," Second International Conference on Audio- and Video-based Biometric Person Authentication, Washington, D.C., pp. 1-6, Mar. 1999.
P. De Cuetos et al., "Frontal Pose Detection for Human-Computer Interaction," pp. 1-12, Jun. 23, 1999.
R. Stiefelhagen et al., "Real-Time Lip-Tracking for Lipreading," Interactive Systems Laboratories, University of Karlsruhe, Germany and Carnegie Mellon University, U.S.A., pp. 1-4, Apr. 27, 1998.
P.N. Belhumeur et al., "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. on PAMI, pp. 1-34, Jul. 1997.
N.R. Garner et al., "Robust Noise Detection for Speech Detection and Enhancement," IEE, pp. 1-2, Nov. 5, 1996.
H. Ney, "On the Probabilistic Interpretation of Neural Network Classifiers and Discriminative Training Criteria," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, No. 2, pp. 107-112, Feb. 1995.
L. Wiskott et al., "Recognizing Faces by Dynamic Link Matching," ICANN '95, Paris, France, pp. 347-352, 1995.
A.H. Gee et al., "Determining the Gaze of Faces in Images," University of Cambridge, Cambridge, England, pp. 1-20, Mar. 1994.
C. Bregler et al., "Eigenlips For Robust Speech Recognition," IEEE, pp. II-669-II-672.
C. Benoit et al., "Which Components of the Face Do Humans and Machines Best Speechread?," Institut de la Communication Parlee, Grenoble, France, pp. 315-328.
Q. Summerfield, "Use of Visual Information for Phonetic Perception," Visual Information for Phonetic Perception, MRC Institute of Hearing Research, University Medical School, Nottingham, pp. 314-330.
N. Kruger et al., "Determination of Face Position and Pose With a Learned Representation Based on Label Graphs," Ruhr-Universität Bochum, Bochum, Germany and University of Southern California, Los Angeles, CA, pp. 1-19.
G. Potamianos et al., "Discriminative Training of HMM Stream Exponents for Audio Visual Speech Recognition," AT&T Labs Research, Florham and Red Bank, NJ, pp. 1-4.