United States Patent | 6,345,252 |
Beigi ,   et al. | February 5, 2002 |
Methods and apparatus are provided for retrieving audio information based on the audio content as well as the identity of the speaker. The results of content-based and speaker-based audio information retrieval methods are combined to provide references to audio information (and indirectly to video). A query search system retrieves information responsive to a textual query containing a text string (one or more keywords) and the identity of a given speaker. An indexing system transcribes and indexes the audio information to create time-stamped content index file(s) and speaker index file(s). An audio retrieval system uses the generated content and speaker indexes to perform query-document matching based on the audio content and the speaker identity. Documents satisfying the user-specified content and speaker constraints are identified by comparing the start and end times of the document segments in both the content and speaker domains. These documents are assigned a combined score that is used to rank-order the documents returned to the user, with the best-matched segments at the top of the list.
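The matching step described in the abstract (comparing segment start and end times across the content and speaker domains, then ranking by a combined score) can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Segment` and `Match` types, the `rank_matches` function, and the `speaker_weight` parameter are introduced only for illustration. The abstract does not state how the two scores are combined, so the weighted sum scaled by the overlap fraction below is an assumption, not the patented formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One time-stamped hit from either index (hypothetical structure)."""
    doc_id: str
    start: float  # seconds from the start of the audio document
    end: float
    score: float  # content-match score or speaker-match score


@dataclass(frozen=True)
class Match:
    """A region where a content hit and a speaker hit overlap in time."""
    doc_id: str
    start: float
    end: float
    combined: float


def rank_matches(content_hits: List[Segment],
                 speaker_hits: List[Segment],
                 speaker_weight: float = 0.5) -> List[Match]:
    """Keep content hits that overlap a speaker hit in the same document.

    Assumed scoring: content score plus the speaker score, weighted by
    speaker_weight and by the fraction of the content segment covered.
    """
    matches = []
    for c in content_hits:
        duration = c.end - c.start
        if duration <= 0:
            continue  # skip malformed or empty segments
        for s in speaker_hits:
            if s.doc_id != c.doc_id:
                continue
            start = max(c.start, s.start)
            end = min(c.end, s.end)
            if end <= start:
                continue  # no overlap in time
            fraction = (end - start) / duration
            combined = c.score + speaker_weight * s.score * fraction
            matches.append(Match(c.doc_id, start, end, combined))
    # Best-matched segments first, as in the ranked list returned to the user.
    matches.sort(key=lambda m: m.combined, reverse=True)
    return matches
```

A nested loop is used for clarity; with large indexes, sorting both hit lists by start time and sweeping through them once would avoid the quadratic comparison.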
Inventors: | Beigi; Homayoon Sadr Mohammad (Yorktown Heights, NY); Tritschler; Alain Charles Louis (New York, NY); Viswanathan; Mahesh (Yorktown Heights, NY) |
Assignee: | International Business Machines Corporation (Armonk, NY) |
Appl. No.: | 288724 |
Filed: | April 9, 1999 |
Current U.S. Class: | 704/272; 704/275; 704/500; 704/251 |
Intern'l Class: | G10L 015/22 |
Field of Search: | 704/231,250,238,236,251,255,260,200,270,272,275 |
6185527 | Feb., 2001 | Petkovic et al. | 704/231. |
Proceedings of the Speech Recognition Workshop. C. Neti et al., "Audio Visual Speaker Recognition for video Broadcast News" 1999.* ICASSP-97. 1997 IEEE International Conference on Acoustics, Speech and Signal Processing. Roy et al., Speaker Identification based Text to Audio Alignment for audio Retrieval System, Apr. 1997.* ICIP 98. Proceedings. International Conference on Image Processing, 1998, Tsekeridou et al. "Speaker dependent video indexing based on audio-visual interaction". Pp. 358-362 vol. 1. Oct. 1998.* 1996 IEEE Multimedia. Wold et al. "Content based classification, search, and retrieval of audio" pp. 27-36. Fall 1996.* S. Dharanipragada et al., "Experimental Results in Audio Indexing," Proc. ARPA SLT Workshop, (Feb. 1996). L. Polymenakos et al., "Transcription of Broadcast News--Some Recent Improvements to IBM's LVCSR System," Proc. ARPA SLT Workshop, (Feb. 1996). R. Bakis, "Transcription of Broadcast News Shows with the IBM Large Vocabulary Speech Recognition System," Proc. ICASSP98, Seattle, WA (1998). H. Beigi et al., "A Distance Measure Between Collections of Distributions and its Application to Speaker Recognition," Proc. ICASSP98, Seattle, WA (1998). S. Chen, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion," Proceedings of the Speech Recognition Workshop (1998). S. Chen et al., "Clustering via the Bayesian Information Criterion with Applications in Speech Recognition," Proc. ICASSP98, Seattle, WA (1998). S. Chen et al., "IBM's LVCSR System for Transcription of Broadcast News Used in the 1997 Hub4 English Evaluation," Proceedings of the Speech Recognition Workshop (1998). S. Dharanipragada et al., "A Fast Vocabulary Independent Algorithm for Spotting Words in Speech," Proc. ICASSP98, Seattle, WA (1998). J. Navratil et al., "An Efficient Phonotactic-Acoustic system for Language Identification," Proc. ICASSP98, Seattle, WA (1998). G. N. Ramaswamy et al., "Compression of Acoustic Features for Speech Recognition in Network Environments," Proc. ICASSP98, Seattle, WA (1998). S. Chen et al., "Recent Improvements to IBM's Speech Recognition System for Automatic Transcription of Broadcast News," Proceedings of the Speech Recognition Workshop (1999). S. Dharanipragada et al., "Story Segmentation and Topic Detection in the Broadcast News Domain," Proceedings of the Speech Recognition Workshop (1999). C. Neti et al., "Audio-Visual Speaker Recognition for Video Broadcast News," Proceedings of the Speech Recognition Workshop (1999). |