Toward Content-Based Audio Indexing and Retrieval and a New Speaker Discrimination Technique (1995) [12 citations — 0 self]
Abstract:
Several techniques for identifying segment transitions in an audio stream are discussed. Gross features are identified first and used to control more detailed and computationally expensive analysis downstream. Pitch is tracked using some basic streaming principles and then used as one cue to speaker transitions. A novel speaker discrimination technique is described that makes a segmentation decision when a continuously updated model of the current speaker suddenly ceases to account sufficiently for the input data.
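The segmentation idea in the last sentence of the abstract can be illustrated with a minimal sketch. The abstract does not say what the speaker model is or how "sufficiently account for" is measured, so everything concrete here is an assumption made for illustration: a diagonal Gaussian over per-frame feature vectors, updated with exponential forgetting, and a fixed threshold on the average negative log-likelihood. The names `RunningSpeakerModel`, `segment`, and `synthetic_frames`, and the parameters `forget`, `threshold`, `warmup`, and `patience`, are hypothetical and are not taken from the paper.

```python
import math


class RunningSpeakerModel:
    """Assumed model: diagonal Gaussian with exponentially forgotten statistics."""

    def __init__(self, dim, forget=0.05, var_floor=1e-3):
        self.dim = dim
        self.forget = forget        # weight given to each new frame
        self.var_floor = var_floor  # keeps the log-likelihood finite
        self.reset()

    def reset(self):
        self.mean = None
        self.var = [1.0] * self.dim
        self.count = 0

    def surprise(self, x):
        """Average per-dimension negative log-likelihood of frame x."""
        total = 0.0
        for xi, m, v in zip(x, self.mean, self.var):
            v = max(v, self.var_floor)
            total += 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        return total / self.dim

    def update(self, x):
        """Fold frame x into the model (exponentially weighted mean and variance)."""
        if self.mean is None:
            self.mean = list(x)
        else:
            a = self.forget
            for i, xi in enumerate(x):
                d = xi - self.mean[i]
                self.mean[i] += a * d
                self.var[i] = (1.0 - a) * (self.var[i] + a * d * d)
        self.count += 1


def segment(frames, threshold=8.0, warmup=10, patience=3):
    """Return frame indices where the current model stops explaining the input.

    A boundary is declared after `patience` consecutive frames whose surprise
    exceeds `threshold`; the model is then restarted on those frames.
    """
    if not frames:
        return []
    model = RunningSpeakerModel(len(frames[0]))
    boundaries, misses = [], 0
    for t, x in enumerate(frames):
        if model.count >= warmup and model.surprise(x) > threshold:
            misses += 1
            if misses >= patience:
                start = t - patience + 1
                boundaries.append(start)
                model.reset()
                for y in frames[start:t + 1]:
                    model.update(y)  # re-seed the model on the new segment
                misses = 0
            continue  # poorly explained frames do not update the old model
        misses = 0
        model.update(x)
    return boundaries


def synthetic_frames(n_a=50, n_b=50, offset=5.0):
    """Hypothetical 2-D test features: two 'speakers' separated by `offset`."""
    frames = []
    for t in range(n_a + n_b):
        base = 0.0 if t < n_a else offset
        frames.append((base + 0.3 * math.sin(1.7 * t),
                       base + 0.3 * math.cos(2.3 * t)))
    return frames


if __name__ == "__main__":
    print(segment(synthetic_frames()))
```

Frames that the model explains poorly are withheld from the update, so a brief outlier does not drag the model toward a new speaker; only a sustained mismatch triggers a boundary. In practice the feature vectors, the model family, and the threshold would all have to come from the paper itself or from tuning on real audio.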
Citations
| 346 | Perceptual linear predictive (PLP) analysis of speech – Hermansky - 1990 |
| 148 | Content-Based Video Indexing and Retrieval – Smoliar, Zhang |
| 90 | Control methods used in a study of the vowels – Peterson, Barney - 1952 |
| 32 | Structure out of Sound – Hawley - 1993 |
| 18 | Research on individuality features in speech waves and automatic speaker recognition techniques – Furui - 1986 |
| 16 | A spectral network model of pitch perception – Grossberg - 1995 |
| 2 | Spectral analysis of sung vowels. III. Characteristics of singers and modes of singing – Bloothooft, Plomp - 1986 |
| 2 | Auditory Scene Analysis – Bregman - 1990 |
| 2 | Suggested formulae for calculating auditory filter bandwidths and excitation patterns – Moore, Glasberg - 1983 |

