ICASSP 1989 Vernooy et al abstract

Vernooij, G.J., Bloothooft, G. and Holsteijn, Y. van (1989). 'A simulation study on the usefulness of broad phonetic classification in automatic speech recognition', Proc. ICASSP Conference, Glasgow, 85-88.

A SIMULATION STUDY ON THE USEFULNESS OF BROAD PHONETIC CLASSIFICATION IN AUTOMATIC SPEECH RECOGNITION.

Gertjan Vernooy
Gerrit Bloothooft
Yvonne van Holsteijn

Research Institute of Language and Speech
University of Utrecht, The Netherlands

Investigations of the use of broad phonetic classes in automatic speech recognition systems are mostly limited to the level of the broad categories themselves. Results are reported in terms of the number of cohorts, the maximum cohort size, the expected cohort size, etc. However, the aim of automatic speech recognition is not to identify broad phonetic classes but individual words. A simulation study, using a lexicon of 12,113 high-frequency Dutch words, was conducted to make an inventory of the additional acoustic information needed to identify every word in the lexicon uniquely after broad phonetic classification.
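The cohort statistics mentioned above (number of cohorts, maximum cohort size, expected cohort size) can be computed in a few lines. The following is a minimal sketch, not the study's code. The broad-class mapping `BROAD`, the toy lexicon and all function names are hypothetical. "Expected cohort size" is assumed here to mean sum(n_i^2) / N, the mean size of the cohort containing a word drawn uniformly from the lexicon, with word frequencies ignored.

```python
from collections import defaultdict

# Hypothetical broad-class mapping for illustration only (not the study's classes):
# V = vowel, P = plosive, F = fricative, N = nasal, L = liquid/glide.
BROAD = {
    **dict.fromkeys("aeiou", "V"),
    **dict.fromkeys("ptkbd", "P"),
    **dict.fromkeys("fsvzx", "F"),
    **dict.fromkeys("mn", "N"),
    **dict.fromkeys("lrwj", "L"),
}

# Toy lexicon (hypothetical): word -> phoneme sequence, one symbol per phoneme.
TOY_LEXICON = {
    "man": ("m", "a", "n"), "nam": ("n", "a", "m"),
    "dak": ("d", "a", "k"), "bak": ("b", "a", "k"),
    "tak": ("t", "a", "k"), "pot": ("p", "o", "t"),
    "zon": ("z", "o", "n"),
}

def broad_label(phonemes):
    """Map a phoneme sequence to its sequence of broad-class labels."""
    return tuple(BROAD[p] for p in phonemes)

def make_cohorts(lexicon):
    """Group words that share the same broad-label sequence."""
    cohorts = defaultdict(list)
    for word, phonemes in lexicon.items():
        cohorts[broad_label(phonemes)].append(word)
    return dict(cohorts)

def cohort_stats(cohorts):
    """Number of cohorts, maximum cohort size and expected cohort size.

    Expected cohort size = sum(n_i ** 2) / N, i.e. the mean size of the cohort
    containing a word drawn uniformly from the lexicon (no frequency weighting).
    """
    sizes = [len(words) for words in cohorts.values()]
    n_words = sum(sizes)
    return {
        "n_cohorts": len(sizes),
        "max_size": max(sizes),
        "expected_size": sum(n * n for n in sizes) / n_words,
    }

print(cohort_stats(make_cohorts(TOY_LEXICON)))
# {'n_cohorts': 3, 'max_size': 4, 'expected_size': 3.0}
```

In the toy lexicon, /d/, /b/, /t/, /k/ and /p/ all collapse to P, so "dak", "bak", "tak" and "pot" land in one cohort of four. These statistics describe the broad classes only and say nothing about how those four words would then be told apart, which is the gap the study addresses.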

After broad classification the lexicon is divided into a number of cohorts, each of which shares a unique sequence of global acoustic labels. When a cohort contains more than one word, further acoustic processing is needed to identify each word separately. To this end, one or more of the broad phonetic labels must be refined into finer phonetic categories (possibly phonemes). We found that all words in a cohort can be completely identified after refining any of several different subsets of the cohort's labels; that is, there exist a number of different refinement strategies that are all adequate. In our investigation we examined two criteria for choosing the best of these strategies. The first criterion was to minimize the number of different acoustical refinements, irrespective of the type of refinement (including, for instance, an /n/-/m/ distinction). The second criterion was to arrive at acoustical refinements that are relatively simple to perform (for instance, choosing to resolve an /a/-/u/ and a /b/-/d/ distinction rather than the distinction between /n/ and /m/). The latter criterion made use of data on perceptual confusions between phonemes.
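One way to make the two selection criteria concrete is to model a refinement as a pairwise phoneme distinction and to treat each cohort as a covering problem. Under this model, a strategy is adequate if every pair of words in the cohort differs, at some position, by one of the chosen distinctions. The sketch below enumerates adequate strategies by brute force, which is feasible only for small cohorts. It then applies criterion 1 (fewest distinctions) and criterion 2 (lowest summed difficulty). The representation, the function names, the toy four-phoneme items and the confusion rates in `CONFUSION` are invented for illustration. The study's own formulation and perceptual data are not reproduced here, and summing confusion rates is only one crude way to score difficulty.

```python
from itertools import combinations

def pair_distinctions(w1, w2):
    """Phoneme distinctions (unordered pairs) that would tell w1 from w2."""
    return {frozenset((p, q)) for p, q in zip(w1, w2) if p != q}

def is_adequate(cohort, refinements):
    """True if every word pair differs by at least one chosen distinction."""
    return all(pair_distinctions(a, b) & refinements
               for a, b in combinations(cohort, 2))

def adequate_strategies(cohort):
    """Yield every adequate set of distinctions, smallest sets first.

    Brute force over all subsets: exponential, so only for small cohorts.
    A one-word cohort yields the empty set (no refinement needed).
    """
    candidates = set().union(*(pair_distinctions(a, b)
                               for a, b in combinations(cohort, 2)))
    ordered = sorted(candidates, key=sorted)  # deterministic order
    for k in range(len(ordered) + 1):
        for subset in combinations(ordered, k):
            if is_adequate(cohort, set(subset)):
                yield frozenset(subset)

def fewest_refinements(cohort):
    """Criterion 1: fewest distinctions, whatever their type."""
    return min(adequate_strategies(cohort), key=len)

def cheapest_refinements(cohort, cost):
    """Criterion 2: lowest summed difficulty (cost: distinction -> difficulty)."""
    return min(adequate_strategies(cohort),
               key=lambda s: sum(cost[d] for d in s))

# Hypothetical confusion rates used as difficulty scores (invented numbers).
CONFUSION = {
    frozenset(("n", "m")): 0.30,
    frozenset(("a", "u")): 0.05,
    frozenset(("b", "d")): 0.10,
}

# Toy cohort: four items sharing the broad-label sequence N V P N.
COHORT = [("n", "a", "b", "n"), ("n", "u", "b", "m"),
          ("m", "a", "d", "m"), ("m", "u", "d", "n")]

print(sorted(map(sorted, fewest_refinements(COHORT))))
# [['m', 'n']]
print(sorted(map(sorted, cheapest_refinements(COHORT, CONFUSION))))
# [['a', 'u'], ['b', 'd']]
```

With these made-up rates, criterion 1 selects the single /n/-/m/ distinction, while criterion 2 prefers the /a/-/u/ plus /b/-/d/ pair, mirroring the example in the text. Homophones within a cohort can never be separated by any distinction. For such a cohort `adequate_strategies` yields nothing and both `min` calls raise `ValueError`.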

On the basis of these criteria we developed an iterative procedure that resulted in an optimal hierarchy of the acoustical information needed to identify each word of the lexicon uniquely. We will present the resulting broad phonetic classes and the additional acoustic refinements needed to identify all words in the cohorts defined by these classes. This information may be useful for the design of an acoustic-phonetic speech recognition system.
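The iterative procedure itself is not spelled out in this abstract. As a stand-in, the sketch below uses a greedy set-cover heuristic, which is a substitution and not the authors' method. At each step it adds the distinction that resolves the most still-confusable word pairs, across all cohorts, per unit of hypothetical difficulty. The order in which distinctions are added then serves as a hierarchy. Greedy set cover is not guaranteed to be optimal, so this should not be read as reproducing the study's "optimal hierarchy". The costs and toy cohorts are invented, and `greedy_hierarchy` is a hypothetical name.

```python
from itertools import combinations

def pair_distinctions(w1, w2):
    """Phoneme distinctions (unordered pairs) that would tell w1 from w2."""
    return {frozenset((p, q)) for p, q in zip(w1, w2) if p != q}

def greedy_hierarchy(cohorts, cost):
    """Order distinctions greedily until all separable word pairs are resolved.

    cohorts: list of cohorts, each a list of equal-length phoneme tuples.
    cost: dict mapping each distinction frozenset({p, q}) to a positive difficulty.
    Returns the distinctions in the order they were chosen.
    """
    # Each unresolved pair is stored as the set of distinctions that would resolve it.
    unresolved = []
    for cohort in cohorts:
        for a, b in combinations(cohort, 2):
            needed = pair_distinctions(a, b)
            if needed:  # homophones (empty set) can never be resolved
                unresolved.append(needed)
    hierarchy = []
    while unresolved:
        candidates = sorted(set().union(*unresolved), key=sorted)
        # Pick the distinction resolving the most pairs per unit of difficulty;
        # max() keeps the first of any tied candidates, so ties go alphabetically.
        best = max(candidates,
                   key=lambda d: sum(d in s for s in unresolved) / cost[d])
        hierarchy.append(best)
        unresolved = [s for s in unresolved if best not in s]
    return hierarchy

# Hypothetical difficulty scores and toy cohorts (invented for illustration).
COST = {frozenset(("n", "m")): 0.30, frozenset(("a", "u")): 0.05,
        frozenset(("b", "d")): 0.10}
COHORTS = [
    [("n", "a", "b", "n"), ("n", "u", "b", "m"),
     ("m", "a", "d", "m"), ("m", "u", "d", "n")],
    [("m", "a", "n"), ("n", "a", "m")],
]

print([sorted(d) for d in greedy_hierarchy(COHORTS, COST)])
# [['a', 'u'], ['b', 'd'], ['m', 'n']]
```

Here the cheap /a/-/u/ distinction is chosen first because it resolves four pairs at low cost. /b/-/d/ then resolves two of the remaining three pairs, and /n/-/m/ is added last only because the second cohort ("man" vs "nam") cannot be separated any other way.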