Bloothooft, G. (1998). 'A Course in SpeechMania'. ELSNews 7.1, 10.


A Course in SpeechMania

Gerrit Bloothooft

In January, six academics attended a full-week course in SpeechMania 2.1, the Philips software package for building spoken dialogue systems. SpeechMania is a successful package, well known from the train travel information systems developed for the Swiss and German railway companies. It includes components for speech recognition, dialogue management, lexicon management, language models, and speech output. The package runs on a 200 MHz PC (MMX) with a telephone interface board. That’s all.

Our interest was to learn the SpeechMania system well enough to develop spoken dialogue systems with our students. We believe that the development of spoken dialogue systems will be a key element in future curricula in spoken language engineering. In spoken dialogue systems, speech technology and language technology meet, and this is of great value for the students’ education. It is not only the ability to develop systems that is important for the students; awareness of the problems, solved and unsolved, certainly matters as well. Therefore, the course by Philips Aachen was a challenge for both participants and trainers. We followed the course with a double interest. First, we wanted to learn how to use SpeechMania and how to build spoken dialogue systems. Second, we were preparing for questions from our own students and wished to know all the background and inside information about SpeechMania. In addition, we kept a critical eye on the didactic approach of the course. No doubt the trainers, Mark Hrabak and Frank Sassenscheid, had a difficult time with us, but they succeeded reasonably well!

After four days came the big moment when we could test our very first application of our own. It is absolutely exciting to see the system respond according to a dialogue flow of our own design. The dialogue design had been tested beforehand in off-line mode, but it was revealing to see what happens when real speech recognition comes in. Our error handling was far from user-friendly or complete! The language model, which we trained on the few tests we ran on each other's systems, could not improve the system's behavior much. Still, we were confident that, with more investment in prototyping and training, an acceptable system could be within reach.

The course began with a first day of introduction to spoken language systems and to the approach taken by SpeechMania. For academics in speech technology this could have been presented in more condensed form, but for newcomers to speech such an introduction is indispensable. The emphasis of the whole course was on HDDL, the dialogue development language. Two days were spent on the HDDL programming language, built upon a case study of a hotel room reservation system. Step by step we learned how to program actions, prompts, rules, implicit and explicit verifications, and subdialogues, how to handle the status graph, and so on. We could check our own programs in the offline mode of SpeechMania, which was very helpful in debugging. But sometimes it was hard to keep pace with the trainer and at the same time find the problems in our software. We badly felt the need for an HDDL editor. During the fourth day, we recorded our own prompts and built our recognition lexicon. The process of installing these tools could, in our view, have been automated; as it was, it took precious time. The tools themselves were not always self-explanatory, and we will definitely need to study the manuals at home. But it worked, and at the end of the fourth day we had our system in place. The last day was spent on the language model, which was trained on a few dialogues we had recorded and transcribed. Finally, we evaluated the system. Typical word error rates of 40% and concept error rates of 60% for our initial system leave a lot of room for improvement!
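
For readers less familiar with these evaluation figures, here is a minimal Python sketch of how a word error rate is commonly computed: the edit distance (substitutions, deletions, insertions) between the transcribed reference and the recognized words, divided by the number of reference words. This is not the SpeechMania evaluation tool; the function names and the example utterance are hypothetical. Applying the same computation to sequences of attribute-value pairs to obtain a concept error rate is an assumption made for illustration, since definitions of that measure vary.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j tokens of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the length of the reference."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(reference, hypothesis):
    """Word error rate for two whitespace-separated utterances."""
    return error_rate(reference.lower().split(), hypothesis.lower().split())


if __name__ == "__main__":
    # Hypothetical utterance from a hotel reservation dialogue.
    spoken = "i want a double room for two nights"
    recognized = "i want a room for two lights"
    print(word_error_rate(spoken, recognized))

    # Hypothetical concept sequences (attribute=value pairs).
    print(error_rate(["room_type=double", "nights=2"], ["nights=2"]))
```

Note that the rate is normalized by the reference length, so many inserted words can push it above 100%.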

We were pretty tired afterwards. Looking back, we learned the basics of a new programming language (HDDL) and used lots of tools for the first time. Of course, this is a confusing process now and then. But the real proof of the pudding will come at our universities, where we have to install the package and get all the tools up and running. It is reassuring that the Philips Speech Processing Division in Aachen won a Philips prize for best consumer service; I think we will need it during the coming months.

SpeechMania is a powerful package for fast initial development of spoken dialogue systems. Of course, training and tuning of the system will still take a lot of time. Although SpeechMania has a modular structure, the recognition module cannot be replaced. In general one has to use the HMMs owned by Philips, which are available for various languages. The system prompts are pre-recorded, but could be replaced by a text-to-speech system of one's own choice. For academics, the system can be obtained for a reasonable price, which includes the one-week course for two persons; that course is an absolute necessity.

Gerrit Bloothooft
Utrecht Institute of Linguistics OTS
Trans 10, 3512 JK Utrecht, The Netherlands

Christian Dugast
Philips Aachen
Speech Processing Division
Kackertstrasse 10, Aachen, Germany