H02D0A Language Engineering Applications

Spoken dialog technology: from VoiceXML to Markov decision processes

Hugo Van hamme
ESAT-PSI (K.U.Leuven)

Abstract

In this lecture, we discuss how spoken dialog systems can be built. In the most popular approach, the flow of simple spoken human-machine dialogs is described by a finite-state machine. VoiceXML (Voice Extensible Markup Language) is a language designed for this task, and its main features and programming constructs are treated. The limitations of the speech-only mode and of finite-state dialog flow are then explored, and solutions that are being designed to overcome these limitations are discussed. Finally, a statistical approach to dialog modeling is explained.

Slides

References

Steve Young et al. The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management. Computer Speech and Language 24 (2010) 150-174.

Links

VoiceXML: http://www.w3.org/TR/voicexml20/
SRGS: http://www.w3.org/TR/speech-grammar
SSML: http://www.w3.org/TR/speech-synthesis/
SISR: http://www.w3.org/TR/semantic-interpretation/