We are currently integrating our work with the World Wide Web. All aspects of the presentations (MIDI music, morphing faces, text-to-speech) have been tested as applications that run, transparently to the calling modules, as either local or remote applications, with remote applications established through the Web. Because of licensing agreements, text-to-speech output is reduced to RealAudio format before it is transmitted; higher-quality, lower-bandwidth reproduction is available if the client holds an AT&T text-to-speech license. Combined transmission of the real-time signal requires under 14 kbps.
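The location transparency described above can be sketched as a simple dispatch pattern: calling modules receive a service object with one interface, and only the factory knows whether the work happens locally or on a remote Web host. This is an illustrative sketch with hypothetical names, not our actual module interface.

```python
# Illustrative sketch (hypothetical names): a presentation component that
# calling modules invoke identically whether it runs locally or remotely.

class LocalSpeech:
    """Runs the synthesizer on the local machine."""
    def speak(self, text):
        return f"local-audio({text})"

class RemoteSpeech:
    """Forwards the request to a remote host reached through the Web."""
    def __init__(self, host):
        self.host = host
    def speak(self, text):
        # A real system would ship the request over a socket or HTTP here.
        return f"remote-audio({self.host}:{text})"

def make_speech_service(remote_host=None):
    """Calling modules use the returned object without knowing its location."""
    return RemoteSpeech(remote_host) if remote_host else LocalSpeech()
```

The same pattern applies to the MIDI and face-morphing components: the caller's code path is identical in both configurations, which is what lets each presentation run locally or remotely without changes to the modules that drive it.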
While not central to the theoretical component of our work, the fact that our emotion-reasoning and presentation mechanisms can be integrated into a Web-based environment allows for significant data-collection possibilities and opens up additional applications. Over the years we have consistently operated under the constraints imposed by a low-bandwidth approach supported by inexpensive hardware. Because of this we can speculate on the very real possibility of constructing real-time, truly multimodal, interactive Internet applications that operate at a social level.
Various delivery methods have been used, ranging from client-resident Lisp interpreters, to small multi-port routing modules called from Web clients, to Java applications. The delivery mechanism is less important than the ratio of usable social information to the number of bits transmitted, a ratio we have shown to be effective over a 14.4 kbps modem.
We have additionally run trials using RealAudio-encoded signals as input to the speech-recognition package, and we believe this to be a viable mechanism for running the speech-recognition components of our research over the Web.