The remainder of the paper, including the description of the formal study we performed, discusses the Affective Reasoner in virtual actor mode. AR agents can interact with subjects in real time using a multimodal approach that includes speech recognition, text-to-speech, real-time morphed schematic faces, and music. In virtual actor mode, the agents are given varying degrees of stage direction: (a) explicit instructions (for face, inflection, size, color, location, music selection, and MIDI and audio volume); (b) more general instructions, in which they are given the emotion and the text and pick their own faces, music, color, inflection, and size, as in the following study; and (c) a degree of freedom, in which they participate in picking the emotion based on their personality.
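One way to picture the three degrees of stage direction is as a question of which presentation decisions are left to the agent. The sketch below is hypothetical: the channel names, the `DirectionLevel` enum, and the `agent_chooses` helper are illustrative assumptions, not the Affective Reasoner's actual interface. It assumes the director always supplies the text, and that any channel the director does not specify falls to the agent.

```python
from enum import Enum

# Hypothetical names for the presentation channels listed under level (a).
CHANNELS = ("face", "inflection", "size", "color", "location",
            "music", "midi_volume", "audio_volume")


class DirectionLevel(Enum):
    EXPLICIT = "a"       # director specifies every channel
    EMOTION_GIVEN = "b"  # director gives emotion and text; agent picks the rest
    FREE = "c"           # agent also takes part in picking the emotion


def agent_chooses(level, direction):
    """Return the set of items the agent must decide for itself.

    `direction` is a hypothetical dict of what the director supplied.
    """
    if "text" not in direction:
        raise ValueError("the director always supplies the text")
    unspecified = {c for c in CHANNELS if c not in direction}
    if level is DirectionLevel.EXPLICIT:
        if unspecified:
            raise ValueError(f"explicit direction lacks: {sorted(unspecified)}")
        return set()
    if level is DirectionLevel.EMOTION_GIVEN:
        if "emotion" not in direction:
            raise ValueError("level (b) requires an emotion")
        return unspecified
    # Level (c): the agent helps choose the emotion as well as the channels.
    return unspecified | {"emotion"}
```

For example, `agent_chooses(DirectionLevel.EMOTION_GIVEN, {"text": "...", "emotion": "joy"})` leaves every channel to the agent, which corresponds roughly to the condition used in the study.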
In one virtual actor presentation, four agents participated in a dialog in various combinations. Two of the agents were ``Chicago Bulls'' fans and two were ``New York Knicks'' fans. Without any change to the text of the dialog, the agents were able to make their allegiances clear, and viewers showed good agreement about how each agent felt about the events in the game. This held whether two Bulls fans were talking, two Knicks fans, one of each, or all four together. An example of the spoken text is ``I was really worried about the game tonight. I thought Michael Jordan started out really slowly. Then he just wiped the floor with the Knicks in the second half,'' and so on. Any sentence could be spoken by any agent, since every sentence was simply a statement of what happened. It was the agents' portrayal of their interpretations of the described events that conveyed the message.
In another application, children as young as two years old used a speech-driven interface to control storytelling applications in which virtual actors delivered children's stories.
In a recent study (described below), we hoped to show that users could gather enough information from the agents' different multimedia communication modalities to correctly assign intended, complex social and emotional meanings to ambiguous sentences, and specifically that this ability would compare favorably with a human actor's ability to convey such meanings.
In fact, subjects did significantly better at correctly matching videotapes of computer-generated virtual actors with the intended emotion scenarios (70%) than they did with videotapes of a professional human actor attempting to convey the same scenarios.