
Is it something I did?

At the Gesture and Narrative Language research group at the MIT Media Lab, and previously at the University of Pennsylvania's Center for Human Modeling and Simulation, Justine Cassell has led a number of projects centering on gesture and facial animation. Of particular note here are Animated Conversation [Cassell et al. 1994], Gandalf [Thorisson1997] (see figure 8), and BodyChat [Vilhjalmsson & Cassell1998].

In BodyChat the group is developing semi-independent avatars (the user's representation in a virtual world) as communicative agents: avatars for face-to-face communication are given the ability to autonomously generate behaviors for stance, gaze, and gesture. The autonomous behaviors are rooted in context analysis and discourse theory, and studies of human communicative behavior inform the design of the lifelike agents. Unlike work that explicitly seeks to make avatars and other human-like agents more emotionally expressive, this research focuses on those aspects of displays that are explicitly communicative in nature (cf. [Chovil1992]).
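The general flavor of such a mapping from communicative function to nonverbal behavior can be suggested with a small sketch in Python. The function and behavior names below are illustrative assumptions, not BodyChat's actual vocabulary or code.

    # Illustrative sketch only: a table-driven mapping from discourse-level
    # communicative functions to nonverbal surface behaviors. The function and
    # behavior names are assumptions, not BodyChat's actual vocabulary.
    FUNCTION_TO_BEHAVIORS = {
        "request_turn":  ["raise_hands_into_gesture_space", "gaze_at_speaker"],
        "give_turn":     ["gaze_at_listener", "relax_hands"],
        "give_feedback": ["head_nod", "brief_smile"],
        "emphasize":     ["beat_gesture", "eyebrow_raise"],
    }

    def realize(functions):
        """Expand communicative functions into a flat list of surface behaviors."""
        behaviors = []
        for f in functions:
            behaviors.extend(FUNCTION_TO_BEHAVIORS.get(f, []))
        return behaviors

    # Example: context analysis decides the avatar should acknowledge the
    # current speaker and then request the floor.
    print(realize(["give_feedback", "request_turn"]))

The point of the table-driven form is that the discourse analysis reasons about functions (requesting the turn, giving feedback), while the surface realization of those functions is left to the avatar.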

The thesis of this research is that modeling visual communicative behavior for attention focus, turn-taking, facial expression, blinking, and the like is crucial for credibility in avatar-based discourse. For example, when a user enters text messages for delivery to a virtual world in which an avatar represents the user, the avatar's behavior typically stops or loops. This is not natural, and it detracts from both believability and communicative bandwidth. In natural discourse environments, spontaneous and transition behaviors take place in parallel with cognitively directed communication. This suggests that producing credible ``physical'' behavior in real time requires that the avatar act autonomously. In this paradigm the user is free to focus on the message to be communicated, while the system handles supporting behaviors.
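A minimal sketch of this division of labor, assuming a simple threaded loop and invented behavior names, might look as follows; it is not a description of BodyChat's implementation.

    # Sketch of the division of labor: spontaneous behavior runs autonomously,
    # in parallel, while the user composes a message. Behavior names, timing,
    # and the print-based "renderer" are all illustrative assumptions.
    import queue, random, threading, time

    user_messages = queue.Queue()            # filled by the chat interface
    stop = threading.Event()

    def autonomous_loop(send_behavior):
        """Blinks, gaze shifts, and posture shifts keep the avatar 'alive'."""
        while not stop.is_set():
            send_behavior(random.choice(["blink", "gaze_shift", "weight_shift"]))
            time.sleep(random.uniform(0.5, 2.0))

    def deliver(text, send_behavior, send_text):
        """Wrap each user message in supporting turn-taking behavior."""
        send_behavior("gaze_at_listener")    # take the turn
        send_text(text)
        send_behavior("give_turn")           # hand the floor back

    threading.Thread(target=autonomous_loop, args=(print,), daemon=True).start()
    user_messages.put("Is it something I did?")
    time.sleep(3)                            # the avatar does not freeze or loop
    deliver(user_messages.get(), print, print)
    stop.set()

The essential point is that the autonomous loop never waits on the user: spontaneous and transition behaviors continue even while the message channel is idle.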

A natural inclination is to assume that, as technology improves for capturing user movements and expressions in immersive environments, this capability can be applied to avatar representations as well (see, e.g., the work at UPenn's Center for Human Modeling and Simulation and the University of Washington's Human Interface Technology Lab [Badler, Phillips, & Webber1993], [Billinghurst & Savage1996]). Cassell et al. point out that, in addition to the technical inconveniences posed by this approach, the movements and gestures of a user interacting with a computer may be quite different from those of a user taking part in a real version of the simulated interactions.

One might consider such projects as something on the order of developing the underpinnings of a real-time ``fourth generation language'' for manipulating a humanoid, where a user would declare portions of a communication and the system would compile these into sets of explicit and implicit behaviors that work within the constraints of the real-time dynamic system. The avatar agent is responsible for spontaneous reactions and for smoothing the bridges between instances of user input, while keeping the avatar in sync with the environment.
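The analogy can be made concrete with a toy sketch in which a declared communication step is ``compiled'' into explicit and implicit behaviors; the directive vocabulary and the expansions are hypothetical, chosen only to illustrate the idea.

    # Toy "compiler" for high-level avatar directives: each declared step is
    # expanded into explicit behaviors the user asked for plus implicit
    # supporting behaviors. Directive names and expansions are hypothetical.
    EXPANSIONS = {
        "greet":     {"explicit": ["wave", "utter_greeting"],
                      "implicit": ["orient_toward_target", "eye_contact", "smile"]},
        "emphasize": {"explicit": ["beat_gesture"],
                      "implicit": ["eyebrow_raise"]},
    }

    def compile_directive(directive, target=None):
        """Expand one declared communication step into a behavior schedule."""
        spec = EXPANSIONS[directive]
        schedule = [("implicit", b) for b in spec["implicit"]]
        schedule += [("explicit", b) for b in spec["explicit"]]
        if target:                               # supporting behavior the agent adds
            schedule.insert(0, ("implicit", "locate:" + target))
        return schedule

    # The user declares *what* to communicate; the agent supplies the rest and
    # smooths the transitions between directives.
    for kind, behavior in compile_directive("greet", target="Bob"):
        print(kind, behavior)

Here the user's declaration is sparse, and the bulk of the schedule consists of implicit behaviors the agent adds on its own authority, which is precisely the appeal of the ``fourth generation'' framing.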

In some ways this work is a metaphor for that of the NYU-IMPROV group, in that the user-author makes demands of the agent system for establishing maximally relevant states in the dynamic continuum, but is freed from both the details of those states and the transitions between them.

Whether or not such systems ultimately become practical may depend on the development of a natural interface for real-time direction of the high-level controls. Regardless, the central research is certainly relevant to the development of convincingly believable agent characters, whether avatar or fully autonomous.


