What we refer to as ``emotions'' in this paper arise naturally in many human social situations as a byproduct of goal-driven and principled (or unprincipled) behavior, simple preferences, and relationships with other agents. This includes many situations not normally thought of as emotional (e.g., becoming annoyed at someone, a mild form of anger in our theory), but explicitly excludes any representation of the physical, bodily properties of emotions.
In the current Affective Reasoner (AR) research, embodied in a collection of Common Lisp AI programs, we simulate simple worlds populated with agents capable of responding ``emotionally'' as a function of their concerns. Agents are given unique pseudo-personalities, modeled both as a set of appraisal frames representing their individual goals, principles, preferences, and moods, and as a set of about 440 differentially activated channels for the expression of emotions [Elliott & Ortony1992, Elliott1992]. Situations that arise in the agents' world may map to twenty-six different emotion types (e.g., pride, as approving of one's own intentional action), twenty-two of which were originally theoretically specified by Ortony et al. [Ortony, Clore, & Collins1988]. The quality and intensity of emotion instances in each category are partially determined by about twenty-two emotion intensity variables [Elliott & Siegle1993].
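To make the appraisal idea concrete, the following is a deliberately minimal sketch, in Python rather than the AR's actual Common Lisp, of how an appraisal frame might map a situation to an emotion type. The names `AppraisalFrame` and `appraise`, the situation fields, and the collapsing of the roughly twenty-two intensity variables into a single `mood` value are all our own illustrative assumptions, not the AR's real representation.

```python
# Illustrative sketch only: the real Affective Reasoner is a body of
# Common Lisp programs; these class and field names are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AppraisalFrame:
    """A toy pseudo-personality: goals, principles, preferences, mood."""
    goals: set = field(default_factory=set)
    principles: set = field(default_factory=set)
    preferences: set = field(default_factory=set)
    mood: float = 0.0  # crude stand-in for the ~22 intensity variables

def appraise(agent: AppraisalFrame, situation: dict) -> Optional[str]:
    """Map a situation to an emotion type, e.g. 'pride' as approving
    of one's own intentional action, per the text above."""
    if (situation.get("actor") == "self"
            and situation.get("intentional")
            and situation.get("action") in agent.principles):
        return "pride"
    if situation.get("blocks_goal") in agent.goals:
        return "anger"  # mild forms include becoming annoyed
    return None

agent = AppraisalFrame(goals={"finish-paper"}, principles={"help-others"})
print(appraise(agent, {"actor": "self", "intentional": True,
                       "action": "help-others"}))  # pride
```

The point of the sketch is only the shape of the computation: concern-relative appraisal of a situation, not any fixed stimulus-response mapping.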
To communicate with users, the AR agents use various multimedia modes. A central assumption underlying these modes is that social interaction based on emotion states must run dynamically, in real time. Agents have about 70 line-drawn facial expressions, which are morphed in real time, yielding about 3,000 different morphs. This set is extensible because the morphs are efficiently computed on the client computer each time they are displayed. Agents can select facial expressions, the speed of the morph, the size of the display, and the foreground and background colors. Agents, whose mouths move when they speak, communicate with users through minimally inflected text-to-speech software, which allows us to construct spoken sentences dynamically at run time. To add qualitative nuance to their expression of emotions, agents have access to a large database of MIDI files, any portion of which they can retrieve in less than a second, and into which they can index down to 1/1000th of a second. Each of these (mostly music) files is a real performance (that is, the creation of a human performer, NOT of a computer playing sounds from a score). Speech recognition software, used with some AR applications, has allowed children as young as two years old to interact with AR application agents. In all cases, the agents respond in real time to input from the world around them: when you speak to them, they speak back.
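The client-side morphing could work along the following lines. This is a minimal sketch under the assumption that each line-drawn face is stored as a list of 2-D control points; the AR's actual face representation and morphing code are not described here, and `morph`, `neutral`, and `smile` are hypothetical names of our own.

```python
# Hypothetical sketch of client-side face morphing: we assume each
# line-drawn expression is a list of (x, y) control points.
def morph(face_a, face_b, t):
    """Linearly interpolate between two faces; t runs from 0.0 to 1.0.
    Computing in-between frames on the client at display time, rather
    than storing them, keeps the set of reachable expressions cheap
    to extend."""
    return [(ax + t * (bx - ax), ay + t * (by - ay))
            for (ax, ay), (bx, by) in zip(face_a, face_b)]

neutral = [(0.0, 0.0), (1.0, 0.0)]  # two toy control points per face
smile   = [(0.0, 0.5), (1.0, 0.5)]
print(morph(neutral, smile, 0.5))   # the halfway frame
```

Stepping `t` from 0 to 1 at a chosen rate would then give the variable-speed real-time morphs described above.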