The list of research problems related to the use of lifelike agents is probably growing faster than we can record it. Still, some problems stand out as more salient, or more unusual and less developed, than others, and together they help to paint a picture of the existing, and unfolding, lifelike-agents landscape.
Among the most difficult of these problems will be the development of widely applicable authoring tools. In some larger projects this goal is explicit (e.g., the Jack project at the University of Pennsylvania, the PPP project at DFKI, and the Steve and Adele projects at USC-ISI); in virtually all others it is at least implicit. Even when a strong theoretical foundation gives rise to a system, it is often the artistic detail that makes the system work. Systems like CyberLife's Creatures and NCSU's COSMO require long hours of work to create personas that are fun, interesting, and informative. Authoring tools that make use of relevant meta-knowledge of the architectures, and of past artistic design, will go a long way toward reducing the time needed to build new agents. For example, an authoring tool for PPP will need to understand something about temporal planning constraints, one for COSMO will need to know about sets of overlapping behaviors, and one for AR, or CyberCafe, agents will need to know about related personality features.
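To make the meta-knowledge idea concrete, the following is a minimal, hypothetical sketch of how an authoring tool might encode one kind of architectural constraint (COSMO-style sets of overlapping behaviors) and check an author's persona definition against it. All names and rules here are invented for illustration; none come from the actual COSMO or PPP systems.

```python
# Architecture meta-knowledge (invented): which behavior pairs may
# legally overlap in time.
COMPATIBLE = {
    ("point", "speak"),
    ("nod", "speak"),
    ("walk", "speak"),
}

def can_overlap(a, b):
    """Return True if the architecture allows behaviors a and b to overlap."""
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE or a == b

def validate_schedule(schedule):
    """Check an author's behavior schedule, a list of (start, end, behavior).

    Returns the pairs of behaviors that overlap in time but that the
    architecture does not permit to run concurrently."""
    problems = []
    for i, (s1, e1, b1) in enumerate(schedule):
        for s2, e2, b2 in schedule[i + 1:]:
            overlaps_in_time = s1 < e2 and s2 < e1
            if overlaps_in_time and not can_overlap(b1, b2):
                problems.append((b1, b2))
    return problems

# An authored schedule: "speak"+"wave" and "point"+"wave" should be flagged.
schedule = [(0, 4, "walk"), (2, 6, "speak"), (5, 8, "point"), (5, 8, "wave")]
print(validate_schedule(schedule))
```

An authoring tool built along these lines would surface such conflicts at design time, rather than leaving the author to discover them by watching the agent misbehave.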
On a related note, protocols for building cohesive, believable personalities under various theories will also be necessary. Efforts such as those by Elliott, Velazquez, Reilly, Botelho, and Sloman, while helping to define what an agent personality is, do little toward creating a ``fourth generation language'' for building them (but note the initial efforts at KSL and Extempo to do this [below]). Emotion and personality models for agents will be widely applicable only when there is an organized way to map the ad hoc intentions of system designers onto the resulting personality representations. Setting aside whether the existing models are adequate, there is still quite a gap between isolated test systems and mechanisms that allow easy integration of personality representations into other agent systems.
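One way to picture such a mapping layer is as a small compiler from designer-level traits to model-level parameters. The sketch below is purely illustrative: the trait names and the appraisal-threshold parameters are invented stand-ins for whatever representation a given emotion model actually uses.

```python
# Hypothetical trait vocabulary a designer might use, mapped onto
# invented low-level emotion-model parameters.
TRAIT_MAP = {
    "cheerful":  {"joy_threshold": 0.2, "distress_threshold": 0.8},
    "irritable": {"anger_threshold": 0.3},
    "timid":     {"fear_threshold": 0.2, "pride_threshold": 0.7},
}

def build_personality(traits):
    """Compile a list of designer-level traits into model parameters.

    Later traits override earlier ones when they set the same parameter."""
    params = {}
    for trait in traits:
        params.update(TRAIT_MAP[trait])
    return params

persona = build_personality(["cheerful", "irritable"])
print(persona)
```

The point is the existence of the organized intermediate layer, not these particular numbers: a shared trait vocabulary is what would let one designer's intentions port across different underlying personality representations.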
The automated real-time control of music stands out as a unique paradigm for synthetic agents research. Music is at once emphatically human in quality, yet not at all natural in real-world social interaction. It can be profoundly engaging, highly communicative in a subjective way, and extremely powerful as a mood-manipulation tool. The entertainment industry has long recognized music's power to engage us. And yet, although there is certainly work underway on understanding computer-delivered music's effects, on the whole its effectiveness as a computationally indexable resource for agents seems woefully under-utilized. Interface efforts often focus almost exclusively on graphical representations, but it is not at all clear that this is where the biggest payoff lies for equivalent effort.
Although there are extant computational models for some forms of humor, theoretical models of humor for proactive use in agents seem ad hoc. Given the story-telling capabilities of agents, and their emerging social intelligence, this seems a natural course to pursue, and one with wide applicability. Furthermore, it is generative models of humor -- certainly easier to build than humor understanding (cf. [Zrehen & Arbib1998]) -- that would be most useful in the synthetic agent paradigm.
As discussed below, the ethics of building systems in which plausibly lifelike agents interact with users seems to be the stuff that both sociology dissertations and lawsuits are made of. Open dialog about the state of the field is called for as these systems move to market (cf. [Foner1997]).
The integration of past work in natural language understanding and discourse understanding with modern speech recognition software seems a likely ``killer app'' for lifelike agents research. Similarly, natural language generation and text-to-speech technology seem made to order for extending the capabilities of many extant agents.
Interactive agents for the Web are certainly going to be popular, and will undoubtedly be a major contributor to rapid growth in this area. It is not at all clear, however, that we can count on greatly increased bandwidth, at least of the sort that would support network-based, real-time animation, in the near future of agents research. Systems whose intelligence can be focused through the existing narrow bandwidth of home telephone lines will have a big advantage in accessibility. This suggests strong consideration of local execution within the context of Web browsers (e.g., as in ISI's Adele, and DFKI's PPP, below), or of efficient mechanisms for carrying the ``socially intelligent'' signal (as in the AR).
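To illustrate how little bandwidth a socially intelligent signal can require, here is a hypothetical two-byte wire encoding: the server transmits only an emotion category and an intensity, and the client renders the animation locally. The emotion vocabulary and packing scheme are invented for this sketch, not drawn from the AR or any other system.

```python
import struct

# Invented emotion vocabulary shared by server and client.
EMOTIONS = ["neutral", "joy", "distress", "anger", "fear"]

def encode_signal(emotion, intensity):
    """Pack an emotion category and a 0..1 intensity into two bytes."""
    return struct.pack("BB", EMOTIONS.index(emotion), int(intensity * 255))

def decode_signal(payload):
    """Recover the (emotion, intensity) pair on the client side."""
    code, level = struct.unpack("BB", payload)
    return EMOTIONS[code], level / 255

msg = encode_signal("joy", 0.8)
print(len(msg), decode_signal(msg))
```

Two bytes per expressive update is trivially within the budget of a home telephone line, which is exactly why shipping state rather than animation frames favors low-bandwidth deployment.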
Building lifelike agents, especially those that mimic human emotion and personality, may well improve our ability to model, and detect, similar states in users. At some level, the illusion of life in agents breaks down without an ability to form concepts about, and respond to, the state of the user (see the sections on affective user modeling in the AR, and in the work of Blumberg, below).
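A toy sketch of what such affective user modeling might look like in its simplest, rule-based form: infer a coarse user state from observable interaction cues. The cue names and thresholds are invented; a real system would ground them in the agent's own theory of emotion rather than in ad hoc rules like these.

```python
def infer_user_state(cues):
    """Map observable interaction cues to a coarse affective label.

    cues is a dict of invented features, e.g. error counts, typing speed
    relative to the user's baseline, idle time, and help requests."""
    if cues.get("errors", 0) > 3 and cues.get("typing_speed", 1.0) > 1.5:
        return "frustrated"
    if cues.get("idle_seconds", 0) > 120:
        return "disengaged"
    if cues.get("help_requests", 0) > 2:
        return "confused"
    return "neutral"

print(infer_user_state({"errors": 5, "typing_speed": 2.0}))  # frustrated
```

Even a crude classifier like this gives the agent something to respond to; the interesting research questions lie in replacing the hand-written rules with inference driven by the same emotion model that animates the agent itself.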