Virtual agents serve as a vital interface within XR platforms. However, generating virtual agent behaviors typically relies on pre-coded actions or physics-based reactions. In this paper, we present a learning-based multimodal agent behavior generation framework that adapts to users’ in-situ behaviors, similar to how humans interact with each other in the real world. By leveraging an in-house collected, dyadic conversational behavior dataset, we trained a conditional variational autoencoder (CVAE) model to achieve user-conditioned generation of virtual agents’ behaviors. Together with large language models (LLMs), our approach can generate both the verbal and non-verbal reactive behaviors of virtual agents. Our comparative user study confirmed our method’s superiority over conventional animation graph-based baseline techniques, particularly regarding user-centric criteria. Thorough analyses of our results underscored the authentic nature of our virtual agents’ interactions and the heightened user engagement during VR interaction.
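To illustrate the user-conditioned generation idea described in the abstract, the sketch below shows a minimal conditional VAE in PyTorch that reconstructs agent behavior features conditioned on concurrent user behavior features and, at inference time, samples new agent behaviors from the prior given only the user's features. The feature dimensions, network sizes, and loss weighting here are hypothetical placeholders for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Minimal CVAE sketch: agent behavior generation conditioned on user behavior features."""

    def __init__(self, agent_dim=72, user_dim=72, latent_dim=32, hidden_dim=256):
        # agent_dim / user_dim / latent_dim are illustrative, not the paper's values.
        super().__init__()
        # Encoder q(z | agent behavior, user behavior)
        self.encoder = nn.Sequential(
            nn.Linear(agent_dim + user_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder p(agent behavior | z, user behavior)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + user_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, agent_dim),
        )

    def forward(self, agent, user):
        h = self.encoder(torch.cat([agent, user], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick so the sampling step stays differentiable.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(torch.cat([z, user], dim=-1))
        return recon, mu, logvar

    @torch.no_grad()
    def generate(self, user):
        # At inference time, sample z from the standard-normal prior and
        # condition only on the observed user behavior features.
        z = torch.randn(user.shape[0], self.to_mu.out_features, device=user.device)
        return self.decoder(torch.cat([z, user], dim=-1))


def cvae_loss(recon, target, mu, logvar, kl_weight=1e-3):
    # Reconstruction term plus KL divergence to the prior; kl_weight is a guess.
    recon_loss = nn.functional.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_weight * kl


# Example usage with random stand-in data (batch of 8 frame-level feature vectors).
model = ConditionalVAE()
agent_feats, user_feats = torch.randn(8, 72), torch.randn(8, 72)
recon, mu, logvar = model(agent_feats, user_feats)
loss = cvae_loss(recon, agent_feats, mu, logvar)
sampled_behavior = model.generate(user_feats)
```

In practice the verbal channel would come from an LLM and the non-verbal channel from a motion model along these lines; this sketch only covers the conditional-generation mechanism at a schematic level.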
@inproceedings{gunawardhana2024toward,
  title        = {Toward User-Aware Interactive Virtual Agents: Generative Multi-Modal Agent Behaviors in VR},
  author       = {Gunawardhana, Bhasura S and Zhang, Yunxiang and Sun, Qi and Deng, Zhigang},
  booktitle    = {2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)},
  pages        = {1068--1077},
  year         = {2024},
  doi          = {10.1109/ISMAR62088.2024.00123},
  url          = {https://doi.org/10.1109/ISMAR62088.2024.00123},
  organization = {IEEE},
  dimensions   = {true},
}