Abstract
Conversational agents have traditionally been developed for either task-oriented dialogue (TOD) or open-ended chitchat, with limited progress in unifying the two. Yet, real-world conversations naturally involve fluid transitions between these modes. To address this gap, we introduce TACT (TOD-And-Chitchat Transition), a dataset designed for transition-aware dialogue modeling that incorporates structurally diverse and integrated mode flows. TACT supports both user- and agent-driven mode switches, enabling robust modeling of complex conversational dynamics. To evaluate an agent's ability to initiate and recover from mode transitions, we propose two new metrics: Switch and Recovery. Models trained on TACT outperform baselines in both intent detection and mode transition handling. Moreover, applying Direct Preference Optimization (DPO) to TACT-trained models yields additional gains, achieving 75.74% joint mode-intent accuracy and a 70.1% win rate against GPT-4o in human evaluation. These results demonstrate that pairing structurally diverse data with DPO enhances response quality and transition control, paving the way for more proactive and transition-aware conversational agents.
Overview
Motivation
Most conversational agents are designed around a single interaction mode, either task-oriented dialogue or open-domain chitchat. In practice, however, real conversations move fluidly between these modes. Users often digress from a task to share personal thoughts or casual remarks, then expect the agent to naturally resume the original goal. Existing datasets and models largely fail to capture this behavior, as they either disallow mode transitions or treat them as one-off interruptions without recovery.
Research Focus
This paper studies conversational agents that can explicitly handle dialogue mode transitions as part of a continuous interaction. We frame transition handling not as a binary classification problem, but as a dialogue-level capability that requires both awareness and initiative. A transition-aware agent should detect when the dialogue mode changes, respond appropriately within the new mode, and proactively guide the conversation back to the task when the context allows.
Dataset: TACT
To support this research, we introduce TACT (TOD-And-Chitchat Transition), a dataset designed around recoverable, multi-turn mode transitions. Unlike prior resources that allow at most a single switch, TACT contains dialogues with multiple interwoven transitions between task-oriented dialogue and chitchat. These dialogues are constructed from MultiWOZ and SLURP using structured flow patterns such as TOD–Chitchat–TOD and Chitchat–TOD–Chitchat, making recovery an explicit and learnable phenomenon rather than an edge case.
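To make the flow structure concrete, below is a minimal sketch in Python of how one dialogue with an interwoven TOD-Chitchat-TOD flow could be represented; the field names, identifier, and example turns are illustrative assumptions rather than the released TACT schema.

# Minimal sketch of a transition-aware dialogue record (illustrative field names,
# not the released TACT schema). Each turn carries a mode label so that flow
# patterns such as TOD-Chitchat-TOD are explicit and recoverable.

dialogue = {
    "dialogue_id": "tact_example_0001",          # hypothetical identifier
    "flow_pattern": ["tod", "chitchat", "tod"],  # TOD -> Chitchat -> TOD
    "turns": [
        {"speaker": "user",  "mode": "tod",      "text": "I need a taxi to the station at 5pm."},
        {"speaker": "agent", "mode": "tod",      "text": "Sure, where should the taxi pick you up?"},
        {"speaker": "user",  "mode": "chitchat", "text": "By the way, the weather has been lovely lately."},
        {"speaker": "agent", "mode": "chitchat", "text": "It really has! Perfect for a walk before your trip."},
        {"speaker": "agent", "mode": "tod",      "text": "So, about your taxi: where is the pickup point?"},
    ],
}

def flow_of(dialogue):
    """Collapse consecutive turns with the same mode into the dialogue's flow pattern."""
    flow = []
    for turn in dialogue["turns"]:
        if not flow or flow[-1] != turn["mode"]:
            flow.append(turn["mode"])
    return flow

assert flow_of(dialogue) == dialogue["flow_pattern"]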
Modeling and Evaluation
We train unified dialogue models that jointly perform mode prediction, intent detection, and response generation. To directly evaluate transition behavior, we introduce two dialogue-level metrics: Switch, which measures whether an agent attempts a mode transition, and Recovery, which captures whether it successfully returns to a previously suspended mode. These metrics allow us to assess conversational flow control beyond standard task accuracy.
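As a rough illustration of these metrics (an assumed operationalization, not the paper's official scoring code), the sketch below scores a dialogue from the chronological sequence of the agent's predicted modes: Switch fires if the agent ever changes mode, and Recovery fires if it later returns to a mode it had suspended.

# Rough illustration of dialogue-level Switch / Recovery scoring over agent turns.
# This is an assumed operationalization, not the paper's official evaluation code.

def switch_and_recovery(agent_modes):
    """agent_modes: chronological list of predicted modes for the agent's turns,
    e.g. ["tod", "chitchat", "tod"]. Returns (switched, recovered) booleans."""
    switched = any(cur != prev for prev, cur in zip(agent_modes, agent_modes[1:]))
    recovered = False
    for i in range(1, len(agent_modes)):
        if agent_modes[i] != agent_modes[i - 1]:    # a switch suspends the previous mode
            suspended = agent_modes[i - 1]
            if suspended in agent_modes[i + 1:]:    # later return to the suspended mode
                recovered = True
                break
    return switched, recovered

def corpus_scores(dialogues):
    """Aggregate Switch / Recovery rates over a corpus of agent-mode sequences."""
    flags = [switch_and_recovery(modes) for modes in dialogues]
    switch_rate = sum(s for s, _ in flags) / len(flags)
    recovery_rate = sum(r for _, r in flags) / len(flags)
    return switch_rate, recovery_rate

# Example: one dialogue that switches and recovers, one that never switches.
print(corpus_scores([["tod", "chitchat", "tod"], ["tod", "tod"]]))  # (0.5, 0.5)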
Preference Alignment
Beyond supervised fine-tuning, we apply Direct Preference Optimization (DPO) to align model behavior with human preferences, particularly for conversational qualities such as naturalness, engagement, and smooth transitions. Preference data is constructed by comparing model outputs under identical dialogue contexts and selecting responses preferred by human or LLM-based judges.
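The sketch below shows the kind of preference record this step consumes, pairing two candidate responses to the same dialogue context and keeping the judge's preferred one as "chosen"; the prompt/chosen/rejected layout and the judge function are assumptions for illustration, not the paper's exact pipeline.

# Sketch of preference-pair construction for DPO (assumed record format with
# prompt / chosen / rejected fields; judge() is a hypothetical stand-in for a
# human annotator or an LLM-based judge).

def build_preference_pairs(contexts, model_a, model_b, judge):
    """For each dialogue context, generate two candidate responses and let the
    judge pick the preferred one. Returns DPO-style preference records."""
    pairs = []
    for context in contexts:
        response_a = model_a(context)
        response_b = model_b(context)
        a_wins = judge(context, response_a, response_b)  # True if response_a is preferred
        pairs.append({
            "prompt": context,
            "chosen": response_a if a_wins else response_b,
            "rejected": response_b if a_wins else response_a,
        })
    return pairs

# Toy usage with stand-in callables.
contexts = ["User: By the way, do you like jazz?"]
model_a = lambda c: "I do! Shall we get back to booking your restaurant afterwards?"
model_b = lambda c: "Your restaurant is booked."
judge = lambda c, a, b: True  # pretend the judge prefers the more engaging, transition-aware reply
print(build_preference_pairs(contexts, model_a, model_b, judge))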
Key Findings
Models trained on TACT are the only ones that exhibit meaningful switch and recovery behaviors across diverse dialogue flows. Preference optimization further improves chitchat quality and transition naturalness without sacrificing task performance. These results suggest that transition-aware data and preference-based learning are essential for building conversational agents that can manage dialogue flow in realistic, mixed-mode interactions.
BibTeX
@inproceedings{yoon-etal-2025-beyond,
  title     = "Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents",
  author    = "Yoon, Yejin and Son, Yuri and So, Namyoung and Kim, Minseo and Cho, Minsoo and Park, Chanhee and Lee, Seungshin and Kim, Taeuk",
  editor    = "Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet",
  booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
  month     = nov,
  year      = "2025",
  address   = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2025.emnlp-main.672/",
  doi       = "10.18653/v1/2025.emnlp-main.672",
  pages     = "13291--13317",
  ISBN      = "979-8-89176-332-6"
}