Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration

Shao Zhang1*, Xihuai Wang1*, Wenhao Zhang1, Chaoran Li1, Junru Song1, Tingyu Li1, Lin Qiu2, Xuezhi Cao2, Xunliang Cai2, Wen Yao3, Weinan Zhang1, Xinbing Wang1, Ying Wen1#
1 Shanghai Jiao Tong University, 2 Meituan, 3 Intelligent Game and Decision Laboratory

*Equal Contribution #Corresponding Author

How DPT-Agent collaborates with a human simultaneously in real time.

Abstract

Agents built on large language models (LLMs) excel at turn-by-turn human-AI collaboration but struggle with simultaneous tasks that require real-time interaction: latency and the difficulty of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of applying Dual Process Theory (DPT) to real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent's System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making; its System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and make reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. DPT-Agent effectively helps LLMs convert correct slow thinking and reasoning into executable actions, thereby improving performance. To the best of our knowledge, DPT-Agent is the first language agent framework to achieve successful real-time simultaneous human-AI collaboration autonomously.
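
To make the framework concrete, the two systems can be pictured as loops running at different rates: System 1 acts on every game tick through its FSM, while System 2 reflects asynchronously and rewrites the policy System 1 acts on. The Python sketch below illustrates this split under simplifying assumptions; the names (DPTAgentSketch, system2_reflect), the cooking-style states, and the 10 Hz tick rate are all hypothetical, and the rule-table patch merely stands in for the executable policy code that System 2 generates in the actual framework.

```python
# A minimal sketch of a dual-process control loop, NOT the paper's code.
import asyncio
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    FETCH = auto()
    COOK = auto()
    SERVE = auto()

class DPTAgentSketch:
    def __init__(self) -> None:
        self.state = State.IDLE
        # "Code-as-policy", simplified: System 2 may replace these transition
        # rules at runtime (the real framework generates executable code).
        self.policy = {
            State.IDLE: State.FETCH,
            State.FETCH: State.COOK,
            State.COOK: State.SERVE,
            State.SERVE: State.IDLE,
        }

    def system1_step(self) -> State:
        # Fast path: a pure FSM transition, so no LLM call sits on the
        # real-time action loop.
        self.state = self.policy[self.state]
        return self.state

    async def system2_reflect(self) -> None:
        # Slow path: ToM inference plus reflection over recent outcomes.
        # A real implementation would call an LLM here; the sleep stands
        # in for that latency.
        while True:
            await asyncio.sleep(1.0)           # simulated LLM latency
            inferred_human_goal = State.SERVE  # stand-in for ToM output
            # Asynchronous reflection: patch the policy without ever
            # blocking System 1.
            self.policy[State.IDLE] = inferred_human_goal

async def main() -> None:
    agent = DPTAgentSketch()
    reflector = asyncio.create_task(agent.system2_reflect())
    for tick in range(30):                     # the real-time game loop
        action = agent.system1_step()
        print(f"tick {tick:02d}: act={action.name}")
        await asyncio.sleep(0.1)               # ~10 Hz decision rate
    reflector.cancel()

if __name__ == "__main__":
    asyncio.run(main())
```

The point of the split is that no LLM call ever sits on the action path: System 1 keeps acting on the last committed policy at a fixed rate, while System 2's slower ToM inference and reflection update that policy in the background.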

Visualization

DPT-Agent Demo 1

Map 1

DPT-Agent (in the red hat) collaborating with a human (in the blue hat) through division of labor.

DPT-Agent Demo 2

Map 2

DPT-Agent (in the red hat) collaborating with a human (in the blue hat) by using the central counter.

Experiment Results

ReAct

| Model | Score | Score Efficiency | Latency (s) |
|---|---|---|---|
| GPT-4o | 21.00 (7.01) | 3.08 (0.30) | 7.10 (0.29) |
| GPT-4o-mini | -28.50 (6.23) | 0.60 (0.28) | 3.06 (0.07) |
| o3-mini-low | 5.50 (5.86) | 2.51 (0.25) | 8.64 (0.27) |
| DeepSeek-V2.5-236b | -21.50 (3.56) | 1.72 (0.24) | 6.45 (0.18) |
| DeepSeek-R1-70b | -17.00 (4.32) | 1.48 (0.17) | 7.79 (0.20) |
| DeepSeek-R1-32b | -15.50 (4.51) | 1.49 (0.18) | 5.77 (0.18) |
| DeepSeek-R1-14b | -7.00 (4.94) | 2.67 (0.19) | 2.91 (0.03) |
| Llama3.3-70b | 20.00 (4.21) | 2.86 (0.16) | 5.44 (0.05) |
| Mistral-nemo-12b | -10.00 (3.31) | 2.40 (0.13) | 1.10 (0.03) |
| Mistral-small-24b | 59.50 (5.04) | 4.63 (0.20) | 2.69 (0.02) |
| Mixtral-8x22b | -5.00 (5.23) | 1.73 (0.22) | 5.56 (0.10) |
| Qwen2.5-14b | -5.00 (5.31) | 1.98 (0.21) | 1.55 (0.03) |
| Qwen2.5-32b | 10.00 (0.50) | 2.94 (0.02) | 1.93 (0.04) |
| Qwen2.5-72b | 16.50 (3.22) | 2.71 (0.09) | 4.60 (0.09) |
| QwQ-32b | 8.00 (2.77) | 2.46 (0.12) | 10.75 (0.24) |

Reflexion

| Model | Score | Score Efficiency | Latency (s) |
|---|---|---|---|
| GPT-4o | -1.50 (3.78) | 2.14 (0.17) | 7.49 (0.27) |
| GPT-4o-mini | -40.00 (2.17) | 0.00 (0.14) | 3.11 (0.08) |
| o3-mini-low | -16.50 (7.12) | 1.78 (0.26) | 8.86 (0.23) |
| DeepSeek-V2.5 | -25.56 (2.91) | 1.24 (0.18) | 7.64 (0.16) |
| DeepSeek-R1-70b | -20.00 (4.79) | 1.44 (0.19) | 7.78 (0.17) |
| DeepSeek-R1-32b | -37.50 (4.77) | 0.90 (0.21) | 7.39 (0.11) |
| DeepSeek-R1-14b | -10.50 (4.12) | 1.93 (0.22) | 4.01 (0.11) |
| Llama3.3-70b | 20.00 (4.47) | 3.25 (0.19) | 5.20 (0.06) |
| Mistral-nemo-12b | -40.00 (0.00) | 0.00 (0.00) | 1.60 (0.02) |
| Mistral-small-24b | -5.00 (3.63) | 1.43 (0.03) | 3.11 (0.05) |
| Mixtral-8x22b | 0.50 (4.33) | 2.44 (0.20) | 5.58 (0.23) |
| Qwen2.5-14b | -4.00 (4.45) | 2.44 (0.24) | 1.87 (0.05) |
| Qwen2.5-32b | -40.00 (0.00) | 0.00 (0.00) | 2.93 (0.05) |
| Qwen2.5-72b | -25.00 (2.76) | 1.47 (0.09) | 4.66 (0.05) |
| QwQ-32b | -50.00 (0.75) | 0.00 (0.11) | 7.75 (0.11) |

DPT-Agent w/o ToM

| Model | Score | Score Efficiency | Latency (s) |
|---|---|---|---|
| GPT-4o | 20.50 (5.41) | 3.05 (0.24) | 5.08 (0.15) |
| GPT-4o-mini | 21.00 (4.47) | 3.50 (0.23) | 2.13 (0.01) |
| o3-mini-low | 37.50 (4.81) | 3.68 (0.19) | 7.03 (0.28) |
| DeepSeek-V2.5 | 31.50 (3.40) | 3.40 (0.14) | 4.73 (0.11) |
| DeepSeek-R1-70b | 60.00 (4.35) | 4.19 (0.15) | 9.09 (0.26) |
| DeepSeek-R1-32b | 39.50 (7.68) | 3.35 (0.27) | 6.58 (0.25) |
| DeepSeek-R1-14b | 23.00 (5.42) | – | 3.87 (0.07) |
| Llama3.3-70b | -10.00 (6.46) | 1.82 (0.34) | 2.28 (0.10) |
| Mistral-nemo-12b | 30.00 (5.20) | 3.49 (0.21) | 1.31 (0.03) |
| Mistral-small-24b | -1.50 (3.63) | 2.05 (0.17) | 3.61 (0.31) |
| Mixtral-8x22b | 0.00 (15.00) | 2.70 (0.20) | 4.21 (0.17) |
| Qwen2.5-14b | 1.50 (4.11) | 2.68 (0.22) | 1.18 (0.02) |
| Qwen2.5-32b | 1.00 (3.83) | 2.26 (0.13) | 1.65 (0.03) |
| Qwen2.5-72b | 11.00 (4.88) | 2.66 (0.21) | 3.01 (0.12) |
| QwQ-32b | -51.00 (4.74) | 3.90 (0.15) | 14.96 (0.78) |

BibTeX

@article{zhang2025ldpt,
  title={Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration},
  author={Shao Zhang and Xihuai Wang and Wenhao Zhang and Chaoran Li and Junru Song and Tingyu Li and Lin Qiu and Xuezhi Cao and Xunliang Cai and Wen Yao and Weinan Zhang and Xinbing Wang and Ying Wen},
  year={2025},
  eprint={2502.11882},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2502.11882},
}