A theoretical and empirical study of reasoning collapse in closed-loop multi-turn agent reinforcement learning, framing collapse as the interaction of task signal, prompt-agnostic regularizers, and reward noise.
@inproceedings{ragen2,
  title={RAGEN-2: Reasoning Collapse in Agentic RL},
  author={Wang, Zihan and Gui, Chi and Jin, Xing and Wang, Qineng and Liu, Licheng and Wang, Kangrui and Chen, Shiqi and Li, Linjie and Yang, Zhengyuan and Zhang, Pingyue and Lu, Yiping and Wu, Jiajun and Fei-Fei, Li and Wang, Lijuan and Choi, Yejin and Li, Manling},
  booktitle={Forty-Third International Conference on Machine Learning (ICML)},
  year={2026},
  note={Oral, top 0.7% of 23,918}
}
arXiv
CocoaBench: An Evaluation Framework for General Agents with Compositional Cognitive Abilities
CocoaBench Team, Shibo Hao, Zhining Zhang, and 29 more authors
@misc{cocoabench,
  title={CocoaBench: An Evaluation Framework for General Agents with Compositional Cognitive Abilities},
  author={Team, CocoaBench and Hao, Shibo and Zhang, Zhining and Liang, Zhiqi and Liu, Tianyang and Zha, Yuheng and Gao, Qiyue and Chen, Jixuan and Wang, Zilong and Cheng, Zhoujun and Zhang, Haoxiang and Wang, Junli and Jin, Hexi and Zheng, Boyuan and Zhou, Kun and Wang, Yu and Yao, Feng and Liu, Licheng and Li, Yijiang and Li, Zhifei and Han, Zhengtao and Promthaw, Pracha and Cerruti, Tommaso and Fu, Xiaohan and Ma, Ziqiao and Shang, Jingbo and Qin, Lianhui and McAuley, Julian and Xing, Eric P. and Liu, Zhengzhong and Srivastava, Rupesh Kumar and Hu, Zhiting},
  year={2026},
  primaryclass={cs.LG}
}
2025
NeurIPS
Online Prediction with Limited Selectivity
Licheng Liu and Mingda Qiao
In Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
In this paper, we introduce a model of Prediction with Limited Selectivity (PLS) where the forecaster can start the prediction only on a subset of the time horizon. We study the optimal prediction error both on an instance-by-instance basis and via an average-case analysis. We introduce a complexity measure that gives instance-dependent bounds on the optimal error. For a randomly-generated PLS instance, these bounds match with high probability.
@inproceedings{PLS,
  title={Online Prediction with Limited Selectivity},
  author={Liu, Licheng and Qiao, Mingda},
  booktitle={Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2025},
  note={Spotlight, top 3.2% of 21,575},
  primaryclass={cs.LG}
}
arXiv
Let’s Try Again: Eliciting Multi-Turn Reasoning in Language Models via Simplistic Feedback
Licheng Liu, Zihan Wang, Linjie Li, and 5 more authors
In this paper, we propose Unary Feedback as Observation (UFO), a simple multi-turn reinforcement learning method that helps large reasoning models reflect on mistakes and improve through minimal feedback like “Let’s try again.” UFO improves multi-turn reasoning accuracy by up to 14% while maintaining single-turn performance, enabling more deliberate and flexible problem solving.
@misc{ufo,
  title={Let's Try Again: Eliciting Multi-Turn Reasoning in Language Models via Simplistic Feedback},
  author={Liu, Licheng and Wang, Zihan and Li, Linjie and Xu, Chenwei and Lu, Yiping and Liu, Han and Sil, Avirup and Li, Manling},
  year={2025},
  primaryclass={cs.LG}
}
arXiv
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang, Kangrui Wang, Qineng Wang, and 15 more authors
This paper introduces RAGEN, a framework for understanding self-evolution in Large Language Model (LLM) agents through multi-turn reinforcement learning. It examines how LLM agents can improve their performance over multiple interactions when trained with reinforcement learning.
@misc{ragen,
  title={RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning},
  author={Wang, Zihan and Wang, Kangrui and Wang, Qineng and Zhang, Pingyue and Li, Linjie and Yang, Zhengyuan and Jin, Xing and Yu, Kefan and Nguyen, Minh Nhat and Liu, Licheng and Gottlieb, Eli and Lu, Yiping and Cho, Kyunghyun and Wu, Jiajun and Fei-Fei, Li and Wang, Lijuan and Choi, Yejin and Li, Manling},
  year={2025},
  primaryclass={cs.LG}
}