In this paper, we propose Unary Feedback as Observation (UFO), a simple multi-turn reinforcement learning method that helps large reasoning models reflect on mistakes and improve through minimal feedback like “Let’s try again.” UFO improves multi-turn reasoning accuracy by up to 14% while maintaining single-turn performance, enabling more deliberate and flexible problem solving.
@misc{ufo,
  title={A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning},
  author={Liu, Licheng and Wang, Zihan and Li, Linjie and Xu, Chenwei and Lu, Yiping and Liu, Han and Sil, Avirup and Li, Manling},
  year={2025},
  primaryclass={cs.LG},
}
In this paper, we introduce a model of Prediction with Limited Selectivity (PLS) where the forecaster can start the prediction only on a subset of the time horizon. We study the optimal prediction error both on an instance-by-instance basis and via an average-case analysis. We introduce a complexity measure that gives instance-dependent bounds on the optimal error. For a randomly-generated PLS instance, these bounds match with high probability.
@misc{PLS,
  title={Online Prediction with Limited Selectivity},
  author={Liu, Licheng and Qiao, Mingda},
  year={2025},
  primaryclass={cs.LG},
}
arXiv
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang, Kangrui Wang, Qineng Wang, and 15 more authors
This paper introduces RAGEN, a framework for understanding self-evolution in Large Language Model (LLM) agents through multi-turn reinforcement learning. The work explores how LLM agents can improve their performance over multiple interactions using reinforcement learning techniques.
@misc{ragen,
  title={RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning},
  author={Wang, Zihan and Wang, Kangrui and Wang, Qineng and Zhang, Pingyue and Li, Linjie and Yang, Zhengyuan and Jin, Xing and Yu, Kefan and Nguyen, Minh Nhat and Liu, Licheng and Gottlieb, Eli and Lu, Yiping and Cho, Kyunghyun and Wu, Jiajun and Fei-Fei, Li and Wang, Lijuan and Choi, Yejin and Li, Manling},
  year={2025},
  primaryclass={cs.LG},
}