Source Themes

Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation

This study introduces Recursive Contemplation (ReCon), a novel framework designed to improve the ability of large language models (LLMs) to identify and counteract deceptive information, using the deception-rich Avalon game as a testbed.

Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

Offline RL often suffers from distributional shift. Existing methods typically apply a uniform policy constraint to all samples. This paper introduces Guided Offline RL (GORL), which treats samples differently under the guidance of expert demonstrations. The method is theoretically shown to be rational and near-optimal, and experiments show that it significantly improves various offline RL algorithms.

Qisen Yang, Shenzhi Wang, Qihang Zhang, Gao Huang, Shiji Song
