Academic
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
This study introduces a novel framework, Recursive Contemplation (ReCon), designed to improve large language models’ (LLMs) abilities to identify and counteract deceptive information, using the deception-rich Avalon game as a testbed.
Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang
arXiv
Website
Synced (机器之心) coverage
AI Era (新智元) coverage
QbitAI (量子位) coverage
Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
Offline RL often suffers from distributional shift. Current methods typically apply a uniform policy constraint to all samples. This paper introduces Guided Offline RL (GORL), which treats samples differently based on guidance from expert demonstrations. The method is theoretically proven to be rational and near-optimal, and in experiments it significantly improves various offline RL algorithms.
Qisen Yang, Shenzhi Wang, Qihang Zhang, Gao Huang, Shiji Song
arXiv