RL is very sensitive to numerics: last time, torch compile crashed some of our runs; now it's vllm v1.
Mika Senghaas · 11:23, Aug 12
Moving from vllm v0 to v1 made our async RL training crash! Read how we fixed it.

We recently migrated from v0 to v1 as part of a larger refactor of prime-rl to make it easier to use, more performant, and naturally async. We confirmed correct training dynamics on many smaller-scale runs, but hit a wall when trying to reproduce a larger-scale run that had completed without problems prior to the refactor. Specifically, training DeepSeek-R1-Distill-Qwen-1.5B on single-turn math problems from our INTELLECT-2 math dataset at 8k context with a two-step off-policy delay would crash fatally roughly 400 steps into training.
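For context, the v0-to-v1 switch the post refers to is, in stock vLLM, toggled by the `VLLM_USE_V1` environment variable (before v1 became the default engine). Below is a minimal sketch of spinning up the same model at 8k context on the v1 engine; this is purely illustrative and is not prime-rl's actual rollout code, nor the async setup described in the post.

```python
import os

# Opt into the vLLM v1 engine. Must be set before importing vllm.
# (Illustrative: on recent vLLM releases v1 is already the default.)
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

# The model and context length from the run described in the post:
# a 1.5B distill model served at 8k context.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    max_model_len=8192,
)

# Hypothetical sampling settings for a single-turn math rollout.
params = SamplingParams(temperature=1.0, max_tokens=1024)

outputs = llm.generate(["Solve: 2 + 2 ="], params)
print(outputs[0].outputs[0].text)
```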