RL is very sensitive to numerics: last time torch compile caused some runs to crash, and now it's vllm v1

11:23, Aug 12
moving from vllm v0 to v1 made our async rl training crash! read how we fixed it
we recently migrated from v0 to v1 as part of a larger refactor of prime-rl to make it easier to use, more performant and naturally async. we confirmed correct training dynamics on many smaller-scale runs, but hit a wall when trying to reproduce a larger-scale run that had run without problems before the refactor. specifically, training DeepSeek-R1-Distill-Qwen-1.5B on single-turn math problems from our INTELLECT-2 math dataset at 8k context with a two-step off-policy delay would crash fatally roughly 400 steps into training.
