A good example of why many RL whisperers and I say that you need to use bigger base models for RL today. Better pretraining will make it so RL on smaller base models can solve harder and more interesting tasks. This is the way.