We could approach this with constrained execution: cap the output length, like Twitter's 140-character limit, or cap the execution time, like real-time mode on Linux.
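Concretely, something like this sketch, assuming a hypothetical blocking `generate(prompt)` call; the 140-character cap and 5-second budget are illustrative values, not any real API's parameters:

```python
# A minimal sketch of constrained execution: bound both the output
# length and the wall-clock time of a model call.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

MAX_CHARS = 140      # output-length cap, in the spirit of old Twitter
TIME_BUDGET_S = 5.0  # wall-clock budget for the whole call

def generate(prompt: str) -> str:
    # Stand-in for a real model call; swap in your client of choice.
    return "model output..."

def constrained_generate(prompt: str) -> str:
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(generate, prompt)
    try:
        out = future.result(timeout=TIME_BUDGET_S)  # enforce the time limit
    except TimeoutError:
        out = "[timed out]"
    # Note: a worker that already started keeps running; a real client
    # would also cancel the underlying request.
    pool.shutdown(wait=False, cancel_futures=True)
    return out[:MAX_CHARS]  # enforce the length limit

print(constrained_generate("quick spot check, no tools"))
```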
Andrej Karpathy · August 10, 2025
I'm noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case. For example in coding, the models now tend to reason for a fairly long time, they have an inclination to start listing and grepping files all across the entire repo, they do repeated web searches, they over-analyze and over-think little rare edge cases even in code that is knowingly incomplete and under active development, and often come back ~minutes later even for simple queries. This might make sense for long-running tasks but it's less of a good fit for more "in the loop" iterated development that I still do a lot of, or if I'm just looking for a quick spot check before running a script, just in case I got some indexing wrong or made some dumb error. So I find myself quite often stopping the LLMs with variations of "Stop, you're way overthinking this. Look at only this single file. Do not use any tools. Do not over-engineer", etc. Basically as the default starts to slowly creep into the "ultrathink" super agentic mode, I feel a need for the reverse, and more generally good ways to indicate or communicate intent / stakes, from "just have a quick look" all the way to "go off for 30 minutes, come back when absolutely certain".
You don't have to adopt the concepts verbatim, but ideas from real-time operating-system development could be built in as constraints during AI training and evaluation. Soft real-time may be enough; hard real-time is for when a missed deadline means a car crash.
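Sketched as an evaluation rule, where the `run_task` hook, the deadline values, and the linear decay for lateness are my own assumptions, not any real benchmark's scoring:

```python
# A sketch of soft vs. hard real-time semantics as a scoring constraint.
import time

def score_with_deadline(run_task, deadline_s: float, hard: bool):
    """Run a task and score its answer against a deadline."""
    start = time.monotonic()
    result = run_task()
    elapsed = time.monotonic() - start
    if elapsed <= deadline_s:
        return result, 1.0   # met the deadline: full credit
    if hard:
        return None, 0.0     # hard real-time: a late answer is a failure
    # Soft real-time: a late answer keeps some value, decaying with lateness.
    lateness = elapsed - deadline_s
    return result, max(0.0, 1.0 - lateness / deadline_s)

# Example: a quick spot check scored against a 2-second soft deadline.
result, score = score_with_deadline(lambda: "answer", deadline_s=2.0, hard=False)
```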