We could solve this with constrained execution.
Limit the output length, like Twitter's 140-character limit.
Or limit the execution time, like real-time mode in Linux.
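To make the idea concrete, here is a rough sketch of what "constrained execution" could look like around a streaming model. Everything in it is an assumption for illustration: `constrained_run`, its parameters, and the idea that the model yields text in chunks are all hypothetical, not any real API.

```python
import time
from typing import Callable, Iterable, Tuple


def constrained_run(
    chunks: Iterable[str],
    max_chars: int = 140,          # hypothetical output budget, tweet-sized
    max_seconds: float = 1.0,      # hypothetical wall-clock budget
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Consume streamed text until a budget runs out.

    Returns (text, stop_reason), where stop_reason is one of
    "finished", "length_limit", or "time_limit".
    """
    deadline = clock() + max_seconds
    parts = []
    used = 0
    for chunk in chunks:
        # The time check is cooperative: it only runs between chunks.
        if clock() >= deadline:
            return "".join(parts), "time_limit"
        remaining = max_chars - used
        if len(chunk) > remaining:
            # Truncate the chunk that overflows the character budget.
            parts.append(chunk[:remaining])
            return "".join(parts), "length_limit"
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts), "finished"
```

Note that this only checks the clock between chunks, so a single slow chunk can still overrun the budget. A real hard limit would need preemption from outside the process.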


Aug 10, 2025
I'm noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case.
For example in coding, the models now tend to reason for a fairly long time, they have an inclination to start listing and grepping files all across the entire repo, they do repeated web searches, they over-analyze and over-think little rare edge cases even in code that is knowingly incomplete and under active development, and often come back ~minutes later even for simple queries.
This might make sense for long-running tasks but it's less of a good fit for more "in the loop" iterated development that I still do a lot of, or if I'm just looking for a quick spot check before running a script, just in case I got some indexing wrong or made some dumb error. So I find myself quite often stopping the LLMs with variations of "Stop, you're way overthinking this. Look at only this single file. Do not use any tools. Do not over-engineer", etc.
Basically as the default starts to slowly creep into the "ultrathink" super agentic mode, I feel a need for the reverse, and more generally good ways to indicate or communicate intent / stakes, from "just have a quick look" all the way to "go off for 30 minutes, come back when absolutely certain".
You don't have to use the concepts exactly, but ideas from real-time operating system development could be included as constraints during AI training and evaluation.
Soft real-time may be enough, unless a failure means a car crash, as in hard real-time.
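One way to read the soft vs. hard distinction as an evaluation rule is sketched below. This is only an illustration under assumptions: `deadline_score`, the linear decay, and the `grace` parameter are made up here, not taken from any existing benchmark.

```python
def deadline_score(
    correct: bool,
    elapsed: float,
    deadline: float,
    mode: str = "soft",
    grace: float = 1.0,   # hypothetical: overrun (as a fraction of deadline) at which a soft score hits 0
) -> float:
    """Score one answer under a time budget.

    hard: a late answer is worth nothing, however correct.
    soft: a late answer loses value linearly with the overrun.
    """
    if mode not in ("soft", "hard"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not correct:
        return 0.0
    if elapsed <= deadline:
        return 1.0
    if mode == "hard":
        return 0.0
    overrun = elapsed - deadline
    return max(0.0, 1.0 - overrun / (grace * deadline))
```

For example, with a 10-second deadline a correct answer arriving at 15 seconds scores 0.5 in soft mode and 0.0 in hard mode.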
