The Truth Can’t Be Improved Upon
You know, I’ve been thinking a lot about this whole ‘data wall’ idea that’s been floating around lately: the claim that Large Language Models (LLMs, for those who aren’t in the know) are going to stall because we’re running out of fresh training data. And I gotta say, I think it’s a load of nonsense. Here’s the deal: it might seem like LLMs are hitting a plateau, but that’s not really what’s happening.
The reason it looks like we’re not making progress is actually pretty simple. A lot of the questions we throw at these models have a single, straight-up correct answer. Think about it: if you ask what 2 + 2 is, and every LLM out there is nailing it with ‘4’, how the heck are you supposed to improve on that?
Here’s the kicker: you can’t improve upon truth. Let that sink in for a minute. It’s a powerful idea, right?
The Illusion of Plateauing
So, we’ve got all these state-of-the-art models getting questions like that right almost every time. That’s awesome, but it also makes it tough to see improvement. It’s not that we’ve hit some kind of data wall. It’s just that when you’re already so darn close to perfect, there’s hardly any headroom left: a score on a question with one right answer can’t go past 100%.
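To make the ceiling concrete, here’s a toy sketch using exact-match scoring, where an answer counts only if it equals the one correct answer. The questions and the two "models" are made-up placeholders, and `exact_match_accuracy` is a hypothetical helper, not any particular benchmark’s code. The point: once both models answer everything correctly, they both score 1.0, so even if one is stronger in other ways, this kind of test can’t show it.

```python
def exact_match_accuracy(answers: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of questions where the model's answer matches the single correct answer."""
    # Missing answers count as wrong; surrounding whitespace is ignored.
    hits = sum(answers.get(q, "").strip() == a for q, a in gold.items())
    return hits / len(gold)

# Hypothetical questions with one correct answer each.
gold = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}

# Two hypothetical models that both get every question right.
model_a = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
model_b = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}

# Both are pinned at the maximum score of 1.0; there is nothing left to improve.
print(exact_match_accuracy(model_a, gold), exact_match_accuracy(model_b, gold))
```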
The Next Frontier: Agentic and Action-Based Improvements
Now, don’t get me wrong. There’s still plenty of room for improvement. One big area that’s got me excited is agentic and action-based improvements: models that plan out a task and take actions to get it done. But here’s the thing: we need better ways to evaluate this stuff.
Take the LMSYS leaderboard (Chatbot Arena), for example. It’s cool and all, but it mostly ranks models by which responses human voters think are good or bad. That’s a start, but it’s missing a huge piece of the puzzle.
The Need for Planning-Based Evals
What we really need is a planning-based eval. Here’s how I see it working:
- All questions would be requests for tasks
- We’d judge the LLM’s ability to create a proper plan
- We’d take into account the tools and resources available to the LLM
This kind of eval would give us a much better picture of how these models handle complex, real-world scenarios. It’s not just about spitting out facts anymore; it’s about understanding a task and strategizing how to get it done.
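Here’s a rough, hypothetical sketch of how scoring for an eval like that could work. It isn’t an existing benchmark: the `Task` and `Step` schema, the example tools, and the three criteria (only using tools that are actually available, covering the required steps, and doing them in a sensible order) are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One eval item: a task request plus the tools the model may use (hypothetical schema)."""
    request: str
    available_tools: set[str]
    # Steps a good plan must include, listed in the order they should happen.
    required_steps: list[str] = field(default_factory=list)

@dataclass
class Step:
    """One step of a model-produced plan: which tool it uses and what it does."""
    tool: str
    action: str

def score_plan(task: Task, plan: list[Step]) -> dict[str, float]:
    """Score a plan on three illustrative criteria, each between 0.0 and 1.0."""
    if not plan:
        return {"tool_validity": 0.0, "coverage": 0.0, "ordering": 0.0}

    # 1. What fraction of steps use a tool that actually exists for this task?
    tool_validity = sum(step.tool in task.available_tools for step in plan) / len(plan)

    # 2. What fraction of the required steps show up in the plan?
    actions = [step.action for step in plan]
    covered = [s for s in task.required_steps if s in actions]
    coverage = len(covered) / len(task.required_steps) if task.required_steps else 1.0

    # 3. Are the required steps that were covered in the expected order? (all-or-nothing)
    positions = [actions.index(s) for s in covered]
    ordering = 1.0 if positions == sorted(positions) else 0.0

    return {"tool_validity": tool_validity, "coverage": coverage, "ordering": ordering}

# Example: a hypothetical travel-booking request.
task = Task(
    request="Book the cheapest flight to Denver next Friday and add it to my calendar",
    available_tools={"flight_search", "booking", "calendar"},
    required_steps=["search flights", "book flight", "add calendar event"],
)
plan = [
    Step("flight_search", "search flights"),
    Step("booking", "book flight"),
    Step("email", "send confirmation"),  # 'email' is not an available tool
    Step("calendar", "add calendar event"),
]
print(score_plan(task, plan))
# → {'tool_validity': 0.75, 'coverage': 1.0, 'ordering': 1.0}
```

The ordering check is deliberately all-or-nothing to keep the sketch short. A real eval would probably want partial credit, and might actually execute the plan against the tools or use a judge model rather than string-matching step names.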
So, while it might look like we’re hitting a wall, the truth is we’re just gearing up for the next big leap. And let me tell you, it’s going to be awesome.
- Joseph