Beyond the Data Wall

The Truth Can’t Be Improved Upon

You know, I’ve been thinking a lot about this whole ‘data wall’ idea that’s been floating around lately. And I gotta say, I think it’s a load of nonsense. Here’s the deal: it might seem like LLMs (that’s Large Language Models for those who aren’t in the know) are hitting a plateau, but that’s not really what’s happening.

The reason it looks like we’re not making progress is actually pretty simple. A lot of the questions we throw at these models have straight-up correct answers. Think about it - if you ask what 2 + 2 is, and every LLM out there is nailing it with ‘4’, how the heck are you supposed to improve on that?

Here’s the kicker: you can’t improve upon truth. Let that sink in for a minute. It’s a powerful idea, right?

The Illusion of Plateauing

So, we’ve got all these state-of-the-art models that are getting things right most of the time. That’s awesome, but it also makes it tough to see improvement. It’s not that we’ve hit some kind of data wall. It’s just that when you’re already so darn close to perfect, those extra gains are hard to squeeze out.
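To put rough numbers on that, here's a quick sketch. The function name and the accuracy figures are made up for illustration, not taken from any real benchmark. It shows that going from 97% to 98% removes the same share of remaining mistakes as going from 70% to 80%, even though it only moves the headline score by one point.

```python
def error_reduction(old_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the remaining errors removed by an accuracy improvement.

    Accuracies are fractions between 0 and 1. Hypothetical helper for
    illustration only.
    """
    for value in (old_accuracy, new_accuracy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("accuracy must be between 0 and 1")
    old_errors = 1.0 - old_accuracy
    new_errors = 1.0 - new_accuracy
    if old_errors == 0.0:
        # Already perfect: there is nothing left to improve on.
        return 0.0
    return (old_errors - new_errors) / old_errors


# Illustrative numbers only (assumed, not measured).
print(round(error_reduction(0.70, 0.80), 2))  # 10-point jump -> 0.33
print(round(error_reduction(0.97, 0.98), 2))  # 1-point jump  -> 0.33
print(error_reduction(1.00, 1.00))            # saturated     -> 0.0
```

Same relative progress, but on a near-saturated benchmark it barely shows up.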

The Next Frontier: Agentic and Action-Based Improvements

Now, don’t get me wrong. There’s still plenty of room for improvement. One big area that’s got me excited is agentic and action-based improvements. But here’s the thing - we need better ways to evaluate this stuff.

Take the LMSYS leaderboard, for example. It’s cool and all, but it’s mostly measuring which responses humans prefer. That’s a start, but it’s missing a huge piece of the puzzle.

The Need for Planning-Based Evals

What we really need is a planning-based eval. Here’s how I see it working:

All questions would be requests for tasks
We’d judge the LLM’s ability to create a proper plan
We’d take into account the tools and resources available to the LLM
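Here's a minimal sketch of what scoring one item in an eval like that could look like. Everything in it is an assumption for illustration: the `TaskCase` and `PlanStep` names, the idea of a per-task list of expected tools, and the scoring rule itself (feasibility times coverage) are hypothetical, not an existing benchmark or a finished design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlanStep:
    description: str
    tool: str  # name of the tool this step relies on


@dataclass
class TaskCase:
    request: str               # every question is a request for a task
    available_tools: set[str]  # tools and resources the LLM actually has
    expected_tools: set[str]   # hypothetical rubric: tools a proper plan should use


def score_plan(case: TaskCase, plan: list[PlanStep]) -> float:
    """Score a plan from 0 to 1 (assumed rule: feasibility * coverage)."""
    if not plan:
        return 0.0
    # Feasibility: share of steps that only use tools the model has access to.
    feasible = sum(step.tool in case.available_tools for step in plan) / len(plan)
    # Coverage: share of the expected tools that the plan actually uses.
    used = {step.tool for step in plan}
    if case.expected_tools:
        coverage = len(used & case.expected_tools) / len(case.expected_tools)
    else:
        coverage = 1.0
    return feasible * coverage


case = TaskCase(
    request="Schedule a meeting with the team and send out the invite",
    available_tools={"search", "calendar", "email"},
    expected_tools={"calendar", "email"},
)
good_plan = [
    PlanStep("Find a free slot for everyone", "calendar"),
    PlanStep("Send the invite", "email"),
]
bad_plan = [
    PlanStep("Call everyone to ask when they are free", "phone"),  # tool not available
    PlanStep("Book the slot", "calendar"),
]
print(score_plan(case, good_plan))  # 1.0
print(score_plan(case, bad_plan))   # 0.25
```

A real eval would need a smarter judge than set overlap (step ordering, dependencies, whether the plan actually accomplishes the request), but even this toy version rewards something a preference vote doesn't: plans that fit the tools on hand.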

This kind of eval would give us a much better picture of how these models handle complex, real-world scenarios. It’s not just about spitting out facts anymore - it’s about understanding and strategizing.

So, while it might look like we’re hitting a wall, the truth is we’re just gearing up for the next big leap. And let me tell you, it’s going to be awesome.

- Joseph
