Truth, Progress, and AI

The Truth Can’t Be Improved Upon

Hey folks, let’s chat about something cool that’s been buzzing in my head lately. You know how everyone’s going on about LLMs hitting a data wall? Well, I think that’s a bunch of baloney. Here’s the deal: it might seem like these language models are plateauing, but there’s a really interesting reason for that.

Think about it this way: a lot of the questions we throw at these LLMs have straight-up correct answers. Like, if you ask what 2 + 2 is, it’s always gonna be 4, right? Once all the top models get that right, there’s nowhere else to go. You can’t improve on the truth, folks. That’s a powerful idea, isn’t it?

So here’s the thing: it’s not that we’re running out of data or that the models can’t get smarter. It’s just that when you’re dealing with facts, there’s a ceiling. Once you hit it, that’s it. Game over.
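To make that ceiling concrete, here's a tiny Python sketch with made-up questions and answers. Once every model nails the factual items, accuracy pins at 100% for all of them, and the benchmark has nothing left to say:

```python
# A toy illustration of the ceiling on factual benchmarks (made-up data).
# Once every model answers every question correctly, the metric maxes out
# and can no longer tell the models apart.

def accuracy(answers: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of questions answered exactly right."""
    return sum(answers[q] == gold[q] for q in gold) / len(gold)

# Hypothetical factual questions with single correct answers.
gold = {"2 + 2": "4", "capital of France": "Paris"}

model_a = {"2 + 2": "4", "capital of France": "Paris"}
model_b = {"2 + 2": "4", "capital of France": "Paris"}

print(accuracy(model_a, gold))  # 1.0 -- the ceiling
print(accuracy(model_b, gold))  # 1.0 -- same ceiling, no way to rank them
```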

The Challenge of Measuring Progress

Now, this creates an interesting problem. How do we measure progress when all the top models are nailing the basics? It’s like trying to judge a bunch of Olympic swimmers by seeing if they can float. Not very helpful, right?

This is why it might look like we're plateauing from an intelligence perspective. But don't be fooled. It's not a data wall we're hitting. It's that squeezing extra gains out of models that already answer correctly most of the time is genuinely hard.
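You can see the measurement problem in a quick sketch with invented scores: a question every model gets right has zero variance across models, so it contributes nothing to ranking them. Only the questions some models still miss carry any signal:

```python
# Invented per-question results for five hypothetical models (1 = correct).
# A saturated question cannot separate strong models from weak ones.

from statistics import pvariance

saturated_question = [1, 1, 1, 1, 1]  # everyone floats
hard_question = [1, 0, 1, 0, 0]       # this one still discriminates

print(pvariance(saturated_question))  # 0 -> no ranking signal
print(pvariance(hard_question))       # 0.24 -> still informative
```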

What’s Next? Action and Planning

So, what’s the next big thing? I reckon it’s gonna be all about agentic and action-based improvements. We need to set up better ways to evaluate these models, focusing on stuff like planning and task execution.

Take the LMSYS Chatbot Arena leaderboard, for example. It's cool and all, but it mostly measures whether a human judges a response as good or bad. That's not enough. We need to step up our game.

Here’s my two cents: we should create a planning-based eval. Every single question should be a request for a task. Then we judge how well the LLM can create a proper plan, considering the tools and resources it has. Now that would be awesome!
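To be clear, I'm just sketching here, not describing an existing benchmark. One item of that kind of eval could look roughly like this in Python, where the task, the tool names, and the grading rule are all hypothetical:

```python
# A minimal sketch of one planning-eval item (all names are hypothetical).
# Each item is a task request plus the tools the model may use; the grader
# rejects plans that invoke unavailable tools and scores coverage of the
# steps a proper plan should include.

from dataclasses import dataclass

@dataclass
class PlanningTask:
    request: str               # the task the model is asked to plan for
    available_tools: set[str]  # tools the plan is allowed to use
    required_steps: set[str]   # steps a proper plan must cover

def grade_plan(plan: list[dict], task: PlanningTask) -> float:
    """Score 0 if the plan uses a tool it doesn't have;
    otherwise, score by coverage of the required steps."""
    if any(step["tool"] not in task.available_tools for step in plan):
        return 0.0  # the plan hallucinated an unavailable tool
    covered = {step["action"] for step in plan} & task.required_steps
    return len(covered) / len(task.required_steps)

# A hypothetical task and a model-produced plan to grade.
task = PlanningTask(
    request="Book the cheapest flight from SFO to JFK next Friday",
    available_tools={"flight_search", "calendar", "payment"},
    required_steps={"search_flights", "compare_prices", "book_ticket"},
)

plan = [
    {"action": "search_flights", "tool": "flight_search"},
    {"action": "compare_prices", "tool": "flight_search"},
    {"action": "book_ticket", "tool": "payment"},
]

print(grade_plan(plan, task))  # 1.0 -> a complete, tool-grounded plan
```

Judging the plan itself, rather than how pleasant the final answer reads, is the whole point: it tests whether the model actually understands the tools and resources it has, not just whether a human likes its prose.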

The Bottom Line

Look, the fact that LLMs are getting so good at factual stuff is great. But to really push the boundaries, we need to think outside the box. We need to challenge these models in new ways, focusing on skills like planning and task execution.

Remember, you can’t improve upon truth. But there’s still a whole lot of room for improvement in how these models understand and interact with the world. And that, my friends, is where things are gonna get really interesting.

- Joseph

Sign up for my email list to know when I post more content like this. I also post my thoughts on Twitter/X.