LLMs: Beyond Factual Accuracy

Hey there, tech enthusiasts! Today we’re diving into a hot topic that’s been buzzing around the AI world. You’ve probably heard folks saying LLMs are hitting a wall. But guess what? I think that’s a load of baloney. Let me break it down for you.

The Truth Can’t Be Improved Upon

Here’s the deal: it might seem like LLMs are plateauing, but there’s a cool reason for that. A lot of the questions we throw at these models have straight-up correct answers. Think about it - if you ask an LLM what 2+2 is, and it says 4, how the heck are you gonna improve on that? You can’t! And that’s because you can’t improve upon truth. Pretty awesome, right?

Why It Looks Like We’re Stuck

So, here’s the situation:

  1. Most state-of-the-art models are nailing the answers to a ton of questions.
  2. These questions often have a true, factual answer.
  3. When all the top models are getting things right so often, it’s super hard to see improvement.

It’s not that we’ve hit a data wall. It’s just that squeezing out visible gains is a tough nut to crack when the top models already get nearly all of these questions right.
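To put rough numbers on that, here’s a minimal Python sketch. The accuracy figures (96% and 98%) are made up for illustration and don’t come from any real benchmark, and the helper names are my own. The point is just that near the ceiling, a model can cut its mistakes in half and still only move the score by a couple of points.

```python
def absolute_gain(old_acc: float, new_acc: float) -> float:
    """Change in accuracy, as a fraction (0.02 means two percentage points)."""
    return new_acc - old_acc


def error_reduction(old_acc: float, new_acc: float) -> float:
    """Fraction of the old model's errors that the new model eliminates."""
    old_err = 1.0 - old_acc
    new_err = 1.0 - new_acc
    if old_err <= 0.0:
        raise ValueError("old model is already perfect; nothing left to reduce")
    return (old_err - new_err) / old_err


# Hypothetical scores on a benchmark of factual questions (illustrative only).
old_acc, new_acc = 0.96, 0.98

print(f"absolute gain:   {absolute_gain(old_acc, new_acc):.2%}")
print(f"error reduction: {error_reduction(old_acc, new_acc):.2%}")
```

With these made-up numbers, the leaderboard score barely moves, even though the newer model makes half as many mistakes.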

What’s Next? Action and Planning!

Now, don’t get me wrong - there’s still plenty of room for LLMs to level up. One big area that’s ripe for improvement is agentic, action-based work, where the model has to plan and carry out steps instead of just answering a question. We need to set up better ways to evaluate this kind of performance.

For example, take the LMSYS Chatbot Arena leaderboard. It’s cool and all, but it mostly measures which response a human rater prefers. That’s not enough! We need to focus on planning skills.

Here’s what I think we should do:

  1. Create planning-based evaluations
  2. Frame every question as a task request
  3. Judge the LLM’s ability to create a solid plan based on its tools and resources

By doing this, we’ll get a much better picture of how these models can handle real-world scenarios and complex planning tasks.
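Here’s a rough sketch of what one of these evaluations might look like in Python. Everything in it is hypothetical: the `PlanningTask` structure, the tool names, and the scoring rule are assumptions I’m making for illustration, not an existing benchmark or library. It treats a plan as an ordered list of tool calls, rejects any plan that uses a tool the model doesn’t have, and otherwise scores how far the plan gets through a reference sequence of required tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanningTask:
    request: str                 # the task the user is asking for
    available_tools: frozenset   # tools the model is allowed to use
    required_tools: tuple        # reference sequence a solid plan should follow


def score_plan(task: PlanningTask, plan: list) -> float:
    """Score a proposed plan (ordered tool names) between 0.0 and 1.0."""
    if not plan:
        return 0.0
    # A plan that relies on a tool the model does not have is unusable.
    if any(step not in task.available_tools for step in plan):
        return 0.0
    if not task.required_tools:
        return 1.0
    # Walk the plan and count required tools matched in order,
    # allowing extra (but valid) steps in between.
    matched = 0
    for step in plan:
        if matched < len(task.required_tools) and step == task.required_tools[matched]:
            matched += 1
    return matched / len(task.required_tools)


# Hypothetical task and tool names, for illustration only.
task = PlanningTask(
    request="Book a flight to Paris and email me the itinerary",
    available_tools=frozenset(
        {"search_flights", "book_flight", "send_email", "get_weather"}
    ),
    required_tools=("search_flights", "book_flight", "send_email"),
)

full_plan = ["search_flights", "book_flight", "send_email"]
bad_plan = ["search_flights", "teleport"]  # "teleport" is not an available tool

print(score_plan(task, full_plan))
print(score_plan(task, bad_plan))
```

A real evaluation would need something richer than tool names in order, such as checking arguments or actually executing the plan, but even a simple rule like this judges the plan itself and not just whether a human liked the wording.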

So, there you have it! The next time someone tells you LLMs are hitting a wall, you can hit ‘em with the truth - we’re not plateauing, we’re just nailing the facts. And the future? It’s all about action and planning. Exciting times ahead in the world of AI!

- Joseph
