Alex Potts:

For a few years now, AI sceptics have argued "well, it can answer question A, but it still gets harder question B wrong", ignoring that six months ago it couldn't answer A either, and that it's the direction of travel that matters. It feels like we are now beginning to run out of room to make the questions harder (unless we ask questions that humans can't answer either), and the rate of AI improvement shows no sign of slowing down.

Michael Sullivan:

Now do this for self-driving cars in 2016.

I'm not making a prediction here! I have very little sense of when the large language model S-curve will change slope. But I think it's pretty clearly the case that the end of progress can come very suddenly.

JHal:

Self-driving cars are already safer than the average driver (not saying a lot, tbh); the main reason we don't see more of them is government regulation.

Michael Sullivan:

This is just not true (or perhaps it has become true very recently; I don't have real-time information about Waymo's results or anything, but it wasn't true, say, a year ago). It's something that people who like their simple narratives on AI progress tell themselves as cope.

G466:

Are you aware that Waymo cars are being actively used as an autonomous taxicab service in the Bay Area and a few other places?

Michael Sullivan:

I sure am!

I'm also aware that they can't be used as an autonomous taxicab service almost anywhere, and that Waymo is not pushing hard for expansion.

So, look, there's certainly some nuance here. Waymo's cars may well be safer than the average driver, *in places where they have extremely good inch-by-inch mapping data*, and *in some weather conditions*, and *while driving in somewhat restricted ways that don't express the range of driving that normal people do* (such as taking unprotected left turns and going around double-parked cars and so forth). And that's legitimate. I think we can round that off to "not safer than the average driver," but if you want to express that as "safer than the average driver but not well-suited to driving in all the places and conditions where the average driver can," then that's cool too.

But what's very clear is that in 2016 or so, we'd seen these great strides since 2007, where each year autonomous cars got vastly better than the previous year, and where a straight-line extrapolation put full self-driving at like 2018 or so, maybe skeptically 2020. And then, just as abruptly as that progress started, it flattened way the hell out. And it has not been because the government jumped in their way.

Dustin:

This kind of just glosses over the fundamental fact that we don't have a way to measure the improvements here like we do for other things like LLMs.

This very article you're commenting on has a nice benchmark. What's the equivalent for self-driving cars?

gwern:

You couldn't have in 2016, though, because there were no meaningful benchmarks or problem sets you could run a Waymo car on. A Waymo car's ability in 2016 was... '???'. A Waymo car's ability in 2023 is... '???'. (And if you think that DL benchmarks like MMLU are flawed, wait until you look at the California numbers like 'miles per disengagement' everyone is forced to use because there's literally nothing else!) This has been one of the biggest frustrations in following self-driving car progress. There just aren't any relevant benchmarks you can even begin to extrapolate on. They surely exist *internally*, but self-driving car companies are extraordinarily opaque, and what numbers they release tend to be either stripped of any context which would make them meaningful or actively deceptive (looking at you, Tesla).
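
For anyone who hasn't seen that metric: "miles per disengagement" is just autonomous miles driven divided by the number of times the safety driver took over, which is part of why it is so crude. A minimal sketch of the calculation, with invented figures rather than real DMV data:

```python
# Illustrative only: the figures below are invented, not real DMV numbers.
reports = [
    {"company": "A", "autonomous_miles": 1_200_000, "disengagements": 40},
    {"company": "B", "autonomous_miles": 150_000, "disengagements": 300},
]

for r in reports:
    mpd = r["autonomous_miles"] / r["disengagements"]
    print(f'{r["company"]}: {mpd:,.0f} miles per disengagement')

# Note: companies self-define what counts as a "disengagement", and the number
# says nothing about road difficulty, weather, or route selection, which is
# why it is nearly useless for comparing companies or extrapolating progress.
```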

There is no comparison here with language models, which have lots of suites and benchmarks, excellent scaling curves on relevant properties, active prediction markets & forecasters on them, and so on. Thanks to all that, we can say that there are no indications of an S-curve kicking in (note, for example, the beautiful fit OA shows for GPT-4, with no 'bounce' indicating an unexpected flatlining or divergence from the projected loss).
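
To make that kind of check concrete: the fit in question is a power-law scaling curve, loss as a function of training compute, extrapolated from smaller runs and then compared against the large run. A minimal sketch of testing for a "bounce", using invented numbers rather than OpenAI's actual data:

```python
# Sketch: fit a power law loss(C) = a * C**(-b) + c to smaller training runs,
# then check whether a much larger run lands near the extrapolated curve.
# All numbers here are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def power_law(compute, a, b, c):
    return a * compute ** (-b) + c

# Hypothetical (compute, loss) pairs, with compute in units of 1e18 FLOPs
# to keep the fit well conditioned.
compute = np.array([1.0, 10.0, 100.0, 1000.0])
loss = np.array([3.10, 2.75, 2.48, 2.27])

params, _ = curve_fit(power_law, compute, loss, p0=[1.0, 0.1, 2.0])

# Extrapolate to a far larger run and compare with what was actually measured.
big_compute = 100_000.0   # i.e. 1e23 FLOPs in the same units
predicted = power_law(big_compute, *params)
observed = 2.00           # hypothetical measured loss of the large run

print(f"predicted loss: {predicted:.2f}, observed loss: {observed:.2f}")
# An observed loss sitting well above the prediction would be the "bounce"
# suggesting the scaling trend is breaking down; close agreement suggests
# no flatlining yet. Self-driving simply has no public curve like this.
```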

Michael Sullivan:

This strikes me as a lot of handwaving and cope. Do we need something that we can perfectly plot on a graph here?

In 2007 (or 2006 or something, I forget), autonomous vehicles couldn't navigate through the open desert and reach a finish line. The idea that they could be on a road with people was laughable -- it was obvious they'd kill everyone and then themselves, basically instantly. In successive DARPA challenges, they started finishing the course, then finishing harder courses, going through simulated traffic.

By 2013, we had cars that could, in certain situations, safely drive in traffic (on freeways). By 2015, we had cars that could handle (a subset of) urban traffic situations, probably not actually safely compared to human drivers, but like not two orders of magnitude worse or anything. By 2018, Waymo launched autonomous vehicles to non-employees in Scottsdale. And then... we've inched forward. We have a small fleet driving, almost certainly deeply unprofitably, in San Francisco. The Scottsdale-area service has grown a bit.

This clearly was a surprise to companies working in the autonomous vehicle space. Their internal metrics didn't give them any better warning that this slowdown was coming.

Does this mean that LLMs will suddenly have a giant slowdown in progress post GPT-4?

It absolutely does not.

Does this mean that people should rein in their confident predictions that LLMs will improve steadily with no end in sight? It does.

Alex Potts:

This is true, but I think that if a technology is showing consistent progress over time, the null hypothesis should be that the progress will continue, and the burden of proof should be on those who think it will stop to explain why.

Sometimes there are good reasons. Moore's Law held true for decades, but we're now running up against a limit imposed by the discrete, atomic nature of matter itself. But I can't think of a good reason why LLM progress should suddenly stop any time soon.

Michael Sullivan:

I don't think there is a burden of proof! We aren't in a courtroom, and I suggested a bit of humility in your predictions; I didn't say your wife was ugly.

There is clearly a lot of team thinking here. Like, "Oh, you have to prove that my side is wrong, or else we're right." But there aren't actually sides. I suggest that you shouldn't identify with team "AI will progress quickly."
