gwern:

You couldn't have in 2016, though, because there were no meaningful benchmarks or problem sets you could run a Waymo car on. A Waymo car's ability in 2016 was... '???'. A Waymo car's ability in 2023 is... '???'. (And if you think that DL benchmarks like MMLU are flawed, wait until you look at the California numbers like 'miles per disengagement' everyone is forced to use because there's literally nothing else!) This has been one of the biggest frustrations in following self-driving car progress. There just aren't any relevant benchmarks you can even begin to extrapolate on. They surely exist *internally*, but self-driving car companies are extraordinarily opaque, and what numbers they release tend to be either stripped of any context which would make them meaningful or actively deceptive (looking at you, Tesla).
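
To make the complaint concrete, here's a minimal sketch (every number invented, not real DMV data) of why the headline figure is so uninformative: two hypothetical fleets can post the identical 'miles per disengagement' while operating in completely different regimes.

```python
# Hypothetical illustration (all numbers invented): why a bare
# 'miles per disengagement' figure is nearly meaningless without context.

def miles_per_disengagement(miles: float, disengagements: int) -> float:
    """Headline metric from the California DMV reports: autonomous miles
    driven divided by the number of safety-driver takeovers."""
    return miles / disengagements

# Two invented fleets posting the same headline number...
fleets = {
    "A (95% easy freeway miles)": (100_000, 10),
    "B (95% dense urban miles)": (100_000, 10),
}
for name, (miles, dis) in fleets.items():
    print(f"Fleet {name}: {miles_per_disengagement(miles, dis):,.0f} mi/disengagement")

# Both print 10,000 mi/disengagement, yet fleet B operates in a far harder
# regime. The metric also collapses safety-critical and precautionary
# takeovers into one count, and each company picks its own routes and
# reporting thresholds -- so the numbers aren't comparable across companies,
# let alone something you can extrapolate on.
```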

There is no comparison here with language models, which have lots of suites and benchmarks, excellent scaling curves on relevant properties, active prediction markets & forecasters on them, and so on. Thanks to all that, we can say that there are no indications of an S-curve kicking in (note, for example, the beautiful fit OA shows for GPT-4, with no 'bounce' indicating an unexpected flatlining or divergence from the projected loss).
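
For the curious, a minimal sketch of that kind of extrapolation (synthetic data and hypothetical constants, not OpenAI's actual numbers): fit a saturating power law to small-scale runs, project it out, and see whether a much larger run lands on the projected curve or 'bounces' off it.

```python
# A minimal sketch (all numbers synthetic) of scaling-law extrapolation:
# fit a saturating power law
#   L(C) = a * C**(-b) + c
# on small runs, then check whether a far larger run lands on the
# projected curve or 'bounces' off it (an early sign of an S-curve).
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(c, a, b, irreducible):
    # Loss as a power law in (normalized) compute, plus an irreducible floor.
    return a * c ** (-b) + irreducible

# Invented (compute, loss) points standing in for small-scale training runs.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])
loss = np.array([3.10, 2.40, 2.01, 1.78, 1.66])

x = compute / compute[0]  # normalize so the optimizer is well-conditioned
params, _ = curve_fit(scaling_law, x, loss, p0=[1.0, 0.3, 1.0])

# Extrapolate three orders of magnitude past the largest fitted run.
big_run = 1e25 / compute[0]
print(f"projected loss at 1e25: {scaling_law(big_run, *params):.2f}")
# If the real large run then lands on this curve (as in the GPT-4 report's
# loss-prediction figure), there is no sign of flatlining; a large gap
# between projection and outcome would be the 'bounce' mentioned above.
```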

Michael Sullivan:

This strikes me as a lot of handwaving and cope. Do we need something that we can perfectly plot on a graph here?

In 2004, at the first DARPA Grand Challenge, autonomous vehicles couldn't navigate through the open desert and reach a finish line. The idea that they could be on a road with people was laughable -- it was obvious they'd kill everyone and then themselves, basically instantly. In successive DARPA challenges, they started finishing the course (2005), then finishing a harder course with simulated traffic (the 2007 Urban Challenge).

By 2013, we had cars that could, in certain situations, safely drive in traffic (on freeways). By 2015, we had cars that could handle (a subset of) urban traffic situations -- probably not actually as safely as human drivers, but not two orders of magnitude worse or anything. By 2018, Waymo launched autonomous rides to non-employees in the Phoenix suburbs. And then... we've inched forward. We have a small fleet driving, almost certainly deeply unprofitably, in San Francisco. The Phoenix service area has grown a bit.

This slowdown clearly came as a surprise to companies working in the autonomous-vehicle space; their internal metrics evidently gave them no better warning that it was coming.

Does this mean that LLMs will suddenly hit a giant slowdown in progress post-GPT-4?

It absolutely does not.

Does it mean that people should rein in their confident predictions that LLM capabilities will increase steadily with no end in sight? It does.
