This glosses over the fundamental fact that we don't have a way to measure improvements here like we do for other things, such as LLMs. The very article you're commenting on has a nice benchmark. What's the equivalent for self-driving cars?