42 Comments
Age of Infovores

The real question is whether the AI’s exam performance means anything at all. Studies show very little overlap between what AIs do in school and the skills they actually need on the job.

forumposter123@protonmail.com

My very limited experience with ChatGPT is that it will give you a shallow summary of anything with a lot of data on the internet, without taking much of a side.

That probably passes the test for many tasks, but not all.

Do you need mediocre but cheap answers to things without deep understanding? We've got a Voxsplainer writer in a box!

SolarxPvP

This is partially because the designers are terrified of it being offensive. They have explicitly said they've tried to make it as inoffensive as possible.

forumposter123@protonmail.com

Ok, but I asked it a question about my industry that doesn't touch on race or sex or anything and the output was just as mediocre.

Jason Crawford

If your goal is to actually identify breakthrough technologies even slightly ahead of the curve, then I don't think it's helpful to apply base rates, for this exact reason. You will always predict “no”, you will be right 95+% of the time, and you will miss every transformative technology until it's too obvious to ignore.

I think AI is on a strong trajectory to be extremely useful, but I'm not sure I would take this bet. “Passing exams” is not an economically useful function (except to students who want to cheat?) and it's not clear to me that AI will be engineered or optimized for this. If you picked something with a clear economic value, like generating marketing copy or writing scripts for TV and movies, I would be much more likely to take the bet.

Byrel Mitchell

If you interpret 'apply a 95% negative base rate' as 'just say no to all transformative techs', then of course you're right. But that's not really how one should apply a base rate: you just use Bayes' rule and let the negative base rate pre-weight your odds that a given tech will be transformative appropriately low.
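
For concreteness, here's a minimal sketch of that update in Python (the 5% base rate and the likelihood ratios are illustrative numbers, not anyone's actual estimates):

```python
# Odds-form Bayes update: start from a low base rate that a hyped
# technology turns out to be transformative, then update on evidence.

def posterior(prior, likelihood_ratio):
    """Posterior probability from a prior and a likelihood ratio, where
    likelihood_ratio = P(evidence | transformative) / P(evidence | not)."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

base_rate = 0.05  # a 95% negative base rate, i.e. prior odds of 1:19

for lr in (1, 5, 19, 100):
    print(f"likelihood ratio {lr:>3}: P(transformative) = {posterior(base_rate, lr):.2f}")

# Prints 0.05, 0.21, 0.50, 0.84. A likelihood ratio of 19 is exactly the
# break-even point against 1:19 prior odds; weaker evidence leaves you
# below even odds, which is all the base rate is supposed to do.
```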

Jason Crawford

Good point, but if you're really seriously doing that then I don't see how you could dismiss everything that AI has just become capable of in the last couple of years. That is an extremely strong trajectory towards some very fundamental capabilities—far more than enough to overcome 19:1 odds.

Byrel Mitchell

This boils down to what we mean by transformative, at least in my view. I mean, my personal evaluation is that AI is 90+% likely to be very useful as a tool in many fields by 2030. It's FAR less likely to replace entire fields. I'm not clear exactly what Bryan is estimating here.

Dave Friedman

This seems like the correct interpretation to me. In any event, ChatGPT (or a similar tech) purportedly has passed assorted medical exams and bar exams. So I don't know what insight is gained by this bet. You can make a test arbitrarily difficult, such that ChatGPT or its future descendants can't pass it, but what does that prove other than arbitrary difficulty?

Calion

Since he expects his students to pass it, it can't be arbitrarily difficult.

Ferran Casarramona

A D is not bad for a guy who didn't attend your lectures.

Nicholas Spina

Shouldn't you use a third-party grader, or even a set of graders? Grading is inherently subjective. What you consider a D, another professor might consider a C, depending on the rubric, their mood, student quality, etc. And even if we assume no progress in this technology, which seems unlikely, a beta version of a new tech scored a marginally passing grade in an advanced economics course, probably as good as or better than a substantial percentage of all college students in the country. That seems pretty amazing to me.

SolarxPvP

Caplan's exams also seem hard. His grading seems particularly demanding (as his Rate My Professor reviews confirm).

Kurt

How about blinding the AI's exam by including it with all the other students' exams for grading? That way, Bryan won't know whether he's grading a human student or the AI.

SolarxPvP

Seems fun, but I don't think Caplan is that biased.

SolarxPvP

As in, it would be fun to see Bryan's reaction to finding out it was an AI.

Kenny Easwaran

Wouldn't the right way to do this be to include the AI test among the exams you actually grade during the semester, without identifying it as an AI test? Grading without knowing the identity of the student who wrote the test is probably good for a variety of reasons (though it can introduce complications if you're dealing with essays that students have worked on drafts of) and would make the test more fair.

Shasta

You should do this in a blinded way! You likely will grade the AI very differently because you know it is an AI. My old econ teacher used to do this to avoid bias – have students write their name on the back of the last page.

Davis Yoshida

This is probably the first Bryan bet I've thought he was way off the mark on. Exciting!

Enrique Guerra-Pujol

I have a feeling that Caplan will either become an especially hard grader or that he will lose this bet!

William Connolley

Now we need a prediction market on this bet. I'd go for the AI's side, certainly at evens.

Danno28

Somebody posted this joke on Twitter (apologies, I forgot who), but I think it is highly relevant:

I was in the park the other day and walked past a man playing chess against a dog. "Wow," I said, "that's a smart dog."

"Not that smart," the man replied. "I'm winning 3 games to 1."

Seriously, what percentage of the population could get a D or higher on a labor econ midterm? Maybe 10%?

For certain tasks, ChatGPT is already outperforming humans (e.g., some coding tasks, organizing rough notes into a coherent structure). It's underperforming on internal consistency of answers and on general knowledge. But I can't imagine those things won't be fixed in six years.

JSM

One thing I don't understand: if Matthew is right, why would he pick the 6 latest midterms from ~2028? If he's right, professors might be forced to change their assignments and midterms by that point. I think you should use the 6 latest midterms from today, not from 6 years from now.

Additionally, by allowing "any AI selected by Matthew," does that mean you'd allow Matthew to train an AI on your class lectures and midterms? Because if so, there's a chance ChatGPT could pass right now with the right training.
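
For what it's worth, here's a minimal sketch of what "training on the class materials" could look like today with an open model and the Hugging Face stack. The transcript file, model choice, and hyperparameters are placeholders, not anything from the bet's terms:

```python
# Hypothetical sketch: fine-tune a small open language model on lecture
# transcripts. "lecture_transcripts.txt" (one passage per line) and the
# gpt2 stand-in model are illustrative placeholders only.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

model_name = "gpt2"  # stand-in; the bet allows "any AI selected by Matthew"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the transcripts and tokenize, truncating to a fixed context length.
dataset = load_dataset("text", data_files={"train": "lecture_transcripts.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

# Standard causal-LM objective (mlm=False), a few passes over the material.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="caplan-tuned", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

If the "any AI" clause allows something like this, the training data matters at least as much as the model choice.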

Infinita City

You miss 100% of the moonshots you don't take; that's the problem with the base rate argument.

That said, I think you're correct when it comes to generative AI.

I think ChatGPT was a PR stunt for potentially more valuable but far less flashy use cases, such as B2B automation, data aggregation, the workplace, etc.

There is a reason that Microsoft is the biggest investor in OpenAI.

Maxim Lott

My prediction: by 2029, it will be common knowledge that AI aces college exams in general.

However, Bryan's exams are idiosyncratic enough that the AI might not quite hit this high grading bar, since it will have been trained on conventional economics textbooks (Krugman, etc.). So I think Bryan will win the bet. The AI would need to be trained on his lecture transcripts to avoid this issue.

Andrea

"2. Bryan will then grade the AI's work, as if it were one of his students"

How do you know you'll be fair? Will you accept the 6 exams shuffled in among your students' papers and grade them anonymously?

Ian Sherman

Great bet! I've posted this elsewhere, but you (and other commenters) may be interested in seeing a working data scientist's opinion about ChatGPT that I wrote about a month ago, wherein I more or less agree with the sentiment of "grossly overpromising and underdelivering": https://ipsherman.substack.com/p/an-opinion-about-ai-chatgpt-and-more

Unrelatedly, in 2021 I did a post on how much to worry about COVID for kids. I wouldn't usually comment at all, let alone about something unrelated, but in this post I refer to Kahneman’s maxim as well (using the same terminology!): https://ipsherman.wordpress.com/2021/09/11/why-i-dont-make-my-kids-wear-masks/ <- I was (at least partially) inspired to write this and a previous post by your questions about how much worse was COVID than the normal flu.

Thank you Professor Caplan for your years of insightful, prolific, social-desirability-bias-eschewing blogging!
