I tried to answer your questions based on my experience and knowledge (as a retired civil engineer), and I'm pretty sure the AI did better than you would have graded me. On the other hand, I am sure that if I had sat through your classes, hearing how you expressed these concepts, and getting feedback to my questions, I would have done much better.
Given that ChatGPT uses available sources - which include a hodge-podge of divergent opinions - it is not surprising that it failed to respond to your questions as you outlined. But if it had access to transcripts of all your classes, and knew to give priority to your input over what is generally available, I suspect that it would have returned something much closer to what you expected.
One of the chief drawbacks to ChatGPT, at least as I understand it, is that it simply looks at all the information - both correct and incorrect - and tries to provide an answer that weights all opinions. It does NOT yet have the ability to evaluate logically ideas against data and to put together a thesis that is based on facts but that runs contrary to widely established opinions.
I think what's most impressive about ChatGPT is not its current capabilities but its momentum. Just a few years ago, the idea of an AI taking an IQ test or the SAT was almost laughable. AI experts predicted that this level of capability wouldn't be achieved until the 2030s, and the general public considered even those predictions too optimistic.
Just a couple of years ago, GPT-3 was mostly being compared to 7-year-old kids. Today you are comparing ChatGPT to a college student.
Thank you @bryancaplan. I wonder where to find a really thoughtful, long-term consideration of the impacts (ethical/moral/academic/professional) this software brings to our world... and no, I'm not a gray goo adherent. But I am curious.
There are many thoughtful, long term, considerations of the impacts this software brings to our world being offered up by academics, intellectuals, journalists, etc. And those considerations are having precisely zero impact on the accelerating pace of AI development and deployment. The djinn is coming out of the bottle whether we're ready or not.
Thank you for trying this -- that is a very useful contribution!
It does seem like there is a bigger-picture point, though: this is software available to the general public interpreting a free-format natural-language economics exam and writing essay-style answers that are mostly coherent -- an earthshaking development compared to the state of the art just three years ago. It seems a bit like critiquing the ballet-dancing bear's pointe technique and docking 2 points for performance while grudgingly acknowledging that the choreography and presentation are passable, as other observers are going "Holy hot sauce, that bear is doing ballet!"
Since different ChatGPT prompts result in different answers, don't you need to tell us the exact inputs that produced these answers? Further, isn't it possible that some prompts could result in significantly better performance, such as telling it to respond like an economist or an economics student taking a test? Given what I have seen elsewhere with attempts to improve outputs, it's highly likely there is more optimization you could do to improve the test score.
I don't know if this is idiosyncratic to me or not, but I find the way Bryan writes questions confusing. Consider the snippet below:
"T, F, and Explain: Krugman argues that such employment loss is a market failure that justifies government regulation."
I take it from context that Krugman *does* in fact argue this and the question is not "can you recapitulate the content of Krugman's argument?" but rather "is the content of this argument true?". I think a capable student will get there, but given that testing is generally pretty stressful anyway, if I were a student, I would be a LOT happier if the question was: "Krugman argues {x}. Is {x} actually true?".
Not a Krugman fan, but I think we should all be able to agree it would be awfully arrogant of Caplan or any other econ prof to teach his students as FACT that the assertions of a living Nobel prize-winning economist are FALSE, full stop. I don't think that's what he's doing. He's asking if Krugman said that and to then, if true, explain why, and if false, explain Krugman's actual opinion on the matter.
My mistake! Thanks for helping to clarify.
I had the same impression you did initially about the Krugman question, although I eventually figured it out. It's possible I wouldn't have been at all confused if I'd taken the class, but I agree that the question could have been worded better.
Years ago I watched the Jeopardy! match with IBM's AI, Watson. It "won" in that it regurgitated answers faster than the human contestants and dominated the board. But in Final Jeopardy, its answer under the category "U.S. Cities" was "Toronto". It was so far ahead that it didn't matter, but it exhibited a habit you occasionally see with AI: occasional gross and obvious errors no human would make.
The big thing they wanted it to do was work in the medical industry, but that didn't pan out. You can't make errors like that in medicine.
At the same time, AI does appear to be good at producing "mediocre work for very cheap." I think someone in the translation industry noted that a not-great translation at 90% less than the going price is usually "good enough" for most customers. If what you want isn't sensitive to the occasional big, dumbfounding error, that might not matter.
Early guns weren't as good as bows, but they were a cheaper weapons system.
Basically, AI can replace mediocre and fairly unimportant work, of which we still have a lot.
ChatGPT does this too. I asked it for the 10 largest companies headquartered in Silicon Valley and it gave me 9 correct ones plus Amazon. I told it Amazon was in Seattle. It said yes, that was a mistake, and then made a new list, but put Intel in place of Amazon, thereby listing Intel twice.
I agree that ChatGPT is very bad about making lists with oversights that it will immediately concede are oversights, which seems like very low-hanging fruit. It seems that if it took a little more time to answer, it could easily do much better. But it's also free to play with right now, and I suppose OpenAI doesn't want to spend too much compute on it.
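The duplicate-Intel slip described above is the kind of thing a trivial post-check could catch. As a rough illustration only -- the helper name and the company data here are invented for the example, not taken from any real tool -- a few lines of Python can flag duplicates and known-wrong entries in a model-generated list:

```python
def validate_company_list(companies, expected_count, known_bad=()):
    """Return a list of problems found in a model-generated list of companies."""
    problems = []
    if len(companies) != expected_count:
        problems.append(f"expected {expected_count} entries, got {len(companies)}")
    seen = set()
    for name in companies:
        key = name.strip().lower()
        if key in seen:
            problems.append(f"duplicate entry: {name}")
        seen.add(key)
        if key in known_bad:
            problems.append(f"known-wrong entry: {name}")
    return problems

# The corrected-but-still-wrong list from the anecdote: Intel appears twice.
answer = ["Apple", "Alphabet", "Meta", "Nvidia", "Intel",
          "Cisco", "Oracle", "Adobe", "Netflix", "Intel"]
print(validate_company_list(answer, 10, known_bad={"amazon"}))
# -> ['duplicate entry: Intel']
```

A wrapper that re-prompts the model whenever this check returns problems would be one cheap way to pick that low-hanging fruit.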
A more important question may be: if ChatGPT had transcripts of your class lectures, and were told to refer to those in answering these questions, what grade would it get then? Based on my use of AI, I suspect it would do very well if given the same materials students are given.
I also suspect that most college students who had *not* taken your class, but were reliant on ChatGPT's database to answer these questions, would also score very poorly.
The reason that's important is because then it's just a matter of feeding the right info to ChatGPT -- its ability to use it well is already mostly there.
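The "feed it the right info" idea is essentially retrieval-augmented prompting: pick the transcript passages most relevant to the question and prepend them to the prompt. Here is a deliberately crude sketch of that idea, with invented function names and made-up transcript lines; a real system would use embedding similarity rather than word overlap, but the shape is the same:

```python
def score(question, passage):
    """Crude relevance score: count shared lowercase words."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def build_prompt(question, transcripts, k=2):
    """Prepend the k most question-relevant transcript excerpts to the exam question."""
    ranked = sorted(transcripts, key=lambda p: score(question, p), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Lecture notes:\n{context}\n\nExam question: {question}"

# Invented stand-ins for lecture-transcript excerpts.
transcripts = [
    "Lecture 3: aggregate labor demand falls when worker productivity falls.",
    "Lecture 7: Krugman on market failure and government regulation.",
    "Lecture 1: syllabus, office hours, and grading policy.",
]
prompt = build_prompt("Will aggregate labor demand fall after blackouts?", transcripts)
```

With the course material ranked and injected like this, the model answers from the instructor's framing rather than from the web's hodge-podge of opinions.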
From a Turing Test perspective, these are good answers--much, much higher quality than you would get asking a random college graduate who had not taken your class. Additionally, simply feeding the questions on the test to ChatGPT is not a fair comparison to your students. Your students probably had a lot of additional context as to what level of detail to go into when answering questions and what sorts of things it is important to mention in answers in order to get a good grade. I expect that if you provided that sort of context in the prompt (and especially if you used a few different prompts and selected the best answer provided) that the bot would get a substantially better grade.
Another piece of evidence.
I gave ChatGPT the final exam of my strategic management course and asked a colleague (who teaches the same course and gave his students the same exam) to grade it without mentioning that the answers were ChatGPT's.
The outcome: ChatGPT performed comfortably above average, both in the multiple choice questions and in the open ended ones.
Also, there was a lot of variability. Some answers were excellent while others were considerably subpar.
As people here comment, it is actually amazing. I have already used ChatGPT a lot and still had no idea it could even grasp these complicated economics questions.
I’m pretty much amazed that the answers were as good as they were. Getting a “D” on this test for a class it didn’t take with nothing to go on but the questions? Crazy.
I also feel like we might be underappreciating the strengths. For example, you didn’t say how many extra points it got for making the answers rhyme. (If you say zero, I won’t believe anything else you say either.)
Re: 1. Suppose a country’s workplaces start to experience periodic blackouts.
T, F, and Explain: Aggregate Labor Demand will clearly fall even though some workers will benefit.
True, it's quite plain
If blackouts cause businesses to refrain
From hiring workers, then it's plain to see
That Aggregate Labor Demand will surely be
Affected by the blackouts, even though some workers may
Find themselves with more hours to play
But if businesses suffer, then overall demand will shrink
And fewer workers will be hired, it's not hard to think.
So the answer is true, blackouts can cause a fall
In Aggregate Labor Demand, after all.
Or what about acting out the answer with a scene from Friends?
INT. MONICA AND CHANDLER'S APARTMENT - DAY
Monica is pacing back and forth, looking worried. Chandler is sitting on the couch, flipping through channels on the TV.
MONICA
I can't believe it. All these blackouts at the restaurant are really hurting business.
CHANDLER
Yeah, it's been rough. But at least we still have our jobs.
MONICA
True, but what about all the other workers? If business keeps falling, they might have to let people go.
CHANDLER
(pauses TV) Wait, you mean Aggregate Labor Demand could fall because of these blackouts?
MONICA
Exactly. Even though some workers might benefit from the blackouts by getting more hours, overall demand for labor could drop if businesses can't operate normally.
CHANDLER
Wow, I never thought about it that way.
MONICA
Exactly. It's important to think about how economic events can affect the whole picture, not just one group.
CHANDLER
(nods) You're right. Good thing we have each other to help weather the storm.
MONICA
(smiles) Always.
They hug as the scene fades to black.
I was surprised that you gave it any points at all for question 2 (the kind of BS paraphrase of the question that you can usually do in a subject you know nothing about), and also that you didn't give more points for question 4 (nothing in the T/F statement itself suggested to me that you'd expect me to restate the transparent meaning of the final part of the Landsburg quote).
What this perhaps shows is that ChatGPT has been trained with material from textbooks and other sources that do not reflect GMU's economics department curriculum. Had it been trained with, say, transcripts of Caplan's lectures, it most likely would have achieved a higher score.
When I wrote the above comment, I hadn't looked at ChatGPT, but commented based on my general knowledge of neural networks. Now that I've seen Stephen Wolfram's explanation of it (https://www.youtube.com/watch?v=zLnhg9kir3Q), I think that even if it were trained with lecture transcripts, it would not have done much better. OTOH, something more along the lines of an IBM Watson, if given lectures, economics textbooks, etc., could possibly get a better grade on an exam.
Sure, but four years ago the best AI probably would have gotten a 0. And I'd be willing to bet even money that within 5 years, the best publicly available AI can get a B or better on tests of this sort (I'll let you adjudicate). Interested?
That AI has gotten better in five years does not imply that it will keep improving in the future.
Perhaps this is the best it gets. As humans cobble together more coherent AI systems, those systems will appear to be better, but articulation is not a sign of rising intelligence, in the same way that today's bullet train doesn't imply that the steam locomotive was inferior.
When I was young, they came out with the first version of a portable answering machine: a clunky piece of electronics about the size of a cereal box that used two magnetic cassette tapes, one to record a greeting and the other to store messages. The machine efficiently answered the phone and allowed the caller to leave a message, which could be retrieved later by the phone's owner.
Today we have digital voicemail. The underlying technology is unquestionably more sophisticated, and yet the functionality is basically the same: the automated answering of a call and the ability to leave a message.
The locomotive and a bullet train both travel on rails. If your sole purpose is to get from point A to point B without regard to time, then either option is adequate.
My observation, particularly with regard to ChatGPT: is this a new stage of technology, or just another iteration, a faster train, a better answering machine? I'm skeptical of hype, even more so when it comes from high places.
Here's my heuristic, which has served me well:
There has only been one industrial revolution, and maybe there will be a "singularity" that is just as transformative or even more transformative, but probably not.
Technological progress is generally plateauing, though there are still transformative technologies (like the original answering machine) and incremental improvements (like the addition of automatic transcription to voicemail). Transformative technologies are much more impactful than incremental ones, but even they are generally plateauing as the industrial revolution plays itself out.
I'm inclined to think AI applications of this sort will prove to be a transformative technology. Because of the plateau, they will still prove to be less impactful than, say, the Internet has been up to this point, but much more impactful than the incremental improvements in Internet search from 2000 - 2023 have been.
It might have gotten a D, but combine that with the kind of context your students would have and it will definitely do better than a D.
AIs are only as good as their training material. Train one on A material and you will get A answers. Train it on Wikipedia and you will get garbage on anything even slightly political.