You'll understand, as a writer: if you blame the audience when your heavy work is lightly received, you'll only continue to be lightly received (like a true academic).
At a glance:
- the files and tabs are too dispersed to compare easily
- I can't see a reference table or comments explaining the headers
- there are constants hidden in the formulas, e.g. -9500 for taxable income; this doesn't just add work for the user, it means your work can't be updated for new financial years
I commend your rigour... there's usually not much behind the curtain... but as you've found, few will bother to reverse-engineer a mammoth spreadsheet before they're engaged by the claims.
Idea: You worked with someone to turn your writing into digestible comic book form.
...Why not work with someone to turn your spreadsheet results into an infographic?
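The "constants hidden in the formulas" complaint above can be sketched in a few lines. This is a hypothetical illustration, not the book's actual spreadsheet logic, and the deduction figures are made up for the example; the point is that naming the constant once per financial year makes updates a one-line change instead of an edit to every formula that hard-coded -9500.

```python
# Hypothetical sketch: name the per-year constant once instead of burying
# a magic number like -9500 inside every formula.
# The figures below are illustrative only, not real tax data.
DEDUCTION_BY_YEAR = {
    2017: 9500,   # the figure assumed in the spreadsheet example
    2018: 12000,  # illustrative entry for a later financial year
}

def taxable_income(gross_income: float, year: int) -> float:
    """Gross income minus the named deduction for that year."""
    return gross_income - DEDUCTION_BY_YEAR[year]

# Updating for a new financial year is now one new row in the table,
# not a hunt through formulas for hard-coded constants.
print(taxable_income(50000, 2017))  # 40500
print(taxable_income(50000, 2018))  # 38000
```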
By the way, whoever suggested using Python only wants you to share their pain. A crazy idea.
SQL is so easy to follow, and everyone knows exactly what math one is doing. I suspect Python is so common because idiots like using libraries they do not understand. I have had this repeatedly proven to me when I see people using continuous distributions for discrete variables, or utterly ridiculous mixtures of functions that should never be used with each other. (I have had to teach supposed experts in analytics that the log-normal distribution exists; even that is better than the masters'-program graduates who do not even understand what a probability distribution is.)
I mean, the point of the months was finding the errors so that readers wouldn't, right? And there are returns to being the sort of person who does the math before staking too hard on a position.
Exactly. You do the math to verify your hypothesis and to convince others that your hypothesis is sound. If you get those spreadsheets wrong while taking a somewhat controversial position, and a researcher with an opposing viewpoint finds the errors, you'll be sorry.
Sadly, Bryan's take on this is 180 degrees from where it should be. I understand the frustration, having done the same thing many times in (as someone else mentions on this thread) a finance setting, but if I hadn't done the math, the people who didn't bother to look at the math wouldn't have bothered to look at the conclusions.
If you hadn't included the spreadsheets people might have called BS, but there's something very convincing about tables of numbers. We think 'he did the numbers so he must know what he's doing'. Being able to call on hard data during debate always sets people apart. Friedman and Sowell always had the facts at their fingertips, and you just accept that they have their numbers right.
Spreadsheets are approximately a write-only programming language. They are almost impossible to review, and doing so is extremely tedious.
That's one big reason why programmers hate them.
If the calculations had instead been done in, e.g., Python as suggested, and the source uploaded to, e.g., GitHub, more people would have checked them out.
Agreed. The trouble is that code is deterministic by default, while spreadsheets are non-deterministic by default.
Code starts at the first line and runs it and every line after in order until the end. If the code "jumps around" via function calls (or heaven forbid, goto lines), that path can always be traced by default.
With a spreadsheet, the default is that one doesn't know the order of operations. Did the author start in cell A1 and then proceed down? Or start in a different cell and proceed horizontally? Which operations were done in which order? One can guess, but it isn't encoded by default.
A good spreadsheet author can make the order of operations clear, but the default is that it isn't, and that results in many poorly constructed spreadsheets that are very difficult to parse.
Order of evaluation isn't really a problem.
Haskell is famous for being a language where the compiler figures out the order of evaluation by itself. And I find it very readable.
Determinism is a completely separate issue from order of evaluation. Spreadsheets are mostly deterministic: any admissible order of evaluation will lead to the same result.
However, I agree that making sense of a spreadsheet is hard, and that even figuring out the best order to read them in is hard.
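The point about determinism can be made concrete with a toy evaluator. This is a minimal sketch, not any real spreadsheet engine, and the three cells and their formulas are invented for the example: cells form a dependency graph, dependencies are computed on demand, and any admissible evaluation order yields the same values.

```python
# Toy "spreadsheet": each cell is (formula over other cells' values, dependencies).
cells = {
    "A1": (lambda v: 2, []),
    "A2": (lambda v: v["A1"] * 3, ["A1"]),
    "A3": (lambda v: v["A1"] + v["A2"], ["A1", "A2"]),
}

def evaluate(order):
    """Evaluate cells in the given order, pulling in dependencies on demand."""
    values = {}
    def get(name):
        if name not in values:
            formula, deps = cells[name]
            for d in deps:       # compute dependencies first
                get(d)
            values[name] = formula(values)
        return values[name]
    for name in order:
        get(name)
    return values

# Two different starting orders, identical results:
print(evaluate(["A1", "A2", "A3"]))  # {'A1': 2, 'A2': 6, 'A3': 8}
print(evaluate(["A3", "A2", "A1"]))  # {'A1': 2, 'A2': 6, 'A3': 8}
```

The unspecified order makes spreadsheets hard to *read*, but as long as the dependency graph is acyclic, it doesn't make them non-deterministic.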
If it makes you feel any better, I really appreciated your spreadsheets. I spent so much time talking with you in the run-up to publication that I almost didn't bother to buy the book because, hell, I knew the argument forwards and backwards anyway. I did wind up buying a copy, and goddamn... the quantitative section was shocking. If anything you were soft-pedaling the case in the early parts of the book (or I was mentally correcting for my own biases in wanting the argument to be true). I don't think a halfway numerically literate person could look at the math and not come away thinking you were either right or at least extremely clever in your error. Only prior commitments, ideological or otherwise, could keep someone from saying "We screwed up."
Which is, I suspect, why no one ever addresses it. Calling any attention to the math is analogous to smashing your forces attacking the strongest part of the castle; better to avoid it if at all possible. Only people interested in truth would care, and I don't think there are many of those in academia.
It seems like the people that would most care about it are educators and they may have assumed that every time they raised an objection, it would give your argument free publicity so it would be better to just ignore it.
You see something very similar in investment banking: employees spend a lot of time building super complex spreadsheets which aim to justify the value put on companies to be acquired or sold, but none of the principal decision makers pay them much heed.
Charles Murray has had the same experience as you since he first started publishing quantitative social science, from 1984's "Losing Ground" through 1994's "The Bell Curve," 2003's "Human Accomplishment," 2012's "Coming Apart," 2020's "Human Diversity," and 2021's "Facing Reality." So, yeah, you wasted a year. Murray's "wasted" a career and a lifetime.
Ultimately, people don't have the expertise or the time to do their own peer review on every new idea. So as a heuristic we see that you are credentialled and take that as a substitute for expertise. (Ironically, the very problem The Case Against Education rails against!)
The real answer is that it's embarrassing you think you're doing science. Go read Gelman and Pearl. Your excel regressions and "models" are as relevant to the real world as the fluid dynamics equations for an oil pipeline with variables renamed would be.
Challenging arguments with words is easy and pleasurable. Double-checking somebody else's maths is hard and tedious.
Figures don't lie, but liars sure can figure.
I just made that up. Good, huh?
I've gone pretty Hansonian here.
O.G. Science / Academic thinking was about status via discovering truth.
Modern Academics are large institutions and large institutions are always (minus some predictable corner cases) status growth via box-checking and politics.
The belief that Truth matters a lot is something we middle-class geeky 80s-teens assume, but it doesn't really appear to be true outside of small (handwave: sub-Dunbar) groups or "bubbles."
I think the reason that nobody cared about your math is that you were answering a question that people weren’t asking. Nobody decides whether to go to college or not based on the ROI.
If, on the other hand, you developed the Bryan Caplan College Rankings and told an audience of smart people who are definitely going to college (despite finding your argument intriguing) which college is best to go to? Then people would care a lot about your spreadsheets!
Tell someone Harvard is worse than independent study and, whatever, it's not really offensive. Tell them that Harvard is actually worse than Princeton, and that might genuinely irritate people.
I cared! I cared! I think many (most?) may have assumed you knew what you were doing and the tables/figures were correct and bolstered your points. Others looked more closely at what you presented and were convinced and found no reason to challenge you. If your figures or other statistics were not presented, your arguments would not have been as convincing (at least not to me). I don't think I would have enjoyed the book nearly as much.
Thomas Sowell's book 'Charter Schools and their Enemies' also has tons of tables, which again I found compelling and bolstered his arguments even if I didn't study them extensively.
Highly recommend both books!
From a spreadsheet guy:
In the days of Lotus 123, someone wrote that every complex spreadsheet contains an error. My career has been an exploration of that theme. But I've developed a set of practices that have helped me reduce errors.
Bryan, would you please share your practices for reducing/eliminating errors from spreadsheets?
Thanks.