Data Is Not Expertise: Polling Edition
I just want you all to know that I originally wrote “Polling Addition” and it took me a good ten minutes to figure out why it looked wrong.
One of the things that bugs me about a lot of algorithmic technology is that it is often used as an excuse to push expertise to the side. The recent story about how Nevada used an algorithm to cut the number of at-risk kids in its schools by two-thirds is a good example of this trend. The people who worked with those kids knew the result was garbage and had good, fair, and understandable reasons why. I bring this up because I saw a piece by Nate Cohn, the New York Times' poll guru, that made me think at least some pollsters are doing something similar from the other direction.
Cohn's article made a couple of things very clear. First, pollsters are likely overcorrecting for their misses in 2016 and 2020 and undercorrecting for their misses in 2021 and 2022. Apparently, pollsters were "traumatized" by missing Trump voters in 2016 and 2020 but were pretty okay with missing Democratic voters in 2021 and 2022. Second, what they call polling isn't really polling; it is modeling. Pollsters take their raw numbers and adjust them to fit the electorate they have in their collective heads.
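To make that concrete, here is a minimal sketch of the kind of adjustment I mean. The demographic buckets and every number in it are invented purely for illustration (Cohn's article does not describe any specific pollster's weighting scheme); the point is only that the same raw responses produce a different topline depending on which electorate the pollster assumes will turn out.

```python
# Toy illustration of poll weighting. All figures are invented, not real
# polling data: two hypothetical demographic buckets, each with the
# Democratic margin observed among respondents in that bucket.
raw_sample = {
    "college":     {"share_of_respondents": 0.45, "dem_margin": +12.0},
    "non_college": {"share_of_respondents": 0.55, "dem_margin": -8.0},
}

# The electorate the pollster *believes* will show up: the model.
assumed_turnout = {"college": 0.38, "non_college": 0.62}

def topline(group_shares):
    # Weighted average of the within-group margins.
    return sum(group_shares[g] * raw_sample[g]["dem_margin"] for g in raw_sample)

raw_shares = {g: raw_sample[g]["share_of_respondents"] for g in raw_sample}
print(f"Margin using raw sample:    {topline(raw_shares):+.1f}")      # +1.0
print(f"Margin using turnout model: {topline(assumed_turnout):+.1f}") # -0.4
```

In this toy case the raw sample says Harris +1 and the published number says Trump +0.4, and the only difference is the electorate the pollster assumed. That assumption is exactly where the memory of 2016 and 2020 can sneak back in.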
You could argue, reading the above, that pollsters are not actually replacing expertise with data, that they are letting their expertise guide their data. I can see the argument, but I do not think that is quite what is going on here. Cohn states in his article that pollsters simply do not believe the more Democratic results and that some of them adjust those results away. I don't think that is substituting expertise for data. I think pollsters are being driven by their misses in the last two presidential elections and are trying to make 2024's data look more like 2016's and 2020's. They are, in my opinion, substituting old data for current data. It is, I admit, a subtle point, but one I believe is valid.
Cohn himself is a great example of this. He says in the article that he would not believe a +7 result for Harris in PA. That admission gives away the game. A +7 result for Harris is a potentially reasonable result. Why? The Dobbs decision has angered a lot of women. Trump underperformed his polling in the primaries, and it is reasonable to assume that some percentage of those voters will vote against him or simply not vote for him. There has been an upsurge in registrations among women and younger voters since Dobbs, and a second one since Harris became the nominee. And, as mentioned, polls have consistently underestimated Democrats since Dobbs. Now, there are counterarguments to each of those points, but Cohn doesn't make them. His premise is simply that good numbers for Harris must be wrong because good numbers for Democrats in 2016 and 2020 were wrong.
That is not applying expertise; that is abandoning expertise in favor of old data. Reflexively leaning on past performance is, in my view, just another version of the data-replacing-expertise problem. Pollsters as a class are so wedded to their numbers that when their models fail them, they insist that the future must look like the past. In doing so, they are using data, in the form of past results, as a crutch that lets them avoid the potential reality their current data is trying to show them. Instead of their expertise being informed by up-to-date data, it is being replaced by four-year-old data.
I am not an enemy of data-driven decisions. But it is important to remember that data is never the whole story. It is a map of the territory, and the map is not the territory. You must be willing to have your expertise informed by data, but you must not allow data to overwhelm that expertise or substitute for it. There is no wisdom in numbers by themselves. Wisdom comes only from using numbers to inform your thinking. Anything else is an abdication of your responsibility as a thinking person.
Today is Election Day in the US. If you haven't voted, please do so. A lot is riding on making the right choice.

