The Atlantic and The New Yorker each recently published a good piece on the complexities of gathering data. The Atlantic’s, written by the founders of the Covid Tracking Project, covers all the work and subtlety that went into tracking even basic metrics about the pandemic:
Data might seem like an overly technical obsession, an oddly nerdy scapegoat on which to hang the deaths of half a million Americans. But data are how our leaders apprehend reality. In a sense, data are the federal government’s reality. As a gap opened between the data that leaders imagined should exist and the data that actually did exist, it swallowed the country’s pandemic planning and response.
The New Yorker’s is an essay about new books. At points it threatens to fall into the genre of criticizing data only to recommend some sort of hazy intuitionism. But, despite the headline “What data can’t do”, it ultimately gets it completely right:
But to recognize the limitations of a data-driven view of reality is not to downplay its might. It’s possible for two things to be true: for numbers to come up short before the nuances of reality, while also being the most powerful instrument we have when it comes to understanding that reality.
For all the difficulty and subjectivity of defining, collecting, and analyzing data, it sure seems to help.
It’s hard to give evidence for that statement without being circular, but here’s some anyway:
- Analytics appears to have significantly changed many industries. Sports is one notable example: basketball is played very differently today than it was a decade ago, in large part because of descriptive and correlational data. Ditto baseball.
- Firms that report using data more tend to perform better than those that report using it less.
- Simple statistical algorithms outperform people in a wide variety of contexts.
- The best human forecasters often start their process by finding a “base rate,” essentially a rough numerical estimate of how often something generally does or doesn’t occur, before delving into the often more qualitative specifics of a forecast question. (Update: Here’s an experienced, reliable forecaster walking through her method.)
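To make the base-rate idea in that last bullet concrete, here’s a minimal sketch in Python. It is only an illustration, not any particular forecaster’s method: the function names, the counts (12 of 80), and the size of the adjustment are all hypothetical, and nudging the estimate in log-odds space is just one common way to do it.

```python
import math


def base_rate(occurrences: int, opportunities: int) -> float:
    """Share of comparable past cases in which the event occurred."""
    if opportunities <= 0:
        raise ValueError("need at least one comparable case")
    return occurrences / opportunities


def adjust(prior: float, log_odds_shift: float) -> float:
    """Move a probability by a shift in log-odds space; 0.0 leaves it unchanged."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    logit = math.log(prior / (1.0 - prior)) + log_odds_shift
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical reference class: 12 of 80 comparable past cases ended in the event.
prior = base_rate(12, 80)

# Hypothetical judgment call: the specifics of this case make the event
# somewhat more likely than usual, expressed as a positive log-odds shift.
forecast = adjust(prior, log_odds_shift=0.5)

print(f"base rate: {prior:.2f}, adjusted forecast: {forecast:.2f}")
```

The number comes first and the qualitative judgment only moves it, so the final forecast stays tied to how often this kind of thing actually happens.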
And then there’s this 2020 paper on “The Value of Descriptive Analytics”:
Does the adoption of descriptive analytics impact online retailer performance, and if so, how? We use the synthetic control method to analyze the staggered adoption of a retail analytics dashboard by more than 1,000 e-commerce websites, and find an increase of 13–20% in average weekly revenues post-adoption.
Data helps, even in plenty of cases where it has problems, contains spurious correlations, or cannot on its own support formal causal inferences. But why?
Why does it help to be in possession of a correlation that itself can’t support any particular causal inference? The answer, I think, is that the alternative is even more flawed. We humans get really creative in our reasoning, especially when defending our preexisting beliefs and identities. Quantification anchors you to something, and though that anchor is biased and reductive, it still makes it harder for your reasoning to go astray.