More on why data helps

I’ve written before about the puzzle of why data helps. Why do even really basic, descriptive analytics seem anecdotally to be so useful?

First, one more piece of evidence that data sure seems to help. From a paper in Nature about forecasting social change:

We compared forecasting approaches relying on (1) no data modelling (but possible consideration of theories), (2) pure data modelling (but no consideration of subject matter theories) and (3) hybrid approaches. Roughly half of the teams relied on data-based modelling as a basis for their forecasts, whereas the other half of the teams in each tournament relied only on their intuitions or theoretical considerations… Forecasts that considered historical data as part of the forecast modelling were more accurate than models that did not… There were no domains where data-free models were more accurate than data-inclusive models.

(Amazingly, the data-only models did even better than the hybrid models.)

By contrast:

The results from two forecasting tournaments conducted during the first year of the COVID-19 pandemic show that for most domains, social scientists’ predictions were no better than those from a sample of the (non-specialist) general public.

The only time the experts did better than the public? When they used data:

Which strategies and team characteristics were associated with more effective forecasts? One defining feature of more effective forecasters was that they relied on prior data rather than theory alone. This observation fits with prior studies on the performance of algorithmic versus intuitive human judgements21. Social scientists who relied on prior data also performed better than lay crowds and were overrepresented among the winning teams

Why does data work? Why does quantifying seem to be so useful?

Here’s a totally separate study at VoxEU that compares stories to data and illustrates a key source of the systematic biases that lead human judgment astray: memory.

To examine the belief impact of stories versus statistics, we conducted controlled online experiments. The key idea of these experiments is to compare the immediate belief impact of stories and statistics to the belief impact after some delay, to isolate the role of memory. Participants in our experiment were informed that hypothetical products received a number of reviews. The task of participants was to guess whether a randomly selected review is positive. Before stating their guess, participants either received news in the form of a statistic, a story, or no information. We conceptualise statistics as abstract summaries of multiple data points (multiple reviews). Stories, by contrast, contain one datapoint (one review), but in addition provide contextual qualitative information about the review. Each participant saw three different product scenarios across which they were presented with one story, one statistic, and once no information. Crucial to our experimental design was that we elicited beliefs from participants twice, once immediately after they received the information and once following a delay of one day. This allows us to track the belief impact of stories versus statistics over time…

…both stories and statistics have an immediate effect on beliefs. On average, subjects immediately adjust their beliefs by about 20 percentage points for statistics, and by about 18 percentage points for stories. This pattern, however, looks markedly different after a one-day delay. While there remains a substantial belief impact for stories (about 12 percentage points), the belief impact of statistics drops to about five percentage points. In other words, we document a pronounced story-statistic gap in the evolution of beliefs. While the impact of statistics on beliefs decays rapidly, stories have a more persistent effect on beliefs. Using recall data, we confirm that the reason for this dynamic pattern is that stories are more easily retrieved than statistics.
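To make the size of that gap concrete, here is a back-of-the-envelope sketch using the figures quoted above (the calculation and variable names are mine, not the authors’):

```python
# Rough persistence calculation using the belief-impact figures quoted above
# (percentage-point shifts from the VoxEU write-up; everything else is mine).

immediate = {"statistic": 20.0, "story": 18.0}    # belief shift right away
after_delay = {"statistic": 5.0, "story": 12.0}   # belief shift one day later

for kind in immediate:
    retained = after_delay[kind] / immediate[kind]
    print(f"{kind}: about {retained:.0%} of the initial impact survives a day")

# statistic: about 25% of the initial impact survives a day
# story: about 67% of the initial impact survives a day
```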

As I wrote in my summary of behavioral economics, “We rely heavily on the information we can easily recall.” Memory gives us a biased view based on the stories we can most easily recall. But what comes easily to mind may have little to do with what’s actually going on: we’re misled by what’s most unusual or extreme or striking. Data works because it focuses us on what usually happens, not what is most memorable, and so has a de-biasing effect.

The amazing thing is that our judgment is so poor that a lot of the time we can’t do any better than just totally deferring to a super basic, content-free extrapolation from that data. Quantification has its own problems, of course. It helps not because it’s so great but because of how limited human reason can be.

Forecasting on INFER

Last year I had the opportunity to be a “Pro” forecaster on INFER, the crowd-forecasting tournament run by Cultivate Labs and the University of Maryland (formerly Foretell, run by Georgetown’s CSET). Basically you get a small stipend to participate each month. It was fun and I recommend it!

Ultimately, I decided not to keep going as a Pro in 2023. I started Nonrival in August, which I’ll blog about one of these days, and since then my forecasting time has been focused on that project.

I still plan to collaborate with the INFER team in some other capacities (more on that soon too perhaps) but I won’t be paid to make forecasts there.

I’ve “exited” the three questions that I’d forecasted on that hadn’t yet resolved, but I’ll still be scored eventually for the period of time I was active on them — so this assessment isn’t quite a true scoring of my time there, but it’s close. How’d I do?

  • In the 2022 season, 344 users made at least 5 forecasts (that resolved), and by default that’s the cutoff INFER uses on its leaderboard, so I’ll use it here too.
  • I finished 76th, with 8 questions resolved and scored. That puts me at the 78th percentile.
  • On the “all-time” leaderboard for INFER (which for me counts my two questions forecast in 2021) I’m 71st of 620, which puts me at the 89th percentile. (The rank-to-percentile arithmetic is sketched below.)
  • Lifetime on INFER, I’m better-than-median on 9 out of 10 questions (7 out of 8 for 2022 season), with my one blunder being a forecast of Intel’s earnings where I seemingly underrated the chance of an out-of-sample result.
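Those percentile figures are just rank arithmetic. A minimal sketch (my own calculation; INFER may round or compute percentiles slightly differently):

```python
# Convert a leaderboard rank into the percentile figures cited above.
# This is my own arithmetic, not anything INFER publishes.

def percentile_from_rank(rank: int, field_size: int) -> float:
    """Percent of the field ranked behind this finisher."""
    return (field_size - rank) / field_size * 100

print(round(percentile_from_rank(76, 344)))  # 2022 season: 78
print(round(percentile_from_rank(71, 620)))  # all-time: 89
```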

Overall, my MO seems to be consistently being just a tiny bit better than the crowd. Not bad! But that leaves plenty of room for improvement. Part of it, I think, is that I could do better simply by spending more time and updating more regularly on news and on other forecasters’ shifts.

But there’s also a “tenaciousness” that Tetlock describes when talking about the superforecasters: a willingness, even an eagerness, to sift through details when necessary until you find what you need. I saw some of that with teammates during my year as a Pro. And that’s something I’ve not had the time, or maybe the inclination, for. I think I’ve done a pretty consistent job of avoiding the basic mistakes that lead to poor forecasts: I look for quantitative data, seek out multiple perspectives, often blend my own judgment with others’, and so on. But if I want to get to the next level I need to immerse myself more in the details of a topic, at least some of the time.

Past forecasting record posts here and here.

Code is not law

I’m fond of Lessig’s saying that “code is law” and I often mention it on the blog. But there’s a deeply distorted version of this idea cropping up in crypto lately and it’s worth distinguishing it from the original meme.

Lessig’s idea was that human behavior is shaped by four types of “governance”: markets, laws, norms, and what he called “architecture.” Architecture (if I’m remembering the book correctly) encompassed the stuff we build in the physical world that affects human behavior. If I build a speed bump, you might drive differently; if I build a skyscraper, it might affect your view or change your walk to work or what-have-you. The things we build impose certain constraints on us — they shift how we behave.

Lessig then argued that in the digital world, code was the architecture. You could make some things possible or impossible, easy or hard, through the way the software was built. Code became a form of digital governance, alongside markets, laws, and norms.

Compare that to the crypto-maximalist version of “code is law,” which holds that anything the code allows is fair game. Here, via Matt Levine, is the defense provided by a trader who allegedly rigged a crypto market in a way that clearly would not be allowed in any normal financial market:

I believe all of our actions were legal open market actions, using the protocol as designed, even if the development team did not fully anticipate all the consequences of setting parameters the way they are.

You see the logic here: If you wanted what I did to be illegal, why did you write the code to make it possible? This is code-is-law maximalism.

There’s a less maximalist, dorm-room version of this that you sometimes see in crypto that maybe deserves some consideration. This version doesn’t argue that anything the code allows is OK. But it does say we should rely more on code for our regulation. It wants code to play a bigger role in setting the rules, bringing us closer to a world where anything the code allows is OK — even if we’re not there yet and even if we never get all the way there. I’m OK with a bit of utopianism and so I don’t mind entertaining this as a thought experiment. But so far crypto has mostly served to show why anything the code allows is OK is not OK.

To see just how damaging this maximalism is, compare it to a totally different case:

The U.S. Supreme Court on Monday let Meta Platforms Inc’s (META.O) WhatsApp pursue a lawsuit accusing Israel’s NSO Group of exploiting a bug in the WhatsApp messaging app to install spy software allowing the surveillance of 1,400 people, including journalists, human rights activists and dissidents.

If you exploit a bug to do bad things, you can’t just hide behind anything the code allows is OK. In this case, we’re talking about the murky world of international affairs, where law is less effective. No one thinks this is a good thing: the world of international espionage is much closer than other spheres to anything the code allows is OK, and no one in their right mind would want to run the rest of the economy that way. Code-is-law maximalism forfeits three-fourths of the original code-as-law formulation: governing human behavior is hard, and we need all the tools we can find. As much as code increasingly does govern our behavior, laws, markets, and norms are all still essential.

On falsification

From Richard McElreath’s textbook Statistical Rethinking, via Data Elixir:

…The above is a kind of folk Popperism, an informal philosophy of science common among scientists but not among philosophers of science. Science is not described by the falsification standard, and Popper recognized that. In fact, deductive falsification is impossible in nearly every scientific context. In this section, I review two reasons for this impossibility: 1) Hypotheses are not models… 2) Measurement matters…

…For both of these reasons, deductive falsification never works. The scientific method cannot be reduced to a statistical procedure, and so our statistical methods should not pretend. Statistical evidence is part of the hot mess that is science, with all of its combat and egotism and mutual coercion. If you believe, as I do, that science does often work, then learning that it doesn’t work via falsification shouldn’t change your mind. But it might help you do better science. It might open your eyes to many legitimately useful functions of statistical golems…

…So if attempting to mimic falsification is not a generally useful approach to statistical methods, what are we to do? We are to model. Models can be made into testing procedures–all statistical tests are also models–but they can also be used to design, forecast, and argue…

Related: Strevens’ iron rule of explanation.

Notes on trade and globalization

I’ve been trying to revisit the arguments and evidence for global trade and trade liberalization recently. I want to post a few links so I don’t lose track of them.

Overall, I came away suspecting that the pro-trade side, which I’ve been sympathetic toward, is a bit overconfident relative to the evidence, but also that the anti-trade or trade-skeptical side has even less evidence to back it up. More liberal trade policies do seem, on net, economically positive, but with a lot of uncertainty around just how positive and about when they might be more or less so. And, all things considered, trade seems slightly second-tier, behind, say, technology and good policy and political institutions and public health as a driver of prosperity. Plus, as left-leaning trade proponents have said forever, it’s incumbent on policymakers to put in place the complementary policies that make trade as positive for people as it can be.

That’s the TLDR. Here are some notes:

What are the arguments for trade?

I’m sticking just to the economic arguments here. Raghuram Rajan has a one-paragraph summary in a recent Foreign Affairs piece that’s worth quoting (it frames the point in reverse, as the losses from deglobalization):

“Deglobalization has many costs, some of which are already evident. They include the higher cost of goods and services as production no longer takes place in the most efficient locations, the loss of scale economies as production becomes fragmented, the increase in the power of domestic oligopolies as global competition is restrained, the decline of learning by doing as multinational corporations no longer spread best practices, and the rise in inflationary pressures as local supply-demand imbalances are no longer tempered by a global market.”

This paper runs through a very similar list: There is the traditional argument of comparative advantage, but also returns to scale, increased competition, more learning and therefore innovation, and more product variety.

How much does the US gain from trade?

There’s really not a satisfying answer. This paper tries to provide one but it’s powered by some very heroic theoretical assumptions about willingness to pay and elasticity. Basically, gains from trade are driven by how much you trade, and how easily you could find substitutes if that trade stopped. Fair enough, but a pretty simplified story. Nonetheless, the upshot: “Our analysis points towards welfare gains from trade ranging from 2 to 8 percent of GDP.” That’s meaningful! But it’s not everything, and it’s coming from the fairly trade-friendly assumptions of mainstream economics.
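A standard way this literature formalizes that intuition (I can’t confirm it is exactly this paper’s method) is the Arkolakis–Costinot–Rodríguez-Clare sufficient-statistics result: under strong assumptions, the gains from trade relative to autarky depend only on the domestic expenditure share and the trade elasticity. A minimal sketch, with illustrative parameter values of my own choosing:

```python
# ACR-style sufficient-statistics formula: gains from trade relative to autarky
# equal 1 - lambda**(1/theta), where lambda is the domestic expenditure share
# (so 1 - lambda is the import share, i.e., how much you trade) and theta is
# the trade elasticity (how easily you substitute between foreign and domestic
# goods). Parameter values below are illustrative, not the paper's.

def gains_from_trade(domestic_share: float, trade_elasticity: float) -> float:
    """Welfare gain from trade relative to autarky, as a share of real income."""
    return 1 - domestic_share ** (1 / trade_elasticity)

for lam in (0.80, 0.90):      # US-ish domestic expenditure shares
    for theta in (3, 5):      # common ballpark estimates of the trade elasticity
        print(f"lambda={lam}, theta={theta}: {gains_from_trade(lam, theta):.1%}")

# Prints gains of roughly 2% to 7%, the same ballpark as the paper's 2-8% range.
```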

What about micro evidence?

If the estimate above didn’t do much to convince me, what about more traditional microeconomic evidence? For this, I read Our World In Data’s briefing on the subject, which summarizes several papers (I did not read the papers themselves). That said, I find this type of evidence more convincing: it looks directly at the data on trade and tries to use econometrics to find plausibly causal relationships. Their upshot:

 “On the whole, the available evidence suggests trade liberalization does improve economic efficiency. This evidence comes from different political and economic contexts, and includes both micro and macro measures of efficiency.”

What about the China shock?

A lot of attention has been paid to a series of papers by David Autor and colleagues on the “China Shock” — basically the rapid increase of trade between the US and China. Those papers find concentrated job losses in a number of regions of the US. The Our World In Data briefing summarizes those papers, too.

But researchers at CSIS recently published a literature review of the various papers on the China shock, and I commissioned a shorter writeup for HBR. Here is their ultimate conclusion after comparing Autor’s results with two other research groups looking at similar questions with slightly different datasets:

So, what does a broader review of the data from multiple studies show? Scholars generally find that prior to 2010, imports from China negatively affected manufacturing jobs in the U.S. However, there are mixed findings on the net effect on the economy, the final balance of jobs lost in manufacturing, and the growth in service sector jobs. There is also no evidence of trade with China having a significant negative effect on jobs after 2010 — the job loss in manufacturing documented in the early 2000s due to trade with China is not continuing today. There is one other result that all scholars seem to agree on: better-educated, more economically diverse regions of the United States were affected far less by the surge in imports from China.

So: very real job losses in some regions, but no clear evidence of net job loss for the US, much less a net loss to the economy overall.

The Autor China Shock papers did overturn conventional wisdom — just not about the aggregate effects of trade. As his co-author Gordon Hanson writes in a different Foreign Affairs piece:

Our findings went against cherished economic frameworks, which predict that workers in struggling communities migrate in search of employment elsewhere and that new industries expand into downtrodden areas to take advantage of an idle labor pool. Neither type of recovery materialized. Overall, relatively small percentages of people left their communities, and businesses didn’t expand enough to absorb workers who had earlier lost their jobs. Economists still can’t explain why workers did not abandon regions in decline, but relationships may play a role. Moving can mean separating from family members, who care for children, provide support when times are tough, and offer a comforting social network.

The shock to conventional wisdom was how long-lasting and geographically concentrated the costs of trade were.

The upshot

As I said, I take all of this to be sort of a mixed bag for the conventional wisdom on trade. On the one hand, it really seems like we don’t totally know exactly how, and how much, the US economy has been affected by trade. I doubt we understand the variety of circumstances under which those effects could be larger or smaller. Caution is therefore in order. And the estimates of trade’s benefits to the US, such as they are, are large but not staggering. They’re a big deal, but they’re not the thing that explains our overall level of prosperity, at least according to the estimate I cited.

On the other hand… The arguments that trade helps an economy grow do make a lot of sense and do have considerable evidence behind them. At least on the economic merits it’s hard for me to come away from this review feeling skeptical about trade, except in the broader “It’s hard to know stuff for sure” sense of generic humility. Even the China shock, as persistently bad as it seems to have been for some parts of the country, was a very mixed bag that helped lots of people and probably grew the US economy somewhat.

The case for trade therefore seems fairly solid, provided it’s kept in perspective and made with some humility. And — as we’ve known forever — the public policy that surrounds it really matters. There’s a lot the government can do to make things better or worse.

Technology adoption

Derek Thompson has a piece at The Atlantic on why America “doesn’t build what it invents.” It covers a lot of good ground. Here I just want to link to a few other things that I think speak to one piece of this topic.

Paul Romer testified in 2020 that the US was first a leader in technology (adoption), not science, then briefly led in both, and now leads just in science, not technology. (I wrote a two-paragraph post.)

James Bessen at BU may be the most underrated chronicler of technology’s diffusion. His book “Learning By Doing” is all about that but his newer book “The New Goliaths” is a fascinating look at how today’s giant companies might prevent adoption of new technologies. Here’s an excerpt.

On the theme of complementary assets that you might need for technology to spread, here’s Raffaella Sadun writing about CEOs needing to have skills and knowledge specific to their firms. You might add this to Bessen’s story and think about a lack of managers with the right complementary knowledge to allow new firms to use technologies housed in the dominant firm.

And here’s a ProMarket piece on incentives to innovate as just one piece of innovation:

The creation of these mRNA vaccines tells two stories about encouraging innovation. The first story is familiar: how enormous incentives (which have made senior executives at Moderna, Pfizer, and BioNTech billionaires) can marshal capital and talent to achieve herculean feats (let’s call this story Innovation-as-Incentives). The second story is less discussed but just as critical: how innovation happens when public and private agents share knowledge and combine capabilities, supported throughout by government and non-profit institutions (let’s call this story Innovation-as-Capabilities).

Derek hits on a lot of crucial reasons why the US lags in technology adoption and they all make sense to me. In the links above I’ve been trying to expand on one of them, closest in Derek’s framework to the “incumbency bottleneck.”

It’s not just that we have too many large firms or not enough antitrust. It’s that we need to actively create an economy where knowledge spreads, and where innovation incentives combine with innovation capabilities. We need to make it possible and profitable to adopt a technology, start a new firm, tinker and improve something invented elsewhere. That’s an antitrust problem, a management problem, an industrial policy problem, a labor market problem, an education problem, and more.

AI governance is governance

ChatGPT has kickstarted a bunch of discussion of how AI/ML will change the world. No argument there.

But reading some of it, I’m reminded of something that has bugged me in some of the discussions of AI “safety”. Take this bit, from Sam Hammond, via Marginal Revolution:

ordinary people will have more capabilities than a CIA agent does today. You’ll be able to listen in on a conversation in an apartment across the street using the sound vibrations off a chip bag. You’ll be able to replace your face and voice with those of someone else in real time, allowing anyone to socially engineer their way into anything.

A natural follow-up to these concerns: Will these things be legal? If they’re not legal, will they be monitored? Will there be enforcement? Will there be strong norms against them?

(To its credit, Hammond’s post does describe the interaction of law and AI. My issue is less with the post than with the discussion surrounding ChatGPT.)

AI raises a ton of legitimately new and interesting questions about how we’ll interact. The “AI safety” conversation largely aims to deal with those questions technically. Can you design an AI that we know for sure won’t do X, Y, or Z? That’s good, important work: code is law and all that. Technical fixes are one major, legitimate way to regulate things.

But Lessig puts code (“architecture”) alongside markets, norms, and laws in his discussion of code-as-law.

And most of AI governance is about those other three. What will we incentivize? What will we stigmatize and valorize? What will be legal? Those are boring governance questions and most of what changes with AI will involve those things, not engineering different sorts of reward functions.

New AI capabilities are going to raise new questions of human governance at a time when we’re seemingly getting worse at governing ourselves. There’s no technical fix to that challenge.

Editing as humility

A colleague of mine once called editing “a helping profession.” It’s a nice idea that speaks to how different the craft of editing actually is from how people imagine it. There’s a stereotype of the dictatorial editor, assigning stories they want, rejecting others, and creating a whole publication in their image. Maybe somewhere that exists but it’s not been my experience. Most editing is about trying to make someone else’s work better, and I want to share a bit here on how I came to embrace that.

When I showed up at Harvard Business Review as an associate editor in 2013 I was hungry for bylines: all I wanted to do was to write. Partly, I didn’t think much of editing since the internet made it so easy to publish. Why edit when you could, as Clay Shirky put it, “publish, then filter”? And partly I didn’t see a career path: I thought being “out there” with bylines and takes was the way to build a career in digital media. My thoughts on both of those things changed gradually. I came to appreciate the importance of editing and I found that my career was progressing just fine.

But more than that I came to see editing as a form of humility. This is perhaps tied to the kind of work I learned at HBR: editing experts, many of whom didn’t write for the public very often. Editing was a way for me to help really smart, knowledgeable people think and write even better. There was something healthy for me in doing that instead of trying to prove that mine was the smartest take–even though frankly it’s not something I would have sought out. I was brimming with overconfidence, but my work got to be questioning and tinkering and quibbling to help someone else who knew much more than I did.

I still enjoy writing but, as someone who dreamed of being a columnist, I’ve come to be thankful that I learned to be an editor instead.

Moneyball

Derek Thompson has a great Atlantic piece about how the Moneyball-ization of everything has changed culture and sports. The thesis is that analytics push homogenization. I write about data stuff so I should have something thoughtful to say about that but instead I want to veer outside my normal lane and register a basketball take: analytics made the NBA way better.

Here’s Derek:

When universal smarts lead to universal strategies, it can lead to a more homogenous product. Take the NBA. When every basketball team wakes up to the calculation that three points is 50 percent more than two points, you get a league-wide blitz of three-point shooting to take advantage of the discrepancy. Before the 2011–12 season, the league as a whole had never averaged more than 20 three-point-shot attempts per game. This year, no team is attempting fewer than 25 threes per game; four teams are attempting more than 40.
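The arithmetic behind that league-wide calculation is simple expected value. A minimal sketch, with round shooting percentages I picked for illustration rather than actual league splits:

```python
# Expected points per shot attempt: the basic arithmetic behind the
# three-point boom. Percentages are illustrative, not real league data.

def expected_points(make_probability: float, shot_value: int) -> float:
    return make_probability * shot_value

midrange_two = expected_points(0.42, 2)  # decent mid-range jumper: 0.84 points
open_three = expected_points(0.36, 3)    # league-average-ish three: 1.08 points
shot_at_rim = expected_points(0.62, 2)   # layup or dunk attempt: 1.24 points

# Threes and shots at the rim beat mid-range twos, which is roughly the shot
# diet that analytics-era offenses converged on.
print(midrange_two, open_three, shot_at_rim)
```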

This trend is chronicled in the excellent, chart-filled book Sprawlball, which also tends to see it negatively.

But when I started watching the NBA in the ’90s it was way less interesting. It was the just-barely post-Jordan era, and every wing just did a lot of iso, 1-on-1, Jordan-imitation stuff. Centers did the post-up equivalent. There was not that much ball movement.

The discovery that 3-point shots were extremely valuable changed all that. When I started watching again a few years back, after well over a decade away from the sport, I was shocked by how much ball movement there was. The 3-pointer suddenly meant that getting a mediocre player a good shot from outside could be more valuable than just letting your best player go 1-on-1.

Yes, there’s some homogenization in that all teams shoot 3’s. Yes, the mid-range game has faded. And yes there’s a lot of pick-and-roll. But there’s still a good amount of diversity in the skillsets that set those 3’s up. Luka, Giannis, Jokic, and Morant are wildly different players. All of them anchor an offense that involves supporting players shooting 3’s. But the way they set them up is extremely varied and the end result is movement and passing and switching and double-teaming and just lots more excitement (and beauty) than the ’90s NBA.

Anyway, the overall Moneyball take seems right. But basketball got a lot better thanks to analytics.

Scientific understanding

A perspectives piece in Nature on AI and science provides a nice description of scientific “understanding” that I want to share here:

Scientific understanding

Imagine an oracle providing non-trivial predictions that are always true. Although such a hypothetical system would have a significant scientific impact, scientists would not be satisfied. They would want “to be able to grasp how the predictions are generated, and to develop a feeling for the consequences in concrete situations”13. Colloquially, we refer to this goal as ‘understanding’, but what does this really mean? To find criteria for scientific understanding, we seek guidance from the philosophy of science… Numerous philosophers [have tried] to formalize what ‘scientific understanding’ actually means. These proposals suggest that ‘understanding’ is connected to the ability to build causal models (for example, Lord Kelvin said “It seems to me that the test of ‘Do we or not understand a particular subject in physics?’ is, ‘Can we make a mechanical model of it?’”13), connected to providing visualizations (or Anschaulichkeit, as its strong proponent Erwin Schrödinger called it26,27) or that understanding corresponds to providing a unification of ideas28,29.

More recently, Henk de Regt and Dennis Dieks have developed a new theory of scientific understanding, which is both contextual and pragmatic12,13,24. They found that techniques such as visualization or unification are ‘tools for understanding’, thereby connecting previous ideas in one general framework. Their theory is agnostic to the specific ‘tool’ being used, making it particularly useful for application in a variety of scientific disciplines. de Regt and Dieks extended Werner Heisenberg’s insights30 and, rather than merely introducing theoretical or hypothetical ideas, the main motivation behind their theory is that a “satisfactory conception of scientific understanding should reflect the actual (contemporary and historical) practice of Science”. Simply put, they argue that: “A phenomenon P can be understood if there exists an intelligible theory T of P such that scientists can recognise qualitatively characteristic consequences of T without performing exact calculations”12,13. de Regt and Dieks defined two interlinked criteria:

  • Criterion of understanding phenomena: a phenomenon P can be understood if a theory T of P exists that is intelligible.
  • Criterion for the intelligibility of theories: a scientific theory T is intelligible for scientists (in context C) if they can recognise qualitatively characteristic consequences of T without performing exact calculations.

We decided to use this specific theory because it can be used to ‘experimentally’ evaluate whether scientists have ‘understood’ new concepts or ideas, rather than by inspecting their methodology, by simply looking at the scientific outcome and the consequences. This approach also coincides with Angela Potochnik’s argument that “understanding requires successful mastery, in some sense, of the target of understanding”11.

Scientific discovery versus scientific understanding

Scientific understanding and scientific discovery are both important aims in science. The two are distinct in the sense that scientific discovery is possible without new scientific understanding… Many discoveries in physics occur before (sometimes long before) a theory or explanation, which provides scientific understanding, is uncovered. Examples include the discovery of superconductivity (and its high-temperature version), the discovery of the cosmic microwave background, neutrino oscillations and the discovery of a zoo of particles before the invention of the quark model.

These examples show that scientific discoveries can lead to scientific and technological disruption without directly contributing to scientific understanding11,24.