Of the various responses to my decision in the covid origins debate I co-judged, one pro-lab-leak-theory critique stood out as level-headed and grounded in reason. (Is it a coincidence that the author is also in the minority(-ish) of lab-leak-theory proponents who acknowledge climate change? For the record, I believe Saar also recognizes climate change.) Michael Weissman, physics professor at UIUC, wrote a well-thought-out Bayesian analysis of covid origins, followed by a briefer discussion of other Bayesian analyses, including those that stemmed from the covid origins debate.
(In my quick skim of Weissman’s Bayesian analysis, I noticed the evidence it uses is substantially the same as Rootclaim’s, and thus I have addressed most of these points at length in my report. I think there is ample room for reasonable disagreement on some of these items, such as the significance of the FCS or the priors, but not on others. For example, the mahjong room, HSM bathrooms, or Bloom’s negative correlation in HSM samples are plainly non-credible even if you believed the evidence for them to be true; and the evidence for the mahjong room completely falls apart under scrutiny, being reduced to a single sentence published in the New Yorker containing a fourth-hand rumor from August 2020.)
I was concerned that people might interpret my Bayesian analysis as the “main product” of the report, in isolation from the rest, which is why I heavily cautioned against taking it too literally; indeed I explicitly state that I do not think it is an appropriate technique for this problem, and am only producing one because it feels fair to do so after criticizing Rootclaim’s analysis. So it is a little unfortunate that Weissman chose narrowly to critique my Bayesian analysis and not the more interesting parts of the report. Alas, you must work with the critiques you are given and not the ones you want. While I do not stand by my Bayesian analysis, it is not totally without merit.
I will quote the relevant part of Weissman’s article in full, interleaved with my responses.
[Weissman:] The two rootclaim debate judges, Eric Stansifer and Will Van Treuren posted coordinated but partially independent analyses of the evidence presented in the organized debate. I have major disagreement with both analyses, including on basic methods. For now, I’ll just describe some key points. These comments will be extended and perhaps modified when I get a chance to look over these analyses more carefully. I will use first names here, following the convention used by the debaters.
Not sure in what sense they were “partially” independent? Will and I were not allowed to coordinate in any capacity, and I did not find out his decision until well after I had committed my own. While the current version of my report includes edits made after I had seen Will’s report and comments on my own, there is a version on my website (also the version I think most people saw, including evidently Weissman) that does not.
[Weissman:] Neither Will nor Eric use my new argument showing that Worobey has strong internal evidence of ascertainment bias, since it came out after the debate.
Weissman’s argument is elsewhere in that article. It is a fair argument, but is addressed in Worobey’s paper:
[Worobey:] One of the key findings of our study is that ‘unlinked’ early COVID-19 patients, those who neither worked at the market or knew someone who did, nor had recently visited the market, resided significantly closer to the market than patients with a direct link to the market. The observation that a substantial proportion of early cases had no known epidemiological link had previously been used as an argument against a Huanan market epicenter of the pandemic. However, this group of cases resided significantly closer to the market than those who worked there, indicating that they had been exposed to the virus at, or near, the Huanan market. For market workers, the exposure risk was their place of work not their residential locations, which were significantly further afield than those cases not formally linked to the market.
In short, workers at a market tend to come from further afield than shoppers. Therefore people infected at the market while shopping (or indirectly from such people) will tend to be closer than those infected at the market while working. Since known links to the market are almost exclusively the latter, that accounts for the disparity.
Update: From our conversation, I have learned that Weissman was unaware of the data in the WHO report about December cases. In the absence of such data his argument is quite plausible, but the data shows an extremely clear confounder between working at the market and having an identifiable epidemiological link to the market.
Continuing with Weissman,
[Weissman:] Neither uses the recent re-examination of the restriction enzyme site pattern prompted by Kopp’s publication of the detailed DEFUSE plans for those, matching the observed results. These may also have only become available after the debate.
I am unfamiliar with this.
[Weissman:] Both use probabilities P(Wuhan|ZW) based pretty closely on population, but then use the HSM-based data to obtain by far the biggest ZW-favoring factor. This misses the key point about sub-hypotheses— the HSM results can only be used to boost the market-based sub-hypothesis of ZW, but the Wuhan wet markets got far less of the relevant animal trade than would be expected from the population fraction. One cannot combine an inter-city likelihood of one sub-hypothesis with an intra-city likelihood of a disjoint sub-hypothesis.
Unfortunately I did not have access to any data on wildlife exposure that would improve on assuming exposure risk scales proportionally with population (other than the arguments that only urban populations are relevant, which notably eliminates the large majority of the wildlife exposure). As Saar said that HSM was the largest such market in central China, and Peter indicated that it had an atypically large wildlife presence, I suspected population might underestimate zoonotic risk; however this was insufficient to justify deviating from a population baseline. Both Peter and Saar used population in their calculations for the zoonosis prior. While a wide range of priors are plausible, I feel fairly okay with my choices, especially considering that the very shop Eddie Holmes identified as a potential zoonosis source in 2014 is in the exact center of the highest concentration of the earliest cases.
[Weissman:] First, I’ll briefly discuss Eric’s analysis. I’m not sure this will matter in the end but after an unusually clear explanation of what frequentist p-values mean and an acceptable description of Bayesian reasoning, Eric gives a flawed description of how they are connected. Bayesian likelihoods are not p-values and the ratio of likelihoods is not the ratio of p-values. Perhaps this is more a case of mistaken explanation than a relevant mathematical error.
This is a very good catch, and is the reason I am responding to Weissman’s remarks, as it is clear that he actually read (at least part of) what I wrote. Calling $P(E \mid H_0)$ a $p$-value is an error, and it is repeated in my more detailed explainer on Bayesian analysis and hypothesis testing, as I didn’t feel like going into the exact differences and I didn’t think anyone would care anyhow; the pertinent section is even left as just an empty “todo” in my more detailed post. It would be more correct to say that $P(E \mid H_0)$ corresponds to a $p$-value, as e.g. this paper on the relationship does.
There is a very small but intractable distinction between $P(E \mid H_0)$, the probability of making an observation under the null hypothesis $H_0$, and a $p$-value. Suppose we are doing hypothesis testing and find some $p$-value $p$. To convert to a Bayesian framework we need a binary outcome: some observation $E$ that we either observe or do not. To do this we choose a significance threshold $\alpha$, and make the observation $E$ that $p \le \alpha$. Then $P(E \mid H_0)$, the probability of a false positive, actually equals $\alpha$, not $p$.
But what is $\alpha$? We can’t just use $\alpha = p$, because of the key distinction between hypothesis testing and Bayesian inference: the threshold $\alpha$ must be chosen before the experiment. This is the intractable difference that prevents direct conversion between Bayesian inference and hypothesis testing. (Fortunately, there’s no real reason to do such a conversion – just stick with the one framework that is most suitable for the task.)
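This distinction can be checked numerically. A minimal sketch, assuming only the standard idealization that $p$-values are uniform on $[0, 1]$ under a point null: the likelihood of the binary observation $E$ is whatever threshold $\alpha$ was fixed in advance, regardless of the particular $p$ any one experiment happens to realize.

```python
import random

random.seed(0)

# Under a point null hypothesis H0, the p-value is uniformly
# distributed on [0, 1]; simulate many repeated experiments.
N = 100_000
p_values = [random.random() for _ in range(N)]

# The significance threshold must be fixed before seeing the data.
alpha = 0.05

# Binary observation E: "the experiment yields p <= alpha".
# Its probability under H0 is alpha, not any particular realized p.
freq_E = sum(p <= alpha for p in p_values) / N
print(freq_E)  # close to alpha = 0.05
```

Any single run might produce $p = 0.002$, but that number never enters the likelihood of $E$; only $\alpha$ does.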
In section 3 I write, “I am eliding some details and assumptions here; basically, I have assumed that any evidence against humanity is for space lizards, as this maximizes the statistical power of the test. I also have a nagging suspicion that I have omitted a factor of 2 in converting to Bayes factors, but the end result appears not to be missing any factors of 2.”, in reference to this problem, for which a principled solution might be to choose $\alpha$ post facto. Nonetheless I get the correct answers without introducing a factor of 2, so I guess it worked out.
(Just to be clear, none of this is relevant to my critique of Rootclaim’s Bayesian analysis.)
Update: After further communications Weissman indicated that he did not have this discrepancy in mind, and it seems he completely misunderstood what I was saying. This rather discourages me from continuing this conversation.
Update: The root of the disagreement appears to be that Weissman misunderstood my definition for $E$. With that resolved I believe there is no longer a disagreement here.
[Weissman:] With regard to the bottom line, Eric includes a likelihood factor of 10,000 based on the Worobey HSM location data. We have seen strong evidence that the case location data was severely biased. Bloom showed that the internal HSM nucleic acid correlations lacked the signature found for actual animal corona viruses. Just common sense would say that the chance that Chinese authorities would initially distort the available data to support the market hypothesis over the LL hypothesis was far larger than 1/10,000! Could anyone seriously claim that chance was much smaller than 1/10? The issue is analogous to the one I ran into for the LL-favoring CGGCGG factor. A very large odds calculation can be obtained within a narrow model. Both nature and people have ways of stepping around the narrow models, giving much lower odds.
(The Bloom correlations are discussed in section 4.6.2.)
Fortunately, I did not have to rely on Worobey’s location data for my analysis, as Saar mooted that the market really was the location of the first significant cluster of covid cases. Once you accept that, it is hard to imagine any explanation under the lab-leak theory, other than coincidence, for how the first cluster arose at HSM; I put that coincidence at about 1 in 5000 (after accounting for potential biases in viral transmission that might favor the virus spreading at HSM). The most probable non-coincidence explanation is probably something I have not been able to think of; an unknown unknown. I am not sure what probability to give that, but it is low. I think it is fair to at least partially include unknown unknowns at the end, after including all factors, as some unknown unknowns (e.g. if I am overestimating my ability to assess the data) will be entangled with multiple observations in unpredictable ways. This dampens the final number towards 1/2, as I stated in the report, but cannot make it cross the 1/2 line.
I do think it is fair to modestly reduce the 1 in 5000 number to account for uncertainty, but as I said in the report, it is the only Bayes factor that actually comes from calculating something (as opposed to me making up a number that feels good), and I already gave it a modest reduction.
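The dampening-towards-1/2 logic can be sketched numerically. The 1-in-5000 factor is from the report; the even prior odds and the 10% unknown-unknown weight below are made-up numbers purely for illustration. The point is that mixing in some probability that the whole analysis is wrong pulls the posterior toward 1/2 but can never push it across 1/2.

```python
def posterior_prob(prior_odds, bayes_factor):
    """Odds-form Bayes update, returned as a probability."""
    odds = prior_odds * bayes_factor
    return odds / (1 + odds)

def dampen(p, eps):
    """With probability eps the analysis is wrong and we know nothing,
    falling back to 50/50; otherwise keep the computed posterior p."""
    return (1 - eps) * p + eps * 0.5

# Illustrative numbers: even prior odds, then a 5000:1 factor
# (the 1-in-5000 coincidence of the first cluster arising at HSM).
p = posterior_prob(1.0, 5000.0)   # ~0.9998 in favor of zoonosis
damped = dampen(p, 0.10)          # ~0.95: closer to 1/2, still > 1/2
```

For any $p > 1/2$ and any mixing weight below 1, the damped value stays above 1/2, which is exactly the claim in the report.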
Nonetheless, even had it not been mooted, the evidence pointing towards HSM was quite strong, and again it is a great challenge to describe a reasonable scenario in which this evidence exists but the first significant cluster of cases was elsewhere. There do exist biases in the data – certainly known biases in cases from January 1 through 18, when an epidemiological link to HSM was a diagnostic criterion – but the market link is extremely obvious before then. While we should remain appropriately skeptical of the veracity of the raw data and guard against the possibility of it being deliberately altered by Chinese authorities (or others), it is not fair to simply dismiss the WHO report as “well, maybe China invented the data” without some appropriate level of detail and analysis of the plausibility of this alternate scenario. Such a conspiracy theory would involve a vast number of people, including much of the international WHO investigative team. Alternatively, if the WHO team is not in on it, this theory requires China inventing a Potemkin market for the WHO team to investigate – but rather than riverside facades of Potemkin villages along the Dnieper, this fictionalized zoonotic theory would have to survive a multi-week investigation conducted by subject-matter experts directly collecting evidence and conducting interviews. (And, of note, the original “Potemkin villages” are mythical anyhow.)
If Weissman wishes to explain the Worobey (etc) data by positing such a conspiracy, I think it is necessary to go into details about how the conspiracy occurred, such as which actors are aware or naive to the deceit. I suspect that any attempt to nail down details of this deceit will fall apart as implausible; much as in section 6, where I tried to invent a conspiracy for how people at WIV could have engineered sars-cov-2 without anyone finding out, and each hypothetical explanation I came up with sounded plainly ridiculous.
I don’t know what numbers I would assign to the “China made up the data” theory, but it is definitely under 1%. For example, I’d say much less than 10% chance that China would try to create a Potemkin market, and a (conditional) 90% chance that people on the WHO team would have suspicions of malfeasance (or that evidence of it would come out later), which would bring that scenario below 1%.
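The arithmetic behind “below 1%” is just the product of the stated probabilities. A sketch, using 10% as the generous upper bound from the text:

```python
# Upper bound on China attempting a Potemkin market for the WHO team.
p_attempt = 0.10
# Conditional probability that the WHO team would suspect malfeasance
# (or that evidence of it would surface later).
p_detected = 0.90

# The scenario needs both the attempt AND a failure of detection.
p_scenario = p_attempt * (1 - p_detected)  # about 0.01, i.e. ~1%
```

Since 10% was itself a loose upper bound (“much less than 10%”), the true value of this scenario falls below 1%.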
It seems Weissman favors more of a middle-ground hypothetical, that China might have distorted the data rather than inventing it entirely, particularly through selective presentation. A conspiracy in which China selectively hid the existence of the earliest non-HSM linked cases could plausibly evade detection by the WHO team and falsely elevate the significance of HSM. However, this is not at all consistent with the data in the WHO report and elsewhere; for example, it was found that the earliest non-HSM-linked cases in the WHO report were erroneous, and additional HSM-cases were identified later. Thus the WHO data appears, if it is distorted, to be distorted away from HSM.
Additional work is required by Weissman to justify the 1/10 probability suggested here. There are many possible conspiracy theories I could write for how fictionalized data could have come to be, but rather than arguing against all of them at once the onus falls on people who believe in them to come up with a specific theory to falsify.
[Weissman:] Eric also includes a probability of 1/50 for WIV even attempting DEFUSE-like work. This seems strangely low, completely in disagreement with the universal reaction of the Proximal Origins authors and their correspondents even before they knew of the DEFUSE proposal. Without worrying about other factors, just bringing these two factors into something even close to a reasonable range (say 1/5 for the work going ahead, 1/10 for the Worobey data coming from selective ascertainment and presentation) would swing Eric’s final odds to favoring LL by about a factor of 8.
I agree with Weissman that 1/5 is a pretty reasonable number for the probability of WIV performing DEFUSE-like work; scientific research is highly uncertain, plans change, and plans are especially likely to change when they don’t get funding they require. Note that DEFUSE-like work here is mostly collecting and sequencing coronaviruses from bats, as opposed to genetic engineering, as WIV is not involved with that work under Project DEFUSE. (If it were just collecting and sequencing viruses, I would go well above 1/5, but DEFUSE also includes things like conducting seropositivity surveys and cave surveillance and such. I am not including some of the speculative stuff near the end of the project, like deploying bat vaccines, that seemed to me to be included in the proposal more to look good to funders than as likely research directions.)
However my 1/50 estimate here is of the chance that WIV would attempt to engineer a dangerous virus (just to be clear, I am not supposing ill intent by WIV – presumably they would have some scientific purpose behind engineering such a virus). I relied on the text of the Project DEFUSE proposal, which I read closely, to ascertain what the project would have entailed if it went forward. (I also considered past publications by WIV researchers to see the range of activities they can or do undertake.) I am not a virologist, so reading the proposal was difficult and I may have failed to understand parts of it, but I did not see within the proposal a route to creating sars-cov-2. I did see repeated assurances of using known viral backbones that do not have pandemic potential, which implies the authors are aware of this risk and that it is desirable to avoid it.
Recently there has been some suggestion that the authors of the proposal may have downplayed how much of the work would have been conducted at WIV (that is, implied that work was going to be at US labs) to appease US regulators. What work, in particular, was downplayed is left unspecified. This implies that WIV had the technological capability to conduct that work (whatever that work is) and therefore raises (however imperceptibly) the prior that WIV went on to conduct it.
I don’t think the prior probability for WIV engineering sars-cov-2 goes much above the background level of any virology lab doing so; the Project DEFUSE proposal was mostly relevant in that it showed the sorts of research that the virology community is interested in doing, rather than some particular blueprint for WIV creating sars-cov-2.
I would like to end with something a little different:
[Weissman:] None of the three clear existential threats to humanity– global warming, new pathogens, and nuclear war– can be addressed without science. I think that some public trust in science is a necessary though not sufficient condition for successful defenses against those threats. For example, public awareness of the scientific conclusion that SC2 mainly spreads by aerosols and of the value of indoor air filtering would have limited and still could limit the disease burden.
On this we agree. It is pleasing to see a pro-science voice. And there exists a big gap between what we know scientifically and what we have implemented in policies: there is ample low-hanging fruit in bringing our policies in line with the science. Fortunately there are at least a few things we can do as individuals, and improving the air quality in our homes (title: “Better air quality is the easiest way not to die”) is one of them. (Also, wear an N95 on the subway!)
Follow RSS/Atom feed for updates.