Update: we're aware of some data problems and are working on a revision of this post.
Inspired by the visualization that Donald Berry did in his Nature article, we'd like now to look again at the Landis data and positivity criteria.
We will look at the data presented in Exhibit 26, LNDD 433-436, plus data from the Landis B sample in USADA 352, as well as the reported results from Exhibits 86, 87, 89, 90, 92 and 93. The last group are the "alternate B samples", plus some "controls" provided by Dr. Aquilerra.
Note that in this discussion we completely ignore the issue of correct peak identification, which some have argued should have been the end of the story. We're now taking the reported values at face value, and looking at them.
The data presented here is available in an Excel spreadsheet in the archive for folks to do their own manipulation should they like.
Here is a picture with all the data from LNDD 433-436, plus all the Landis samples, in a form similar to that used by Berry. We've mapped the 5aA against the 5bA, and the Andro vs. the Etio, which differs from his chart.
Let's walk through this slowly to be sure we understand what is shown. The yellow vertical and horizontal lines represent the -3.00 delta unit limits, and the purple lines are the -3.8 practical limit because of LNDD's measurement uncertainty.
One of Berry's major points is to question whether these limit lines are in the right place to achieve statistically valid conclusions. (There are others, to which we will return.)
Solid red triangles represent 5aA and 5bA values reported by LNDD on samples they have deemed "negative". Outlined red triangles are the same values for those LNDD has deemed "positive".
Solid red squares represent Andro and Etio deltas for "negative" samples, and outlined red squares are those for "positive" samples.
Blue triangles represent Landis 5aA and 5bA values, and blue squares represent his Andro and Etio values.
We see in this picture that none of Landis' Andro and Etio values met the -3.8 criterion in any sample test; one exceeds the -3.0 criterion in one dimension.
On the 5aA and 5bA values, we see one of Landis' sample tests exceeds -3.8 on both values; this is the Stage 17 A sample. When the B of the same control was run, only one of the values exceeded the -3.8 limit. On the "alternate B" samples tested, two exceed -3.8 on one measure.
Because the B sample is legally controlling, the Landis controls yield three tests that exceed the -3.8 limit, each on a single value. By "controlling" we mean the requirement that the A and B confirm each other. What counts is what they agree on, which in this case is only one metabolite over the -3.8 limit.
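For readers who want to try this reading on the spreadsheet, here is a minimal Python sketch of it. It assumes that "exceeding" a limit means a delta more negative than the limit, and that confirmation means the same metabolite is beyond the limit in both the A and the B. The function names are ours, and the sample numbers are made-up placeholders, not the reported Landis values.

```python
# Limits as described in this post: -3.0 nominal, and -3.8 once
# LNDD's measurement uncertainty is allowed for.
NOMINAL_LIMIT = -3.0
PRACTICAL_LIMIT = -3.8


def beyond(deltas, limit):
    """Return the set of metabolites whose delta is more negative than limit."""
    return {name for name, value in deltas.items() if value < limit}


def confirmed(a_sample, b_sample, limit):
    """Return the metabolites beyond the limit in BOTH the A and B samples."""
    return beyond(a_sample, limit) & beyond(b_sample, limit)


# Placeholder deltas for illustration only; NOT the reported Landis values.
a_sample = {"5aA": -6.0, "5bA": -4.0, "Andro": -3.5, "Etio": -2.5}
b_sample = {"5aA": -6.0, "5bA": -2.5, "Andro": -3.5, "Etio": -2.0}

print(sorted(beyond(a_sample, PRACTICAL_LIMIT)))               # ['5aA', '5bA']
print(sorted(confirmed(a_sample, b_sample, PRACTICAL_LIMIT)))  # ['5aA']
```

With these placeholder numbers the A alone shows two metabolites beyond -3.8, but only one survives confirmation by the B, which is the pattern described above.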
Is a single value exceeding a -3.8 limit proof of doping?
We believe Berry questions this on two grounds: first, that the value of the limit hasn't been validated to a supportable level of confidence, and second, that a multi-dimensional analysis for reliability hasn't been done.
We see, in practice, that the LNDD is willing to declare a positive on a single measurement over that value, whether or not they have measured any other values. In the Landis case, CAS has accepted this from a legal point of view. Berry, among others, would question whether this is scientifically supportable. Others think CAS may have erred in the legal analysis, but there is no recourse for such a mistake.
Let us look at the LNDD provided data, and see what we can learn.
On LNDD 436, there are 27 reported "positives." We break them down in Table 1.
| Number of positive metabolites | All, beyond -3.8 | All, beyond -3.0 | 5aA, 5bA, beyond -3.8 | 5aA, 5bA, beyond -3.0 |
|---|---|---|---|---|
| 0 | | | 1 | 1 |
| 1 | 4 | 1 | 8 | 6 |
| 2 | 12 | 9 | 17 | 20 |
| 3 | 3 | 6 | | |
| 4 | 8 | 11 | | |

Table 1. Reported positives and number of metabolites exceeding limits.
Of the 27 positives, 8 were 4 for 4 over the -3.8 threshold value. 3 more were 3 for 4. The number that has only two positive metabolites is considerable, 12. Four were declared positive on the basis of a single metabolite greater than the -3.8 limit, and there was one case of a positive declared with a single value that exceeded the -3.0 limit. By LNDD's own testimony on the proper use of measurement uncertainty, this should not have been reported as a positive, yet here it is in the report. The average LNDD positive has 2.5 metabolites over -3.8.
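As a quick check on that arithmetic, here is a small Python sketch that uses only the counts quoted in the paragraph above. The variable names are ours.

```python
# Positives on LNDD 436, grouped by how many of the four metabolites
# were beyond -3.8 (counts as quoted in the text above).
positives_by_count = {1: 4, 2: 12, 3: 3, 4: 8}

# Total number of reported positives.
total = sum(positives_by_count.values())

# Total number of metabolites beyond -3.8 across all positives.
metabolites = sum(k * n for k, n in positives_by_count.items())

mean = metabolites / total
print(total)           # 27
print(round(mean, 2))  # 2.56
```

The counts add up to the 27 reported positives, and the mean works out to 69/27, or about 2.56, in line with the roughly 2.5 quoted above.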
The Landis A sample is one of the 12 considered positive for two on LNDD 436, but it is not controlling because it wasn't confirmed by the B. Adding that Stage 17 B sample and the alternate B's, Landis doubles the number of tests that LNDD has declared positive on the basis of a single metabolite.
Because the difference between the 5aA and 5bA measurements was raised as an issue by the Landis defense through the testimony of John Amory, we've also counted the positives involving those two in separate columns. There are only two positives declared by LNDD without either being positive; 8 are positive based on one of them being beyond -3.8, and 6 based on -3.0. There are 17 where both are positive beyond -3.8. In 20 of the 27 reported positives, both values exceeded the -3.0 value. On average, LNDD positives had 1.5 positives between the 5aA and 5bA metabolites.
How many metabolites should be positive to prove doping?
This is the second of the major points in Berry's opinion: we don't have enough data to know. We do know from studies conducted by the UCLA Olympic Laboratory and the Sydney, Australia lab that they believe there are false positives with fewer than two positive metabolites. We also know that WADA, as presented in the Landis case, disagrees, and is content with declaring an AAF on a single metabolite. This is no small part of why Berry and Nature do not buy WADA's story.
Let's look at the LNDD data again. If the criteria were more stringent based on metabolites, where would that leave us? Let us confine ourselves to the bottom left part of Figure 1, expanded here as Figure 2.
If we believe the -3.8 delta value limit, and take the most restrictive view, the quadrant below and to the left of the purple lines contains high-confidence positives. This interpretation would deem inconclusive the Landis Stage 17 B sample, and all of the other Landis alternate B samples.
If we took a requirement that all four metabolites LNDD measures should exceed -3.8, then there are 8 total positives instead of 27. If we said three of four, then we end up with 11 instead of 27.
If we say two of four, then there are 23 of 27, and none of them is Landis.
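The same tally can be sketched in a few lines of Python. It uses the counts quoted earlier for how many of the four metabolites were beyond -3.8 in each reported positive. The function name `positives_requiring` is ours, purely for illustration.

```python
# Reported positives grouped by number of metabolites beyond -3.8
# (counts as quoted earlier in this post).
positives_by_count = {1: 4, 2: 12, 3: 3, 4: 8}


def positives_requiring(min_metabolites, counts):
    """Count positives that would survive an 'at least N of 4' criterion."""
    return sum(n for k, n in counts.items() if k >= min_metabolites)


for required in (4, 3, 2, 1):
    print(required, positives_requiring(required, positives_by_count))
# 4 8
# 3 11
# 2 23
# 1 27
```

Each step down in the requirement adds the positives that had exactly that many metabolites beyond -3.8, which is why the single-metabolite criterion is the only one that keeps all 27.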
The values used to sanction Landis reflect the loosest possible interpretation of the criteria, and by definition the most likely to contain false positives.
We have noted before that the LNDD has a higher rate of reported steroid positives than other labs. They have 5% of the AAFs, but 10% of those for steroids. Why might that be? A particularly dirty group of tested athletes compared to other labs? Positivity criteria that are looser than other labs? Differences in the testing methodology?
It is hard to believe LNDD tests dirtier athletes than anyone else. One avenue of clear concern would be the "single metabolite" criterion, which we have discussed. Having a 2 or 3 metabolite standard might account for much of the discrepancy.
Quite a lot of the Landis case dealt with questions about the methodology, so let's look at what the data may say about that.
A point that Landis raised through the testimony of John Amory was the likelihood (or not) of 5aA and 5bA measurements being as different as seen by LNDD in some of the Landis samples. If we look again at Figure 2, there aren't many points below and right of the Landis data, particularly the S17 samples. This reflects the differences between the reported 5aA and 5bA deltas.
Of all the positives reported by LNDD, the average difference between the 5xA measurements is 0.87. On the 4 Landis samples reported positive, it is 3.09. Amory testified this is biologically unlikely even if he was doping. USADA argues it is because of doping, and offered a single subject in a progress report by Schanzer in Cologne to make the point. We suspect Berry would be unimpressed by the statistical validity of that single data point.
Landis has two reported positives (the S17 A and B) where the difference exceeds 3.00; the only other positive reported by LNDD with a 5xA difference greater than 3.00 was the female E27.
In the very first filing on the case, the ADRB submission, Howard Jacobs suggested the difference was so great it was likely to have come from measurement error. WADA has argued it is because of how he was doping, and they have no need to or interest in finding out the details.
The opinion one forms on the Landis case depends a lot on the prejudices one brings into the discussion, and on how much one looks at the data. We believe that, looking at the data, a significant number of qualified biochemists, as exemplified by Donald Berry, can come to quite reasonable doubt about the soundness of the "guilty" decision handed to Landis.
Under the WADA Code, this is irrelevant. They have secured their sanction against Landis, and have marched onward to Beijing.