Tuesday, February 27, 2007

Uncertainty -- what about that 0.8 delta unit?

In all editions of the slide show, Arnie Baker has made a point to mention the 0.8 delta unit claimed accuracy of the CIR test, and removed it from the -3.57 value to get a value smaller than the 3.0 "positive" level. This has been met with skepticism here at TBV and elsewhere. The main argument against this is that the accuracy had to have been factored into the 3.0 cutoff level.
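The arithmetic behind Baker's claim is simple enough to sketch in a few lines of Python. This is only an illustration using the figures quoted above; the function and constant names are hypothetical, and treating the 0.8 as something to subtract directly from the measured value is Baker's reading, not an established rule.

```python
# Hypothetical names; the numbers are the ones quoted in the post.
MEASURED_DELTA = 3.57  # magnitude of the reported difference, in delta units
UNCERTAINTY = 0.8      # claimed accuracy of the CIR test, in delta units
THRESHOLD = 3.0        # the "positive" level, in delta units


def exceeds_threshold(measured, threshold, uncertainty=0.0):
    """Return True if the measured value, less the uncertainty, still reaches the threshold."""
    return (measured - uncertainty) >= threshold


# Reading 1: the 0.8 is assumed to be already built into the 3.0 cutoff.
print(exceeds_threshold(MEASURED_DELTA, THRESHOLD))

# Reading 2 (Baker's): the 0.8 is removed from the measurement first.
print(exceeds_threshold(MEASURED_DELTA, THRESHOLD, UNCERTAINTY))
```

The whole dispute is over which of the two calls reflects the rule: the first treats 3.57 as a positive, the second treats it as roughly 2.77 and therefore not.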

In a recent slideshow, Baker attempts to justify this with a reference to "The role of measurement uncertainty in doping analysis", which we've come across for the first time now. The abstract seems clearly on point:

The determination of measurement uncertainty is a critical issue in all fields of experimental science; its importance becomes maximal in the specific case of forensic analytical chemistry (including doping analysis), where uncertainty has not only to be calculated with precision, but it also has to be both small and reliable enough to support effective decision making. This contribution gives a general view of the organisation of the activity of the network of the anti-doping laboratories accredited by the World AntiDoping Agency (WADA), focusing in particular on the current situation related to the determination of measurement uncertainty in doping analysis. Representative examples referring to the procedures followed in the case of threshold substances are also presented and discussed.

In particular, on page 378, it says the following, which we'll call The Paragraph.

In practice, to report an adverse analytical finding, it is necessary to identify a prohibited substance, and, in the case of substances with a reporting threshold, to measure a value exceeding the threshold; in the latter case, it is mandatory to express the measurement uncertainty. It is not unusual for a result apparently exceeding the threshold, if taken as a single value or even as a mean value to not be correctly reported as `above the threshold' if the measurement uncertainty is not taken into account. In Figure 2 it is evident that only case `E' shows a value that is above the threshold also taking into account the measurement uncertainty.

On the surface, this supports Baker's approach, so we're left with some questions:
  1. Is this paper in any way authoritative?
  2. Is the CIR a "threshold" test to which the approach applies?


The paper is not an official WADA publication, so it's difficult to claim it is a controlling document that must be obeyed by the arbitrators. However, it is by Spiroto and Botre, the Quality Manager and Scientific Director of the Rome laboratory. The Paragraph says, "In practice, to report...", so we are in the realm of interpretation, not hard rule.

At the same time, the conclusion of the paper asserts that the goal of the labs and WADA is to,
ensure that all methods followed by the laboratories are `fit to purpose', the latter being to provide `clear and convincing' evidence of any detectable doping offence based on an adverse analytical finding.

which would argue that failure to consider the uncertainty may fail the 'clear and convincing' part of the goal.

It appears to us the authority of this argument rests on ethical and moral grounds, not on a de-jure reading of the rules. This can be seen as a failing in the rules.

Applicability to the CIR

Should we conclude that the uncertainty bars need to be taken into account for substances with threshold amounts, we then need to determine whether the CIR is a threshold test or not. This is not clear cut either. The description in WADA TD2004EAAS, in the same paragraph with the dreaded "metabolite(s)", does not use the word "threshold":

The results will be reported as consistent with the administration of a steroid when the 13C/12C value measured for the metabolite(s) differs significantly i.e. by 3 delta units or more from that of the urinary reference steroid chosen. In some Samples, the measure of the 13C/12C value of the urinary reference steroid(s) may not be possible due to their low concentration. The results of such analyses will be reported as “inconclusive” unless the ratio measured for the metabolite(s) is below -28‰ based on non-derivatised steroid.

But we also notice for the first time that the "3 delta units" is not offered as a hard value either -- it is used as an example ("i.e.") of what ought to be the controlling restriction, "differs significantly".

If one takes 3.0 as a fixed value, then it looks like a threshold value, and it's a fair argument to make that the uncertainty should be taken into account.

If one does not take 3.0 as a fixed value, and latches on to the "differs significantly" wording, then there is clearly discretion allowed in the rule, and one can equally argue the uncertainty should be taken into account. In fact, the rule as written encourages this view, because it doesn't say three-point-oh, it says "3 delta units", indicating (to us) an intentional vagueness of the precision.

Because of the vagueness of the rule as written, there is clearly opportunity for variance in lab reporting of the same measured values. This is inconsistent with the WADA goal of harmonization, but it's not clear that needs to be considered by the arbitration.

All in all, it's a nice mess we're left with, Ollie.

Attempts to "simplify" the issues all seem to come back to the same Rorschach -- do you think they are all dopers, so you convict shades of gray, or do you think gray means "probably not black?"

Our opinion, after looking at the paper and reading the rule again, has changed. We're inclined to accept the argument that the 3.57 metabolite should not be considered a positive.

That leaves us with the previously unresolved problem of whether the single metabolite at 6 delta units is sufficient to declare a positive. And we're not going there now.

update: Further discussion of this topic continues at DPF.

update 2: An emailer sends the following on 6-Mar-2007:
Readers interested in learning more about the importance of measurement uncertainty may want to visit the following National Institute of Standards and Technology (NIST) webpage: http://physics.nist.gov/cuu/Uncertainty/index.html
NIST Technical Note 1297 is available for free via: http://physics.nist.gov/Pubs/pdf.html
If some enterprising readers wish to look at the Uncertainty paper and think about how it applies to the Landis case, we'd be obliged. I have a suspicion the LNDD quoted uncertainties are not in conformance with the NIST recommendations, but I don't understand the implications.



Thomas A. Fine said...

I've pointed this out before somewhere, but in the LDP, they explicitly provide the calculated delta-d13C values, as well as the +0.8% and -0.8% numbers.

I think the fact that it is reported in this way makes it clear that this is the range of possible results, and therefore it is entirely possible that a -3.57 could really be -2.77.

I'll leave the actual slide numbers as an exercise for the student.


Anonymous said...

Baker is clearly right.

Anonymous said...

If it helps, i.e. is an abbreviation for "that is", not an abbreviation for "for example." Thus, under your interpretation it clearly defines a "threshold." The debate, I guess, is about whether the uncertainty is incorporated in that threshold.



Anonymous said...

I've come across some info on this issue, TBV. I'll start a thread at DP Forums, it seems easier that way.


Jim T said...

In cases involving statistics and uncertainty, I read "significantly" as "statistically significant", which means that the uncertainty should be taken into account. But under this interpretation, there should also be a specification of the degree of statistical significance required (5% level of significance, 1% level of significance, etc.). So who knows what "significant" means in this case.

Anonymous said...

If the 3.0 already has the margin of error factored in, the rules would have to state somewhere else what that margin of error is. If they don't, then it's reasonable to assume that the margin of error should be added to the 3.0 threshold.

Is there anything in the rules that says you have to use a specific spectrometer model (or models), or that defines the required accuracy of the spectrometer? If not then I don't see how the specified thresholds could include a margin of error.

But maybe I'm rambling incoherently again.

~ Cub

Jim T said...

I've done some more reading on this and have concluded that it's reasonable to think that the 3.0 delta unit difference already accounts for measurement uncertainty.

The CIR test apparently has a standard error of 0.8 delta units, so a difference of 3 between the 13C/12C value measured for the metabolite and that of the urinary reference steroid would be 3.75 standard errors, which is statistically significant by any reasonable standard.

But if this interpretation is correct, then the Spiroto and Botre paper's example doesn't make sense. Why add on the uncertainty if the threshold has already been set high enough to account for measurement uncertainty? Or better yet, why not just specify that the difference must exceed the measurement error by some chosen value - say three or four standard errors? Spiroto and Botre's "account for uncertainty by setting a threshold and then account for uncertainty again by subtracting the standard error from the sample's value" method seems strange.

At any rate, the interpretation doesn't seem clear even to the ADA folks.

Duckstrap said...

You are assuming that a difference of 0 is the expected value (it is not for 3 of 4 of the metabolites in the IRMS analysis), and that the uncertainty really is 0.8. The origin of that value is not specified anywhere, and in the Landaluze decision, one of the experts asserted that a more appropriate value was 1.35.

Jim T said...

You're right - I was assuming a difference of zero is the expected value and I was taking the 0.8 standard error as given.

But even if these assumptions are incorrect, it didn't seem to make sense to account for uncertainty by setting a threshold level and then again accounting for uncertainty by adding (or subtracting as appropriate) the uncertainty from the measurement.

However, I was forgetting that we have two sources of uncertainty here - measurement error and the natural variation between individuals. Perhaps the threshold level is set to account for variations between individuals (like the 4/1 T/E ratio when the "average" ratio is 1/1). Then it might make sense to account for the measurement error separately. But still, just adding (or subtracting as appropriate) the measurement error doesn't give a very high level of statistical significance (two standard errors is about the five percent level of significance for normal distributions).

I suppose the ADA could argue that the 3 delta unit difference is meant to account for BOTH the measurement error and the variation between individuals. But it would be nice to know where this number came from and what, if any, uncertainty or measurement error it is supposed to account for.
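For readers who want to check the figures traded in the comments above, here is a rough Python sketch. It rests on the same assumptions the commenters state and then question: that the quoted uncertainty is a standard error, that the expected difference is zero, and that the errors are normally distributed. The function names are hypothetical, and the 1.35 figure is the alternative value attributed above to an expert in the Landaluze decision.

```python
import math


def standard_errors(difference, standard_error):
    """Express a difference as a multiple of the assumed standard error."""
    return difference / standard_error


def two_sided_p(z):
    """Two-sided tail probability for a standard normal variable at |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


# A 3 delta unit difference under the two uncertainty values mentioned above.
for se in (0.8, 1.35):
    z = standard_errors(3.0, se)
    print(f"SE={se}: {z:.2f} standard errors, two-sided p={two_sided_p(z):.5f}")

# The "two standard errors is about the five percent level" rule of thumb.
print(f"z=2.00: two-sided p={two_sided_p(2.0):.4f}")
```

None of this settles what the 3 delta unit figure was meant to cover; it only shows how strongly the apparent significance depends on which uncertainty value is assumed.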

Daniel said...

I'm with the two variance party:
1) the 3 delta units being the (arbitrary) cutoff for the upper limit of acceptable 13C/12C difference (which will have an associated false positive rate), and
2) the individual variation (which is, most likely, an average estimate of individual variation, not an estimate of the individual Floyd Landis' variation) in this measurement.

So officially I would call it a negative for that particular metabolite. We still don't know the false positive rate for using one, two, three, or all four metabolites above the threshold of 3 units, since no one to my knowledge has bothered to validate this test in a larger study than those I've seen published.

Ask yourself, if this was the test you were taking to find out you had cancer, and the docs were saying you should start the chemotherapy because one out of four metabolites was positive, would you do it? There is no evidence/literature supporting a positive test interpretation for one (or even two) of four metabolites above the 3 delta unit threshold, is there?

Thomas A. Fine said...

I think it was Maitre that described some metabolites returning to normal faster than others, which implies that there is a period in which only one metabolite might be above some threshold.


tbv@trustbut.com said...

Our #1 fan, Will @ topix has been beating the drum for Maitre's single metabolite positive for a while. I believe this is where the 'single best metabolite' argument made in the ADRB filing comes into play, because the one with the slowest decay rate isn't positive in the Landis sample.

I don't know/recall what the one in Maitre was, and I'm not able to look it up right now. Can someone else?


Duckstrap said...

In Maitre, the metabolite that was slowest to return to normal was 5beta-androstanediol, which was not positive in the Landis samples. For Landis, the "positive" metabolite was 5alpha-androstanediol. It should be noted, however, that the graphs shown in Maitre are for a single subject. Thus we don't have a great idea about how the different metabolites respond in a population of individuals.

Daniel said...

If more could appreciate (what I see as uncertainty in) this state of the art, "confirmatory" test, I would hope the Floyd-is-getting-off-on-a-technicality rap might slowly be replaced by a WADA-didn't-research-this-test-enough-before-using-it rap.

Thomas A. Fine said...

Yeah, dan. That worked so well for Tyler.


Daniel said...

I hear you Tom, but by "rap" I was not suggesting putting this forth in the confines of the Court of Arbitration for Sport, but in the public view, like the Hiltzik series of articles in the LA Times, except printed in more papers than just the LA Times, or dare I say, L'Equipe, to give me a sense people are hearing two sides of the case.

I don't know much about the HBT test, or Tyler's results, so I'll defer to you as to whether he chose the right defense.