Saturday, July 05, 2008

Larry: The 20% Solution

Larry sends us the following rumination.

When the Landis case is discussed, one popular topic is the applicable margin of error for the lab's testing that led to Landis' suspension for doping. Thanks to the CAS proceeding and the documents released by the Landis team, we have new information regarding this margin of error. In this article, I will consider this new information to see if it affects our analysis of the work performed by the lab in the Landis case.

Let's start with the basics. In the Landis case, the French lab (LNDD) reported an "adverse analytical finding" (AAF) that Landis doped with exogenous (artificial) testosterone. The AAF was based on the lab's performance of a "carbon isotope ratio" (CIR) test. The CIR test measures the "isotopic value" of two substances in an athlete's urine, and subtracts one value from the other. The result of the subtraction is called the "delta - delta" value. (This result is expressed in parts per thousand, or "0/00", but to keep things simple, I've dropped the "0/00" from our discussion.) If the delta - delta is less than -3.0, then the athlete is presumed to have doped with exogenous testosterone.

Sounds complicated? Then let's make it simple. To determine if Landis doped, the lab measured "A" and "B", then subtracted B from A. In other words, A - B = C. If C was less than -3.0, then Landis failed the test. In one of the measurements performed by LNDD on a Landis sample, C WAS less than -3.0.

How does margin of error come into this discussion? Well, every measurement has a margin of error -- no measurement is perfect. The key questions are, how large is the margin of error for a given measurement, and is the measurement accurate enough to be "fit for purpose". If you measure the size of your foot and you're off by 1%, that's probably "fit for purpose" -- you'll probably end up with a shoe that fits. If you work for NASA and your measurements are off by 1%, that's not "fit for purpose" - your rockets are not going to go where you want them to go.

Correct determination of a lab's margin of error is key to the lab's operations. I discussed measures of lab accuracy in my "Curb Your Anticipation" series, and if you'd like a fuller picture of how the lab rules address margin of error, I'd point you in particular to parts 7 and 8 of this series.

Let's get more specific. LNDD has indicated that its margin of error for its CIR testing is ± 0.8. In other words, the result of its delta - delta calculation might be off by as much as 0.8 in either direction. So, if LNDD measures a delta - delta of 2, then the true result might be as great as 2.8, or as little as 1.2.

One measure of the delta - delta for Landis was -6.14. If we apply the stated margin of error to this measurement, we end up with a delta - delta that can be no greater than -5.34. Remember, any measure of delta - delta less than -3.0 is supposed to be a violation. -5.34 is smaller than -3.0, so on this measurement Landis flunked the delta - delta test even taking the 0.8 margin of error into account.
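
For readers who want to check this arithmetic, here is a minimal Python sketch. The function and constant names are made up for illustration, and it assumes the ± 0.8 is applied directly to the delta - delta value, which is how the article reads the lab's figure.

```python
from decimal import Decimal

THRESHOLD = Decimal("-3.0")  # a delta - delta below this is treated as an AAF
MARGIN = Decimal("0.8")      # LNDD's stated margin of error (assumed to apply to C)

def most_favorable_value(measured, margin=MARGIN):
    """Shift the measurement toward zero by the margin (best case for the athlete)."""
    return measured + margin

def still_fails(measured, margin=MARGIN, threshold=THRESHOLD):
    """True if the result stays below the threshold even after allowing for the margin."""
    return most_favorable_value(measured, margin) < threshold

landis = Decimal("-6.14")
print(most_favorable_value(landis))  # -5.34
print(still_fails(landis))           # True
```

Exact decimal arithmetic is used here only so the printed values match the figures in the text.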

(For those not so mathematically inclined, please understand that the isotopic values for compounds in CIR tests are typically negative numbers. A negative number is a number less than zero. With negative numbers, our sense for which numbers are larger and which are smaller can get confused. A negative number like -5 is "less negative" than the number -6, so -5 is larger than -6. This is confusing for those of us who don't deal with negative numbers in our day-to-day lives!).

Of course, if the correct margin of error at LNDD was a lot higher than 0.8, then LNDD might not have been able to prove that Landis doped.

We've long suspected that the LNDD's stated margin of error is too small. Our suspicions are based on measurements reported by LNDD that vary by a lot more than ± 0.8. For example, if you look at the latest version of the Arnie Baker wiki defense, at p. 203, you find a chart (figure 141) showing a peak with a measured "isotopic value" of -31.64. The measurement here is for the isotopic value of a substance called 5a Androstanol AC (5aA AC), which is added by LNDD to all urine samples. 5aA AC is a reference material that LNDD buys from a lab supply store, and we know that it has an "isotopic value" of -30.46. So, in this particular case we know that LNDD's measurement of this single isotopic value was off by 1.18. And remember, the delta - delta measurement requires the lab to measure the isotopic values for TWO substances. If the measure of the isotopic value of both of these substances is off by 1.18, then we could conclude that the correct margin of error at LNDD might be ± 2.36, not ± 0.8.
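
The reference-peak arithmetic can be sketched as follows. The variable names are illustrative only. The doubling is a worst case: it assumes both isotopic values are off by the full 1.18 in opposite directions, since errors in the same direction would partly cancel in the subtraction.

```python
from decimal import Decimal

measured_reference = Decimal("-31.64")  # 5aA AC peak reported in figure 141
known_reference = Decimal("-30.46")     # known isotopic value of 5aA AC

# Error in a single isotopic value measurement.
single_error = abs(measured_reference - known_reference)

# Worst case for A - B: both values off by this much, in opposite directions.
worst_case_delta_delta_error = 2 * single_error

print(single_error)                  # 1.18
print(worst_case_delta_delta_error)  # 2.36
```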

Let's summarize. The determination of the LNDD's margin of error for CIR testing is CRITICAL to the determination of whether the lab correctly found an AAF in the Landis case. The lab's stated margin of error is ± 0.8. Some of us think that the real margin of error at LNDD is a lot higher.

With the release of the CAS decision and a host of new documents here on TBV, we now have a critical new piece of information relating to the LNDD margin of error.

It turns out that the LNDD's accreditation to perform the delta - delta test did NOT initially indicate a margin of error of ± 0.8. It indicated a margin of error of 20%. See The Wiki Defense p. 87.

Before we look at what a 20% margin of error might mean in the Landis case, let's take a look at how USADA reacted to the revelation that the LNDD accreditation documents referred to a 20% margin of error. USADA argued that (1) the reference to a 20% margin of error was ITSELF an error, (2) this error was corrected in the accreditation documents -- the correction being made AFTER the analysis of the Landis samples, but in a retroactive manner effective PRIOR to the analysis of the Landis samples, and (3) even if the lab actually had to apply a 20% margin of error to its CIR measurements, the Landis samples would STILL have had a delta - delta of less than -3.0 and would have flunked the CIR test.

It is CRITICAL that we understand USADA's position here. USADA has stated that the reference to a 20% error rate was a mistake, and that this mistake was timely corrected. In USADA's post-hearing brief, p. 8, USADA states that the applicable margin of error was ±0.8, not 20%. In his opening statement for USADA (pp. 183, 187), Richard Young stated that the 20% error rate initially reported for the CIR test was in reality the margin of error for a DIFFERENT test, and that the stated margin of error for the CIR test was corrected in December of 2006 effective as of May 1, 2006.

But before we address whether the reference to a 20% margin of error was a mistake, we need to first try to understand how a 20% margin of error would work. This requires us to ask the question: 20% of WHAT? Remember, a delta - delta calculation is, ultimately a subtraction problem: isotopic value A minus isotopic value B equals delta - delta C, or just A - B = C. If the margin of error is 20%, is that margin of error applicable to C, or to A and B?

If you play around with the numbers a bit, you'll find that if you apply a 20% margin of error to C, you are claiming MUCH greater accuracy than if you apply the 20% to A and B. Let's say that we're looking at a calculation of A - B = C, where A is 100 and B is 50. If we apply the 20% margin of error to C, then we're saying that C might be anywhere between 40 and 60. That's not too bad, as accuracy goes. But if you apply the 20% to A and B, then you could end up with a C that might fall anywhere between 20 and 80. That's not nearly as accurate.

I'm not a math guy, but as it turns out, the difference between these two applications of percentage error seems to become greater the closer that A gets to B. For example, let's use values for A and B that might be typical.

In a real-world delta - delta test, A might be 30 and B might be 25. In that case, if you apply the 20% margin of error to C, you get a range for C between 4 and 6, or ± 1. If you apply the margin of error to A and B, you get a range for C between -6 and 16, or ± 11! Clearly, there's a big difference between applying the margin of error to A and B and applying it to C.
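
The two ways of applying a percentage error can be compared with a short Python sketch. The function names are hypothetical, and the "apply to A and B" version assumes the simplest worst-case reading: each value can be off by the full 20%, in whichever direction widens the range for C the most.

```python
from decimal import Decimal

PCT = Decimal("0.20")  # the 20% figure from the accreditation document

def range_applied_to_c(a, b, pct=PCT):
    """Apply the percentage to the difference C = A - B."""
    c = a - b
    spread = abs(c) * pct
    return c - spread, c + spread

def range_applied_to_a_and_b(a, b, pct=PCT):
    """Apply the percentage to A and B separately, then take the
    smallest and largest differences those two ranges allow."""
    a_lo, a_hi = a - abs(a) * pct, a + abs(a) * pct
    b_lo, b_hi = b - abs(b) * pct, b + abs(b) * pct
    return a_lo - b_hi, a_hi - b_lo

for a, b in [(Decimal(100), Decimal(50)), (Decimal(30), Decimal(25))]:
    lo_c, hi_c = range_applied_to_c(a, b)
    lo_ab, hi_ab = range_applied_to_a_and_b(a, b)
    print(f"A={a}, B={b}: on C, {lo_c:.0f} to {hi_c:.0f}; "
          f"on A and B, {lo_ab:.0f} to {hi_ab:.0f}")
```

The range on C scales with the difference between A and B, while the range from A and B scales with their sizes, which is why the gap between the two methods grows as A and B get closer together.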

Both sides have BRIEFLY considered how the 20% margin of error might be applied to CIR testing, and predictably, the two sides disagreed on how this would work. In the Wiki Defense, Arnie Baker applied the 20% to the calculation of A and B, with the result that all of Landis' CIR test results were normal. See Wiki Defense p. 87. USADA applied the 20% to the calculation of C, effectively increasing the applicable margin of error to about ± 1.2, but not affecting the finding against Landis. See USADA Post-Hearing Brief p. 8 footnote 7.

Assuming that the 20% margin of error was applicable, which side is right - was there an AAF, or wasn't there? It's a close question, requiring us to dive into the convoluted logic of the Landis case. The answer probably is, it's impossible to say that either side was right.

Why do I say that neither side was right? Well, without more information, a stated 20% margin of error is close to meaningless. As we've already pointed out, a 20% margin of error can be applied in different ways to imply different levels of accuracy. In addition, a 20% margin of error produces different ABSOLUTE ranges of error, depending on the size of what we want to measure. If we apply our 20% figure to a delta - delta of 10, we get a margin of error of ± 2. If we apply it to a delta - delta of 2, we get a much smaller margin of error, ± .4. And for a delta - delta of 0, our 20% rate would indicate that the calculation is exactly on the nose, with NO error (20% of zero is still zero). That can't be right.
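
A few lines of Python make the scaling problem concrete. The helper name is invented for this illustration, and it assumes the 20% is taken of the delta - delta itself.

```python
from decimal import Decimal

def absolute_margin(delta_delta, pct=Decimal("0.20")):
    """Absolute margin implied by a percentage error on a given delta - delta."""
    return abs(delta_delta) * pct

for value in (Decimal(10), Decimal(2), Decimal(0)):
    print(f"delta - delta {value}: margin of +/- {absolute_margin(value):.1f}")
```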

What makes sense is to apply the 20% margin of error to the kinds of delta - delta calculations that we would expect the lab to encounter in the real world testing of athletes. In particular, we care most about the lab's margin of error when the lab is measuring a sample with a delta-delta close to -3.0 (which, as we noted above, is the standard for determining whether an athlete doped with artificial testosterone). If a lab measures a delta - delta of +30, or -30, we're not so concerned about margin of error. But if the delta - delta computes to something close to -3.0, say -3.1, then we're VERY concerned about margin of error. It makes sense to measure a test's margin of error for results that are CLOSE to the level required to prove an AAF.

(An aside for the more scientifically inclined: USADA also appears to have claimed that the LNDD's margin of error was better for peaks with longer retention times, and that the ± 0.8 margin of error was specifically applicable to peaks with retention times roughly corresponding to the peaks being measured for the CIR tests. This is a topic for another day's exploration.)

So, perhaps the original lab accreditation meant to say that, when the lab is measuring a delta - delta close to -3.0, then the lab's margin of error is 20%. THAT would make more sense, and would largely eliminate the concern we expressed about widely different delta - delta calculations having widely different absolute margins of error. Of course, the LNDD CIR accreditation did not SAY this - it did not say "20% margin of error at a delta - delta near -3.0", it just said "20%". But let's try this out as an assumption. What would happen if we tried to work out a margin of error of 20% near a delta - delta calculation of -3.0?

Well, let's do some math. 20% of 3.0 is 0.6. You might then assume that a 20% margin of error should translate into an error rate of ± 0.6. But the math doesn't exactly work out that way. If the lab measures a delta - delta of -3.6 with a 20% rate of error, then the TRUE delta - delta might fall anywhere in the 20% range above or below -3.6. This 20% range (calculated with a round-off favoring the athlete) is from -2.8 to -4.4 - and remember, -2.8 is too large a number to allow the lab to find an AAF. A delta - delta of -3.7 has the same problem - the high end of the 20% range is too high for an AAF. But a delta - delta reading of -3.8 allows for a finding of an AAF even with a 20% range - in fact, it's the highest possible delta - delta (expressed to the nearest tenth of a percentage point, with rounding off favoring the athlete) that could result in an AAF given a 20% margin of error. Compare this -3.8 figure to the -3.0 figure required under the CIR rules, and you get an absolute margin of error of ± 0.8. The exact same margin of error used by LNDD in the Landis case.
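
Here is a rough Python sketch of that search, stepping through readings one tenth at a time. The names are illustrative. It assumes the 20% is taken of the measured delta - delta, and it skips the rounding step, comparing the exact upper end of each range against -3.0.

```python
from decimal import Decimal

THRESHOLD = Decimal("-3.0")  # a delta - delta below this is treated as an AAF
PCT = Decimal("0.20")        # the 20% figure, taken of the measured value

def upper_end(measured, pct=PCT):
    """End of the 20% range closest to zero (best case for the athlete)."""
    return measured + abs(measured) * pct

def first_reading_that_still_fails(start=Decimal("-3.0"), step=Decimal("0.1")):
    """Step away from -3.0 in tenths until the whole range is below the threshold."""
    reading = start
    while upper_end(reading) >= THRESHOLD:
        reading -= step
    return reading

cutoff = first_reading_that_still_fails()
print(upper_end(Decimal("-3.6")))  # -2.880, not low enough for an AAF
print(upper_end(Decimal("-3.7")))  # -2.960, not low enough for an AAF
print(cutoff)                      # -3.8
print(THRESHOLD - cutoff)          # 0.8
```

Without stepping in tenths, the exact break-even reading is -3.75, since 80% of -3.75 is -3.0.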

In other words, if we assume that the accreditation for the LNDD CIR method should be understood as specifying a 20% margin of error near a delta - delta calculation of -3.0, then the inclusion of the 20% error rate was NOT a mistake. With this assumption, the 20% margin of error was DEAD ON accurate. Or more precisely, given our assumption, a 20% margin of error is the SAME THING as an error rate of ± 0.8. And most importantly, the 20% margin of error would change nothing in the Landis case: the critical delta - delta calculation used to find the Landis AAF would not be affected one iota by applying a 20% margin of error - again assuming the correctness of our assumption on how the 20% margin should be understood.

Is that the end of our discussion? Not exactly! Yes, *I* think that the 20% margin of error can be most accurately translated into an absolute margin of error of ± 0.8. But there's a powerful and authoritative witness in this case who ABSOLUTELY disagrees with me! There's a witness in this case who has testified that a 20% margin of error is NOT applicable to the CIR test at LNDD and does NOT equate to ± 0.8. And while all of the other witnesses for Landis were largely ignored by the two arbitration panels, the witness on this point CANNOT be ignored by USADA or the arbitrators.

On this point, the witness in favor of Landis is USADA itself.

Go back to our earlier discussion. In this case, USADA has argued that the 20% margin of error was a MISTAKE. Not just a mistake, but effectively a LARGE mistake, a mistake that required retroactive correction. USADA argued that the correct margin of error was not 20%, it was ± 0.8. Apply some simple logic: if 20% is a mistake, and ± 0.8 is not a mistake, then 20% does not equal ± 0.8. Ergo, I must have been WRONG to conclude that a 20% margin of error is best understood as a margin of error of ± 0.8.

OK, I can hear some of you raising objections. Sure, USADA claimed that the 20% margin of error was a mistake, but that doesn't necessarily mean that 20% does not equate to ± 0.8. Perhaps when USADA argued that 20% was a mistake, it meant that the accreditation body should have explicitly stated the assumption that the 20% was applicable to delta - delta calculations of around -3.0. Or perhaps USADA meant that 20% was a mistake in that it was a confusing way to express the margin of error, and that a stated margin of error of ± 0.8 would be easier to understand. Perhaps USADA was REALLY arguing that the 20% margin of error was essentially correct, but was EXPRESSED in a mistaken way.

But that's not what USADA argued.

As we pointed out above, USADA argued that the 20% margin of error was a mistake caused by the accreditation body confusing margins of error for DIFFERENT tests. According to USADA, the accreditation body took the margin of error for the "T/E" test, and accidentally reported it as the margin of error for the CIR test. In other words, the reported 20% margin of error had NOTHING TO DO with CIR testing. It's not that the 20% margin of error should have been expressed differently - USADA's position is that the CIR test did not have a 20% margin of error in any way we could imagine.

And just to make this point crystal-clear, USADA provided us with an example of how a 20% margin of error would have been applied to CIR testing, if the 20% margin of error HAD been applicable:

"Even using 20% uncertainty, Appellant's sample would still have been declared positive (6.14 delta-delta units ± 20%=4.91 delta-delta units.)"

In other words, USADA argued that a 20% uncertainty did not translate into a margin of error of ± 0.8. It argued that a 20% uncertainty would translate into a 50% larger margin of error -- roughly ± 1.2. And for whatever it's worth, it appears that the CAS ruling agrees with USADA on this point. See CAS Opinion paragraph 48.
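
USADA's footnote arithmetic can be reproduced in a few lines. Like the footnote, this sketch works with the size of the delta - delta (6.14 rather than -6.14). The variable names are mine.

```python
from decimal import Decimal

measured = Decimal("6.14")        # size of the Landis delta - delta
pct = Decimal("0.20")             # the 20% uncertainty in USADA's example
stated_margin = Decimal("0.8")    # LNDD's stated margin of error

half_width = measured * pct       # about 1.23 delta - delta units
low_end = measured - half_width   # about 4.91, USADA's figure

print(f"margin: +/- {half_width:.2f}")
print(f"low end: {low_end:.2f}")
print(f"ratio to 0.8: {half_width / stated_margin:.2f}")
print(low_end > 3)                # still beyond the 3.0 threshold
```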

So, given that USADA categorically disagrees with my interpretation that a 20% margin of error translates into a margin of error of ± 0.8, I'm forced to consider how else we might understand a 20% margin of error.

Can I adopt the USADA interpretation quoted above, which effectively translates the 20% margin of error into an absolute margin of ± 1.2 delta - delta units? No. The USADA method will produce a different absolute margin of error depending on the size of the delta - delta measurement: there will be a large margin of error for a large calculation of delta - delta units, and no margin of error whatsoever where the delta - delta equals zero. We rejected this approach before as making no sense. The only approach that makes sense is to apply the margin of error to calculations that we expect to see in the real world; in particular, those calculations close to the -3.0 delta - delta used to determine an AAF.

Can we apply the 20% to delta - delta calculations close to -3.0? No, that's the approach we discussed in detail above, the one we tried to use but that USADA effectively rejected.

Let's go back to our explanation of the delta - delta calculation: the calculation boils down to A - B = C. We can't seem to find a way to apply the 20% error rate directly to C. What about trying the Arnie Baker approach, and applying the 20% to the isotopic values A and B? Based on what we know, I'll pick a set of isotopic values that could produce a delta - delta value of -3.0: I'll pick a value for A of -28 and a value for B of -25. If we apply the 20% margin of error to each of these values, A could fall anywhere between -33.6 and -22.4 and B anywhere between -30 and -20, so C could fall anywhere between -13.6 and 7.6. That's an absolute margin of error of ± 10.6.

To note, we'll get a slightly different absolute margin of error if we apply the 20% figure to different possible real-world calculations of A and B. For example, we noted above that where A is -30 and B is -25, application of the 20% figure will give us an absolute margin of error of ± 11. I won't try to go through any more of these calculations. For the moment, let's say that applying a 20% error rate to real world values of A and B produces an absolute error rate of about ± 10.
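
As a quick check on these figures, here is a minimal sketch. The function name is hypothetical, and it assumes the worst case, where A and B are each off by the full 20% in opposite directions.

```python
from decimal import Decimal

def half_width_from_a_and_b(a, b, pct=Decimal("0.20")):
    """Worst-case error in A - B when A and B can each be off by pct."""
    return (abs(a) + abs(b)) * pct

print(f"{half_width_from_a_and_b(Decimal(-28), Decimal(-25)):.1f}")  # 10.6
print(f"{half_width_from_a_and_b(Decimal(-30), Decimal(-25)):.1f}")  # 11.0
```

Because the margin depends on the sizes of A and B rather than on their difference, it stays near ± 10 for any pair of isotopic values in this neighborhood.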

There are three things we can say about a margin of error of ± 10. First, it completely exonerates Floyd Landis. Second, it's about 12 times larger than the LNDD's reported margin of error of ± 0.8. Third, it's an absurd margin of error - not absurd from the standpoint that it could not conceivably be true, but absurd from the standpoint that no decent accreditation body could possibly approve a CIR method with a margin of error of ± 10. A method with this kind of margin of error could not be fit for any purpose.

Where does this leave us? What can we possibly conclude from all this?

I'm not sure.

Obviously, we can't prove much from a stated percentage margin of error that might mean either ± 0.8 or ± 10. We might argue that if the margin of error is ambiguous, it should in fairness be interpreted in a manner favorable to the athlete. But this is the sort of legal "technical" argument that seems to appeal to me and to no one else, not even here on TBV.

We might argue that USADA got so caught up in lying and covering up, that it became more convenient for USADA to simply cover up the 20% margin of error ("it was a mistake!") than to take the time to understand that it might well be the real margin of error at LNDD.

Or perhaps we should take USADA at face value, and assume that the stated 20% margin of error WAS a mistake. But since the ± 0.8 margin of error seems to be based on this 20% calculation, we might also conclude that the LNDD's stated margin of error of ± 0.8 is ALSO a mistake. We would then ask what the real margin of error might be at LNDD, and whether this margin of error was ever determined and accredited.

We might also ask questions about the accreditation process. The margin of error for a particular test is critical to understanding the test. If the 20% margin of error was correct all along, how was it that LNDD persuaded the accrediting body to change this margin (and to testify before the CAS that the 20% figure was a mistake)? If the 20% margin of error was itself an error, how can we explain how the accrediting body made such a critical mistake, and how is it that LNDD did not notice the mistake until the Landis team pointed it out? Can we really take the accrediting process seriously as an assurance of lab quality, as the CAS did in its decision, when both the lab and the accrediting body have acted so casually about a piece of data this critical to the overall process?

Personally, I conclude that the ADA system is so flawed that it's difficult to draw conclusions from anything that happens there. But that's my subjective conclusion, and I encourage you to reach your own conclusions.

[edited for formatting!]


18 comments:

wschart said...

It makes no sense to me to apply the margin of error, whether 0.8 or 20%, to the C value. The C value is a simple calculation; whatever errors occur there come from the errors in the A and B values. Whatever the size of those errors, it means that the value of A can be more or less than the stated value, same for B. There is no guarantee the errors will be in the same direction for A and B; so if you add the margin of error to A and subtract it from B you get a different C than the opposite case.

Of course, it is possible that this margin of error, whether it is 0.8 or 20% was determined by having the lab run a number of tests on a sample and comparing the final results (C value) with the known value of the sample. So it would help to know not only what the value is of the margin of error, but how it was determined.

I would think, however lax LNDD might be regarding paperwork, the accreditation agency would have such paperwork and could clarify what the situation is, so we wouldn't just have USADA's word.

Larry said...

wschart, you're right, there's not going to be an error in the computation of C: that's just a matter of subtracting B from A. The error is going to be in the measurement of A and in the measurement of B.

But this doesn't tell us whether the margin of error is supposed to be applied to the A and B values, or to the C value. I don't think there's any question that the +/- 0.8 is intended to be applied to the C value.

As to how the +/- 0.8 was determined ... the new documents indicate that LNDD established the delta-delta uncertainty by analyzing a single urine pool 30 different times over 7 months, and calculating four means and standard deviations. See testimony of Dr. Buisson quoted at CAS opinion p. 47. The results of this validation data are supposed to be shown on Exhibit 26 at LNDD 0451-0457. I have not looked to see if these documents are posted online.

whareagle said...

And once again - how does one go about petitioning the governing body that hands out licenses and checks for standards to be kept, to have this lab's certifications pulled?

This IS criminal.

Larry said...

TBV, thanks for the editing.

whareagle, peut-etre tu a les amis en France? Non? Merde. Because sitting here in the U.S.A., I cannot imagine any way to pressure COFRAC to revoke LNDD's accreditation. (but maybe you can speak French better than I do)

wschart said...

OK, I can sort of see the logic here: we run a bunch of tests on a sample, and calculate the standard error. What they are saying is that, when everything is "averaged out", this is how much error we have. And it very well could be that 20% = 0.8 for whatever concentration their test solution was. And this would be based on their C values.

Or could it be possible that they ran the 30 tests, calculated the standard errors for A and B each separately, then used all possible combinations of A and B to calculate the range of C values possible for each test, and finally calculate the standard error for C based on this?

In any event, it could be that the concentration of the solution they tested was such that 20% = 0.8. As you have said, at a C value of about 3.8 we switch from a situation where applying the margin of error in the athlete's favor will put the C into the non-negative range to where applying the margin of error still leaves the C in the forbidden range.

Whatever, from a statistical standpoint, I'd say if the margin of error is calculated from the C values obtained in the certification trial, then it would be applied to the C value of any given test under question. Whether or not this approach is valid is another question. It still seems to me that whatever errors are in C are a result of the separate errors in A and B, and hence any adjustment should be done on A and B. But then I am not an expert in these things.

At any rate, whatever the method, it should be clearly stated so that there is no question about how to apply it and whether any particular results are negative or not.

Larry said...

wschart, great post.

One thing I learned from writing the "Curb Your Anticipation" series is that the setting of an applicable margin of error is a serious and complicated matter. Not for the faint of heart.

I don't know about your statement that the margin of error should be applied to A and B and not C, given that any actual ERRORS will result from mismeasurement of A and B and not from calculation of C. However, my guess is that a lot of lab measurements are themselves the result of numbers of submeasurements behind the scenes. For example, we know that one of the factors that goes into the measurement of A and B at LNDD is the so-called "manual integration" performed by the lab technicians, and that this manual integration has the potential for error.

Are we saying that to properly establish a margin of error for the CIR measurement, the lab should identify all of the steps taken to produce that measurement, assign a margin of error to each step, and then sum the margins for each step to produce a master margin? That's an interesting idea. My sense is that this is not what labs do -- I think labs estimate a method's margin of error based on ultimate method results and not on the potential for error in each step along the way. But I'm far from expert here, and I see the logic in what you're arguing for.

Your speculation has caused me to consider how the LNDD came up with its estimated margin of error, which was to measure the same sample 30 times over a period of 7 months. I wonder, did the lab use the same technician to perform these 30 measurements? Did the lab technician or technicians KNOW that they were measuring this same sample over and over, did they know WHEN they were performing a measurement for purposes of estimating margin of error (as opposed to measuring a real athlete's sample)? Because if I take a measurement of the same thing 30 times to make sure I get it right, chances are pretty good that my measurements are going to come out the same way all 30 times -- and if there's an error, you won't be able to see the error in any variation in my reported measurements.

wschart said...

Larry;

As I reflect on things further, I can see the other side of the coin now, that since even A and B themselves might be results of calculations on other values, that perhaps simply calculating a margin of error for the final results makes sense. On the other hand, there is the principle that the number of significant digits in a final calculation cannot be more than the least number of significant digits in any of the numbers used in the calculation. Not quite the same, admittedly, but the idea is that when you make calculations with numbers that involve errors, the errors tend to compound. But if the LNDD margin was calculated based on final and not intermediate results, that is how it should be applied. I'll leave the final decision on how valid this might be to others more expert in these things.

Another question I would have is what exactly they are trying to get at with this margin of error. Error can arise in the equipment itself, even if the lab techs do everything absolutely correctly. Then there is the error that results because it is impossible for human beings to do everything absolutely perfectly every time. Then there is the error that arises because we as humans sometimes deliberately take shortcuts in procedures, etc. If you just want to show the margin of error for the particular equipment as used by LNDD, then it might make sense to have your best tech do all 30 tests. However, if you want to factor in the various imperfections of the various lab techs (and such imperfections exist for all techs in all labs, so I am not casting aspersions here) then you would want to spread tests over all the techs.

Maybe there are documents that are available to the powers that be that we mortals don't have access to, or simply aren't aware of to clarify these things. But what I see is that WADA, USADA, LNDD, etc., seem to be keeping some things under wraps. This can lead to suspicion that not everything is above board. I, for one, would like a lot more transparency in the process. I don't think that WADA or LNDD has to hide any process with the idea that, if the dopers know how we test they will know how to beat the test. If their methods are so sound in principle and execution, then they should be quite willing to put them out there for all to see.

Thomas A. Fine said...

I remember a discussion at DPF, back when they bothered having such discussions, about how the error should combine. Some were of the opinion that the margin of error probably represented a gaussian distribution to some number of standard deviations. In that case, simply adding the margins of error together is not appropriate, because this would change the number of standard deviations included in the error.

Of course, the margin of error may represent simple maximum possible errors, in which case simple addition is appropriate.

I can't remember if or how this was resolved on DPF.

tom

m said...

The reason you all are having such a confused time applying a 20% margin of error to the IRMS delta delta measurements is that it doesn't typically apply to such measurements. IT WAS A CLERICAL MISTAKE by Cofrac in the accreditation docs - that's all. The proper measure of error was +/- .8 delta units as found or assumed by both CAS and the AAA panel.

The 20% figure was first raised by Suh in the cross examination of Ayotte in the first arbitration. Ayotte said the 20% figure was a mistake, and she would have no idea how to apply it if it did apply. Ayotte stated that the proper measure of uncertainty should be expressed in the units of measure in question, or in this case, delta units.

Before CAS the 20% figure is just part of the argument that Cofrac didn't properly accredit the IRMS method. Landis by my recollection has never claimed that applying the 20% margin of error, whatever that means, eliminates his -6.13 delta violation.

USADA argued that the 20% figure was a mistaken reference to the margin of error for the T/E ratio, e.g. the 4 to 1 ratio. This makes sense. Look up "coefficient of variation" (standard deviation divided by mean) and you will find that it is used as an error measure for ratio-scale measurements like the T/E ratio, and that it is expressed as a percentage.

The .8 delta units is clearly the proper error measure to be applied to the IRMS measurements in this case.


BTW hello to all. Read the CAS decision and agree with its results and basic reasoning.

Larry said...

M -

YOU'RE BACK!!!

Welcome home, buddy. We really missed you. Even me.

If you read my article, you'll see that I said that a 20% margin of error makes no sense in this context. You'll see that I said that it would have been necessary to specify how and where to apply the 20%.

The strange thing, M, is that if you apply the 20% in the only way that makes sense, it perfectly translates into the +/- 0.8 margin of error. In other words, a delta-delta of -3.8 is the largest possible delta-delta to translate into an AAF if you apply a 20% margin of error. How do you explain that, if the 20% was a mistake?

You have to admit, if the 20% was a mistake, it's a peculiar coincidence that 20% is precisely the percentage by which a delta - delta could be off by -0.8 and still be an AAF.

It doesn't seem like an error to me.

Which is why it's so peculiar that USADA and LNDD concluded that 20% WAS an error.

Read my article.

And welcome back.
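
The arithmetic behind the comment above can be checked with a short sketch. It assumes one particular reading of the 20%: that the measured delta - delta is shrunk toward zero by 20% before being compared with -3.0. Nothing in the accreditation documents is known to say the percentage should be applied this way. The function names are made up for the example, and -6.13 is the figure quoted earlier in this thread.

```python
THRESHOLD = -3.0        # AAF criterion: delta - delta below -3.0
ABSOLUTE_MARGIN = 0.8   # margin of error in delta units
RELATIVE_MARGIN = 0.20  # the disputed 20% figure


def adjusted_absolute(measured):
    """Shift a negative measured value toward zero by the absolute margin."""
    return measured + ABSOLUTE_MARGIN


def adjusted_relative(measured):
    """Shrink the measured value toward zero by 20% (an assumed reading)."""
    return measured * (1.0 - RELATIVE_MARGIN)


# Least negative measured value that still reaches -3.0 after each adjustment.
cutoff_absolute = THRESHOLD - ABSOLUTE_MARGIN
cutoff_relative = THRESHOLD / (1.0 - RELATIVE_MARGIN)

print(f"cutoff with +/- 0.8: {cutoff_absolute:.2f}")
print(f"cutoff with 20%:     {cutoff_relative:.2f}")

# The value quoted in the thread, adjusted both ways.
measured = -6.13
print(f"{measured} with +/- 0.8 -> {adjusted_absolute(measured):.2f}")
print(f"{measured} with 20%     -> {adjusted_relative(measured):.2f}")
```

Under this reading the two cutoffs land close together (about -3.8 and -3.75), but only near the threshold. Further from -3.0 a percentage margin and a fixed margin in delta units drift apart.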

Michael said...

So we've run up against the same wall: There are no printed SOPs or technical records [available to us] to verify that the lab performed the test as designed, or if the test is properly designed and fit for purpose. Therefore, the lunacy manifests itself in the CAS decision being supported by WADA rules. Anything can be made up ex-post-facto to fit the desired results. Not fair, not logical, and certainly not supportable outside the WADA realm, but apparently satisfactory to the power brokers in Olympic sports.
Is margin for error important? Absolutely. Is it an intangible concept? No. So why does it take smart people so much time to figure out what they meant? Because they are making it up as they go. Isn't that clear?

m said...

Larry,

That 20% happened to equal .8 delta units at a delta of 3.8 is just coincidence. The validation studies were performed over a range of values, probably on a pooled urine sample with a delta value close to zero. That is, they did not spike the sample with exogenous testosterone. The .8 delta unit was most probably the standard deviation of the mean values over the range of the pooled urine sample.

Michael,

The SOPs and validation study for the error margin were available to Landis and the CAS panel. Yet Landis never argued that he would not have violated the -3 delta standard if a 20% error margin had been used. Therefore we must conclude the error margin was and is irrelevant to the legal and factual issues in this case.

Larry said...

M, that 20% = +/- 0.8 is not a coincidence. It is, at best, a strikingly remarkable coincidence. I mean, what are the odds that COFRAC would have goofed, stuck in a margin of error for a different test at LNDD, and that margin of error actually turns out to equal the margin of error that they meant to include? Now, THAT would be an interesting exercise in statistics.

Then we could combine this odd coincidence with a number of OTHER coincidences in the Landis conviction that defy the odds. Like Mr. LePetit's amazing sample case of GC columns. Not to mention Landis' biochemical ability to concentrate his metabolism of exogenous testosterone into a single metabolite, while still being able to metabolize other substances into the three other metabolites measured by the CIR test. How the heck does he do that?

M, I've been a lawyer long enough to know that there is such a thing as coincidence, and that, er, THINGS happen. Still, the fact that all of these coincidences seem to break in favor of USADA and LNDD is ... well, it's against the odds.

I'd like to find ONE of these coincidences where I could say to myself, no way, this is impossible. But none of these coincidences are impossible. They're all highly unlikely, but none of them are impossible. If they're all true, then man, this is one wacky case. If they're not all true, then hats off to USADA for inventing stories that hug the line of impossibility without ever crossing over.

I hope you'll stick around. I'll admit, there were times when I thought your posts were becoming a bit too personal, but it's more fun here when I can argue against you.

Care to defend the use of blank urine to identify IRMS peaks under TD2003IDCR? I read USADA's defense of that practice, and I thought to myself, M's peak anchor argument was a lot better than this.

m said...

Larry,

Are you seriously claiming that the proper error measure here is 20%?

Landis is not. There was no scientific testimony on his behalf claiming such as far as I can recall. Nor did he attempt to make it a legal issue in the case.

As Ayotte testified there is no certain and meaningful way to apply the 20% error margin to the delta measurements. It doesn't make sense to use such a percentage error measure.

As to the identification issue and the use of the blank urine and mix cal acetate, I think the USADA arguments, and more importantly the new expert testimony, pretty much mirrored my own analysis as to why there was no doubt that the 5A and 5B metabolites were properly identified. Even the Landis expert conceded that there could be no switching of peak order because of the different ramp conditions, so that once you anchored the 5B by the retention time in the IRMS Mix Cal Acetate, there was no doubt about the following peak being the 5A. They nailed that one shut.


If I have time I may post something about the overall decision. I do think the assessment of costs was warranted, and found the skepticism regarding both the Landis fraud allegations and the partisanship of the Landis experts well supported by the record. Under both state and federal law provisions, the facts constituting a fraud allegation must be specifically pleaded and proved, and sanctions can be awarded for failure to do so.

Larry said...

M -

Please understand, I'm still trying to get my hands around all of the new information available on TBV. By no means have I read it all. And to be sure, there is much NEW information. More may be coming.

I acknowledge that USADA has presented evidence that the 20% margin of error included in the COFRAC accreditation was itself an error. To be certain, I've seen no evidence in the case to the contrary.

I don't know what it means that the 20% margin of error, which the evidence says is not the right margin of error, happens to translate logically into a margin of error of +/- 0.8. Should I conclude that the +/- 0.8 margin of error is also an error?

Of course, my preference would be to consult the LNDD's validation studies, but they're not available to me and I don't think they were made available to the Landis team. We know a bit about those validation studies, and based on what we know they're inadequate to determine a margin of error ... but maybe we don't know the whole story and other studies were performed to complete the picture.

In any event, even if the LNDD's margin of error was something other than +/- 0.8, and even if we could prove that the correct margin of error was much higher (high enough to exonerate Landis), it probably would make no difference. Computation of a lab's margin of error is a method validation issue. It's close to impossible for an athlete to argue that a lab's methods were not properly validated. I think that the only lab method validation arguments available to an athlete would be one, that the lab's validation actually provided for a different margin of error, or two, that the method in use at the lab was not actually validated. The Landis team focused on the second approach, as you seem to know.

Arnie Baker argued in the new wiki defense that a 20% margin of error exonerates Landis. I mentioned this in my article. I don't know for certain that the Landis legal team failed to make this argument.

Regarding peak switching and temperature ramps, yes I've also seen the footnote in the USADA brief citing a statement from a Landis witness to the effect that changes in the temperature ramp cannot affect the order in which peaks elute. The statement in question has not itself been posted on TBV. However, we NOW know that as a general matter, changes in temperature ramps CAN change the order in which peaks elute -- no less an authority than Agilent (the manufacturer of the GC machine used at LNDD) says so. See the following statement at Agilent Guide:

"A Warning When Adjusting Temperature Programs: When changing a temperature program, confirmation of peak identities in the new chromatogram is essential. Peak retention orders can shift upon a change in the temperature program (called peak inversions). Peak misidentifications or an apparent loss of a peak (actually co-eluting with another peak) are common results of undetected peak inversions."

Perhaps peak inversion is a general problem, but the specific conditions for the S17 test would have precluded any possible peak inversion. I can't say. Perhaps when the statement of the Landis witness cited by USADA is made available for viewing, we can reconcile the statement with the above warning from Agilent.

If USADA argued your 5B anchor theory, I've missed that so far. Clearly, USADA hung their peak identification argument on reference to the blank urine. That surprised me. You and I actually did a reasonably good job (as it turns out) of exploring the peak identification arguments available to both sides, and I thought you had MUCH better arguments than the one relying on the blank urine.

In theory, blank urine is just as difficult to analyze as an athlete's sample, so I don't understand yet how LNDD would have definitively identified the IRMS peaks in the blank urine. Perhaps the Mongongu study referenced in the CAS proceedings provides the answer, though from the little I know so far about the Mongongu study, it doesn't seem to help.

My preference at this point would be to explore the evidence with you, and not argue about it. There's too much new information for me to want to stake out hard and fast positions that I may not be able to defend later on. And to be honest, if you could prove to me that Landis' AAF was proper and properly upheld, you'd be doing me a favor, as I'd be able to move on with my life.

m said...

Larry,

Just a pointer to Dr Goodman's statement conceding no switching of peaks.

It's at paragraph 92 of his statement (posted on TBV) where he observes that the temperature ramps were different.

"The result of this difference (in ramp times) is that the RT and RRT (but not the order) of each ..target analyte, are not comparable between the two systems."

BTW the LNDD SOPs specifically require these different temperature ramps (the testimony was for better peak separation in the IRMS). There's a very good explanation by one of the USADA witnesses (Matthews, the guy who invented IRMS) about why different temperature ramps are needed for GCMS versus IRMS and why the RTs and RRTs will not match and can't be simply corrected for by some linear add-on of time.

Larry said...

M, thanks for the pointer, and also thanks for avoiding argumentation at this point in an effort to better understand the testimony and the science. Your effort is much appreciated on this end, and I will try to reciprocate.

I have looked briefly at the Goodman and Matthews statements, and will look at them both more closely later on. As we'd expect, they don't agree. Goodman is very critical of LNDD's use of different temperature ramps, though he does clearly state that temperature ramps won't cause peaks to change position. No way I can see to reconcile Goodman's statement with the statement in the Agilent handbook that temperature ramps CAN cause peaks to change position. I will concede that Goodman's statement trumps the statement in the Agilent materials, as the Goodman statement is specific to this case, but I'm left wondering why the experts don't agree with the manufacturer of the equipment.

From what I could see, Matthews justifies the use of different temperature ramps by the need for a bigger push to get the samples to elute through the column in the IRMS test (the column there being connected to a source with above-atmospheric pressure). If he discussed the relative accuracy of the two temperature ramps, I missed it. In fact, the suggestion in the Matthews testimony is that the IRMS temperature ramp is LESS accurate than the GC/MS ramp, because it pushes the sample faster.

I wonder how great a difference it would make to peak elution time, that the column end is attached to a source with below-atmospheric pressure for the GC-MS, and the end is attached to a source with above-atmospheric pressure for the IRMS. After all, we're talking about columns that are very long with very small diameters. I think it would be like my trying to drink a Diet Coke through a 30-foot-long straw. I could suck as hard as I liked on one end (or blow as hard as I wanted on the same end) to imitate above and below atmospheric pressure, and how much difference would it really make? I can't imagine getting any Diet Coke or blowing any bubbles in the Diet Coke, no matter how hard I tried.

Larry said...

To all:

The recently posted material here on TBV contains what purports to be the validation study performed by LNDD to come up with the +/- 0.8 margin of error for the CIR test. I'm not a statistics guy, but from what I understand, the math checks out.

The material is in the Exhibit 26 document, on pages 0451 - 0457.

LNDD computed margin of error based on a statistical concept called "expanded uncertainty", which is referred to in ISL rule 5.4.4.3.2.2. I wrote about this concept in part 8 of the "Curb Your Anticipation" series. The concept is supposed to measure uncertainty by defining a range within which we can expect to find a particular measurement result. The concept is complex (at least to me), but in practical terms, it seems to require LNDD to figure out the margin of error within a single standard deviation and multiply it by 2. So, LNDD computed a delta - delta margin of error of +/- 0.4, multiplied this by two, and came up with +/- 0.8.

The ISL expanded uncertainty is supposed to reflect a level of confidence of 95%, and if you look at the values on LNDD 0451 - 0457, you'll see that roughly 95% of the values shown there fall within the +/- 0.8 margin.

So as I say, the numbers look right.

One odd thing. We've discussed here that the delta - delta measurement used to determine an AAF is essentially the equation A - B = C, where A and B are measures of the isotopic values of different testosterone metabolites in Landis' urine. We've also discussed whether margin of error should be computed with respect to A and B, or with respect to C. The logic behind computing margin of error on A and B is that this is where the actual measurements are taking place that involve a potential for error. In contrast, the computation of C is simple subtraction, and the lab is not going to make an error in subtraction.

From the evidence, we can see that the lab computed margin of error based on the standard deviation in the computation of C. Whether this is the right or the wrong way to do things, I cannot tell you. However, it is INTERESTING to note that the paperwork in LNDD 0451 - 0457 ALSO shows the standard deviation calculation for computation of A and B.

Again, I'm not a statistician. Check out the documents for yourself and come back here to correct me if you think I'm wrong. But it looks like the margin of error for calculation of A and B separately is just about the same as the margin of error for calculation of A - B. The standard deviation calculations for A and B separately are shown on LNDD 0453 and range from 0.25 to 0.41. The standard deviation calculations for C are shown on LNDD 0456 and range from 0.32 to 0.40. The LNDD took the highest standard deviation shown on LNDD 0456 and multiplied it by 2 to get its margin of error of +/- 0.8. Following this logic, the margin of error for computation of each of A and B is ALSO +/- 0.8.

Strange. If A has a margin of error of +/- 0.8 and B has a margin of error of +/- 0.8, wouldn't C HAVE to have a higher margin of error than +/- 0.8?

Anyone out there got an answer?
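
One possible answer, offered as a sketch and not as a reading of what LNDD actually did: the spread of a difference A - B depends on how the errors in A and B move together. The standard formula is Var(A - B) = Var(A) + Var(B) - 2 * rho * SD(A) * SD(B), where rho is the correlation between the two errors. If A and B are measured in the same run and share some of the same error, that shared part cancels in the subtraction. The 0.4 standard deviations and the factor of 2 below follow the figures described in the comment above. The correlation values are purely hypothetical.

```python
import math

COVERAGE_FACTOR = 2.0  # expanded uncertainty as described above: 2 x standard deviation


def sd_of_difference(sd_a, sd_b, correlation):
    """Standard deviation of A - B when the errors in A and B have the
    given correlation (0.0 means independent errors)."""
    variance = sd_a ** 2 + sd_b ** 2 - 2.0 * correlation * sd_a * sd_b
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


def expanded_margin(sd_a, sd_b, correlation):
    """Expanded margin of error for A - B."""
    return COVERAGE_FACTOR * sd_of_difference(sd_a, sd_b, correlation)


sd_a = 0.4  # roughly the top of the range reported for A and B
sd_b = 0.4

# Hypothetical correlations between the errors in A and B.
for rho in (0.0, 0.5, 0.9):
    margin = expanded_margin(sd_a, sd_b, rho)
    print(f"correlation {rho:.1f}: margin for A - B = +/- {margin:.2f}")
```

With independent errors the margin for C would come out near +/- 1.13, which is the intuition behind the question. With a correlation of about 0.5 it comes out at +/- 0.8, the same as for A and B on their own. Whether the errors in the LNDD validation data really are correlated like that is something only the underlying numbers could show.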