Wednesday, November 22, 2006

Absolute Variability Doesn't Matter

Over the last week, there has been discussion in Dr. Baker's presentation and many blogs and forums about the meaning of the different readings given in many of the T/E test results. It feels wrong to have numbers that change a lot. This was one of the points made in our post revisiting the LDP earlier this week.

Marc was so bothered by the issue he went seeking some expert advice, and he files the following exchange in which things were explained to him, slowly. We hope to show readers why the variability argument used doesn't look good to non-partisans, and is probably not the hook on which to hang the defense at hearing.

For examples where this variability has been used, see Slide Show 2.1 pages 13, 14, 15, 16, 17 and 18.

By Correspondent Marc

As readers know, I have been seriously troubled by the lack of uniformity in the absolute results of the LNDD lab tests (the raw numbers the tests produced). We have all been raised believing that one of the crucial criteria of scientific validity was the repeatability of experiments. Did getting different results from the same tests (and, it seemed, the same samples) invalidate the conclusions drawn from those tests?

This seemed so vital an issue for this entire case that I have not only read a great deal in posts and links to the blogs and forums interested in this case, but I've bugged obviously knowledgeable posters to re-explain things to me. And, in the end, I decided to impose on a friend of mine who is a professor of biochemistry at a major American research university. Steroids are not his field, but he has decades of laboratory experience with other complicated biological compounds, with sophisticated instruments, and with managing a large research staff.

Below are the questions I asked him and his answers. I agreed not to identify him for one reason only. This whole case is, as I've discovered to my chagrin, a real tar baby. I didn't want my friend to become tangled with the tar baby as I have been, and if he were identified, people would send him further questions, and he would feel it his responsibility to answer, and there you'd have it: stuck with the tar baby.

In the spirit of Galileo's dialogues, I'll identify myself as “The Dummy,” and my friend as “The Scientist.”

The Dummy: What worries me in general is that no one seems to care that the absolute values that are produced by these tests vary so widely. I recognize that preparing the samples for different tests may produce different absolute values, but don't we want to see the same results every time we perform the same test?

The Scientist: I understand they are using GC/MS to test for the presence of testosterone and epitestosterone, two steroid hormones that have related structures and therefore related properties in the GC/MS analysis. So the premise is that, whatever is done to the samples, these compounds would have related chromatographic and MS properties. So treatments or instrument variation would not affect the relative concentrations. They would only affect the absolute concentrations, and the absolute values do not matter.
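A minimal sketch of the arithmetic behind that premise, in Python. The amounts and recovery factors are invented for illustration and are not taken from any lab document; `te_ratio` is a hypothetical helper. It assumes sample treatment acts as a simple multiplicative loss on each compound.

```python
# Illustrative sketch only: the amounts and recovery factors are invented,
# not taken from any lab document.

def te_ratio(true_t, true_e, recovery_t, recovery_e):
    """T/E ratio as measured when only a fraction of each compound is detected."""
    measured_t = true_t * recovery_t
    measured_e = true_e * recovery_e
    return measured_t / measured_e

TRUE_T = 50.0  # hypothetical testosterone amount, arbitrary units
TRUE_E = 10.0  # hypothetical epitestosterone amount, arbitrary units

# Same loss applied to both compounds: absolute values change, ratio does not.
for recovery in (1.0, 0.5, 0.25):
    print(
        f"recovery={recovery:.2f}  "
        f"T={TRUE_T * recovery:5.1f}  E={TRUE_E * recovery:5.1f}  "
        f"T/E={te_ratio(TRUE_T, TRUE_E, recovery, recovery):.2f}"
    )

# Unequal loss (the case the premise rules out): the ratio moves.
print(f"unequal loss: T/E={te_ratio(TRUE_T, TRUE_E, 0.5, 0.25):.2f}")
```

The common factor cancels in the division, which is why the ratio holds steady while the absolute numbers swing. The last line shows that this only works if both compounds really are affected alike.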

The Dummy: But how can we just write off the absolute results?

The Scientist: I can see that you're hoping that because there are such wide variations in the absolute numbers the whole analysis is crap. Unfortunately, I think the most reliable thing that can be measured is the ratio of two related compounds. That would be the measurement that would be least affected by sample treatment, machine variation, human factors. Since epitestosterone is obviously closely related to testosterone, it is unlikely that anything that is done to the sample would affect one differently from the other. If the ratio remains consistent, then that result should be reliable across treatments, instruments, whatever.

Unless there's a specific reason that the absolute numbers must be important in this case, I think they should be ignored because they are not reliable. The ratio would be most reliable.

If an absolute concentration is needed, the only way to get this accurately would be to add an internal standard to the sample before any treatment is carried out, and then relate everything back to this. I think it is very difficult to get absolute numbers from samples which are treated differently and analyzed by different instruments. If absolute concentrations are important, then this sounds like a big mess.
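The internal-standard idea described above can be sketched roughly as follows. This is a simplified illustration, not the lab's procedure: the function name and all numbers are hypothetical, and it assumes the standard is lost at exactly the same rate as the compound of interest.

```python
# Simplified sketch of internal-standard correction. All values are
# hypothetical, and the standard is assumed to be lost at the same rate
# as the analyte.

def corrected_amount(measured_analyte, measured_standard, added_standard):
    """Estimate the original analyte amount from the recovery of a known standard."""
    if added_standard <= 0 or measured_standard <= 0:
        raise ValueError("standard amounts must be positive")
    recovery = measured_standard / added_standard  # fraction that survived
    return measured_analyte / recovery

ADDED_STANDARD = 100.0  # known amount added before any treatment (assumed)

# Two hypothetical preparations of the same sample with different losses.
runs = [
    {"measured_analyte": 40.0, "measured_standard": 80.0},
    {"measured_analyte": 12.5, "measured_standard": 25.0},
]

for run in runs:
    estimate = corrected_amount(
        run["measured_analyte"], run["measured_standard"], ADDED_STANDARD
    )
    print(f"raw={run['measured_analyte']:5.1f}  corrected={estimate:5.1f}")
```

The raw readings differ widely between the two runs, but relating each back to the standard brings them to the same estimate. If the standard itself behaves differently from the analyte, the correction no longer holds.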

The Dummy: But look, I'm still worried about non-repeatability. Suppose you took a portion of the original sample, prepared it for a specific test, divided that prepared portion into two sub-portions and ran the same test on those two sub-portions, wouldn't you expect to see the same absolute results, barring some major difference in the sensitivity of the instruments doing the measuring?

(I ask the question explicitly because something very like this is what was done with aliquots of the A sample for two of the screening tests --USADA 54 and USADA 57--and two of the confirmation tests--USADA 92 and USADA 212.)

The Scientist: The answer to your questions is maybe. You're thinking abstractly and not real-world.

If you have a mixture of unreactive and stable chemical compounds and the instrument was perfectly calibrated and run identically then you should get identical results, within some limits of error for the detection, from a split sample or a sample run on subsequent days.

Unfortunately, that is not what you have when you're dealing with biological samples and complex instruments. Biological samples are not stable: they degrade, precipitate, react. For example, the steroids are lipids which are not all that soluble in water. They are probably complexed with proteins. Over time this mess will precipitate. The samples are also probably treated to prevent this but who knows how well it works. Also there may be enzymatic or chemical reactions occurring that degrade the samples.

The only thing you can really hope for is that the two related hormones have identical properties so that they will react or change identically under whatever circumstances they encounter--which they appear to do.

As for the instrument, if it was run identically, it should give identical results, again plus or minus some error which is related to the instrument detection. If the results are different, i.e. outside the error limits, then it is not being run identically.

What you have to realize is that the culprits here are the variables themselves.
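One way to picture the "within the error limits" test the Scientist describes is the rough sketch below. The tolerance rule (a relative error applied to the mean of the two readings) and all the numbers are assumptions made for illustration; the actual error limits would come from the instrument and the lab's own procedures.

```python
# Rough sketch: do two readings of a split sample agree within a stated
# error limit? The tolerance rule and numbers are assumptions.

def agree_within_error(reading_a, reading_b, relative_error):
    """True if the readings differ by no more than relative_error of their mean."""
    mean = (reading_a + reading_b) / 2
    return abs(reading_a - reading_b) <= relative_error * abs(mean)

# Hypothetical split-sample readings, checked against a 10% error limit.
print(agree_within_error(100.0, 105.0, 0.10))  # small difference
print(agree_within_error(100.0, 300.0, 0.10))  # three-fold difference
```

By the Scientist's reasoning, a pair that fails a check like this tells you the two runs were not identical in some respect. It does not by itself tell you which variable changed.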

The Dummy: So you don't think that the different absolute results in identical tests on identical--or very similar--samples could be used as evidence of mishandling of the samples? (If it could be made, this would clearly be an important piece of the defense case.)

The Scientist: I cannot tell if the samples were mishandled or what that even means. But I am not sure what could be done to alter the ratio, and if that is the important measurement, then it probably does not matter what they did to the samples other than purposefully changing them to get the result they wanted.

Now if there was fraud involved, then all bets are off.

You're not suggesting fraud, are you?

The Dummy: No. Nor do I know anyone who is.



marc said...

A personal aside on the significance of that discussion I had with my biochemist friend. Assuming my friend is right (BTW: this being science, we make no such assumption--at this very moment busy TbVers are trying to find other experts to confirm or refute his statements), does that mean that the pro-Landis case is toast? No. All it would mean is that the simplest argument about the absolute results of those tests is toast. It would mean: don't waste your time on losing arguments; go look for others. There are always other questions to look at, usually too many. So this discussion is probably a good thing in that it directs us to look elsewhere for a more productive challenge to the lab results. Several other directions have already been proposed on this site and on DPF, and I'm sure that a fair number of people are seriously looking into them as we speak.


Duckstrap said...

Broadly, I would agree with your friend that the absolute numbers are not the critical quantities measured--the TE ratio is. EXCEPT that the absolute numbers have something to tell us about how consistently the assay is performed, and the reasons for the inconsistency. I would also agree that 20 or 30% inter-assay variability might not be that unusual; 3-fold variations in repeat determinations on the same sample would not be expected. The ostensible reason for the A sample to be so high, according to the lab documentation, is apparently an "inhibitor" that interfered with the derivatization of the internal standard. Obviously this inhibitor did not affect everything in the sample to the same extent, which is a key assumption your friend makes. If whatever causes variability in the absolute levels does not affect everything the same, then all bets are off. Since they apparently followed an identical protocol for all of these confirmatory assays, the results should be reasonably similar. I believe you (Marco) have noted that the lab thought they had found evidence of interfering substances in the screening assays as well. Perhaps this is their way of attempting to explain the discrepancy between the screening and confirmatory TE results. Nonetheless, it is a reasonable question to ask how they know it is gone, and how they know it affects T and E in the same way.

Those things said, your friend (and you) are right to think that it would be foolish for Floyd to hang his hat on this one single discrepancy. It is one piece of the puzzle, and I believe there are others that are stronger.

marc said...

Hi duck,

It's a good thing most people who read TbV also read DPF and Rant, and vice-vice-versa, otherwise we'd have to come up with some fusion site to list all the cross-talk.

I'm with you on most of what you say, though I think in one sense I didn't make what my friend said clear. You say, "Obviously this inhibitor did not affect everything in the sample to the same extent, which is a key assumption your friend makes." You're right that the inhibitor (if there was one--or some mishandling of the sample if there wasn't) didn't affect everything the same way, since apparently certain molecules were suppressed completely!

But what my friend was saying was that he'd be surprised if T and E didn't react in the same way to whatever interference there was since they're so similar as molecules. In fact, it looks like he might have been right, since the second screening test (which has no indication of any inhibitor noted) returns a very similar T/E ratio to the inhibited screening test (5.1 vs 4.9) even though certain steroids in the uninhibited (?) test are reporting at 10x their values on the inhibited test.

To all of that I should say, "If either test was inhibited," because, like you, I'd like to know what the evidence of inhibition was, and, like you, I'm suspicious that a claim of "inhibition" could be used to cover other sorts of errors.
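For a sense of scale on the 5.1 vs 4.9 comparison, here is a small Python sketch. The two T/E ratios are the ones quoted above; the pair of absolute readings (1.0 and 10.0) is a placeholder standing in for "10x" and is not an actual lab value. Both helper functions are hypothetical.

```python
# Scale comparison: how far apart are the two T/E ratios, versus a 10x
# swing in an absolute reading? The 1.0 / 10.0 pair is a placeholder.

def percent_difference(a, b):
    """Difference between two readings as a percentage of their mean."""
    return abs(a - b) / ((a + b) / 2) * 100

def fold_change(a, b):
    """How many times larger the bigger reading is than the smaller one."""
    return max(a, b) / min(a, b)

print(f"T/E 5.1 vs 4.9: {percent_difference(5.1, 4.9):.1f}% apart")
print(f"absolute readings: {fold_change(1.0, 10.0):.0f}-fold apart")
```

The ratios differ by a few percent of their mean while the absolute readings differ by an order of magnitude, which is the pattern one would expect if T and E were affected alike.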

At this stage, though, I'm most interested in seeing you pursue some of the other tracks you've suggested on DPF--why didn't LNDD produce their certification that they identified the right molecules? is it possible some of the later-tested samples were spoiled in earlier preparation stages? And I want to see some other tracks that have been mentioned here and elsewhere pursued too.

Good hunting.


Anonymous said...

ORG here …

I have a basic question about this and other conversations, please correct me if I’m wrong.

As I understand it, you cannot argue the science of the test. You can only argue that the test was bad. Since the ADA believes the science is absolute and you cannot argue it, then the regulations are absolute as well. Any violations of absolute regulations mean Landis walks, no exceptions.

So, if the rule on variability is 20% and Landis’ sample is three-fold, case over. It cannot be used. To drag scientists' opinions about what variability means into the discussion is to open the case to discussing the science behind the test, which you cannot do.

Further, the arbitrators are not scientists. These conversations are going way over their heads. They're lawyers, and the case will be won or lost at their level.

As I see it, the case is going to be about interpreting rules. If contamination is above 5% and Landis’ sample is 7.7%, case over. USADA and Jacobs will then engage in a parsing of the regulation.

If the TE variability is 20% and his is three-fold, case over. USADA and Jacobs will then parse the regulation.

If the LNDD cannot show WRITTEN regulations as to what the criteria for a positive CIR test are, case over. Then USADA and Jacobs will argue a bunch of lawyerly stuff.

It seems we are basing our beliefs and discussions about the science and not the regulations. From a written regulation standpoint, it looks like Landis has a solid case. From the court of public opinion, these arguments about the science will help answer the question as to whether he actually doped but will have little bearing in his arbitration hearing.

Again, where is this argument wrong?

said...

ORG, I think you're suffering a temporary case of something Mrs. TBV accuses me of from time to time, which is being a literalist.

If this were a courtroom, what you describe is probably what would happen. But it's not a courtroom, and the history of doping proceedings is that the arbitrators interpret things in the way most favorable to reaching the conclusion that the athlete was a dirty doper. When a literal reading of a rule helps reach that conclusion, they read it literally. When a literal reading would help the athlete, they don't, and take a "doesn't matter to the result" position. If the science is attacked, that isn't allowed, or doesn't matter; etc.

For a reference on blowing off literal rules, there is a case of an athlete whose sample was stuck in customs for a week, unrefrigerated, and this was deemed not to have mattered to the test result.

There are very few examples of athletes winning. Therefore, there is no choice but to attack every angle, and not to stop when you think you've got a winner with one.