Monday, August 25, 2008

Larry's Last Call (?)

Larry sends the following musing, and wonders if this may be his last opus on L'Affaire Landis.



Bourbon and Beer:

Why The Landis Positive Test Results May Be Meaningless

(An Exercise in Hubris by Larry)

Here on TBV, we've spent time and effort trying to explain how Floyd Landis could have tested positive for testosterone doping without having doped with testosterone. We've focused on things that could have caused the positive test: it might have been the beer he drank, or the cortisone he took for his ailing hip, or small but mysterious "blips" we can see in his test results. We've also considered a few different lab mistakes that might have caused the positive test: failure to correctly identify the substances in Landis' urine, or possible contamination of these substances, or overlaps in these substances that might have resulted in the labs improperly measuring two substances at once. Yours truly has recently considered a theory that Landis' cortisone treatments might have impaired his liver function, which might have affected his metabolism of testosterone and thrown off the test results. Each of these theories is impossible to rule out, and impossible to prove.

Moreover, each of these theories assumes that something somehow went wrong with the lab tests.

Until recently, I had not considered a second possibility. (Thanks to Tom Fine for suggesting this possibility to me.) Maybe there was nothing wrong with how the lab conducted these tests. Maybe the test measurements are 100% accurate. And maybe, just maybe, these test results are meaningless. Maybe, just maybe, Landis' system, his metabolism, his biochemistry, were able to NATURALLY produce the results measured by the lab.



To explain this possibility, I need to do a couple of things. First, I need to discuss scientific questions that I do not fully understand. I am a lawyer, not a scientist. Hopefully, this article will stimulate debate among the scientists and scientifically inclined on this site, and my discussion will be corrected and supplemented as necessary.

I also need time to tell this story. It is a story that will look at scientific studies performed on three continents. It is a story that will consider the marvelous and maddening diversity and complexity of human biochemistry, of how the human system refuses to obey simple rules of thumb. In simpler terms: this is a story of bourbon and beer.

I've actually cut this story short. There is more to tell, if people are interested.

An Introduction and Some Background

Let's first discuss the background of the Landis case, and define some terms.

Floyd Landis' problems began after he won stage 17 (S17) of the 2006 Tour de France (TdF). As the winner of S17, Landis submitted a urine sample to TdF doping control. The urine sample was divided into two portions, an "A" sample and a "B" sample. The two samples were delivered to the French anti-doping lab (LNDD) for analysis. LNDD tested the "A" sample immediately, and froze the "B" sample to be tested later if necessary.

LNDD began its analysis by performing general screening tests on the S17 "A" sample. These tests indicated that LNDD needed to perform additional tests to determine whether Landis had doped with exogenous testosterone. Exogenous testosterone is testosterone that comes from a "doping" source like a cream or an injection, in contrast to the endogenous testosterone that is naturally produced by the human body.

LNDD then performed a "T/E" ratio test to check the ratio of testosterone to epitestosterone in Landis' S17 sample. According to LNDD, this ratio exceeded the 4:1 ratio established under the rules of the World Anti-Doping Agency (WADA) as the threshold ratio for a doping offense. LNDD then performed a second test on the S17 sample, called a "carbon isotope ratio" (or CIR) test. We'll discuss the CIR test in some detail in this article. According to LNDD, the CIR test indicated that Landis had doped with exogenous testosterone. Once Landis learned that his S17 "A" sample had tested positive for doping, he exercised his right to require LNDD to test the S17 "B" sample. For the "B" sample, LNDD was required to perform only the "T/E" ratio test and the CIR test. LNDD concluded that its "B" sample testing confirmed the results of its "A" sample testing.

A positive anti-doping test result is sometimes referred to as an adverse analytical finding, or an AAF.

Landis exercised his right under the WADA rules to challenge his AAF in an arbitration proceeding. Prior to the arbitration proceeding, the LNDD performed the CIR test on other urine samples given by Landis during the TdF, and some of these samples also tested positive for exogenous testosterone. The arbitration panel ruled that the LNDD had improperly performed the T/E test, but the panel upheld the AAF on the basis of the CIR test performed on the Landis S17 urine sample.

One further definition: in this paper, we use the term ADA to refer to the various national and international anti-doping agencies. WADA is an ADA, as is the U.S. Anti-Doping Agency (USADA) that prosecuted Landis' AAF case.

Testosterone is Testosterone

Now that we have discussed the background of the Landis case, we can dive into our analysis of why the Landis test results may add up to nothing meaningful. A good place to begin this discussion is with a question: exactly what did the LNDD find when it performed its CIR test?

If you've followed the Landis case, you've probably read that the CIR test "discovered" exogenous testosterone in Landis' urine. These statements are incorrect. The CIR test cannot "discover" exogenous testosterone, for the simple reason that no such discovery is possible. Exogenous testosterone IS testosterone, and testosterone is a natural substance - all human beings naturally produce testosterone. From a chemical standpoint, artificial and natural testosterone are identical. If you could somehow place a molecule of natural testosterone side by side with a molecule of artificial testosterone and examine them both down to the most minute atomic and subatomic level of detail, chances are that the two molecules would be identical in every way. Moreover, even if you could spot a difference between the two molecules, the difference would not indicate whether one molecule is natural and the other is artificial. In simplest terms, testosterone is testosterone, regardless of where it comes from or how it is made.

Testosterone is a good illustration of why doping testing is so difficult to do. For the most part, modern athletes don't dope with artificial substances. They dope with natural substances like EPO, and insulin, and human growth hormone (HGH) ... and testosterone. We consider these practices to be doping, not because the substances are unnatural, but because the doping substances are produced outside of the human body.

To detect doping with a natural but exogenous substance, the ADAs must find some property of the exogenous substance that differs from the natural substance. For some forms of doping (like autologous blood doping), the ADAs have not yet discovered any such property. For a while, the ADAs could not find any such property associated with exogenous testosterone. Instead, the ADAs focused on the amount of testosterone in an athlete's system, reasoning that athletes could not naturally produce endogenous testosterone above a certain level. But this assumption proved to be incorrect in a lot of cases, and the scientists continued to search for another property that is characteristic of exogenous testosterone.

In the 1990s, the scientists announced that they'd discovered such a characteristic property. The property they discovered is a characteristic feature of all biochemistry, and is one of the differences between bourbon and beer.

Bourbon, Beer, and Carbon-Based Life Forms

Human beings consist primarily of three elements: hydrogen, oxygen and carbon. These three elements have the capability of joining together into a dizzying array of complex molecules, and the processes that sustain life rely on the ability of an organism to create and manipulate these molecules. Testosterone is one of these molecules, made up of 19 carbon atoms, 28 hydrogen atoms and 2 oxygen atoms.

About 99% of the carbon on earth is carbon-12, or C12. C12 has an atomic weight of 12, with an atomic nucleus containing 6 protons and 6 neutrons. However, a small amount of the carbon on earth is C13, with 6 protons and 7 neutrons. You can think of C13 as slightly heavier than C12. C12 and C13 are pretty much interchangeable. Every molecule (like testosterone) that contains carbon might contain C12 carbon or C13 carbon, or both.

Speaking generally, the processes of life prefer C12 over C13. This preference varies, depending on the chemical process in question (as we'll see in a minute). For some chemical processes, the lighter nature of C12 means that it takes less energy to work with a C12 atom than with a C13 atom. For this reason, living things tend to have more C12 and less C13 than non-living things, because life generally prefers chemical processes that require the least amount of energy.

The preference for C12 over C13 is a general rule. Not all biochemical processes prefer C12 to the same extent. A good example of this is photosynthesis, the process used by plants to generate energy from sunlight. Most plants are so-called C3 plants that utilize a form of photosynthesis that strongly prefers C12 over C13. However, a smaller number of plants - including corn - utilize another form of photosynthesis (either C4 or CAM) that does not strongly prefer C12. So, corn will have a bit more C13 in its molecules than will wheat (a C3 plant).

We are what we eat (and drink!). If all we ate and drank was beer (made of C3 plants), we'd have a relatively low amount of C13 compared to C12 in our molecules. If we then gave up beer in favor of bourbon (made primarily from corn), the amount of C13 in our systems would go up. Human biochemistry manufactures testosterone from what we eat. If what we eat is relatively light in C13, then our endogenous testosterone will tend to be light in C13. If what we eat has a relatively large amount of C13, then our endogenous testosterone will also have more C13.

In the 1990s, the ADA scientists considered the C12 and C13 makeup of exogenous testosterone. This testosterone is usually made from soy, and soy is a C3 plant that is light in C13 atoms. Bingo, thought the scientists! On average, exogenous testosterone should have fewer C13 atoms than endogenous testosterone. If a person is doping with exogenous testosterone, the C13 content in the person's testosterone should decrease - and this decrease could be measured with CIR testing.
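The scientists' reasoning can be pictured as a simple mixing problem. Here is a minimal sketch, assuming that C13 content is expressed on the "delta" scale explained later in this article (more negative means less C13) and that the two sources blend linearly. The function name and both delta values are hypothetical, chosen only for illustration; they are not measurements from any study.

```python
def mixed_delta(endogenous_delta, exogenous_delta, exogenous_fraction):
    """Linear two-source mixing of delta values (a simplifying assumption)."""
    if not 0.0 <= exogenous_fraction <= 1.0:
        raise ValueError("exogenous_fraction must be between 0 and 1")
    return ((1.0 - exogenous_fraction) * endogenous_delta
            + exogenous_fraction * exogenous_delta)


# Hypothetical values for illustration only.
ENDOGENOUS = -23.0  # assumed delta of naturally produced testosterone
EXOGENOUS = -29.0   # assumed delta of soy-derived testosterone

for fraction in (0.0, 0.25, 0.5, 1.0):
    # The larger the exogenous share, the more negative the blended value.
    print(fraction, round(mixed_delta(ENDOGENOUS, EXOGENOUS, fraction), 2))
```

The point of the sketch is only the direction of the effect: any exogenous share pulls the blended value toward the C13-light source.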

A new anti-doping test was born: the CIR test for exogenous testosterone.

In order to function as an effective anti-doping test, the CIR test must be able to measure very small differences in C12 and C13 content. It is an AAF under the WADA rules if the CIR test for testosterone measures 0.3% less C13 than would be expected. Given that C13 is only about 1% of the carbon on earth, and that the test is supposed to be 95% accurate, that means that the test must be accurate to about 1 carbon atom in 650,000 (per my rough and inexpert calculations). Supposedly, the CIR tests ARE this accurate, if performed correctly. But this statistic illustrates why it's impossible to "discover" exogenous testosterone. At best, the CIR test might enable the scientists to "discover" an exceedingly small difference between populations of exogenous and endogenous testosterone molecules.
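For readers who want to see where a figure of that order could come from, here is one possible back-of-the-envelope version. It is a sketch under stated assumptions, not necessarily the calculation behind the 650,000 figure: it treats "95% accurate" as a 5% error allowance on the 0.3% difference, and it uses the rounded 1% share for C13.

```python
C13_SHARE = 0.01        # about 1% of carbon is C13 (from the text)
RELATIVE_DROP = 0.003   # 0.3% less C13 than expected (from the text)
ERROR_ALLOWANCE = 0.05  # assumption: "95% accurate" read as a 5% error budget

# Fraction of ALL carbon atoms that the 0.3% difference represents.
atoms_shifted = C13_SHARE * RELATIVE_DROP

# Precision needed to pin that difference down within the error budget.
resolution = atoms_shifted * ERROR_ALLOWANCE

print("difference: about 1 atom in", round(1 / atoms_shifted))
print("needed precision: about 1 atom in", round(1 / resolution))
```

With these rounded inputs the second figure lands in the several-hundred-thousand range, the same order of magnitude as the estimate above.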

More About the CIR Test

As we've learned above, the goal of the CIR test is to measure the C13 content of the testosterone in an athlete's system. If that C13 content is too low, then according to the ADA scientists, the athlete has been doping with exogenous testosterone.

From this description, it may surprise you that the CIR test for exogenous testosterone does not actually look directly at testosterone. For whatever reasons, the test focuses on testosterone metabolites in an athlete's urine. The human body metabolizes (breaks down) testosterone into other substances, and these break-down substances are called "metabolites". Specifically, the LNDD's CIR test (like the test used in other WADA labs) measures the C13 content of 4 testosterone metabolites: androsterone (andro), etiocholanolone (etio), 5a-androstanediol (5aA) and 5b-androstanediol (5bA).

As we've discussed, we are what we eat, so the C13 content of these four metabolites will depend to some extent on the diet of the person whose metabolites we want to measure. Because different people eat differently, the scientists had to design the CIR test in a way that would correct for different diets. For this reason, the CIR tests typically look at two other metabolites: 11-Ketoetiocholanolone (11Keto) and 5b-Pregnanediol (5bP). According to the scientists, the C13 content of these metabolites depends solely on a person's diet and is not affected by exogenous testosterone. These types of metabolites are sometimes called endogenous references, because they are supposed to reflect a person's endogenous delta values without regard to whether the person has taken exogenous substances. So, if the C13 difference between a person's andro or etio and 11Keto is large enough, or if the C13 difference between a person's 5aA or 5bA and 5bP is large enough, then the C13 difference cannot be explained by diet, and can only be caused by something else ... like exogenous testosterone. Or so the theory goes.

The C13 content of a substance is typically stated as a "delta" value. It is usually negative, in the range of -20 to -30 or so. The more negative the delta value, the less C13 has been measured in the substance. The calculations noted above (andro minus 11Keto, etio minus 11Keto, 5aA minus 5bP and 5bA minus 5bP) are sometimes called delta-delta values. The WADA rules provide for an AAF if the delta-delta value for any of these metabolites is more negative than -3. (LNDD added a margin of error of 0.8 to this calculation, so at the LNDD, a delta-delta had to be more negative than -3.8 in order to find an AAF.) LNDD was willing to declare an AAF if the delta-delta value for only one metabolite was more negative than the rules allow; other WADA labs will not declare an AAF unless more than one delta-delta value is too negative. For example, UCLA reportedly requires three delta-delta values to exceed the negative limit before it will declare an AAF.
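To make the arithmetic concrete, here is a minimal sketch of the delta and delta-delta calculations. The delta formula is the usual definition of isotope delta notation (parts per thousand relative to a reference standard); the reference ratio below is a commonly quoted figure that I am assuming, not something taken from the Landis documents. The function names and the sample readings are hypothetical.

```python
VPDB_RATIO = 0.0112372  # assumed C13/C12 ratio of the reference standard


def delta_value(sample_ratio, standard_ratio=VPDB_RATIO):
    """Delta in parts per thousand: how far the sample's C13/C12 ratio
    sits from the reference standard's ratio."""
    return (sample_ratio / standard_ratio - 1.0) * 1000.0


def delta_delta(metabolite_delta, reference_delta):
    """Metabolite delta minus endogenous-reference delta."""
    return metabolite_delta - reference_delta


def exceeds_threshold(dd, limit=-3.0, margin=0.0):
    """True when the delta-delta is more negative than limit minus margin."""
    return dd < limit - margin


# Hypothetical readings: a metabolite at -27.5 and its reference at -24.0.
dd = delta_delta(-27.5, -24.0)
print(dd)                                 # delta-delta value
print(exceeds_threshold(dd))              # against the -3 rule
print(exceeds_threshold(dd, margin=0.8))  # against -3.8 (limit plus LNDD's margin)
```

Note how the same hypothetical reading falls on different sides of the line depending on whether the 0.8 margin is applied.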

The CIR Test Finds Dopers - Sometimes
Enough about the theory behind the CIR test. Let's ask the question: is the CIR test capable of catching dopers? To find out, let's look at a recent study performed by Saugy from the Swiss WADA lab and a host of others, reported in volume 71 of Steroids, pp. 364-70 (Saugy 2006 Study).

In this study, 7 test subjects were given oral testosterone, and the scientists measured the T/E ratios and delta levels for andro and etio. In a number of cases, the measurements came out exactly the way you'd expect, given our discussion of the CIR test. For example, here are the results for one of the subjects:



Figure 1: Subject 2


The top chart shows the T/E ratio for subject S2. As you can see, when the subject was given oral testosterone, the subject's T/E ratio spiked to a high level, from a little less than 1:1 to over 90:1 (note that this is over 8 times as high as the level LNDD said they measured for Landis - this is presumably a very high level of testosterone). The bottom chart shows the delta values for the endogenous reference (a relatively straight line at the top) plus the delta values for andro and etio (the two lines dropping sharply at the time the oral testosterone was administered). This is exactly the kind of result we'd expect to see if the CIR test is a good test.

Unfortunately for the testers, not all the subjects reacted as we might have predicted. For example, look at the results for subject S1:


Figure 2: Subject 1 (yeah, they are in reverse order).

The bottom chart for subject S1 looks pretty much like the bottom chart for subject S2. But compare the top charts for these two subjects! While S2's T/E levels went through the roof, S1's T/E levels scarcely moved at all. It seems like S1 got no benefit whatsoever from his dose of testosterone. The authors of this study noted this result, and hypothesized that S1 might be a person who quickly metabolizes testosterone. In other words, as soon as the testosterone hit his system, S1 metabolized it into other substances, including the andro and etio shown in S1's bottom chart.

But if this is the explanation for S1's results, then how do we explain the following results for S3?

Figure 3: Subject 3

S3's results look like the results we'd expect from someone who skipped the study altogether! S3's T/E ratio barely moved after taking the oral testosterone, and perhaps more surprisingly, S3's delta values for andro and etio were nearly flat as well. What is the story here? Again, the authors of this study tried to explain S3's results as being the product of fast testosterone metabolism. But if S1 rapidly metabolized his testosterone dose into andro and etio, what happened to S3's dose of testosterone, which appears to have disappeared altogether? Did S3 further metabolize the andro and etio into even more basic substances? The study authors do not say.

The lesson to be learned here is an important one: human biochemistry is complicated and diverse. We cannot expect that two people will react to a doping product in the same way. Some people (like subject S2) will react as we might predict, and some (like S1 and S3) will not.

Is it possible that Landis may have an unusual biochemistry, and might also be capable of naturally producing unusual CIR results? Well, to consider this possibility, we have to proceed a bit further through the analysis. After all, we've just looked at a couple of cases where doping subjects produced unusual CIR results. But maybe this is something we'd only expect to see when people are doping. We have not considered whether a person NOT using doping substances might also produce unusual results. To consider this possibility, we have to look at a few more of the scientific studies.

Studies on Non-Doping Populations

Let's take a look at a second study, this one produced by a group including Don Catlin, the ex-head of the UCLA anti-doping lab. This study is reported in volume 47 of Clinical Chemistry (2001) on pages 292-300 (Catlin 2001 Study).

In most ways, the Catlin 2001 Study is typical of published studies on CIR testing for exogenous testosterone. (Don't worry if you have trouble following this description.) The study first looks at a control group of non-doping subjects, and measures a mean average delta-delta reading and standard deviation for the control group. The study then adds three standard deviations to the mean average delta-delta, to come up with a delta-delta threshold that should be beyond what a normal non-doping subject could test at merely by chance. Then the study attempts to find urine samples from people who have taken (or are suspected of having taken) exogenous testosterone, to see if their delta-delta readings are beyond the threshold reading. If so, the scientists conclude that their method is a valid test for exogenous testosterone, and they propose that the drug testing authorities adopt their threshold as the standard for determining whether a doping violation has occurred.

As part of his study, Catlin looked at the delta-delta measurements for 5aA - 5bP and 5bA - 5bP for a population of 74 male UCLA medical students who were NOT taking exogenous testosterone. Here are the results:



              Mean Difference   Standard Deviation   Threshold Test
5aA – 5bP     -2.09             0.68                 -3.99
5bA – 5bP     -1.43             0.63                 -3.47

Table 1: Catlin Data
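The "mean plus three standard deviations" rule described above is easy to sketch. This is my own illustration, using the means and standard deviations printed in Table 1; the function name is hypothetical. Because the delta-delta values are negative, going "beyond" the mean here means subtracting. The plain rule gives figures close to, but not identical with, the Threshold Test column, so the study presumably worked from unrounded data or a slightly different procedure that I have not tried to reproduce.

```python
def three_sd_threshold(mean, sd, n_sd=3):
    """Value lying n_sd standard deviations on the negative side of the mean."""
    return mean - n_sd * sd


# Mean difference and standard deviation as printed in Table 1.
catlin = {
    "5aA - 5bP": (-2.09, 0.68),
    "5bA - 5bP": (-1.43, 0.63),
}

thresholds = {
    name: round(three_sd_threshold(mean, sd), 2)
    for name, (mean, sd) in catlin.items()
}
for name, value in thresholds.items():
    print(name, value)
```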

In compiling their results for the negative control group, Catlin noticed something unusual: the delta-delta for 5aA was significantly more negative than the delta-delta for 5bA. This difference was significant enough to warrant discussion in the study. Catlin and his group put forth two possible reasons why 5aA might naturally have a more negative delta reading than 5bA:

* 5bA is thought to be metabolized only by the liver (hepatic metabolism). 5aA may be metabolized both by the liver and outside of the liver (peripheral metabolism).
* 5bA may be produced by metabolism of substances other than testosterone - for example, DHEAS.

But neither of these explanations tells why the delta reading for 5aA would be more negative than the delta reading for 5bA. Why would it matter, for example, that 5bA is metabolized only by the liver, while 5aA can be metabolized peripherally? Clearly, there must be differences between these two kinds of metabolism! Consider our earlier discussion of photosynthesis, where we described how some biochemical processes prefer C12 more strongly than others. If the differences in 5aA and 5bA delta readings can be explained by different pathways of metabolism, then it must be the case that the control group's peripheral metabolism preferred C12 more strongly than did the group's hepatic metabolism.

The Catlin 2001 study also briefly mentions another fact: the control group's average delta readings for 5bP (the endogenous reference) had a considerably less negative delta than either the 5aA or the 5bA. Three different metabolites, three significantly different delta readings, all of which were produced naturally and without doping.

This is a highly important piece of information to keep in mind: human beings can NATURALLY produce substances having different delta values. In other words, it's not the case that only doping can explain differences in the delta readings for various substances found in the human body.

What kind of differences can we expect to see in delta-delta readings for different sets of non-dopers? To answer this question, I've looked at two other studies comparable to the Catlin 2001 study: a study by Ayotte and others (Ayotte 2001 Study) contained in the exhibit package for the arbitration at GDC 0024, and a study by Saugy and others (Saugy 2004 Study) reported in volume 28 of the Journal of Analytical Toxicology (September 2004). The Ayotte 2001 Study looked at delta-delta values for 78 people described only as "mixed athletes" from different nationalities. The control group for the Saugy 2004 Study was a group described only as "40 male caucasian subjects." Unfortunately, neither study measured the standard deviation for the delta-delta measurement, but only the standard deviation for the separate delta components. The results are shown below, along with the results of the Catlin 2001 Study.


Study         Delta - Delta   Mean Difference   Standard Deviation
Catlin 2001   5aA – 5bP       -2.09             0.68
Saugy 2004    5aA – 5bP       -0.3              1.00
Catlin 2001   5bA – 5bP       -1.43             0.92
Saugy 2004    5bA – 5bP       -0.9              1.15
Saugy 2004    Andro – 5bP     0.1               1.23
Ayotte 2001   Andro – 5bP     1.5               1.6
Saugy 2004    Etio – 5bP      -1.1              0.83
Ayotte 2001   Etio – 5bP      1.6               1.3

Table 2: Summary of Studies

The standard deviations for some of these measurements are uncomfortably high - note in particular the Ayotte 2001 standard deviation of 1.6 for Andro. This means that the Ayotte study could expect to see swings in the Andro delta measurement of close to 5 points (three standard deviations) that could be caused solely by chance. That's a pretty large swing in delta values! But to my view, even more significant is the difference in the mean average delta-delta values that can be seen in these studies. Pay particular attention to the mean differences measured for 5aA - 5bP (range of about 1.8), 5bA - 5bP (range of about 0.5), Andro - 5bP (range of about 1.4) and Etio - 5bP (range of about 2.7).
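Anyone who wants to check the between-study gaps can recompute them from the means in Table 2. The sketch below uses only the figures as printed in that table, so any transcription slip in the table carries straight through; the variable names are my own.

```python
from collections import defaultdict

# (study, delta-delta pair, mean difference) as printed in Table 2.
table2 = [
    ("Catlin 2001", "5aA - 5bP", -2.09),
    ("Saugy 2004", "5aA - 5bP", -0.3),
    ("Catlin 2001", "5bA - 5bP", -1.43),
    ("Saugy 2004", "5bA - 5bP", -0.9),
    ("Saugy 2004", "Andro - 5bP", 0.1),
    ("Ayotte 2001", "Andro - 5bP", 1.5),
    ("Saugy 2004", "Etio - 5bP", -1.1),
    ("Ayotte 2001", "Etio - 5bP", 1.6),
]

# Group the reported means by delta-delta pair.
means_by_pair = defaultdict(list)
for _study, pair, mean in table2:
    means_by_pair[pair].append(mean)

# Gap between the highest and lowest reported mean for each pair.
gaps = {pair: round(max(vals) - min(vals), 2)
        for pair, vals in means_by_pair.items()}
for pair, gap in gaps.items():
    print(pair, gap)

# Three standard deviations at the Ayotte 2001 Andro figure of 1.6.
andro_swing = round(3 * 1.6, 1)
print("3 SD swing for Andro:", andro_swing)
```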


The measurements in these studies are all over the board! And remember, these are measurements on populations that are presumed to be clean - these differences cannot be explained by doping.

Can the differences be explained by nationality? In the 1997 study by Shackleton and others (Shackleton Study) reported in Steroids volume 62 pp. 379-87 (available in the Landis team document package at GDC 1098 [huge!]), Shackleton compared delta values for twenty individuals of twelve different nationalities, and here's what he found:


Figure 4: Shackleton Data

In this chart, the open circles are delta values for 5bP, the closed diamonds are delta values for 5aA, and the closed rectangles are delta values for 5bA. Again, notice the lack of any discernable pattern in these results! The Indian and three of the Chinese subjects had 5aA more negative than 5bA, as in the Catlin study, but the French, Australian and Turkish subjects had the opposite result. Most nationalities showed 5bP values less negative than 5aA or 5bA, but this is not the case for the English or the French subjects. Moreover, a close look at the chart may cause us to doubt that the variations shown here are truly characteristic of the nationalities in question. For example, the measurements for the French subject do not match the measurements we get from the LNDD (where 5aA has a consistently more negative delta value than the 5bA). Also, where we have multiple measurements for the same nationality, these measurements do not match up. Note, for example, that for the 5 Chinese subjects noted above, three have 5bA delta values more negative than the 5aA values, one goes in the opposite direction, and one appears to be inconclusive.

Once again, the measurements seem to be all over the place, and we have reason to doubt that these measured differences are characteristic of various nationalities.

So, what can we conclude from all this? To be certain, the data displayed above reinforces what we learned from the Catlin study, that human biochemistry is naturally capable of producing substances having different delta values. But where the Catlin study suggested that we'd see rules and patterns in these delta value differences (for example, that 5aA would have a more negative delta value than 5bA), the other studies reveal seemingly random differences in the delta values for various individuals. It appears that human biochemistry is capable of producing a wide range of substances with a wide range of delta values, without any discernable rhyme or reason.

I have purposely held back a study from this analysis. It's a bit of a mind-blower. Let's look at it now.

Delta - Delta: It's Big In Japan

To my knowledge, the most comprehensive study ever performed on the delta-delta readings for a purportedly non-doping population (the Nagano Study) is mentioned almost as an afterthought in a study authored by Ueki and Okano in volume 13 of the journal Rapid Communications in Mass Spectrometry (pages 2237-43, 1999). This study looked at the 5aA, 5bA and 5bP delta readings for over 400 athletes participating in the Nagano Winter Olympic Games in 1998. This study is significant, not only because it is the largest and most diverse study of its type (to my knowledge), but also because it focused on a specific population of international (and presumably elite) athletes. The Nagano Study purports to measure non-doping suspects only.

The Nagano Study will take a bit of explanation, because its findings are reported differently than those in other studies. Set forth below are the details of the Nagano Study that are pertinent to our analysis:



           Mean     Standard Deviation   Range
5aA        -17.5    3.5                  -15.6 to -24.1
5bA        -20.0    2.75                 -15.2 to -26.2
5bP        -21.0    1.65                 -17.2 to -23.8
5aA/5bP    0.86     0.163                0.47 to 1.12
5bA/5bP    0.96     0.098                0.73 to 1.12

Table 3: Nagano Study Data

The first thing to note here is that these results don't look like the results we've seen in some of the other studies. For one thing, the measured delta value for 5bP is the most negative of the three values measured, and contrary to what Catlin saw in his 2001 study, the 5aA is a lot LESS negative than the 5bA. My first reaction to the Nagano Study was "this can't be right!" But the Nagano Study is highly regarded - it is widely cited, including by WADA in its listed references to its technical document for CIR testing. Plus, the study appeared in a peer-reviewed journal. I don't have the "hubris" to suggest that we can ignore this study.

And look what this study has to say about CIR testing! For example, note that the standard deviation for measurements of 5aA in the Nagano Group (a group presumed not to be doping) is 3.5. This is a HUGE standard deviation - it means that we could expect to routinely see delta values for 5aA that are 3.5 more negative than the mean, and that we would have to see a delta value more than 10.5 below the mean before we could conclude that the value was not "natural" (i.e., that it was not a chance occurrence). If we can expect delta values to naturally range up to 10 points from the mean, then that would pretty much blow all CIR testing out of the water. To my knowledge, no lab has ever measured a delta value 10 points more negative than the mean.

Look at the values reported in this study for 5aA/5bP and 5bA/5bP. Unfortunately for us, these are not delta minus delta values, like we've looked at before. These are delta divided by delta values (or delta/delta values). Luckily, a number of the studies (including Catlin 2001) have recommended a delta/delta threshold for doping of 1.1:1.0, so we can consider a 1.1 delta/delta value to be roughly the same as a -3 to -4 delta - delta value. But per the Nagano study, we could not set a delta/delta threshold anywhere near as low as 1.1. The appropriate threshold supported by the Nagano Study (mean plus three times the standard deviation) would be 1.37:1.0.

What was Landis' HIGHEST measured delta/delta value? By my calculations, 1.22:1.0. That's barely more than TWO standard deviations above the mean as measured in the Nagano Study.
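Here is a rough sketch of that comparison, using the rounded Nagano figures from Table 3. I am assuming that the 1.22 value is compared against the 5aA/5bP row; the function and variable names are hypothetical. With the rounded table figures the three-standard-deviation threshold comes out near 1.35, in the same neighborhood as the figure quoted above.

```python
def z_score(value, mean, sd):
    """Number of standard deviations by which value differs from the mean."""
    return (value - mean) / sd


# Nagano Study figures for 5aA/5bP, as printed in Table 3.
NAGANO_MEAN = 0.86
NAGANO_SD = 0.163

# Highest delta/delta value for Landis, per the calculation in the text.
LANDIS_HIGHEST = 1.22

threshold = NAGANO_MEAN + 3 * NAGANO_SD
landis_z = z_score(LANDIS_HIGHEST, NAGANO_MEAN, NAGANO_SD)

print("3 SD threshold:", round(threshold, 2))
print("Landis, in standard deviations above the mean:", round(landis_z, 2))
print("beyond threshold?", LANDIS_HIGHEST > threshold)
```

On these numbers the Landis value sits a little over two standard deviations above the Nagano mean and short of the three-standard-deviation line.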

If the Nagano Study was the WADA guideline, then there would be no AAF against Landis - or probably against anyone else - for doping with exogenous testosterone.

Conclusion: What Do We Make of Nagano?

What do we make of the Nagano Study? Is it possible that the Nagano Study is right, and that the other studies (from Catlin, Saugy, Ayotte and the rest) are all wrong?

To understand what to make of the Nagano Study, we can see what other studies had to say about Nagano. For example, the 2001 Catlin Study politely suggested the possibility that "our analytical method differs from that of Ueki and Okano." But the Catlin study failed to point to any actual difference in analytical methods, and I would argue that mere differences in analytical methods could not possibly explain the differences in results between these two studies. No, there would have to be something WRONG with the analytical method used in the Nagano Study before one could conclude that it would be safe to rely on the results reached in the 2001 Catlin Study.

The 2004 Saugy Study also examined the Nagano Study, and noted that "there are striking differences" between the findings of the Nagano Study and the 2004 Saugy Study. Like Catlin, Saugy nowhere stated that the Nagano Study was wrong. Like Catlin, Saugy suggested that the differences in results might stem from differences in the analytical methods used. But Saugy also noted a second possibility: he pointed out that the athletes at the 1998 Winter Olympics came from different locations with different diets, and Saugy suggested that this "diet heterogeneity" might explain the difference in results.

Saugy's comment here is worth considering. If the Nagano Study is different from all other studies because of "diet heterogeneity", then we'd want to pay special attention to the Nagano Study in considering the CIR testing at an event like the Tour de France. The Tour de France, like the Winter Olympics, attracts elite athletes from around the world. The Tour de France participants will also present the testers with "diet heterogeneity". If "diet heterogeneity" explains the results reached in the Nagano Study, then we'd expect to see similar results in a study performed on riders in the Tour de France ... and we'd have particular reason to doubt the results of any CIR test coming out of the Tour de France.

My own guess is that "diet heterogeneity" is not a full explanation for why the Nagano Study reached different results from the other studies. Since the Nagano Study looked at a large and diverse population of subjects, my guess is that what we're seeing is the product of "human heterogeneity". People are different. They have different biochemistries. These biochemistries are capable of producing testosterone metabolites with differing delta readings, for reasons we do not presently understand. The more diverse the population we study, the wider the variations should be in these delta values.

Moreover, while it's true that the Nagano Study does not agree with the other studies we've examined, it's also the case that these other studies do not agree with each other. Each study we've looked at has reached different conclusions about the delta-delta mean we should expect from a random sample of non-dopers.

The conclusion I reach from all this is not an expert opinion, but I think it is the only logical conclusion. I conclude that the CIR test for exogenous testosterone is based on a false sense of human homogeneity. People come in a wider variety of types than the CIR testers are willing to admit. We don't have uniform delta-delta readings, we possess systems that are naturally capable of producing different delta scores for different substances, and we probably have different biochemical reactions to the same events.

Different strokes for different folks. One size does not fit all.

In short, this non-expert believes that Floyd Landis could have naturally produced the results measured by the LNDD, without need of exogenous testosterone.


14 comments:

Thomas A. Fine said...

Thanks Larry. In case it isn't obvious, I wholeheartedly agree with your conclusions.

Just to clarify, it is well-known that people from different countries with different diets have different baseline d13C measurements. This is the whole point of a delta-delta measurement -- you are comparing each athlete against their own baseline, rather than a common baseline (although there is also an absolute limit of -28 in WADA's CIR test).

Corn-fed Americans have about the highest amount of carbon-13 in the world. Even people who don't like corn here end up eating corn in everything.

This means two things - first, the CIR test for testosterone is more likely to catch a cheating American than it is a cheating European (because the synthetic testosterone is more different from American baseline). True positives for Americans are more likely than for Europeans.

And second, when Americans go to Europe and start eating European diets, their baseline CIR level gradually becomes more negative, which should cause measurements more erratic than those in European riders who are eating the same carbon 13 content that they always eat. This makes false positives more likely for Americans too.

Implicit in all of this are ways to beat the CIR. One, eat a lot of soy, and never ever ever eat any C4 plants, like corn and pineapples (IIRC). Or two, find a source for testosterone that is not made from soy. But of course, the primary way to beat the CIR test is to never take it, which simply means stabilize your T/E ratio by doping with both, so that you'll never fail the screening test.

tom

blackmingo said...

Larry,

As always, it is a pleasure to read your thoughts. I feel the same as you and have dreamt of doing an opus like this (like since about this time: http://www.dailypelotonforums.com/main/index.php?s=&showtopic=4233&view=findpost&p=61324).
However, it would not have ended up as well as you and Berry have put it. Nice work.

WADA should keep these lessons in mind when attempting to validate their diagnostic tests -I think their scientists must be aware of this increased scrutiny and risk serious embarrassment if they continue their erroneous ways.

Best,

Dan

verifythentrust said...

The last call should have been "If Landis lied and doped" which stays the more logical solution.

Maybe he was doped without knowing it, as Virenque alleged too.

The corn story is a good tale that can't explain how Floyd was able to challenge Operation Puerto doped riders, ...

Bob said...

Thanks Larry.

This was a great overview of the whole thing.

On the Nagano study I question why those 400 Olympians would be considered to be non-doping athletes. Olympic athletes have been shown to have used performance enhancing drugs - even the horses have been given drugs this summer.

nahual said...

From all of us at the back of the class, thanks for that concise, articulate synopsis. And you can bet I'm using the following when my wife wonders why I nap:
"For this reason, living things tend to have more C12 and less C13 than non-living things, because life generally prefers chemical processes that require the least amount of energy."

jrdbutcher said...

verifythentrust,
By your logic, we're back to the presumption that (1) if they won, then they doped or (2) if they beat known dopers, then they doped. Both presumptions are BS. If variations on the above presumptions are all you have to back up your theories, then you are out of your depth.

Larry said...

TAF, thanks for the nice words! In case it isn't obvious, I could not have written this without the stuff I've learned from you. It's interesting, one of the things I looked for was evidence that American d13C measurements would be less negative than elsewhere, and I did not find any such evidence. It would stand to reason, given the high corn content in the American diet, but I did not find any support for this in the scientific studies. I've also looked to see if there is evidence of people selling artificial testosterone with less negative d13C (made from corn?), and could not find evidence of this either. I did see someone mention the possibility of adding 5bP and 11-Keto with a highly negative d13C content to artificial testosterone, but I don't know that this is going on, either. Obviously, the best way to beat the testosterone tests is not to use exogenous testosterone within a few days of any possible test.

Dan, thanks! Glad you liked it. I doubt that WADA and the scientists in charge of this work are paying attention to what I have to say, and given my lack of scientific credentials, it's not clear even to me that they SHOULD pay attention. The most and the best I can do is to try to read the studies together and ask "what the hell?" (Then leave it to the smart guys like you to try and answer!)

VTT, I'm not trying to explain how a clean rider could win the 2006 Tour de France, or the 2008 Tour for that matter. But VTT, please note that the anti-Landis camp make two arguments that strike me as mutually inconsistent. The first is that Landis would need to have doped to keep up with doping riders. The second is that Landis must have doped to produce his extraordinary performance in stage 17. These arguments are mutually inconsistent. You can't use "dope" as the reason why one rider is able to keep pace with the others, if "dope" is also the reason why one rider is able to leave the others in the dust.

nahual, THAT's funny! Unfortunately, in the time it would take me to explain C12 preference to my wife, it would be easier to just mow the lawn and take out the trash.

jrd, agreed.

Lloyd said...

Whew, that was a lot of typing. The problem is that, with the system rigged against the riders, none of this really matters. The lab was not held to any standard and the system lacked any checks or balances. There is nothing that occurs in nature that comes close to the conviction rate that WADA has. This in itself is an anomaly. I hope to see Floyd on the bike next year. They messed him up, but they did not break him.

Larry said...

Bob -

Thanks back. Yes, I also questioned whether the 400 athletes tested at Nagano were all non-dopers. The Nagano Study, like a lot of other studies, fails to tell us a great deal about the selection of the negative control group. The chart in the Nagano Study refers to the sample as "non-doping suspects" and the study states that "urine samples which had shown any trace of drug use were not involved in this population."

If you look at the range of delta values reported in the Nagano Study chart, you'll see that the most negative values reported are not ALL that negative. So, instead of suspecting that there may be a significant number of dopers in the Nagano Study population, *I* wonder if the Nagano study actually eliminated non-dopers from the population with relatively negative delta values. It's hard to know.

But consider that the Nagano study has been out there for a while, and has been discussed by Catlin, Saugy and others. You and I have shown far greater willingness to question the Nagano Study than have the WADA scientists (at least publicly).

The clincher for me is that, while the Nagano Study is a bit of a shocker, it's not like the OTHER CIR studies are all producing consistent findings. True, Nagano does not agree with the other studies, but the other studies do not agree with each other.

I see no reason to discount the Nagano Study.

Lloyd -

A lot of typing? LOL! You should read my OTHER opus!

Russ said...

Larry, WOW, you have dazzled me again! Great job.

A couple of comments, first my usual reminder... though you started with a comment - what if the LNDD tests were correct. We have too much hard evidence of their slop to accept that. This could reasonably be seen as another source of variability in the results (maybe they are in the ball park but out in left field).

Corn in the American diet and in the diet of beef, pigs, chicken feed etc, results in a 10:1 typical ratio of omega6:3 fatty acids in the diet (should be < 2:1 is widely accepted, perhaps 1:2 or more). The beef we eat contributes to our C13 content, etc.

Vegan diet should result in quite low c13.

And my favorite observation of things you documented with little comment demands my comment....
it is called "hand waving"!!! Yes, when scientists don't understand something and only offer speculations about it, that is hand waving. Too often we lay people accept these pronouncements as fact because they come from a learned source. Remember it is just BS (as you or someone mentioned at least once)

Again great job!!!

Thanks,
Russ

dailbob said...

Larry,
Absolutely fabulous job. It's completely incomprehensible to me how you find the time to research and write all this, but I'm glad that you do (and I hope it's really not last call, because your keen observation skills, and your ability to communicate them in writing, are truly exceptional)!

For me, your comparison of these study results goes hand-in-hand with, and reinforces, Berry's conclusion that these tests must undergo the same rigorous testing that medical tests undergo to determine false positive/negative rates.

Best

Larry said...

Russ, thanks.

Interestingly (to me, anyway), I started my research as a way of focusing on the quality of the LNDD lab work. I figured that I could contrast the LNDD's reported historic measurements of negative samples with those reported in the scientific studies, to see how things matched up. I was particularly interested in why the delta value for 5aA seemed to test consistently more negative than other metabolites. I thought I might see a pattern that would suggest that LNDD was making systematic mistakes in their measurements, particularly of 5aA.

I'll admit, I was surprised by what I found. The LNDD measurements can't be matched against the results from these studies, because these results are not consistent. So, my focus shifted away from LNDD and to the best understanding I could come up with of these scientific studies.

I've concluded from my review of the studies that there is considerable natural variation in delta-delta scores. You have raised a second possibility, that the variations seen in the study test results stem from flaws in the test methods used in these studies. That is possible, of course. But if the tests ARE this screwed up, then my conclusions about the meaning of these test results would remain unchanged.

My humble opinion: I don't think the tests performed in these studies were screwed up. The tests were performed by top scientists, for peer-reviewed journals. You have to figure that these tests were performed with the greatest possible care. Sure, mistakes are possible ... but I looked at a number of tests and none agrees well with the others. Chances are that at least SOME of these tests were performed well, and that the differences in the delta-delta measurements reflect something meaningful.

Russ, you talk of hand-waving and BS. While I don't blame you for doing so, I purposely tried not to talk like that. I'm not a scientist, and I can't pretend that I understood every word that I read in these studies. As I wrote the piece, I imagined our buddy "M" sitting beside me, criticizing every word I wrote, and asking how I managed to summon the "hubris" to challenge the scientists. So, for the most part, I didn't try to challenge them, I only tried to understand what they were saying and to read the studies together.

Yes, as a result, I came up with different conclusions than those reached in these studies, each of which argued for the validity of the CIR test. But that's only because I don't think it's possible to read these studies together and at the same time agree with the conclusions reached in each study.

And if I'm right, and the WADA scientists underestimate the complexity and diversity of human biochemistry, I don't think that they're alone in this. I think that a lot of scientists have made this same mistake.

Larry said...

d-bob, I write this stuff out of personal interest. This piece pretty much wrote itself.

I think that this is the last question of interest to me in the Landis case. I don't see the point of talking about the WADA rules, because after much soul-searching I've concluded that I don't think they mean anything.

The science continues to interest me, but I'm not a scientist and I honestly don't know what more I can say about the science. I made a kidding reference to "M"'s pronouncement that we all had "hubris" to challenge the opinion of a prominent doping scientist, but actually I'm pretty sensitive when it comes to criticism like that. I'm a reasonably cautious and conservative guy (maybe not conservative in the number of words I use to complete a thought!), and I'd rather leave the science to the people with the training to understand it.

Landis has lost. His AAF is final. There's nothing anyone can do to change that. The ADAs move towards harsher punishments and more aggressive testing, while the athletes have ever more ways to dope without getting caught. It's a very, very bad situation for sport, one that is likely to blow up from time to time with Festina-like scandals. The "loyal opposition" to the ADAs found here on TBV will eventually be replaced with a growing voice to "reform" the ADAs by legalizing (and perhaps regulating) doping in sport. I think that before that day arrives, I will have given up sports fandom in favor of a more rational activity.

In the meantime, I will continue to comment here.

(oh, and if I occasionally end my retirement from posting opuses, in the manner of a blogging Brett Favre, no fair making fun of me!)

Russ said...

Larry,
Your observations seem right on to me.

The reason I keep hammering on LNDD is that, although some of their work may be research grade, we have seen too much evidence pointing the other way.

I would expect the quality of the work supporting research to be a cut more strict than bulk testing of samples. This is required so that the test results can stand open scrutiny, at least among peers 'inside' the tent what with the ADAs. Given the hidden nature of some of the ADAs' research, and their apparent rush to production, I'd suspect any ADA research more than others. Plus any acceptance (LNDD) of low grade work probably creeps into research to some extent. Plus they seem to use bulk test numbers sometimes in their papers.

The production time error margins built into the WADA rules and the ISL also reflect a lower standard for 'production work' vs research/validation work, as I recall. This should be remembered when evaluating Floyd's results against a research grade dataset.

As to hand waving, I work in support of a scientific field and am exposed to right much of it. Basically, it means - more clearly - Ooops, we don't know what that means so let's say something that sounds really impressive and maybe no one will ask about it. Though hopefully some of the ideas offered, in those cases, do have some reality support and sometimes cause follow on research by someone.

I am not quibbling with your conclusions or method. Those seem quite excellent to me, and faithful to your stated approaches to evaluating the information. Very thorough job!!!

I have learned a lot from your efforts, I just try to keep my unique skeptical perspective with everything. There are almost always loose ends worth a think or two.

Russ