Friday, August 15, 2008

Highly, highly unlikely?

The "highly, highly unlikely" remark from Ashenden, in Mark Zeigler's story of July 2007, provides the exclamation mark to the "of course he doped" reaction to our recent post about the hematology values. As Tenerifed snipped:

“Going from 15.5 to 16.1 (in hemoglobin) is not that unusual when not competing,” Ashenden said by phone from Australia. “But it is very unusual to see an increase after a hard week of cycling. You'd expect it to be the reverse. You'd expect that to fall in a clean athlete. An increase like this in the midst of the Tour de France would be highly, highly unlikely.

“There's nothing where I could point to one value and say, 'This guy definitely doped.' But it raises red flags for me. I would definitely recommend to anti-doping authorities that an athlete presenting these values should be target-tested for blood doping.”

[boldface from Tenerifed]

We'll point out some things that weren't bolded in his reading of the story.
  • "Not unusual when not competing"
  • "Unusual to see an increase after a hard week of cycling"
  • "Nothing where I could point to one value and say, 'This guy definitely doped.'"

We'll get to the key assumption Ashenden makes in the above in a moment. For now, let's observe that his conclusion isn't shared by all the experts. In particular, it disagrees with a presentation by Saugy, who observed that 25% of riders would see the kind of change seen in the two Landis samples from the 2006 Tour, namely a rise in Hb after 8 days of racing.

Figure 1: Saugy Presentation, page 26.
About 25% of riders (the ones over the diagonal) have rising Hb values.


So, "highly, highly unlikely?"

To review what we understand of the basic theory: when not fatigued, a stressed body (work and altitude) will stimulate natural EPO delivery. This results in more reticulocytes, which a week or so later become hemoglobin-carrying mature red cells. If the stress is short, the reti count may spike high, and be decreasing while the red count comes up. If the stress is long but not too intense, the reti count may stay relatively high, while the red count also rises. Should the stress be long with high intensity, fatigue may preclude production of more reticulocytes, leading to low reti counts and low red cell levels.

Hard Week?

How does this relate to the idea of a "hard week"? This is the assumption built into Ashenden's reasoning: that the first week of the Tour Landis rode was a "hard week" that would be a time of fatigue, leading to a collapse of Hct and reticulocytes.

This seems to us an argument based on conjecture, and not supported by specific data. We don't know what Landis' workload was before the Tour, but it did leave him with a low Hct/Hb and high reticulocyte count. We also understand that strategically, a GC contender would be trying to take it easy the first week, so as not to be spent when the roads turn uphill. This seems like the natural strategy of any GC contender, probably more important for a clean rider than a dirty one. It's also probably more consciously adopted by a known contender (Landis) than by an opportunist (Vande Velde) or a domestique (Millar). That is, Vande Velde, while somewhat protected, probably felt more need than Landis to work to stay highly placed the first week, and Millar was supposed to be doing work.

Is there data to support this theory? That's why we included the stage summary and performance data in the original post. Up until that rest day, Landis had been averaging 3500 kj of work on the full-length stages, putting out 260w when pedaling, and 210w average over the length of the rides start to stop. That is not an intense amount of power. For comparison, Landis was doing 250w while bonked on Stage 16.

Now, let's look at Vande Velde's power data from the 2006 tour, from cyclingPeaks.com:

Stage              time   kj     tss    IF      watts
stage 2            5:45   3970   280    .699    192
stage 3            5:08   3500   256    .707    189
stage 4            5:06   3416   227    .668    186
stage 5            5:31   3510   235    .653    177
stage 6            4:19   3259   245    .753    209
stage 7 itt        1:04   1498   112    1.026   390
stage 8            4:26   3873   304    .828    243
stage 9            3:55   2484   143    .604    175

average (less s7)  4:44   3430   236    0.700   195
stdev (less s7)    0:40   487    61     0.07    23.6
Table 1: Vande Velde data from 2006 tour.

IF = intensity, 1.0 being at aerobic threshold.
Values > 1 start going anaerobic.

TSS = IF^2 * hours * 100;
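As a rough cross-check of Table 1, the sketch below recomputes two of its columns from the others. It assumes the usual definition in which TSS scales with the square of IF, and that kj is simply average watts times elapsed seconds; the function names are ours, for illustration only. Small differences from the table are expected, since the times are rounded to the minute.

```python
def hours(h_mm: str) -> float:
    """Convert an 'h:mm' duration string to decimal hours."""
    h, m = h_mm.split(":")
    return int(h) + int(m) / 60


def tss(intensity_factor: float, duration_hours: float) -> float:
    """Training stress score, assuming TSS = IF squared x hours x 100."""
    return intensity_factor ** 2 * duration_hours * 100


def work_kj(avg_watts: float, duration_hours: float) -> float:
    """Mechanical work in kilojoules: watts x seconds / 1000."""
    return avg_watts * duration_hours * 3600 / 1000


# (time, IF, watts) copied from Table 1
rows = {
    "stage 2": ("5:45", 0.699, 192),
    "stage 7 itt": ("1:04", 1.026, 390),
}

for name, (time, i_f, watts) in rows.items():
    t = hours(time)
    print(name, "tss ~", round(tss(i_f, t)), "kj ~", round(work_kj(watts, t)))
```

Both rows land within a point or two of the tss and kj values in the table.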


Is this hard? The pitiful, fat and 50+ TBV did a century in May that took 5:30, 3233 kj @ 163 watts, TSS 328, IF .772, and felt pretty good the next week, doing three hard, shorter rides.

It is certainly true that no stages in the Tour are "easy", and the workload and pace of the initial stages would kill most of the readership. And it is true that they are nervous for the riders, with lots of mental effort. That does not mean they are particularly physically demanding for the should-be-protected GC-contending team leaders and domestiques not in the break. Vande Velde got out one day, on stage 8, and worked much harder.

By comparison, on stage 3, Mr. "Happy to be in front" Jens Voigt rode off into the distance:
"After a fast start and few early attacks, it was Jensy Voigt (CSC) who made a strong move at km 15 near Strassen, and he was quickly joined by Arrieta (AG2R), Pineau (Bouygues), Laurent (AG2R) and Etxebarria (Euskaltel). This was the right combination and the quintet cruised away as the peloton was in no mood to try and bring them back on the hot, hard and hilly stage. By Bridel, 5 km later, the break already had 1'30 and Voigt was virtual malliot jaune as the big German rouleur was in 47th, 0'36 behind leader Thor Hushovd. "

Voigt's day was 5:12, 5300 kj, at 283 average watts. That is a hard day of racing.

On the same early stages, Landis and Vande Velde did far more comparable efforts.

Stage              FL kj   VV kj   FL watts*   VV watts*
stage 2            3934    3970    195         192
stage 3            3969    3500    222         189
stage 4            3760    3416    209         186
stage 5            3749    3510    196         177
stage 6            3349    3259    223         209
stage 8            3481    3873    227         243
stage 9            2624    2484    203         175

average (less s7)  3552    3430    210         195
stdev (less s7)    467     487     13.35       23.6
Table 2: Landis and Vande Velde, 2006 Tour

* note average watts are computed by different methods and may not be truly comparable.

We see Landis did a whopping 120 kj more work a day (about one can of Coke) than Vande Velde, averaging 15 more watts over the first week -- which makes sense, because he did the same work in less time.
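For anyone who wants to check the summary rows of Table 2, here is a minimal sketch using Python's standard `statistics` module. The per-stage numbers are copied from the table; the variable and function names are ours. We assume the stdev row is a sample standard deviation (n - 1 in the denominator), which is what reproduces the table's figures.

```python
from statistics import mean, stdev

# Per-stage values from Table 2 (stages 2-6, 8, 9; stage 7 ITT excluded)
landis_kj = [3934, 3969, 3760, 3749, 3349, 3481, 2624]
vdv_kj = [3970, 3500, 3416, 3510, 3259, 3873, 2484]
landis_watts = [195, 222, 209, 196, 223, 227, 203]
vdv_watts = [192, 189, 186, 177, 209, 243, 175]


def summarize(values):
    """Return (mean, sample standard deviation) for one column."""
    return mean(values), stdev(values)


for label, column in [
    ("FL kj", landis_kj),
    ("VV kj", vdv_kj),
    ("FL watts", landis_watts),
    ("VV watts", vdv_watts),
]:
    avg, sd = summarize(column)
    print(f"{label}: mean {avg:.1f}, stdev {sd:.2f}")

# Average daily difference between the two riders
print("kj difference:", round(mean(landis_kj) - mean(vdv_kj)))
print("watts difference:", round(mean(landis_watts) - mean(vdv_watts)))
```

The table appears to truncate the watt averages rather than round them, so the printed means may differ from it by a fraction of a watt.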

On stage 10, Voigt went on the break and blew up -- but Voigt's day was 5513 kj @ 281w, compared to Landis' 4377 @ 246w.

We think it reasonable to suggest the first week was not a "hard week" of cycling for Landis. Thus, the premise behind Ashenden's expectation of a drop may not hold -- which means his conclusion may not hold either.

Saugy's data suggests that unless you think 25% is a slim prospect, "highly, highly unlikely" is hyperbole. It is a juicy quote for Zeigler to use, and ammunition for accusers, but it is a conclusion not supported by direct evidence.

Swings of reasonable size

The off-score methodology that Ashenden developed places limits well above and below any of the values seen for Landis, Vande Velde, or Millar. To date, we don't have data on a known-doping rider to see where their off-score lies, but there have been suggestions that the "odd values" that were discussed with Hamilton were pushing the high number very hard.

When people say Landis' values are "very high" or "very low", these relative terms become questionable, because they are well within the range that is considered "normal" by the off-score methodology.
For some idea of "natural variations", there is an anecdotal story at "Can't holder tongue" describing a non-doper's experience with odd reported blood values.


Given the paucity of information in the two Landis data points that are available, Ashenden is right to say (a) there is no smoking gun, and (b) there is reasonable cause to look further.


Further investigation

We note that Landis had blood taken on four occasions during the tour, and nothing came of it. (Why the UCI only reported two sets of hematological data is an interesting question.) We might suspect that he was targeted for additional scrutiny, and that nothing came of it. That doesn't prove anything, as we know the available tests don't reliably detect everything. But the chain of reasoning is becoming very conjectural.

Unfortunately, the UCI, which was running the controls at the time, did not seem to follow up thoroughly and gather enough data to reveal the spiky pattern that Saugy describes as indicative of manipulation -- or, if the UCI did so with the other tests, it found nothing. It is not Landis' responsibility that no other samples were taken.

Conclusion

As far as we know, no one in an official position has ever seriously suggested Landis did anything other than testosterone doping.

As far as we know, no one in an official position has ever made the argument that the micro-dosing of testosterone would have particularly enabled the performance of Stage 17. At best, there is conjecture that microdoses are believed by some racers (Papp) to enhance recovery, but there are no studies to support that -- USADA certainly never entered anything into the record.

We do know there is rumor and conjecture that Landis blood-doped, and a common twist is that the "blood he doped with" was spiked with testosterone from some training cycle. The latter makes little sense, for two reasons. First, blood-doping spins out the plasma that carries testosterone; and second, it doesn't account for the reported exogenous testosterone in some of the other B samples, if one thinks they are reliable.

As with so much else about the case against Landis, the available data seems inconclusive, and one is left with the weight of presumption as the probable determining factor. Looking at the two data points and making an accusation seems, objectively, to be as (in)valid as looking at Stage 17 and saying "I know that is doping."

14 comments:

Thomas A. Fine said...

Uggh. I really had hoped that I had long ago driven a stake through the heart of the ridiculous testosterone-through-blood-doping theory. And here I find my good friend TBV mentioning it, and then doing a poor job of beating the dead horse to make sure it doesn't jump up again and start demanding brains.

Night of the Living Horribly Stupid Theory.

So anyway, the two other really good reasons why this dog won't hunt, and won't even get out of bed, because it is in fact a dead dog, with no handy zombie chemicals nearby are:

1. The half-life of testosterone in the blood is very short, less than an hour, so unless he took testosterone and then immediately drew a blood sample, there would be no testosterone in the blood to reintroduce.

2. Blood doping is not a total blood replacement. You're adding maybe 10% new blood. So whatever testosterone there was in your blood originally (which is probably none, see above), and which was not eliminated in processing (see TBV's description) would only have a 1/10th measurable effect.

This is the most important argument. Normally, doping with T mixes natural and synthetic testosterone, and you get a carbon-13 measurement that's somewhere in between. If someone had a fairly high carbon-13 baseline of -20, then compared to the roughly -30 value for synthetic, doubling your testosterone through doping would produce a result of -25.

So, IF Floyd injected testosterone and immediately drew blood, and by some miracle the testosterone got past the blood processing, then reintroducing that blood for a 10% increase in blood volume could conceivably alter the carbon-13 measurement by around 0.5.
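The arithmetic behind both figures can be sketched as a simple amount-weighted average of carbon-13 values. This is only an illustration under a linear-mixing assumption; the function name is made up for the example, and the -20 and -30 inputs are the round numbers used above, not measurements.

```python
def mixed_delta(components):
    """Amount-weighted average of (relative_amount, delta_13C) pairs.

    Assumes the measured value is a linear mix of its sources.
    """
    total = sum(amount for amount, _ in components)
    return sum(amount * delta for amount, delta in components) / total


# Ordinary doping: equal parts natural (-20) and synthetic (-30) testosterone
doubled = mixed_delta([(1, -20.0), (1, -30.0)])

# Transfusion scenario: 9 parts of the rider's own blood at -20,
# 1 part reintroduced blood carrying the "doubled" mix
transfused = mixed_delta([(9, -20.0), (1, doubled)])

print(doubled, transfused, round(transfused - (-20.0), 2))
```

Under these assumptions the first case gives -25 and the second moves the baseline by about half a unit.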

tom

Tenerifed said...

Some of the cyclists in Figure 1. were doping. Figure 1. is irrelevant.

Larry said...

t-fed, come on. You cited an authority that said that a rise in Hb levels would be highly, highly unlikely. Figure 1 is relevant, as it shows that a rise in Hb levels happens about 25% of the time. That's not unlikely.

Sure, we can guess that some number of the cyclists represented in figure 1 might have been doping. What is your point? That you don't think we should look at the results from the real world? You cannot study the results of any race going back 25 years or more where we'd assume we were dealing with a 100% clean peloton. In fact, we can say with some confidence that a 100% clean peloton is highly, highly unlikely.

Thomas A. Fine said...

tenerifed, don't be ridiculous.

That's a beautiful plot. It's a nice smooth distribution, with a denser core, and gradually falling off evenly in all directions.

Doping would be indicated (ideally) by an isolated clump of values, or at least by a lopsided graph.

tom

Russ said...

Good morning everyone!
Larry, TAF, TBV you guys are on a fantastic roll ! I am dazzled.

Hope SYI and Bill are getting some rest and reloading their arsenals :-).

I saw Ali's post the other day.
Ali - come on in, the water's fine!

Did anyone notice that Phelps had submitted to special intense testing to prove he is clean ? And he is collecting all that gold with only half of the niki suit! Maybe he has an invisible underwater Harley :-)

Going for a ride :-)

Regards,
Russ

Tenerifed said...

Larry -
You are looking for proof Landis did not blood dope. You found evidence in Figure 1. That evidence means nothing if cyclists in Figure 1. were blood doping. You said yourself doping is not unusual in cycling. Increasing Hb caused by doping is not unusual. This real world example does not help your point.

Thomas -
Are you saying there was a major cycling event without much blood doping? Not likely.

wschart said...

All this is not really an attempt to show that Landis didn't blood dope, but rather an attempt to show that the 48.2 figure often referred to as evidence of doping is far from conclusive evidence of doping. It might raise some suspicions, but that is all.

Larry said...

Tenerifed -

wschart, well said! To clarify: I am not looking for proof that Landis did not blood dope. No such thing can be proven. No rider can prove he is clean.

I am saying that there is no proof that Landis blood doped. There is no proof that Landis took EPO, or took any other banned substance to produce the Hb readings from the 2006 TdF. None.

The absence of proof of doping is not, of course, proof of no doping. But there is no such thing as proof of no doping. The absence of proof of doping is the only evidence available to a clean rider to show that he is riding clean.

You have cited authority to the effect that Landis' 2006 blood readings were highly, highly unlikely. TBV has shown you proof on figure 1 that this is not so -- and the proof is from a WADA lab, not from the Landis team. As TAF points out, the Landis values fall squarely within the expected statistical distribution of values shown on figure 1. If you add a margin of error to the Landis readings (and based on what we've seen from the ACE and UCI testing, there is SOME margin of error in this testing), I would argue that the Landis readings are not unlikely at all.

TBV and I have both stated that we'd have no problem with riders shown above the figure 1 diagonal line being subject to further testing, but I'd expect a very high rate of negatives if this group actually was tested further. I've pointed out that about 5% of the riders targeted for additional testing at this year's TdF had reported AAFs, meaning that 95% of these targeted riders tested negative. Obviously, I have no way of knowing whether the riders above the figure 1 diagonal line would test positive and negative at this same rate. I'm pointing out the 5% - 95% split to show the nature and extent of the "suspicion" that is appropriate when a rider is targeted for further testing.

Again, remember that 50+ riders were targeted in this year's Tour de France, because of some "suspicion" on the part of the race authorities. These "suspicious" riders included most of the brightest lights in the Tour, such as the Schleck brothers and Carlos Sastre. Even Garmin-Chipotle riders like David Millar were reportedly targeted. None of these riders tested positive for blood doping or EPO. There is no lingering "suspicion" of these riders, nor should there be. Similarly, Landis was tested 8 times during the 2006 TdF, and never tested positive for blood doping or EPO. On the topic of blood doping and EPO, Landis also merits no lingering suspicion.

The bigger point being made by TBV is that you cannot make too big a deal about the Landis blood numbers. There's little if anything that you can conclude from these numbers.

m said...

TBV,

Don't know if your 25% estimate of HB increasing in the Saugy data 7 days after is accurate or not. How was it computed?

But even if it is accurate, Ashenden wasn't speaking about just any increase, but specifically one from 15.5 to 16.1 HB.

"“Going from 15.5 to 16.1 (in hemoglobin) ...is very unusual to see an increase after a hard week of cycling... An increase like this in the midst of the Tour de France would be highly, highly unlikely."

Using those HB figures, you'd find a much smaller percentage in the Saugy data increasing to the same extent as Landis, by my very rough eyeball estimate 5 to 10%. Does this reach the "very unusual" or "highly highly unlikely" standard? Probably. 5% is the usual standard for statistical significance which might correspond to "highly highly unlikely".

But the proper way to do this is to compute a regression line estimate and the standard error of the estimate and then compare Landis's score with that standard error.
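A minimal sketch of that procedure follows, assuming the Figure 1 points were available as paired before/after values (they are not reproduced here). The function names and the tiny placeholder dataset are invented purely so the code runs; they are not Hb measurements.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_error_of_estimate(x, y, slope, intercept):
    """Square root of the residual sum of squares over (n - 2)."""
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (len(x) - 2))


def residual_in_se_units(x0, y0, slope, intercept, se):
    """How many standard errors a single point sits above the fitted line."""
    return (y0 - (intercept + slope * x0)) / se


# Placeholder numbers only, NOT data from the Saugy presentation
placeholder_before = [1.0, 2.0, 3.0, 4.0]
placeholder_after = [1.0, 3.0, 2.0, 4.0]

slope, intercept = fit_line(placeholder_before, placeholder_after)
se = standard_error_of_estimate(placeholder_before, placeholder_after, slope, intercept)
print(slope, intercept, se)
print(residual_in_se_units(2.0, 3.0, slope, intercept, se))
```

With the real points, a rider's pair of readings would be passed to `residual_in_se_units` in place of the placeholder point.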

So my best guess is that the Saugy figures support Ashenden's characterization, and are not inconsistent with it as you are claiming.

Larry said...

M -

It is always good to hear from you.

OK, because you asked, I printed out a copy of figure 1 and started counting dots. Anyone else who wants to do so can do the same.

I count 154 dots, 118 below the line, 4 on the line and 32 above the line. The percentage calculation varies depending on how you count the dots on the line. If you're trying to figure out the percentage of dots above the line, I calculate 20.1%. If you're trying to figure the percentage of dots on and above the line, I calculate 23.4%. I think this is reasonably close to 25%. To be sure, even 20.1% could not fairly be said to be highly highly unlikely.

You then raised the question, maybe it's not the increase in Hb per se that's highly highly unlikely, but the increase of 0.6 that's highly highly unlikely. So I printed out a copy of figure 1, and drew a line parallel to the line shown in figure 1, only 0.6 higher. I count 18 dots on or above the 0.6 line, or about 11.7%.

I don't know if it's fair to group Landis in the 11.7% group or in the 23.4% group. Grouping Landis in the 11.7% group means you're counting Landis only with those who had his "score" or higher. But even if you thought that 11.7% is the right number, I don't think that 11.7% is highly highly unlikely. If 11.7% is highly highly unlikely, then looking at the perspective of an entire year, it's even more highly highly unlikely for a day to fall in the month of August.
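For anyone checking the arithmetic, here is a small sketch that turns the dot counts above into percentages and puts the August comparison next to them. The counts are the ones reported in this comment; the helper name is just for illustration.

```python
def percent(count: int, total: int) -> float:
    """Share of the total, as a percentage."""
    return 100 * count / total


total_dots = 154
on_or_above_diagonal = 4 + 32   # dots on the line plus dots above it
on_or_above_0_6_line = 18       # dots on or above the line drawn 0.6 higher

print(round(percent(on_or_above_diagonal, total_dots), 1))
print(round(percent(on_or_above_0_6_line, total_dots), 1))

# Chance that a randomly chosen day of a 365-day year falls in August
print(round(percent(31, 365), 1))
```

The August share comes out below either of the dot-count percentages, which is the point of the comparison.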

tbv@trustbut.com said...

There's a followup post, in which I counted dots and got slightly different numbers than Larry, but the gist is the same.

TBV

Thomas A. Fine said...

"Are you saying there was a major cycling event without much blood doping? Not likely."

In order for any test to be useful, it has to separate the good from the bad in some way - it has to be able to see them as two distinguishable groups. This is the perfect graph for visualizing that - you would literally see two groups of data points.

But since we don't see anything like that, there's a narrow range of possibilities. Here's some of the points on the curve:

1. delta-Hb might be a fabulous indicator of doping, but the delta value must be much higher than Floyd's numbers, and in this graph, no one doped.

2. A very small number of people doped, and delta-Hb is a poor indicator of blood doping because natural variations are similar to doping indicators.

3. Lots of people doped, but delta-Hb is almost completely uncorrelated to doping.

tom

Larry said...

Tom -

First of all, how well do you understand all of these statistics? I don't know your background. I have a bunch of statistics questions and I need the help of a qualified statistician (not a category in my local yellow pages).

I am slowly working my way through the 2001 Catlin article in Clinical Chemistry, that's supposedly one of the pillars for the CIR testing in WADA world. It seems relatively light on chemistry and very heavy on statistics (I'm ok with that split as I know next to nothing about either subject). If I had to summarize Catlin's methodology in a nutshell, it would be:

1. Find some measurement like C13/C12 that supposedly relates to what you want to test.

2. Try out the measurement on a negative control group to establish a mean and a standard deviation.

3. Move three standard deviations away from the measured mean. This is going to be the bright line you're going to use to separate positive from negative results.

4. Find a positive control group, or a suspected positive control group, and try out the test on the group. If you get a decent percentage of the group to test positive, voila! Your test is ready for prime time.

5. Publish a paper. Accept speaking engagements. Testify against athletes (optional).

Do I understand this more or less correctly?
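Steps 2 and 3 of that outline can be sketched in a few lines. This is a generic mean-plus-or-minus-three-standard-deviations cutoff, not the procedure from the paper itself; the function names and the control values are placeholders invented for the example. The last function shows how rare a three-sigma excursion would be if, and only if, the control population were Gaussian.

```python
import math
from statistics import mean, stdev


def reference_limits(control_values, k=3.0):
    """Return (low, high) limits at mean -/+ k sample standard deviations."""
    m = mean(control_values)
    s = stdev(control_values)
    return m - k * s, m + k * s


def is_flagged(value, limits):
    """True when a measurement falls outside the reference limits."""
    low, high = limits
    return value < low or value > high


def two_sided_tail(k):
    """Fraction of a Gaussian population more than k SDs from its mean."""
    return math.erfc(k / math.sqrt(2))


# Placeholder control group, NOT real measurements
placeholder_controls = [0.0, 1.0, -1.0, 0.5, -0.5]
limits = reference_limits(placeholder_controls)

print(limits)
print(is_flagged(3.0, limits), is_flagged(1.0, limits))
print(two_sided_tail(3.0))
```

With a control group this small, the estimated standard deviation is itself quite uncertain, so the limits should be read as rough.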

By the way, off topic: a while back you took me to task for "assuming" that on average 5aA, 5bA and 5bP readings for a non-doping sample should all be roughly the same value. Hey, I SAID it was an assumption. Turns out the assumption is not true. 5aA appears to have a naturally more negative delta than the other metabolites. Catlin says so in his paper. In fact, his 2001 negative control group had a mean average 5aA - 5bP of -2.1. More on this later as I absorb information.

Thomas A. Fine said...

larry,

That's not a bad outline of how to design tests.

Of course, the smaller the sample in your testing, the less appropriate it is for establishing your distribution. You'll probably not get the wings well-represented, but you might get (un)lucky and find someone with unusual characteristics.

The more important the testing, the larger the study should be to establish the distribution.

Also, this all presumes that the distribution is gaussian. This is a very common approach, but it isn't always valid.

As it all applies to this, I've never found anything to indicate that WADA ever did this stuff in more than an eyeballed way. Go to dailypelotonforums, and search for Maitre or Catlin to find the discussions there on this subject. IIRC, Maitre recommends a higher threshold, and my analysis of Catlin finds "false positives" within their presumed normal dataset.

tom