trust but verify: The Winnowing: Meier-Augenstein on Watching the Watchdogs

Wolfram Meier-Augenstein testified for Landis at the AAA hearing. He is an expert in isotope ratio spectrometry, invented a number of the techniques, and is widely published. He also sent us short, direct responses to the topics we suggested, and this longer piece, which we have reformatted. In the cover letter, he writes:

I have attached a slightly amended version of my "who watches the watchdogs" response to the original majority panel decision; a document I stand by to this day. It focuses on the real problem here, non-fit-for-purpose procedures incompetently applied by a lab with no adequate quality control and quality assurance procedures in place. If LNDD would be assessed by a proper accreditation body (such as UKAS) to either ISO-17025 or to GLP I should like to think they would fail.

W. Meier-Augenstein, original version 23 Sep 2007; this amended version: 20 Dec 2008.

Quis custodiet ipsos custodes?

Who watches the watchers? (Juvenal; Satire 6.346–348)

The panel's majority decision document contains misleading information; whether deliberately or merely due to ignorance I wouldn't care to speculate. However, what can be stated with certainty is that the standard operating procedures for gas chromatographic peak matching as employed by LNDD are in stark contrast to every textbook on gas chromatography and to articles by internationally acknowledged pioneers in the field of gas chromatography.

Instead of facing up to the fact that LNDD's methods are fundamentally flawed, the panel majority and its expert/s throw this back at the experts for the defence calling their criticism fundamentally flawed.

On the one hand, two members of the panel brand criticism of non-matching relative retention values “scientifically totally unacceptable and fundamentally flawed” because LNDD uses two GC methods on their GC/MS and GC/C-IRMS systems, respectively. On the other hand, they claim compound peak identification was a feasible and sound thing to do, despite the fact that in addition to using two different GC methods LNDD also used two completely different GC columns of different polarity and, hence, different separation properties. That turns visual comparison of chromatograms into something only a cynic would call a scientifically sound approach to compound peak identification.

Is it really unreasonable to demand that methods applied to two sets of experiments are set up to be as similar as possible in almost all respects (with the exception of 1, maybe 2, parameters) if from the outset the intention is to compare the results / outputs from these two experiments? In order to see cause-and-effect relationships, a laboratory must be sure that their procedures (the independent variable/s) are the only variables having an effect on the dependent variable. They should do this by holding all other variables that might also affect the dependent variable constant or as consistent as possible (“Principle of Identical Treatment”).

It’s a simple and well-known fact of life applicable to every measurement system: you put garbage in, you get garbage out. This applies in an almost extreme way to GC/C-IRMS because there are so many causes for garbage-in, such as sample matrix interference, IRMS non-linearity and compound peak overlap, to name but a few.

However, before even considering these compounding factors we need, no, must know with absolute certainty which compound is represented by which peak since GC/C-IRMS has to destroy a compound by combustion into carbon dioxide (CO2) to determine the compound’s 13C isotopic composition with the degree of accuracy and precision required for natural abundance level isotope analysis.

So, if somebody would like to compare responses from two GC systems where presence of a peak is indicated by an ‘ionization’ detector (system A) while system B uses a ‘carbon’ detector for the same purpose, is it indeed “scientifically totally unacceptable and fundamentally flawed” to ask (or assume) that experimental set-ups of both GC systems are as closely matched as possible, i.e. identical GC column (i.e. identical stationary phase), identical or at least similar carrier gas flow rate, identical or similar temperature programme to ensure one can compare chromatograms even though peak heights, peak areas, and absolute retention times will be different due to the differences in detector characteristics and variations in the GC conditions, respectively?

Both scenarios have the following in common: methods used for data evaluation as well as the experimental design must be fit for purpose, and the latter must match the needs of the former.

If there is a compelling reason to use different temperature programmes, one has to employ more than one Internal Retention Time Standard so Linear Retention Indices (Kovats Indices) can be calculated to compare and determine peak identity. In fact, to anchor N sample peaks one has to use N+1 internal retention time standards.
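As a rough illustration of how bracketing standards anchor a peak, here is a minimal sketch of the linear retention index in its temperature-programmed form (the van den Dool and Kratz interpolation). It assumes n-alkane retention standards, and every retention time in it is made up for illustration; none of the numbers or names come from LNDD's data or documentation.

```python
from bisect import bisect_right


def linear_retention_index(t_peak, standards):
    """Linear retention index of a peak bracketed by two retention standards.

    standards: list of (carbon_number, retention_time_min) tuples, sorted by
    retention time. Carbon numbers assume n-alkane standards (an assumption
    made for this sketch, not something taken from the LNDD documentation).
    """
    times = [t for _, t in standards]
    # Index of the standard that elutes just before (or with) the peak.
    i = bisect_right(times, t_peak) - 1
    if i < 0 or i >= len(standards) - 1:
        raise ValueError("peak is not bracketed by two retention standards")
    n, t_n = standards[i]
    n_next, t_next = standards[i + 1]
    # Linear interpolation between the two bracketing standards.
    return 100.0 * (n + (n_next - n) * (t_peak - t_n) / (t_next - t_n))


# Hypothetical numbers: one compound run under two different temperature programmes.
system_a = [(20, 10.0), (21, 12.0), (22, 14.0)]
system_b = [(20, 8.0), (21, 9.0), (22, 10.0)]

print(linear_retention_index(11.0, system_a))  # 2050.0
print(linear_retention_index(8.5, system_b))   # 2050.0
```

In this idealised example the raw retention times differ between the two runs while the index stays the same. In practice the agreement is only approximate, and it can only be expected when both systems use the same stationary phase.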

If one wishes to use relative retention as proxy for peak identity between two chromatographic systems, chromatographic conditions ought to be closely matched though parameters such as column length and carrier gas flow do not have to be a 100% match.

At this point, attention is drawn to the fact that in USADA’s pre-hearing brief (16 April 2007, #172285 v1; Section IV. IRMS Confirmation of Exogenous Testosterone in Sample #995474; Sub-section D. Description of the IRMS method used by LNDD), which painstakingly describes every step of the IRMS analytical procedure as carried out at LNDD, Points 53 to 58 make no mention whatsoever of how target compound peaks were identified during the IRMS analyses. All target compound peaks are primarily identified through the pre-IRMS compound identification as introduced in Point 39: “39. The LNDD IRMS test consists of three main steps: sample preparation, pre-IRMS compound identification, and IRMS analysis”.

Attention is also drawn to the following quotes from the same aforementioned USADA document. “40. … Next an internal standard (5-alpha-androstanol acetate) is added for a purpose that will be explained below. 41. … The first element of compound identification is the GC "retention time (RT)" and the second one is the molecular fingerprint recorded by the MS, which fragments the molecule into ions. 42. A parameter that is even better than the retention time is the relative retention time (RRT). It relies on the internal standard that was added to each tube during sample preparation. The internal standard has its own characteristic retention time. The relative retention time of any other compound is simply (RT of other compound)/(RT of internal standard)¹. This makes comparisons of retention times easier because it normalizes them”.

In the context of the above, Points 182 to 186 in the panel’s majority decision document make for interesting reading.

Also interesting to note is the fact that LNDD does not record the δ13C-values of the internal standard during 13C isotope analysis by GC/C-IRMS. This is a point that was specifically stressed by USADA’s legal team during cross-examination of WMA. If we accept this argument, then it raises the question: on which basis did LNDD identify the internal standard peak in chromatograms of samples LNDD claimed to be Floyd Landis’, which in some fractions showed more than 4 peaks in the immediate vicinity of the presumed internal standard, with 1 or 2 peaks being as close as 15 seconds?

Why were the panel and the WADA experts not interested in this point? Answer, if one cannot even unambiguously identify one’s chosen internal standard, what confidence can one have in the results that hinge on this knowledge?

¹Incidentally, this statement is scientifically incorrect as many a textbook on gas chromatography and GC training manuals will testify to. What follows is a verbatim quote from Shimadzu’s GC training manual; only bold emphasis and sections underscored and in italics are added. Shimadzu’s GC manual is accessible on the Internet at:
http://www.shimadzu.com/products/lab/ms/tutorial/oh80jt0000007e8m.html
(click on “Click here for details” beneath bullet point ‘Identification Using LRI’; in the new window scroll through the presentation using >> and don’t forget to open the tabs on the right hand side of each screen).

“We call the length of time between injection and position of the target compound peak a retention time [tr]. On the other hand, the time difference between the peak of an unretained compound and a target compound is called the adjusted retention time [tr’]. We call the retention time of a compound that is not retained by the stationary phase the gas hold-up time [t0].

Relative Retention α (r) [IUPAC recommend using r instead of α to avoid confusion with the Separation Factor α]

Since absolute retention times are affected by many operational parameters, retention parameters less dependent on column dimension and analysis conditions may be desired. Such parameters are expressed by the relative relation of adjusted retention time between the standard sample IS and the unknown sample: relative retention and retention index.

Relative Retention α = t’_rs / t’_IS = (t_rs - t_0) / (t_IS - t_0)

The advantageous point of relative retention is that it depends only on the ratio of distribution coefficients and the effects from some parameters, such as column length and carrier gas flow, are basically cancelled out.

However, there are some limitations for relative retention. Measurement of errors will increase for target peaks located far from the reference peak and it is hard to find a relation with a chemical structure.”

Note, that relative retention only makes allowances for different column length and different carrier gas flow in otherwise identical set-ups, i.e. identical stationary phase and identical temperature programme.

Finally, attention is drawn to the fact that the standard sample Mix Cal Acetate used in IRMS does not contain 3 of the 6 target compounds; in fact it does not contain any of the target steroids crucial to the adverse finding in this case, namely androsterone (andro), 5α-androstanediol (5alphadiol) and 5ß-pregnanediol (pdiol). In contrast, Mix Acetate, the standard sample used in GC/MS for pre-IRMS compound identification, does contain all 6 target compounds plus the Internal Standard, yet Mix Cal Acetate contains the same Internal Standard as Mix Acetate (see Point 42 of the USADA pre-hearing brief).

Given all that, and given how important it is to be able to identify all target compound peaks as unambiguously as possible, it is surely a logical assumption to make that LNDD had a tried and tested protocol in place that enabled them to use pre-IRMS compound identification by GC/MS analysis as an anchor for the IRMS compound identification using relative retention as proxy.

However, in the light of Points 182 to 188 of the panel’s majority decision document it would appear LNDD was not doing any such thing “because the chromatographic conditions are different”; … “the thermal ramp {…} is different” (Point 188). This statement is not only in stark contrast with the stated benefits of relative retention as per the USADA document quoted above (“A parameter that is even better than the retention time is the relative retention time (RRT)” because “This makes comparisons of retention times easier because it normalizes them”); it also ignores nearly 50 years of research and development that has made gas chromatography (GC) one of the most widely used techniques in analytical and separation science and resulted in insights into GC theory such as “The advantageous point of relative retention is that it depends only on the ratio of distribution coefficients and the effects from some parameters, such as column length and carrier gas flow, are basically cancelled out”.

Similarly, the majority panel is seemingly also ignorant of why renowned scientists such as E. Kovats, in whose honour the linear retention index has been named the Kovats Index, devised methods and equations for identifying peaks of the same compound when analysed under temperature programmed gas chromatographic conditions.

As an aside, one of the main applications of relative retention and retention indices is to compare chromatographic behaviour of compounds on two GC systems with two different detector systems. Even though the majority panel in Point 182 try to create the impression that a GC/MS and a GC/C-IRMS are instruments “that are not of the same type”, they contradict themselves in Point 184: “With GC/C-IRMS the sample is processed first through the GC, as with GC/MS”. Exactly right, the fact the two GC systems use different detectors does not matter a jot. The only difference in GC terms is the “additional ‘plumbing’” since this adds to the ‘hold-up time’ of the GC/C-IRMS system.

Even this point is conceded and correctly applied (in parts) by the majority panel in Point 185. “The additional time added to the RT of the analyte or standard in the IRMS will always be a constant time, regardless of the individual substances or compounds being measured”. One could not think of a better definition of ‘hold-up time’², i.e. the time it takes an unretained compound to travel through the GC system until it reaches the detector (be that an MS or an IRMS). To drive home this point the majority panel add “an additional 1 minute” to demonstrate how this will change the relative retention time for a compound and an internal standard with GC/MS retention times of 10 min and 5 min, respectively, which in the GC/C-IRMS will now show retention times of 11 min and 6 min, respectively. So, what do they do? They form the ratio of the absolute retention times, i.e. 10 min/5 min, since as per Point 184 “in the case of the MS, the GC is connected directly to the MS and it detects the substance almost instantaneously”, hence making the assumption for the sake of this example that the GC/MS does not suffer from a ‘hold-up time’, i.e. ‘hold-up time’ equals 0 min. So far, so good. In the next step they proceed to form the ratio of the absolute retention times for the case of the IRMS with a ‘hold-up time’ of 1 min (affecting both compound and internal standard in the same way), i.e. 11 min/6 min, which is about 1.83 rather than 2. In doing so they completely ignore a fundamental principle of GC theory, which states that in order to calculate relative retention one forms the ratio of adjusted retention times, i.e. absolute retention time minus ‘hold-up time’, i.e. (11-1) min/(6-1) min, which uncannily is the same as 10 min/5 min.
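The arithmetic of the panel's own example can be checked in a few lines. This is only a sketch: the function name and its parameters are illustrative, and the retention times are the panel's hypothetical figures from Point 185, not measured data.

```python
def relative_retention(t_compound, t_standard, t_holdup=0.0):
    """Relative retention as the ratio of adjusted retention times.

    Adjusted retention time = absolute retention time minus hold-up time.
    All times must be in the same unit (minutes here).
    """
    adjusted_compound = t_compound - t_holdup
    adjusted_standard = t_standard - t_holdup
    if adjusted_standard <= 0:
        raise ValueError("internal standard must elute after the hold-up time")
    return adjusted_compound / adjusted_standard


# GC/MS in the panel's example: hold-up time taken as 0 min.
gcms = relative_retention(10.0, 5.0, t_holdup=0.0)

# GC/C-IRMS with 1 min of extra hold-up time, as the panel computed it:
# a plain ratio of absolute retention times.
irms_absolute_ratio = 11.0 / 6.0

# The same GC/C-IRMS run with the hold-up time subtracted first.
irms_adjusted = relative_retention(11.0, 6.0, t_holdup=1.0)

print(gcms)                           # 2.0
print(round(irms_absolute_ratio, 3))  # 1.833
print(irms_adjusted)                  # 2.0
```

Only the ratio of absolute times shifts when the constant minute is added; the ratio of adjusted times is unchanged.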

² Just in case, here is the official IUPAC definition of hold-up time and hold-up volume.

hold-up volume (time) (in column chromatography), V_M, t_M

The volume of the mobile phase (or the corresponding time) required to elute a component the concentration of which in the stationary phase is negligible compared to that in the mobile phase. In other words, this component is not retained at all by the stationary phase. Thus, the hold-up volume (time) is equal to the retention volume (time) of an unretained compound. The hold-up volume (time) includes any volumes contributed by the sample injector, the detector, and connectors.

t_M = V_M/F_c

In gas chromatography this term is also called the gas hold-up volume (time).

What have we learned thus far? Relative retention is calculated by forming the ratio of adjusted retention times for a given compound and the internal standard; adjusted retention time means absolute retention time minus hold-up time; fact. Relative retention is virtually impervious to changes in column length and carrier gas flow and therefore an ideal means to compare and identify compound peaks recorded on two different GC instruments provided chromatographic conditions are identical; fact. The same objective can be achieved in cases where temperature programming (“thermal ramp”) is different by using the linear retention index method; fact. Thanks to the USADA pre-hearing brief (Point 42) and the decision document (Point 185) we have also learned that neither LNDD nor the majority panel and its experts know how to calculate relative retention correctly.

At face value we have to accept the assertion that LNDD never intended to use relative retention to identify compound peaks in GC/C-IRMS by using the relative retention data from the pre-IRMS compound identification as anchor.

So, if that is not how LNDD identified which compound peak in the IRMS analysis represents which target compound, how did they do it?

Luckily for us, the decision document provides the answer in Point 186: “the lab compares the peaks and the sequence of peaks from the GC/MS and the GC/C-IRMS to identify metabolites and the endogenous reference compounds. Specifically, to identify the substances in question, one would compare the pattern of peak heights and retention times in the GC/C-IRMS chromatograms, anchored by the internal standard with a known RT, with the pattern of peak heights and RTs in the GC/MS chromatogram obtained from the same aliquot of the sample.”

Hang on a minute. How does this work then? Didn’t the majority panel just say in Point 183 “it cannot be expected that the RTs for a GC/MS instrument will correspond with the RTs for the GC/C-IRMS instrument”??? If that is not the mother of all contradictory statements I don’t know what is.

Well, never mind, and I shall not even remind you that “the thermal ramp is different” meaning “the chromatographic conditions are different”. Neither shall I keep harping on the fact that peak heights in GC/MS are a function of ion current, which depends on how easily a compound is ionized and how inclined the molecular ion feels to break up into fragment ions and how many, while in contrast the peak heights in 13C IRMS are proportional to the amount of carbon (in the form of CO2) entering the ion source of the IRMS, and that the peak heights between the two detectors for the same compound are not strongly correlated.

So, where were we? “Peaks and the sequence of peaks from the GC/MS and the GC/C-IRMS” are compared “to identify metabolites and the endogenous reference compounds”.

It is probably petty of me to mention again at this point that it would really help if the standard mixture used for GC/C-IRMS contained all 6 of the target metabolites and endogenous reference compounds one is looking for instead of being 3 crucially important target compounds short! Attention is therefore drawn again to the fact that in contrast to Mix Acetate (as used for pre-IRMS compound identification by GC/MS) the Mix Cal Acetate used for IRMS analysis contains the Internal Standard and only 3 of the target steroids; Androsterone [andro], 5α-Androstanediol [5alphadiol] and 5ß-Pregnanediol [pdiol] are not included.

Just as well then that “the GC column is, of course, the same in both instruments” (Point 188) so there will be no problem for the lab to compare “the peaks and the sequence of peaks from the GC/MS and the GC/C-IRMS to identify metabolites and the endogenous reference compounds. Specifically, to identify the substances in question, one would compare the pattern of peak heights and retention times in the GC/C-IRMS chromatograms, anchored by the internal standard with a known RT, with the pattern of peak heights and RTs in the GC/MS chromatogram obtained from the same aliquot of the sample” (Point 186).

Even if one only intends to compare “the peaks and sequence of peaks” for their relative position in the chromatograms by essentially extrapolating from one GC[/MS] chromatogram to another GC[/C-IRMS] chromatogram, what one should not do is employ a different temperature programme and, in addition, GC columns of completely different polarity in both instruments!

There are plenty of examples in the literature (and GC column application notes) showing that a change in polarity of the stationary phase can lead to such changes in compound retention (times) that compound peaks X and Y (for example RT(X)=15 min and RT(Y)=16 min on column A) will swap places, with compound Y now eluting before compound X (for example RT(X)= 18 min and RT(Y)= 17 min on column B).
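A small sketch makes the swap concrete. It uses the illustrative retention times from the paragraph above (compounds X and Y on columns A and B); the function names are hypothetical and nothing here is taken from LNDD's chromatograms.

```python
def elution_order(retention_times):
    """Compound names sorted by retention time, earliest first."""
    return [name for name, _ in sorted(retention_times.items(), key=lambda kv: kv[1])]


def swapped_pairs(rt_a, rt_b):
    """Pairs of compounds whose elution order differs between two columns."""
    names = sorted(set(rt_a) & set(rt_b))
    swaps = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            # Opposite signs mean the pair elutes in reverse order on the other column.
            if (rt_a[first] - rt_a[second]) * (rt_b[first] - rt_b[second]) < 0:
                swaps.append((first, second))
    return swaps


# Illustrative retention times in minutes, as in the example above.
column_a = {"X": 15.0, "Y": 16.0}
column_b = {"X": 18.0, "Y": 17.0}

print(elution_order(column_a))            # ['X', 'Y']
print(elution_order(column_b))            # ['Y', 'X']
print(swapped_pairs(column_a, column_b))  # [('X', 'Y')]
```

A check like this needs the retention times of known standards on both columns, which is exactly the information that is missing when the standard mixture lacks the compounds in question.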

Clearly the majority panel is aware of this compounding factor, assuring us as they do that “the GC column is, of course, the same in both instruments” (Point 188).

According to LNDD’s documentation (USADA 0124), all samples pertaining to this case analysed on the GC/MS system were analysed using the “6890 GC Method”, which employs an AGILENT 19091s-433 column, 30 m long, with an internal diameter of 0.25 mm and a stationary phase film thickness of 0.25 μm. The manufacturer classifies this column as a non-polar column, with a stationary phase composed of 5% phenyl, 95% methyl-polysiloxane. Rtx-5MS, HP-5MS and DB5-MS are listed as equivalent columns.

According to LNDD’s documentation (USADA 0153), all samples pertaining to this case analysed on the GC/C-IRMS system were analysed using a DB17-MS column, 30 m long, with an internal diameter of 0.25 mm and a stationary phase film thickness of 0.25 μm. The manufacturer classifies this column as a midpolarity column, with a stationary phase composed of (50% phenyl)-methyl-polysiloxane. Rtx-50 and HP-50+ are listed as equivalent columns.

What have we learned here? LNDD have used two different GC columns of significantly different polarity and, hence, compound selectivity; fact. There can be no argument about this since this fact is undeniably documented in the USADA discovery documents. Yet, the majority panel quite unequivocally state in their decision document in Point 188 “the column is, of course, the same in both instruments”. To put it another way, the majority panel and the WADA experts have either not bothered to examine the laboratory documentation of LNDD’s analytical procedures or, if they have, they have very conveniently wiped their memory of this particular fact. Either way, again a crucial piece of evidence that at the very least throws serious doubt on the competence of LNDD has been conveniently overlooked and been withheld from the public.

So, we are back where we started; garbage in, garbage out. LNDD have no way of knowing, let alone unambiguously proving, that the peaks analysed by IRMS are what they claim to be. Is it plausible their peak identification is correct? Perhaps. Can they prove it, and more to the point can they prove that closely neighbouring peaks such as etio and andro, and 5betadiol and 5alphadiol, have not swapped places? They cannot! If one cannot prove that the outcome of an analytical measurement pertains to one particular compound and that compound alone, the result becomes meaningless.

In conclusion, the experts for the defence have highlighted that the methods employed (and as applied) by LNDD are not fit for purpose no matter how you slice it. Yet, the panel (with the exception of Christopher Campbell) saw fit to turn this on its head and say our criticism that target compound peaks have not been properly identified is scientifically totally unacceptable and fundamentally flawed.

It strikes me that in order to justify false positive findings of drug abuse by athletes we have now entered the era of science abuse by the watchdog.

To put it more bluntly, this modern day witch hunt, which as in the days of the inquisition works on the presumption of guilt (until proven innocent), is a tragedy for sport and a travesty of justice.

Wednesday, December 31, 2008
