Friday, September 28, 2007

Retention Times II

I'm having problems editing the earlier post, so this is a continuation in a new one.

Figure 5: IRMS chromatograms stacked; Cal mix, Blank F3, Landis F3

Figure 5 does the same thing we showed in the previous post, theoretically in Figure 3, and practically in Figure 4, which showed how the GCMS peaks were correctly identified. Here we've taken the charts from USADA 361, 345 and 349. We see that the cal mix in this case only contains two peaks that match the urine samples -- the 5aA internal reference, and the first metabolite of interest, the 5bA. These matches are outlined in yellow.

There are no standards in the cal mix for the 5aA or 5bP metabolites, and thus no positive identification under a strict reading of TD2003IDCR. This is possibly what Dr. DeBoer meant in his B-Sample observer's report when he said peaks were not identified to "minimal WADA criteria."

It is also puzzling because LNDD had a cal mix available that contained the other metabolites. They did, after all, use it in the GCMS runs, which leaves no doubt about identification there.

Absent an end-to-end calibration sample match on the peaks of interest, we are left with two options for identification.
  1. Assume the unmatched peaks in the blank are the peaks of interest, unproven since there is no calibration to back up the claim;
  2. Attempt to match the GCMS that was properly identified to the IRMS that wasn't.
As we've been discussing, there seem to be two ways to do the latter. The first relies on the "visual gestalt" of the pattern of peaks. It assumes that the peaks are in the same order ("of course," says the Majority) and that no significant peaks appear or disappear between the two representations. The panel adopted this approach even though nothing in the technical documents supports it as a methodology. The correct thing would have been to use a cal mix with the right compounds present.
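In effect, the "visual gestalt" approach pairs peaks by their order of elution. The minimal sketch below makes its two hidden assumptions explicit: the peaks elute in the same order on both instruments, and the number of peaks is the same. It assumes each chromatogram has already been reduced to a list of peak retention times. The function name and all the times are hypothetical and are not LNDD's data.

```python
def pair_by_elution_order(gcms_times, irms_times):
    """Pair the i-th GCMS peak with the i-th IRMS peak.

    Only meaningful if both runs show the same peaks in the same order;
    if the peak counts differ, some peak would map onto nothing.
    """
    if len(gcms_times) != len(irms_times):
        raise ValueError("peak counts differ; order-based pairing is undefined")
    # Sort so that peaks are paired strictly by elution order.
    return list(zip(sorted(gcms_times), sorted(irms_times)))


# Hypothetical peak lists: GCMS times in minutes, IRMS times in seconds.
gcms_peaks = [10.0, 12.0, 13.0, 18.0]
irms_peaks = [900, 1100, 1200, 1700]
print(pair_by_elution_order(gcms_peaks, irms_peaks))
```

The function does not check whether a pairing is chemically correct. It can only verify that the counts agree. A cal mix containing each target compound is what would supply the chemical check.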

The second method is to attempt to work out the math to map the retention times from the GCMS to the IRMS. The Majority says this can't be done. Herr Doktor Professor is evidently more comfortable with the math and the physics and tried to do this.

Attempting to map the times requires accounting for all the significant differences, and Meier-Augenstein believed he had.



Figure 5: Shimadzu's catalog of relevant factors



Poking at the "details" button on the Shimadzu "relative retention" page, we see why it seemed like a good thing to try:


Figure 6: Shimadzu explains why relative retention is useful


The advantageous point of relative retention is that it depends only on the ratio of distribution coefficients and the effects from some parameters, such as column length and carrier gas flow, are basically cancelled out.

However, there are some limitations for relative retention. Measurement of errors will increase for target peaks located far from the reference peak.

(emphasis added).
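Both halves of the Shimadzu note can be illustrated with a toy model. Assume the second instrument's times are t' = k*t + c: a uniform stretch k, plus a constant shift c such as an extra transfer delay. The stretch cancels exactly in the ratio. The shift does not cancel: the RRT changes by c*(t_r - t) / (t_r*(k*t_r + c)), where t_r is the reference time, so the error grows with the distance between the peak and the reference. The values of k, c and the peak times below are hypothetical.

```python
def rrt(t_analyte, t_reference):
    """Relative retention time: analyte time divided by reference time."""
    return t_analyte / t_reference


def second_instrument(t, scale, offset):
    """Toy model of another instrument: times stretched by `scale`, then shifted by `offset`."""
    return scale * t + offset


T_REF = 10.0  # hypothetical reference peak, minutes

for t in (11.0, 19.0):  # hypothetical analytes near and far from the reference
    base = rrt(t, T_REF)
    stretched = rrt(second_instrument(t, 1.5, 0.0), second_instrument(T_REF, 1.5, 0.0))
    shifted = rrt(second_instrument(t, 1.5, 2.0), second_instrument(T_REF, 1.5, 2.0))
    print(f"t={t}: RRT {base:.4f}, stretch only {stretched:.4f}, "
          f"stretch + shift {shifted:.4f} ({(shifted / base - 1) * 100:+.1f}%)")
```

With these made-up numbers, the stretch alone leaves both RRTs unchanged. Adding the shift moves the near peak by about 1% and the far peak by about 5.6%.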

Are relative retention times useful for comparing results across instruments, which the Majority denied?

This journal abstract suggests otherwise:
Using experimental RRT data for 126 PBDE congeners from the literature, predictive regression models were built for seven individual GC capillary columns differing in stationary phases. Each model includes four descriptors which included Wiener index, Randic index, polarity parameter, etc., selected by CODESSA. High predictability was obtained. High multiple correlation coefficients R2 indicated that >98.5% (except for stationary phase CP-Sil 19) of the total variation in the predicted RRTs is explained by the fitted models.

Or this one:

Phenolic acids and related compounds were separated by gas chromatography using three separate columns. One of these columns was coupled to a Fourier transform infrared spectrometer. The trimethylsilyl derivatives could be separated and identified by comparing the relative retention times of the three different columns. However, where there was overlap, the accompanying infrared data clearly distinguished between the questionable derivatives, thus enabling characterization of all derivatives.

This one seems particularly apropos:
Since retention times vary with the column length, type of stationary phase and temperature, suitable parameters for comparison include relative retention times and so-called retention indices (RI). Relative retention times are simply the ratios of analyte times to the time of a chosen standard compound.

On tolerances for matches, this EPA laboratory document says:

9.8 Qualitative Confirmation- The relative retention time of methyl parathion in samples is tracked by comparing the relative retention time (rrt) of sample to standard within ±3 standard deviations. In addition, a second, dissimilar GC column is used to determine if methyl parathion and surrogate were positively identified by both rrts.
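A "±3 standard deviation" tracking rule like the EPA's can be sketched in a few lines. This version assumes the standard's RRT has been measured in several replicate runs. The replicate values and the sample RRTs are hypothetical. The EPA document's second step, confirmation on a dissimilar column, is not modelled here.

```python
from statistics import mean, stdev


def rrt_within_k_sd(sample_rrt, standard_rrts, k=3.0):
    """True if sample_rrt lies within mean +/- k*SD of replicate standard RRTs."""
    mu = mean(standard_rrts)
    sd = stdev(standard_rrts)  # sample standard deviation; needs at least 2 values
    return abs(sample_rrt - mu) <= k * sd


standards = [1.250, 1.252, 1.249, 1.251]  # hypothetical replicate RRTs of the standard
print(rrt_within_k_sd(1.253, standards))  # True
print(rrt_within_k_sd(1.270, standards))  # False
```

Unlike a fixed 1% window, the width of this window depends on how reproducible the standard's replicate runs are.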

The DEA has used relative retention to identify opiates and steroids:

Gas Chromatography (GC): The chromatogram is not shown. Methandrostenolone, dehydrochlormethyltestosterone, and stanozolol eluted at 7.96, 9.32, and 10.29 minutes, respectively. The peak shape for stanozolol was broad in comparison to the other steroids. The mixture in the yellow tablets was not formally quantitated, but was estimated as roughly 100 : 5 : 2.5 dehydrochlormethyltestosterone : methandrostenolone : stanozolol. Table 1 lists the relative retention times for cocaine, heroin, and six other steroids with similar chromatography.

Table 1. GC Relative Retention Times.

Drug (GC)                          RRt
Cocaine                            0.580
Mesterolone                        0.813
Testosterone                       0.822
Heroin                             0.829
Methyltestosterone                 0.836
Methandrostenolone                 0.854
Testosterone Acetate               0.882
Fluoxymesterone                    0.990
Dehydrochlormethyltestosterone     1.000
Stanozolol                         1.103
Testosterone Isocaproate           1.188
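As a check that Table 1 is just the simple ratio described earlier, the RRTs can be recomputed from the three elution times quoted in the DEA text, with dehydrochlormethyltestosterone as the 1.000 reference. This short sketch does that.

```python
def rrt(t_analyte, t_reference):
    """Relative retention time: analyte time divided by reference time."""
    return t_analyte / t_reference


# Elution times quoted in the DEA text, in minutes.
times = {
    "Methandrostenolone": 7.96,
    "Dehydrochlormethyltestosterone": 9.32,  # the RRt = 1.000 reference
    "Stanozolol": 10.29,
}
ref = times["Dehydrochlormethyltestosterone"]
for drug, t in times.items():
    print(f"{drug}: {rrt(t, ref):.3f}")
```

The recomputed values (0.854, 1.000 and 1.104) agree with Table 1 to within 0.001. The small stanozolol difference presumably comes from rounding in the quoted minutes.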

It therefore appears that the assertion by the Majority in paragraph 188 that use of relative retention times to provide comparisons across instruments is "unsound and without any reasonable scientific basis" may itself be categorized exactly the same way. It suggests the Panel did not understand the science, or chose not to understand the science, and/or was misinformed by the "independent expert" or by Brenna's testimony.

It should be noted that an even more accurate methodology is the use of "Kovats indexes", but those are not appropriate for the chromatograms at issue here. Their use requires a pair of internal standards, one at the beginning, and another at the end, after the target peaks. Since LNDD has not provided a later standard, the only methodology available is the use of relative retention times.
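The sketch below shows why a trailing standard helps. It implements the linear (temperature-programmed) retention index of van den Dool and Kratz, a variant of the Kovats index. The classic isothermal Kovats index interpolates logarithms of adjusted retention times instead. The brackets are usually consecutive n-alkanes, with index 100 times the carbon number. The peak times and the stretch-plus-shift mapping are hypothetical. Because the index interpolates between two anchors, it absorbs both a uniform stretch and a constant shift of the time axis, which a single-reference RRT cannot do. It is exact only while the mapping between instruments stays linear.

```python
def linear_retention_index(t_x, t_before, t_after, i_before, i_after):
    """Linear retention index of a peak bracketed by two reference peaks.

    i_before / i_after are the index values of the references
    (for consecutive n-alkanes, 100 times the carbon number).
    """
    if not t_before < t_x < t_after:
        raise ValueError("analyte must elute between the two references")
    fraction = (t_x - t_before) / (t_after - t_before)
    return i_before + fraction * (i_after - i_before)


def other_instrument(t):
    """Hypothetical second time axis: stretched by 1.5 and shifted by 2.0."""
    return 1.5 * t + 2.0


ri_a = linear_retention_index(15.0, 10.0, 20.0, 2400, 2500)
ri_b = linear_retention_index(other_instrument(15.0), other_instrument(10.0),
                              other_instrument(20.0), 2400, 2500)
print(ri_a, ri_b)  # both 2450.0
```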

How we got here
  1. Brenna, Ayotte and Mongongu testify that relative retention time is how peaks are matched between GCMS and IRMS.
  2. LNDD did not use an IRMS calibration standard to unambiguously identify 5aA and 5bP in samples.
  3. LNDD did not use an IRMS calibration standard with a trailing internal standard that would permit the use of Kovats indexes.
  4. The only quantitative method available to match the GCMS and the IRMS peaks is the relative retention time.
  5. Herr Doktor Professor calculated the RRT matches and found discrepancies of at least 5%.
  6. TD2003IDCR's limit is 1% or 0.2 minutes.
  7. Peak identification appears not to have been done in an ISL-acceptable manner.
  8. Brenna changes his story and testifies that [absolute] retention times can't work, for many reasons.
  9. Brenna says "relative retention times won't work for reasons I discussed this morning"
  10. Brenna says the match is done visually.
  11. Panel accepts "visual gestalt" as an acceptable peak identification technique, and lambastes Meier-Augenstein and Davis for suggesting RRTs are relevant.
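A "1% or 0.2 minutes" rule reduces to a simple window check. The post quotes only "1% or 0.2 minutes". This sketch assumes that the tighter of the two windows applies, and that is an assumption, not a quotation from TD2003IDCR. Whether the rule should be applied to RRTs compared across two instruments is exactly what the Panel and Meier-Augenstein disagreed about, and the sketch does not settle that.

```python
def within_rt_tolerance(t_sample, t_reference, pct=1.0, abs_window=0.2):
    """Return True if t_sample matches t_reference under a '1% or 0.2 min' rule.

    ASSUMPTION: the tighter (smaller) of the two windows applies.
    Both times must be in minutes.
    """
    window = min(t_reference * pct / 100.0, abs_window)
    return abs(t_sample - t_reference) <= window


print(within_rt_tolerance(10.08, 10.0))  # True: window is 0.1 min at 10 min
print(within_rt_tolerance(10.15, 10.0))  # False
print(within_rt_tolerance(30.15, 30.0))  # True: window capped at 0.2 min
```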
Is Herr Doktor Professor's math right?

Over at DPF, a number of people who are not in the group generally associated with sympathy towards Landis' arguments have repeated what Meier-Augenstein did, and obtained identical results, down to the 7.2% mismatch of RRTs on the furthest compounds in samples.

[ still working on the math part to get a clear example ]

30 comments:

DBrower said...

An emailer asks "what's the blank? What's the F3?"

When testing is done, we start with three sets of liquids. The first is the athlete's sample, the second is calibration mixtures in pure solutions (acetate, for instance), and the third is a "pool" of "blank" urine that is "clean" -- from infants in hospitals is one method we've heard to get it.

The blanks sometimes get spiked with control substances, depending on the test.

The samples and blanks are divided into testable units, called "aliquots".

Both sets of urine go through chemical preparation that is intended to separate the contents and remove unwanted substances. Here, we get three different separated "fractions", F1, F2, and F3. These fractions and the calibration mixtures are run through the GCMS and the IRMS separately to get the necessary measurements. Each run through a machine will have output for the cal mix, blanks, and the athlete samples, and there will be runs for each fraction. That's why there are so many pages in the lab document package.

As a matter of terminology, the urine, once turned into fractionated samples, is called the "matrix". When the chemical separation isn't very good, the resulting chromatograms show "matrix interference": lots of overlapping peaks and unknown substances that complicate things.

In this case, everything comes down to measurements conducted on the F3 fraction, which is unique in not having had known substances in the calibration mixtures for two of the three metabolites of interest.

Hope this helps.

TBV

Unknown said...

TBV,
In this particular case, did they spike the blank with synthetic testosterone containing carbon-13? If not, it seems to me that it would be impossible to positively identify it in Floyd's sample, unless you had prior knowledge as to what time it should elute. If they didn't, and they can't positively predict elution time from the GC/MS, it would seem that they really don't have reliable evidence. Is this the essence of the debate?

Mike Solberg said...

Come on, TbV, waiting for you to work out this math is almost as bad as waiting for the decision. I hope it doesn't take as long.

(Just razzin' ya!)

syi

DBrower said...

Dailbob,

They never spike the blank with testosterone, only metabolites, and sometimes they don't spike it except with the internal standard.

The point of using a blank spiked with only the internal standard is apparently to help convince themselves that they are not identifying a known-clean sample as one that is doped. Unfortunately, sometimes LNDD's "blanks" appear to be positive.

TBV

Anonymous said...

David, you should have highlighted the second paragraph of the Shimadzu quote as well. It says the same thing the majority wrote: the RRT method doesn't always work, and the error grows the farther a peak is from the internal standard. That's exactly what happened in Floyd's samples.

The other links refer to different GC columns, which don't apply to different machines. The one link to different types of machines, GCMS and LCMS, doesn't involve comparison across the machines.

The math is easy. Just take the ratios:

RT metabolite/RT IS

Find that ratio for the GCMS run and the IRMS run. Then divide the two to find the percent difference.
I did it for fraction 3 in the A sample. Pregnane will give you the biggest difference since it is farthest from the IS (just like the Shimadzu quote says). I got

GCMS - 19.55/10.7 = 1.79
IRMS - 1640/865 = 1.90

1.79 is 94% of 1.9, a 6% difference.
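A quick way to check the arithmetic is to run the quoted numbers through the same ratios. The units cancel in each RRT, so the GCMS minutes and IRMS seconds do not need converting. With the times as quoted, 19.55/10.7 comes out near 1.83, not 1.79, which gives roughly a 3.6% difference. The 6% figure follows from the rounded ratios 1.79 and 1.90. One of the quoted times may have been transcribed imprecisely, so this sketch checks the method rather than the data. Either figure is above 1%.

```python
def rrt(t_analyte, t_reference):
    """Relative retention time; units cancel, so minutes or seconds both work."""
    return t_analyte / t_reference


def percent_difference(rrt_a, rrt_b):
    """How far rrt_a falls short of rrt_b, as a percentage of rrt_b."""
    return (1.0 - rrt_a / rrt_b) * 100.0


gcms = rrt(19.55, 10.7)  # minutes, as quoted
irms = rrt(1640, 865)    # seconds, as quoted
print(f"GCMS RRT {gcms:.3f}, IRMS RRT {irms:.3f}, "
      f"difference {percent_difference(gcms, irms):.1f}%")
print(f"From the rounded ratios: {percent_difference(1.79, 1.90):.1f}%")
```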

Unknown said...

TBV,
Based on your response and the additional info you just posted:

"Since LNDD has not provided a later standard, the only methodology available is the use of relative retention times"

I think Botre committed fraud/perjury (because he certainly knows better), and if I were Floyd, I'd see if there is a basis in law to sue him directly.

Excellent work!

DBrower said...

Ferren,

We're seeing the numbers don't work, and there are reasons why they don't.

The problem is, RRT is the only numeric, quantitative match we can possibly apply given the methodology and the data that were collected.

Faced with the TD2003IDCR requirements for quantitative match, LNDD does not use a methodology that produces numbers that match.

The LNDD might have produced numbers that matched by changing their procedures in a number of ways: (1) using a cal mix with all the metabolites in the IRMS; (2) using a cal mix with a trailing standard in both runs, which would have permitted the use of Kovats indexes; (3) keeping other variables (flow rate, temperature ramp) consistent across the GCMS and IRMS.

The LNDD did none of these things, and it is not surprising the numbers that do come out don't support the conclusions for peak identity. These are the unspun facts.

Faced with this, the Panel declined to declare an ISL violation that would have flipped the burden of proof. Instead, it declared the rule didn't apply, finding no problem with the LNDD's peak identification. It invented a "visual gestalt" standard that is mentioned nowhere in the ISL or the technical documents that supplement it as an acceptable methodology.

And, the panel further asserted that the scientists who pointed out the numbers don't work because the LNDD didn't do things right were unsound in their science and interpretation of the rules.

The rules seem to be: any rule we have may be broken to support a conclusion of guilt, and if you point out where we've done this, your argument is unsound.

TBV

Anonymous said...

I still think that's a very tough argument. You concede that the GCMS ID was correct and that the IS and 5bA were consistent across all the runs. That's a good bit of evidence that the ID was OK. And, like I pointed out above, the Shimadzu quote you provided backs up the arb's decision.

I can understand how you can expect that the technical document should be more specific, but I don't see the ISL violation the way things stand now. Especially since the technical doc seems more ambiguous when you read the entire first section (not just the part quoted by McLaren and Brunet):

"In those cases where shifts in retention can be explained, for example sample overload, the retention time criteria may be relaxed".

Mike Solberg said...

But, Ferren, I think you jumped one or two steps too far forward. I think TbV's point (at least at this point in his unfolding discussion) is not about the final conclusion of the majority, but about the reasoning.

It seems clear that TD2003IDCR only contemplates quantitative methods for matching the GCMS to the IRMS. As Dave pointed out, it doesn't even get close to talking about "eyeballing" it (or "visual gestalt"). How does that method, which TbV has shown in his graph comparisons is far from perfect, fit with the requirements of the TD2003IDCR? The TD does say "The Laboratory must establish criteria for identification of a compound." But all the examples cited provide quantitative means of matching, not visual. That's not even on the radar.

Dr. M-A tried to figure out, "Well if they didn't match the GCMS to the IRMS by any of these normal means mentioned in the TD, what did they do?" He was dumbfounded. When he realized that they eyeballed it, he said, "Well, okay, if we are eyeballing it, that means we are going with Relative Retention Times, so let's check the math." The math, as you have shown, is off by 6%. The majority said that the 2% limit doesn't apply (because it is across machines), but they didn't provide any idea of what limit would apply. The TD says "the criteria may be relaxed." But relaxed by 300%? (2% to 6%) What variance would apply? M-A's point was that if this is the method you are going to use, then the 2% has to apply, otherwise you have no standard at all.

So, with the information we have from LNDD, we either have the 2% limit valid and not met, or we have no standard at all. Or we have "eyeballing it" alone, non-quantified. Hardly in line with the intent of TD2003IDCR in my view.

So I think Dave is saying that there should have been a burden flip. USADA should have to show that LNDD's method didn't lead to the AAF. I think that burden flip would have opened up a bunch of different issues, but this entry is long enough!

syi

Unknown said...

Mike,
Great summary! In my view (for what it's worth), if you don't have positive peak identity (by recognized analytical methods), then there should have been no case. Minimally, the burden should have been flipped, in which case the USADA would have had a tough time arguing that inability to positively identify the peaks would not have affected the AAF.

Laura Challoner, DVM said...

Think about the can of worms the majority would have had to have dealt with within the 10 day time limit had they ruled that Landis proved an ISL violation, thus flipping the burden back to USADA:

Is the ISL violation so severe that the Panel could not be comfortably satisfied that the violation did not cause the adverse analytical finding?

Although the ISL violation rendered the result obtained meaningless, could/would the majority then turn the vast unplowed ground of additional "B" sample tests, either to establish "new" ground to conclude doping had occurred or to re-establish and re-fortify whatever was left standing of the Stage 17 "B" sample adverse analytical finding?

Would any of the other stuff it rejected have to be revisited: the testimony of Papp and LeMond, or the "misconduct" USADA accused Landis of during and after the trial (alleged to establish Landis' "bad character", and thus consistent with a doping athlete)?

All that is much too hard and worthy of another 84 pages at least.

The Panel came into the hearing assuming Landis doped. The science was way too complicated. The decision evinces the quickest way to affirm the belief of the majority without having to come out and address the tough issues and say what really should have been said if they truly believe it: "Landis won the science because he hired top-shelf help USADA could not duplicate, but he's a doper and we are comfortably convinced of that based on the evidence." Then they would have had to say exactly what evidence they believed.

That would have blown the lid off the whole system and they were not prepared to do that. Quite the opposite, in fact. Much simpler to listen to the WADA voice in their ear (Botre) and find no ISL violation, especially since their decision is not subject to review, except here.

m said...

There are also a couple of problems glossed over by TBV's analysis.

1. Does the technical standard even apply to relative retention times? By its terms it only refers to "retention times", not "relative retention times". There's a good reason it might not. Errors are magnified for RRT's as opposed to RT's. So the 1% standard might be appropriate for straight retention times, but not for RRT's. Some more relaxed standard would be applicable, if any.

2. The technical standard shouldn't apply to GC's done on two different machines. TBV hasn't been able to rebut this despite his examples. Most, if not all, the examples involved RRT's on one machine. It's clear that RRT's will vary if certain conditions are not identical, in particular temperatures and stationary times. According to what I've read the temperatures were different between the GC-MS and the GC-IRMS in this case.

3. The technical standard by its own terms says the 1% standard can be relaxed "where shifts in retention times can be explained". That could certainly be the case here, because of the different temperature ramps.

4. Was MA's testimony about the 7.2% difference specifically about the retention times or the relative retention times? The arbs' decision refers to "retention times". Moreover, what sample and what analyte was MA referring to? If it didn't involve the 5a and 5b Andro it could be irrelevant.


One has to ask why there must be a numerical criteria for identification? Every day criminals are convicted based on eye witness identifications. Sure a DNA sample would be better, but it's not required.

So in this case, the 4 middle peaks of the GC-MS had to match the 4 middle peaks of the GC-IRMS, otherwise you would have a peak mapping onto nothing. TBV couldn't ever get around that visual fact, for all his tinkering with the graphs. It was like a key in a lock.

Mike Solberg said...

In my 1:10 comments, the 2% should have been 1%, and thus the difference should have been 600%, not 300%. (Of course, that just makes my point stronger.) Sorry.

syi

Mike Solberg said...

m,

Re your 1, 2, and 3: It's not clear whether the 1% standard should apply across machines or not. Obviously Botre taught the majority that it did not (and Campbell seemed to buy into it as well). But Dr. M-A's point was that if you are going to rely on eyeballing it, SOME quantitative standard must apply, and the only available candidate is applying the 1% across machines, and to the relative retention times. He certainly doesn't think it is a good method (TbV details the good methods of connecting the GCMS to the IRMS, but LNDD didn't do any of those), but he was trying to apply SOME analytical precision and SOME standard to the situation.

If some "more relaxed standard" should apply, name a number. What standard? A 600% difference? The majority applied no standard at all. And if it is going to be a "more relaxed standard" (even up to 600% greater than stated), shouldn't you then provide at least a minimal description of why the standard was relaxed? "The temperatures were different" - "The pressures were different" - "My thumb got in the way" - something? LNDD provided nothing.

ugh...out of time for now.

syi

m said...

Mike,

Can you deny that the GC 4 middle peaks match and map onto one another?

m said...

"If some "more relaxed standard" should apply, name a number. What standard? A 600% difference?"

That is for the scientists at WADA to promulgate. Absent that, the arbs are not qualified to impose an artificial and unrealistic 1% standard.

Mike Solberg said...

m, 7:58:

Just to be clear, which two graphs and which peaks do you mean?

m, 8:01:

It seems one of two things must be true: 1) either Dr. M-A was right and the 1% should apply across instruments (accounting for the obvious corrections, like the time in the combustion chamber), and that is the correct reading of TD2003IDCR, or 2) TD2003IDCR is fine with "eyeballing it" with no quantitative support, even though it makes no mention of anything of the sort.

Unknown said...

M,
You ask: "One has to ask why there must be a numerical criteria for identification?"

The answer is that if you can't predict the time the substance is going to elute, you can't positively identify it. You may feel, qualitatively, that because the peaks between the GC/MS and the GC/IRMS look similar, that they must be the same. But I don't think you ruin a guy's life based on a gut feel.

Anonymous said...

Mike, I don't see how you could apply the 1% rule with a caveat. Either the technical document strictly applies or it doesn't. The obvious conclusion is that the technical document is not specific enough to govern the issue. I can see how you feel like Floyd was suspended by a technicality, but I don't see how the panel had a rigid guideline to follow.

m said...

dailbob,

"The answer is that if you can't predict the time the substance is going to elute, you can't positively identify it. "

This is incorrect in this case. Despite not meeting the 1% standard, the substances eluted at a close enough interval so that the GC graphs clearly showed a match.

I ask again, how can you claim that the 4 middle peaks of the GC-MS and GC-IRMS don't match or map onto one another?

m said...

Mike,

Look at TBV's graphs below of the Landis F3 sample. Figure 1a is the GC-MS and figure2 is the GC-IRMS. The 4 middle peaks including the 5a and 5b andro map onto one another. You can't shift them over like TBV did, without leaving a peak mapping onto nothing.

http://bp2.blogger.com/_xX3hgPBOgag/RvbJfIA2PWI/AAAAAAAAAkw/xIVwWkS8ijI/s1600-h/landis-f3-gcms-usada-348.png

http://bp1.blogger.com/_xX3hgPBOgag/RvaxT4A2PUI/AAAAAAAAAkg/hVzgCMjNxNU/s1600-h/floyd-f3-irms-zoom.png

Unknown said...

m,
You say that the peaks eluted "close enough" that they clearly show a match. My question (and everybody else's): by whose (or what) standard? We always run standards prior to running the test material (usually with a solvent wash in between to prevent any carry-over). I don't know how else you can say the substance is positively identified, i.e., this is the standard way that everybody does it (to my knowledge at least). The fact that LNDD did not run the other two metabolites in the cal-mix is incomprehensible to me. I have no clue how they thought they were going to positively identify the other two metabolites (in a way that would stand up in a court of law). I know you say compare the two graphs (you're right, the four middle peaks look similar). I just couldn't vote to destroy a guy's life unless the data were obtained using the standard method that everyone uses to make positive identification.

Regards

Mike Solberg said...

m 11:28, I'm not avoiding the question, just very busy. Maybe later today.

syi

m said...

Mike,

No sweat. I've asked this question a dozen times here and would be interested in an informed response.

Larry,

"The fact that LNDD did not run the other two metabolites in the cal-mix is incomprehensible to me."

As I understand it from reading the opinions, LNDD did run some reference standards on each machine separately and these were within the 1% limits and were not "disputed" by MA.

Reading both the majority and dissent I can find no criticism of the absence of the other two metabolites in the cal-mix, if indeed that is what occurred (I'm not sure it did). Looking at Landis's reply brief which raised some arguments about the Cal-Mix, there was no mention of this problem. So at this point I don't see any evidence why it is a problem, or why it is a sufficient problem that the visual matching of the peaks was not valid and reliable.

You come very close to admitting the peaks match, but don't quite want to concede that point. ;-)

Mike Solberg said...

Ugh. I just spent 45 minutes on a response then got an error. I'll try again, it should be easier the second time around:

So, m, here is my attempt at a thoughtful response to your question:

"I ask again, how can you claim that the 4 middle peaks of the GC-MS and GC-IRMS don't match or map onto one another?"

First of all, I am not sure my understanding of the science is up to the task, but I'm working on it. With that...I guess I have three things to say.

1) I would first respond with a question: Where in TD2003IDCR (or any other relevant TD) do you see any justification for matching the IRMS to the GCMS by means of this visual pattern matching? I think the point TbV has been making is that this method doesn't meet any of the requirements of the technical documents, so there should have been a "burden flip," and USADA should have to prove that this method is scientifically sound and didn't lead to the AAF. That would have made the burden of proof much more difficult to bear and could well have changed the outcome. That may be a "technicality" but the technical documents and rules must be there for a reason (beyond my understanding scientifically), and they should be followed.

2) Even if going with the visual match, the peaks have to have some relation to the relative retention time. If I understand this right, all other conditions being equal, the relative retention time of the target peak and the control peak (in relation to the unretained peak) (I'm not sure I said that right) ought to be the same in the IRMS as in the GCMS. The RRT's are not the same.

But the conditions must have been different. If the conditions were different then things are up in the air. OMJ, over at DPF, who generally makes similar points to you, has addressed this and made some pretty big concessions.

He wrote: "I still don’t understand why they would change the conditions. This is not a minor point. If you change the conditions, you could change the order or the pattern of the peaks. The changes made clearly had a major effect on retention times. In MS, A eluted at 639 seconds vs. 867 seconds in IRMS, a difference of 228 seconds. Much or all of this difference could have been due to the combustion process in IRMS. However, the times for 5a were 934 and 1337 seconds, respectively, a difference of 403 seconds. Since by the arbs’ own statement in article 185 there is a constant difference ascribed to combustion, at least 403-228 = 175 seconds resulted from different chromatographic conditions. This is fairly substantial, and it seems to me, demands some kind of evidence that patterns were not changed. I would not expect temperature changes to affect the order in which peaks eluted (though certainly in a trial like this, that point should be addressed), but they could certainly affect which peaks elute closely with other peaks, which is critical to identification."

3) And finally, perhaps more to the point of your question, yes, of course, visually the pattern of peaks in the F3 GCMS and the F3 IRMS looks similar (generally from left to right: series of peaks, small peaks, two big peaks, no peaks, one big peak). But I don't think I know enough about the science to say what the significance of that is. Dr. M-A said it wasn't good enough. Dr. Brenna said it was, leaving a "contradiction of experts." Generally, I thought Dr. M-A sounded like "the brightest guy in the room" so I tend to trust him. And reading the majority opinion and Dr. M-A's testimony, I think they just didn't understand what he was saying. Dr. Botre was presumably supposed to settle such disputes, but honestly, I can't trust his motivations given his position and WADA omerta. To me, that's obviously a big problem with the system, or at least the way it played out in this case.
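OMJ's arithmetic in point 2 can be checked directly. The sketch below uses only the four retention times he quotes, in seconds, for the peaks he labels A and 5a. If the only difference between the instruments were a fixed combustion delay, both peaks would shift by the same amount. The sketch also computes the RRT of 5a against A on each instrument for comparison.

```python
# Retention times in seconds as quoted by OMJ: (GCMS, IRMS).
peaks = {"A": (639, 867), "5a": (934, 1337)}

# A fixed combustion/transfer delay would shift every peak by the same amount.
delays = {name: irms - gcms for name, (gcms, irms) in peaks.items()}
print(delays)
excess = delays["5a"] - delays["A"]
print("not explained by a constant delay:", excess, "s")

# Relative retention of 5a against A on each instrument.
rrt_gcms = peaks["5a"][0] / peaks["A"][0]
rrt_irms = peaks["5a"][1] / peaks["A"][1]
print(f"RRT GCMS {rrt_gcms:.3f}, IRMS {rrt_irms:.3f}, "
      f"difference {(1 - rrt_gcms / rrt_irms) * 100:.1f}%")
```

This reproduces OMJ's 228 s and 403 s delays and the 175 s left over. It also shows an RRT mismatch of about 5% between just these two peaks.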

No more time now.

syi

Unknown said...

m,
"So at this point I don't see any evidence why it is a problem, or why it is a sufficient problem that the visual matching of the peaks was not valid and reliable. You come very close to admitting the peaks match, but don't quite want to concede that point. ;-)"

The point I have been trying to make (probably very poorly) is that visual matching to identify substances run through a GC is never a valid method. A few points/examples:

When we run a known standard in triplicate, the elution times only vary by thousandths of a second. The difference in times between the GC/MS and the GC/IRMS for the Landis F3 may look visually close, but the 7.2% difference calculated by Dr. M-A and people over at DPF represents light years in GC time. This forced the discussion during the trial to retention times (and/or RRTs), since there were no internal standards for two of the metabolites in the cal-mix. Note that having to use retention times between two different machines is an extremely poor second choice for peak identification, because of all of the issues listed previously (all conditions need to be the same between the two machines to have much hope for this to work). LNDD then really screwed things up by not using the same temperature ramp on both machines. This made it entirely possible that the peaks wouldn't elute in the same order. So, given that we can't match an elution time to either a known standard, or map it to the GC/MS elution times using RTs or RRTs, and the conditions between the two machines were not the same, creating the possibility that the peaks didn't come out in the same order, it starts to get pretty difficult to have confidence in this data.

You have pointed to the fact that the middle four peaks between the two runs look the same. Please note that this may, or may not, have any meaning. The IRMS is measuring an ion fraction, which is different than what the GC/MS puts out. Consequently, it's entirely possible that a matching peak between the two machines will be entirely different heights.

Given all of the above, the data from this Landis F3 is meaningless, in my opinion. I don't remember the exact context, but sometime during the trial when Dr. M-A was being cross examined concerning this, he asked "How do they do it then (i.e., identify the metabolites), Divine intervention?" I understand why he asked this, now that I've seen this data.

Finally, note that I'm not trying to be challenging. Just trying to be helpful. Hope you found it that way.

Regards

m said...

Mike

"But I don't think I know enough about the science to say what the significance of that is. Dr. M-A, said it wasn't good enough."

I was hoping you were a scientist who was familiar with the science involved. Thanks though.

m said...

Daily Bob,

Thanks.

1. Are you a scientist, if so can you tell me what sort of issues you have knowledge of, in particular GC-IRMS? I am not a scientist.

2. I understand your argument as applied generally, i.e. that shifted retention times may lead to mis-identification of peaks. But is it applicable here?

If you examine TBV's Figure 6 below, in which he attempts to account for the 7% shift in retention times between the GCMS and GC-IRMS, you will see that the peaks line up slightly off. However, because the middle 4 peaks are separated from all the surrounding peaks by a good interval, there is no way that a GC-IRMS peak can match up with any other peak. So there is no misidentification here. Remember, the GCMS and GC-IRMS runs both contain the same substances; there cannot be a peak in one that isn't in the other, even if the peaks are in different places.

http://bp0.blogger.com/_xX3hgPBOgag/RvgE3IA2PeI/AAAAAAAAAlw/HK5bemojpjs/s1600-h/Figure-6.png
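The scale-and-match approach described in point 2 can be sketched in Python. Everything here is an assumption for illustration: a single scale factor applied to every GCMS retention time, nearest-neighbour matching, and made-up retention times. The `margin` value measures how much closer the chosen IRMS peak is than the next-closest one, which is the sense in which well-separated peaks "cannot match up with any other peak"; the `residual` value shows how far off each match still is after scaling.

```python
def match_peaks(gcms_rts, irms_rts, scale):
    """Scale each GCMS retention time and pair it with the nearest IRMS peak.

    Returns (gcms_rt, matched_irms_rt, residual, margin) tuples, where residual is
    the predicted minus the matched RT, and margin is how much farther the
    second-nearest IRMS peak is than the nearest one.
    """
    matches = []
    for rt in gcms_rts:
        predicted = rt * scale
        # Distance from the predicted RT to every IRMS peak, closest first.
        ranked = sorted((abs(predicted - irms_rt), irms_rt) for irms_rt in irms_rts)
        nearest_dist, nearest_rt = ranked[0]
        second_dist = ranked[1][0] if len(ranked) > 1 else float("inf")
        matches.append((rt, nearest_rt, predicted - nearest_rt, second_dist - nearest_dist))
    return matches


# Hypothetical retention times in minutes (not taken from the USADA documents).
gcms = [10.0, 11.0, 12.0, 13.0]
irms = [10.80, 11.75, 12.90, 13.95]

for gcms_rt, irms_rt, residual, margin in match_peaks(gcms, irms, scale=1.072):
    print(f"GCMS {gcms_rt:6.2f} -> IRMS {irms_rt:6.2f}  residual {residual:+.3f}  margin {margin:.3f}")
```

A large margin supports the claim that no other large peak is a plausible candidate. It says nothing about small peaks that are absent from the IRMS trace, or about peaks that swap order under a different temperature ramp.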

3. "when Dr. M-A was being cross examined concerning this, he asked "How do they do it then (i.e., identify the metabolites), Divine intervention?" I understand why he asked this, now that I've seen this data."

This is why I doubt his testimony. This is hyperbole that a practiced expert witness uses to embellish his testimony. The knowledgeable scientific posters on DP all say the visual identification was reasonable if not obvious.

4. As to problems arising from the different temperature ramps, this was apparently not addressed at all in the testimony and was only raised as a problem by OMJ on DP. I would not be so bold as to claim this makes the GCs meaningless in this particular case, without more expert evidence.



Unknown said...

m,
1.) I am a scientist. I work in R&D for a large consumer products company. We have LC and GC/MS, but we do not have GC/IRMS here. What I know about GC comes from working in a lab where we've had it for a long time (I've been there 27 years). So I'm pretty familiar with the principles (enough that I'm comfortable making the comments I've made). However, I won't pretend for an instant to be on the same plane as a guy like Dr. M-A (which is maybe what you're looking for). I don't think the fact that we don't have IRMS has much bearing on the discussion you and I are having, because the root of the problem is whether the peaks of interest can be identified visually, or whether they need a reference standard to positively identify them. In other words, I think you and I would be having the same conversation if they had run the samples through two GC/MSs, because the different temperature ramps would have caused a shift and/or changed the order of the peaks, and with no known standard in the cal-mix, there would be no real way to identify them.

2.) I printed the graph from the URL you gave. The peaks are shifted enough that I would not be comfortable saying that there is a positive match here (remember, I'm used to seeing a standard and the known substance I'm looking for separate by no more than 6 to 8 thousandths of a minute; this is typically what we see with the things we work with). If you worked in our Analytical Chemistry department and brought me these results, saying you felt very strongly that this sample contained components X and Y for the reasons you give above, I'd tell you to go back, make the column conditions the same, and run standards for X and Y in the cal-mix.

3.) You mention that there are knowledgeable scientific posters over at DP who don't have issues with visual identification. If you tell me which particular thread and page you're reading, I'd be happy to go read their stuff.

4.) m, please believe me when I tell you that the different temperature ramps are a pretty big deal. I have little doubt that they caused most, if not all, of the problems with retention times, i.e., they made it impossible to get the peaks to align no matter what kind of math you did.

At the end of the day, you and I may just have to agree to disagree. I guess I'm a bit of an "anal science guy," and I don't know any other way to positively identify a substance without running a standard for the substance you're trying to identify. LNDD didn't do that (at least for this run), so I could never be convinced to use this to convict somebody.

Best

Mike Solberg said...

You know, I've gone back to read some of the older stuff on DPF, and I am reminded of just how stupid I am. Duckstrap explained all this quite clearly months ago, in "the key issue" thread on May 28th:

"The relevant part of the ISL that dB(TbV) refers to comes from TD2003IDCR (previously known as duckstrap's favorite document), for which the opening paragraph of the Chromatographic Separation section (p. 1, right at the top) starts with: "For capillary gas chromatography, the retention time (RT) of the analyte shall not differ by more than one percent or 0.2 min from that of the same substance in a spiked urine sample, Reference collection, or Reference material, analyzed contemporaneously". We previously had paid scant attention to this section. However, LNDD has clearly not met this requirement with the IRMS separation, and haven't even tried to, and this appears to be a clear violation of the ISL.

Is this significant enough that the results of Landis' sample might be wrong?

Seems to me the burden is on USADA/LNDD to show that it isn't. A couple thoughts, however. First, I don't think LNDD have misidentified the relevant metabolites (edit: m, this concedes your point), and here, Brenner's testimony is probably accurate, in that the overall pattern of large peaks is consistent between the GCMS and GC-IRMS samples. So, I would not say that the peaks have been misidentified (except for the internal standard--which is in such a crowded part of the chromatogram that without the mass spectra, it is impossible to say). The problems are with the minor peaks, which are missing from the IRMS chromatograms. This was Dr. M-A's point, as I understand it. Without accurate retention times in the IRMS chromatograms (edit: that is without reference standards for each metabolite) and mass spectra, it is impossible to say that the little peaks that are now missing, but were close to both 5aA and 5bA do not interfere--in fact, that little one between 5bA and 5aA (USADA 0348) probably does interfere with either or both of them. Since we have no mass spectra or idea of the CIR of the interfering peaks, and because the integration limits of the 5bA and 5aA are inappropriate because of the peaks' close proximity (see you3's posts above), this leads to a serious, significant probability of a large negative bias on the apparent CIR of the 5aA."
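Duckstrap's point about unresolved minor peaks can be illustrated with a carbon-weighted mass balance, a common first-order approximation for the delta value of two compounds integrated as one peak. The carbon amounts and delta values below are invented for illustration; they are not the values measured for Landis' sample or for the small peak in USADA 0348. The sketch only shows the direction of the effect: an interfering compound that is more depleted in 13C pulls the apparent delta of the target more negative.

```python
def mixed_delta(components):
    """Approximate delta13C (per mil) of compounds co-integrated in one peak.

    components: iterable of (carbon_amount, delta_permil) pairs. Uses a linear,
    carbon-weighted mass balance, which is a first-order approximation.
    """
    components = list(components)
    total_carbon = sum(amount for amount, _ in components)
    return sum(amount * delta for amount, delta in components) / total_carbon


# Hypothetical values: the target metabolite and a small unresolved neighbour.
target = (90.0, -25.0)       # 90 units of carbon at -25.0 per mil
interferer = (10.0, -32.0)   # 10 units of carbon at -32.0 per mil

apparent = mixed_delta([target, interferer])
print(f"apparent delta: {apparent:.2f} per mil, bias: {apparent - target[1]:+.2f} per mil")
```

If, as is usual, the CIR result is reported as a difference between a metabolite's delta value and that of a reference compound, a bias like this in one metabolite carries straight into that difference.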

m, I'm sure you would appreciate going back and reading that thread.

So, m, again, yes, the peaks visually look the same, but what is the significance of that? That's my response 3) above. Now I understand the significance. The answer is "not much." The fact that the peaks generally fall in the same pattern on GCMS and GC/C/IRMS is no guarantee that LNDD measured (only) the right thing. And as duckstrap points out, they probably didn't.

This failure to apply TD2003IDCR is an egregious violation by the majority.
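The TD2003IDCR retention-time rule quoted above can be written as a one-line check. This Python sketch assumes the two limits combine as "whichever is smaller"; the quoted sentence only says "one percent or 0.2 min", so that reading is an assumption. The function name and the retention times are hypothetical.

```python
def rt_matches_reference(rt_analyte, rt_reference, pct_limit=1.0, abs_limit_min=0.2):
    """Check an analyte RT against a contemporaneously run reference RT.

    Assumption: the tighter of the percent limit and the absolute limit
    (in minutes) applies.
    """
    allowed = min(rt_reference * pct_limit / 100.0, abs_limit_min)
    return abs(rt_analyte - rt_reference) <= allowed


# Hypothetical retention times in minutes.
print(rt_matches_reference(10.05, 10.00))   # within 0.1 min (1% of 10.00)
print(rt_matches_reference(10.15, 10.00))   # outside 1%, though under 0.2 min
print(rt_matches_reference(21.44, 20.00))   # a 7.2% shift, far outside either limit
```

The check only means something against a reference standard run under the same conditions, which is exactly what the cal mix lacked for two of the IRMS metabolites.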

syi