Saturday, December 15, 2007

Getting around to Specificity

Mr. Idiot posted the following in a comment to the Friday Roundup, but let's give this more obvious treatment.

I would like to see the specificity discussion get off the ground. So I'll begin by describing the way in which I think specificity SHOULD be assured in a given sample.



But first, my understanding of the background (feel free to add or correct). GC/C/IRMS is not that old. It wasn't used at all before 1990 and because it is expensive and complex wasn't used very widely through the 90's. Because all identifying properties of a substance are lost in combustion, it has limited applicability in the world of analytical chemistry. Dr. WMA has been one of the world leaders in exploring ways GC/C/IRMS can be useful, and in developing processes for given applications.

From the beginning of the GC/C/IRMS method, achieving certainty that you are measuring what you are supposed to be measuring has been the key concern. Some of the early uses of IRMS were in forensic science, in which results had to stand up in a court of law and thus had to meet high certainty requirements.

At first, the only way to know what you were measuring in the IRMS was to compare retention times with the same sample run through a GCMS, under the same chromatographic conditions. But "the same" was never "identical," and "the same" was apparently not good enough for forensic science.

So in the mid 90's WMA developed a way around the problem. He took a GC (to separate the different substances in a sample, obviously), and as the sample was coming off the GC he split it - with part going to an MS for clear identification, and part going to combustion and IRMS for isotopic analysis. This solved the "peak identification" problem because the chromatographic conditions were identical (apparently the time lag added by the combustion step is easy to compensate for, since it is never raised as an issue in the material I have read).

This combo method goes a long way to solving the specificity problem as well, because your trouble spot is limited to good peak separation in the MS. If you have that, you are good to go, especially if you use complete mass spec data. If the mass spec data reveal co-elution problems, you run things again, change the conditions somehow and get better separation. Some co-elution can be compensated for, but even Brenna (I read somewhere) puts a clear limit at three intersecting peaks, or at just two peaks overlapping by more than 70%. If you have either of those, you can't get good numbers and you have to run things again to get better separation somehow.
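As a rough way to picture that overlap limit - the Gaussian peak model and all the numbers below are my own illustrative assumptions, not Brenna's actual criterion - you can estimate how badly two peaks of equal width intrude on each other from their retention-time separation:

```python
import math

def gaussian_overlap_fraction(rt1, rt2, sigma):
    """Overlap area of two unit-area Gaussian peaks of equal width sigma,
    centered at retention times rt1 and rt2. For equal Gaussians the
    shared area under the lower envelope is 2 * Phi(-d / (2 * sigma)),
    where d is the separation and Phi is the standard normal CDF."""
    d = abs(rt2 - rt1)
    phi = 0.5 * (1.0 + math.erf((-d / (2.0 * sigma)) / math.sqrt(2.0)))
    return 2.0 * phi

# Peaks 0.1 min apart with sigma = 0.05 min overlap by about 32%;
# peaks 0.5 min apart at the same width are effectively baseline-resolved.
close = gaussian_overlap_fraction(10.0, 10.1, 0.05)
far = gaussian_overlap_fraction(10.0, 10.5, 0.05)
```

Nothing in the anti-doping rules uses this formula; it is only meant to show that "70% overlap" corresponds to peaks far closer together than anything you would call resolved.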

The process WMA developed was apparently technically tricky, as he had to fiddle with several types of connections / fittings and such. But that basic principle (GC output split one way to MS, one way to IRMS) has since been refined into commercial instruments. See here:

http://www.massspecsolutions.com/isotope/IRMS_ToF.htm

But remember that this was all still developing in the late 90's and it sounds like that type of combo machine is still a big deal today. And also remember that it wasn't until '94 that Aguilera et al. reported that you could test for exogenous testosterone based on the difference in 13C/12C ratios compared to endogenous.

I think Catlin and UCLA were the first to get approval to use IRMS for exo-t testing in 1997, and it has spread more or less quickly and broadly since then. In a USADA symposium in 2003 most of the main players were present, and at that time they recognized that IRMS methods had developed widely between the different labs, and there was great need for more uniformity. See various parts of this document:

http://www.arniebakercycling.com/floyd/other_links/2nd%20USADA%20Symp%202003.pdf

At the time of this symposium, it sounds like WMA's combo machine was not in wide use in anti-doping labs, and we know that at least LNDD still doesn't have one today. To emphasize that this IRMS stuff was still all developing in 2003, I'll note that Tom Brenna was still emphasizing that "the most reliable isotope ratio results are obtained when GC-C-IRMS peaks are strong and cleanly resolved from all other peaks." And speaking of the evidence from a particular data set, he said, "Finally, these results demonstrate as dramatically as any, that precision is no assurance of accuracy, particularly in continuous flow IRMS. Good calibration standards are essential." (emphasis original)

Finally, I would note that the document that presumably controls GC/C/IRMS assays (TD2003IDCR) was written in 2003. As a non-scientist, that is very surprising to me, on two fronts. First, it sure seems like there was a lot still up in the air about IRMS in 2003 (not necessarily about the basic science, but about the implementation), and second, that's ancient history in IRMS terms. With the perfection of machines like the one noted above, they are able to do things much better now than they were in 2003. People are still being punished based on what seems like outdated equipment and processes.

Now, back to my basic original point here. It is clear from the history (and I put a lot of that together from something WMA wrote in his chapter about IRMS in "Forensic Human Identification" by Timothy Thompson - which is largely available through a Google preview, but which I can't link here), that the way to assure specificity in GCMS is with the complete mass spec data (we all knew that already). And the way to assure specificity in IRMS is to use chromatographic conditions as identical as possible to the GCMS, preferably by combining the two machines, to ensure the conditions are identical.

Identical chromatographic conditions were always the goal here - the need is taken for granted in all IRMS work from the very beginning. All the studies involving IRMS document the same conditions between GCMS and IRMS as part of the standard description of the study. In places they even say that you should not only use the same type of column, but a column from the same batch (same production run) so they are as similar as possible.

So, that's my understanding of the background, and the general scientific expectation (not the legal WADA expectation, which is a different issue) of how specificity is supposed to be assured in GC/C/IRMS.


This was followed by a comment from Michael:

So basically, Brenna and WMA contradicted themselves in the hearing, but deep down believe the same thing regarding the testing.

Am I understanding you correctly?

Is it possible that Landis' team just didn't have enough time to present their case properly?

We think, "yes", and the LeMond and Papp digressions at hearing just chewed up defense time that would have been better spent making the real case.

68 comments:

Russ said...

syi,
Absolutely fantastic piece of research!

I appreciate the hard work you, Larry, Ali, TBV, Duck and others have put into this technical work.

When I was a budding teen techie back about 1960, one of my mentors explained to me that I was, in fact, a scientist, as were any who seriously pursue truth and understanding in science. The context for that was amateur radio, a serious hobby.

It does not require the degree, it requires an analytical approach and a continued (never ending) search for the truth! Your perception of the facts and regard for what the facts are will evolve with time and effort.

Great Work!
Russ

m said...

Mike,


This seems to go more to identification, rather than specificity.

Did Shackleton use a splitter, or run splitless?


TBV,

I sent you a post about my graphs and the identification issue, but it hasn't been posted yet. Did you get it?

DBrower said...

M, apparently I didn't get it.

TBV

Larry said...

Mr. Idiot -

Great post and great contribution to our understanding of the issues. However, I agree with M, to the extent that you're discussing both specificity and peak identification, and that you seem to be freely moving from one topic to the other.

From a legal perspective, if we want to discuss "specificity", we first have to define "specificity". From a scientific point of view, maybe we're defining "specificity" as the absence of any interference between two peaks that could have a material effect on the result of our test. From a legal perspective, this definition is not helpful -- it just requires us to come up with a second definition of which interference is material, at which point we could drop the first definition and rely on the second.

But this point raises the question of whether you want to discuss law here, or science, or both. If you DO want to discuss law, I've posted a couple of law-related questions for you over on the 5bA Anchor thread, and we could drag them over here. Or you could just try to define "specificity" and we could move on from there.

You question how a four-year old rule such as TD2003IDCR (three years old at the time of the S17 testing) could possibly still be the governing rule today. A three-year old rule is a new rule in most legal contexts. However, you're raising a broader question, which is how the rulemakers can write rules setting technological standards, in an environment where the technology is constantly improving. Well ... you can constantly rewrite the rules (which isn't going to happen!). Or you can write rules in a more flexible (and more vague) way, requiring labs to use the best technology reasonably available to them, and to measure test results within standards that are reasonable given the required technology. You can imagine that the people responsible for following the rules prefer the first kind of rules, the rules with "bright lines". And given how closely you've followed the FL case, where every rule was interpreted in favor of the lab, you might prefer "bright line" rules here as well.

Mike Solberg said...

Yes, I suppose that brief "history" does deal with both peak id. and specificity. That's because of the role of GC/MS. If you have the complete mass spec data from the GC/MS, then specificity is certain. You know you are measuring only what you say you are measuring. But that also, of course, takes care of peak id. So, when GC/MS is done with complete mass spec data, peak id and specificity are the same thing.

And to take the next step, if you move to IRMS with identical chromatographic conditions, then peak id and specificity are still pretty much the same thing. Of course, if there are questions about specificity in the GCMS, then there are going to be the same questions in the IRMS. And if you take questions about specificity in GCMS, and add different chromatographic conditions on top of that, then you have even more questions in the IRMS.

Just to be clear, Larry, when I talk about specificity I am intending to use it the way it is used in ISL 5.4.4.2.1: "The ability of the assay to detect only the substance of interest must be determined and documented. The assay must be able to discriminate between compounds of closely related structures."


I'll try to figure out what Shackleton used.

More later.

syi

Larry said...

syi -

Let's focus on the first part of 5.4.4.2.1. What do you think the rule means by "the ability of the assay to detect only the substance of interest"?

At the end of the 5bA anchor thread, I posted what I thought was a brilliant and penetrating piece ( ;^) ) where I compared chromatography to eggs. I guess it wasn't quite as brilliant as I'd thought, since no one responded to it! But in that post, I was trying to start a discussion of what we mean by specificity, by looking at particular cases. I can think of three problem cases: (1) the case where the GC has achieved no separation whatsoever between two substances (scrambled egg), (2) the case of co-eluting peaks (hard boiled egg) and (3) the case of interfering peaks (fried egg). In which of these cases do you think a lab has achieved "specificity", in accordance with the 5.4.4.2.1 definition?

There is a fourth case I have not considered much, which is a case where there's a lot of background noise. I've never been exactly sure what causes background noise. Is noise "stuff"? If we took a mass spectrum reading of the noise at a place on the chromatogram where there was no peak, would we find any ions? Or is this noise something more like radio static? I was never able to get this straight. (I have an "egg" analogy to GC noise, but I'm suppressing it.)

Another thing I'm wondering is, exactly what would the full mass spectrum show us in these cases? This gets us into a discussion of the mass spectrum, and here I'm venturing into territory I don't completely understand.

The mass spectrum is, I believe, a chart showing ion distribution and ion intensity. The x-axis of the chart shows the ion mass-to-charge ratio (m/z), and the y-axis shows (I think) the intensity of the ions at any particular m/z. My guess is that we can take a mass spectrum across a specified range of RTs. The need to specify RTs would arise, I would guess, from the fact that the MS is a SCANNER -- it scans across a pre-determined range of m/z values (a full mass spectrum would scan across a wide range). Because the MS is a scanner, it should be able to detect only one type of ion (i.e., an ion with one particular m/z) at any moment. So we'd have to specify a range of times to cover the mass spectrum. My guess is that the scanner moves pretty quickly, so we could specify a relatively short time range and still pick up a representative sample of ions. Notwithstanding, it still seems to me that the mass spectrum is measuring the presence of particular ions over a particular slice of RTs.

If we want to use the mass spectrum to determine the purity of a particular GC peak, what range of times do we use? I would imagine we'd want to use the widest range of times applicable to that peak, so we could check the purity of the whole peak. If the peak is separated from all other peaks and the background noise was well-behaved, it should be relatively easy to figure out the right slice of time for a peak -- we would just use the RTs for the peak start and peak end. Even if the background noise level was complicated, we could probably look at the raw data, and do the mass spectrum analysis over the range of RTs where we were receiving ion data for the particular peak.
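The slicing I'm describing can be sketched in code - the scan layout, numbers, and function name here are my assumptions, not any instrument vendor's actual data format:

```python
from collections import defaultdict

def spectrum_over_rt_window(scans, rt_start, rt_end):
    """Sum ion intensities by m/z across every scan whose retention time
    falls inside [rt_start, rt_end]. Each scan is a (rt, {m/z: intensity})
    pair, the natural product of a scanning MS."""
    combined = defaultdict(float)
    for rt, peaks in scans:
        if rt_start <= rt <= rt_end:
            for mz, intensity in peaks.items():
                combined[mz] += intensity
    return dict(combined)

# Toy scans around one peak; the ion at m/z 315 in the last scan is the
# kind of stray fragment that would hint at a hidden co-eluter.
scans = [
    (10.00, {241: 100.0, 270: 40.0}),
    (10.05, {241: 500.0, 270: 200.0}),
    (10.10, {241: 120.0, 270: 45.0, 315: 30.0}),
]
spec = spectrum_over_rt_window(scans, 10.00, 10.10)
```

If the summed spectrum contains ions that don't belong to the reference spectrum of the substance of interest (like the 315 here), you have evidence of an impure peak.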

But how do we analyze the mass spectrum for a peak that's not completely separated from a neighboring peak? Isn't there going to be some range of RTs where we're picking up ions from two peaks? If so, do we have "specificity" over that particular range of RTs? Moreover, can the mass spectrum even be analyzed over that range of RTs? Could we tell if there was a THIRD substance hiding in that range of RTs?

I hope that these questions illustrate the importance of how we define "specificity".

Mike Solberg said...

Okay, Larry, I answered your challenges in the other comment thread. You were indeed terribly insightful and probably two steps ahead of me. Read those responses, then let me know where you think we are.

Let's continue in this comment thread.

syi

Larry said...

Mike -

I promised a response to your posts from last night over on the 5bA anchor topic. To recap, (a) I've drawn a distinction between the techniques used to achieve specificity (sample preparation and good chromatography), and the techniques used to confirm that specificity has been achieved (mass spectrum analysis), and (b) I've questioned whether ISL 5.4.4.2.1 addresses both types of technique, or only techniques used to achieve specificity. I've also questioned whether the techniques used to achieve specificity are sufficiently uncertain so as to reasonably require a lab to confirm that they've achieved specificity in every case.

I don't think we're ready to answer any of these questions yet. I raised these questions in an effort to get you and others to see that there are some difficult issues here, and based on your posts from last night, I think I've succeeded with you!

You point to ISL rule 5.4.4.1 on selection of methods, and ISL rule 5.4.4.1.1 on substance identification, and you're quite right, these rules require labs to develop testing methods that achieve specificity. No question, as these methods are being developed and verified, the labs must use mass spectrum analysis to confirm that the methods achieve specificity. But that does not address the central question (the "rub", as you put it) of what the labs need to do AFTER they've developed their testing methods under the ISL.

You're also right that TD2003IDCR prefers mass spectrometric detection as an identification technique. But note that the requirements for mass spectrometric detection under TD2003IDCR are aimed at peak identification, and not at specificity. TD2003IDCR sets forth two general requirements for the mass spectrum analysis. The first requirement is that the lab must identify ions in the reference mix with a relative abundance greater than 10%, and then determine that these ions are present in the test sample. This requirement does not address specificity, as it does not check to see if there are ions in the test sample that belong to multiple substances (as Duck says, this proves that the peak contains the substance of interest, but does not prove that there's nothing else in the peak). The second requirement is that the lab compare the relative abundance of three diagnostic ions in the reference peak and the sample peak. Since the standard for comparison is somewhat lax (up to +/- 25% in some cases), again it seems to me that a peak could be identified under the criteria in TD2003IDCR and still have a significant specificity problem.
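Here is a sketch of the kind of relative-abundance comparison I mean - the flat +/- 25% window is a simplification of TD2003IDCR's actual tiered tolerances, and the ion masses below are made up for illustration:

```python
def relative_abundances(intensities):
    """Normalize ion intensities to the most abundant (base) ion."""
    base = max(intensities.values())
    return {mz: i / base for mz, i in intensities.items()}

def ions_match(reference, sample, tolerance=0.25):
    """True if every diagnostic ion in the reference is present in the
    sample with a relative abundance within +/- tolerance (relative) of
    the reference value. A simplified stand-in for the TD2003IDCR
    comparison, not the document's full rules."""
    ref = relative_abundances(reference)
    sam = relative_abundances(sample)
    for mz, r in ref.items():
        if mz not in sam:
            return False
        if abs(sam[mz] - r) > tolerance * r:
            return False
    return True

ref_ions = {432: 1000.0, 417: 400.0, 342: 250.0}   # reference peak
ok_sample = {432: 800.0, 417: 350.0, 342: 210.0}   # passes the check
bad_sample = {432: 800.0, 417: 150.0, 342: 210.0}  # m/z 417 is off
```

Note what the check does not do: a co-eluting substance could add its own ions to the sample peak without pushing these three ratios outside the window, which is exactly the specificity gap I'm pointing at.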

Your point about maintaining chromatographic conditions is a good one. The method that LNDD adopted under ISL 5.4.4.2.1 for achieving specificity MUST have addressed the chromatographic conditions identified by the lab for this purpose. Presumably, the lab's ISL 5.4.4.2.1 chromatographic techniques are embodied in their SOP. If the lab failed to follow its SOP on chromatography, IMHO we have a clear and obvious ISL departure, setting up a burden flip. This is one reason why the column switch is so important. If LNDD really did use the wrong column for any part of its CIR testing, then it's hard to see how LNDD complied with 5.4.4.2.1.

Still, given the uncertainty around whether there really WAS a column switch, we should continue to focus on whether the ISL required LNDD to produce the mass spectrum data.

On to the "egg" analogy. You're absolutely correct when you suggest that if the lab's chromatography results look bad enough, the lab has to do them over (or check the mass spectrum) even if they followed their specificity techniques to the letter. And you're correct when you suggest that we can't tell under the ISL exactly when the lab has to perform a "do over" -- this seems to be a judgment call with no hard criteria available to guide our judgment, and where reasonable minds can differ. I think you're going too far to suggest that a lab could justify a completely screwed up piece of chromatography on the grounds that the lab did the chemistry right and set up the GC conditions correctly. At some point the results speak for themselves, and we have to conclude that the lab screwed up the sample preparation, or that the GC wasn't working right on the day of the test, regardless of what the lab says to the contrary. (Of course, it's not helpful here that WADA rules presume that the lab performed in accordance with the ISL.)

I think the bigger problem is not whether a lab can give us garbage chromatography and pretend in effect that the egg is not scrambled. I think the bigger problem is whether a lab can give us chromatography that looks reasonably good but where there are hidden co-eluting peaks (pretending in effect that a hard boiled egg contains only egg white.) In such a case, we'd only know we had a problem if we looked at the mass spectrum data (sliced the egg). We DO want to figure out a way we can force the lab to do that.

Is the egg analogy moot? I don't think so. But please go back to my 7:34 pm post of last night, and let's carefully consider what we mean by specificity. And think about whether a fried egg is an egg with specificity. (Hint: I think this is a trick question!)

Mike Solberg said...

Larry, you really have to break this down more. You just cover too much ground in one post for me to respond intelligently. I would love to go into all this, as I think it is all worthwhile, but please help me by breaking it down into more manageable chunks!

I'm going to start with this one - you wrote:

You're also right that TD2003IDCR prefers mass spectrometric detection as an identification technique. But note that the requirements for mass spectrometric detection under TD2003IDCR are aimed at peak identification, and not at specificity.

I am not sure that is right. As you know, the ISL points to td2003idcr at 5.4.4.1.1: The Laboratory must develop as part of the method validation process acceptable standards for identification of Prohibited Substances. (See the Technical Document on Identification Criteria for Qualitative Assays.)


Now, of course, testosterone is not a prohibited substance. Only exo-t is. So, to replace "prohibited substance" with "exogenous testosterone," the ISL says that "the laboratory must develop as part of the method validation process acceptable standards for identification of exogenous testosterone. (See the Technical Document on Identification Criteria for Qualitative Assays.)"

Then td2003idcr begins with the two sentences: The appropriate analytical characteristics must be documented for a particular assay. The Laboratory must establish criteria for identification of a compound.

So again, making the replacement, it would read: "The laboratory must establish criteria for identification of exogenous testosterone."

But remember that exogenous testosterone has not been identified until you have measured the CIR of a pure IRMS peak and found a difference of more than three per mil (the "delta/delta") compared to the ERC.

So, in the case of exo-t, the requirements of td2003idcr cannot be aimed at peak identification, but have to be aimed at making sure that you are measuring the CIR of the right stuff in the IRMS - specificity. Again, that is because you are not measuring a "prohibited substance" until you are measuring the CIR of a clearly pure (specific) peak in IRMS.
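Spelled out in code, with the standard delta notation - the VPDB ratio is the conventional constant, but the sample values and the simple "3 per mil" cutoff below are illustrative assumptions, not LNDD's actual procedure:

```python
VPDB_RATIO = 0.0112372  # 13C/12C of the VPDB reference standard

def delta13c(ratio_13c_12c):
    """delta-13C in per mil relative to VPDB."""
    return (ratio_13c_12c / VPDB_RATIO - 1.0) * 1000.0

def exogenous(delta_metabolite, delta_erc, threshold=3.0):
    """Flag if the metabolite (e.g. 5aA) is depleted in 13C relative to
    the endogenous reference compound by more than the threshold -
    the "three delta/deltas" criterion."""
    return (delta_erc - delta_metabolite) > threshold

# Illustrative: ERC at -23 per mil, metabolite at -27 per mil -> flagged.
flagged = exogenous(-27.0, -23.0)
not_flagged = exogenous(-24.0, -23.0)
```

The whole point of the specificity argument is that delta_metabolite in this calculation is only meaningful if the IRMS peak it came from contained nothing but the metabolite.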

Hmmmm...maybe I have found a legitimate path to an ISL requirement of assuring specificity?

The question is, would WADA still claim that gas chromatography separation (without support from complete mass spec data) assures pure peaks, i.e. assures specificity? I don't know.

Actually, what I think this reveals is that td2003idcr has limited applicability to identification of exogenous testosterone. It was clearly designed for control of GCMS, not control of GC/C/IRMS.

By the way, I know you are trying to get me to further define specificity, but I still don't get what the issue is with the eggs.

syi

Larry said...

swim, I've managed to wipe out two different messages I've tried to compose to you in the past 18 hours.

I'll try to get a decent reply to you later. In the meantime, if you have a moment, check out the WADA prohibited substance list. You'll see that endogenous testosterone is listed as a prohibited substance! It's the worst piece of legal drafting I've seen since last Wednesday. The list then contains some terse language to the effect that a sample is not deemed to contain endogenous testosterone unless there's something about the sample to lead you to believe that not all the testosterone IS endogenous! Ergo, endogenous testosterone is not endogenous testosterone unless you suspect that some of it is exogenous.

Which is why we lawyers have the reputation we have today.

I'm pointing this out not just because it's bizarre, but because it DOES speak to what you last said about having to detect exogenous testosterone. More later.

Mike Solberg said...

A long time ago, in a thread far away, TbV wrote:

The IDCR says what you are supposed to do to identify some of the substance in the peak. It does nothing to say there should be nothing else in the peak, and what suffices to demonstrate the purity of the peak. This is an omission in the standards, and there should be something. In its absence, we seem free to conclude there is no requirement of the sort, which is the position I'd expect to see argued in the case by the ADA; or that the standard implies purity or the science is meaningless, and the absence of the proof is a violation, which I'd expect Landis to argue. Logically, the Landis position seems correct, but a literal reading of the rule would go the other way. It's an example of where "strict constructionist" reading leads to a conclusion that is "correct" because of a flaw in the rules, not by the science. (A comment in the "A Dangling Issue" post, begun on October 28.)

This problem is why "we" have believed for some time that we need to settle the legal status of the specificity question. Most basically the science issue is "When LNDD measured the CIR of the 5aA peak, were they measuring only 5aA, or was there some other substance in there altering the CIR measurement?"

But legally, the question is "Is there a standard LNDD must meet to show that it has measured only 5aA, and if so, have they met it?"

TbV says above that the controlling legal document, td2003idcr, "does nothing to say there should be nothing else in the peak, and what suffices to demonstrate the purity of the peak. This is an omission in the standards, and there should be something."

Until a couple days ago, I agreed with him, but I now think that that is (unfortunately) wrong.

The second sentence of the idcr says "The Laboratory must establish criteria for identification of a compound." And then it gives examples of how compounds can be identified. The first example given is "chromatographic separation" (or what it more specifically calls "capillary gas chromatography," which is what LNDD did, of course).

I now think that, as far as LNDD and WADA are concerned, separation of compounds by capillary gas chromatography "suffices to demonstrate the purity of the peak."

I can think of no other reason why LNDD was not required to produce the mass spec data, and why Landis' legal team did not make a bigger deal of this at the hearing. They realized that USADA would argue that chromatographic separation was "good enough," and knew they couldn't win that battle. So they focused on other things.

The really sick thing is that td2003idcr gives a second example of an acceptable criterion for identification of a compound. It is "mass spectrometric detection" which begins with the sentence, "A full or partial scan is the preferred approach to identification." (emphasis added)

If I understand right (and please someone correct me if not!), this would include the complete mass spec data that would have settled this "specificity" question once and for all.

But LNDD was not required to produce this data, although it was the "preferred" method. Instead, they were able to present the "good enough" method of just doing the gas chromatography, with the legal presumption that everything was separated just fine. That is why Landis' legal team had to engage in a "battle of the experts" regarding whether the chromatography was "good enough." I think that was a hopeless battle, because WADA standards have no detailed criteria (really no criteria at all) for how you would judge whether the chromatography was "good enough." With the legal presumption given to LNDD/USADA there was no way to win that battle of the experts.

(Just to be clear, I know they did selected ion monitoring with the GCMS, but that only proves that there is some 5aA in that peak; it does not prove that there was nothing else in the peak.)

There is more to say, but I'll leave it at that for now.

syi

Mike Solberg said...

Larry wrote: swim, I've managed to wipe out two different messages I've tried to compose to you in the past 18 hours.

I hate it when that happens! As advised, I am trying to be patient.

That's amazing what you noticed about endogenous testosterone being on the list. I guess WADA actually has a 100% false negative rate!

syi

DBrower said...



(Just to be clear, I know they did selected ion monitoring with the GCMS, but that only proves that there is some 5aA in that peak; it does not prove that there was nothing else in the peak.)


Just to be clearer, the SOP in MAN-52 for the GCMS says quite clearly "full scan 50-550 uma" on the last line.

They may only have looked at the selected ions, but they were supposed to have collected the full scan per the SOP.

TBV

DBrower said...

Let me ask this: can they prove they followed the SOP and collected the full scan?

If they can't prove it, is that a violation by itself?

If their SOP said they should collect a full scan, and they did SIM instead, that seems wrong.

TBV

Mike Solberg said...

You know, after doing some more reading about possible specificity problems (i.e. full or partial co-elution issues), I am beginning to think this is a meaningless discussion given our particular set of circumstances.

Even if LNDD did have the complete mass-spec data for the GCMS peaks, it wouldn't make any difference, because we know that they did not maintain consistent chromatographic conditions between the GCMS and the IRMS.

With different chromatographic conditions, stuff moves around, and yes, depending on the substance, it can move around a lot. Some substances can move around a lot, even if other ones don't move at all. Given different chromatographic conditions, there is no way to determine whether something co-eluted with the 5aA (or any other peak) in the IRMS.

TIC, SIM, full mass spectrum, retention time, relative retention time - it doesn't matter, because they didn't keep the chromatographic conditions the same.

Obviously, this is the point of Herr Doktor's combo GC-MS/GC-C-IRMS creation, mentioned in the main post, in which the chromatographic conditions are guaranteed to be identical.

Note that I am not just saying that LNDD didn't provide some information they should have, and so they cannot assure specificity. There is no possible information they could provide, because (you guessed it) they did not maintain consistent chromatographic conditions between GCMS and IRMS.

This reveals the limitation of td2003idcr. It doesn't even talk about IRMS.

I don't know how the need for consistent chromatographic conditions gets applied legally, but the whole assay makes no sense without it. And, Larry, if the law is almost always on the side of common sense, Floyd will be riding strong next year.

syi

Larry said...

TBV -

Regarding your 11:26 AM post, first, the lab has the benefit of the presumption that they followed the ISL. So FL's team has to prove that LNDD did not take a full scan.

Second, I'm not following you. Isn't that a TIC on USADA 321? It seems to say "TIC" on the graph. It says that they used MAN-52 to take the scan. Why doubt that it's a full scan?

I've never been 100% sure what the graphs are on USADA 322, but I've decided that these must be the graphs showing the presence of the three diagnostic ions for each peak identified in the TIC on USADA 321. There appear to be three captions at the top of each of these graphs, and while they're not readable, I figure they must identify the three identifying ions. I don't know whether any ions are repeated from graph to graph, or whether the seven graphs on USADA 322 are measuring 21 different ions. Is it the presence of these graphs that has led you to wonder whether LNDD performed a full scan? I think that these graphs CAN be produced from a full scan.
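For what it's worth, a TIC and the per-ion traces are just two reductions of the same scan data, so graphs like these can indeed be produced from a full scan - here is a sketch (the data layout and values are my assumptions):

```python
def tic(scans):
    """Total ion chromatogram: summed intensity over all m/z, per scan."""
    return [(rt, sum(peaks.values())) for rt, peaks in scans]

def eic(scans, mz):
    """Extracted ion chromatogram for one m/z - the kind of per-ion trace
    that could come either from a full scan or from SIM."""
    return [(rt, peaks.get(mz, 0.0)) for rt, peaks in scans]

# Two toy scans; real full-scan data would have hundreds of m/z values.
scans = [
    (10.0, {241: 100.0, 270: 40.0}),
    (10.1, {241: 500.0, 270: 200.0}),
]
```

Which also means the per-ion graphs by themselves don't prove a full scan was collected; the same pictures can be drawn either way.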

I think I must be missing something. Can you elaborate?

DBrower said...

I don't think the TIC chart on USADA 322 demonstrates collection of full spectra. It shows the TIC of the three signals M1, M2, and M3 that are identified.

I don't know that they did a full scan. I do know that the MAN-52 SOP calls for a full scan.

You may be right that there might be 21 different ions represented in the graphs, which might be consistent with a full scan, or it might be consistent with SIM of 21 ions.

We are never shown a full-scan graph of anything.

TBV

Larry said...

TBV, doesn't USADA 321 show a full scan?

Larry said...

syi -

Now you see why I try to respond so quickly and with so much information. Here I am, your faithful correspondent, stuck in an automobile for 3 hours today, pondering your 10:32 AM post, all thoughtful-like. And I come home, tired and discouraged, looking forward to responding to your 10:32 AM post ... only to find that you effectively withdrew the post at 7:31 PM.

OK, let's deal with the 7:31 PM post. Yes, the main point you're making in the 7:31 PM post is a good one. When it comes to CIR testing, we do NOT care about GC/MS peak purity for its own sake. We care about GC/MS peak purity only if it indicates GC-IRMS peak purity. And if the lab screws around enough with the chromatographic conditions for the GC-IRMS relative to the GC-MS, then GC-MS peak purity tells us nothing at all about GC-IRMS peak purity.

But with this issue (like with many issues in this case), we have too much uncertainty to say anything for certain. There's every possibility that LNDD actually used the correct columns in its CIR GC-MS and GC-IRMS tests, and that the REPORTING of a column switch was in error. As for the difference in the temperature ramps, that difference is built into the SOP, which means that the difference is intentional. It also means that the difference should have been considered by scientists outside of LNDD when the lab received its accreditation. Personally, the difference in MS and IRMS temperature ramps makes no sense to me (and for a while at least, it made no sense to OMJ). But we have to consider the possibility that there are different temperature ramps for a good scientific reason, even if we don't know what it is yet (I'm not going to ASSUME that there's a good reason for the temperature ramps, I just think we should be cautious that a good reason may emerge).

From the standpoint of law, we're not directly concerned with whether the science is good, we're concerned about whether the science followed the rules. So if we think the lab made a mistake in failing to maintain chromatographic conditions from MS to IRMS, we have to point to the rule that required them to do so. I haven't seen any rule yet that expressly addresses the need to preserve chromatographic conditions, but we DO have ISL rule 5.4.4.2.1. So ... haven't we now come full circle, back to the need to understand this rule?

Back to the need to define what we mean by "specificity"? And the egg analogy?

Consider the fried egg. Imagine the fried egg flying through the end of the GC column, yolk first, whites trailing. There's a point where the RTs for the egg contain just yolk, then we hit RTs where we have both yolk and whites, then finally there are RTs with just whites. Kind of like a chromatogram with two overlapping peaks. And in my mind, I'm imagining that you get two overlapping peaks on a chromatogram precisely because (as with the fried egg), you have not managed to completely separate two substances. You've got two blobs of stuff, you know that there are two blobs, but the two blobs are stuck together -- sort of the "conjoined twins" of the GC world. Is this specificity, or not?
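[Ed. note: the conjoined-peaks picture above can be made concrete with a toy calculation. The peak positions, widths, and integration window below are invented for illustration; the point is only that an integration window drawn for one peak inevitably captures some area from an overlapping neighbor.]

```python
import math

def gaussian_area(mu, sigma, lo, hi, n=2000):
    """Midpoint-rule integral of a unit-area Gaussian peak over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * step
        total += math.exp(-0.5 * ((t - mu) / sigma) ** 2)
    return total * step / (sigma * math.sqrt(2.0 * math.pi))

# Invented peaks: "yolk" eluting at 10.0 min, "whites" at 10.5 min,
# both 0.2 min wide, and an integration window drawn for the first peak.
window = (9.4, 10.25)
yolk_captured = gaussian_area(10.0, 0.2, *window)    # most of peak 1
whites_captured = gaussian_area(10.5, 0.2, *window)  # leakage from peak 2
```

With these invented numbers, roughly a tenth of the trailing peak's area lands inside the first peak's window, which is exactly the sort of contamination the integration-limit arguments are about.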

DBrower said...

Larry, what I see in USADA 321 is a graph at the top labelled TIC, that doesn't say what total it contains. Later, we see target response per peak per signal.

As far as I know from what is presented, the TIC is only the TIC of the three signals listed below. I do not know if there is a full-scan, by which I'd expect to see a graph labeled from 50-550 with target responses per m/z.

TBV

DBrower said...

the difference should have been considered by scientists outside of LNDD when the lab received its accreditation.

It is not clear that there is anyone outside the lab that looks at the details of any test in a "fitness for purpose" review. Such a review should have found the retention time error-by-design, and the lack of internal standards for use in direct identification or for creation of Kovats indices.

My understanding is that the SOP document is just taken by the accred. agency as what they are supposed to do, and they check to see it is followed -- not whether it is correct.

There appears to be no specific ISL rule for the correctness of a test protocol, nor any such check by the accrediting agency. One might take this as an error in drafting, or as an intentional omission.

TBV

Mike Solberg said...

Ah, finally you ask the question in a way I can understand! Blobs. I understand blobs.

You've got two blobs of stuff, you know that there are two blobs, but the two blobs are stuck together -- sort of the "conjoined twins" of the GC world. Is this specificity, or not?

My simple answer is no, that is not specificity.

The more complex answer is maybe, maybe not. It depends on what you mean by "stuck together," or "conjoined twins." If that implies overlap, then according to Brenna, any degree of peak overlap is not good. Clear peak separation is critical to clear results. But Brenna also says that with less than ideal peak separation, there are things you can do, and it depends on where and how you place the integration limits. For IRMS Brenna says you can perhaps compensate for up to 70% co-elution, if the CIR of the two substances is the same. (What if the CIR is not the same, or unknown? That was the source of the scene at the hearing with WMA asking for a calculator.) Brenna also says that if there is a third substance of any size involved, you have to start over and get better separation.

If "stuck together" just literally means two discrete substances stuck together, like with a strip of two-sided duct tape, with no "overlap," then yes, I would think you have specificity.

So, it depends on how much your blobs are connected / overlapped.

syi
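[Ed. note: Brenna's "same CIR" point can be sketched numerically. To a first-order approximation, the measured delta of a blended peak is the abundance-weighted mean of its components' deltas; the delta values and fractions below are invented.]

```python
def mixed_delta(delta_a, delta_b, frac_a):
    """First-order mixing: the delta of a blended peak is roughly the
    carbon-abundance-weighted mean of its components' deltas (exact
    mixing is slightly nonlinear in ratio space)."""
    return frac_a * delta_a + (1.0 - frac_a) * delta_b

# Invented delta values. Same CIR in both substances: overlap is harmless.
same = mixed_delta(-27.0, -27.0, 0.7)
# Different CIR: a 30% co-elute drags the blend toward its own value.
shifted = mixed_delta(-27.0, -21.0, 0.7)
```

This is why an unknown co-elute is the dangerous case: with an unknown CIR, there is no way to back the contaminant's contribution out of the blended number.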

Larry said...

TBV, please elaborate further. I thought that the "T" in TIC means a full scan, that a chromatogram based on a selective SIM scan would be called something else (a "SIC"?). Obviously I could be wrong about this. But if the "T" in TIC does not mean full scan, then wouldn't the graphs on USADA 322 ALSO be "TIC"s?

Yes, I'm probably showing an overly lawyerly concern for definitions. But we don't have much else to go on.

Are your suspicions about the total scan based on anything more than the presence of the graphs on USADA 322?

syi, you may have fallen into a trap here. Arguably, peak overlap can't possibly mean no specificity, since we have different rules in the ISL for dealing with peak overlap, and since the various scientists have more or less accurate ways to resolve peak overlap issues. As you suggested, peak overlap seems to be a problem that presents itself in various degrees. In contrast, specificity (at least as it is presented in ISL 5.4.4.2.1) seems to be an either-or proposition: either you have it or you don't.

But I will skip ahead about 5 steps and ask, what does the mass spectrum of the area of peak overlap look like? If there was a third substance co-eluting in the area of the peak overlap, would we be able to detect it with the mass spectrum?

Mike Solberg said...

More about your post later, Larry, but I just wanted to make sure that you have read the majority decision paragraphs 156 and 157:

156. The ISL provides in Article 5.1, "Any aspect of testing or management not specifically discussed in this document [the ISL] shall be governed by ISO/IEC 17025 and, where applicable, by ISO 9001." Therefore, violation of ISO 17025 can become a violation of the ISL.

157. Therefore, violations of the ISO 17025 or of WADA Technical Documents can be violations of the ISL for purposes of rebutting the initial presumption favouring the Lab that an AAF has been established. However, that of itself does not mean that the AAF does not amount to an anti-doping rule violation. The Panel must weigh the evidence to determine if the violation affected the AAF. If that is the case then the anti-doping rule violation may not have been made out at law.


That seems like a pretty strong statement about ISO 17025, which may help in the specificity discussion.

syi

DBrower said...

Larry,

I'm saying "total" sometimes means "total", and sometimes it means "the total of the things I'm totalling".

I'm sure you are familiar with spreadsheet errors where the @sum(a1:a10) cell labelled "Total" should have been @sum(a1:a20) instead.

The only data evidence I see is that they were looking at three ions later on USADA322, so I can reasonably suppose their total was from @sum(a1:a3), not @sum( m50:m550 ). Since they don't say, we don't know. There is no other evidence of additional data in the documents that we've found. The missing pieces are plots going from mz=50 to mz=550 -- either for the total scan, or the scan time for any particular peak.

You asked elsewhere what a full scan would look like for a co-eluted peak. Let's say that pure substance A has proper mz peaks at 63, 81 and 127, and pure substance B has peaks at 31, 77 and 92. A full scan over the region in question showing overlap would have data for 31, 63, 77, 81, 92 and 127, I believe.

To use the fried egg analogy, you have two signals, W and Y indicating the color. You put the egg face down on your flatbed scanner, and get a rasterized image that has pixels either W or Y. If you compute the average value of a row or column, some will be W, and some will be between W and Y. (You won't get any that are all Y unless it is a very deformed fried egg).

[And if you want to be fancy, you can do an FFT on the mess, and reach a different family of plots, but that's not really relevant here, since they don't do such things.]

TBV
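[Ed. note: TBV's two-substance example can be checked mechanically. A sketch with invented intensities, showing that the co-eluting region's full-scan spectrum is the union of both substances' fragment ions.]

```python
# Invented fragment intensities for TBV's hypothetical substances:
# pure A fragments at m/z 63, 81, 127; pure B at m/z 31, 77, 92.
spectrum_a = {63: 100, 81: 60, 127: 40}
spectrum_b = {31: 90, 77: 55, 92: 30}

def overlap_spectrum(sa, sb, frac_a=0.5):
    """In the co-eluting region, a full scan records the union of both
    fragment sets, blended by each substance's relative abundance."""
    mz = sorted(set(sa) | set(sb))
    return {m: frac_a * sa.get(m, 0) + (1.0 - frac_a) * sb.get(m, 0) for m in mz}

mixed = overlap_spectrum(spectrum_a, spectrum_b)
# Every fragment of A and of B appears; a SIM watching only A's three
# ions would never notice B's 31/77/92 sitting in the same peak.
```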

Larry said...

TBV -

I understand that your sense for what might be wrong with LNDD's data is pretty good, even in those cases (like this one, regarding your concern about the totality of LNDD's total ion chromatograms) where I haven't figured out where your concern is coming from. But with regard to the totality of LNDD's scans ... I think without better evidence, we're going to run smack into the WADA presumption that LNDD performed in compliance with the ISL.

I've been thinking about what it would mean if we COULD prove that LNDD's "TIC"s are based on SIMs. Obviously, we could not get good mass spectrum data from a SIM, but since we don't have ANY mass spectrum data, that's kind of a moot point (a "mute" point, if you like). I guess that a SIM might minimize the appearance of any matrix interference, and might throw off the relative heights of the MS peaks. A SIM probably would NOT affect retention time. (This is pure speculation on my part, and I'd stress at this point that I'm not a scientist, except that Ali says I'm not allowed to do that any more.) Do you have a more specific concern on how using SIM chromatograms could screw up these tests?

This leads me into a question I've wanted to ask anyway, which has to do with the breadth of the scan used in the IRMS. The IRMS is supposed to scan for ions at 44, 45 and 46 -- is this the same as referring to mz of 44, 45 and 46? I understand that ion 44 is supposed to correspond to combustion of metabolites containing C12, that ion 45 is supposed to correspond to combustion of metabolites containing C13, and that ion 46 is supposed to correspond to combustion of metabolites containing C14 (which we don't really care about here). If so, then it wouldn't seem that the IRMS actually excludes anything with this scan (unless there's some carbon 15 floating around in the universe that no one told me about). So it would seem to me that an IRMS test requires all ions in our peaks of interest to be scanned and measured. Is this correct?

TBV, is THIS where your concern lies? That somehow, there were ions in our peaks of interest that were not scanned in the IRMS? If so, I didn't know that an IRMS had the ability to filter out ions before they are combusted.

(By the way, I DO understand that the IRMS includes three ion detectors, so that the process of detecting ions in the IRMS is not the same as for the MS, where there is one ion detector that can only detect ions at a single mass-charge at any moment. Notwithstanding, it does appear that the IRMS can be tuned to collect three ions of varying mass-charge, perhaps because not all CIR analysis focuses on the carbon.)

Larry said...

TBV -

On the mass spectrum for the area of peak interference, you wrote:

You asked elsewhere what a full scan would look like for a co-eluted peak. Let's say that pure substance A has proper mz peaks at 63 81 and 127, and pure substance B has peaks at 31 77 and 92. A full scan over the region in question showing overlap would have data for 31, 63, 77, 81 and 92, I believe.

I appreciate your using a simple example, but I wonder whether a more real-world example would produce results capable of this kind of analysis. If we're talking about substances that might have dozens of characteristic ions, it might not be so easy to sort things out. Moreover, if there's a third substance co-eluting in the area of interference, unless that substance contained a significant presence of ions that did not exist in the first two substances, I'm not sure that you could reliably detect the presence of the third substance.

Maybe you could suggest something I can read that discusses exactly how mass spectrum data is used to identify that a peak is not pure. The books sometimes use the analogy that the mass spectrum is like a "fingerprint" that uniquely identifies a peak. If so, then what happens to the fingerprint if there are two substances in a peak? Is it like you suggested, that the scientists can detect and identify two fingerprints? Or is it more like the "fingerprint" comes out as smudged and unidentifiable?

Mike Solberg said...

I think I have figured out a little about the graphs on USADA 321 and the similar pages. The graph at the top of these pages, labeled "TIC" (followed by a shortened version of the sample id), is a "total ion chromatogram." It tells us the abundance of all ions eluting at a particular time. But it doesn't tell us anything about which ions are eluting at any particular time.

The graphs on USADA 322 are not directly related to the graph on 321. They are "selected ion monitoring" graphs of three particular ions in the peaks of interest. Because substances have characteristic ions, the lab knows which ones to look for in any given peak. The labels of each individual graph are impossible to read in some cases, but the same information is recorded on the previous page. For example the 5aA graph on 322 is measuring the main (target) ion 316, and two others, 241 and 256. (Note by the way that 5aA and 5bA are very similar - the same ions are being monitored, just in different order/abundance.)

So the combo of 321 and 322 does tell us that there is 5aA (or whatever) in that peak, but it does NOT tell us what else is in that peak. That would require monitoring of any and all other ions that were in that peak. That's what a complete mass spec graph would do for any given peak.

syi

Mike Solberg said...

Two more things to note about the graphs on 321 and 322: First, the TIC graph on 321 does give the total ion count (abundance) for any peak. But I don't think there is any way to figure out, based on the available data, what the ion count is for just the 5aA (for example). Even if you could add the ion count for the three monitored ions, there would still be other ions of 5aA present in the peak, unaccounted for. So, as best I can tell, there is no way to add things up and see if you get to a total ion abundance of 3.5 million (the y-axis of the TIC 'gram) for the 5aA peak. In other words, I think there is insufficient data to use the TIC graph to help you decide whether there is anything else in the 5aA peak (for example) on 322.

Second, note from the 5bA and 5aA graphs on 322 that there are clearly other ions/substances that are not being monitored at those same retention times. We know from the TIC 'gram (321) that there is some substance between the 5bA and the 5aA, but no such substance shows up on the 5bA or 5aA 'grams on 322. That's just a further way to say that there is more information about these peaks that is not given in the graph, and again, that those graphs cannot be used to decide whether there is anything else in those peaks.

syi

DBrower said...

Mike, I don't see why you assume the thing labeled TIC is the count of all masses. I accept that it is a total of multiple ions, I just have no idea which ones were counted.

Your note that there appears to be something in the TIC between 5ba and 5aa not in the SIM graphs may sway me otherwise.

Why do you say that peak can't be used to decide if there is anything else? It certainly indicates the presence of something not expected, I think.

TBV

DBrower said...

I haven't figured out where your concern is coming from...

Mostly, it comes from a working position they did something wrong. So I prod and poke wondering what could it be.

Do you have a more specific concern on how using SIM chromatograms could screw up these tests?

It only screws up the tests to the degree that they should have taken, and I submit examined, the full scans for co-elutes and/or matrix interference. Taking the data was required by SOP, and it seems absurd to suggest they were to take it, but not consider it. There is a reason the SOP calls for a full scan.

I see little (formerly no) evidence they took the full scans. SYI may convince me otherwise with TIC peaks.


This leads me into a question I've wanted to ask anyway, which has to do with the breadth of the scan used in the IRMS. The IRMS is supposed to scan for ions at 44, 45 and 46 -- is this the same as referring to mz of 44, 45 and 46?


Yes.


I understand that ion 44 is supposed to correspond to combustion of metabolites containing C12, that ion 45 is supposed to correspond to combustion of metabolites containing C13, and that ion 46 is supposed to correspond to combustion of metabolites containing C14 (which we don't really care about here). If so, then it wouldn't seem that the IRMS actually excludes anything with this scan (unless there's some carbon 15 floating around in the universe that no one told me about). So it would seem to me that an IRMS test requires all ions in our peaks of interest to be scanned and measured. Is this correct? TBV, is THIS where your concern lies?


Er, no. It all got burned up. All that is left is the C and the O of the resultant CO2. Unless we are concerned with the Oxygen, there's no need to collect anything else in the IRMS.

That somehow, there were ions in our peaks of interest that were not scanned in the IRMS? If so, I didn't know that an IRMS had the ability to filter out ions before they are combusted.


It doesn't -- that's the job of the chemistry and chromatographic separation, and you judge the effectiveness by the identification and specificity in the GCMS.

TBV
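[Ed. note: once everything is burned to CO2, the delta value comes straight from the m/z 45 and 44 beam currents, which is why the IRMS never needs to collect anything else. A simplified sketch, ignoring the 17O correction real instruments apply; the beam currents below are invented.]

```python
# One commonly quoted 13C/12C ratio for the VPDB reference standard.
R_VPDB = 0.011180

def delta13c(i44, i45, r_std=R_VPDB):
    """Per-mil delta-13C from the m/z 44 and 45 beam currents of the
    combustion CO2. Simplified: a real instrument first corrects the
    45 beam for the 17O isotopologue (12C-17O-16O)."""
    r_sample = i45 / i44
    return (r_sample / r_std - 1.0) * 1000.0

# Invented beam currents: a sample slightly depleted in 13C.
d = delta13c(1_000_000.0, 11_068.0)   # roughly -10 per mil vs VPDB
```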

Mike Solberg said...

About the TIC, I think that is a sum of all the ions in the whole spectrum because I think that is just what a TIC is. The page m noted from the ACMS says

For GC/MS, LC/MS or other combinations, the data consists of a series of mass spectra that are acquired sequentially in time. To generate this information, the mass spectrometer scans the mass range (e.g., m/z 30-500) repetitively during the chromatographic run. If a scan is taken every second and the run is 30 minutes long, 1800 spectra are recorded. This information may be displayed in several ways as shown in Figure 15. First the intensities of all the ions in each spectrum can be summed, and this sum plotted as a function of chromatographic retention time to give a total ion chromatogram (TIC) whose appearance is similar to the output of a conventional chromatographic detector.

And then I think, as you noted, the presence of the bump between 5aA and 5bA in the TIC on 321, but not in the SIM 'grams on 322, is supporting evidence.

I would say that probably does mean that LNDD did a full scan (from 50 - 550), as the parameter printout on USADA 304-305 says. But nothing in the LDP (other than these TICs) seems to use that information. That, presumably, is exactly what was erased from the hard drive. Possibly exculpatory evidence, gone - evidence which they collected, and which according to TD2003IDCR is "the preferred approach to identification." Sick.

Perhaps they would argue that they had to use SIM because the substance was at the "minimum required performance limit." But I don't know what the evidence of that is. I'll have to look again to see if that is documented in the LDP anywhere.

Why do you say that peak can't be used to decide if there is anything else? It certainly indicates the presence of something not expected, I think.

I don't get this question. Which peak? On the TIC or on the SIM? How does it indicate the presence of something not expected? Maybe you mean the little peak between the 5bA and 5aA on the TIC?

When I said the peak could not be used to decide if there was anything other than 5aA (for example) in the peak, I was referring to the 5aA SIM on USADA 322. They looked for three ions and found three ions, but didn't look to see if there was anything else there. That's why the 5aA SIM 'gram/peak cannot be used to decide if there was any other substance present.

syi

Larry said...

TBV, some additional thoughts on the totality of the TIC shown on USADA 321 and the corresponding USADA pages. And forgive me for telling you things you already know and belaboring the obvious:

1. The three ions for each substance shown on USADA 322 are identified on the chart shown on USADA 321. So, the ions used for the IS were mz 258, 243 and 204.

2. The chart on USADA 321 also shows the response for each of the three ions measured for each peak. In the case of each peak, the combined response of these three ions appears to be less than the size of the peak. Take 5bP as an example. The size of this peak appears to me to be about 4.6 million. The combined response of the three ions shown for 5bP is about 3.7 million. So I think there must be other ions being measured in this peak.

(to be honest, I'm not sure about the validity of this calculation. If I want to measure the total "response" represented by a MS peak, presumably I should be measuring peak area and not peak height. Any thoughts on this?)

3. The analysis performed by LNDD for the T/E tests appears to me to BE a SIM analysis. And the paperwork included in the document package for the T/E tests is different from the MS CIR paperwork, in ways that seem to me to be material to the nature of the scan performed in each test. I won't try to detail all of the differences, I'll just let you contrast, compare and tell me what you think.
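[Ed. note: on the caveat in point 2, response is conventionally measured as peak area, not apex height, so comparing a peak's height to summed ion responses can mislead. A minimal sketch with invented peaks:]

```python
def trapezoid_area(times, heights):
    """Integrate a chromatographic trace: total response is the area
    under the peak, not its apex height."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (heights[i] + heights[i - 1]) * (times[i] - times[i - 1])
    return area

# Two invented peaks with the SAME apex height but different widths:
narrow = trapezoid_area([0.0, 1.0, 2.0], [0.0, 100.0, 0.0])
wide = trapezoid_area([0.0, 2.0, 4.0], [0.0, 100.0, 0.0])
# narrow covers half the area of wide, despite equal apex heights.
```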

Russ said...

syi,
I agree with your understanding of tic and sim as you describe. I'd like to try to polish it a bit (from my understanding).

First, you said:
"First the intensities of all the ions in each spectrum can be summed, and this sum plotted as a function of chromatographic retention time to give a total ion chromatogram (TIC) whose appearance is similar to the output of a conventional chromatographic detector."

The TIC, as a sum plot, sums all of the spectral (various mass) components that elute at a given time, for the scan range of masses. This is different from a hardware TIC, which would be all masses for a given elution time.

The SIM, however, selectively plots all elution times (within range) for a given mass.

So another view of the TIC would be that it is a sum plot of all the possible SIM's within the mass range and elution time range captured.

I did notice that there is at least one other type of SIM plot possible where the detector is set to only scan for one mass (or narrow range).

Don't know if this is any help but this is my understanding.

Regards,
Russ
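[Ed. note: Russ's "the TIC is a sum plot of all the possible SIMs" view can be stated in a few lines. The scan matrix below is invented.]

```python
# Invented scan matrix: rows are scan times, columns are m/z channels.
masses = [31, 63, 77]
scans = [
    [0, 10, 0],   # time 0
    [5, 80, 2],   # time 1
    [9, 30, 1],   # time 2
]

def tic(scan_matrix):
    """Total ion chromatogram: at each scan time, sum every m/z channel."""
    return [sum(row) for row in scan_matrix]

def sim_trace(scan_matrix, masses, target):
    """A selected-ion trace: at each scan time, read one m/z channel."""
    j = masses.index(target)
    return [row[j] for row in scan_matrix]

# Summing all possible SIM traces reproduces the TIC.
summed = [sum(vals) for vals in zip(*(sim_trace(scans, masses, m) for m in masses))]
```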

Mike Solberg said...

This "specificity" thread went into a necessary discussion of the Total Ion Chromatograms and the Selected Ion Monitoring 'grams and related matters, but I wanted to get back to specificity directly. Larry has been trying to nail down exactly what "specificity" means in ISL 5.4.4.2.1, but that doesn't seem too difficult to me. One of his points is that obviously "specificity" and "matrix interference" must mean something different, because they are both talked about in the ISL. But I think most of us, myself included, have been conflating the two things in the discussion. We can hardly be blamed for that because my reading is that the majority opinion against Floyd did exactly the same thing. Look at the majority decision beginning at paragraph 232 (paragraph 240 is particularly weak in my view). They quickly quote ISL 5.4.4.2.1 and immediately narrow it down to the "matrix interference" bullet point, skipping over the "specificity" bullet point and several others. That leads them to treat all "poor chromatography" issues under the category of "matrix interference," which is just sloppy legal work and ignorant of analytical chemistry.

In reality, all the terms in 5.4.4.2.1 are familiar terms in the world of analytical chemistry and have particular meanings. Most importantly "specificity" and "matrix interference" are different things.

Specificity is exactly what the ISL says it is: "The ability of the assay to detect only the substance of interest." Or to expand that a little from a nice description I found:

"Analytical specificity" refers to the ability of an assay to measure one particular organism or substance, rather than others, in a sample...Analytical specificity is the ability of an assay to exclusively identify a target substance or organism rather than similar but different substances ... When an assay is analytically nonspecific, it often produces a positive result when the specimen is truly negative for the exact agent being sought. (bold added)

But "matrix interference" has to do with the matrix. (The coolest movie ever, but I digress.) In analytical chemistry the "matrix" is not just everything that is being examined. The "matrix" is the stuff the lab uses and adds in order to make the test easier or more effective or possible in some way. The matrix is different than the "sample." The "sample" is what came from Floyd. The "matrix" is what came from the lab.

5.4.4.2.1 says "The method should avoid interference in the detection of Prohibited Substances or their Metabolites or Markers by components of the sample matrix." Understanding those last two words is key. They mean the "matrix" in which the "sample" is placed for testing.

The discussion in the majority opinion glosses over this distinction (and a couple others in 5.4.4.2.1), to tragic ends. I say "tragic ends" because, while the "matrix interference" bullet uses language of "should" (and thus is suggestive, but not absolutely required), the language under "specificity" is "shall" and "must" - thus required. The arbs dismissed all concerns about "poor chromatography" because of the "should" language under "matrix interference", and ignored the "shall" language of "specificity."

I have shown in another comment thread that the lab report from the July 22 sample is very strong evidence that in the July 20 sample there is something unaccounted for co-eluting with either the 5bA or the 5aA (it is impossible to know which).

The remaining legal challenge is to show that ISL 5.4.4.2.1 (or something else) applies not only to LNDD's process in general, but also to the requirements of particular tests. It seems like a no-brainer to me, but the legal people, like Larry (and jr. over at DPF), seem to think it is more complicated.

I really intended this post as a clarification of where we are at with regard to specificity, and hope that it will help further the discussion.

syi

Mike Solberg said...

I am not sure about something in the above post, regarding the precise definition of "matrix." I'll have to refine my use of the word. I was probably too restrictive, but I am still figuring out whether that nullifies my argument.

syi

Mike Solberg said...

So, here's the beginning of my correction. In analytical chemistry "matrix" often does refer to the stuff into which the sample is placed to aid somehow in testing. However, "matrix" also refers to that part of the sample which is not under investigation.

The matter is very complex for the assay involved in Floyd's test. As Suh said a million times at the hearing, urine is a "dirty matrix," by which he meant that there are lots of compounds in there other than the target compounds. These compounds can distort the tests in various ways, including full or partial co-elution.

What then is the difference between "specificity" and "matrix interference?" Eh, I'm working on that.

syi

DBrower said...

Ah, the glorp congeals just when I'd hoped it was starting to separate.

It appears then, by definition, that stuff that is not specifically what you are looking for is part of "the matrix", interfering with your true target.

So, "you must be specific" and "you should avoid matrix interference" are two sides of the same coin. Except one says "this is redeemable" and the other says "this might be redeemable", and your legal interpreters get to pick which one they want to look at depending on their mood.

Nice.

TBV

Mike Solberg said...

Okay, I figured out the problem. And, surprisingly, it is related to the difference between a "Non-threshold Substance" and a "Threshold Substance" and the applicable rules in the ISL. Sorry, this gets long again.

I now think exogenous T is a "Threshold Substance" rather than a "Non-Threshold Substance." The difference is critical because of the slightly different ISL rule that applies.

The short version is that the CAS should find an ISL violation of 5.4.4.2.2 (not 5.4.4.2.1) and force USADA to prove that the violation didn't cause the AAF, which they won't be able to do.

First, the involved ISL definitions (ISL 3.2):

"Threshold Substance: A substance listed in the Prohibited List for which the detection of an amount in excess of a stated threshold is considered an Adverse Analytical Finding."

And,
"Non-threshold Substance: A substance listed on the Prohibited List for which the documentable detection of any amount is considered an anti-doping rule violation."

Now, on the surface it would appear that exogenous testosterone is a non-threshold substance, because you are not allowed to have any of it in your body - any amount is considered an anti-doping rule violation.

But in reality, the TEST by which exogenous testosterone is detected is a "threshold" test - that is, an analytically determined number must pass a certain limit. The IRMS test has to measure the 13C/12C ratio below (more negative) a certain level. Since the TEST to detect it is a threshold test, exogenous T should be considered a Threshold Substance.

Why does it matter? Well, in the ISL there is slightly different, but critically important wording in the rules that govern threshold and non-threshold substances. The most significant difference comes in ISL 5.4.4.2.1 and 5.4.4.2.2 - the "matrix interference" bullet.

5.4.4.2.1 first - for a Non-threshold Substance:
Matrix interferences. The method should avoid interference in the detection of Prohibited Substances or their Metabolites or Markers by components of the sample matrix.

and 5.4.4.2.2 - for a Threshold Substance:

Matrix interferences. The method must limit interference in the measurement of the amount of Prohibited Substances or their Metabolites or Markers by components of the sample matrix.

The change in that one word is critical, because in the majority decision against Landis the "should" of 5.4.4.2.1 was used as a way to dismiss concerns about matrix interference in the IRMS tests. Specifically, from paragraph 240:

While the Panel acknowledges that it may be ideal to have “perfect chromatography” where there is absolutely no matrix interference and no coeluting peaks, or sloping baselines it is also mindful of the fact that this is not always possible, particularly when dealing with human samples obtained in less than ideal circumstances. The LNDD’s chromatography was according to the experts called by the Claimant good to very good. Although it perhaps could have been better it remains “fit for the purpose” and unquestionably indicates the presence of exogenous testosterone in the Respondent’s “A” and “B” samples. In applying the language of the ISL what is required is that the “method should avoid interference.” The language is not mandatory. Had the drafters intended that matrix interference be avoided it would require wording such as “shall” or “must”. For this Panel to accept the submissions of the Respondent that matrix interference must be avoided would be a misconstruction of ISL 5.4.4.2.1. Dr. Ayotte confirms this statement in noting that a laboratory does not violate Article 5.4.4.2.1 of the ISL just because it produces a chromatogram that contains matrix interference. Therefore, even where matrix interference has occurred in the Stage 17 chromatograms it would not amount to a violation of the ISL. It may be a violation of the standards used by a purely scientific research lab or one that does criminal analytical work; however, the Rules are very direct on this point in stating that only a deviation or departure from the ISL is relevant. Therefore, evidence of scientific or criminal labs and their standards and practises is of no consequence in rebutting the presumption favouring an anti-doping Lab.

But WHY is there a difference between the key wording for a Threshold Substance and a Non-Threshold Substance? Well, it is because with a Non-Threshold Substance it does not matter if, when you are detecting the substance (you don't even have to quantify it), you also detect other stuff along with it. Even then, you know that you have detected the prohibited substance, and as a Non-Threshold Substance that is all that matters. You're guilty.

But with a Threshold Substance, it makes a critical difference if, when you are quantifying (in the case of a Threshold Substance "quantifying" is really a better word than "detecting") a substance, you also accidentally measure something else along with it - it will change the quantification.

That explains the difference between the "permissiveness" wording of 5.4.4.2.1 and 5.4.4.2.2.
And it explains why we could not previously use the "matrix interference" bullet of 5.4.4.2.1 to nail down an ISL violation and were trying to rely on the more restrictive language of the "specificity" bullet. But in 5.4.4.2.2 the language of both bullets is restrictive, and that makes perfect sense given the analytical chemistry difference between a Non-threshold Substance and a Threshold Substance.

Specifically with regard to the IRMS test, it makes a critical difference if, in the peak you are measuring for a CIR, you accidentally have another substance in that peak. Whatever that substance is, it could change the CIR, and in some cases make the CIR more negative, thus tripping the threshold for a positive test.
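That shift can be made concrete with the standard two-component isotope mass balance. This is a generic sketch in Python, and every number in it is hypothetical, chosen only to illustrate the effect:

```python
# Two-component isotope mass balance: the measured delta-13C of a peak is the
# carbon-weighted average of the analyte and anything coeluting with it.
# All delta values and fractions below are hypothetical illustrations.

def mixed_delta(delta_analyte, delta_interferent, fraction_interferent):
    """Measured delta-13C (per mil) when a fraction of the peak's carbon
    comes from a coeluting interferent rather than the analyte."""
    f = fraction_interferent
    return (1 - f) * delta_analyte + f * delta_interferent

# An analyte at -26.0 per mil with a 10% coeluting contribution from a
# substance at -34.0 per mil reads 0.8 per mil more negative:
print(round(mixed_delta(-26.0, -34.0, 0.10), 3))  # -26.8
```

On this model it takes only a 10 percent coeluting contribution from a substance 8 per mil lighter to shift the measured value by a full 0.8 per mil - in some scenarios, enough to push a result past a limit.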

So, because of the specific way the IRMS test works, exogenous testosterone ought to be considered a threshold substance and the "must" language of 5.4.4.2.2 ought to apply.

And that means that the majority decision should not have used the less restrictive language of 5.4.4.2.1 to dismiss all the concerns about the "poor chromatography" of LNDD. And, hopefully, it means the CAS will decide that the proper governing rule is 5.4.4.2.2 and thus hold the lab to the more restrictive requirement, find an ISL violation, and flip the burden of proof to USADA, forcing them to prove that the violation didn't cause the AAF, which in this case would likely be impossible.

And finally, if you are thinking that this is only an attempt to get Floyd off on a technicality (probably only m, who is on vacation from us), you need to think more about the argument I have made. It is a fundamental issue that gets to the heart of the accuracy of the tests, and the realities of the analytical chemistry that underlie the tests.

Three final comments. First, in their Pre-Trial Brief and Post-Trial Proposed Findings of Fact, the Landis legal team characterized 5.4.4.2.1 (Non-threshold) as the controlling rule. I think they just missed the importance of this distinction. Bummer. In their majority decision the arbs followed the Landis team's arguments (sort of inversely) and declared that the "should" rather than the "shall" or "must" gave them reason to dismiss arguments based on this rule.

Also, I know there are questions about whether 5.4.4.2.1 & 2 apply to the actual tests or just the development of the methods used. I suppose there is more legal debating to be done about that, but I will simply note that the majority decision of the arbs sure is based on the idea that it applies to the actual test, not just the method. And C. Ayotte's testimony at the hearing supports that conclusion also.

And finally, just for m, when he gets back, a long time ago in the DPF "New Devastating Corroborative Evidence" thread, May 19, at 12:54 p.m., OMJ wrote:

With respect to the 0.8 error that Floyd’s supporters say should be subtracted from the IRMS values: USADA cites a passage in the rules that indicate this is only for a threshold substance, which is defined as a substance that is illegal only if it is over some determined level. That is not true for exogenous T, which is illegal at any detected concentration. Other labs follow this rule.

Comment: I think this point is still debatable. It's true that exogenous T is considered an AAF at any concentration, but the IRMS test does not determine the amount of exogenous T. It just determines the probability that some T is exogenous. Since the IRMS value itself is used to estimate this probability, it could be argued that the threshold rule should apply to it.


I don't think anyone understood the significance of the distinction at the time.

DBrower said...

I think this has been beaten up lots before, over at DPF and at Toxic by Notorius Will.

Just because the test is a threshold test doesn't make exo-T a threshold substance. The conventional thinking is that the argument it should be treated as a threshold substance will be smacked down because of that.

There is an explicitly enumerated threshold substance list, and exo-T isn't on it.

Life would be much easier if this were not so, and I wish you luck in outlining the theory and argument. I just don't think it is going to fly much longer than it takes to hit the ground by gravity.

TBV

Mike Solberg said...

Well, "Merry Christmas" to you too.

syi

Larry said...

Mike and TBV -

If you want "proof" that testosterone is a threshold substance, see this interview with WADA science director Dr. Olivier Rabin. Rabin interview. In this interview, Dr. Rabin explains the difference between threshold and non-threshold substances:

We've got 2 categories of substances. One is what we call the "nonthreshold substances," which are typically substances that can never be found in your body except for doping purposes. An example is stanozolol, an exogenous drug not produced by your body. If you have access to stanozolol and you don't have a valid TUE, then you will be sanctioned regardless of the quantity of stanozolol found in your urine. This is what we call qualitative analysis.

There is a second class of substances called "threshold substances," for which you need to have a quantitative analysis. An example is testosterone. We know what levels are normally produced by the body. If you've got above the normal range, it's typically a doping case. This requires a quantification of the drug in the urine, and once it's determined that there is an abnormally high level, then the lab can report an adverse analytical finding.


So, Dr. Rabin thinks that testosterone is a threshold substance. And perhaps, so does the drafter of the WADA Guideline for Reporting and Management of Elevated T/E Ratios. This is the official WADA standard for measuring the T/E ratio. This Guideline contains the definition for "Threshold Substance". It never actually uses the definition in the substance of the guide, but it sets forth the definition in its list of defined terms. And the Guideline does NOT bother to define "non-threshold" substance. Does this indicate that someone at WADA thinks that testosterone is a threshold substance?

Mike, I've listed the above to make you feel better. If Dr. Rabin calls testosterone a "threshold substance", and if WADA is throwing the defined term around in its T/E Guidelines, then what are the rest of us supposed to think?

But IMO testosterone is a non-threshold substance.

I'll repeat the relevant ISL definitions for ease of reference. Under the ISL, a "Threshold Substance" is a substance listed in the Prohibited List for which the detection of an amount in excess of a stated threshold is considered an Adverse Analytical Finding. A "Non-threshold Substance" is a substance listed on the Prohibited List for which the documentable detection of any amount is considered an anti-doping rule violation.

Next, check the 2006 Prohibited List. There's a long list of exogenous and endogenous anabolic androgenic steroids shown as prohibited substances on pp. 2 and 3 of the list, with testosterone listed as a prohibited endogenous anabolic steroid. There's no stated threshold shown on the list for any of these substances. (Contrast alcohol on p. 10 of the list, where doping violation thresholds are shown on the list.)

The language of the Prohibited List is something less than a model of clarity, as we've already discussed. The list must grapple with the problem that testosterone is capable of being produced endogenously - all normal people have testosterone in their systems. So the Prohibited List states that a sample will be deemed to contain a prohibited anabolic steroid if there is something about the presence of the steroid -- its concentration, or its metabolites or markers, or any "relevant ratio(s)" -- that is not consistent with normal endogenous production. Mike, you're right to point out that the amount of testosterone in a sample is one indicator that a portion of such testosterone must be exogenous. But the language of the rule is broader -- if there's ANY reliable indicator that's inconsistent with purely endogenous production of a target anabolic steroid, then that indicator can be the basis of an adverse analytical finding.

The clincher language is in the second paragraph of text on page 3:

"In all cases, and at any concentration, the Athlete’s sample will be deemed to contain a Prohibited Substance and the laboratory will report an Adverse Analytical Finding if, based on any reliable analytical method (e.g. IRMS), the laboratory can show that the Prohibited Substance is of exogenous origin. In such case, no further investigation is necessary."

So, if testosterone can be a prohibited substance "at any concentration", then testosterone is a non-threshold substance. Gotta be.

If testosterone is a non-threshold substance, then how could the esteemed Dr. Rabin (quoted above) have been so wrong about such a basic question? I think the answer is that he is confusing two closely related concepts under the WADA rules. Section 2.1 of the World Anti-Doping Code states the general rule that the presence of a prohibited substance in an athlete's sample is a doping violation. The Code then sets forth two exceptions: (1) substances for which a quantitative reporting threshold is specifically identified in the Prohibited List (rule 2.1.2), and (2) substances such as testosterone that can be produced by the body endogenously, and where there may be special criteria for the evaluation of the substance (rule 2.1.3). I think it is easy to confuse category (1) threshold substances, with category (2) endogenous substances that have specially established identification criteria that may involve the use of thresholds.

Mike, consider the CIR test. It's NOT a threshold test, in that the test is not concerned with how much of any given prohibited substance is in the athlete's system. The athlete could have normal testosterone levels, or even below-normal testosterone levels. But if there's a relatively low amount of C13 in this testosterone, then the athlete is busted. It's not that the athlete who fails the CIR test has testosterone that exceeded a threshold. It's that the rules have set up special criteria for the evaluation of testosterone.

Here's another concept that may aid in our understanding here. It is ALWAYS possible for an athlete to use an undetectably small amount of a prohibited substance. The fact that the testing cannot detect a tiny amount of a prohibited substance does NOT mean that the substance is a threshold substance. This is probably the better way to look at exogenous testosterone. True, an athlete is not going to get busted for using exogenous testosterone unless he uses ENOUGH exogenous testosterone to throw off his T/E ratio or his CIR delta-delta value beyond the standards set forth in the rules. But these standards are something closer to a doping detection limit than they are to a "threshold" under the prohibited list. Unless the delta-delta exceeds 3.0 or the T/E ratio exceeds 4:1, we're not sure that we're detecting exogenous testosterone.
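The two limits cited in that paragraph lend themselves to a simple sketch. This is purely illustrative Python; the function names and structure are mine, and the 3.0 and 4:1 figures are just the numbers quoted in this discussion, not anything from the ISL or any lab SOP:

```python
# Illustrative decision rules for the two criteria discussed above:
# delta-delta > 3.0 per mil, or T/E ratio > 4:1.

def delta_delta(delta_reference, delta_target):
    """Difference (per mil) between the endogenous reference compound's
    delta-13C and the target metabolite's delta-13C."""
    return delta_reference - delta_target

def flags_sample(delta_ref, delta_tgt, t_over_e, dd_limit=3.0, te_limit=4.0):
    """Flag the sample if either threshold-style criterion is exceeded."""
    return delta_delta(delta_ref, delta_tgt) > dd_limit or t_over_e > te_limit

print(flags_sample(-23.0, -27.5, 1.2))  # True: delta-delta = 4.5 exceeds 3.0
print(flags_sample(-23.0, -25.0, 1.2))  # False: neither criterion exceeded
```

The point being that both branches compare a measured number against a stated limit, which is exactly the structure of a threshold test.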

(I think this analysis should be placed in an exhibit in the Museum Of Why People Hate Lawyers.)

Mike Solberg said...

Larry, obviously your argument is sound - or at least as sound as it can be based on the documents.

But I think my point is slightly different and possibly legally more compelling. It is that ISL 5.4.4.2.1 and 5.4.4.2.2 only make scientific sense if exogenous testosterone is considered a threshold substance.

If it is a non-threshold substance, then there is clearly and logically a fatal flaw in the ISL, one which I think would even be recognized by the CAS. The assay could not possibly be "fit for purpose" if 5.4.4.2.1 applies (at least as it was interpreted by the majority arbs).

If it is a threshold substance, then the ISL makes sense, and the "must" language of 5.4.4.2.2 applies.

That seems like a pretty good legal argument to me.

syi

Larry said...

Mike -

I've read all the way through your 1:35 PM post, and that's a pretty interesting argument. I had not previously picked up on the difference in "permissiveness" under 5.4.4.2.1 and 5.4.4.2.2. But I'm not sure that this difference is as sharp as you are painting. 5.4.4.2.1 uses a "should avoid" standard, and 5.4.4.2.2 uses a "must limit" standard. Arguably, both standards permit some amount of interference - to say that a method "must limit" matrix interference is not the same thing as requiring the method to eliminate all such interference altogether.

You've also got a good argument that tests for detection of threshold substances need to be more accurate because these tests must both identify the prohibited substance and measure its quantity. However, again, there may not be as sharp a distinction as you're painting between testing for threshold and non-threshold substances. Remember that even for non-threshold substances, there will be stated limits on the lab's ability to detect the substance. A substance may be present below the lab's detection limits, but may appear to be present in a greater quantity owing to matrix interference. So there's a quantification concern even for non-threshold substances. However, your point is still a good one, as the ISL clearly recognizes that something "extra" is involved with the detection of threshold substances. This can be seen, for example, in rule 5.4.4.3, which states that quantitative uncertainty does not apply to the detection of non-threshold substances but does apply to the detection of threshold substances. So rule 5.4.4.3.2 applies to the detection of threshold substances and not to non-threshold substances.

I also think that your cite to OMJ goes to the heart of the issue. Even if testosterone is a non-threshold substance, we rely on a bunch of threshold tests to identify this substance, and arguably the more restrictive rules applicable to threshold testing should be applied to testosterone testing.

So, I can potentially go along with a great deal of what you've written here. But now comes the hard part. By moving from 5.4.4.2.1 to 5.4.4.2.2, we've toughened the standard for matrix interference, but we still don't have a rule saying that no matrix interference can be tolerated. We've got a rule saying that matrix interference must be limited. Did LNDD violate this tougher rule?

You're doing a good job here, I think you should run with this a bit further and let's see where we end up.

Mike Solberg said...

TbV, with this comment:

There is an explicitly enumerated threshold substance list, and exo-T isn't on it.

do you mean the Prohibited List? Or something else?

I think it is clear that testosterone is a tricky substance legally speaking, partly because of the history of the way the test has been done.

Before IRMS came into use (I know that predates WADA and thus The List), it was clearly a threshold substance, and as Larry noted, Dr. Rabin (the WADA Science Director) still calls it a threshold substance. Before IRMS, you had no way to tell whether it was endogenous or exogenous, so the level was all that mattered.

But after IRMS came into use, obviously you can (try to) determine endo or exo. And since you are not allowed to have any exogenous testosterone in your system, there is a way in which it is a non-threshold substance.

However, the way I read TD2004EAAS, it is still possible to have "proof of the administration of a source of testosterone" even without a conclusive IRMS test. In which case, it would obviously still be a threshold substance.

So, all that means that whether something is a Non-threshold Substance or a Threshold Substance depends on the type of test used to validate the AAF, rather than strictly based on the substance itself.

If that is the case, then you have to look at the details of the test. I think it is clear that the IRMS test is a threshold test, since a measured number has to go beyond (more negative than) a particular number, and thus exo-T should be a threshold substance, as Dr. Rabin says it is.

A loose end: Regarding the verb "limit" (rather than "avoid") in 5.4.4.2.2 (matrix interference bullet), I don't think that can be used as a way to make the wording less demanding. In context "limit" would have to mean that you would have to "limit" matrix interference to the degree that it did not impact the results of the test. It cannot be that if you have made an attempt to limit the matrix interference you have done your job. You have to limit it so that it doesn't affect the outcome. So the language of 5.4.4.2.2 m.i. bullet, is still very restrictive.

syi

Larry said...

Mike -

Just to be as clear as I can be on two points:

First, IMO testosterone IS a non-threshold substance. If an athlete is caught using any amount of it, he's busted. I have set forth an argument that some of the ISL rules applicable to threshold substances might be applied to testosterone testing, because of the importance in testosterone testing of determining thresholds, and because of the language in the WADA rules describing the testing for endogenous substances as a special case. But this is an argument, another hurdle for us to overcome. And better arguments than this are shot down all the time by the ADAs and their hand-picked arbitrators.

Second, regarding the word "limit": it is a limited word, so to speak. Rule 5.4.4.2.2 uses the word "limit" as a verb, and "limit" as a verb can have two meanings: it can mean (1) to narrow or reduce, or (2) to confine or restrict within a boundary or bounds. Obviously we'd like to avoid meaning (1), as LNDD can point to any number of things in its SOP that reduce the level of matrix interference. We want to argue meaning (2), which is that LNDD had to take steps to reduce matrix interference below some identifiable standard.

You've suggested that the standard here is that matrix interference must be limited to the point where the interference does not affect the outcome of the test. But how much interference IS that? Presumably, ANY amount of interference could affect the outcome of a test, and from what Ali and TBV have written, a very small amount of interference could have an outsized impact on the test outcome. There does not seem to be any tolerable amount of interference under your standard, and if that's the case, then your standard is an outright prohibition of matrix interference and not a "limit".

I see a second problem, which is that we have no objective way to measure matrix interference. It can't be quantified. The severity of any matrix interference is a matter of judgment, it's just another thing for Drs. M-A and Brenna to argue about. In other words, we're back to YGIAGAM.

Mike, remember what we're shooting for here: a standard that non-experts can apply to determine whether or not we're looking at the results of a valid anti-doping test. In effect, you're arguing that rule 5.4.4.2.2 requires the lab to reduce matrix interference to the point where it does not give us invalid test results. Your standard begs the question: at what point does matrix interference invalidate the test results?

Mike Solberg said...

At the point where you don't know whether you have other substances in the peak you are measuring.

But remember that if you are stuck calling testosterone a "non-threshold substance" then it doesn't matter anyway, because then you only "should avoid" matrix interference. The arbs were more than happy to use that less restrictive language as an out, to avoid the ISL violation and burden flip.

syi

Larry said...

Mike -

I asked: at what point does matrix interference invalidate the test results?

You replied: At the point where you don't know whether you have other substances in the peak you are measuring.

I respond: (1) and what point is that? (2) Who is the "you" in your reply, Brenna, M-A or a non-expert? (3) Isn't this "specificity"?

Russ said...

Guys,
Don't know if this ref has already been looked at but I think there may be some data in it good for comparison and reverse qc of LNDD's stuff.

I am not a statistician so I will have to leave the actual applicability below the blue sky level to you.

What caught my eye was this (along with the actual data results, and more detailed descriptions of linearity verifications done statistically). You might take note that they did characterize the linearity of the system and took care to work within a linear range as selected by the linearity test. They then tested certain data regularly to verify continued acceptable linearity. If this stuff sets the bar for acceptance, perhaps there is fuel here:

"characteristics of the GC-C-IRMS ASSAY
GC-C-IRMS measurement of urinary steroids has become a potent analytical method in doping control, although scrupulous attention to detail is necessary. Successful implementation of the method requires strict adherence to QC, system suitability, and batch acceptance criteria such as those described herein. In the present study, after the tolerance ranges for the system and batches were established, the system suitability test never failed and the batch acceptance criteria were always met over the 14 months of the study.

In the instrument precision study, the mean SDs were less than ± 0.33‰; thus, when the variability attributable to extraction and derivatization was eliminated, all of the SDs were low. In the within-assay study, the SDs were also low: ± 0.27‰, ± 0.38‰, and ± 0.28‰ for 5βA, 5αA, and 5βP, respectively. The between-assay SDs ranged from ± 0.40‰ to ± 0.52‰, which is slightly higher than the within-assay values, but they indicate the excellent repeatability and reproducibility of the entire assay.

We did not find comparable precision studies for urinary steroids in the literature; however, Shackleton et al. (3) provided evidence that δ13C values for diols are reproducible based on duplicate analysis of 13 samples from one subject. We used the data in Table 1 from Shackleton et al. (5) to calculate the SD of duplicates and found values of ± 0.36‰, ± 0.15‰, and ± 0.28‰ for 5βA, 5αA, and 5βP, respectively. These data show that the SD for duplicates in another laboratory was similar to the SD that we obtained for duplicates in our batch acceptance studies."

and More quotes on linearity and precision:
"GC-C-IRMS RESPONSE LINEARITY
The linearity of the GC-C-IRMS response was determined by preparing seven solutions containing all three analytes at 2.5, 5, 10, 25, 50, 100, and 150 mg/L, respectively. One microliter of each solution was injected three times on 1 day, and the three δ13C values for each compound were averaged.

precision studies
Instrument precision for 5βP, 5βA, and 5αA was determined by extracting one aliquot of each QC urine and injecting each extract four times in succession. The within-day precision was determined by extracting 20 aliquots of QC-H in the same batch and injecting each one once. The between-assay precision was determined by extracting one aliquot of QC-H and QC-L per day for 16 days, spanning 15 months, and injecting each aliquot once.

daily system suitability test
To establish a tolerance range, the steroid calibrator was injected five times on each of 5 different days over 2 weeks (n = 25), and the mean δ13C values and SDs were calculated for each steroid. Each day, before urine sample analysis, the system suitability was assessed by injecting the calibrator three times and calculating the mean δ13C values. The system was considered suitable for batch analyses if the mean δ13C values of at least two of the three steroids were within ± 2 SD of the means described above. If the system suitability test failed, maintenance was performed on the gas chromatograph injection port, the GC column and/or the oxidation reactor were replaced, and the calibrator was re-injected. Data acceptance criteria included absence of peak tailing, retention times of the steroids within ± 1% of established values, and a minimum 0.8 V response at m/z 44 for each steroid.

criteria for batch acceptance
To establish a tolerance range for the QC urines for batch data acceptance, two aliquots of each QC urine were extracted and each injected twice per day for 5 days spanning 2 weeks (n = 20). ANOVA was performed using the factors day (5), injection (2), duplicate (2), and QC (2). The mean δ13C values and SDs were calculated for each of the three steroids. For batch analysis, each unknown urine sample was extracted and injected once. One aliquot of the two QC urines and one aliquot of the steroid calibrator were analyzed with each batch of samples. These three controls were injected at the beginning, in the middle, and at the end of the batch. The batch was accepted if at least six of the nine means for the three steroids were within ± 2 SD of the previously established means. The acceptance criteria were as noted above except that an instrument response of 0.3 V at m/z 44 was accepted.

statistical analyses
The Smirnov-Grubbs method was used to test the control group for outliers; otherwise, all statistical tests utilized statistical software (SAS). The linearity of the GC-C-IRMS response was assessed by least-squares linear regression. The response was considered linear if the slope was zero at P = 0.01. The method of Koch and Peters (17) was used to determine the SDs of duplicates. The normality of the distributions of 5αA, 5βA, and 5βP, differences, and ratios was determined by the Anderson-Darling test. The values for 5αA, 5βA, and 5βP were correlated with Pearson’s correlation coefficient, r. Two-sided paired t-tests were used to compare the means of 5αA, 5βA, and 5βP in the control group. The general linear model procedure was used to assess differences between the mean δ13C values of the ethnic groups, and the power of the analysis was assessed by one-way ANOVA.


Results

linearity of GC-C-IRMS RESPONSE
The IRMS response was linear for 5β- and 5α-androstane-3α,17β-diyl diacetate from 2.5 to 150 mg/L (Fig. 1). However, for 5β-pregnane-3α,20α-diyl diacetate, the response was linear only between 5 and 150 mg/L. The concentrations of 5β- and 5α-androstane-3α,17β-diol in the control group were 23–430 and 25–204 µg/L, respectively. Therefore, assuming 85% recovery, the amounts of 5β- and 5α-androstane-3α,17β-diol extracted from the 10-mL urine and reconstituted in 25 µL of cyclohexane were analyzed in the linear range of the IRMS."

and
"precision of the GC-C-IRMS INSTRUMENT AND ASSAY
Instrument precision (SD) for 5βA in QC-H and QC-L was 0.32‰ and 0.41‰, respectively. For 5αA, it was 0.08‰ and 0.59‰, respectively, and for 5βP, it was 0.27‰ and 0.16‰, respectively. The descriptive statistics for the within-assay experiment on QC-H are shown in Table 1. The SDs were 0.27‰, 0.38‰, and 0.28‰ for 5βA, 5αA, and 5βP, respectively, and the CVs were ≤1.4%. The range of values was 0.9‰ for 5βA, 1.2‰ for 5βP, and 1.8‰ for 5αA. The between-assay SDs for QC-H were 0.40‰, 0.42‰, and 0.44‰ for 5βA, 5αA, and 5βP, respectively, and the CVs were ≤1.8% (Table 2). For QC-L, the values were slightly higher. The mean δ13C values for the three steroids in QC-L were significantly lower than the means for QC-H."

and more in the original, of course.

Here is the link (full text):

http://www.clinchem.org/cgi/content/full/47/2/292?ijkey=410fd0b48331ea8d370ed8e34f53c2c5d4e33e93

Title is:
Performance Characteristics of a Carbon Isotope Ratio Method for Detecting Doping with Testosterone Based on Urine Diols: Controls and Athletes with Elevated Testosterone/Epitestosterone Ratios
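For what it's worth, the batch-acceptance rule quoted above ("at least six of the nine means ... within ± 2 SD") is simple enough to sketch in code. This is only an illustration: the data layout and function name are my own assumptions, not the paper's actual implementation.

```python
# Sketch of the quoted batch-acceptance rule: accept the batch if at least
# 6 of the 9 control means (3 steroids x 3 controls in the paper) fall
# within +/-2 SD of the previously established means. The dict layout is
# an assumption made for this illustration, not the paper's actual format.
from statistics import mean

def batch_accepted(control_runs, established, min_passing=6):
    """control_runs: {(control, steroid): [delta-13C values from this batch]}
    established:  {(control, steroid): (established_mean, established_sd)}"""
    passing = sum(
        1
        for key, values in control_runs.items()
        if abs(mean(values) - established[key][0]) <= 2 * established[key][1]
    )
    return passing >= min_passing
```

The interesting thing is how mechanical this is: given established tolerance ranges, a non-expert can apply the rule, which is the kind of standard Larry keeps asking for.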

Russ said...

Crud...
full link:

www.clinchem.org/cgi/content/full/47/2/292?ijkey=410fd0b48331ea8d370ed8e34f53c2c5d4e33e93

Sorry,

Russ

Mike Solberg said...

Larry said:
(1) and what point is that? (2) Who is the "you" in your reply, Brenna, M-A or a non-expert? (3) Isn't this "specificity"?


(1) The point at which you don't have complete mass spec data from the GCMS to show what you measured - or, even if you did have it, if the chromatographic conditions between the GCMS and the IRMS were not the same. LNDD simply has no proof that they did not measure something else along with the 5aA, etc. The question is how to use that fact to establish an ISL violation. Matrix interference under 5.4.4.2.1 didn't work. You won't agree with me that 5.4.4.2.2 applies. So, with that set of circumstances, we are left shooting for "specificity" again, under either 5.4.4.2.1 or .2.

(2) LNDD

(3) Well, I don't know. I guess I have not yet figured out what specificity is, or at least how it differs from matrix interference.

syi

Larry said...

syi -

I sense some frustration on your part, which is understandable.

I'm working on a reply, where I'll try to be more helpful and a little less argumentative.

You made a great point here a while ago. It's worth quoting:

The legal aspects of the case are exactly what are supposed to bridge the gap between the scientists and the rest of us. For that matter, the legal aspects of the case are supposed to bridge the gap between different scientists too, for obviously, they disagree. It is not a matter of giving up the search for "the truth" and debating "technicalities." It is a matter of recognizing the limits of both the science itself, and of our understanding of the science.

In other words, the work we're doing here may be frustrating, but it's important and necessary.

I also told you to hold fast to legal arguments that have a strong basis in common sense. That's because we lawyers assume that the laws have a common sense basis, that they were written to further a specific and explicable intent of the legislators, and that the rules can and should be interpreted consistently with this intent.

Of course, the intent is rarely specific enough to give us direct answers to our questions. It would be great if we could conclude that the writers of the ISL intended all WADA labs to analyze the mass spectrum data. But we don't expect to find that kind of intent in the rules (because an intent that specific would probably be reflected in a more specific ISL rule, and we would not need to be doing this level of analysis!).

The intent we can find in a body of rules like the ISL is probably going to give us general principles for interpreting the rules, rather than specific answers to our specific questions. As an example, I've pointed out that the drafters of the ISL intended to provide rules to guide each lab's creation of doping test protocols, and rules to govern how the lab performs tests once the protocols have been developed. You pointed out that the rules distinguish between threshold and non-threshold substances, and intend for stricter rules to be applied to threshold testing. That is a terrific piece of analysis.

I am working on trying to read together all of the bullet points in 5.4.4.1.2 and 5.4.4.2.2, to see if an overall plan emerges for evaluating a drug testing protocol. My preliminary conclusion is that the reference to "specificity" may have been intended as a general reference to the need to avoid false positives, and that the following bullet points are intended to be more specific rules. Not sure about this. But I'm starting to lean towards the idea that the problem of co-eluting peaks is generally addressed by "specificity" and more particularly addressed by "matrix interference". These are preliminary thoughts, and I'm not sure this is where I'm going to end up.

Mike Solberg said...

A little frustrated (at the ISL and LNDD, not you), and mostly too busy to work out the things I'd like to work out.

My latest finding of how limited my understanding of the science still is: reading the transcript, it sounds like Dr. Goldberger says that they can rule out other substances in the peak with the three-ion SIM - see especially transcript pdf p. 874. Before and after that is interesting too. But I don't have time right now to figure out whether that's right. If SIM rules out other substances in the peak, then that seriously changes things compared to how I have understood it before, in a way that's not good for Floyd.

syi

Mike Solberg said...

pdf pp. 874-876 actually.

To wit, Goldberger:

"It's also common practice to monitor two ions for the internal standard -- at a minimum two ions for the internal standard -- for the same reason we monitor at least three ions for the drug, is to verify that we don't have interferences at those ions that could potentially affect the quantification, or at least the abundance, the size, of those peaks."


syi

Russ said...

It's Christmas day, Merry Christmas!!

Since this is the specificity thread... (and I had time on my hands once the kids got into their packages), I dug some definition references out of DPF and Duck's stuff.

Specificity (consider that you have determined that the ISL/SOPs etc. predate IRMS, and so specificity is probably correctly considered here):

Discussed at DPF, introduced by Duck...
http://www.dailypelotonforums.com/main/index.php?showtopic=1861&st=0&p=37215&#entry37215

Duck links to D.L. Berry's article and in the same post says this:

"Essentially, we are trying to assess the ratio of correct positives (guilty positives) to the number of all positives (guilty positives + innocent positives). Also, the assay sensitivity is, as OMJ pointed out, the fraction of truly positive athletes who also test positive. The assay specificity is the fraction of truly negative athletes correctly identified as negative out of all negative athletes tested. The probability of a false positive is (1-specificity). The relationship I am using to assess likelihood of guilt is Bayes' rule (first published in 1761):"
Note that although Berry is using T/E ratio in his analysis, Duck and Hughw at DPF found a gold mine of ideas for the IRMS too.

His ref link to 172.berry.pdf has changed and is now found at:
http://www.amstat.org/publications/chance/pdfs/172.berry.pdf
Berry says Specificity is this:
"Specificity and sensitivity characterize diagnostic tests. In the anti-doping analogy regarding the T/E ratio, specificity is the proportion of nonusers with T/E < 6."
Another quote from Berry (the entire paragraph is the quote):
David J. Borts and Larry D. Bowers, studying the use of advanced laboratory methods to measure the T/E ratio, stated that the "measurement of a prohibited substance [in the urine matrix] requires identification of the compound either by full scan or by consistency of ion abundance ratios between a reference material and the unknown" (p. 56). Anti-doping laboratories, however, fail to address sensitivity or to recognize its relevance. Specificity, generally addressed by keeping records of tests performed on athletes, introduces a clear bias of unknown magnitude because some athletes in their databases may be users but have "normal" T/E.

And another:
Conclusions about the likelihood of testosterone doping require consideration of three components: Specificity and sensitivity of the testing procedure, and the prior probability of use. As regards the T/E ratio, anti-doping officials consider only specificity. The result is a flawed process of inference. Bayes' rule shows that it is impossible to draw conclusions about guilt on the basis of specificity alone.

Note that Berry's article contains additional references that may be of interest to you Larry (if you have access to it someplace).
Black, D.L. 2001. "Doping Control Testing Policies and Procedures: a Critique." In Wilson, W. and Derse, E., eds., Doping in Sport: The Politics of Drugs in the Olympic Movement. Champaign, Ill.: Human Kinetics Pub.; 29-42.

also several of the ref's for Berry's article look to be of possible interest.
Note also that the above PDF quotes were retyped by me, as I could not figure out the cut and paste, which seems restricted(?).

In (more or less) plain English, from Duck post #13 of the first link:
"I am using the same definition for sensitivity (probability of correctly identifying a true positive from the population of true positives) and specificity (probability of correctly identifying a true negative from the population of true negatives) as Berry does. The probability of incorrectly identifying as positive someone who is really negative is the false positive rate. This rate is given as 1-specificity. Berry uses words to define specificity. At the risk of some confusion, I will quote OMJ in his previous post for the outcomes. I am using a different definition of specificity, which is, again, consistent with Berry, i.e. specificity is the probability of testing negative when you are truly negative."
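Duck's definitions plug straight into Bayes' rule. Here is a minimal sketch of the calculation; the prior, sensitivity, and specificity values are invented for illustration and are not Berry's or WADA's figures:

```python
def posterior_guilt(prior, sensitivity, specificity):
    """P(true user | positive test), by Bayes' rule."""
    p_pos_given_user = sensitivity          # true positive rate
    p_pos_given_clean = 1.0 - specificity   # false positive rate
    # Total probability of a positive result across users and non-users.
    p_pos = p_pos_given_user * prior + p_pos_given_clean * (1.0 - prior)
    return p_pos_given_user * prior / p_pos

# Hypothetical numbers: 1% of athletes dope, the test catches 80% of
# users, and correctly clears 99% of non-users.
print(posterior_guilt(prior=0.01, sensitivity=0.80, specificity=0.99))
# 0.008 / (0.008 + 0.0099) -- roughly 0.447
```

The point, as Berry argues, is that even a highly specific test can leave the posterior probability of guilt well below certainty when the prior prevalence of doping is low, which is why specificity alone cannot settle the question.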

Russ said...

Darn, and oops again on the links getting truncated.

http://www.dailypelotonforums.com/main/index.php?showtopic=1861&st=0&p=37215&#entry37215
Also called the "How good does the testosterone test need to be?" thread.

And Berry's link:
http://www.amstat.org/publications/chance/pdfs/172.berry.pdf

Larry said...

Russ, thanks for posting all of this terrific information. I'm trying to get through it.

Are you suggesting that "specificity" means simply the overall goal of reducing false positives?

Russ said...

Larry,
I do not know for sure whether this definition of specificity is the same one that applies to the specific case or not. I suspect that it well could be. I would hope, if you can find some of those refs listed by Berry, that there would be something (possibly) to hang your hat on (I don't know the probability; I just observed the titles and the like).

Hope it helps. I do not like the use of statistics to try to pin down a specific assignment of guilt! That SHOULD be beyond the shadow of a doubt (my prefs).

Regards,
Russ

Mike Solberg said...

Russ and Larry, it looks like Russ has brought up a different definition of specificity. Look at the last line of this quote from "The Fitness for Purpose of Analytical Methods":

Note these statements of Precision relate to quantitative analysis. Qualitative analysis can be treated in a slightly different way. Qualitative analysis is effectively a yes/no measurement at a given threshold of analyte concentration. For qualitative methods precision cannot be expressed as a standard deviation or relative standard deviation, but may be expressed as true and false positive (and negative) rates. These rates should be determined at a number of concentrations, below, at and above the threshold level. Data from a confirmatory method comparison should be used if such an appropriate method is available. If such a method is not available, fortified and unfortified blank samples can be analysed instead.

% false positives = (false positives × 100) / total known negatives

% false negatives = (false negatives × 100) / total known positives

Note that biological analytical chemists and microbiologists treat false positives and false negatives slightly differently, using the terms selectivity and specificity in a way that conflicts with chemical usage.
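The two rate formulas quoted above are straightforward to apply. A sketch with invented validation counts (the numbers are placeholders, not figures from the guide):

```python
# Hypothetical validation counts for a yes/no (qualitative) method:
# blanks and fortified samples run through the assay.
known_negatives = 200   # blank samples analysed
known_positives = 50    # fortified samples analysed
false_positives = 3     # blanks that wrongly tested positive
false_negatives = 2     # fortified samples that wrongly tested negative

pct_false_pos = false_positives * 100 / known_negatives
pct_false_neg = false_negatives * 100 / known_positives

print(f"false positive rate: {pct_false_pos}%")   # 1.5%
print(f"false negative rate: {pct_false_neg}%")   # 4.0%
```

These are exactly the quantities Duck and Berry work with: the false positive rate here corresponds to (1 - specificity) expressed as a percentage.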


This document helps explain things about specificity, but I'm still ingesting it.

syi

Larry said...

syi -

I am considering the possibility that "specificity" does not mean what we thought it meant. I think that "specificity" may simply refer to the ability of a test to avoid false positives. This would explain its appearance in 5.4.4.2 as the first bullet point, as it would make sense to list the most general requirement first and then move on to more specific things. It would also explain why there are later bullet points (like the one for matrix interference) that seem to us to cover the same matters we thought were covered under specificity - "specificity" may be a more general way of referring to the problem of one peak disappearing into another, and "matrix interference" may be a more specific reference to the same problem.

This points to the general need to try to read the ISL in context, and (to the extent we can) without thinking about a particular issue in a particular case. I am trying to work up a post on the ISL that will do just this, that we can use to discuss the kinds of legal questions being raised by you and Ali.

Russ said...

Larry, syi,
Swim, great find! Not to upstage you, but I will make a few observations from your link (too bad it is copyrighted!).

Specificity is first mentioned technically in 6.23, regarding positive identification falling below 100% for an analyte when its concentration falls below a threshold.

It is mentioned again in 6.38, regarding the way biological analytical chemists and microbiologists use the terms selectivity and specificity differently from chemists.

Both 6.23 and 6.38 allude to statistical evaluations, and 6.38 brings in some of the formulas from the Berry/Duck/OMJ items from DPF.

So Larry, I am thinking specificity can be used both at the high (large-group) level and at the level of the unique test. The two seem to mingle, as the large-group statistics are used to define criteria to be applied to unique cases. The unique case (to me) seems to advise, if not require, 100% positivity in order to claim that specificity has been achieved.

6.30 regards Accuracy as a measure of closeness to a true value, having two components to study: precision and trueness.

6.31 and 6.32 deal with trueness.

I'd say 6.13 through the end of section 6 calls for a close read.

Regards,
Russ

Larry said...

Russ, thanks for your great posts recently. They have me thinking in new ways.

I am making an effort to understand the rules of the ISL in context - in the context of the ISO 17025 requirements, and also by trying to understand ISL rules on quality control as addressing a different concern than ISL rules on test validation. I think that both classifications of rules can be the basis for challenging a lab and showing an ISL departure. However, test validation seems to go to the soundness of a test as a general matter, while quality control seems to look at each performance of a given test. Obviously, these two concepts are related - it seems to be part of good quality control to at least ask whether a quality control problem might be due to a problem in the validity of the test.

Swim and Ali, when you guys are examining flaws in the FL test results, it seems to me that you are engaging in a quality control effort. So I am trying to understand the arguments you guys are raising in this context. What kind of quality control was LNDD required to perform on the FL tests?

Swim, did LNDD's quality control rules require them to search for disappearing peaks? Does ISL 5.4.7.3 set forth the required quality control here?

Ali, it seems to me that LNDD was NOT required to reanalyze the EDF data in April. However, they DID do the reanalysis, and arguably, some significant quality control issues emerged in the reanalysis. What did the ISL require LNDD to do about this, and if the ISL is silent, what did ISO 17025 require LNDD to do about this?

Swim and Ali, I am thinking that the rules under ISL 5.4.4.2 are NOT the right place to start an analysis of problems we see in how LNDD performed its testing ... unless we are arguing that there's something wrong with these tests as a general matter. For example, our arguments about a GC column switch seem to go to test validity.

I'd like your thoughts, as this line of analysis is being built into a longer piece I'm writing on the ISL.

Larry said...

Swim (and Duck, if you're listening), have you looked at ISL 5.4.7.3? What do you make of the statement there that lab quality control activities include comparison of mass spectra or SIM ion ratios to reference material or reference collection samples?

My first read of this provision was that the ISL sets up selective ion monitoring (SIM) as a satisfactory method of quality control, on a par with mass spectrum data. On a second read, I wasn't so sure.

Mike Solberg said...

It sounds to me like that says SIM is okay - the question is what it is okay for. Certainly it is okay for identification. But specificity must still be guaranteed somehow.

Unless I misunderstand what SIM really is, then it shows definitively that substance x is present in a sample, but it does not rule out the presence of other substances.

This raises a larger issue I have been thinking about, but I need to confirm some things before I write about it.

syi

Larry said...

Mike -

I am utilizing my copious spare time to work on an opus for publication here at TBV that will discuss the ISL in some depth, and will hopefully serve as an analytical framework for the legal questions I've been discussing with you and Ali. The opus is 10 pages long and growing.

I've been thinking about the distinction between threshold and non-threshold substances, and how the tests for exogenous testosterone seem to straddle these two categories. I've indicated that testosterone is a non-threshold substance, and this continues to be my opinion. But it's making less sense to me to apply the ISL non-threshold substance rules to the existing tests for exogenous testosterone.

I understand the difference between having to detect the presence of a non-threshold substance versus having to quantify the presence of a threshold substance. However, the problem of quantifying a WADA threshold substance is less demanding than it might first appear: all the lab needs to do is to confirm that the substance is present above the legal threshold. That's pretty much the same thing the lab has to do with the T/E and CIR tests for exogenous testosterone.

One might argue that for threshold substances, you want to accurately know by how much the athlete exceeded the threshold. If you're looking at alcohol, for example, it might be important in a given case to know whether the athlete was drunk, or very drunk, or crazy drunk. To be certain, this is an important factor in DUI (driving under the influence) cases. But given WADA's strict liability rules, the need to quantify the excess above the threshold is lessened (drunk is drunk, no excuses). We've also seen in the FL case how WADA and the prosecution tried to gain ground by arguing that FL's testosterone levels were WAY above the limit. So, the more I look at this question, the more I think you're right to question whether the testing rules for threshold substances should be applied to testosterone testing (even though, as I keep saying, testosterone is a non-threshold substance).

Finally ... as I work my way through this opus ... I'm concluding that your arguments about the mass spectrum data are arguments that go to the question of method validation. (The article that Russ posted here on method validation is extremely valuable.) If you look at the argument about the mass spectrum data as an argument questioning the validation of LNDD's SOP, then you can understand why the FL legal team might have avoided this argument. I think it's a tall order to invalidate a lab method, both as a legal and a scientific matter. I presently think that LNDD's SOP for exogenous testosterone WAS improperly validated, because the method failed to require an examination of the mass spectrum data ... but you can understand why the FL team thought they had a better chance arguing that the problem was with the lab and not the method.

And by the way, this is now my "official" position: LNDD violated the ISL by failing to analyze the complete mass spectrum data as part of its CIR testing. This is not an easy argument to make, and it's not an argument that WADA arbitrators are likely to accept, but I think that this conclusion emerges clearly from a close analysis of the WADA rules and the ISL. It's going to be some time before I can put this analysis into written form.

(By the way, if you end up concluding that selective ion monitoring (SIM) works just as well as complete mass spectrum data to ensure peak purity, then of course my "official" position is going to change.)

Mike Solberg said...

Thanks for the update on where you are at Larry. I see we are moving in the same direction regarding the science, but I am seeing it more from a historical perspective than a legal one. I'll try to lay out what I have been thinking.

I have found the only way I can write this is to address some basic issues, so this will cover some ground that everybody probably knows, but I couldn't get it to come out clearly any other way. And, sorry, it is VERY, EMBARRASSINGLY long. And really, the only remotely new thing here is to put the lack of the complete mass spec data in historical context, explaining how this situation could have come about, and why it is so hard to fight it.

First, a brief note for Larry: I figured out what the comments of Goldberger meant in context, and it doesn't change anything about our understanding of SIM. SIM can tell you that substance x is in a sample, and it can even tell you how much of substance x is in a sample. But it cannot tell you whether there is any of substance a, b or c, that is co-eluting in a GC/MS peak. That is what we have thought and that is right.

Now, I trust that you, Larry, will clearly, and convincingly lay out the legal side of this (that 10 pages - so far - had better be good for something!), but my basic understanding is now that LNDD's method of detecting the presence of exogenous testosterone is not "fit for purpose." This failure of the assay to be fit for purpose has come about because LNDD failed to take account of a key difference between the pre-IRMS testosterone test, and the GC/C/IRMS testosterone test.

Before IRMS (about 1997 at UCLA - I don't know exactly when LNDD got it, but certainly before 2003), testosterone was clearly a threshold substance. The T/E ratio had to cross the quantitative threshold of 6/1 (now 4/1). The key measurement was the AMOUNT of testosterone compared to the amount of epitestosterone.

In that context, TD2003IDCR makes perfect sense. To use GC/MS to detect exogenous testosterone, you identify and measure the amounts of testosterone and epitestosterone with the GC/MS in Selected Ion Monitoring (SIM) mode. By monitoring three ions of each substance you are able both to clearly identify and to quantify both T and epi-T. First you separate the substances with the GC, and then you monitor the three most abundant ions (the "diagnostic ions") of the proper GC peak with the MS.

The key thing to note here is that even if your GC separation is not perfect, or even if there is an indistinguishable, perfectly co-eluting peak, leading to other substances in your GC peak, it does not invalidate your measurement. If the ratio of the three diagnostic ions is right (those are known ratios), then you know you have the right substance with no interference AT THOSE IONS. There may be other things in that peak, but, when you are not using IRMS, it doesn't matter, because you have identified and quantified your testosterone properly with the three diagnostic ions. Same for epi-T.
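The diagnostic-ion check described here can be sketched as comparing measured relative abundances against a reference standard within a tolerance window. The ion masses, abundances, and the ±10% relative tolerance below are all illustrative; they are not the actual TD2003IDCR criteria:

```python
# Hypothetical diagnostic-ion relative abundances (base peak = 100).
reference = {432: 100.0, 417: 45.0, 327: 20.0}   # reference standard
measured  = {432: 100.0, 417: 47.0, 327: 19.0}   # sample GC peak

def ions_match(reference, measured, tolerance=0.10):
    """True if every measured relative abundance lies within
    +/- tolerance (as a fraction of the reference value)."""
    for mz, ref_ab in reference.items():
        meas_ab = measured.get(mz)
        if meas_ab is None:
            return False          # a diagnostic ion is missing entirely
        if abs(meas_ab - ref_ab) > tolerance * ref_ab:
            return False          # abundance outside the window
    return True

print(ions_match(reference, measured))  # True: all three within 10%
```

Note what the check does and does not establish: it confirms the analyte is present at those ions, but, as the paragraph above says, it cannot rule out other substances contributing ions at other masses in the same peak.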

(This is not really relevant to my present topic, but as it turns out LNDD only analyzed ONE diagnostic ion in their T/E test, so the arbs ruled their process violated TD2003IDCR and that the evidence of that test was without value. If that was the only test, the case would have been dismissed.)

Now, when TD2003IDCR was approved in 2003, IRMS was still relatively new, and not all WADA labs had an IRMS machine. When TD2004IAAS became effective in August of 2004, IRMS was still officially an "add on" - that is, if the T/E confirmation test, described roughly above, was not conclusive, then an IRMS test was recommended. Thus TD2003IDCR was written so that administration of exogenous testosterone could still be confirmed without IRMS - and, in fact, as best I can tell, that is still the situation today. That said, even WADA understands that the IRMS test is much more conclusive when done properly.

So, historically speaking, you have the situation in which GC/MS works "fine" (at least by WADA standards), and then, at different times in different labs, the IRMS test is added as an additional piece of evidence.

Now, what LNDD apparently did when they added the IRMS test to their arsenal was to continue to do the GCMS test the same way they had before - with SIM - the way it had been perfectly effective before, and, importantly, in a way that is perfectly in line with TD2003IDCR.

HOWEVER, when the IRMS test is added to the GCMS test, the GCMS test must be done a little differently. Of course, you are analyzing metabolites of testosterone, not T itself, but that is not important here. The key thing is that SIM no longer does the job you need done. For GC/C/IRMS you not only need to know that there is substance x in a given GC peak, but you also need to know that there is nothing else in addition to substance x in the peak. SIM mode does not tell you this. Only doing a full ion scan of a given peak will tell you this. That is, you have to monitor not just three diagnostic ions, but all ions present in the peak because all the ions are going to the IRMS. Remember for the T/E test, it didn't matter if other stuff was present. The SIM/three diagnostic ions told you everything you needed to know. But doing the same thing as part of the IRMS test is not good enough. It does not give you the information you need.
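The SIM limitation described above can be made concrete with a toy spectrum: if a co-eluting contaminant's ions happen not to fall on the monitored m/z channels, SIM reports a clean-looking peak even though extra material is headed into the combustion step. All masses and abundances here are invented for illustration:

```python
# Toy full-scan spectrum of a GC peak: analyte ions plus a co-eluting
# contaminant whose ions do NOT fall on the monitored m/z values.
analyte     = {432: 100.0, 417: 45.0, 327: 20.0}
contaminant = {315: 60.0, 229: 30.0}

# The actual peak contains both substances' ions.
peak = dict(analyte)
for mz, abundance in contaminant.items():
    peak[mz] = peak.get(mz, 0.0) + abundance

sim_channels = [432, 417, 327]                 # what SIM actually records
sim_view = {mz: peak[mz] for mz in sim_channels}

print(sim_view == analyte)                     # True: SIM sees a "pure" peak
print(set(peak) - set(sim_channels))           # full scan reveals {315, 229}
```

In SIM mode the contaminant is invisible, yet all of its carbon would still reach the IRMS; only the full scan exposes the extra ions.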

So, if the GCMS part of GC/C/IRMS is done in SIM mode then the whole assay is not fit for purpose, even though, for "historical" reasons, it is probably technically allowed according to TD2003IDCR.

The wise reader will ask why WADA has not yet recognized and fixed this problem. Well, I can only guess that it is because, prior to the Landis case, no lab had run the GC/C/IRMS assay in exactly this way. You see, while TD2003IDCR says that SIM mode is acceptable, it also says the "preferred method" is to use full scan mode. And if you run the assay with this full scan mode then you have the conclusive evidence you need that you have measured the right stuff in the IRMS. I imagine that other labs have either not done IRMS or have done IRMS with the GCMS part done in full scan mode. Dr. Goldberger testified at the hearing that he had seen lab documentation packages from the UCLA lab and they included the full scan data for the T/E test - and if they did it for that, they would surely do it for the GCMS part of the GC/C/IRMS test also.

So, why didn't Landis' legal team press this particular issue at the hearing - that the way LNDD does the GCMS part of the IRMS test makes the assay not fit for purpose? Well, one answer is that, as Larry said above, that is a very difficult legal argument to make given the nature of the controlling documents, especially given that the way LNDD does it appears to be okay according to TD2003IDCR. The truth is that TD2003IDCR has inadequately accounted for the nature of the IRMS test - leaving the "loophole" of running SIM mode open, even when doing IRMS.

Another answer as to why Landis' legal team didn't press this at the hearing is to say that "They did, sort of." This is exactly what they were trying to get at with all the arguments about good chromatography. If they could not use (the inadequate) TD2003IDCR to prove an ISL violation, they had to get at the issue another way. If they could show that there was some degree of likelihood of other material in the peaks of interest, then that should have raised enough questions to prove an ISL violation of the "matrix interference" bullet of ISL 5.4.4.2.1. But there are two problems with that approach: ISL 5.4.4.2.1 has weak language ("should" rather than "shall"), and there are no cut-and-dried criteria for what constitutes good or bad chromatography. The arbs, who did NOT understand the critical flaw in applying TD2003IDCR to IRMS, allowed the use of SIM to stand, with mediocre chromatography, because "should" does not mean "shall."

And this is where the "threshold substance" vs. "non-threshold substance" argument comes in. The way the IRMS test works, clearly testosterone SHOULD be a threshold substance, and the stronger language included in ISL 5.4.4.2.2 should apply. But because WADA has not yet admitted the fatal flaw in applying TD2003IDCR to the IRMS test, both Landis' legal team and the arbs assume that testosterone is a non-threshold substance and the weaker language of 5.4.4.2.1 applies.

There is more to talk about with regard to what exactly could be in those peaks of interest other than what should be there, but that's obviously enough for now.

Bottom line is that LNDD's method of using the SIM mode for the GCMS part of the GC/C/IRMS test makes the assay not "fit for purpose."

Oh, I just have to add one more thing - about ISO certification. This apparent technical adherence to TD2003IDCR is why LNDD could get ISO re-certification for its IRMS test just six months before Landis' tests. ISO certification does not assure that the test really does what it is supposed to do. It only assures that the test is in line with the controlling documents. If those documents are flawed, that's not ISO's fault.

Happy new year!

DBrower said...

Mike,

brilliant, thanks very much!

Happy new year to you too!

-dB

Russ said...

syi,
Great post!

syi, Larry,
You guys are hot on the trail of the truth, best it can be sniffed out, me thinks.

You have clearly earned the BS degree (yes, bachelor of science!) and are now moving into the master's thesis part!

syi, You get a double major as the official TBV historian.

syi, Larry, Ali, TBV et al
Happy New Year!

Russ