Tuesday, January 29, 2008

Larry's Curb Your Anticipation, Part 7: Uncertainty

Up to the Introduction; back to part 6; on to part 8.

Hey folks, we're HALFWAY there!

By Commenter Larry


Criteria for Fitness for Purpose

There are a number of criteria a lab can use to determine the “fitness” of a particular method. Two of these criteria, uncertainty and selectivity/specificity, are arguably the most important, so we’ll limit our discussion of method validation to these two criteria and to the myriad sub-criteria related to these two principal criteria. It's going to take two posts to talk about uncertainty. We won't get to selectivity/specificity until part 9.





  • Uncertainty. In simplest terms, “uncertainty” is a measure of the “accuracy” of a lab method. The “accuracy” of a lab method consists of two components: “trueness” (how close the average of many results produced by the method is to the true value) and “precision” (how close multiple measurements made by the same method are to one another). See Eurachem Guide paragraph 6.30.


Obviously, “trueness” is a tricky concept: how can you tell if a new method is providing a “true” result? According to the Eurachem Guide, a lab determines the “trueness” of a method by comparing method results against a known “reference value”. The lab has two techniques available to determine a “reference value”: (1) the lab can utilize a characterized (or reference) material, where the value the method is supposed to measure is already known, or (2) the lab can compare the results of its test method against a different method that has already been validated and approved for “trueness”. See Eurachem Guide paragraph 6.31.
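To make technique (2) concrete: the same samples are run through both the method being validated and an already-validated method, and the paired differences are examined. This is a minimal Python sketch, not a procedure taken from the Eurachem Guide or the ISL. The helper `paired_comparison` and the numbers are hypothetical, and a paired t statistic is just one common way to ask whether the mean difference can be distinguished from zero.

```python
import math
import statistics


def paired_comparison(candidate, validated):
    """Compare a candidate method against an already-validated method.

    Hypothetical helper: position i in both lists is the same sample
    measured by each method. Returns the mean difference (candidate minus
    validated) and a paired t statistic for that mean difference.
    """
    if len(candidate) != len(validated) or len(candidate) < 2:
        raise ValueError("need at least two paired results of equal length")
    diffs = [c - v for c, v in zip(candidate, validated)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat


# Hypothetical results for five samples, in the same units for both methods.
candidate = [10.4, 12.1, 9.8, 11.5, 10.9]
validated = [10.0, 11.8, 9.9, 11.0, 10.6]
mean_diff, t_stat = paired_comparison(candidate, validated)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}")
```

The t statistic would then be compared against a Student's t critical value with n - 1 degrees of freedom.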


The concept of “trueness” has a few related concepts that are referred to in the ISL and in ISO 17025. One of these concepts is “bias”. “Bias” is the difference between the expectation of the test results and an accepted reference value. See Eurachem Guide paragraph A2. There are, in turn, two types of bias: “method bias” and “laboratory bias”. “Method bias” is bias inherent in the method; “laboratory bias” is the additional bias peculiar to the laboratory and its interpretation of the method. See Eurachem Guide paragraph 6.35. I personally do not find “bias” to be a useful concept, but it is referred to in the ISL (see ISL 5.4.4.3.2.1), so I thought I should mention it here.
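Because the “expectation of the test results” is, in practice, estimated by the mean of replicate measurements, bias against a reference material can be sketched in a few lines of Python. The helper name, the certified value of 50.0 and the replicate values are hypothetical, chosen only for illustration. Splitting the result into method bias and laboratory bias would require data from other laboratories and is not attempted here.

```python
import statistics


def bias_against_reference(results, reference_value):
    """Estimate bias as the mean of replicate results minus the reference value.

    Hypothetical helper. Also returns the relative bias in percent, which
    makes biases at different concentrations easier to compare.
    """
    if not results:
        raise ValueError("need at least one result")
    if reference_value == 0:
        raise ValueError("reference value must be non-zero for relative bias")
    mean_result = statistics.mean(results)  # stands in for the expectation
    bias = mean_result - reference_value
    relative_bias_pct = 100.0 * bias / reference_value
    return bias, relative_bias_pct


# Hypothetical replicate measurements of a reference material certified at 50.0.
replicates = [51.2, 50.8, 51.5, 50.9, 51.1]
bias, rel = bias_against_reference(replicates, 50.0)
print(f"bias = {bias:.2f}, relative bias = {rel:.1f}%")
```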


A second related concept mentioned in ISO 17025 and the ISL is “traceability”. “Traceability” refers to the ability of a test method to relate to a known standard. See Eurachem Guide paragraph A30. In the context of our discussion here, “traceability” means that the “trueness” of a method has been measured by testing the method against a standard that is well-accepted in the scientific community – for example, the method would be “traceable” if it could be tested on a reference material that is widely accepted in the field of doping control. Traceability merits its own section in ISO 17025 (5.6), but I think traceability is best understood as a property of “trueness”.


(Interestingly, the ISL by its terms seems to preclude any meaningful validation of the “trueness” of a lab method. WADA labs cannot determine “trueness” by utilizing a known reference material, since according to the ISL, “[f]ew of the available reference drug and drug Metabolite(s) are traceable to national or international standards.” ISL 5.4.6.1. And given WADA’s proclamation (discussed above) that standard methods are not available for doping control, it would be impossible for a WADA lab to determine the “trueness” of a method by comparing the method to a second, already validated, method. My guess is that WADA labs validate the “trueness” of their methods by utilizing whatever reference materials they can find, but I have no way to know for certain that this is what they do.)


As stated above, method “accuracy” is a function of both method “trueness” and method “precision”. “Precision” is a measure of how close method results are to one another when the method is repeated under the same conditions. See Eurachem Guide paragraph 15.1.


“Precision” is itself measured by two other criteria, “Repeatability” and “Reproducibility”. “Repeatability” is a method’s precision where the method is performed on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time. (Eurachem Guide paragraph A21.) “Reproducibility” is a method’s precision where the method is performed on identical test items in different laboratories with different operators using different equipment. (Eurachem Guide paragraph A22.) To complicate matters slightly, the ISL requires that method validation for threshold substances consider a criterion called “Intermediate Precision”. (See ISL Rule 5.4.4.3.2.1.) “Intermediate Precision” is the variation in results observed when one or more factors, such as time, equipment and operator, are varied within a laboratory. See http://www.measurementuncertainty.org/mu/guide/analytical.html. In other words, “intermediate precision” is a criterion that falls somewhere between repeatability and reproducibility.
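One common way to put numbers on these terms (offered here as an assumption, not as what the ISL or any WADA lab prescribes) is a one-way analysis of variance: measure the same sample several times on each of several days, treat the within-day scatter as repeatability, and add the between-day component to get intermediate precision. Reproducibility would follow the same pattern with groups from different laboratories. The function name and the data below are hypothetical.

```python
import math
import statistics


def precision_from_runs(runs):
    """Estimate repeatability and intermediate precision standard deviations.

    `runs` is a list of groups. Each group holds replicate results obtained
    under repeatability conditions (same day, operator and equipment); the
    groups differ from one another in day, operator or equipment.
    Assumes a balanced design: at least two groups, each with the same
    number (at least two) of replicates.
    """
    k = len(runs)
    if k < 2:
        raise ValueError("need at least two groups")
    n = len(runs[0])
    if n < 2 or any(len(group) != n for group in runs):
        raise ValueError("groups must all have the same size, at least two")
    # Within-group mean square: the pooled variance of the groups, which
    # for a balanced design is the average of the group variances.
    ms_within = statistics.mean([statistics.variance(g) for g in runs])
    # Between-group mean square, from the spread of the group means.
    group_means = [statistics.mean(g) for g in runs]
    ms_between = n * statistics.variance(group_means)
    # Repeatability standard deviation.
    s_r = math.sqrt(ms_within)
    # Between-group variance component, truncated at zero if negative.
    s_between_sq = max(0.0, (ms_between - ms_within) / n)
    # Intermediate precision combines both components.
    s_ip = math.sqrt(s_r ** 2 + s_between_sq)
    return s_r, s_ip


# Hypothetical data: one sample measured in triplicate on each of three days.
runs = [[10.1, 10.3, 10.2], [10.6, 10.4, 10.5], [10.0, 10.2, 10.1]]
s_r, s_ip = precision_from_runs(runs)
print(f"repeatability SD = {s_r:.3f}")
print(f"intermediate precision SD = {s_ip:.3f}")
```

Intermediate precision can never come out smaller than repeatability in this scheme, which matches the idea that it sits between repeatability and reproducibility.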


(Interestingly, the ISL rules briefly refer to repeatability, see ISL Rule 5.4.4.3.2.1, but never to reproducibility. This omission may reflect WADA’s relative lack of concern with achieving consistent results among its various accredited labs. One further point: we can see that when Ali points to the variation between the LNDD S17 test results and the results achieved later upon the EDF re-analysis, he is pointing to a potential problem with the “intermediate precision” of LNDD’s test methods.)


Method “accuracy” is measured differently, depending on whether the method purpose is quantitative (as it would be for WADA threshold substances) or qualitative (as it would be for WADA non-threshold substances). (The Eurachem Guide says that this distinction applies to measurement of precision, see Eurachem Guide paragraph 6.37, but it would seem to apply equally to measurement of trueness.) If the method purpose is quantitative, “accuracy” is measured by looking at the amount that the test results differ from each other and from the reference value. If the method purpose is qualitative, “accuracy” is measured based on the percentage of the time that the test generates a false positive result or a false negative result. In either case, the method’s “purpose” should define the required test accuracy.
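For the qualitative case, the counts from a validation run on samples whose true status is known can be turned into false positive and false negative rates, and into overall accuracy, which is the share of correct calls. A minimal Python sketch with hypothetical counts; the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical outcome counts from validating a positive/negative method."""
    true_pos: int   # truly positive samples reported positive
    false_neg: int  # truly positive samples reported negative
    true_neg: int   # truly negative samples reported negative
    false_pos: int  # truly negative samples reported positive

    def false_positive_rate(self) -> float:
        # Share of truly negative samples wrongly reported as positive.
        return self.false_pos / (self.false_pos + self.true_neg)

    def false_negative_rate(self) -> float:
        # Share of truly positive samples wrongly reported as negative.
        return self.false_neg / (self.false_neg + self.true_pos)

    def accuracy(self) -> float:
        # Share of all results that are correct: (TP + TN) / total.
        total = self.true_pos + self.false_neg + self.true_neg + self.false_pos
        return (self.true_pos + self.true_neg) / total


# Hypothetical validation run: 40 spiked (positive) and 60 blank (negative) samples.
counts = ConfusionCounts(true_pos=38, false_neg=2, true_neg=57, false_pos=3)
print(f"FP rate = {counts.false_positive_rate():.3f}")
print(f"FN rate = {counts.false_negative_rate():.3f}")
print(f"accuracy = {counts.accuracy():.3f}")
```

Reporting the false positive rate separately matters because a single accuracy figure can mask it when the validation set contains few negative samples.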









20 comments:

blackmingo said...

Hi Larry, plodding my way through the opus. Just a quick comment on a minor detail; I think I will need to read all the posts at one time.

Your last paragraph:
"If the method purpose is qualitative, “accuracy” is measured based on the percentage of the time that the test generates a false positive result or a false negative result. "

Accuracy is the % of true positives or true negatives out of all testing observations [(TP+TN) / (TP+TN+FP+FN)]. As you have it above, you are describing what could be called inaccuracy. I don't think it affects your general argument, but as it is, that statement is not correct. Another way of describing accuracy in medical testing is the area under the so-called "ROC" (receiver operating characteristic) curve, which plots the true positive rate on the vertical (y) axis against the false positive rate on the horizontal (x) axis.
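An ROC curve comes from sweeping the decision threshold of a test that produces a continuous score, recording the true positive rate (vertical axis) against the false positive rate (horizontal axis) at each threshold; the area under the curve summarizes it in one number. The Python sketch below uses hypothetical scores and labels and computes the area with the trapezoidal rule; the helper names are illustrative. It also shows the trade-off mentioned below: lowering the threshold picks up more true positives, but also more false positives.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for every threshold.

    A sample is called positive when its score is >= the threshold.
    `labels` holds True for truly positive samples, False for truly negative.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative samples")
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    # Lower the threshold through each distinct score, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for six samples and their true status.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [True, True, False, True, False, False]
pts = roc_points(scores, labels)
print(pts)
print(f"AUC = {auc(pts):.3f}")
```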

On a side note, it has been pointed out previously that the enthusiasm over the T/E ratio catching dopers was drained upon learning of cases where false positives occur. Additionally, posters here and at DPF have tried to estimate the false positive rate for IRMS, with the small amount of data available. My take on this, after reading the Shackleton, Aguilera, Catlin studies that established its use in doping detection, there is a blinding obsession with increasing sensitivity of the testing, that they forgot that accuracy includes the true negatives in the numerator, that in diagnostic testing, as true positives increase, true negatives usually go down, to the point where they do not even mention false positive rates, let alone accuracy in their results or discussion sections. It is this type of behavior that usually signals trouble when analyzing the evidentiary base of medical testing; it is a common mistake in this area.

Anyway, onward to the next installment. Thanks for your labor!

blackmingo said...

I wish I could edit that 4th paragraph rant:

My take on this, after reading the Shackleton, Aguilera, Catlin studies that established its use in doping detection, is that there is such a blinding obsession with increasing the sensitivity of the testing that they forgot that accuracy includes the true negatives in the numerator. They forgot that in diagnostic testing, as true positives increase, true negatives usually go down. They are blinded to the point where they do not even mention false positive rates, let alone accuracy, in their results or discussion sections. It is this type of behavior that usually signals trouble when analyzing the evidentiary base of medical testing; it is a common mistake in this area.

dan

tbv@trustbut.com said...

While I agree with the sentiment about needing to know the false-positive rate, I'm not optimistic about its direct relevance in doping cases.

This was one of the things that was argued in Hamilton, and is, in fact, the underlying issue of the ridiculed "vanishing twin" hypothesis.

The validation studies don't seem to have a requirement for a statistically valid FP rate below some acceptable limit. It's one of the things we're told the LNDD validation study didn't even consider.

Which reminds me, has anyone seen the LNDD validation study? It seems notably absent in our archived collection, and I don't ever remember seeing it circulated.

TBV

Larry said...

blackmingo, point well taken about accuracy and inaccuracy. I should have said that qualitative accuracy is based on the percentage of time that the test generates TRUE positives and TRUE negatives.

I mostly tried to avoid math in the opus, figuring that I'd REALLY narrow my audience if I talked law and math at the same time!

You've mentioned an issue of great interest to me, which has to do with whether the ISL should specify a maximum false positive rate for qualitative method validation. I don't know whether such a thing is possible or even desirable. Are we better off NOT specifying an acceptable false positive rate, in the hope that ANY false positive encountered in method validation indicates that the method is not fit for purpose? I'd like to hear your thoughts on this, as I have not encountered much about this issue in my research.
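One way to frame the question statistically: if a validation study tests n samples known to be negative and sees no false positives, that does not show the false positive rate is zero, only that it is probably below some bound that shrinks as n grows. A minimal Python sketch, assuming independent samples and a one-sided 95% exact (Clopper-Pearson) bound; the function names are hypothetical.

```python
import math


def fp_rate_upper_bound(n_negatives, confidence=0.95):
    """One-sided upper confidence bound on the false positive rate when
    zero false positives were seen among n_negatives truly negative samples.

    With zero events the exact (Clopper-Pearson) bound reduces to
    1 - (1 - confidence) ** (1 / n); the "rule of three" (3 / n at 95%)
    is the familiar approximation.
    """
    if n_negatives < 1:
        raise ValueError("need at least one negative sample")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_negatives)


def negatives_needed(max_fp_rate, confidence=0.95):
    """Smallest number of clean negatives (zero false positives) needed for
    the upper bound above to fall at or below max_fp_rate."""
    if not 0.0 < max_fp_rate < 1.0:
        raise ValueError("max_fp_rate must be between 0 and 1")
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(1.0 - max_fp_rate))


print(f"upper bound with 100 clean negatives: {fp_rate_upper_bound(100):.4f}")
print(f"clean negatives needed for a bound below 1%: {negatives_needed(0.01)}")
```

With 100 clean negatives the bound is still about 3%; showing a rate below 1% at 95% confidence takes 299 clean negatives.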

TBV, I think that the question I'm raising here is pretty close to the one you raised here last night. I have not seen any LNDD validation studies, and I doubt that LNDD has disclosed any of its validation studies.

tbv@trustbut.com said...

Larry,

We know there was a validation study for their IRMS, and we think it is in the record, in some exhibit we don't yet have.

What we've heard is that it does catch subjects known to dope, but that it did nothing to check for false positives. While that may be important, it's also true the same argument was pretty much rejected in Hamilton. Jacobs/Suh didn't make a big point of it, I think because of that precedent.

So while part 12 makes good arguments that you ought to be able to mount a challenge on that basis, it is not clear to me that doing so would be productive.

TBV

Larry said...

TBV -

I didn't know that there was a validation study for the IRMS in the record. Was this a method validation study performed by LNDD on the method it developed for its CIR testing?

I don't read the Hamilton case the way you do. The Hamilton decision evidences some amount of concern about false positives. The decision stated that "in addition to being satisfied with the WADA protocol, the Panel must also be comfortable that a false positive result is unlikely" and "The grave concern is the false positive in a mixed blood population analysis. Could it occur?"

Then the decision goes a little bit south on us. The decision did report a study performed on 48 subjects where there was "100% accuracy". The decision did note the theoretical possibility of false negatives, but ruled that it would be impossible to achieve a false positive, and thus that there was "no need to do so called validation studies." This is complete nonsense, as I hope should be apparent from the 12-part (and growing) series. Right?

Nevertheless, the decision does not stand for the proposition that you never have to consider false positives when doing method validation. At most it stands for the proposition that you don't have to consider false positives in testing for blood transfusions, and that you might not have to test for false positives when you're using a robust test with some good supporting published studies.

I don't think that you could apply the Hamilton decision to CIR testing, which is anything but robust.

tbv@trustbut.com said...

I think we agree in theory, and disagree in practice. My perception is that there are things argued as defenses in parts 12 and 13 that make logical and legal sense, but which are not practically useful.

I keep returning to the example of Hamilton because while logically, false positive ought to matter, the issue was not a winner for him. It's my sense that the issue is not ever going to be a winner on its own merits until someone comes in with proof of some kind that this case *is* a false positive.

WADA World seems to have an institutional tin ear for false positives. The only one who seems to raise the spectre is Catlin, who always seems to be fretting about it, unlike anyone else. Catlin is too careful to call his erstwhile compatriots out on the issue, but his talk of a clear false-positive case being an utter disaster seems to go unheard.

So we can agree that Hamilton says you can and should consider it. However, in not demanding anything of the validation study about false positives, we're left with nothing but another empty promise that someone might be able to raise such a defense, but not this loser.

TBV

Larry said...

TBV, careful. The law is something more than just what the powers that be say it is. The law is something more than what the judges say it is, and it's something more than what proves to be practically useful. Otherwise, I could have written a two sentence CYA:

1. If your "B" sample was analyzed by the same person who analyzed your "A" sample, there's no AAF.

2. Otherwise, you're screwed.

Rules are rules. They have a language and a meaning separate and apart from how they are applied. 500 years ago, wars were fought over whether the Bible would be published in the vernacular, because the powers that be knew that words have their own power. 80 years after the Declaration of Independence, some "idiots" read that all men were created equal, and decided that meant the end of slavery. Yeah, it's laughable to discuss professional cycling in the same breath with Reformation and Abolition, but the principle is a big one even if the questions we consider here are not as big. "Stop" does not mean "go", "war" does not mean "peace", "freedom" does not mean "slavery", no matter what anyone else might say and how much power they might have.

TBV, I don't mean to attack you personally. I know where you stand and have a pretty good idea of what you believe. You've fought the good fight here for longer than I can imagine.

It's interesting. In my preface to this series, I warned everyone that they'd come to the end of the series and wonder what it was all supposed to be about. And I've been thinking about what I might say if I was asked to write an epilogue to this opus. TBV, maybe you've written it for me.

Here's why I wrote the series. What I wanted, Mr. TBV, was for you to read the Hamilton case, say "that's full of shit", and be able to employ the combined logical power of science and law to back your argument. I wanted Mr. Idiot to read what the FL arbitrators said about matrix interference, say "that's full of shit", and use the combined logical power of science and law to back his argument. If my article did what it was supposed to do, you guys can make your arguments not by plucking ISL rules from here and there, but by employing the ISL as a coherent whole, as a document with a logic and a purpose connected not to the exigencies of WADA World, but to the imperatives of science.

I wanted to put the lawyers and the scientists on the same page. There are many things wrong with the ISL, but even WADA World could not completely screw up a document that derives from ISO 17025 and is intended to provide standards for scientists. I was tired of hearing from you, and Ali, and Mr. Idiot, how the science and the law are at cross-purposes. If my article did what it was supposed to do, then you can now read the ISL (at least the portions of the ISL covered in the article) consistently with sound principles of good science.

I'm relatively confident that the article did not manage to do all I wanted it to do. I'm relatively confident that my article is a cautious step in the right direction. I'm reasonably happy with that.

blackmingo said...

TBV,

I agree that musing about the relative paucity of data articulating the accuracy of the IRMS testing in real life situations is not helpful to Floyd's case. I just have never tired of complaining about it.

I would love to see the phantom validation studies myself.

Larry -thanks for the work -hope to read all of it together this weekend.

Dan

tbv@trustbut.com said...

Larry,

I don't take it personally at all. I appreciate your passion, and your frustration with my cynicism pretending to be pragmatism.

There was a time when I was a certain kind of idealist, and believed in the law and the power of words as you present them, probably peaking about the time the House Judiciary Committee voted to refer Nixon's impeachment to the floor of the House.

Now, I perceive words as political tools, and laws and regulations as compromises arrived at for certain ends. We end up with normative use of "should" in place of "must", being used to mean "you don't have to at all", and other non-specificities said to allow "flexibility" and "creativity" being used to evade responsibility.

I wish you were right, that actually understanding the regulations and rules would lead to an ability to argue them successfully given a set of facts. I'm afraid that in the geo-political circus that seems to be the environment here, that will not be the case. In all honesty, we know Atticus Finch did not have a good chance on appeal with Tom Robinson, as much as we might wish it were otherwise, and as much as belief in the law might lead us to hope.

Have I ever mentioned Captain Dreyfus here?

And for the record, I think Whittaker Chambers was a slime, that the case was politically exploited, and that Alger Hiss' guilt was only settled by the Soviet archives.

I'm very pleased to have your work shed light into the rules that seem so likely to be misinterpreted. If you are failing to provide completely clear answers, it is because the situation has been made intentionally murky by those who wrote the rules and interpret them.

thanks,

TBV

Mike Solberg said...

Larry, I want you to know that you have indeed accomplished your purpose, at least in my regard.

I wanted Mr. Idiot to read what the FL arbitrators said about matrix interference, say "that's full of shit", and use the combined logical power of science and law to back his argument.

As you saw, that is exactly what I did (in slightly different words) when I read part 13.

Your reading of the controlling documents makes perfect sense. The stunning thing to me, yes, even after all this time, is that not only did the majority arbs not understand the science, they didn't understand the law either.

Unfortunately, I think TbV is only about a baby step more cynical than I am with regard to the law in this case. He said,

If you are failing to provide completely clear answers, it is because the situation has been made intentionally murky by those who wrote the rules and interpret them.

The word I have doubt about in that sentence is "intentionally." For most of WADAWorld, I don't think the murkiness is intentional. Unfortunately, I think it was intentional by Dick Pound.

One of the great contributions to this whole situation was the legal history of the Code written way back by some legal guy (does anyone know what I am talking about and where it is?). It basically said that Dick Pound wrote and rammed through the WADA Code, and I don't hesitate to believe that Pound has worked to be sure the Code and even the ISL are "murky" enough to allow politics to rule the day in important cases. I am not normally given to such distrust of authority, but I think Pound has earned it. See this story about him in Wired. He's a scary guy.

I don't quite understand your comments to TbV, Larry. You said

If my article did what it was supposed to do, you guys can make your arguments not by plucking ISL rules from here and there, but by employing the ISL as a coherent whole, as a document with a logic and a purpose connected not to the exigencies of WADA World, but to the imperatives of science.

But it seems to me that at the practical level the ISL is still not connected to the imperatives of science. You have explained well how the parts (ISO and ISL and TD's) are supposed to fit together, but the result, I think, is that even more is left to the discretion of the lab, which, in turn, is not held to adequate standards to assure they are doing what they are supposed to be doing. But, maybe I misread your opus.

syi

Mike Solberg said...

While looking for that other history article, I found Playing Fair: Why the United States Anti-Doping Agency’s Performance-Enhanced Adjudications Should Be Treated as State Action. If you want the gist, read the last paragraph. I don't remember reading it before. Very interesting.

I wonder if Suh and company are gearing up? If so, I bet they win at the California state level and lose at the Supreme Court (unless the process takes so long that our next Democratic president is able to appoint one or two people to the Supreme Court before the decision is made).

Mike Solberg said...

This is fascinating too. Governing Doped Bodies: The World Anti-Doping Agency and the Global Culture of Surveillance. You gotta love an article which puts WADA in the light of Michel Foucault.

syi

Larry said...

Mike, I'm probably more into hermeneutics than post-structuralism. I sometimes fear that TBV is sliding down the slippery slope of legal positivism into the morass of legal realism. I may have to send him the collected works of Ronald Dworkin for his birthday.

Enough! All this erudition is making me light-headed.

You wrote that "on the practical level the ISL is still not connected to the imperatives of science." Well, let's take one step at a time. We've seen that much of the ISL is connected to the imperatives of method validation, and method validation is an imperative of science. You say that much is left to the discretion of the lab, and that's true, but that's true for all labs. The LNDD is ISO accredited - perhaps the system of accreditation is not all it needs to be, but if so, that affects all labs and a heck of a lot of what passes for "scientific" out there.

On some level, Mr. Idiot, it kind of comes down to people. You can write rules, but the people have to follow them. You can have sound principles of science, but the scientists have to follow them. You can have oversight, but you're still relying on a human overseer.

The law does not protect us because it sits on a shelf in a library. The science does not protect us because it sits on a different shelf in the same library. We say that words have power, but that's not quite true. An unread word has no power. A word that is read but not understood has no power either.

When the word is read and is understood, then it has power.

Understanding is a thing that evolves. It doesn't happen overnight. There are going to be mistakes along the way, and the consequences of some of these mistakes may be tragic. But there's no giving up. I'm not saying this as a pep talk, I'm saying this as a quality of the human condition. You can be as cynical as you want to be, but the second time you read a word, you'll see it differently.

You express surprise that the majority did not understand the law. But you shouldn't be surprised. It takes an effort to understand the word. Look at how much work went into the little bit of insight we've gained into the ISL.

As a general matter, I'm about as cynical as they come. But this point is too important to take cynically. Rules matter. Prevailing belief to the contrary degenerates into self-fulfilling prophecy. Rules stop mattering when we stop paying attention to them, or stop trying to understand them.

I'm sorry about what happened to FL. It wasn't fair. I think that a better set of rules might have saved him, and I think that a better understanding of the existing rules might have saved him, too. Maybe it's not too late.

In any event, it's a little too early to judge the power of the word here. I do think we've made some progress, but it's a small amount of progress, a baby step in what could be a long walk. There's still the matter of "quality control" to consider. And I fear that our inquiry will not end with "quality control".

tbv@trustbut.com said...

Perhaps it's my own Scandinavian descent that inclines me towards legal realism. I'm certainly more inclined towards Richard Posner than Scalia. Maybe I should read some Dworkin to broaden my mind. I guess what I really see is people posturing as positivists and acting like realists.

TBV

Mike Solberg said...

TbV: I guess what I really see is people posturing as positivists and acting like realists.

That perfectly captures what WADA seems to be all about. Perfectly.

Russ said...

Larry,
I finally read through the 'opus'. Thanks for the effort, great progress!

I came back to this section for a couple of comments:

Regarding traceability, I think there is more to it to consider. Perhaps your quality control will touch on this. Keep in mind that traceability is a key and basic component of all measurement.

Traceability is part of the calibration of the scales or measures that portion out the chemicals used in fractionation and the measure of the sample itself.
It is part of the calibration of the electronics that report the ion count in small fractions of the ampere. It is part of the specified accuracy of the pressure gauges. It is even part of the tech's use (or lack thereof) of the pressure gauge readings. The lack of proper application of readings and measures is a break in this traceability and of the normally assumed standard contribution to uncertainty.

Don't know if this list is actually helpful, but I thought it worthwhile to get it into print as a reminder not to forget these components.

Regards,
Russ

Russ said...

Oh and one more item!

Traceability is involved in the proper machining, to close tolerance, of the magnet in the newer isoprime, which was meant to produce a specified magnetic field pattern. The rerouting of that field by the 'mouse ears' destroyed that traceability and contributed an unknown variable to the measurements on that machine, as did the varying pressure and possible leaks in the first machine (going on memory about the variations of pressure, hope that one is correct).

Russ

Larry said...

Russ -

I'm glad you checked in. Special thanks to you, in particular for pointing out the Eurachem series of guides to me. There's no way I could have written the opus without your help.

Your comments make me smile. You think there's more to "traceability" than what I wrote? AS IF there is ANY portion of the opus that doesn't require further explanation! This is a "50,000 foot overview" if there ever was one.

I am making NO PROMISES that there will ever be a Part 2 of the opus.

I am doing some preliminary research to see what I think would be involved in writing a Part 2. I am trying to read up on quality control, and you're right that "traceability" appears to be a QC concept as well as a method validation concept.

Thanks for the nice words.
