Monday, January 28, 2008

Idiots Return to Brenna and WM-A, Part II

Alasdair, once Ali, continues...

In Part 1, we managed to recreate Brenna's results. In Part 2, we're going to explain exactly how they occurred with reference to both the m45 and m44 signals and what happens when peaks overlap. However, before we do that, we need a little more background on what the software does.

Some of you may be thinking "We understand the logic, but m45 leads m44, not the other way round, you idiot."

Well, that's true, but hold on...



Brenna stated that his software compensated for the misalignment between m44 and m45, so we know that an attempt was made to align them perfectly.

That never happened though, because if it had, he would not have observed the results that he did. There are many factors which would make perfect alignment extremely difficult, including: the m44 and m45 signals are sampled quantities; in practice, they are not perfectly the same width and the same shape; the m45 peak is hundreds of times smaller in magnitude than the m44 peak and more susceptible to chemical and electrical noise.

So Brenna's software tried but failed to achieve PERFECT alignment.

Older software may not even try to compensate for the m44-m45 time difference. Newer software may try but fail, resulting in a small finite misalignment either one way or the other. Consider this: To achieve perfect alignment and avoid errors when peaks overlap, you need perfect peaks.

At the end of the day, whether your software compensates or not, it looks like you're going to be left with an error in the alignment of m44 and m45. We don't know whether the OS2 software used for Floyd's Stage 17 compensates or not. We don't know if it behaves like Brenna predicted or whether it behaves like Meier-Augenstein predicted. After we've explained the results, we'll tackle that question.



Before we get into what can go wrong, let's see what is required for accurate CIR measurement. Figure 8 shows an m44 peak, with its corresponding m45 component superimposed on it. The two peaks are perfectly aligned. The integration limits are shown as two vertical lines. Remember, m44 and m45 exist as two separate signals, and we need to show both to explain what’s happening. The software needs to calculate the area under the m45 peak, calculate the area under the m44 peak and divide the m45 area by the m44 area. The result is the CIR for this substance.

Let’s say the m45 signal leads (occurs before) or lags (occurs after) the m44 peak by some small amount. As long as there’s no other peak interfering with our peak, that’s not a problem. We just position our integration limits so that they encompass both peaks and we’re good to go.
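The calculation just described can be sketched in a few lines of Python. This is only an illustration under assumed conditions: Gaussian peak shapes, a made-up true ratio (TRUE_RATIO), and arbitrary values for peak position, width and offset. None of the names below come from any real IRMS software.

```python
import math

TRUE_RATIO = 0.011   # hypothetical m45/m44 area ratio, chosen for illustration
SIGMA = 2.0          # assumed peak width (seconds)
CENTRE = 100.0       # assumed m44 retention time (seconds)


def gaussian(t, centre, sigma, area):
    """Gaussian peak with the given total area (an assumed peak shape)."""
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((t - centre) / sigma) ** 2)


def integrate(signal, lo, hi, steps=5000):
    """Trapezoid-rule area under signal(t) between the integration limits."""
    h = (hi - lo) / steps
    total = 0.5 * (signal(lo) + signal(hi))
    for i in range(1, steps):
        total += signal(lo + i * h)
    return total * h


def measured_ratio(lead, lo, hi):
    """m45 area divided by m44 area; lead > 0 means m45 arrives early."""
    m44 = lambda t: gaussian(t, CENTRE, SIGMA, 1000.0)
    m45 = lambda t: gaussian(t, CENTRE - lead, SIGMA, 1000.0 * TRUE_RATIO)
    return integrate(m45, lo, hi) / integrate(m44, lo, hi)


# Limits wide enough to contain both signals, even with m45 half a second early.
ratio = measured_ratio(lead=0.5, lo=85.0, hi=115.0)
print(f"measured {ratio:.6f} vs true {TRUE_RATIO:.6f}")
```

With an isolated peak and generous limits, the offset between the two signals makes no practical difference to the ratio in this toy model.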


Figure 8. m44 and m45 perfectly aligned. No overlap




Right, let's introduce another peak to the left of our first peak. Figure 9 shows two peaks of the same CIR overlapping by some arbitrary amount. We use the same method as Brenna to set up our integration limits. Let's examine what’s happening with Pk2. We can see that our integration limits exclude the left hand tail of both the m44 and m45 peaks. That means that we’ve lost those areas from our calculation. That’s not a problem, because the m44 and m45 are perfectly aligned, so we lose the same relative proportions of each.

We’re also including the right hand tail of Pk1. Again that’s not a problem because in this case, both peaks have the same CIR so the contribution from Pk1 has the same relative proportions of m44 and m45 as Pk2 (it wouldn’t matter if Pk1 was much bigger than Pk2, the previous statement holds true).


Figure 9. m44 and m45 perfectly aligned. Overlapping peaks

So with perfectly aligned m44 and m45, it won’t matter how much those peaks overlap, because they have the same CIR.
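Here is a small sketch of that claim, assuming Gaussian peaks, a made-up shared ratio and arbitrary peak sizes and positions. Because the m45 signal is just the m44 signal scaled by the ratio, the left-hand limit for Pk2 can be placed anywhere and the measured ratio stays the same.

```python
import math

TRUE_RATIO = 0.011  # hypothetical m45/m44 ratio shared by both peaks


def gaussian(t, centre, sigma, area):
    """Gaussian peak with the given total area (an assumed peak shape)."""
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((t - centre) / sigma) ** 2)


def integrate(signal, lo, hi, steps=5000):
    """Trapezoid-rule area under signal(t) between the integration limits."""
    h = (hi - lo) / steps
    total = 0.5 * (signal(lo) + signal(hi))
    for i in range(1, steps):
        total += signal(lo + i * h)
    return total * h


def m44(t):
    # Pk1 (left) is five times bigger than Pk2 (right); both sizes are made up.
    return gaussian(t, 100.0, 2.0, 5000.0) + gaussian(t, 107.0, 2.0, 1000.0)


def m45(t):
    # Perfect alignment and identical CIR: m45 is m44 scaled by the ratio.
    return TRUE_RATIO * m44(t)


# Try several left-hand limits for Pk2, from well inside Pk1 to the Pk2 apex.
left_limits = (101.0, 103.5, 105.0, 107.0)
ratios = [integrate(m45, left, 120.0) / integrate(m44, left, 120.0)
          for left in left_limits]
for left, r in zip(left_limits, ratios):
    print(f"left limit {left:6.1f} s -> ratio {r:.6f}")
```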




Figure 10 shows our m45 leading the m44 by some amount (we haven’t labelled the peaks again as they are clearly identifiable from Fig.9). We’ve had to exaggerate this so that you can see what’s happening. What you’re seeing is m45 leading m44 by an enormously large 1 second. [To put that in context, if Figure 4 had used a 1 second lead instead of a 0.1 second lead, the measured value for that peak (which has a true value of –27) would not have been –28.4, it would have been –41.0!]

We don’t need to look too hard to see that the m44 situation remains the same. We lose the left-hand tail of Pk2, but gain an equal amount of the right hand tail of Pk1. Total area same as before. It’s all going wrong with our m45 peak though. We’ve lost more of our Pk2 m45 than in Fig.9 and we’re gaining less of Pk1’s m45. That’s upset the balance. When we calculate our areas for Pk2 now, the ratio will make it look like we have less m45 than before. That makes our d45 figure more negative and it makes it look more probable that doping was involved.


Figure 10. m45 leads m44. Overlapping peaks




Figure 11 shows the opposite situation, m45 lagging m44. Again, all’s well on the m44 front, no change again.

So what’s happening with our m45 now?

We’re losing less m45 from Pk2 than in Figure 9 and we’re gaining more of Pk1’s m45. That’s upset the balance again, but this time it’s going to increase our m45 areas. That makes our Pk2 d45 figure less negative and it makes it look less probable that doping was involved.
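The lead and lag cases can be checked with a toy model. The sketch below assumes two equal Gaussian peaks of identical (made-up) ratio, puts the left-hand limit for Pk2 at the m44 valley, and shifts the whole m45 signal by a small amount. The error it reports is just the relative error in the area ratio, in per mil. It is not a full delta calculation and it makes no attempt to reproduce the numbers in the figures.

```python
import math

TRUE_RATIO = 0.011        # hypothetical m45/m44 ratio, same for both peaks
SIGMA = 2.0               # assumed peak width (seconds)
PK1, PK2 = 100.0, 106.0   # assumed retention times (seconds)


def gaussian(t, centre, sigma, area):
    """Gaussian peak with the given total area (an assumed peak shape)."""
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((t - centre) / sigma) ** 2)


def integrate(signal, lo, hi, steps=5000):
    """Trapezoid-rule area under signal(t) between the integration limits."""
    h = (hi - lo) / steps
    total = 0.5 * (signal(lo) + signal(hi))
    for i in range(1, steps):
        total += signal(lo + i * h)
    return total * h


def m44(t):
    return gaussian(t, PK1, SIGMA, 1000.0) + gaussian(t, PK2, SIGMA, 1000.0)


def pk2_error_per_mil(lead):
    """Relative error (per mil) in the Pk2 ratio; lead > 0 means m45 is early."""
    m45 = lambda t: TRUE_RATIO * m44(t + lead)
    # Left limit: the m44 valley between the two apexes, found by a simple scan.
    n = 6000
    grid = [PK1 + (PK2 - PK1) * i / n for i in range(n + 1)]
    left = min(grid, key=m44)
    right = PK2 + 5.0 * SIGMA
    ratio = integrate(m45, left, right) / integrate(m44, left, right)
    return (ratio / TRUE_RATIO - 1.0) * 1000.0


for lead in (0.2, 0.0, -0.2):
    print(f"m45 lead {lead:+.1f} s -> ratio error {pk2_error_per_mil(lead):+.2f} per mil")
```

In this model a lead pushes the Pk2 ratio down (more negative) and a lag pushes it up (less negative), which is the behaviour described for Figures 10 and 11.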


Figure 11. m45 lags m44. Overlapping peaks




There you have it. The mystery of the overlapping peak and why the observations of Brenna and Meier-Augenstein were both valid. It depends on the software system you’re using and how it performs with regard to aligning m44 and m45.

All very interesting, but what about the 64,000 dollar question … What is the LNDD system like?

The reprocessing of the Stage 17 EDFs provides our only real insight into the characteristics of the LNDD equipment. Specifically, we’re going to look at the results obtained from the blank sample. This is useful for us because: We know it shouldn’t test positive; It was processed in an automatic mode by the software; It was also processed in a manual mode by the technicians.

The F3 for the blank showed a degree of overlap between the 5bP peak and the 5aP peak (not to the extent that Floyd’s overlapped, but it was there).

In auto mode, the 5aP for the blank came out as delta –3.65 (“delta” refers to the numerical difference between the d45 value for our substance under investigation and the d45 value of a reference substance generated by the athlete which is known not to be affected by the doping in question). The threshold for a doping positive is delta –3. Anything more negative is considered indicative of doping. However, the uncertainty that LNDD claim is +/- 0.8, so anything more negative than delta –3 is suspicious, but delta –3.8 is the absolute threshold. As you can see, the known clean blank escaped testing positive only because it was within the uncertainty claimed by LNDD. That was when processed automatically by the software, without intervention. When LNDD redid it using their manual method, they got a result of delta –1.87. It is worth noting that they knew this was the blank and that it should not test positive.
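For anyone who wants the arithmetic spelled out, here is a minimal sketch using only the figures quoted above. The function name and the category labels are ours, invented for illustration. They are not LNDD or WADA terminology.

```python
THRESHOLD = -3.0     # delta threshold for a positive, as quoted above
UNCERTAINTY = 0.8    # uncertainty claimed by LNDD, as quoted above


def classify(delta):
    """Place a delta value relative to the threshold and the claimed uncertainty."""
    if delta >= THRESHOLD:
        return "negative"
    if delta >= THRESHOLD - UNCERTAINTY:
        return "beyond threshold, within claimed uncertainty"
    return "positive"


auto_delta = -3.65    # blank 5aP, automatic processing
manual_delta = -1.87  # blank 5aP, LNDD manual processing

# How far the manual method moved the result (positive = less negative).
shift = manual_delta - auto_delta

print(f"auto   {auto_delta:+.2f}: {classify(auto_delta)}")
print(f"manual {manual_delta:+.2f}: {classify(manual_delta)}")
print(f"manual processing moved the blank by {shift:+.2f}")
```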

So what do these results tell us?

Suppose the LNDD system behaved like the one Brenna used in his research and made the measured d45 appear less negative than it really is. That would imply two things. Firstly, the blank sample was a "positive", and really more negative than the measured delta –3.65. Secondly the LNDD manual method made the result worse, not better – they moved it in the wrong direction, away from the true value. (It also provides a stunning example of the influence of manual methods to achieve desired or expected results.)

This argues that we can’t accept that the LNDD equipment had the same characteristics as Brenna’s with respect to alignment of the m45 and m44 signals. If it did, the blank must have come from someone who had been doping with testosterone and the LNDD manual adjustment method is laughably inadequate.

Let’s consider that the LNDD system achieves perfect m45 and m44 alignment. Apart from the fact that it’s probably not achievable, it would mean that the blank, non-doping sample only just escaped a false positive. What does that say about the threshold of delta –3? Also, it would mean that the LNDD manual adjustment method changed a correct result into an incorrect one by more than 1.6 delta units (from –3.65 to –1.87). That’s way beyond their claimed uncertainty figure of +/- 0.8 and would call their competence into question.

We’re only left with one other option, and that is that the LNDD system results in the m45 leading the m44 by some finite amount, resulting in the 5aP appearing more negative than it really was. That would explain why the blank almost tested positive. That would explain why LNDD manually altered the result for the blank to make it look less negative. It would also explain why all of Floyd’s 5aP results appeared far more negative than they actually were.

Far more negative? We’ve established that the error is proportional to degree of overlap. Look at the blank F3 chromatograms. The degree of overlap between the 5bP and the 5aP is negligible, but that was enough to make the blank almost test positive. Now look at Floyd’s F3s.

Noticeably more overlap.
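To see how the size of the error depends on overlap, here is a rough sketch under the same kind of assumptions as before: two equal Gaussian peaks of identical (made-up) ratio, a small fixed m45 lead (LEAD, an assumed value), and the left-hand limit for the second peak at the valley. It only varies how far apart the two peaks sit. It says nothing about the actual size of the effect in the LNDD data.

```python
import math

TRUE_RATIO = 0.011   # hypothetical m45/m44 ratio, same for both peaks
SIGMA = 2.0          # assumed peak width (seconds)
LEAD = 0.1           # assumed residual lead of m45 over m44 (seconds)
PK1 = 100.0          # assumed retention time of the left-hand peak (seconds)


def gaussian(t, centre, sigma, area):
    """Gaussian peak with the given total area (an assumed peak shape)."""
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((t - centre) / sigma) ** 2)


def integrate(signal, lo, hi, steps=5000):
    """Trapezoid-rule area under signal(t) between the integration limits."""
    h = (hi - lo) / steps
    total = 0.5 * (signal(lo) + signal(hi))
    for i in range(1, steps):
        total += signal(lo + i * h)
    return total * h


def pk2_error_per_mil(separation):
    """Relative error (per mil) in the right-hand peak's ratio."""
    pk2 = PK1 + separation
    m44 = lambda t: (gaussian(t, PK1, SIGMA, 1000.0)
                     + gaussian(t, pk2, SIGMA, 1000.0))
    m45 = lambda t: TRUE_RATIO * m44(t + LEAD)   # m45 arrives LEAD seconds early
    left = 0.5 * (PK1 + pk2)    # for equal peaks the valley is the midpoint
    right = pk2 + 5.0 * SIGMA
    ratio = integrate(m45, left, right) / integrate(m44, left, right)
    return (ratio / TRUE_RATIO - 1.0) * 1000.0


# Smaller separation means more overlap.
errors = {sep: pk2_error_per_mil(sep) for sep in (16.0, 12.0, 8.0)}
for sep, err in errors.items():
    print(f"separation {sep:4.1f} s -> ratio error {err:+.3f} per mil")
```

In this toy model the error is negligible when the peaks are well separated and grows quickly as they move closer together, for the same fixed misalignment.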

For those calling us idiots, let's remember that we are not talking about the actual lag between the peaks, but lags produced when the software attempts to account for the actual lag and fails to do a perfect job. Such software more or less works adequately when there are no overlaps in peaks. As we've shown, when there are minor errors in this compensation, overlapped peaks can be systematically measured with inaccurate results.

We've shown here that it is possible to reconcile Brenna's study and WM-A's theory in a way that leaves it less likely that LNDD got correct results.

Which seems like a perfect place to stop.

17 comments:

Larry said...

Alasdair -

Why apply the same integration limit to both the m44 and m45 peaks? Why not graph them separately and apply separate integration limits?

My understanding of the IRMS machine is that it DOES contain separate particle detectors for m44 and m45.

Rubber Side Down said...

From a layman's point of view, your analysis makes a whole lot of sense.

If you guys are idiots, what's that make LNDD? Amoebas?

RSD

Ali said...

Larry,

OK, the main reason is that in practice, because the m45 signal is tiny, it is subject to interference from chemical and electrical noise. This may make it difficult to identify exactly where the peak starts and stops (and we know how sensitive the results are to even tiny errors in the integration). The m44 signal, in comparison, has a very good signal to noise ratio, so it's far easier to see exactly where to place your integration limits. Ergo, align the peaks using techniques such as "centroiding" and set your integration limits using the m44 peak (the short answer would have been "it's what Brenna did").

Huh ? You've been working too hard on that opus. Yes, there are separate detectors for m44 and m45 which generate two separate signals. I think you'll find I said that ? (at least I thought I said that !!!)

Mike Solberg said...

That's great stuff, Ali. Very clear.

Here is one possible relevant detail. I read somewhere that Dr. MA said that the degree of misalignment (the isotope effect) depends on the polarity of the column used for separation. If LNDD used one column, but set their software for some automatic adjustment appropriate for another column with a different polarity, that could account for some of the inaccuracy.

Hmmm...could be a relatively "innocent" explanation for the whole problem. They switched columns in order to get better separation of the metabolites, but, not understanding the big picture, failed to make all the necessary adjustments to other parts of the system. Hmmm...

Also, is there a Part III? In which you show us how things completely fall apart when there is a third peak present with unknown CIR? We know that little peak is there in between the 5bA and the 5aA from the GCMS chromatograms.

syi

Larry said...

syi -

I'd like to see the quote from Dr. MA that you're referring to. In theory, the polarity of the GC column should have no effect on the alignment or misalignment of m44 and m45 peaks, since we don't have separation of m44 and m45 until after combustion of the substance in question (after the material has passed out of the GC column). But if I'm missing something and the column really DOES affect the m44 - m45 alignment ... that becomes a very powerful argument for FL. First, it helps illustrate the relevance of the column switch, and increases the potential burden on USADA to prove that the column switch did not cause the AAF. Second, it eliminates the need to resolve the battle of the experts. If the column switch caused a greater m44 - m45 misalignment, then FL can prove his case without proving that Brenna was wrong.

So, inquiring minds want to know more about this Dr. MA quote.

Ali said...

syi,

I'd be very surprised if any compensation applied is of a fixed nature. That would be an incredibly risky thing to do. Having said that, I can't say categorically that isn't done. My understanding is that it is done on the fly, each time an integration is performed.

Larry's comment regarding the m44 and m45 being separated during combustion is correct (at least that's what I thought was the case).

Having said that, I know that you know that already ! (a forehead slapping moment for you ?)

:-)

Ali

Larry said...

Alasdair -

In place of the part III of this idiot's guide requested by Mr. Idiot, I'd request instead part I of the idiot's guide to noise. My request is based on your statement that it's impossible to select an independent integration limit for the m45 peaks, because the signal to noise ratio for an m45 graph is too low. Hmm! If there's too much noise to allow the scientists to set an independent integration limit for the m45 peaks, that suggests that there may be too much noise to allow for other things, like the accurate measurement of the area of the m45 peaks.

So maybe noise is an important topic.

On your main argument, you must be right that Brenna and MA came to different conclusions because they were working from a different set of assumptions. Yes, Brenna probably based his conclusion on a corrected m45 peak that was in effect pushed further to the right than MA's corrected peak. (There are other possible differences in the assumptions used by Brenna and MA -- for example, perhaps they used different assumptions about the shape of the m45 peaks.)

I'm not sure why Brenna and MA would use different assumptions to reposition the m45 peak. Isn't it pretty well known exactly how far the m44 peak lags behind the m45 peak? (it's something like 150 ms, isn't it?)

If I'm following your argument (where you say, for example, that "we are not talking about the actual lag between the peaks"), then ALL of the methods used in the EDF reprocessing involved some effort to align the m45 peak with the m44 peak. That makes sense. This means that the LNDD manual processing method required the technician to first manually align the m44 and m45 peaks, then manually set the integration limit. In contrast, the automatic method used during EDF reprocessing would have automatically aligned the m44 and m45 peaks, then automatically set the integration limit.

If this is the case, then how can we say anything about whether EITHER of these two methods (manual and automatic) is like the method used by Dr. Brenna? True, the automatic processing method produced more negative results than the manual processing method, but we have two possible factors to account for this: the m45 alignment factor and the integration limit factor. Don't we have one too many "factors" to allow us to reach any conclusions?

tbv@trustbut.com said...

When you start talking about noise, you run perilously close to talking about linearity at the low end of the detection capabilities.

I don't think we want to go there in this particular series.

It's probably worth talking about the joint or independent setting of the m44 and m45 integration limits.

Our understanding, acquired after the original idiots series, was that practice in the field is to do a time adjustment of one or the other, often using "centroiding" as the mechanism. The very use of the -oid suffix should give some clue that the technique is an approximation, not exact.

Our contention is that this works fine for non-overlapped peaks, where the errors don't much matter. It's when they do overlap the errors in the algorithms start to matter. Shade one way, one result; shade the other, opposite result.

This could be one of the things that counted in the "explain what the machine does for integration" discovery questions that could not or would not be answered.

What is important is that it is a systematic, systemic, likely to be consistent error that appears not to be accounted for -- and which is likely to apply to all of the overlapping 5bA and 5aA peaks that LNDD tests, including the blanks, and all the Landis samples. Depending on the overlap, the results may skew more.

It raises questions again about the much higher rate of testosterone positives found by LNDD than other labs.

TBV

Larry said...

TBV -

Maybe someday we'll talk about noise. I'm probably not up for a discussion about linearity. I just want to know what the noise IS. Are we looking at the echo from the big bang? Is this just a low level stream of undifferentiated stuff that didn't get filtered out by the sample preparation or clumped into peaks by the GC?

But we won't talk about noise today.

The point I was trying to make is that the CIR methods would all involve centroiding followed by setting integration limits ("integroiding"?). Both centroiding and integroiding introduce the possibility for error. If you're looking at a CIR result that's too negative, you can't necessarily say it was caused by faulty centroiding. If you're comparing the LNDD results to the results you see in Dr. Brenna's paper, you can't necessarily say that Brenna centroided to the right and LNDD centroided to the left, unless you know that they both used identical integroiding.

So while I see the potential for error, I don't see it pointing in a particular direction. Yet.

Mike Solberg said...

Come on guys, I don't just make this stuff up! You don't trust me? Me, Mr. Idiot?

here it is

Search "polarity" and go to page 168.

It's got to do with van der Waals dispersion forces. Duh.

I think I read the full article months ago at my local med school, but, in any case, this is enough to see the comment.

syi

Larry said...

Wow. I don't know whether to be more impressed with the point you're making, or with the fact that you made it all the way to page 168!

syi, I think this could be a very, very important point.

Mike Solberg said...

Well I didn't read the whole book. Just WMA's article/chapter 8. Funny how little stuff sticks in your brain.

syi

Ali said...

syi,

Excuse me while I slap my own forehead !

Ali said...

Larry,

With regard to: "The point I was trying to make is that the CIR methods would all involve centroiding followed by setting integration limits ("integroiding"?). Both centroiding and integroiding introduce the possibility for error. If you're looking at a CIR result that's too negative, you can't necessarily say it was caused by faulty centroiding. If you're comparing the LNDD results to the results you see in Dr. Brenna's paper, you can't necessarily say that Brenna centroided to the right and LNDD centroided to the left, unless you know that they both used identical integroiding."

Let's go back to Brenna and forget about LNDD for the moment. Part 2 showed (perhaps not explicitly) that when you have perfectly aligned (and perfect) m44 and m45 peaks, it doesn't matter a damn where you place your integration limits. If you chop off part of your peak, you're losing m45 and m44 in equal proportions - you haven't changed the ratio. If you include part of another peak in your integration interval and that peak happens to have the same CIR as the peak you want to integrate (a la Brenna), that doesn't matter either, because the m44 and m45 you include from the other peak is in the same proportions as the peak you're integrating - you haven't changed the ratio.

So, when Brenna observes a systematic error, integrating overlapping peaks of identical CIR, I can say categorically that, for whatever reason, his m44 and m45 were not perfectly aligned. I can also say that m45 lagged m44, because that's the only way you can get a systematic error which makes your peak look less negative than it really is. I think if you study the diagrams in Part 2, you'll see what I'm talking about.

He was quite explicit in the methodology used to set the integration limits (although I read that as him describing what the software did, not what he did). With just two peaks overlapping (and say you're interested in the right-hand peak) you choose the minimum at the intersection of the two peaks for the left-hand limit and somewhere else which includes the rest of your peak for the right-hand limit (see Part 2). To be frank, where else would you place your limits?

Now to the centroiding factor. My understanding is that if this is done, it is done by the software, not the operator. In manual mode, they may be able to set the integration limits, but they don't align the m44 and m45 peaks. That's done by the software. It would be virtually impossible for that to be done accurately by eye.

Now back to LNDD. We don't know if the OS2 software used to analyse the Stage 17 results even compensates for the m44-m45 time difference. It's so old, nobody seems 100% sure what it does. Maybe it does and maybe it doesn't. If we knew for a fact that it didn't, we'd know for a fact that it will bias results in a more negative direction when you have overlapping peaks (as WM-A testified). If we knew for a fact that it did, then any misalignment between m44 and m45, from whatever source, would be altered by the software in an attempt to correct it. In Part 2, we made no assumption either one way or the other. We tried to deduce from what happened with the blank during the reprocessing of the Stage 17 EDFs.

PS: I'm pretty sure "integroiding" is not a word I've ever heard of :-)

wschart said...

Larry:

RE "integroiding": once the limits of integration are set, by whatever method, the integration itself is a rather straightforward, if perhaps complicated, mathematical calculation. Any degree of inexactitude (is that a new word?) comes from the estimation of the limits, the alignment, and any other "fiddling" that is done with the raw data.

Larry said...

Alasdair and wschart -

Alasdair, you said: when you have perfectly aligned (and perfect) m44 and m45 peaks, it doesn't matter a damn where you place your integration limits. Yes, I realize that. You might then test centroiding by moving around the integration limits to see if it affects the CIR results. If it does, then you haven't achieved perfect centroiding of perfect peaks.

But you're not going to have perfect peaks. I think that's where this gets complicated. Say that the m44 peak is a perfect peak and the m45 is not perfect. Say that the m45 is "front loaded" -- the slope going up is steep and the slope going down is more gradual, sort of like the left-most two strokes in a capital "N". In such a case, it matters a lot where you put your integration limits, because if you put your beginning limit a little too far to the right, you're going to cut off proportionally more m45 than m44.

As far as the methodology you use to set integration limits ... you spent a whole month of your life convincing me that this is not an easy thing to do! Again, if we have two overlapping perfect same-sized m44 peaks, then sure, you'd put the integration limit at the lowest point where the two curves intersect -- as you said, where else would you put it? I'm not so sure we could be this confident if we were dealing with two imperfect overlapping peaks of different sizes. Imagine for the moment that we have two m44 peaks both shaped like the left-most two strokes of a capital "N". Make them the same size, and draw them so that they overlap. Then draw your integration limit at the low point of the intersection. If you do this, you'll end up with a lot of the left-most peak to the right of your integration limit, and very little of the right-most peak to the left of your integration limit. That's probably not a good thing.

I think you hit the same problem with perfectly shaped peaks of different sizes. These perfectly shaped peaks have bell-shaped curves, meaning that their slopes change as you move from beginning to end of the curve. The steepest slope on the curve will take place around the half-way point between the curve beginning and where the curve reaches its top. A small curve will presumably have a smaller RT to reach this steep point than a larger curve. I'm no geometrist, but it would seem to me that at the point of intersection between a perfectly shaped large curve and a perfectly shaped small curve, the slope of the small curve will be relatively steep and the slope of the large curve will be relatively flat. So if you drew your integration limit at the low point of the intersection of these two curves, you'd have the same situation I described above with curves shaped like the first two strokes of a capital "N" - you'll have a lot of the left-most peak to the right of your integration limit, and very little of the right-most peak to the left of your integration limit. That's probably not a good thing either.

And I'm not even considering most of the problems you discussed back in November, like noise levels and sloping baselines.

Now, I don't know whether any of this affects where a lab would set its integration limits. Maybe there isn't any other place to put an integration limit except at the lowest point of intersection, at which point you close your eyes hold your nose and hope for the best. But it still seems to me that there is some potential for error introduced here, and that this potential for error might be as significant as the error introduced by poor centroiding.

On to centroiding. My guess is that either the LNDD software does not compensate for the m44-m45 time difference, or that it does so in a relatively simple way (for example, by setting the m45 integration points 150 ms to the left of the m44 integration points set by the operator). The LNDD software is close to 20 years old, and I doubt it could do very much more than what I'm describing. Consider that the LNDD's IRMS was so old, they could back up its hard drive once a year on a single CD-ROM. That doesn't suggest a lot of graphical processing ability. I'm also guessing that when the EDFs were reprocessed using newer software, the newer software included a newer (hopefully better) method for centroiding. So I don't assume that the manual and automatic processing of the EDFs used identical centroiding.

If the old LNDD software did not do any centroiding, then all other things being equal, it should have produced more negative results than the automatic processing (which we presume DID employ centroiding). Of course, we saw the opposite result. It's difficult for me to interpret this finding the same way you have, but also, it's difficult for me to interpret this finding in a way that makes any sense whatsoever. My assumption is that the automatic processing would do a better job of centroiding than the OS2 software (which may not have even PERFORMED any centroiding). So why, then, did the automatic processing appear to have reached results that appear to be too negative? Maybe the LNDD technicians compensated for the poor centroiding performed by the OS2 software by setting the integration limits to make results less negative (and since we're dealing with imperfect peaks of different sizes, I'm assuming that it may be possible to do this) ... but that still doesn't explain why the automatic processing results seem so far off.

Perhaps the answer lies with Mr. Idiot's recent discovery of a connection between column polarity and m44-m45 offset?

Ali said...

Larry,

You're confusing me with your references to "automatic" and "OS2" as though they are different things. The OS2 can operate in either manual or automatic modes. On top of that, we also have the Masslynx results.

Yes, if you have some m44-m45 misalignment, placement of the integration limits is important with regard to how big the error is, but I'm just trying to establish that you get an error with m45-m44 misalignment and its direction depends on which leads which.

I'm trying to establish that Brenna's and Meier-Augenstein's predictions on direction of error are both equally possible and it depends on the system you're using.

Let's see if we can agree on that before complicating the matter with less simple scenarios than the ones used in my examples.