Throughout our look at chromatograms and parts of chromatograms, we've been counting things that look like they might be problems in the data set. We are not saying they are problems, we're saying they are things that may cause concern. The higher the number, the more careful we want to be about interpreting the data.
Adding up all the numbers, we get an aggregate we'll call the Idiot's Insecurity Index (I3), pronounced "aye-yi-yi". It consists of:
- The number of peaks.
- A rating of the baseline slope on a 1-to-5 scale.
- A rating of the bumpiness of the background on a 1-to-5 scale.
- For the peaks of interest, the total count of shoulders, leading or trailing edges, connections to neighbors above the baseline, and neighbors within one peak-width at the baseline.
Let's look at two trivial examples, one from UCLA and one from LNDD, and score them.
UCLA: 3 peaks; flat slope = 1; no bumps = 1; shoulders = 0; edges = 0; connections = 0; neighbors within one peak-width = 0; total I3 = 5.
LNDD: 3 peaks; flat slope = 1; no bumps = 1; shoulders = 0; edges = 3; connections = 0; neighbors (charitably) = 0; total I3 = 8. Given a choice, it might be better for the pulses to be spaced a little further apart, and we think the tails on them might indicate a problem somewhere in the system.
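Since I3 is just a sum of the component scores, the scoring above can be sketched in a few lines of code. This is our own illustrative sketch, not anyone's lab protocol; the function name and keyword arguments are ours:

```python
# Sketch of the I3 scoring described above. I3 is simply the sum of
# its components; the hypothetical argument names mirror the list items.
def i3_score(peaks, slope, bumps, shoulders, edges, connections, neighbors):
    """Sum the components of the Idiot's Insecurity Index (I3)."""
    return peaks + slope + bumps + shoulders + edges + connections + neighbors

# The two trivial examples scored in the text:
ucla = i3_score(peaks=3, slope=1, bumps=1, shoulders=0, edges=0,
                connections=0, neighbors=0)
lndd = i3_score(peaks=3, slope=1, bumps=1, shoulders=0, edges=3,
                connections=0, neighbors=0)
print(ucla)  # 5
print(lndd)  # 8
```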
So given the I3 and the scores we made all along, where are we with our look at the data in previous parts? We could make you go to another post or page, the way computer hardware review sites do, but we'll be nice:
| Test | Pks | Slp | Bmp | Shldr | Edg | Conn | Nbrs | I3 |
|---|---|---|---|---|---|---|---|---|
| UCLA | 12 | 1 | 1 | 2 2 0 | 0 1 0 | 0 0 0 | 0 0 0 | 19 |
| Ex 92 3-Jul | 36 | 3 | 2 | 2 1 2 | 0 1 0 | 2 2 0 | 3 1 1 | 56 |
| Ex 88 13-Jul | 33 | 5 | 2 | 2 2 1 | 0 0 0 | 2 2 0 | 3 3 1 | 56 |
| Ex 90 14-Jul | 36 | 5 | 1 | 2 1 1 | 0 0 0 | 2 2 0 | 2 2 1 | 55 |
| Ex 86 | 47 | 5 | 3 | 2 2 1 | 0 0 0 | 2 2 0 | 2 2 1 | 69 |
| USADA 173 | 29 | 3 | 1 | 2 2 2 | 0 1 0 | 0 2 0 | 2 2 1 | 47 |
| USADA 349 | 27 | 3 | 1 | 1 2 1 | 0 1 0 | 2 2 0 | 2 2 1 | 44 |
| Ex 87 22-Jul | 39 | 2 | 2 | 1 1 0 | 0 1 0 | 1 2 1 | 1 2 3 | 56 |
| Ex 84 23-Jul | 32 | 1 | 3 | 1 1 0 | 0 1 0 | 2 2 1 | 2 2 3 | 51 |
| Ex 93 control | 22 | 1 | 2 | 1 1 1 | 1 1 0 | 0 2 1 | 0 2 3 | 38 |
| Ex 85 control | 35 | 2 | 3 | 1 1 1 | 1 2 0 | 0 2 1 | 1 2 2 | 54 |
| Ex 89 control | 36 | 2 | 2 | 1 1 1 | 1 1 0 | 1 2 1 | 1 4 3 | 57 |
| Shack fig3a | 17 | 1 | 1 | 0 1 1 | 0 1 1 | 0 2 0 | 0 4 0 | 29 |
| Shack fig3b | 19 | 2 | 1 | 0 0 2 | 0 1 0 | 0 2 0 | 0 1 0 | 27 |
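The totals can be recomputed from the components. Here is a sketch over a few rows transcribed from the table above (the triples we read as one count per peak of interest); the row selection and data layout are ours:

```python
# A few rows from the table: (name, peaks, slope, bumps,
# [shoulders], [edges], [connections], [neighbors]).
rows = [
    ("UCLA",        12, 1, 1, [2, 2, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]),
    ("Ex 92",       36, 3, 2, [2, 1, 2], [0, 1, 0], [2, 2, 0], [3, 1, 1]),
    ("Ex 86",       47, 5, 3, [2, 2, 1], [0, 0, 0], [2, 2, 0], [2, 2, 1]),
    ("Shack fig3a", 17, 1, 1, [0, 1, 1], [0, 1, 1], [0, 2, 0], [0, 4, 0]),
]

# I3 for each row is the sum of the scalar scores plus the per-peak counts.
totals = {name: pks + slp + bmp + sum(sh) + sum(ed) + sum(co) + sum(nb)
          for name, pks, slp, bmp, sh, ed, co, nb in rows}
print(totals)  # {'UCLA': 19, 'Ex 92': 56, 'Ex 86': 69, 'Shack fig3a': 29}
```

Run over these rows, the Shackleton chromatogram scores well under the LNDD exhibits, which is the like-to-like point made below.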
This is bogus, you say. However, it follows thinking used in software engineering in measures such as McCabe complexity, Halstead volume, or arguably function points. A chromatogram with an I3 of two is like a software subroutine that does nothing: useless, but absolutely correct. On the other hand, one with an I3 of 200 is like a 10-page software function with a McCabe of 2000 -- it might appear to be correct, but how do you really know without looking very closely indeed?
Bigger numbers mean more stuff.
More stuff means more opportunity for error. The more stuff you have, the more careful you need to be about checking assumptions and pre-conditions.
In an earlier post, M has made comments suggesting it is unfair or improper to make some of the comparisons made here. We disagree; as shown above, the methodology is perfectly applicable to a straight line background or a series of reference pulses. It is a measure of the potential for problems, not an assertion there are problems.
M also suggested one reason it was unfair was that the chemistry in the F3 fraction is more difficult than that of the F2 fraction. It is true the F3 chemistry is more difficult, and we appreciate that admission from M. It raises the very question we'd like to ask.
How do you tell if the chemistry does the job properly?
One indicator is to look at the I3 of the resulting chromatograms.
Thanks to M's diligence, we found the Shackleton chromatograms that also reveal the 5bA and 5aA, so we do have fair, like-to-like comparisons. They appear to be much cleaner by I3 score than those produced by LNDD.
What did Shackleton do that LNDD didn't? This bears investigation.
When we started this series, we said that the preconditions for correctness in the integration that computes the numbers in a CIR result are:
- Clean, unambiguous baselines suggesting good chemical separation of the prepared samples. This is reflected in the count of the peaks in the chromatogram. Good chemistry gives fewer peaks to be concerned about, and fewer unknowns floating about.
- Significant (a debatable term) baseline (chromatographic) separation of peaks. We've demonstrated that co-elutes can cause unexpected skews of significant magnitude.
- Absence of shoulders suggesting unidentified peaks. Where there are shoulders or tails, there may be unidentified co-elutes.
- Measurement of nearby peaks to consider their potential for influence. We may back away from this thought, but it seems like you ought to know the CIR of every adjacent peak in case it is co-eluting in some way.
Maybe LNDD's chemistry isn't separating as well as it ought to, and needs to.
If there is lots of stuff around, it is going to go somewhere. A high I3 score makes it prudent to be sure the peaks being measured contain only what they are purported to contain.
As we demonstrated in "Integration for Idiots", presence of unexpected material can invalidate any reported numeric results.
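The arithmetic behind that claim is simple. Here is a toy illustration with made-up numbers (ours, not from any exhibit): if a contaminant's area is integrated together with the target peak, the reported delta is the area-weighted mean of the two, so even a modest unidentified co-elute shifts the result:

```python
# Toy model of an unresolved co-elute skewing a carbon isotope ratio (CIR).
# All numbers here are hypothetical, chosen only to show the mechanism.
def mixed_delta(area_target, delta_target, area_contam, delta_contam):
    """Area-weighted delta-13C (per mil) of two peaks integrated as one."""
    total_area = area_target + area_contam
    return (area_target * delta_target
            + area_contam * delta_contam) / total_area

# A target peak at -27.0 per mil with a 10% contaminant at -21.0 per mil:
print(mixed_delta(90.0, -27.0, 10.0, -21.0))  # -26.4
```

A 0.6 per mil shift from a 10% contaminant is exactly the kind of skew that matters when conclusions turn on fractions of a per mil.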
We are thus left with some questions to seed further discussion.
- What does UCLA do to ensure purity of peaks?
- What did Shackleton do to ensure purity of peaks?
- What did LNDD do to ensure purity of peaks?
Feel free to chop apart individual assessments and argue whether certain pixels represent particular flaws, and whether they have particular numeric significance in a particular test. This doesn't much interest us. At a scientific level, either the protocol is flawed and there is flawed data being processed and reported, or it is good data and good results. At the moment, indications such as the I3 suggest the data may not be good. A good process will be able to demonstrate the data is good.
We have said for a long time that if we can get confirmation the data is clean and pure, we're prepared to accept the numeric conclusions at a scientific level.
If there is no validation that the data is clean and pure enough to trust, there is a different, legal question of whether the reported results are correct.