Clinical Epidemiology & Evidence-Based Medicine
Glossary:
Terminology Specific to Clinical Testing
Updated November 02, 2010
Clinical Testing
- Test: A test is anything that produces evidence from a patient
at any stage in the clinical process, based on which a different clinical course will be
taken depending on the different possible test outcomes (positive or negative, normal or
abnormal, present or absent, high or low, ...). From the lay perspective, a test is a
laboratory procedure performed on a specimen (feces, urine, blood, CSF, biopsy, ...) from
a patient. From the clinical epidemiology perspective, the following are examples of a "test":
history taking (presence or absence of a component), clinical exam results (presence or
absence of a sign), imaging findings (presence or absence of a feature on a radiograph),
or response to therapy (as anticipated or not). Few if any tests in medicine are perfect;
that is, produce results that can always be interpreted with absolute certainty on every
patient to which the test is applied. The performance of tests can be compared objectively
(e.g., two clinicians (the "tests") can be compared in their ability to detect a
particular clinical sign in each of a group of patients).
- Sensitive Test: In a diagnostic sense, a higher proportion of the individuals with
the disease will test positive than with a less sensitive test.
- Specific Test: In a diagnostic sense, a higher proportion of the individuals without
the disease will test negative than with a less specific test.
- Screening Test: A test applied to individuals without observed signs of disease and
in which differential diagnoses of the disease of interest or clinically similar diseases
have not been established. That is, the population being tested is composed predominantly
of normal individuals who have not been identified as possibly having a clinical case of
the disease. Thus, the probability that such an individual has the disease is the
prevalence of the disease in the population being screened. Because the disease
manifestations are likely minimal in affected individuals, the spectrum of disease is
generally less severe in a screening than in a diagnostic setting.
- Diagnostic Test: A test used in the clinical environment on individuals with
clinical signs or other clinical information consistent with the presence of the
condition. The presence of disease has been recognized and the disease of interest is one
of the differential diagnoses. This fact raises the expected prevalence (the
clinician's estimate of the probability that the individual has the disease based on
what the clinician knows to that point) prior to performing the test and thus changes the
test performance considerably compared to the situation when the same test is used as a
screening test (the probability that a randomly selected individual has the disease is the
prevalence of the disease in that population). Affected individuals are more likely to
have more prominent disease manifestations in the clinical setting, meaning that the
spectrum of disease is generally more severe in a diagnostic than in a screening
setting.
- "Gold" Standard Reference (Definitive) Test: The tests and procedures
necessary to definitively establish to a high level of certainty the presence or absence
of the disease in an individual. The reference standard usually requires death (necropsy
examination) or is too expensive, too risky, or too slow to be used regularly in the
clinical setting. For inevitably progressive, chronic conditions, the "gold
standard" may be prolonged follow-up. Note that the standard test, the current most
widely accepted test used day-to-day, is often not the "gold" standard test and
using it as such may be a serious (but not uncommon) mistake.
- Accuracy: Accuracy is the degree to which, on average, a test represents the true
value (that is, is unbiased). Accuracy is insufficient for describing the performance of
medical tests and deciding when to use what because accuracy has two separate components
(see Se, Sp below) and is dependent on the prevalence of the condition for which the test
is appropriate.
- Precision: Precision is the inverse of the influence of random or chance error on a
measurement: the smaller the error, the greater the precision. When testing an
individual, repeating the measurements and using a summary value increases precision. When
testing individuals to learn more about a group, precision is increased by increasing the
number of individuals tested (i.e., increasing sample size); the random error of the group
estimate shrinks in proportion to the reciprocal of the square root of the number tested.
However, precise measurements may still deviate systematically from the true value and
thus be biased or invalid, a problem that cannot be reduced by repeating the measurements
on an individual or testing more individuals in a group.
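The effect of sample size on precision can be sketched numerically. The example below is only an illustration under stated assumptions: it uses the usual binomial standard error of a proportion, and the proportion (0.20) and sample sizes are hypothetical values, not figures from this glossary.

```python
import math


def standard_error_of_proportion(p: float, n: int) -> float:
    """Binomial standard error of a proportion p observed in n individuals."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical values: quadrupling the sample size halves the random error.
for n in (25, 100, 400):
    se = standard_error_of_proportion(0.20, n)
    print(f"n={n:4d}  standard error={se:.3f}")
```

Note that this only describes random error; a biased test stays biased no matter how large the sample becomes.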
- Disease Spectrum: For sensitivity of diagnostic tests, the disease spectrum is the
range of the disease states represented by the diseased individuals (acute vs. chronic or
convalescent cases, mild vs. severe cases, clinical vs. subclinical). For specificity of
diagnostic tests, the disease spectrum is the range of the disease states in the
individuals with diseases presenting similar clinical signs but not having the disease of
interest. The two disease spectrums among the patients used to develop tests tend to be
more severe than those of the typical clinical situation, meaning that test performance in
practice is often lower than published estimates. Removing the test positives from a group
changes the two disease spectrums, which is likely to adversely affect the performance of
subsequent tests. For determining sensitivity and specificity of screening tests, the
appropriate disease spectrums are those of a cross-section of an appropriate population,
which are usually less severe than those of individuals being diagnosed in clinical
settings.
- Bayes' Theorem: The mathematical relationship between the probability that an
individual has the disease before the test is run and the probability that the individual
has the disease after the test result is known. This theorem relates five different
probabilities (Se, Sp, Pvn, Pvp, and Pr) and is crucial to understanding how to optimize
the use of imperfect tests in the diagnostic process. Bayes' theorem essentially
relates the certainty that the individual has a disease prior to doing the test, the two
possible test results, and the certainty that the individual has the disease after doing
the test.
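Written in the glossary's own symbols, the relationship takes the standard form below. One assumption is made here: Pr is read as the pre-test probability (prevalence) of disease, which the entry above does not spell out.

```latex
\begin{align}
\mathrm{Pvp} &= \frac{\mathrm{Se}\cdot\mathrm{Pr}}
                     {\mathrm{Se}\cdot\mathrm{Pr} + (1-\mathrm{Sp})(1-\mathrm{Pr})} \\[4pt]
\mathrm{Pvn} &= \frac{\mathrm{Sp}\,(1-\mathrm{Pr})}
                     {\mathrm{Sp}\,(1-\mathrm{Pr}) + (1-\mathrm{Se})\,\mathrm{Pr}}
\end{align}
```

The first line gives the post-test probability of disease after a positive result; the second gives the post-test probability of no disease after a negative result.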
- Test Performance Measures:
- False Negative (Fn): An individual that is test negative but is disease positive
(equivalent to a type II error in statistics, governed by Beta). False negatives are
undesirable test outcomes as such individuals are misclassified.
- False Positive (Fp): An individual that is test positive but is disease negative
(equivalent to a type I error in statistics, governed by alpha). False positives are
undesirable test outcomes as such individuals are misclassified.
- True Negative (Tn): An individual that is both test negative and disease negative.
- True Positive (Tp): An individual that is both test positive and disease positive.
- Diagnostic Sensitivity (Se): Given that an individual has the disease, sensitivity
is the probability (between 0 and 1.0) that the individual will test positive. For groups,
sensitivity is the proportion of diseased individuals that will test positive. Sensitivity
is mathematically equivalent to Tp / (Tp + Fn). Raising a test's Se by changing its
cutoff lowers the test's specificity. Note that although many mistakenly view this
value as fixed, Se depends on the spectrum of the target disease that is present in the
group or conceptual population that the test is being applied to.
- Analytical Sensitivity: Analytical sensitivity is the ability of a test to detect
the target analyte, such as an antibody or antigen, and is usually expressed as the minimum
concentration of the analyte that can be detected. Analytical sensitivity is related to
the sensitivity above but is not a probability.
- Diagnostic Specificity (Sp): Given that an individual does not have the disease,
specificity is the probability that the individual will test negative. For groups,
specificity is the proportion of non-diseased individuals that will test negative.
Specificity is mathematically equivalent to Tn / (Tn + Fp). Raising a test's Sp by
changing its cutoff lowers the test's Se. Note that although many mistakenly view
this value as fixed, for diagnostic tests Sp depends on the disease spectrum of the
competing diseases in the conceptual population that the test is being applied to. The
competing diseases are often different for individuals in different regions and different
circumstances.
- Negative Predictive Value (Pvn): Given a negative test result (the clinician's
perspective), negative predictive value is the probability that the individual does not
have the disease. For groups, negative predictive value is the proportion of test-negative
individuals that do not have the disease. Negative predictive value is mathematically
equivalent to Tn / (Tn + Fn). Note that for a given Se and Sp, this value changes
depending on the disease prevalence estimate prior to the testing being done.
- Positive Predictive Value (Pvp): Given a positive test result (the clinician's
perspective), positive predictive value is the probability that the individual actually
has the disease. For groups, positive predictive value is the proportion of test-positive
individuals that actually have the disease. Positive predictive value is mathematically
equivalent to Tp / (Tp + Fp). Note that for a given Se and Sp, this value changes
depending on the disease prevalence estimate prior to the testing being done.
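The four measures defined above can be computed directly from the counts of a 2 x 2 table. The sketch below is illustrative only: the function names and the counts (a hypothetical group of 1,000 individuals with 10% prevalence) are assumptions made for the example.

```python
def two_by_two_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Se, Sp, Pvp, and Pvn from the four cells of a 2 x 2 table."""
    return {
        "Se": tp / (tp + fn),   # diseased individuals that test positive
        "Sp": tn / (tn + fp),   # non-diseased individuals that test negative
        "Pvp": tp / (tp + fp),  # test positives that have the disease
        "Pvn": tn / (tn + fn),  # test negatives that do not have the disease
    }


def predictive_values(se: float, sp: float, prevalence: float) -> tuple:
    """Pvp and Pvn for a given Se, Sp, and pre-test prevalence."""
    pvp = se * prevalence / (se * prevalence + (1 - sp) * (1 - prevalence))
    pvn = sp * (1 - prevalence) / (sp * (1 - prevalence) + (1 - se) * prevalence)
    return pvp, pvn


# Hypothetical counts: 100 diseased and 900 non-diseased individuals.
measures = two_by_two_measures(tp=90, fp=45, fn=10, tn=855)
print(measures)

# Same Se and Sp, different prevalence: the predictive values shift.
for prevalence in (0.01, 0.10, 0.50):
    pvp, pvn = predictive_values(measures["Se"], measures["Sp"], prevalence)
    print(f"prevalence={prevalence:.2f}  Pvp={pvp:.3f}  Pvn={pvn:.3f}")
```

With Se and Sp held fixed, lowering the prevalence from 10% to 1% drops Pvp sharply, which is why a test that performs well diagnostically can disappoint when used for screening.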
- Receiver Operating Characteristic (ROC) Curve: Plot of Sensitivity vs. (1 -
Specificity) for different test cutoff values, which is used to establish the
"best" cutoff for a test with variable parameters. The optimum cutoff depends on
the relative costs of false-positives and false-negatives.
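A minimal sketch of how ROC points and a cost-based cutoff might be derived from measured values follows. Everything in it is an assumption for illustration: the measurements are invented, values at or above the cutoff are treated as test positive, and the cost is a simple weighted count of misclassifications within this sample (so it implicitly uses the sample's own prevalence).

```python
def roc_points(diseased, healthy, cutoffs):
    """Return (cutoff, Se, 1 - Sp) for each cutoff; value >= cutoff is positive."""
    points = []
    for cutoff in cutoffs:
        se = sum(v >= cutoff for v in diseased) / len(diseased)
        false_positive_rate = sum(v >= cutoff for v in healthy) / len(healthy)
        points.append((cutoff, se, false_positive_rate))
    return points


def best_cutoff(diseased, healthy, cutoffs, cost_fn, cost_fp):
    """Cutoff with the lowest total misclassification cost in this sample."""
    def total_cost(cutoff):
        false_negatives = sum(v < cutoff for v in diseased)
        false_positives = sum(v >= cutoff for v in healthy)
        return cost_fn * false_negatives + cost_fp * false_positives
    return min(cutoffs, key=total_cost)


# Hypothetical measurements on diseased and non-diseased individuals.
diseased = [4, 5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 5, 6]
cutoffs = [3, 5, 7]

for cutoff, se, fpr in roc_points(diseased, healthy, cutoffs):
    print(f"cutoff={cutoff}  Se={se:.2f}  1-Sp={fpr:.2f}")

# Costly false negatives favor a low cutoff; costly false positives a high one.
print(best_cutoff(diseased, healthy, cutoffs, cost_fn=5, cost_fp=1))
print(best_cutoff(diseased, healthy, cutoffs, cost_fn=1, cost_fp=5))
```

Lowering the cutoff moves along the curve toward higher Se and lower Sp, which matches the trade-off described under Se and Sp above.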
- Apparent Prevalence (Test Prevalence): The proportion of test positives in the
population tested. Note that apparent prevalence is equivalent to disease prevalence under
most circumstances only if a perfect test (no false negatives or false positives) is used.
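The gap between apparent and true prevalence follows from the definitions of Se and Sp: test positives are the true positives plus the false positives. The short sketch below illustrates this with hypothetical values; the function name is an assumption made for the example.

```python
def apparent_prevalence(true_prevalence: float, se: float, sp: float) -> float:
    """Expected proportion of test positives: true positives plus false positives."""
    true_positives = se * true_prevalence
    false_positives = (1.0 - sp) * (1.0 - true_prevalence)
    return true_positives + false_positives


# Hypothetical values: an imperfect test overstates a 10% true prevalence here.
print(apparent_prevalence(0.10, se=0.90, sp=0.95))

# A perfect test (Se = Sp = 1) returns the true prevalence.
print(apparent_prevalence(0.10, se=1.0, sp=1.0))
```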
- Information Gain: Having done a test, the amount of information the clinician gained
about the probability that the individual has the disease. This is the difference between
the clinician's estimate of the probability that an individual has the disease before
the test is done and the probability that the individual has the disease after the test
result is known. Depending on the test's Se and Sp values, on the pre-test
probability, and in a relationship that is defined by Bayes' Theorem, the information
gain from a positive test is usually different from the information gain from a negative
test.
- "Rule-in" Test: A rule-in test has a large information gain when it is positive, which
means that the clinician keeps the differential on the list. In a diagnostic situation,
tests with very high specificity are generally rule-in tests.
- "Rule-out" Test: A rule-out test has a large information gain when it is negative,
which means the clinician removes the differential from the list. In a diagnostic
situation, tests with very high sensitivity are generally rule-out tests.
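Information gain, and the asymmetry behind rule-in and rule-out tests, can be illustrated with a small calculation. This is a sketch under assumptions: the gain is taken as the signed difference (post-test minus pre-test probability of disease), and the pre-test probability and Se/Sp values are hypothetical.

```python
def post_test_probability(pre_test: float, se: float, sp: float,
                          positive: bool) -> float:
    """Probability of disease after a positive or negative result (Bayes' theorem)."""
    if positive:
        true_pos = se * pre_test
        false_pos = (1.0 - sp) * (1.0 - pre_test)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - se) * pre_test
    true_neg = sp * (1.0 - pre_test)
    return false_neg / (false_neg + true_neg)


def information_gain(pre_test: float, se: float, sp: float,
                     positive: bool) -> float:
    """Signed change in the probability of disease produced by the result."""
    return post_test_probability(pre_test, se, sp, positive) - pre_test


pre_test = 0.30  # hypothetical clinician's estimate before testing
tests = {
    "very specific test (Se 0.60, Sp 0.99)": (0.60, 0.99),
    "very sensitive test (Se 0.99, Sp 0.60)": (0.99, 0.60),
}
for label, (se, sp) in tests.items():
    gain_pos = information_gain(pre_test, se, sp, positive=True)
    gain_neg = information_gain(pre_test, se, sp, positive=False)
    print(f"{label}: positive {gain_pos:+.3f}, negative {gain_neg:+.3f}")
```

In this example the very specific test moves the probability most when it is positive, and the very sensitive test moves it most when it is negative.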
- Negative Likelihood Ratio: The number of times more likely that a negative test
comes from an individual with the disease rather than from an individual without the
disease. Equivalent to (1 - Se) / Sp.
- Positive Likelihood Ratio: The number of times more likely that a positive test
comes from an individual with the disease rather than from an individual without the
disease. Equivalent to Se / (1 - Sp).
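The two likelihood ratios can be used to update a pre-test probability through the odds form of Bayes' theorem (post-test odds = pre-test odds x likelihood ratio). That odds form is standard but is not stated in this glossary, and the function names and numbers below are hypothetical, so treat this as an illustrative sketch.

```python
def likelihood_ratios(se: float, sp: float) -> tuple:
    """Positive and negative likelihood ratios from Se and Sp."""
    positive_lr = se / (1.0 - sp)
    negative_lr = (1.0 - se) / sp
    return positive_lr, negative_lr


def post_test_probability_from_lr(pre_test: float, lr: float) -> float:
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Hypothetical test: Se 0.90, Sp 0.95, pre-test probability 0.10.
positive_lr, negative_lr = likelihood_ratios(0.90, 0.95)
print(f"LR+ = {positive_lr:.1f}, LR- = {negative_lr:.3f}")
print(f"after a positive result: {post_test_probability_from_lr(0.10, positive_lr):.3f}")
print(f"after a negative result: {post_test_probability_from_lr(0.10, negative_lr):.3f}")
```

The probability after a positive result computed this way equals the positive predictive value for the same Se, Sp, and pre-test probability.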
- Reproducibility (Repeatability, Consistency): The degree to which a test yields the
same results when repeated under identical conditions on identical specimens.
- Reliability: How good is a procedure when applied by different users. The degree to
which different clinicians (observers) applying the procedure classify diseased
individuals into the same diagnostic, prognostic or treatment categories.
- Cohen's Kappa (κ): Cohen's Kappa is a measure
for comparing two tests. It is a summary measure, ranging between -1 and +1, of the level
of agreement beyond chance when two tests (or observers) are classifying the same set of
specimens into two or more exclusive categories (e.g., infected, not infected or normal,
mild, moderate, severe, critical) with 0 being no agreement beyond that expected by
chance, 1 being complete agreement, and -1 being contrary to agreement. This measure is
used when two imperfect tests (or observers) are being compared rather than one test (or
observer) being compared with the "Gold" or definitive standard. When the
classification categories are ordered and more than three (e.g., normal, mild, severe,
critical), Cohen's Kappa often underestimates the degree of actual agreement and a
weighted Kappa or other statistic captures it better. Kappa values for many components of
the clinical examination range between 0.4 and 0.7, which is the source of many
differences between clinicians.
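As an illustration of agreement beyond chance, the sketch below computes the unweighted Kappa with the usual formula (observed agreement minus chance agreement, divided by one minus chance agreement). The formula is the standard one rather than something quoted from this glossary, and the two observers' ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's Kappa for two raters classifying the same specimens."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must classify the same non-empty set")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance from each rater's category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    # Undefined (division by zero) if both raters use one identical category only.
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of ten films by two observers: 8 of 10 agree.
observer_1 = ["pos"] * 5 + ["neg"] * 5
observer_2 = ["pos"] * 4 + ["neg"] + ["pos"] + ["neg"] * 4
print(round(cohens_kappa(observer_1, observer_2), 2))
```

Here the raw agreement is 80%, but half of that would be expected by chance alone, so Kappa is noticeably lower than the raw figure.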
- Intra-Rater Agreement: A measure of repeatability. The level of agreement beyond
chance, typically quantified by Cohen's Kappa, that a test or observer has with
itself or themselves ("intra" = within) when repeated on the same set of
materials. For example, the agreement that an observer such as a radiologist has with
themselves when they unknowingly (blindly) repeat reading the same films.
- Inter-Rater Agreement: A measure of reliability. The level of agreement beyond
chance, typically quantified by Cohen's Kappa, that two different tests (observers)
have ("inter" = between) when performed on the same materials. For example, the
agreement that two observers such as two radiologists have between them when they
unknowingly (blindly) read the same films or that two different diagnostic tests have when
they are performed on the same specimens.