Explanation:
If dependence between two or more items is suspected, it can be
investigated by combining these items to produce a single, more extended item,
called a subtest in RUMM (sometimes referred to as a testlet in other
discussions).
If two or more dependent items are summed, then the subtest produced will be
a single polytomous item whose maximum score is the sum of the maximum scores
of the individual items involved. In this case, the interpretation of the
threshold estimates is different from that of the thresholds of a typical
polytomous item composed of ordered categories.
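The summation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not RUMM's own procedure; the function name form_subtest and the response data are hypothetical.

```python
def form_subtest(responses, max_scores, members):
    """Sum the scores of the items listed in `members` into one subtest score.

    responses  : list of per-person lists of item scores
    max_scores : maximum possible score of each item
    members    : indices of the items to combine
    Returns (subtest_scores, subtest_max).
    """
    # The subtest maximum is the sum of the member items' maximum scores.
    subtest_max = sum(max_scores[i] for i in members)
    # Each person's subtest score is the sum of their member item scores.
    subtest_scores = [sum(person[i] for i in members) for person in responses]
    return subtest_scores, subtest_max


# Hypothetical data: 4 persons, 3 items (two dichotomous, one scored 0-2).
responses = [
    [0, 0, 1],
    [1, 1, 2],
    [1, 0, 0],
    [0, 1, 2],
]
max_scores = [1, 1, 2]

# Combine the first two (suspected dependent) items into one subtest.
scores, top = form_subtest(responses, max_scores, members=[0, 1])
print(scores, top)  # [0, 2, 1, 1] 2
```

Here the two dichotomous items become one polytomous item scored 0, 1 or 2.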
The structure of the latter, where the categories are intended to be ordered,
demands that the thresholds, which define the boundaries between the
categories, are also ordered. On the other hand, when subtests are formed from
a set of dependent items, there is no reason for the thresholds to be ordered.
Indeed, the more local dependence accounted for by the subtest, the more the
thresholds will be disordered.
This effect follows because the more dependent the items within a subtest, the
more the scores of the subtest are extreme scores, that is, closer to 0 and the
maximum on the subtest, for any person location. Therefore, given a person
location, the probability of a response in the middle categories is less than
it would be with independence. To produce these lower probabilities in the
middle categories, the threshold estimates are closer together than under
independence, and indeed may be reversed.
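A small numerical sketch may make this effect concrete. It assumes a simple illustrative dependence model for a subtest of two dichotomous items, chosen here for convenience and not necessarily the model used in RUMM or in the references: each response pattern has the usual Rasch weight, and the two agreeing patterns (0,0) and (1,1) are multiplied by exp(c), so that c = 0 gives local independence and c > 0 gives dependence. The function names and the parameter c are hypothetical.

```python
import math


def subtest_score_probs(beta, d1, d2, c=0.0):
    """Score distribution (0, 1, 2) of a subtest of two dichotomous items.

    Assumed illustrative model: pattern (x1, x2) has weight
    exp(x1*(beta - d1) + x2*(beta - d2)), and the agreeing patterns
    (0,0) and (1,1) are additionally multiplied by exp(c).
    c = 0 is local independence; c > 0 adds dependence.
    """
    w0 = math.exp(c)                                # pattern (0,0)
    w1 = math.exp(beta - d1) + math.exp(beta - d2)  # patterns (1,0) and (0,1)
    w2 = math.exp(2 * beta - d1 - d2 + c)           # pattern (1,1)
    total = w0 + w1 + w2
    return [w0 / total, w1 / total, w2 / total]


def implied_thresholds(beta, probs):
    """Thresholds implied by adjacent scores: tau_x = beta + ln(P[x-1] / P[x])."""
    return [beta + math.log(probs[x - 1] / probs[x]) for x in range(1, len(probs))]


beta = 0.0  # person location
for c in (0.0, 0.5, 1.5):
    probs = subtest_score_probs(beta, d1=0.0, d2=0.0, c=c)
    tau1, tau2 = implied_thresholds(beta, probs)
    print(f"c={c}: P(middle)={probs[1]:.3f}, tau1={tau1:.3f}, tau2={tau2:.3f}")
```

Under this assumed model with two items of equal difficulty, the thresholds are ln 4 (about 1.39) apart at c = 0, move closer together as c increases, and are reversed once c exceeds ln 2, while the probability of the middle score falls.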
At the same time, it is important to note that the differences in difficulty
among the items of the subtest trade off against their local dependence. As the
variance of the difficulties of the component items within a subtest gets
greater, the thresholds of the subtest get further apart. When there is little
local dependence and there are reasonable differences in difficulty, the
effects trade off and the thresholds will be ordered.
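The trade-off can be illustrated with a closed-form calculation for a subtest of two dichotomous items. The sketch assumes an illustrative dependence model, not necessarily the one used in RUMM or in the references: response patterns carry the usual Rasch weights for item difficulties d1 and d2, and the two agreeing patterns (0,0) and (1,1) are multiplied by exp(c), with c = 0 meaning local independence. Under that assumption the distance between the two subtest thresholds works out to ln(2 + 2*cosh(d1 - d2)) - 2c. The function name threshold_gap and the parameter c are hypothetical.

```python
import math


def threshold_gap(d1, d2, c=0.0):
    """tau2 - tau1 for a two-item subtest under the assumed model.

    d1, d2 : difficulties of the two dichotomous items
    c      : assumed dependence parameter (0 = local independence)
    The gap grows with |d1 - d2| and shrinks by 2*c.
    """
    return math.log(2.0 + 2.0 * math.cosh(d1 - d2)) - 2.0 * c


# Vary the difficulty difference and the assumed dependence together.
for diff in (0.0, 1.0, 2.0, 4.0):
    for c in (0.0, 1.0, 2.0):
        gap = threshold_gap(0.0, diff, c)
        status = "ordered" if gap > 0 else "disordered"
        print(f"difficulty difference={diff}, c={c}: gap={gap:+.3f} ({status})")
```

In this sketch a larger difference in difficulty widens the gap and a larger c narrows it, so the thresholds remain ordered whenever the difficulty term outweighs the dependence term.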
If there is local dependence among the items placed in subtests, then the
subtest analysis will generally show better fit than the original analysis.
This is in part because response dependence has been taken into account and
absorbed into the thresholds, and in part because the reliability (person
separation) with the subtests will be reduced, resulting in a loss of relative
power in the test of fit.
References:
Andrich, D. (1985). A latent trait model for items with response dependencies:
Implications for test construction and analysis. In S. Embretson (Ed.), Test
design: Contributions from psychology, education and psychometrics
(Chapter 9, pp. 245-273). New York: Academic Press.
Andrich, D. (2005). Rasch models for ordered response categories. In B. Everitt
& D. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science
(Volume 4, pp. 1698-1707). New York: John Wiley & Sons.