RUMM2030 will not read in all data correctly.


  1. Check that each record in the data file is in fixed text format.
  2. Ensure that NO tabs are present. Tabs can be introduced when data are saved from a spreadsheet.

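One quick way to find offending records is to scan the file for tab characters. This Python sketch is illustrative only: the function name is hypothetical, and the file name in the usage comment is an assumption.

```python
from typing import Iterable, List

def lines_with_tabs(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of records that contain a tab character.

    A fixed-format record should use spaces only: a tab is a single character
    in the file but usually displays as several columns, so the fields after it
    are not where they appear to be.
    """
    return [number for number, line in enumerate(lines, start=1) if "\t" in line]


# Hypothetical usage on a data file:
# with open("mydata.txt", encoding="ascii", errors="replace") as fh:
#     print(lines_with_tabs(fh))
```
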
RUMM2030 will not read in the missing character correctly.


  • This can occur with 2-character responses AND when the missing data character is NOT a blank space.
  • Make sure you include the missing data character in ALL columns of a response field.

RUMM2030 does not allow mixed characters for a response field. For example, if the missing character is 9 and there are two columns per response field then the 9 must appear twice, that is, as 99.

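Thus, if four items with two-column response fields have the responses 12, 7, <missing data> and 14, the correct data entry is 12 79914. The single-digit response 7 occupies its two-column field as a blank followed by 7.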

NOTE: RUMM2030 allows a single missing-data character type only. It searches for as many repetitions of that character as there are columns specified for each response.
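
The rule can be illustrated with a small Python sketch that writes one record. The function name `format_record` and the choice to right-justify responses with blanks are assumptions for illustration (they reproduce the example above); this is not a RUMM2030 utility.

```python
from typing import Optional, Sequence

def format_record(responses: Sequence[Optional[int]], width: int, missing: str = "9") -> str:
    """Build one fixed-format record from item responses.

    responses: one entry per item; None marks missing data.
    width:     number of columns per response field.
    missing:   the single missing-data character, repeated to fill the field.
    """
    if len(missing) != 1:
        raise ValueError("only a single missing-data character is allowed")
    fields = []
    for response in responses:
        if response is None:
            fields.append(missing * width)          # e.g. '99' for a two-column field
        else:
            text = str(response)
            if len(text) > width:
                raise ValueError(f"response {response} does not fit in {width} columns")
            fields.append(text.rjust(width))        # right-justify, padding with blanks
    return "".join(fields)


print(format_record([12, 7, None, 14], width=2))    # 12 79914
```

Repeating the missing character across the whole field is what lets RUMM2030 recognise it, because mixed characters within one field are not allowed.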

RUMM2030 will not read all data across each record.


  • This arises with incomplete data sets only.
  • The first record MUST have a character in at least the last column.
  • If the last column is missing data AND the missing character is a space, then place some character (e.g., *, #) in the next column, that is, the first column after the end of the record.
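
A small sketch of this fix for the first record, assuming the record length (the number of columns per record) is known. The function name and the default marker '*' are illustrative only.

```python
def pad_first_record(record: str, record_length: int, marker: str = "*") -> str:
    """Make the first record visibly reach, and pass, its last column.

    If the record is shorter than record_length (trailing blanks lost or
    stripped by an editor) or its last column is a blank, pad it with blanks
    to record_length and append `marker` in the next column.
    """
    body = record.rstrip("\r\n")
    if len(body) > record_length:
        raise ValueError("record is longer than the specified record length")
    if len(body) == record_length and not body.endswith(" "):
        return body                      # last column already holds a character
    return body.ljust(record_length) + marker


print(repr(pad_first_record("12 7", 8)))   # '12 7    *'
```
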


Why is the spread of item estimates in RUMM2030 smaller than with other Rasch applications?

Because RUMM2030 uses the Pairwise Conditional algorithm, this spread would be expected to be smaller than for algorithms using what is essentially the unconditional, or joint, method. The precision of the convergence criterion also influences the spread.

If the convergence criterion is made more precise, say from 0.01 to 0.001 or 0.0005, the estimates in RUMM2030 will spread out a little more.

What is the meaning of the terms spread, skewness and kurtosis as they appear in RUMM2030 displays?


With at least three ordered categories it is possible to construct a second principal component (Andrich, 1985). This component is identified in RUMM2030 as spread and:

  • is half the distance between successive thresholds when the threshold distances are taken to be equal.
  • has a category coefficient that is quadratic in the successive integer category counter values.
  • characterises the unit of measurement for the scale under construction.

With at least four ordered categories it is possible to construct a third principal component (Andrich, 1985). This component is identified in RUMM2030 as skewness and:

  • identifies any deviation from equal distances between successive thresholds.
  • has a category coefficient that is cubic in the successive integer category counter values.
  • characterises the skewness of the thresholds.

A fourth principal component can be derived (Pedler, 1987) if at least five ordered categories are present. This component is identified in RUMM2030 as kurtosis and:

  • has a category coefficient that is quartic in the successive integer category counter values.
  • characterises the kurtosis of the thresholds.

Pedler, P. (1987). Accounting for psychometric dependence with a class of latent trait models. Unpublished Ph.D. thesis, Department of Education, The University of Western Australia.

Once the principal component parameters have been estimated, the category coefficients are readily constructed for however many components are used, and the thresholds can be calculated equally readily.
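
As a sketch of that last step, the Python below assumes the common convention that the category coefficient of category x is the negative sum of the first x centralised thresholds (kappa_0 = 0). Under that assumption, equally spaced thresholds that are 2 * spread apart (so spread is the half distance) give kappa_x = spread * x * (m - x), which is quadratic in the category counter. The function names are hypothetical, and RUMM2030's internal sign conventions may differ.

```python
from typing import List

def category_coefficients(thresholds: List[float]) -> List[float]:
    """kappa_0 = 0 and kappa_x = -(tau_1 + ... + tau_x) for x = 1..m."""
    kappa = [0.0]
    for tau in thresholds:
        kappa.append(kappa[-1] - tau)
    return kappa

def equidistant_thresholds(m: int, spread: float) -> List[float]:
    """Centralised thresholds that sum to zero and lie 2 * spread apart."""
    return [spread * (2 * k - m - 1) for k in range(1, m + 1)]


m, spread = 4, 0.5
taus = equidistant_thresholds(m, spread)            # [-1.5, -0.5, 0.5, 1.5]
kappa = category_coefficients(taus)                 # [0.0, 1.5, 2.0, 1.5, 0.0]
quadratic = [spread * x * (m - x) for x in range(m + 1)]
print(kappa == quadratic)                           # True
```
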

How does RUMM2030 handle persons with extreme scores?


There are three important points regarding extreme scores in Rasch Measurement Model analyses using RUMM2030:

  1. As part of any analysis proper in which item and person estimates are determined, RUMM2030 identifies extreme persons. These are persons with zero scores or 'perfect' scores on the items attempted. These persons are retained throughout all analyses and are included in many displays, where an option to include or exclude such extreme persons is provided.
  2. Extreme persons are provided with an estimate of their location parameter. These estimates are derived from a geometric mean algorithm which uses, respectively, the three highest person location estimates [for a perfect score on the items attempted] and the three lowest person location estimates [for a zero score].
  3. For any set of items, each total score has a finite estimate, except for a score of zero or the maximum score M.
    The extrapolated estimate is based on the three finite scores M - 1, M - 2 and M - 3
    [for an extreme score of M], and 1, 2 and 3 [for an extreme minimum score of zero].

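The Python below is a sketch under stated assumptions, for dichotomous items with made-up difficulties. `person_estimate` solves the usual maximum likelihood equation (expected score = observed score) and, as point 3 notes, has a finite solution only for 0 < score < M. `extrapolate_extreme` is a HYPOTHETICAL illustration of extrapolating from three finite estimates by assuming the gaps between estimates grow by a constant ratio; RUMM2030's exact geometric mean algorithm is not given here, so its values should not be expected to match.

```python
import math
from typing import Sequence

def person_estimate(score: int, difficulties: Sequence[float], tol: float = 1e-10) -> float:
    """Maximum likelihood location for a raw score on dichotomous items (Newton-Raphson).

    The equation sum_i P_i(beta) = score has a finite root only for 0 < score < M.
    """
    m = len(difficulties)
    if not 0 < score < m:
        raise ValueError("no finite estimate for an extreme score")
    beta = sum(difficulties) / m + math.log(score / (m - score))   # starting value
    for _ in range(100):
        probs = [1.0 / (1.0 + math.exp(d - beta)) for d in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (score - sum(probs)) / information
        beta += step
        if abs(step) < tol:
            break
    return beta

def extrapolate_extreme(nearest: float, middle: float, furthest: float) -> float:
    """HYPOTHETICAL constant-ratio extrapolation, NOT RUMM2030's documented formula.

    nearest is the estimate for M - 1 (or 1), furthest for M - 3 (or 3).
    """
    gap_near = nearest - middle
    gap_far = middle - furthest
    return nearest + gap_near * (gap_near / gap_far)


items = [-1.5, -0.5, 0.0, 0.5, 1.5]                 # M = 5
finite = {r: person_estimate(r, items) for r in range(1, 5)}
top = extrapolate_extreme(finite[4], finite[3], finite[2])      # for a score of M
bottom = extrapolate_extreme(finite[1], finite[2], finite[3])   # for a score of zero
print(round(bottom, 3), round(top, 3))
```
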
Does RUMM2030 provide analyses for the Rating and Partial Credit Models?


When conducting a Rasch Measurement analysis with the Rasch Model, all items should be considered as polytomous. Items with 2 categories [usually called dichotomous] are treated simply as a special case of items with 3 or more categories.

Merely specify the maximum score possible for the item and RUMM2030 does the rest.

RUMM2030 nomenclature tries to avoid terms like 'the partial credit model' because there is only ONE unidimensional Rasch Model for ordered categories. The situation with dichotomously scored items is just a special case. It is true that the impression that these are different models has been given by calling

  • the case with all thresholds the same across the items 'the rating model'
  • and the case with different thresholds across items 'the partial credit model'.

The rating and partial credit situations differ only in the parameterization and the number of parameters, and not in the structure of the response at the level of a person's response to an item.
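
A minimal Python sketch of this point, assuming the usual form of the model in which the log-odds of category x over category x - 1 is beta - location - tau_x (centralised thresholds). One function serves the 'rating' and 'partial credit' cases and the dichotomous case; only the way the thresholds are supplied differs. The item names and values are made up.

```python
import math
from typing import List, Sequence

def category_probabilities(beta: float, location: float,
                           thresholds: Sequence[float]) -> List[float]:
    """P(X = x) for x = 0..m in the Rasch model for ordered categories.

    thresholds are centralised (they sum to zero within the item); each step
    from category x - 1 to x adds (beta - location - tau_x) to the log-odds.
    """
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + beta - location - tau)
    top = max(logits)                           # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# "Rating" parameterization: every item shares the same centralised thresholds.
shared = [-1.0, 0.0, 1.0]
rating_items = {"item1": (-0.5, shared), "item2": (0.5, shared)}
# "Partial credit" parameterization: each item has its own thresholds.
partial_items = {"item1": (-0.5, [-1.2, 0.1, 1.1]), "item2": (0.5, [-0.4, -0.2, 0.6])}

for label, items in (("rating", rating_items), ("partial credit", partial_items)):
    for name, (location, taus) in items.items():
        probs = category_probabilities(0.0, location, taus)
        print(label, name, [round(p, 3) for p in probs])

# A dichotomous item is the special case with a single (zero) centralised threshold.
print(category_probabilities(1.0, 0.0, [0.0]))     # [1 - p, p] with p = 1 / (1 + e**-1)
```
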

Can Item and Person Estimates be Imported into RUMM2030?

RUMM2030 allows for importing of item estimates through its anchor items routine and of person estimates through the creation of a New Project. These procedures are related as follows:

  1. Create a template of the item set either from scratch or by using RUMM2030.

    The latter involves first selecting the items required from the listing as displayed in Figure 2.2 or Figure 2.3 of the Displaying the RUMM2030 Analysis Manual. Then select the Anchoring option in the Create Template File for... box and, finally, click the Save button at the bottom right.
  2. Next, create a New project using the specifications of the item set involved in the previous step. Here you import the persons directly from the data file containing the person records and end up at the Analysis Control Form ready to create an Analysis Name. See Chapters 2 to 4 of the Getting Started Manual.
  3. Create a new Analysis Name - at this stage it is the first one, of course - making sure to select the special anchor option. Where requested, import the set of item estimates in the Template File which was prepared earlier.
  4. Accept the set of anchor items in total and return to the Analysis Control Form.
  5. From here, run the analysis [using the normal RUMM2030 procedures described in Chapter 5 of the Getting Started Manual] and then display your results [as described in the Displaying the RUMM2030 Analysis Manual].

NOTE: The one thing you must be very careful of when importing items and persons is to have them checked in terms of the strict requirements imposed by the Rasch Measurement Model.

RUMM2030 does all this routinely when the normal procedures are invoked; the anchor items facility is a good way of achieving this outcome.

The persons must also be checked thoroughly before conducting a Rasch Measurement Model analysis; RUMM2030 does this in the editing and sufficient statistics stages which occur prior to the estimation routines. By using the sequence outlined above, all of the necessary checks and balances are engaged. The procedure is really straightforward and does not involve much effort overall.

What parameterisation method is employed for item estimation in RUMM2030?


RUMM2030 estimates the item thresholds through a reparameterisation of the thresholds into their principal components.

This notion of principal components is used in the sense of Guttman (1950), who rearranged ordered categories into successive principal components, beginning with the usual linear one. They are analogous to the use of orthogonal polynomials in regression where the independent variable is ordered. The term does NOT refer to the common principal components analysis in which a matrix of correlation coefficients is decomposed by analogy to factor analysis.

The estimates of the principal components are obtained using items taken in pairs, and capitalise on the sufficiency property of the Rasch Model by eliminating the person parameters while estimating the item parameters. From the estimates of all principal components, the required threshold estimates are calculated readily. The method immediately accounts for missing data and readily generalises to the case of different numbers of categories for different items.


Guttman, L. (1950). The principal components of scale analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star and J. A. Clausen (Eds.), Measurement and Prediction (pp. 312-361). New York: Wiley.

What estimation algorithm is employed in RUMM2030?


When conducting a Rasch Measurement Model analysis, RUMM2030 uses the Pairwise Conditional Estimation procedure which generalises the equation for one pair of items in which the person parameter is eliminated to all pairs of items taken simultaneously.

Pairwise estimation is conditional estimation in the sense that the person parameters are eliminated while the item parameters are estimated. The estimates are consistent, as shown in:


Zwinderman, A. H. (1995). Pairwise estimation in the Rasch models. Applied Psychological Measurement, 19(4), 369-375.
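
As a sketch of the pairwise idea for dichotomous items only: for any pair of items, given that a person scored 1 on exactly one of them, the probability that it was item i depends only on the two item difficulties, so the person parameter drops out. The Python below counts such pairs (a missing response simply removes that person from that pair) and maximises the resulting pairwise likelihood with a minorise-maximise (Zermelo) iteration, a standard technique for paired-comparison models. This is not RUMM2030's implementation, which also handles ordered categories through principal components; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

def pair_counts(responses: Sequence[Sequence[Optional[int]]]) -> List[List[float]]:
    """n[i][j] = number of persons scoring 1 on item i and 0 on item j.

    A person with a missing response (None) on item i or j simply does not
    contribute to that pair, so missing data need no special handling.
    """
    k = len(responses[0])
    n = [[0.0] * k for _ in range(k)]
    for row in responses:
        for i in range(k):
            for j in range(k):
                if i != j and row[i] == 1 and row[j] == 0:
                    n[i][j] += 1
    return n

def pairwise_difficulties(n: List[List[float]], tol: float = 1e-10,
                          max_iter: int = 10000) -> List[float]:
    """Pairwise conditional estimates for dichotomous items; they sum to zero.

    Given x_i + x_j = 1, P(x_i = 1) = e_i / (e_i + e_j) with e_i = exp(-delta_i).
    Assumes every item has both "wins" and "losses" against other items.
    """
    k = len(n)
    e = [1.0] * k
    for _ in range(max_iter):
        new = []
        for i in range(k):
            wins = sum(n[i][j] for j in range(k) if j != i)
            denom = sum((n[i][j] + n[j][i]) / (e[i] + e[j]) for j in range(k) if j != i)
            new.append(wins / denom)
        centre = sum(math.log(v) for v in new) / k          # impose sum(delta) = 0
        new = [v / math.exp(centre) for v in new]
        change = max(abs(math.log(a / b)) for a, b in zip(new, e))
        e = new
        if change < tol:                                    # convergence criterion
            break
    return [-math.log(v) for v in e]
```

Feeding the estimator counts that follow the model exactly recovers the generating difficulties, which is a convenient check of the sketch.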

What is the advantage in using the Principal Components estimation procedure?


The key property of the Principal Components estimation algorithm is that the relevant statistic for estimating each threshold is a function of the frequencies of ALL response categories rather than only a function of the frequency of the corresponding category. This property should enhance the stability and robustness of the estimates, especially when there might be relatively few cases in some categories for some items. It is expected, as in the dichotomous case, that the estimates of the principal components of the item parameters in the pairwise conditional maximum likelihood estimates are also consistent.

A key ingredient in the item estimation algorithm is the set of category coefficients which, along with the successive integer category counters [starting from zero, for the dichotomous case, up to one less than the number of categories], specify the nature of the coefficients of the principal components of the Rasch Measurement Model for ordered categories (Andersen, 1977; Andrich, 1978). It is the category coefficients that provide the link between the Principal Component and Threshold reparameterisations of the Rasch Measurement Model.


Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69-81.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-574.

What is the relationship between the location and threshold estimates?


In the RUMM2030 algorithm for item estimation:

  • the item location is distinguished from the thresholds.
  • two separate constraints are used: one constrains the sum of the item location estimates to zero, and the other constrains the sum of each item's threshold estimates to zero.
  • the threshold estimates produced by the RUMM2030 algorithm are referred to as centralised thresholds because they are deviations from the item's location estimate.
  • the set of threshold estimates used in most displays within RUMM2030, especially when mapping the item estimates onto the variable or measurement line, are referred to as uncentralised thresholds because they incorporate the location estimate; each is obtained by adding the location estimate to the corresponding centralised threshold.
  • the mean of the set of uncentralised thresholds for an item is the location estimate for that item.

The location parameter is the first principal component of the thresholds, as related to Guttman's (1950) work on principal components with ordered categories (Andrich, 1985). This parameter is always present, because a minimum of two ordered categories [the dichotomous case] must be present in any analysis.

The category coefficient of the location parameter is linear in the successive integer category counter values.


Andrich, D. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In N. Brandon-Tuma (Ed.), Sociological Methodology (Chapter 2, pp. 33-80). San Francisco: Jossey-Bass.

What are the sufficient statistics displayed in RUMM2030?


The set of sufficient statistics for each item, as displayed in RUMM2030, is derived from the respective category coefficients of each of the principal component parameters.
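
As a sketch, assuming the linear coefficient x for the location and the quadratic coefficient x(m - x) for the spread, each statistic weights the frequency of every category by that component's coefficient. This is why such statistics depend on the frequencies of many categories rather than one. The function name is hypothetical, and the values RUMM2030 displays may be scaled or signed differently.

```python
from typing import Sequence, Tuple

def principal_component_statistics(frequencies: Sequence[int]) -> Tuple[int, int]:
    """Statistics for the location (linear) and spread (quadratic) components.

    frequencies[x] is the number of persons scoring x on the item, x = 0..m.
    """
    m = len(frequencies) - 1
    linear = sum(x * f for x, f in enumerate(frequencies))              # coefficient x
    quadratic = sum(x * (m - x) * f for x, f in enumerate(frequencies)) # coefficient x(m - x)
    return linear, quadratic


print(principal_component_statistics([10, 20, 30, 40]))   # (200, 100)
```
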

Model Fit

What is the consequence if persons are mis-targeted with respect to the Items?


When the location of the persons is far away from the location of the items, this is considered bad targeting. Mis-targeting (as this problem is usually labelled) is really a problem with the distribution of the persons as a whole.

The best place to start in RUMM2030 is to examine the Person/Item Distribution display to see visually how well the persons are located with respect to the item location distribution.

The Person Separation Index is a pointer to the level of mis-targeting as this value will decrease towards zero as the mismatch between the person and item distributions becomes more pronounced.

It is important to realise that the two reliability indices displayed in RUMM2030 need to be interpreted carefully. Coefficient Alpha (the traditional index) is based on raw scores and can give misleading information when mis-targeted data are present; it can often be artificially high under these circumstances. The Person Separation Index, on the other hand, is a sensitive indicator of mis-targeted data and should be given more attention than Coefficient Alpha if the values of these two indices are not close.
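
A sketch of the two indices using their common definitions: the Person Separation Index as the proportion of the observed variance of the person estimates that is not error variance, and Coefficient Alpha from raw item scores. Whether RUMM2030 uses population or sample variances is not stated here; this sketch uses population variances, and the person values are made up. With the same spread of locations, larger standard errors (as occur for persons located where there are few items) drive the Person Separation Index down.

```python
import statistics
from typing import List, Sequence

def person_separation_index(locations: Sequence[float], standard_errors: Sequence[float]) -> float:
    """(observed variance of person estimates - mean error variance) / observed variance."""
    observed = statistics.pvariance(locations)
    error = statistics.fmean([se * se for se in standard_errors])
    return (observed - error) / observed

def coefficient_alpha(scores: List[List[int]]) -> float:
    """Cronbach's alpha from a persons-by-items matrix of raw item scores."""
    k = len(scores[0])
    item_variances = [statistics.pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


locations = [-2.0, -1.0, 0.0, 1.0, 2.0]                   # made-up person estimates
print(person_separation_index(locations, [0.5] * 5))      # well targeted: 0.875
print(person_separation_index(locations, [1.2] * 5))      # larger SEs: about 0.28
```
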

Mis-targeting can also be inferred from the Category Frequencies display.

If there are no data in the extreme categories, that is, no persons are located there, the estimates for these regions are unstable. This lack of data will almost certainly show up as disorder in the threshold estimates for categories within the region.

To address mis-targeted samples:

  • Re-assess your target sample to ensure that more data are present in the extreme categories;
  • Obtain more persons with low ability (as defined by the variable under review) if the lower categories are deficient; or
  • Obtain more persons with high ability if the uppermost categories have little or no data.

What role can Factor Analysis play in a Rasch Measurement Model analysis?


There is an important difference between carrying out a factor analysis of the responses of a person-by-item matrix compared to carrying out a factor, or principal component, analysis of the residuals of the responses arising from Rasch Measurement Model predictions.

A traditional factor analysis of the responses runs the risk of finding factors that are a function of item difficulty, and to some degree of the distribution of the persons: difficult items tend to correlate more highly among themselves, as do easy items.

In a factor, or principal component, analysis of residuals from the Rasch Measurement Model predictions, account is taken of both the item difficulty and the person locations. Conducting a factor analysis on these residuals will then reveal if there is any systematic relationship between subsets of items after minimizing the occurrence of difficulty factors.

In RUMM2030, such an analysis can be undertaken by selecting the Residual Principal Component feature from the Display Control form. This is often referred to as a Principal Component Analysis, or PCA. Loadings on the first and second factors can indicate subsets of items that might be more similar in their responses than accounted for by the model.


Refer to Chapter 3 of the Displaying the RUMM2030 Analysis manual for details.

Why are correlations between fit residuals generally negative?


An examination of the correlation matrix of residuals can also be informative. However, it is important to be aware that the expected correlation among the residuals is negative; in the case of just two items, the residual correlation would be -1.0. With a typical number of dichotomous items, say 30 or so, the expected correlations may be close to zero and, as a consequence, comparing observed correlations with zero may not be very misleading. If the number of persons is very large, however, all observed correlations will be statistically significantly different from 0, even when the items fit the Rasch Measurement Model perfectly. In this case the significance is simply a function of sample size.
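
The two-item case can be checked directly. In this sketch (dichotomous items, made-up difficulties and responses, person estimates by maximum likelihood, extreme persons excluded), every non-extreme person has a score of 1, so the estimate makes P1 + P2 = 1. Then P1(1 - P1) = P2(1 - P2) and x1 - P1 = -(x2 - P2), so the standardised residuals of the two items are exact negatives of each other. The helper names are hypothetical.

```python
import math
from typing import List, Sequence

def person_estimate(score: int, difficulties: Sequence[float], tol: float = 1e-10) -> float:
    """Maximum likelihood location for a non-extreme raw score (Newton-Raphson)."""
    m = len(difficulties)
    beta = sum(difficulties) / m + math.log(score / (m - score))
    for _ in range(100):
        probs = [1.0 / (1.0 + math.exp(d - beta)) for d in difficulties]
        step = (score - sum(probs)) / sum(p * (1.0 - p) for p in probs)
        beta += step
        if abs(step) < tol:
            break
    return beta

def standardised_residuals(responses: Sequence[Sequence[int]],
                           difficulties: Sequence[float]) -> List[List[float]]:
    """z = (x - P) / sqrt(P(1 - P)) per person and item; extreme persons are skipped."""
    rows = []
    for row in responses:
        score = sum(row)
        if 0 < score < len(difficulties):
            beta = person_estimate(score, difficulties)
            probs = [1.0 / (1.0 + math.exp(d - beta)) for d in difficulties]
            rows.append([(x - p) / math.sqrt(p * (1.0 - p)) for x, p in zip(row, probs)])
    return rows

def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


data = [[1, 0], [0, 1], [1, 0], [1, 1], [0, 1], [0, 0]]        # made-up responses
z = standardised_residuals(data, difficulties=[-0.5, 0.5])
print(correlation([r[0] for r in z], [r[1] for r in z]))      # -1.0 (up to rounding)
```
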

It is better to pay attention to positive correlations that are unusually high. This is evidence, generally, of some kind of local dependence between items. An examination of the content of the items generally indicates why responses to them might be more similar than expected under the model with independence.


  • RUMM2030 provides a new facility for assessing the magnitude of the effect of local dependence in terms of a change in difficulty of the dependent item.
  • For details on the conduct of this routine, available in the Plus Edition only, refer to Chapter 5 of the Advanced RUMM2030 manual.

NOTE: the independent item within the pair of dependent items being investigated may be referred to as the "Base" item in an earlier edition of this manual.

How do RUMM2030 fit indices relate to the OUTFIT and INFIT statistics used in other programs?


The outfit and infit statistics used in other programs are similar to the Residual statistic in RUMM2030:

These values are differently weighted statistics based on the residual between a person's response and the expected response according to the model given the person and item estimates.

  • The Outfit statistic employed in other programs is closer in value to that displayed in RUMM2030.
  • All Residual statistics displayed in RUMM2030 have an expected mean of 0 and a standard deviation of 1 but, because they are approximations, the distributions are not strictly normal.
  • All of the distributions of these Residuals are affected by:
    • the relative locations of the persons and the items;
    • the number of parameters estimated; and
    • the fit between the data and the model.
  • In all of these Residual statistics:
    • a very negative value implies overfit (the observed means in successive class intervals are steeper than the ICC) for some reason (perhaps a violation of local independence), and
    • a very large positive value implies underfit (the observed means in successive class intervals are flatter than the ICC) of some kind (perhaps a violation of unidimensionality).
  • The chi square test of fit formalises the graphical display of the ICC curves.
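
For comparison, a sketch of the unstandardised (mean-square) outfit and infit statistics as commonly defined in other Rasch programs, for one item across persons. It is not RUMM2030's fit residual, which is reported on a standardised scale. Values near 1 indicate agreement with the model, values well below 1 indicate overfit, and values well above 1 indicate underfit. The function name and data are made up.

```python
from typing import Sequence, Tuple

def mean_square_fit(observed: Sequence[float], expected: Sequence[float],
                    variance: Sequence[float]) -> Tuple[float, float]:
    """Outfit and infit mean squares for one item across persons.

    outfit: unweighted mean of squared standardised residuals
    infit:  sum of squared residuals divided by the sum of model variances
    """
    squared = [(x - e) ** 2 for x, e in zip(observed, expected)]
    outfit = sum(s / v for s, v in zip(squared, variance)) / len(squared)
    infit = sum(squared) / sum(variance)
    return outfit, infit


expected = [0.9, 0.8, 0.2, 0.1]                  # model probabilities of a 1
variance = [p * (1 - p) for p in expected]       # dichotomous model variance
print(mean_square_fit([1, 1, 0, 0], expected, variance))   # both well below 1: overfit
```
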
How does RUMM2030 handle DIF issues and strategies?


When conducting a Rasch Measurement analysis, RUMM2030 carries out a DIF analysis using the following sequence:

  1. estimating the parameters for the whole sample;
  2. plotting the observed means for class intervals for each group (for example, males and females) to reveal in the empirical curves whether:
    • there is a main effect (ICCs not crossing) or
    • an interaction effect (ICCs intersecting)
  3. formalising the graphical display by
    • conducting a two way ANOVA on the residuals
    • testing an interaction effect
    • testing the main effect, or both effects


There are many ways of checking for DIF, but the key is that the item works the same way for two or more different groups, irrespective of their locations on the trait.

Because we do not have the actual locations, these generally have to be estimated. There are three basic forms of systematic DIF:

  1. when the locations of the items are different but the slopes of the observed points are parallel - this is called uniform DIF and is identified by the MAIN effect for groups in ANOVA of residuals;
  2. when the locations are the same but the slopes are different (this is non-uniform DIF) and it is detected by the INTERACTION effect in the ANOVA;
  3. both of the above.

The MAIN effect for class intervals in RUMM2030 tests for the overall fit, irrespective of the groups into which people are classified. Thus the ANOVA of residuals gives:

  • A summary of the two kinds of DIF, as well as of
  • the overall fit of each item to the model.

It is possible that there is an overall fit of an item when groups are not considered, even though DIF is observed when persons are classified into groups, and vice versa.
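
A sketch of the ANOVA of residuals for one item, with factors group and class interval. To keep it short it ASSUMES a balanced design (every group-by-interval cell has the same number of persons, at least two); RUMM2030's own procedure, including how it forms class intervals and handles unequal cell sizes, is not reproduced. It returns sums of squares, degrees of freedom and F ratios; p-values would need the F distribution (for example from SciPy). The function name and data are hypothetical.

```python
from typing import Dict, List, Tuple

def _mean(values: List[float]) -> float:
    return sum(values) / len(values)

def dif_anova(cells: Dict[Tuple[str, int], List[float]]) -> Dict[str, Dict[str, float]]:
    """Two-way ANOVA of residuals (group x class interval), balanced cells only."""
    groups = sorted({g for g, _ in cells})
    intervals = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))
    if (len(cells) != len(groups) * len(intervals) or n < 2
            or any(len(v) != n for v in cells.values())):
        raise ValueError("this sketch needs every cell with the same number (>= 2) of persons")
    a, b = len(groups), len(intervals)
    grand = _mean([x for v in cells.values() for x in v])
    cell = {k: _mean(v) for k, v in cells.items()}
    g_mean = {g: _mean([cell[(g, c)] for c in intervals]) for g in groups}
    c_mean = {c: _mean([cell[(g, c)] for g in groups]) for c in intervals}
    ss = {
        "group": n * b * sum((g_mean[g] - grand) ** 2 for g in groups),        # uniform DIF
        "interval": n * a * sum((c_mean[c] - grand) ** 2 for c in intervals),  # overall fit
        "interaction": n * sum((cell[(g, c)] - g_mean[g] - c_mean[c] + grand) ** 2
                               for g in groups for c in intervals),          # non-uniform DIF
        "within": sum((x - cell[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {"group": a - 1, "interval": b - 1, "interaction": (a - 1) * (b - 1),
          "within": a * b * (n - 1)}
    ms_within = ss["within"] / df["within"]
    f = {k: (ss[k] / df[k]) / ms_within for k in ("group", "interval", "interaction")}
    return {"ss": ss, "df": df, "F": f}


cells = {("F", 1): [-0.5, 0.1], ("F", 2): [0.2, 0.0], ("F", 3): [0.3, 0.7],
         ("M", 1): [-0.2, 0.4], ("M", 2): [0.6, 0.8], ("M", 3): [-0.1, 0.1]}
print(dif_anova(cells)["F"])
```
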

What limitations apply to test-of-fit statistics used in RUMM2030?


The Residual test-of-fit statistic is constructed as a standardised residual, but it is not perfectly normally distributed:

  • a very positive value implies poor discrimination;
  • a very negative value implies too good a discrimination.

The chi-square test-of-fit (and its probability) is constructed as an approximate chi-square statistic, but it is not perfectly chi-square distributed.

Overall, the tests of fit employed by RUMM2030 for a Rasch Measurement Model analysis should be interpreted relatively, and not strictly against absolute external criteria.

How does RUMM2030 calculate the slopes associated with the ICC displays?


The slope of an ICC is calculated at the location of the item simply as the first derivative of the expected value curve with respect to the latent trait (beta).

The slopes of ICCs for dichotomous response items are identical.

In ordered category data:

  • Slopes of the latent responses at the thresholds are identical.
  • For the ordered category item as a whole, the slope of the expected value curve (the ICC) is a function of how close together the thresholds are.
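
A sketch of the slope calculation for the Rasch model for ordered categories, using centralised thresholds with made-up values. It takes a central-difference derivative of the expected value at beta = location. In this model that derivative equals the variance of the response, which provides a check. A dichotomous item has slope 0.25 at its location whatever its difficulty, and closer thresholds give a steeper ICC. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def expected_and_variance(beta: float, location: float,
                          thresholds: Sequence[float]) -> Tuple[float, float]:
    """Expected score and its variance; thresholds are centralised."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + beta - location - tau)
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

def icc_slope(location: float, thresholds: Sequence[float], h: float = 1e-5) -> float:
    """Central-difference derivative of the expected value at beta = location."""
    up, _ = expected_and_variance(location + h, location, thresholds)
    down, _ = expected_and_variance(location - h, location, thresholds)
    return (up - down) / (2 * h)


print(icc_slope(0.0, [0.0]))           # dichotomous item: 0.25
print(icc_slope(0.0, [-0.5, 0.5]))     # thresholds close together: about 0.548
print(icc_slope(0.0, [-2.0, 2.0]))     # thresholds far apart: about 0.213
```
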


Is threshold order dependent on category frequency in a Rasch Measurement Model analysis?


When you have an item format in which the categories are intended to reflect order, the structure demands that ordered thresholds are relevant and central. When assessing threshold order, it is important to consider the standard errors, as it is possible that the threshold estimates are unstable and reversed because of this. The thresholds should, in fact, be ordered and sufficiently different from each other relative to the variable.

For such an item, it is important to realise the distinction between the distribution of frequencies from the sample and the probabilities in a particular item. It is possible to have a low frequency in a particular category and yet have the thresholds properly ordered. Reversals in an item are not simply a function of small empirical frequencies, but of the relationship between the frequencies.

The issue is not that there are few people in a category. Rather, given all the evidence put together in the estimation process, the issue is that persons who, by their location estimates, should be responding in a category are not responding in it at the required rate. Thus it does not matter that many people in a sample do not respond in the category; it is only a problem when the people who should be responding in the category do not.


What can I do if RUMM2030 advises that 'overflow' has occurred?


Overflow is the term used to indicate that a computer is unable to proceed with the processing of data. It occurs when values become so extreme (for example, so small) that the computer cannot carry out normal algebraic operations without a serious loss of precision. Overflow can occur in a Rasch analysis using RUMM2030 during the estimation of item and person parameters and of the test-of-fit statistics:

  • Poor targeting can result in overflow, especially with polytomous items, when most responses fall in one category and few or no responses fall in the other categories. The total, or almost complete, absence of data in many adjacent categories creates problems in estimating the Rasch thresholds. In this case, values are so small that their cumulative effect over all persons produces so many extreme values that the computer cannot cope, and overflow results. A large sample magnifies this problem.
  • Because RUMM2030 can handle data with zero frequencies in some categories, the presence of overflow means that the data must be very extreme.
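
A general floating-point illustration of the kind of precision problem described above (not RUMM2030's internal code): multiplying many very small probabilities falls below the smallest representable double and becomes exactly zero, whereas summing their logarithms keeps the information. The probabilities are made up.

```python
import math

probabilities = [1e-3] * 200          # e.g. 200 persons, each with a very improbable response

product = 1.0
for p in probabilities:
    product *= p                      # 1e-600 is far below the smallest positive double
print(product)                        # 0.0

log_likelihood = sum(math.log(p) for p in probabilities)
print(log_likelihood)                 # about -1381.55, still finite
```
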


Andrich, D. & Luo, G. (2003). Conditional Pairwise Estimation in the Rasch Model for Ordered Response Categories using Principal Components. Journal of Applied Measurement, 4(3),

Andrich, D. & Luo, G. (2004). Estimating parameters in the Rasch model in the presence of null categories. Journal of Applied Measurement.


  • Examine your data response file carefully, especially any items with extended scoring (many categories). The simplest, and usually most comprehensive, method is to load the data response file into a spreadsheet such as Excel and check the response frequencies across all items. If a heavy bias in the response frequencies is detected, as described above, then rethink the targeting: have you included a wide enough range of person locations for most categories to have a non-zero frequency?

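The same frequency check can be scripted. A minimal Python sketch, assuming the responses are already parsed into integer scores (None for missing) and that the maximum score of each item is known; the function names are hypothetical and items are indexed from 0.

```python
from typing import List, Optional, Sequence, Tuple

def category_frequencies(responses: Sequence[Sequence[Optional[int]]],
                         max_scores: Sequence[int]) -> List[List[int]]:
    """Count responses in each category 0..m of every item (missing skipped)."""
    freqs = [[0] * (m + 1) for m in max_scores]
    for row in responses:
        for item, value in enumerate(row):
            if value is not None:               # skip missing responses
                freqs[item][value] += 1
    return freqs

def empty_categories(freqs: List[List[int]]) -> List[Tuple[int, int]]:
    """(item, category) pairs with zero frequency, i.e. candidates for poor targeting."""
    return [(item, x) for item, counts in enumerate(freqs)
            for x, count in enumerate(counts) if count == 0]


freqs = category_frequencies([[0, 2], [1, 2], [1, None], [0, 2]], max_scores=[1, 2])
print(freqs)                       # [[2, 2], [0, 0, 3]]
print(empty_categories(freqs))     # [(1, 0), (1, 1)]
```
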
What can I do if RUMM2030 advises that one or more system files are missing?


Windows relies on a large number of special files, usually called system files. Sometimes one or more of these files become corrupted for all sorts of reasons, are moved to a different folder unexpectedly (for example, by mouse drag and drop, which is risky around folders containing system files and therefore not recommended), or simply disappear. In all such cases, applications that depend on these files, including RUMM2030, will inevitably experience problems.


  • Because of the complexity involved with programs such as RUMM2030, the simplest solution in this case is to uninstall the program and then re-install again using the installation file downloaded at the time of your licence purchase. This ensures that all system files are correctly loaded and located on your computer.
  • It is a good rule of thumb that, when a Windows-based product experiences problems that do not appear to make sense despite trying a number of different workarounds, the uninstall/reinstall procedure is almost always successful.


How does RUMM2030 compare with the Two- and Three-Parameter IRT Models?


The language of the two- and three-parameter models for dichotomous responses in IRT is mixed up with the general idea of parameters in models. That is, in IRT, the term two-parameter is immediately associated with the Birnbaum model, which has a discrimination parameter and a location parameter for each item. However, an item can have more than one parameter in other models as well.

In a Rasch Measurement analysis using the Rasch Model for dichotomous items, that is, two ordered categories, there is only one parameter for an item, its location. In the psychometric literature concerned with psychophysical scaling, this entity is called a threshold. It is the point at which each of the two responses has a 50% chance of occurring.

In the model for three ordered categories, it is possible to estimate two independent parameters. These are the thresholds dividing the continuum into three categories. They can EITHER be parameterized as two separate values, normalized to all the item-threshold parameters (that is, summing to zero across all item-threshold combinations), OR they can be parameterized as the average location of these thresholds, with each threshold expressed as a deviation from this average so that the deviations sum to zero. In this way, one parameter summarises the location of the item, and a second summarises the distance between the thresholds (that is, the spread).

With four categories entered into the model, it is possible to get the average of the thresholds, then the spread of the thresholds, and then their skewness, and so on.

RUMM2030 also makes this re-parameterization of the thresholds. Instead of modeling the thresholds directly (e.g., for a five category item: threshold1, threshold2, threshold3, threshold4) RUMM2030 estimates moments of the distribution of these thresholds. So, in this case, you have the mean of the thresholds, their variance (spread), and the higher moments skewness and kurtosis. Thus, instead of four thresholds, you estimate four moments, and then recover the thresholds from these parameters. This process is done for convenience of conditional estimation.

However, in the case of five ordered categories, it is possible to constrain the re-parameterization of the thresholds so that only the location and spread are involved. In that case, all the thresholds within an item will be equally spaced. Thus, we now have only two parameters for the item, when in fact it is possible to have more.

What is the role of Subtests in a Rasch Measurement Model analysis?


If evidence of dependence between two or more items is suspected, this feature can be investigated by combining these items to produce a single, more extended item, called a subtest in RUMM [sometimes referred to as a testlet in other discussions].

If two or more dependent items are summed, then the subtest produced will be a single polytomous item whose maximum score is the sum of the maximum scores of the individual items involved. In this case, the interpretation of the threshold estimates is different from those associated with a typical polytomous item composed of ordered categories.

The structure of the latter, where the categories are intended to be ordered, demands that the thresholds which define boundaries between the categories are also ordered. On the other hand, when subtests are formed from a set of dependent items, there is no reason for the thresholds to be ordered. Indeed, the more local dependence accounted for by the subtest, the more the thresholds will be disordered.

This effect follows because the more dependent the items within a subtest, the more the scores of the subtest are extreme scores, that is, closer to 0 and the maximum on the subtest, for any person location. Therefore, given a person location, the probability of a response in the middle categories is less than it would be with independence, and to produce these lower probabilities in the middle categories, the threshold estimates are closer together than under independence, and indeed may be reversed.

At the same time, it is important to note that the difference in difficulties of the items of the subtest will trade off with their local dependence. As the variance of the difficulty of the component items within a subtest gets greater, so the thresholds of the subtests get further apart. In the case when there is little local dependence and reasonable differences in difficulty, then the effects trade off and the thresholds will be ordered.
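
Under local independence, the sum of dichotomous Rasch items follows the Rasch model for ordered categories, with thresholds given by ratios of elementary symmetric functions of exp(-delta_i). A small Python sketch of that calculation, for independent items only, shows the trade-off described above. For two items of equal difficulty the thresholds are log 4 (about 1.386) logits apart, and they move further apart as the difficulties differ. Local dependence makes middle scores rarer and pulls the thresholds together, but this sketch does not model dependence. The function name is hypothetical.

```python
import math
from typing import List, Sequence

def subtest_thresholds(difficulties: Sequence[float]) -> List[float]:
    """Uncentralised thresholds of the sum of locally INDEPENDENT dichotomous items.

    gamma[s] is the elementary symmetric function of order s of e_i = exp(-delta_i);
    P(score = s) is proportional to gamma[s] * exp(s * beta), which gives
    tau_s = log(gamma[s - 1] / gamma[s]).
    """
    gamma = [1.0]
    for delta in difficulties:
        e = math.exp(-delta)
        updated = gamma + [0.0]
        for s in range(1, len(updated)):
            updated[s] += gamma[s - 1] * e
        gamma = updated
    return [math.log(gamma[s - 1] / gamma[s]) for s in range(1, len(gamma))]


for pair in ([0.0, 0.0], [-1.0, 1.0], [-2.0, 2.0]):
    t = subtest_thresholds(pair)
    print(pair, "threshold gap:", round(t[1] - t[0], 3))
# the gap grows as the component difficulties move apart
```

The mean of these thresholds equals the mean difficulty of the component items, so the subtest's location is unchanged by the reparameterization.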

If there is local dependence among the items placed in subtests, then the subtest analysis will generally show better fit than the original analysis. This is partly because response dependence has been taken into account and absorbed into the thresholds, and partly because the reliability (person separation) with the subtests will be reduced, resulting in a loss of relative power in the test of fit.


Andrich, D. (1985). A latent trait model for items with response dependencies: Implications for test construction and analysis. In S. Embretson (Ed.), Test design: Contributions from psychology, education and psychometrics. Academic Press, New York. (Chapter 9, pp. 245-273.)

Andrich, D. (2005). Rasch models for ordered response categories. In B. Everitt & D. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science (Vol. 4, pp. 1698-1707). New York: John Wiley & Sons.