Classical Test Theory

A test is a scientific instrument to the extent that it measures what it is intended to measure, that is, it is valid, and measures it well, that is, it is precise or reliable. If the measurements an instrument provides cannot be trusted because they vary from one occasion to the next when the same object is measured, we say that the instrument is not reliable. To measure something correctly, an instrument must be precise; if it is not, then whatever it measures, it measures badly. Precision is therefore a necessary but not sufficient condition. The instrument must also be valid: what it measures precisely must be what it is intended to measure, and nothing else.

Reliability:

Absolute and relative reliability: The reliability of a test can be approached in two different ways, which ultimately coincide.

Reliability as the accuracy of the measures: when a subject responds to a test, he or she obtains an empirical score, which is affected by error. If there were no error, the subject would obtain his or her true score. The test is inaccurate to the extent that the empirical score does not match the true score. The difference between the two scores is the measurement error. The standard error of measurement is the standard deviation of the measurement errors. It indicates the absolute precision of the test, since it allows us to estimate the difference between the score obtained and the one that would be obtained if there were no error.
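The definitions above can be illustrated with a toy calculation. This is only a sketch: in practice the true score is not observable, so the true score of 50 and the five empirical scores below are hypothetical values chosen for the example.

```python
from statistics import mean, pstdev

# Hypothetical data: one subject whose true score is assumed to be 50,
# measured on five occasions with the same test.
true_score = 50
empirical_scores = [48, 53, 50, 47, 52]

# Measurement error on each occasion: empirical score minus true score.
errors = [x - true_score for x in empirical_scores]

# Standard error of measurement: the standard deviation of those errors.
sem = pstdev(errors)

print("errors:", errors)
print("mean error:", mean(errors))
print("standard error of measurement:", round(sem, 2))
```

The errors cancel out on average, while their standard deviation summarizes how far a single empirical score typically lies from the true score.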

Reliability as the stability of the measures: a test is more reliable the more constant or stable its results remain when it is repeated. The more stable the results across two applications, the higher the correlation between them. This correlation is called the reliability coefficient. It expresses not the amount of error but the consistency of the test with itself and the constancy of the information it offers. The reliability coefficient expresses the relative reliability of the test.

The reliability coefficient and the reliability index:

  • The reliability coefficient of a test is the correlation of the test with itself, obtained, for example, from two parallel forms: rxx'.
  • The reliability index, or precision index, is the correlation between the empirical scores of a test and its true scores: rxv.

The reliability index is never smaller than the reliability coefficient, and it is greater whenever the coefficient lies strictly between 0 and 1. Three classic methods for estimating the reliability coefficient stand out:

  • The correlation between the test and its repetition (repetition or test-retest method): the same test is applied to the same group twice, and the correlation between the two sets of scores is calculated. This correlation is the reliability coefficient. This method usually yields a higher reliability coefficient than other procedures and can be contaminated by disturbing factors, because the second application is not independent of the first.
  • The correlation between two parallel forms of the test (parallel-forms method): two parallel forms of the same test are prepared, that is, two equivalent forms that give the same information, and both are applied to the same group of subjects. The correlation between the two forms is the reliability coefficient. Because the same test is not repeated, the disturbing factors that affect test-retest reliability are avoided.
  • The correlation between two parallel halves of the test (split-half method): the test is divided into two equivalent halves and the correlation between them is computed. It is the preferred method, since it is simple and avoids the limitations of the previous procedures. For example, the odd-numbered items can form one half and the even-numbered items the other.
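The split-half method described in the last bullet can be sketched as follows. The response matrix is hypothetical (6 subjects, 6 items scored 1 or 0), and the helper names `pearson` and `split_half` are chosen for this illustration only.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(responses):
    """Correlate odd-item and even-item half scores (one row per subject)."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson(odd, even)

# Hypothetical responses: 6 subjects x 6 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_halves = split_half(responses)
print("correlation between the two halves:", round(r_halves, 3))
```

The same `pearson` helper applies to the other two methods: pass it the scores from the first and second application (test-retest) or the scores on the two forms (parallel forms).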

The reliability coefficient and the correlation between parallel tests

The reliability coefficient of a test indicates the proportion of the empirical variance that is true variance: rxx' = s²v / s²x, where s²v is the variance of the true scores and s²x the variance of the empirical scores. The reliability coefficient of a test varies between 0 and 1. For example, if the correlation between two parallel tests is rxx' = 0.80, then 80% of the variance of the test is due to the true measure and the remaining 20% is due to error.

The reliability index of a test is the correlation between its empirical scores and its true scores. It is equal to the square root of the reliability coefficient: rxv = √rxx'.
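A small simulation can make these relationships concrete. It is a sketch under stated assumptions: true scores and errors are drawn from normal distributions, errors are independent of true scores and of each other, and the means and standard deviations (50, 10 and 5) are hypothetical values picked so that the true variance is 80% of the empirical variance.

```python
import random
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed true scores (variance 100) and two parallel forms whose
# independent errors have variance 25, so true / empirical = 100 / 125.
true_scores = [random.gauss(50, 10) for _ in range(n)]
form_a = [v + random.gauss(0, 5) for v in true_scores]
form_b = [v + random.gauss(0, 5) for v in true_scores]

variance_ratio = pvariance(true_scores) / pvariance(form_a)
r_parallel = pearson(form_a, form_b)     # reliability coefficient
r_index = pearson(form_a, true_scores)   # reliability index

print("true variance / empirical variance:", round(variance_ratio, 3))
print("correlation between parallel forms:", round(r_parallel, 3))
print("correlation with true scores:", round(r_index, 3))
```

Up to sampling noise, the first two values should both be close to 0.80 and the third close to the square root of 0.80, about 0.89.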

Once two parallel forms of a test have been constructed, analysis of variance is applied to check the homogeneity of the variances and the difference between the means. If the variances are homogeneous, the difference between the means is not significant, and the two forms are built with the same number of items of the same type and psychological content, the forms can be considered parallel. If not, they must be revised until they are. The absence of reliability corresponds to the value rxx' = 0.

The standard error of measurement: the difference between the empirical score and the true score is the random error, called measurement error. The standard deviation of the measurement errors is called the standard error of measurement. It allows estimates of the absolute reliability of the test, that is, of how much measurement error affects a score.
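If the empirical variance is the sum of true variance and error variance (errors uncorrelated with true scores, the usual assumption of the classical model), the standard error of measurement follows from the reliability coefficient as the standard deviation of the empirical scores times the square root of one minus the coefficient. A minimal sketch, where the function name and the standard deviation of 10 are assumptions made for the example:

```python
import math

def standard_error_of_measurement(sd_x, r_xx):
    """Standard deviation of the errors, given the standard deviation of
    the empirical scores (sd_x) and the reliability coefficient (r_xx)."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("the reliability coefficient must lie in [0, 1]")
    return sd_x * math.sqrt(1.0 - r_xx)

# Reliability of 0.80 with an assumed standard deviation of 10 points.
sem = standard_error_of_measurement(10, 0.80)
print("standard error of measurement:", round(sem, 2))
```

With no reliability (coefficient 0) the error is as large as the spread of the scores themselves, and with perfect reliability (coefficient 1) it vanishes.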

Reliability and length: The length of a test is the number of its items. Reliability depends on this length. If a test consists of three items, a subject may obtain a score of 1 on one occasion and, on another occasion or on a parallel form, a score that differs from it by one point.

From one occasion to the other the score has changed by one point; one point out of three is a variation of 33%, a large variation. If subjects show chance variations of this kind, the correlation of the test with itself, or between its two parallel forms, will be considerably reduced and cannot be high. If the test is much longer, with 100 items for example, a subject may obtain 70 points on one occasion and 67 on a parallel form. Again the score has changed, this time by 3 points, but this is a small variation relative to the whole test, specifically 3%. Chance fluctuations of this size in the subjects' scores from one form to its parallel are relatively unimportant and will not lower the correlation between the two forms as much as before.

The reliability coefficient will be much greater than in the previous case. The Spearman-Brown equation expresses the relationship between reliability and length. The reliability of a test is zero when its length is 0, and it increases as the length increases, although the increase becomes relatively smaller the longer the test already is. In other words, precision grows quickly at first and more slowly afterwards. As the length tends to infinity, the reliability coefficient tends to 1.
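The Spearman-Brown relationship can be sketched in a few lines. Here `n` is the factor by which the test is lengthened with parallel items, and the starting reliability of 0.50 is a hypothetical value for the example.

```python
def spearman_brown(r_xx, n):
    """Reliability of a test lengthened n times with parallel items,
    given the reliability r_xx of the original test."""
    return n * r_xx / (1 + (n - 1) * r_xx)

# Assumed starting reliability of 0.50 at increasing lengths.
for n in (0, 1, 2, 4, 8, 100):
    print(n, round(spearman_brown(0.50, n), 3))
```

With this starting value, doubling the test raises the coefficient from 0.50 to about 0.67, doubling again gives 0.80, and a further doubling adds less than 0.09, while a test 100 times longer still stays below 1.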

Increasing the length of a test increases its precision because the true variance grows at a higher rate than the error variance, so the proportion of variance due to error decreases. The Rulon formula and the Flanagan and Guttman formula are used to calculate the reliability coefficient and are especially applicable when it is calculated by the split-half method.
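The two split-half formulas mentioned above can be sketched as follows, using their usual textbook forms: Rulon's formula is one minus the variance of the differences between the halves divided by the variance of the total scores, and the Flanagan and Guttman formula is twice one minus the sum of the two half variances divided by the variance of the total scores. The half scores are hypothetical, and population variances are used throughout as a simplifying choice.

```python
from statistics import pvariance

def rulon(half_a, half_b):
    """Rulon: 1 - variance of half differences / variance of totals."""
    diff = [a - b for a, b in zip(half_a, half_b)]
    total = [a + b for a, b in zip(half_a, half_b)]
    return 1 - pvariance(diff) / pvariance(total)

def guttman_flanagan(half_a, half_b):
    """Flanagan and Guttman: 2 * (1 - sum of half variances / variance of totals)."""
    total = [a + b for a, b in zip(half_a, half_b)]
    return 2 * (1 - (pvariance(half_a) + pvariance(half_b)) / pvariance(total))

# Hypothetical half scores (odd and even items) for six subjects.
odd_half = [3, 2, 3, 2, 1, 0]
even_half = [3, 3, 1, 1, 0, 0]

r_rulon = rulon(odd_half, even_half)
r_gf = guttman_flanagan(odd_half, even_half)
print("Rulon:", round(r_rulon, 3))
print("Flanagan and Guttman:", round(r_gf, 3))
```

The two functions are algebraically equivalent, so they return the same value for any pair of halves; they differ only in which variances have to be computed.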

Reliability and consistency: the reliability coefficient can also be obtained in another way, as the so-called alpha coefficient, or coefficient of generalizability or representativeness (Cronbach). The alpha coefficient indicates the precision with which a set of items measures an aspect of personality or behavior. It can be interpreted as:

  • An estimate of the average correlation among all possible items for a given aspect.
  • A measure of the precision of the test based on its coherence or internal consistency (the interrelation between its items, that is, the extent to which they all measure the same thing) and on its length.
  • An indicator of the representativeness of the test, that is, the degree to which the sample of items that composes it is representative of the population of possible items of the same type and psychological content.

The alpha coefficient mainly reflects two basic concepts in the precision of a test:

  1. The interrelation between its items: the extent to which they all measure the same thing.
  2. The length of the test: as with the number of cases in a sample, when it increases, and if systematic errors are eliminated, the sample better represents the population from which it is drawn and chance error is less likely to intervene.

If the items of the test are dichotomous (yes or no, 1 or 0, agree or disagree, etc.), the alpha coefficient equation simplifies, giving rise to the Kuder-Richardson equations (KR20 and KR21). For a given number of items, a test is more reliable the more homogeneous it is. The alpha coefficient indicates reliability insofar as it represents the homogeneity and coherence, or internal consistency, of the items of a test.
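As an illustration of how alpha reduces to KR20 for dichotomous items, the sketch below computes both on a hypothetical response matrix (6 subjects, 6 items scored 1 or 0). It uses the usual textbook forms of the two coefficients and population variances throughout; the function names are chosen for this example.

```python
from statistics import mean, pvariance

def cronbach_alpha(responses):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    items = list(zip(*responses))                 # one tuple per item
    item_variances = sum(pvariance(col) for col in items)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)

def kr20(responses):
    """KR20 = k/(k-1) * (1 - sum of p*q / variance of totals), where p is
    the proportion of 1s on an item and q = 1 - p."""
    k = len(responses[0])
    items = list(zip(*responses))
    sum_pq = sum(mean(col) * (1 - mean(col)) for col in items)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_pq / total_variance)

# Hypothetical responses: 6 subjects x 6 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
kr = kr20(responses)
print("alpha:", round(alpha, 3))
print("KR20:", round(kr, 3))
```

For items scored 1 or 0, the population variance of an item equals p times q, which is why the two functions agree on this kind of data.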

Reliability standards and criteria

According to the item sample space model, the objective of the test is to estimate the measure that would be obtained if all the items of the sample space were used. This measure would be the true score, which the actual measures approximate more or less closely. The test is more or less reliable according to the degree to which a sample of items correlates with the true scores. In this model, the correlation matrix between all the items of the sample space is central. The sample model focuses more directly on internal consistency and, to the extent that it achieves it, indirectly guarantees stability.

The linear model of parallel tests focuses more on the stability of the scores and, to the extent that it achieves stability, indirectly favors internal consistency. If a test is used for individual diagnoses and forecasts, the reliability coefficient should be 0.90 or higher. For collective forecasts and classifications, the requirement is less strict.

In certain kinds of tests, such as personality tests, it is sometimes difficult to achieve coefficients above 0.70. If parallel forms or parallel halves are applied after a more or less long interval, chance errors may be more numerous than those that affect the alpha coefficient. This is because the correlation is reduced not only by the random errors intrinsic to the test on a single occasion, which are the ones the alpha coefficient takes into account, but also by all the errors that can arise from the two different situations, which may differ in numerous details. Therefore, the alpha coefficient is usually greater than the other coefficients.

The exception is the coefficient obtained by repeating the same test, since the random errors of the first application are more likely to be repeated in the second and, instead of reducing the correlation between the two, they increase it. It must be ensured that the second application is completely independent of the first. If this is achieved, test-retest is the easiest, most economical and most advisable method for assessing the stability of the scores, especially over long periods of time and with complex tests.
