

Assessing Reliability

The answer depends on the amount of research support for such a relationship.

Reliability in research


When we examine a construct in a study, we choose one of a number of possible ways to measure that construct [see the section on Constructs in quantitative research if you are unsure what constructs are, or what the difference is between constructs and variables]. For example, we may choose to use questionnaire items, interview questions, and so forth. These questionnaire items or interview questions are part of the measurement procedure. This measurement procedure should provide an accurate representation of the construct it is measuring if it is to be considered valid.

For example, if we want to measure the construct intelligence, we need to have a measurement procedure that accurately measures a person's intelligence.

Since there are many ways of thinking about intelligence e. In quantitative research, the measurement procedure consists of variables, whether a single variable or a number of variables that may make up a construct [see the section on Constructs in quantitative research]. When we think about the reliability of these variables, we want to know how stable or constant they are. This assumption, that the variable you are measuring is stable or constant, is central to the concept of reliability.

In principle, a measurement procedure that is stable or constant should produce the same or nearly the same results if the same individuals and conditions are used. So what do we mean when we say that a measurement procedure is constant or stable?

Some variables are more stable (constant) than others; that is, some change significantly, whilst others are reasonably constant. Therefore, the score measured e. The true score is the actual score that would reliably reflect the measurement e. The error reflects conditions that result in the score that we are measuring not reflecting the true score, but a variation on the actual score e. This error component within a measurement procedure will vary from one measurement to the next, increasing and decreasing the score for the variable.

It is assumed that this happens randomly, with the error averaging zero over time; that is, the increases or decreases in error over a number of measurements even themselves out so that we end up with the true score e. Provided that the error component within a measurement procedure is relatively small, the scores that are attained over a number of measurements will be relatively consistent; that is, there will be small differences in the scores between measurements.
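The idea that each measured score is a true score plus a random error that averages out to zero can be sketched with a small simulation. This is only an illustration under stated assumptions: the true score of 100, the error sizes, the normally distributed error, and the helper name simulate_measurements are all hypothetical and do not come from the text above.

```python
import random
import statistics


def simulate_measurements(true_score, error_sd, n, seed=0):
    """Return n observed scores, each equal to true_score plus random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n)]


# Hypothetical values chosen only for illustration.
small_error = simulate_measurements(true_score=100, error_sd=2, n=1000)
large_error = simulate_measurements(true_score=100, error_sd=15, n=1000)

# Over many measurements the error averages out, so both means sit near 100.
mean_small = statistics.mean(small_error)
mean_large = statistics.mean(large_error)

# The spread of the scores shows how consistent repeated measurements are.
spread_small = statistics.stdev(small_error)
spread_large = statistics.stdev(large_error)

print(f"small error: mean={mean_small:.2f}, spread={spread_small:.2f}")
print(f"large error: mean={mean_large:.2f}, spread={spread_large:.2f}")
```

Both sets of scores average out close to the assumed true score, but only the small-error procedure gives scores that stay close to one another from one measurement to the next, which is what the text means by a reliable procedure.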

As such, we can say that the measurement procedure is reliable. Take the following example:

Intelligence using IQ True score:

The parallel forms approach is very similar to the split-half reliability described below. The major difference is that parallel forms are constructed so that the two forms can be used independently of each other and considered equivalent measures.

For instance, we might be concerned about a testing threat to internal validity. If we use Form A for the pretest and Form B for the posttest, we minimize that problem. With split-half reliability we have an instrument that we wish to use as a single measurement instrument and only develop randomly split halves for purposes of estimating reliability.

In internal consistency reliability estimation we use our single measurement instrument administered to a group of people on one occasion to estimate reliability. In effect we judge the reliability of the instrument by estimating how well the items that reflect the same construct yield similar results.

We are looking at how consistent the results are for different items for the same construct within the measure. There is a wide variety of internal consistency measures that can be used.

The average inter-item correlation uses all of the items on our instrument that are designed to measure the same construct. We first compute the correlation between each pair of items, as illustrated in the figure. For example, if we have six items we will have 15 different item pairings (i.e., 15 correlations). The average inter-item correlation is simply the average or mean of all these correlations.
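As a rough sketch of the calculation just described, the following correlates every pair of items and averages the results. The response data are made up for illustration, and the function names pearson and average_inter_item_correlation are hypothetical helpers rather than part of any particular statistics package.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_inter_item_correlation(items):
    """Return (number of item pairs, mean correlation over all pairs)."""
    pairs = list(combinations(range(len(items)), 2))
    correlations = [pearson(items[i], items[j]) for i, j in pairs]
    return len(pairs), sum(correlations) / len(correlations)


# Hypothetical responses: six items (rows) answered by five people (columns).
items = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 1],
    [4, 1, 5, 3, 2],
    [3, 2, 5, 4, 1],
    [4, 2, 4, 3, 1],
    [5, 1, 5, 2, 2],
]

n_pairs, avg_r = average_inter_item_correlation(items)
print(n_pairs)          # six items give 15 pairings
print(round(avg_r, 3))  # mean of the 15 correlations
```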

In the example, we find an average inter-item correlation of.

The average item-total correlation also uses the inter-item correlations. In addition, we compute a total score for the six items and use that as a seventh variable in the analysis.

The figure shows the six item-to-total correlations at the bottom of the correlation matrix.

In split-half reliability we randomly divide all items that purport to measure the same construct into two sets. We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half. The split-half reliability estimate is simply the correlation between these two total scores. In the example it is.

Imagine that we compute one split-half reliability and then randomly divide the items into another set of split halves and recompute, and keep doing this until we have computed all possible split-half estimates of reliability.

Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates, although that's not how we compute it. Notice that when I say we compute all possible split-half estimates, I don't mean that each time we go and measure a new sample!

That would take forever. Instead, we calculate all split-half estimates from the same sample. Because we measured all of our sample on each of the six items, all we have to do is have the computer analysis do the random subsets of items and compute the resulting correlations.
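A minimal sketch of both ideas, using one made-up sample: it enumerates every way of splitting six items into two halves of three and correlates the half totals, then computes Cronbach's Alpha directly from the usual variance-based formula. The data and the function names are hypothetical. The sketch prints the two figures side by side without asserting that they match, since whether the average of the split-half estimates equals Alpha exactly depends on how each split-half estimate is computed.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def split_half_correlations(scores):
    """Correlate the two half totals for every distinct equal split of the items."""
    k = len(scores[0])
    results = []
    # Keeping item 0 in the first half avoids counting each split twice.
    for rest in combinations(range(1, k), k // 2 - 1):
        first = (0,) + rest
        second = [j for j in range(k) if j not in first]
        total_first = [sum(person[j] for j in first) for person in scores]
        total_second = [sum(person[j] for j in second) for person in scores]
        results.append(pearson(total_first, total_second))
    return results


def cronbach_alpha(scores):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    item_vars = [variance([person[j] for person in scores]) for j in range(k)]
    total_var = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical sample: eight people (rows), each answering six items (columns).
scores = [
    [4, 5, 4, 3, 4, 5],
    [2, 2, 1, 2, 2, 1],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 3, 4, 3, 2],
    [1, 1, 2, 1, 1, 2],
    [4, 4, 3, 4, 5, 4],
    [2, 3, 2, 2, 2, 3],
    [5, 5, 4, 5, 5, 4],
]

splits = split_half_correlations(scores)
alpha = cronbach_alpha(scores)
print(len(splits))              # 10 distinct ways to halve six items
print(round(mean(splits), 3))   # average split-half correlation
print(round(alpha, 3))          # Cronbach's Alpha from the formula
```

Everything is computed from the same sample; only the grouping of items changes between split-half estimates.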

The figure shows several of the split-half estimates for our six item example and lists them as SH with a subscript. Just keep in mind that although Cronbach's Alpha is equivalent to the average of all possible split-half correlations we would never actually calculate it that way. Some clever mathematician (Cronbach, I presume!) worked out a quicker way to get the same result.

Each of the reliability estimators has certain advantages and disadvantages.

Inter-rater reliability is one of the best ways to estimate reliability when your measure is an observation. However, it requires multiple raters or observers. As an alternative, you could look at the correlation of ratings of the same single observer repeated on two different occasions. For example, let's say you collected videotapes of child-mother interactions and had a rater code the videos for how often the mother smiled at the child.

To establish inter-rater reliability you could take a sample of videos and have two raters code them independently. To estimate test-retest reliability you could have a single rater code the same videos on two different occasions. You might use the inter-rater approach especially if you were interested in using a team of raters and you wanted to establish that they yielded consistent results.
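Both estimates in the video-coding example come down to correlating two sets of ratings of the same videos. The sketch below assumes the ratings are smile counts and uses a plain Pearson correlation as the agreement measure; the counts, variable names, and choice of Pearson correlation are illustrative assumptions, and other agreement statistics could be used instead.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical smile counts for the same six videos.
rater_a_first = [12, 7, 15, 3, 9, 11]    # rater A, first occasion
rater_b_first = [11, 8, 14, 4, 9, 12]    # rater B, coded independently
rater_a_second = [12, 6, 15, 4, 10, 11]  # rater A, second occasion

# Inter-rater: two different raters coding the same videos.
inter_rater = pearson(rater_a_first, rater_b_first)

# Test-retest: the same rater coding the same videos on two occasions.
test_retest = pearson(rater_a_first, rater_a_second)

print(f"inter-rater: {inter_rater:.3f}")
print(f"test-retest: {test_retest:.3f}")
```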

If you get a suitably high inter-rater reliability you could then justify allowing them to work independently on coding different videos. You might use the test-retest approach when you only have a single rater and don't want to train any others.

On the other hand, in some studies it is reasonable to do both to help establish the reliability of the raters or observers. The parallel forms estimator is typically only used in situations where you intend to use the two forms as alternate measures of the same thing.

Both the parallel forms and all of the internal consistency estimators have one major constraint -- you have to have multiple items designed to measure the same construct. This is relatively easy to achieve in certain contexts like achievement testing (it's easy, for instance, to construct lots of similar addition problems for a math test), but for more complex or subjective constructs this can be a real challenge.

If you do have lots of items, Cronbach's Alpha tends to be the most frequently used estimate of internal consistency.

The test-retest estimator is especially feasible in most experimental and quasi-experimental designs that use a no-treatment control group. In these designs you always have a control group that is measured on two occasions (pretest and posttest).

Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater reliability estimates will be lower in value than the parallel forms and internal consistency ones because they involve measuring at different times or with different raters.

What is Reliability?


Reliability is a necessary ingredient for determining the overall validity of a scientific experiment and enhancing the strength of the results. Debate between social and pure scientists, concerning reliability, is robust and ongoing.


Reliability in research. Reliability, like validity, is a way of assessing the quality of the measurement procedure used to collect data in a dissertation. In order for the results from a study to be considered valid, the measurement procedure must first be reliable.


The term reliability in psychological research refers to the consistency of a research study or measuring test. For example, if a person weighs themselves during the course of a day they would expect to see a similar reading. Scales which measured weight differently each time would be of little use (Saul McLeod).

The use of reliability and validity is common in quantitative research, and it is now being reconsidered in the qualitative research paradigm. Since reliability and validity are rooted in a positivist perspective, they should be redefined for use in a naturalistic approach. Like reliability and validity as used in quantitative research are providing .


Reliability and Validity. In order for research data to be of value and of use, they must be both reliable and valid.

Test-retest reliability is a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.