
Publication Abstract

Authors: Edelen MO, Reeve BB

Title: Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement.

Journal: Qual Life Res 16 Suppl 1:5-18

Date: 2007

Abstract: BACKGROUND: Health outcomes researchers are increasingly applying Item Response Theory (IRT) methods to questionnaire development, evaluation, and refinement efforts. OBJECTIVE: To provide a brief overview of IRT, to review some of the critical issues associated with IRT applications, and to demonstrate the basic features of IRT with an example. METHODS: Example data come from 6,504 adolescent respondents in the National Longitudinal Study of Adolescent Health public use data set who completed the 19-item Feelings Scale for depression. The sample was split into development and validation samples. Scale items were calibrated in the development sample with the Graded Response Model, and the results were used to construct a 10-item short form. The short form was evaluated in the validation sample by examining the correspondence between IRT scores from the short form and the original, and by comparing the proportion of respondents identified as depressed according to the original and short form observed cut scores. RESULTS: The 19 items varied in their discrimination (slope parameter range: .86 to 2.66), and item location parameters reflected a considerable range of depression (-.72 to 3.39). However, the item set is most discriminating at higher levels of depression. In the validation sample, IRT scores generated from the short and long forms were correlated at .96 and the average difference in these scores was -.01. In addition, nearly 90% of the sample was classified identically as at risk or not at risk for depression using observed score cut points from the short and long forms. CONCLUSIONS: When used appropriately, IRT can be a powerful tool for questionnaire development, evaluation, and refinement, resulting in precise, valid, and relatively brief instruments that minimize response burden.
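
Illustration: the Graded Response Model named in the abstract gives each item a slope (discrimination) parameter and a set of ordered location (threshold) parameters. The sketch below shows, in its usual logistic form, how those parameters map a respondent's trait level to response-category probabilities. The function name and the parameter values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math

def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      -- respondent's latent trait level (e.g. depression)
    slope      -- item discrimination parameter
    thresholds -- increasing location parameters, one per category boundary
    """
    # Cumulative probabilities P(response >= k), padded with 1 and 0
    # for the lowest and highest boundaries.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-slope * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical four-category item (values are assumptions, not study results).
probs = grm_category_probs(theta=1.0, slope=1.5, thresholds=[-0.5, 1.0, 2.5])
```

With increasing thresholds, the returned probabilities are non-negative and sum to 1. A larger slope makes the category probabilities change more sharply around each threshold, which is what it means for an item to be more discriminating.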

Last Modified: 03 Sep 2013