Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations?

Jonathan D. Kibble, Teresa Johnson

Research output: Contribution to journal › Article

Abstract

The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors "easy," "moderate," or "hard" and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = -0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ² = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = 0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70-0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.
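The item statistics named above (difficulty as proportion correct, point-biserial discrimination, Cronbach's α, and Spearman's ρ between intended and observed difficulty) can be illustrated with a short sketch. The Python snippet below is not from the paper; it runs on a made-up 0/1 response matrix and hypothetical faculty difficulty ratings purely to show how such statistics are typically computed.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_students, n_items = 100, 20

# Hypothetical 0/1 response matrix: rows = students, columns = items.
responses = (rng.random((n_students, n_items)) < 0.7).astype(int)

# Hypothetical faculty annotations: 1 = easy, 2 = moderate, 3 = hard.
intended = rng.integers(1, 4, size=n_items)

# Item difficulty index: proportion of students answering correctly.
difficulty = responses.mean(axis=0)

# Point-biserial discrimination: each item against the rest-of-test score.
total = responses.sum(axis=1)
discrimination = np.array([
    stats.pointbiserialr(responses[:, i], total - responses[:, i])[0]
    for i in range(n_items)
])

# Cronbach's alpha: internal-consistency reliability of the whole exam.
k = n_items
alpha = (k / (k - 1)) * (1 - responses.var(axis=0, ddof=1).sum() / total.var(ddof=1))

# Spearman rank correlation between intended difficulty and observed scores,
# the same statistic as the rho = -0.19 reported in the abstract.
rho, p = stats.spearmanr(intended, difficulty)

print(f"Cronbach's alpha = {alpha:.2f}")
print(f"Mean point-biserial discrimination = {discrimination.mean():.2f}")
print(f"Spearman rho (intended vs. observed) = {rho:.2f}, P = {p:.2f}")

With random data the correlation will sit near zero; on real exam data the sign and size of ρ are what indicate whether intended difficulty tracks observed difficulty.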

Original language: English (US)
Pages (from-to): 396-401
Number of pages: 6
Journal: American Journal of Physiology - Advances in Physiology Education
Volume: 35
Issue number: 4
DOIs: 10.1152/advan.00062.2011
State: Published - 2011
Externally published: Yes

Keywords

  • Assessment
  • Bloom's taxonomy
  • Evaluation
  • Hidden curriculum
  • Medical education
  • Multiple-choice questions
  • Physiology education
  • Standard setting

ASJC Scopus subject areas

  • Medicine(all)
  • Physiology

Cite this

Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations? / Kibble, Jonathan D.; Johnson, Teresa.

In: American Journal of Physiology - Advances in Physiology Education, Vol. 35, No. 4, 2011, p. 396-401.

Research output: Contribution to journal › Article

@article{05bb67e0d5c54b5cadbe97f76cd52d61,
title = "Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations?",
abstract = "The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors {"}easy,{"} {"}moderate,{"} or {"}hard{"} and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = -0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48{\%} of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ² = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = 0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70-0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.",
keywords = "Assessment, Bloom's taxonomy, Evaluation, Hidden curriculum, Medical education, Multiple-choice questions, Physiology education, Standard setting",
author = "Kibble, {Jonathan D.} and Teresa Johnson",
year = "2011",
doi = "10.1152/advan.00062.2011",
language = "English (US)",
volume = "35",
pages = "396--401",
journal = "American Journal of Physiology - Advances in Physiology Education",
issn = "1043-4046",
publisher = "American Physiological Society",
number = "4",
}

TY - JOUR

T1 - Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations?

AU - Kibble, Jonathan D.

AU - Johnson, Teresa

PY - 2011

Y1 - 2011

N2 - The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors "easy," "moderate," or "hard" and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = -0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ² = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = 0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70-0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.

AB - The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors "easy," "moderate," or "hard" and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = -0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ² = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = 0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70-0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.

KW - Assessment

KW - Bloom's taxonomy

KW - Evaluation

KW - Hidden curriculum

KW - Medical education

KW - Multiple-choice questions

KW - Physiology education

KW - Standard setting

UR - http://www.scopus.com/inward/record.url?scp=84859315912&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859315912&partnerID=8YFLogxK

U2 - 10.1152/advan.00062.2011

DO - 10.1152/advan.00062.2011

M3 - Article

C2 - 22139777

AN - SCOPUS:84859315912

VL - 35

SP - 396

EP - 401

JO - American Journal of Physiology - Advances in Physiology Education

JF - American Journal of Physiology - Advances in Physiology Education

SN - 1043-4046

IS - 4

ER -