Station-length requirements for reliable performance-based examination scores

John H. Shatzer, Debra Darosa, Jerry A. Colliver, Lynne Barkmeier

Research output: Contribution to journal › Article

Abstract

Purpose. To directly compare the generalizability of medical students’ performance scores under systematically varied station times in two surgery end-of-clerkship performance-based examinations.

Method. The participants were 36 third-year students randomly assigned to the first two rotations of the core surgery clerkship during 1991-92 at Southern Illinois University School of Medicine. The students rotated through a 12-station examination that employed standardized patients (SPs). In the first rotation, the students took six five-minute stations and six ten-minute stations. In the second rotation, the time lengths were reversed for the same stations. The students’ total scores were based on (1) subscores on checklists completed by the SPs and (2) subscores on the students’ written responses to short questions about each station (these responses were provided at station couplets that were five minutes long, regardless of station length). Generalizability coefficients were computed from the pooled rotation results to estimate the reliability of scores for each of the two station lengths.

Results. Generalizability was lower for the ten-minute stations, mostly because of less variability among students’ performances. The checklist subscores accounted for most of this decrease in variability, while the couplet subscores remained stable across station lengths.

Conclusion. The longer station length actually decreased the generalizability of the scores by decreasing the variability among students’ performances. Thus, the time allocated to stations can affect both the reliability of scores and the overall testing time of performance-based examinations.
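The abstract attributes the lower generalizability of the ten-minute stations to reduced variability among students. Consider a simplified one-facet design in which students are crossed with stations and each cell holds one score. In that design, the relative generalizability coefficient is Eρ² = σ²(p) / [σ²(p) + σ²(ps,e) / n_s]. A smaller student variance σ²(p) therefore lowers the coefficient even when the residual variance σ²(ps,e) stays the same.

The Python sketch below estimates these components from a score matrix under that assumption. It is not the authors' analysis. The study pooled two rotations and combined checklist and couplet subscores, so its actual G-study design was likely more elaborate. The helper `g_coefficient` and the demo data are hypothetical.

```python
import numpy as np


def g_coefficient(scores: np.ndarray) -> float:
    """Relative G coefficient for a fully crossed students x stations design.

    `scores` is a 2-D array with one row per student and one column per
    station (one observation per cell). Hypothetical helper, not from the paper.
    """
    n_p, n_s = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    station_means = scores.mean(axis=0)

    # Mean squares from a two-way ANOVA without replication.
    ms_p = n_s * np.sum((person_means - grand) ** 2) / (n_p - 1)
    resid = scores - person_means[:, None] - station_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_s - 1))

    # Variance components; a negative student-variance estimate is set to 0.
    var_ps_e = ms_res
    var_p = max((ms_p - ms_res) / n_s, 0.0)

    # Relative error variance: the residual averaged over the n_s stations.
    denom = var_p + var_ps_e / n_s
    return float(var_p / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical data: 4 students x 3 stations. The same residual pattern is
    # added to a wide and to a narrow spread of student ability.
    noise = np.array([[0.5, -0.5, 0.0],
                      [-0.5, 0.5, 0.0],
                      [0.0, 0.5, -0.5],
                      [0.5, 0.0, -0.5]])
    wide = np.array([-3.0, -1.0, 1.0, 3.0])[:, None] + noise
    narrow = np.array([-0.6, -0.2, 0.2, 0.6])[:, None] + noise
    print(round(g_coefficient(wide), 3), round(g_coefficient(narrow), 3))
    # prints: 0.986 0.661
```

In this toy example, the residual is identical in both matrices and only the spread of student ability shrinks. The coefficient still falls from about 0.99 to about 0.66. This is the same mechanism the abstract describes, shown here only under the simplified design assumed above.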

Original language: English (US)
Pages (from-to): 224-229
Number of pages: 6
Journal: Academic Medicine
Volume: 68
Issue number: 3
State: Published - 1993

ASJC Scopus subject areas

  • Medicine(all)
  • Education
  • Public Health, Environmental and Occupational Health

Cite this

Shatzer, J. H., Darosa, D., Colliver, J. A., & Barkmeier, L. (1993). Station-length requirements for reliable performance-based examination scores. Academic Medicine, 68(3), 224-229.

Station-length requirements for reliable performance-based examination scores. / Shatzer, John H.; Darosa, Debra; Colliver, Jerry A.; Barkmeier, Lynne.

In: Academic Medicine, Vol. 68, No. 3, 1993, p. 224-229.

Shatzer, JH, Darosa, D, Colliver, JA & Barkmeier, L 1993, 'Station-length requirements for reliable performance-based examination scores', Academic Medicine, vol. 68, no. 3, pp. 224-229.
Shatzer JH, Darosa D, Colliver JA, Barkmeier L. Station-length requirements for reliable performance-based examination scores. Academic Medicine. 1993;68(3):224-229.
Shatzer, John H. ; Darosa, Debra ; Colliver, Jerry A. ; Barkmeier, Lynne. / Station-length requirements for reliable performance-based examination scores. In: Academic Medicine. 1993 ; Vol. 68, No. 3. pp. 224-229.
@article{bd3e3aaeaf9a4a3d93c5dfedb49877b4,
title = "Station-length requirements for reliable performance-based examination scores",
abstract = "Purpose. To directly compare the generalizability of medical students’ performance scores under systematically varied station times in two surgery end-of-clerkship performance-based examinations. Method. The participants were 36 third-year students randomly assigned to the first two rotations of the core surgery clerkship during 1991-92 at Southern Illinois University School of Medicine. The students rotated through a 12-station examination that employed standardized patients (SPs). In the first rotation, the students took six five-minute stations and six ten-minute stations. In the second rotation, the time lengths were reversed for the same stations. The students’ total scores were based on (1) subscores on checklists completed by the SPs and (2) subscores on the students’ written responses to short questions about each station (these responses were provided at station couplets that were five minutes long, regardless of station length). Generalizability coefficients were computed from the pooled rotation results to estimate the reliability of scores for each of the two station lengths. Results. Generalizability was lower for the ten-minute stations, mostly because of less variability among students’ performances. The checklist subscores accounted for most of this decrease in variability, while the couplet subscores remained stable across station lengths. Conclusion. The longer station length actually decreased the generalizability of the scores by decreasing the variability among students’ performances. Thus, the time allocated to stations can affect both the reliability of scores and the overall testing time of performance-based examinations.",
author = "Shatzer, {John H.} and Debra Darosa and Colliver, {Jerry A.} and Lynne Barkmeier",
year = "1993",
language = "English (US)",
volume = "68",
pages = "224--229",
journal = "Academic Medicine",
issn = "1040-2446",
publisher = "Lippincott Williams and Wilkins",
number = "3",

}

TY  - JOUR
T1  - Station-length requirements for reliable performance-based examination scores
AU  - Shatzer, John H.
AU  - Darosa, Debra
AU  - Colliver, Jerry A.
AU  - Barkmeier, Lynne
PY  - 1993
Y1  - 1993
N2  - Purpose. To directly compare the generalizability of medical students’ performance scores under systematically varied station times in two surgery end-of-clerkship performance-based examinations. Method. The participants were 36 third-year students randomly assigned to the first two rotations of the core surgery clerkship during 1991-92 at Southern Illinois University School of Medicine. The students rotated through a 12-station examination that employed standardized patients (SPs). In the first rotation, the students took six five-minute stations and six ten-minute stations. In the second rotation, the time lengths were reversed for the same stations. The students’ total scores were based on (1) subscores on checklists completed by the SPs and (2) subscores on the students’ written responses to short questions about each station (these responses were provided at station couplets that were five minutes long, regardless of station length). Generalizability coefficients were computed from the pooled rotation results to estimate the reliability of scores for each of the two station lengths. Results. Generalizability was lower for the ten-minute stations, mostly because of less variability among students’ performances. The checklist subscores accounted for most of this decrease in variability, while the couplet subscores remained stable across station lengths. Conclusion. The longer station length actually decreased the generalizability of the scores by decreasing the variability among students’ performances. Thus, the time allocated to stations can affect both the reliability of scores and the overall testing time of performance-based examinations.
AB  - Purpose. To directly compare the generalizability of medical students’ performance scores under systematically varied station times in two surgery end-of-clerkship performance-based examinations. Method. The participants were 36 third-year students randomly assigned to the first two rotations of the core surgery clerkship during 1991-92 at Southern Illinois University School of Medicine. The students rotated through a 12-station examination that employed standardized patients (SPs). In the first rotation, the students took six five-minute stations and six ten-minute stations. In the second rotation, the time lengths were reversed for the same stations. The students’ total scores were based on (1) subscores on checklists completed by the SPs and (2) subscores on the students’ written responses to short questions about each station (these responses were provided at station couplets that were five minutes long, regardless of station length). Generalizability coefficients were computed from the pooled rotation results to estimate the reliability of scores for each of the two station lengths. Results. Generalizability was lower for the ten-minute stations, mostly because of less variability among students’ performances. The checklist subscores accounted for most of this decrease in variability, while the couplet subscores remained stable across station lengths. Conclusion. The longer station length actually decreased the generalizability of the scores by decreasing the variability among students’ performances. Thus, the time allocated to stations can affect both the reliability of scores and the overall testing time of performance-based examinations.
UR  - http://www.scopus.com/inward/record.url?scp=0027318104&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0027318104&partnerID=8YFLogxK
M3  - Article
C2  - 8447919
AN  - SCOPUS:0027318104
VL  - 68
SP  - 224
EP  - 229
JO  - Academic Medicine
JF  - Academic Medicine
SN  - 1040-2446
IS  - 3
ER  -