Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores

Seth D. Goldstein, Brenessa Lindeman, Jorie Colbert-Getz, Trisha Arbella, Robert A. Dudas, Anne Lidor, Bethany C. Sacks

Research output: Contribution to journal › Article

Abstract

Background: The clinical knowledge of medical students on a surgery clerkship is routinely assessed via subjective evaluations from faculty members and residents. Interpretation of these ratings should ideally be valid and reliable. However, prior literature has questioned the correlation between subjective and objective components when assessing students' clinical knowledge.

Methods: Retrospective cross-sectional data were collected from medical student records at The Johns Hopkins University School of Medicine from July 2009 through June 2011. Surgical faculty members and residents rated students' clinical knowledge on a 5-point, Likert-type scale. Interrater reliability was assessed using intraclass correlation coefficients for students with ≥4 attending surgeon evaluations (n = 216) and ≥4 resident evaluations (n = 207). Convergent validity was assessed by correlating average evaluation ratings with scores on the National Board of Medical Examiners (NBME) clinical subject examination for surgery. Average resident and attending surgeon ratings were also compared by NBME quartile using analysis of variance.

Results: There were high degrees of reliability for resident ratings (intraclass correlation coefficient, .81) and attending surgeon ratings (intraclass correlation coefficient, .76). Resident and attending surgeon ratings shared a moderate degree of variance (19%). However, average resident ratings and average attending surgeon ratings shared only a small degree of variance with NBME surgery examination scores (ρ² ≤ .09). When ratings were compared among NBME quartile groups, the only significant difference was between residents' ratings of students in the bottom 25th percentile of scores and those in the top 25th percentile (P = .007).

Conclusions: Although high interrater reliability suggests that attending surgeons and residents rate students with consistency, the lack of convergent validity suggests that these ratings may not reflect actual clinical knowledge. Both faculty members and residents may benefit from training in knowledge assessment, which will likely increase opportunities to recognize deficiencies and make student evaluation a more valuable tool.
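
To put the reported statistics in concrete terms: a shared variance of ρ² ≤ .09 corresponds to a correlation of roughly |ρ| ≤ .3 between average ratings and NBME scores. The sketch below is a minimal Python illustration of the three analyses named in the Methods: intraclass correlation for interrater reliability, rank correlation for convergent validity, and one-way ANOVA of ratings across exam-score quartile groups. It is illustrative only; the data are simulated, the balanced four-raters-per-student layout and the pandas/pingouin/SciPy tooling are assumptions, and it does not reproduce the authors' dataset or analysis code.

```python
# Minimal illustrative sketch, not the study's data or analysis code.
# Assumed stack: numpy, pandas, scipy, pingouin; balanced 4-raters-per-student design.
import numpy as np
import pandas as pd
import pingouin as pg
from scipy import stats

rng = np.random.default_rng(0)
n_students, n_raters = 200, 4

# Simulated 5-point clinical-knowledge ratings (rows = students, columns = raters).
ability = rng.normal(0.0, 1.0, n_students)
noise = rng.normal(0.0, 0.7, (n_students, n_raters))
ratings = np.clip(np.round(3.0 + ability[:, None] + noise), 1, 5)

# Long format expected by pingouin.intraclass_corr.
long = pd.DataFrame({
    "student": np.repeat(np.arange(n_students), n_raters),
    "rater": np.tile(np.arange(n_raters), n_students),
    "rating": ratings.ravel(),
})

# 1) Interrater reliability: intraclass correlation coefficients.
icc = pg.intraclass_corr(data=long, targets="student", raters="rater", ratings="rating")
print(icc[["Type", "ICC"]])

# 2) Convergent validity: Spearman correlation of mean rating with a simulated exam score.
mean_rating = ratings.mean(axis=1)
nbme = 70 + 8 * (0.3 * ability + rng.normal(0.0, 1.0, n_students))  # weakly related score
rho, p = stats.spearmanr(mean_rating, nbme)
print(f"Spearman rho = {rho:.2f}, shared variance rho^2 = {rho ** 2:.2f}, P = {p:.3f}")

# 3) One-way ANOVA of mean ratings across exam-score quartile groups.
quartile = pd.qcut(nbme, 4, labels=["Q1", "Q2", "Q3", "Q4"])
groups = [mean_rating[quartile == q] for q in ["Q1", "Q2", "Q3", "Q4"]]
f_stat, p_anova = stats.f_oneway(*groups)
print(f"One-way ANOVA across quartiles: F = {f_stat:.2f}, P = {p_anova:.3f}")
```

In this framing, the pattern the study reports is a high ICC alongside a low ρ²: raters agree with one another more than their ratings agree with the examination.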

Original language: English (US)
Pages (from-to): 231-235
Number of pages: 5
Journal: American Journal of Surgery
Volume: 207
Issue number: 2
ISSN: 0002-9610
Publisher: Elsevier Inc.
DOI: 10.1016/j.amjsurg.2013.10.008
PubMed ID: 24239528
Scopus ID: 84893672911
State: Published - Feb 2014

Keywords

  • Assessment
  • Medical student education
  • Surgery clerkship

ASJC Scopus subject areas

  • Surgery

Cite this

Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores. / Goldstein, Seth D.; Lindeman, Brenessa; Colbert-Getz, Jorie; Arbella, Trisha; Dudas, Robert A.; Lidor, Anne; Sacks, Bethany C.

In: American Journal of Surgery, Vol. 207, No. 2, 02.2014, p. 231-235.

Research output: Contribution to journal › Article
