A multicenter pilot evaluation of the National Institutes of Health chronic graft-versus-host disease (cGVHD) therapeutic response measures: Feasibility, interrater reliability, and minimum detectable change

Sandra A. Mitchell; David Jacobsohn; Kimberly E. Thormann Powers; Paul A. Carpenter; Mary E.D. Flowers; Edward W. Cowen; Mark Schubert; Maria L. Turner; Stephanie J. Lee; Paul Martin; Michael R. Bishop; Kristin Baird; Javier Bolaños-Meade; Kevin Boyd; Jane M. Fall-Dickson; Lynn H. Gerber; Jean Pierre Guadagnini; Matin Imanguli; Michael C. Krumlauf; Leslie Lawley; Li Li; Bryce B. Reeve; Janine Austin Clayton; Georgia B. Vogelsang; Steven Z. Pavletic

doi:10.1016/j.bbmt.2011.04.002

School of Medicine

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

The lack of standardized criteria for measuring therapeutic response is a major obstacle to the development of new therapeutic agents for chronic graft-versus-host disease (cGVHD). National Institutes of Health (NIH) consensus criteria for evaluating therapeutic response were published in 2006. We report the results of 4 consecutive pilot trials evaluating the feasibility and estimating the interrater reliability and minimum detectable change of these response criteria. Hematology-oncology clinicians with limited experience in applying the NIH cGVHD response criteria (n = 34) participated in a 2.5-hour training session on response evaluation in cGVHD. Feasibility and interrater reliability between subspecialty cGVHD experts and this panel of clinician raters were examined in a sample of 25 children and adults with cGVHD. The minimum detectable change was calculated using the standard error of measurement. Clinicians' impressions of the brief training session, the photo atlas, and the response criteria documentation tools were generally favorable. Performing and documenting the full set of response evaluations required a median of 21 minutes (range: 12-60 minutes) per rater. The Schirmer tear test required the greatest time of any single test (median: 9 minutes). Overall, interrater agreement for skin and oral manifestations was modest; however, in the third and fourth trials, the agreement between clinicians and experts for all dimensions except movable sclerosis approached satisfactory values. In the final 2 trials, the threshold for defining change exceeding measurement error was 19% to 22% body surface area (BSA) for erythema, 18% to 26% BSA for movable sclerosis, 17% to 21% BSA for nonmovable sclerosis, and 2.1 to 2.6 points on the 15-point NIH Oral cGVHD scale. Agreement between clinician-expert pairs was moderate to substantial for the measures of functional capacity and for the gastrointestinal and global cGVHD rating scales.
These results suggest that the NIH response criteria are feasible for use, and these reliability estimates are encouraging, because they were observed following a single 2.5-hour training session given at multiple transplant centers, with no opportunity for iterative training and calibration. Research is needed to evaluate inter- and intrarater reliability in larger samples, and to evaluate these response criteria as predictors of outcomes in clinical trials.
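
The abstract states that the minimum detectable change was derived from the standard error of measurement but does not give the formula or confidence level. As an illustration only, the sketch below uses one common convention, SEM = SD × sqrt(1 − reliability) and MDC = z × sqrt(2) × SEM. The function names, the z value of 1.96, and the input numbers are assumptions for this example and are not taken from the study.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM under the common convention SD * sqrt(1 - reliability).

    `reliability` is a reliability coefficient in [0, 1], such as an
    intraclass correlation (an assumption; the abstract does not say
    which coefficient the authors used).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if sd < 0.0:
        raise ValueError("sd must be non-negative")
    return sd * math.sqrt(1.0 - reliability)


def minimum_detectable_change(sd: float, reliability: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM.

    The sqrt(2) factor reflects measurement error in both of the two
    scores being compared; z = 1.96 corresponds to a 95% level
    (assumed here, not stated in the source).
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, reliability)


# Hypothetical inputs: between-patient SD of 15% BSA and reliability 0.75.
sem = standard_error_of_measurement(sd=15.0, reliability=0.75)
mdc = minimum_detectable_change(sd=15.0, reliability=0.75)
print(f"SEM = {sem:.1f}% BSA, MDC = {mdc:.1f}% BSA")
```

With these made-up inputs, a change smaller than the printed MDC could not be distinguished from measurement error, which is how the thresholds reported in the abstract are meant to be read.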

Original language	English (US)
Pages (from-to)	1619-1629
Number of pages	11
Journal	Biology of Blood and Marrow Transplantation
Volume	17
Issue number	11
DOIs	https://doi.org/10.1016/j.bbmt.2011.04.002
State	Published - Nov 2011

Keywords

Chronic graft-versus-host disease
Interrater reliability
Minimum detectable change
Response criteria

ASJC Scopus subject areas

Hematology
Transplantation

Cite this

Mitchell, S. A., Jacobsohn, D., Thormann Powers, K. E., Carpenter, P. A., Flowers, M. E. D., Cowen, E. W., Schubert, M., Turner, M. L., Lee, S. J., Martin, P., Bishop, M. R., Baird, K., Bolaños-Meade, J., Boyd, K., Fall-Dickson, J. M., Gerber, L. H., Guadagnini, J. P., Imanguli, M., Krumlauf, M. C., ... Pavletic, S. Z. (2011). A multicenter pilot evaluation of the National Institutes of Health chronic graft-versus-host disease (cGVHD) therapeutic response measures: Feasibility, interrater reliability, and minimum detectable change. Biology of Blood and Marrow Transplantation, 17(11), 1619-1629. https://doi.org/10.1016/j.bbmt.2011.04.002

@article{8cb4a4820d874f2aafd6d508178a6eda,

title = "A multicenter pilot evaluation of the {National Institutes of Health} chronic graft-versus-host disease ({cGVHD}) therapeutic response measures: Feasibility, interrater reliability, and minimum detectable change",

keywords = "Chronic graft-versus-host disease, Interrater reliability, Minimum detectable change, Response criteria",

author = "Mitchell, {Sandra A.} and David Jacobsohn and {Thormann Powers}, {Kimberly E.} and Carpenter, {Paul A.} and Flowers, {Mary E.D.} and Cowen, {Edward W.} and Mark Schubert and Turner, {Maria L.} and Lee, {Stephanie J.} and Paul Martin and Bishop, {Michael R.} and Kristin Baird and Javier Bola{\~n}os-Meade and Kevin Boyd and Fall-Dickson, {Jane M.} and Gerber, {Lynn H.} and Guadagnini, {Jean Pierre} and Matin Imanguli and Krumlauf, {Michael C.} and Leslie Lawley and Li Li and Reeve, {Bryce B.} and Clayton, {Janine Austin} and Vogelsang, {Georgia B.} and Pavletic, {Steven Z.}",

note = "Funding Information: Financial disclosure: This work was supported by the Intramural Research Program at the National Institutes of Health Clinical Center, and by the Center for Cancer Research, National Cancer Institute.",

year = "2011",

month = nov,

doi = "10.1016/j.bbmt.2011.04.002",

language = "English (US)",

volume = "17",

pages = "1619--1629",

journal = "Biology of Blood and Marrow Transplantation",

issn = "1083-8791",

publisher = "Elsevier Inc.",

number = "11",

}

TY - JOUR

T1 - A multicenter pilot evaluation of the National Institutes of Health chronic graft-versus-host disease (cGVHD) therapeutic response measures

T2 - Feasibility, interrater reliability, and minimum detectable change

AU - Mitchell, Sandra A.

AU - Jacobsohn, David

AU - Thormann Powers, Kimberly E.

AU - Carpenter, Paul A.

AU - Flowers, Mary E.D.

AU - Cowen, Edward W.

AU - Schubert, Mark

AU - Turner, Maria L.

AU - Lee, Stephanie J.

AU - Martin, Paul

AU - Bishop, Michael R.

AU - Baird, Kristin

AU - Bolaños-Meade, Javier

AU - Boyd, Kevin

AU - Fall-Dickson, Jane M.

AU - Gerber, Lynn H.

AU - Guadagnini, Jean Pierre

AU - Imanguli, Matin

AU - Krumlauf, Michael C.

AU - Lawley, Leslie

AU - Li, Li

AU - Reeve, Bryce B.

AU - Clayton, Janine Austin

AU - Vogelsang, Georgia B.

AU - Pavletic, Steven Z.

N1 - Funding Information: Financial disclosure: This work was supported by the Intramural Research Program at the National Institutes of Health Clinical Center, and by the Center for Cancer Research, National Cancer Institute.

PY - 2011/11

Y1 - 2011/11

KW - Chronic graft-versus-host disease

KW - Interrater reliability

KW - Minimum detectable change

KW - Response criteria

UR - http://www.scopus.com/inward/record.url?scp=80054858845&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80054858845&partnerID=8YFLogxK

U2 - 10.1016/j.bbmt.2011.04.002

DO - 10.1016/j.bbmt.2011.04.002

M3 - Article

C2 - 21536143

AN - SCOPUS:80054858845

SN - 1083-8791

VL - 17

SP - 1619

EP - 1629

JO - Biology of Blood and Marrow Transplantation

JF - Biology of Blood and Marrow Transplantation

IS - 11

ER -
