Risk prediction models using multicenter data

1. Residual intraclass correlation coefficient

In multicenter studies, measurements usually vary systematically between centers (clustering). This can have several causes, such as measurement bias, differences in equipment, different clinical practices, or different definitions of variables. We propose the residual intraclass correlation (RICC) as a tool to assess the level of clustering in variables and to identify outlying centers or physicians (1).
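The following is a minimal sketch of how a residual ICC might be computed for one continuous variable. It is not the published method. It assumes that "residual" means the variable is first adjusted for the outcome, so that centers with a different case mix are not flagged as clustered. It also replaces the random-intercept mixed model one would normally fit with a simpler two-step moment estimator: a pooled least-squares regression of the variable on the outcome, followed by a one-way random-effects ANOVA estimate of ICC = var_between / (var_between + var_within) on the residuals. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def ols_residuals(x, y):
    """Residuals of a pooled least-squares regression of x on a single covariate y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((yi - mean_y) * (xi - mean_x) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / syy if syy > 0 else 0.0
    intercept = mean_x - slope * mean_y
    return [xi - (intercept + slope * yi) for xi, yi in zip(x, y)]


def anova_icc(values, centers):
    """One-way random-effects ANOVA estimate of the ICC, allowing unequal center sizes."""
    groups = defaultdict(list)
    for value, center in zip(values, centers):
        groups[center].append(value)
    k = len(groups)
    n_total = len(values)
    grand_mean = sum(values) / n_total
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective center size, corrected for unequal numbers of patients per center
    n0 = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (k - 1)
    # Between-center variance; a negative moment estimate is truncated at zero
    var_between = max(0.0, (ms_between - ms_within) / n0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


def residual_icc(x, outcome, centers):
    """Hypothetical helper: ICC of x after removing the pooled effect of the outcome."""
    return anova_icc(ols_residuals(x, outcome), centers)


# Hypothetical toy data: 3 centers with 4 patients each. The centers differ in
# prevalence (case mix) but share the same outcome effect and the same noise.
centers = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
outcome = [0, 0, 1, 1] + [1, 1, 1, 1] + [0, 0, 0, 0]
noise = [0.1, -0.1, 0.1, -0.1] * 3
x = [yi + ei for yi, ei in zip(outcome, noise)]

if __name__ == "__main__":
    print(round(anova_icc(x, centers), 2))  # 0.64: raw ICC mistakes case mix for clustering
    print(round(residual_icc(x, outcome, centers), 2))  # 0.0: no center effect remains
```

In this toy example the centers differ only in how many patients have the outcome, so the unadjusted ICC is high while the residual ICC is zero. Adding a genuine center-specific offset to `x` would raise the residual ICC instead. For binary variables, or for formal inference, a random-intercept model fitted by (restricted) maximum likelihood would be the more appropriate choice.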

2. Events per variable (EPV) for risk models based on multicenter data

To develop robust prediction models, the sample size must be sufficiently large relative to the number of candidate predictors and higher-order effects such as transformations or interaction terms. Practical guidelines, expressed as a minimum number of events per variable (EPV), have been suggested to this end, but the potential effect of clustering in multicenter studies had not been investigated. We carried out a simulation study to do so (2) and observed that the amount of clustering does not play a crucial role in determining whether the sample size is sufficient.
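As a rough illustration of the EPV bookkeeping, the sketch below assumes a binary outcome. It takes the number of events in the less frequent outcome class and divides it by the number of candidate regression parameters (degrees of freedom). Spline transformations and interaction terms therefore count against the sample size just like main effects. Because the simulation found that the amount of clustering is not crucial, the sketch simply computes EPV on the pooled multicenter sample without any adjustment for centers. The predictor names, the data, and the threshold of 10 (a widely cited rule of thumb, not a value taken from the study) are all hypothetical.

```python
def events_per_variable(outcomes, n_parameters):
    """EPV: events in the less frequent outcome class per candidate parameter."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    n_events = sum(outcomes)
    n_limiting = min(n_events, len(outcomes) - n_events)
    return n_limiting / n_parameters


# Hypothetical candidate model: each term maps to the number of regression
# coefficients (degrees of freedom) it consumes.
candidate_terms = {
    "age": 1,
    "lesion diameter (restricted cubic spline, 3 knots)": 2,
    "family history (binary)": 1,
    "age x family history interaction": 1,
}

# Hypothetical pooled multicenter sample: 40 events among 400 patients (1 = event)
outcomes = [1] * 40 + [0] * 360

if __name__ == "__main__":
    n_parameters = sum(candidate_terms.values())
    epv = events_per_variable(outcomes, n_parameters)
    print(n_parameters, epv)  # 5 8.0
    # 10 EPV is a widely cited rule of thumb, used here only for illustration
    print("sufficient" if epv >= 10 else "too few events for these candidate terms")
```

Counting only the three named predictors would give 40 / 3, or about 13.3 EPV, and would hide the shortfall that appears once the spline and the interaction are charged their full degrees of freedom.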

References

1. Wynants L, Timmerman D, Bourne T, Van Huffel S, Van Calster B. Screening for data clustering in multicenter studies: the residual intraclass correlation. BMC Med Res Methodol. 2013;13:128.

2. Wynants L, Bouwmeester W, Moons KG, Moerbeek M, Timmerman D, Van Huffel S, et al. A simulation study of sample size demonstrated the importance of the number of events per variable to develop prediction models in clustered data. J Clin Epidemiol. 2015.