Revisiting the Relationship Between Correlation Coefficient, Confidence Level, and Sample Size.
Qifan Yang, Minyi Su, Yan Li, Ren-Xiao Wang. Published in: Journal of Chemical Information and Modeling (2019)
In computational chemistry, the predictive power of theoretical models is commonly compared using Pearson correlation coefficients. It is generally understood that larger sample sizes lead to increased precision. But what is the minimum sample size required for comparing two models? This issue has not been well addressed in the field. To the best of our knowledge, the only serious study of this kind was published by Carlson in 2013 [J. Chem. Inf. Model. 2013, 53, 1837-1841], which proposed a method for estimating the minimum sample size required for this task. Considering how a benchmark comparison is conducted in practice, we point out that (i) the possible intercorrelation between the two models should not be neglected and (ii) a one-sided test is more reasonable because the direction of the comparison is known a priori. Because of these two issues, Carlson's method significantly overestimates the required minimum sample size. Here, we describe a more appropriate method based on Dunn and Clark's test statistic, and we have designed an extensive numerical test to validate it. The minimum sample sizes required for comparing two models under various conditions are computed with our method. Our study shows that the required minimum sample size is determined by several factors, including the confidence level, the statistical power, the two correlation coefficients, and the intercorrelation between the two models. As a rule of thumb, a couple of hundred samples are sufficient at 90% confidence or above for comparing two models that produce meaningful R values.
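To make the role of the intercorrelation concrete, the following is a minimal Python sketch of Dunn and Clark's z statistic for two dependent correlations that share one variable, together with a one-sided p-value. The abstract does not give the formula or any code, so the function names and the symbols are assumptions chosen for illustration: `r12` is the correlation of model A with experiment, `r13` that of model B with experiment, and `r23` the intercorrelation between the two models. The helper `approx_min_n` is a standard normal-approximation inversion of the statistic for a target confidence and power; it is not necessarily the procedure used by the authors.

```python
import math
from statistics import NormalDist


def dunn_clark_c(r12: float, r13: float, r23: float) -> float:
    """Covariance term between the two Fisher-transformed correlations."""
    num = (r23 * (1 - r12**2 - r13**2)
           - 0.5 * r12 * r13 * (1 - r12**2 - r13**2 - r23**2))
    den = (1 - r12**2) * (1 - r13**2)
    return num / den


def dunn_clark_z(r12: float, r13: float, r23: float, n: int) -> float:
    """Dunn and Clark's z for H0: rho12 == rho13, with n paired samples."""
    c = dunn_clark_c(r12, r13, r23)
    # math.atanh is the Fisher z-transformation of a correlation coefficient.
    return (math.atanh(r12) - math.atanh(r13)) * math.sqrt((n - 3) / (2 - 2 * c))


def one_sided_p(z: float) -> float:
    """One-sided p-value for the alternative rho12 > rho13."""
    return 1.0 - NormalDist().cdf(z)


def approx_min_n(r12: float, r13: float, r23: float,
                 confidence: float = 0.90, power: float = 0.80) -> int:
    """Assumed normal-approximation sample size for a one-sided test.

    Illustrative only: solves z_alpha + z_beta = dz * sqrt((n - 3) / (2 - 2c))
    for n, treating the observed correlations as the true ones.
    """
    z_alpha = NormalDist().inv_cdf(confidence)
    z_beta = NormalDist().inv_cdf(power)
    c = dunn_clark_c(r12, r13, r23)
    dz = math.atanh(r12) - math.atanh(r13)
    return math.ceil(3 + (2 - 2 * c) * ((z_alpha + z_beta) / dz) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: two models with R = 0.8 and 0.7 against experiment.
    for r23 in (0.2, 0.5, 0.8):
        print(r23, approx_min_n(0.8, 0.7, r23))
```

In this sketch a larger intercorrelation `r23` increases `c`, which shrinks the variance factor `2 - 2c` and therefore the estimated sample size. This is the mechanism by which neglecting the intercorrelation leads to an overestimate.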