Healthcare quality measures are statistics used to evaluate healthcare providers and identify those that need to improve their care. Before these measures are used in clinical practice, developers and reviewers assess measure reliability, which describes the degree to which differences in the measure values reflect actual variation in healthcare quality, as opposed to random noise. The Inter-Unit Reliability (IUR) is a popular statistic for assessing reliability; it describes the proportion of total variation in a measure that is attributable to between-provider variation. However, Kalbfleisch, He, Xia, and Li (2018) [Health Services and Outcomes Research Methodology, 18, 215-225] have argued that the IUR has a severe limitation: some of the between-provider variation may be unrelated to quality of care. In this paper, we illustrate the practical implications of this limitation through several concrete examples. We show that certain best practices in measure development, such as careful risk adjustment and exclusion of unstable measure values, can decrease the sample IUR value. These findings uncover potential negative consequences of discarding measures with IUR values below some arbitrary threshold.