Risk-adjusted quality measures are used to evaluate healthcare providers with respect to national norms while adjusting for factors beyond their control. Existing healthcare provider profiling approaches typically assume that the between-provider variation in these measures is entirely due to meaningful differences in quality of care. However, in practice, much of the between-provider variation will be due to trivial fluctuations in healthcare quality or to unobservable confounding risk factors. If these additional sources of variation are not accounted for, conventional methods will disproportionately identify larger providers as outliers, even though their departures from the national norms may not be "extreme" or clinically meaningful. Motivated by efforts to evaluate the quality of care provided by transplant centers, we develop a composite evaluation score based on a novel individualized empirical null method, which robustly accounts for overdispersion due to unobserved risk factors, models the marginal variance of standardized scores as a function of the effective sample size, and only requires the use of publicly available center-level statistics. The evaluations of United States kidney transplant centers based on the proposed composite score are substantially different from those based on conventional methods. Simulations show that the proposed empirical null approach classifies centers by quality of care more accurately than existing methods.