Model benchmarking allows us to separate uncertainty in model predictions caused by model inputs from uncertainty due to model structural error. We extend this method with a "large-sample" approach (using data from multiple field sites) to measure prediction uncertainty caused by errors in (i) forcing data, (ii) model parameters, and (iii) model structure, and use it to compare the efficiency of soil moisture state and evapotranspiration flux predictions made by the four land surface models in the North American Land Data Assimilation System Phase 2 (NLDAS-2). Parameters dominated uncertainty in soil moisture estimates and forcing data dominated uncertainty in evapotranspiration estimates; however, the models themselves used only a fraction of the information available to them. This means that there is significant potential to improve all three components of the NLDAS-2 system. In particular, continued work toward refining the parameter maps and look-up tables, the forcing data measurement and processing, and also the land surface models themselves, has potential to result in improved estimates of surface mass and energy balances.