A hitchhiker's guide to working with large, open-source neuroimaging datasets.
Corey Horien, Stephanie Noble, Abigail S Greene, Kangjoo Lee, Daniel S Barron, Siyuan Gao, David O'Connor, Mehraveh Salehi, Javid Dadashkarimi, Xilin Shen, Evelyn M R Lake, R Todd Constable, Dustin Scheinost
Published in: Nature human behaviour (2020)
Large datasets that enable researchers to perform investigations with unprecedented rigor are growing increasingly common in neuroimaging. Because open science has grown in popularity at the same time, these state-of-the-art datasets are more accessible than ever to researchers around the world. While analysis of these samples has pushed the field forward, the datasets pose a new set of challenges that can be difficult for novice users. Here we offer practical tips for working with large datasets from the end-user's perspective. We cover all aspects of the data lifecycle: from what to consider when downloading and storing the data, to tips on how to become acquainted with a dataset one did not collect, to what to share when communicating results. This manuscript serves as a practical guide one can use when working with large neuroimaging datasets, thus dissolving barriers to scientific discovery.