
The Neuroimaging Data Model Linear Regression Tool (nidm_linreg): PyNIDM Project.

Ashmita Kumar, Albert Crowley, Nazek Queder, J B Poline, Satrajit S Ghosh, David N Kennedy, Jeffery S Grethe, David B Keator
Published in: F1000Research (2022)
The Neuroimaging Data Model (NIDM) is a series of specifications for describing all aspects of the neuroimaging data lifecycle, from raw data to analyses and provenance. NIDM uses community-driven terminologies along with unambiguous data dictionaries within a Resource Description Framework (RDF) document to describe data and metadata for integration and query. Data from different studies, using locally defined variable names, can be retrieved by linking them to higher-order concepts from established ontologies and terminologies. Through these capabilities, NIDM documents are expected to improve reproducibility and facilitate data discovery and reuse. PyNIDM is a Python toolbox supporting the creation, manipulation, and querying of NIDM documents. Using the query tools available in PyNIDM, users can interrogate datasets to find studies that have collected variables measuring similar phenotypic properties. This, in turn, facilitates the transformation and combination of data across multiple studies. The focus of this manuscript is the linear regression tool, which is part of the PyNIDM toolbox and works directly on NIDM documents. It provides a high-level statistical analysis that helps researchers gain more insight into the data they are considering combining across studies. This saves researchers time and effort while showing potential relationships between variables. The linear regression tool operates through a command-line interface integrated with the other tools (pynidm linear-regression). It lets the user specify variables of interest using the rich query techniques available for NIDM documents and then conduct a linear regression with optional contrast and regularization.
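The idea of pooling variables across studies and then fitting a regression with optional regularization can be illustrated with a minimal sketch. This is not PyNIDM code and does not use the PyNIDM API: the records, the local variable names (age_years, AGE, hippo_vol, HippVol), the concept labels, and the fit_linear helper are all hypothetical, and the ridge (L2) penalty is only one assumed form of regularization.

```python
import numpy as np

# Hypothetical records from two studies that use different local variable names.
study_a = [{"age_years": 20, "hippo_vol": 90.0}, {"age_years": 40, "hippo_vol": 80.0}]
study_b = [{"AGE": 30, "HippVol": 85.0}, {"AGE": 50, "HippVol": 75.0}]

# Hypothetical mapping from local names to shared higher-order concepts.
concept_map = {
    "age_years": "age", "AGE": "age",
    "hippo_vol": "hippocampus_volume", "HippVol": "hippocampus_volume",
}

def harmonize(records):
    """Rename each record's local variable names to the shared concept labels."""
    return [{concept_map[name]: value for name, value in rec.items()} for rec in records]

def fit_linear(X, y, ridge_alpha=0.0):
    """Least-squares fit with an intercept.

    ridge_alpha > 0 adds an L2 penalty on the slopes; the intercept is not penalized.
    Returns [intercept, slope_1, ..., slope_p].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    penalty = ridge_alpha * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(design.T @ design + penalty, design.T @ y)

# Pool both studies under the shared concept labels.
pooled = harmonize(study_a) + harmonize(study_b)
ages = [rec["age"] for rec in pooled]
volumes = [rec["hippocampus_volume"] for rec in pooled]

coef = fit_linear(ages, volumes)                           # ordinary least squares
ridge_coef = fit_linear(ages, volumes, ridge_alpha=100.0)  # slope shrunk toward zero
print("OLS:", coef)
print("Ridge:", ridge_coef)
```

In this toy data the volumes lie exactly on a line, so the unregularized fit recovers that line, while the ridge fit returns a slope of smaller magnitude.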
Keyphrases
  • electronic health record
  • big data
  • machine learning
  • healthcare
  • computed tomography
  • magnetic resonance imaging
  • risk assessment
  • data analysis
  • high throughput
  • wastewater treatment
  • human health