nf-core/airrflow: an adaptive immune receptor repertoire analysis workflow employing the Immcantation framework.
Gisela Gabernet, Susanna Marquez, Robert Bjornson, Alexander Peltzer, Hailong Meng, Edel Aron, Noah Yann Lee, Cole Jensen, David Ladd, Friederike Hanssen, Simon Heumos, Gur Yaari, Markus C Kowarik, Sven Nahnsen, Steven H Kleinstein
Published in: bioRxiv : the preprint server for biology (2024)
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability, or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow that integrates the Immcantation Framework and follows best practices for BCR and TCR sequencing data analysis. The Immcantation Framework is a comprehensive toolset that allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community-contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19-infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets. nf-core/airrflow is available free of charge under the MIT license on GitHub (https://github.com/nf-core/airrflow).
Detailed documentation and example results are available on the nf-core website (https://nf-co.re/airrflow).
Keyphrases
- single cell
- high throughput
- sars cov
- data analysis
- coronavirus disease