Synthetic Data Resource and Benchmarks for Time Cell Analysis and Detection Algorithms.
Kambadur G Ananthamurthy, Upinder Singh Bhalla
Published in: eNeuro (2023)
Hippocampal CA1 cells take part in reliable, time-locked activity sequences in tasks that involve an association between temporally separated stimuli, in a manner that tiles the interval between the stimuli. Such cells have been termed time cells. Here we adopt a first-principles approach to comparing diverse analysis and detection algorithms for identifying time cells. We generated synthetic activity datasets, using calcium signals recorded in vivo from the mouse hippocampus with two-photon imaging as template response waveforms. We assigned known, ground-truth values to perturbations applied to perfect activity signals, including noise, calcium event width, timing imprecision, hit-trial ratio, and background (untuned) activity. We tested a range of published and new algorithms and their variants on this dataset. We find that most algorithms correctly classify over 80% of cells, but have different balances between true and false positives, and different sensitivity to the five categories of perturbation. Reassuringly, most methods are reasonably robust to perturbations, including background activity, and show good concordance in classification of time cells. The same algorithms were also used to analyse and identify time cells in experimental physiology datasets recorded in vivo, and most show good concordance.
Significance Statement
Numerous approaches have been developed to analyze time cells and neuronal activity sequences, but it is not clear if their classifications match, nor how sensitive they are to various sources of data variability.
We provide two main contributions to address this: 1) a resource to generate ground-truth-labelled synthetic two-photon (2-P) calcium activity data with defined distributions for confounds such as noise and background activity, and 2) a survey of several methods for analyzing time-cell data using our synthetic data as ground truth. As a further resource, we provide a library of efficient C++ implementations of several algorithms with a Python interface. The synthetic dataset and its generation code are useful for profiling future methods, testing analysis toolchains, and as input to computational and experimental models of sequence detection.
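To make the five perturbation categories concrete, the following is a minimal sketch of how one synthetic cell's trial-by-frame activity might be built. It is not the authors' generation code: the function name `make_synthetic_cell`, all parameter names and default values are hypothetical, and a Gaussian bump stands in for the recorded calcium-event templates that the paper uses.

```python
import numpy as np


def make_synthetic_cell(n_trials=60, n_frames=246, peak_frame=100.0,
                        event_width=10.0, timing_jitter=5.0,
                        hit_trial_ratio=0.7, noise_sd=0.1,
                        background_rate=0.005, is_time_cell=True, rng=None):
    """Return (n_trials, n_frames) synthetic activity for one cell.

    Hypothetical illustration only. Each perturbation named in the
    abstract maps to one parameter:
      noise_sd         -> additive Gaussian noise
      event_width      -> width of each calcium-like event (frames)
      timing_jitter    -> trial-to-trial imprecision of the tuned event
      hit_trial_ratio  -> fraction of trials with a tuned event
      background_rate  -> untuned events per frame
    """
    rng = np.random.default_rng() if rng is None else rng
    frames = np.arange(n_frames)
    data = np.zeros((n_trials, n_frames))

    def event(center):
        # Assumption: Gaussian bump instead of a recorded template waveform.
        return np.exp(-0.5 * ((frames - center) / event_width) ** 2)

    for trial in range(n_trials):
        # Background (untuned) events at uniformly random times.
        n_background = rng.poisson(background_rate * n_frames)
        for center in rng.uniform(0, n_frames, size=n_background):
            data[trial] += event(center)

        # Tuned event, present only on "hit" trials of a time cell.
        if is_time_cell and rng.random() < hit_trial_ratio:
            data[trial] += event(peak_frame + rng.normal(0.0, timing_jitter))

    # Additive measurement noise on every frame.
    data += rng.normal(0.0, noise_sd, size=data.shape)
    return data


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    time_cell = make_synthetic_cell(is_time_cell=True, rng=rng)
    other_cell = make_synthetic_cell(is_time_cell=False, rng=rng)
    # The ground-truth label is known by construction, so any detection
    # algorithm's output can be scored against it.
    print(time_cell.shape, other_cell.shape)
    print("trial-averaged peak frame:", int(time_cell.mean(axis=0).argmax()))
```

Because every cell carries a known label and known perturbation values, sweeping one parameter at a time (for example `noise_sd`) while holding the others fixed gives a direct way to measure how a detection method's true and false positive rates change with that perturbation.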