CAVE: Connectome Annotation Versioning Engine.
Sven Dorkenwald, Casey M Schneider-Mizell, Derrick Brittain, Akhilesh Halageri, Chris Jordan, Nico Kemnitz, Manual A Castro, William M Silversmith, Jeremy Maitin-Shephard, Jakob Troidl, Hanspeter Pfister, Valentin Gillet, Daniel Xenes, J Alexander Bae, Agnes L Bodor, JoAnn Buchanan, Daniel J Bumbarger, Leila Elabbady, Zhen Jia, Daniel Kapner, Sam Kinn, Kisuk Lee, Kai Li, Ran Lu, Thomas Macrina, Gayathri Mahalingam, Eric Mitchell, Shanka Subhra Mondal, Shang Mu, Barak Nehoran, Sergiy Popovych, Marc M Takeno, Russel M Torres, Nicholas L Turner, William Wong, Jingpeng Wu, Wenjing Yin, Szi-Chieh Yu, R Clay Reid, Nuno Maçarico da Costa, H Sebastian Seung, Forrest C Collman
Published in: bioRxiv: the preprint server for biology (2023)
Advances in electron microscopy, image segmentation, and computational infrastructure have given rise to large-scale, richly annotated connectomic datasets that are increasingly shared across communities. To enable collaboration, users need to be able to concurrently create new annotations and correct errors in the automated segmentation by proofreading. In large datasets, every proofreading edit relabels the cell identities of millions of voxels and of thousands of annotations such as synapses. For analysis, users require immediate and reproducible access to this constantly changing and expanding data landscape. Here, we present the Connectome Annotation Versioning Engine (CAVE), a computational infrastructure for immediate and reproducible connectome analysis in up to petascale datasets (∼1 mm³) while proofreading and annotation are ongoing. For segmentation, CAVE provides a distributed proofreading infrastructure for continuous versioning of large reconstructions. Annotations in CAVE are defined by locations so that they can be quickly assigned to the underlying segment, which enables fast analysis queries of CAVE's data at arbitrary time points. CAVE supports schematized, extensible annotations, so that researchers can readily design novel annotation types. CAVE is already used for many connectomics datasets, including the largest datasets available to date.
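
The idea of binding annotations to locations rather than to segment IDs can be illustrated with a small toy model. The sketch below is not CAVE's implementation or API. The class `ToySegmentation`, the `Synapse` record, the function `synapse_table_at`, the integer timestamps, and the dictionary that maps voxel coordinates to supervoxels are all hypothetical names and simplifications introduced here for illustration. It assumes only what the abstract states: an annotation stores points in space, and the segment under each point is looked up for a requested time.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ToySegmentation:
    """Toy versioned segmentation: each supervoxel keeps a time-ordered
    history of the segment ID it belonged to (hypothetical structure)."""
    history: dict = field(default_factory=dict)  # supervoxel -> [(time, segment)]

    def record(self, supervoxel, timestamp, segment_id):
        # Assumes edits for a supervoxel arrive in increasing time order.
        self.history.setdefault(supervoxel, []).append((timestamp, segment_id))

    def segment_at(self, supervoxel, timestamp):
        entries = self.history[supervoxel]
        times = [t for t, _ in entries]
        # Index of the last entry recorded at or before `timestamp`.
        idx = bisect_right(times, timestamp) - 1
        if idx < 0:
            raise KeyError(f"supervoxel {supervoxel} has no segment at {timestamp}")
        return entries[idx][1]


@dataclass(frozen=True)
class Synapse:
    """Annotation defined purely by locations, never by segment IDs."""
    pre_point: tuple
    post_point: tuple


def synapse_table_at(synapses, voxel_to_supervoxel, seg, timestamp):
    """Resolve every synapse to a (pre_segment, post_segment) pair at `timestamp`."""
    rows = []
    for syn in synapses:
        pre_sv = voxel_to_supervoxel[syn.pre_point]
        post_sv = voxel_to_supervoxel[syn.post_point]
        rows.append((seg.segment_at(pre_sv, timestamp),
                     seg.segment_at(post_sv, timestamp)))
    return rows


# Three supervoxels; at time 0, supervoxels 1 and 2 belong to segment 100.
seg = ToySegmentation()
seg.record(1, 0, 100)
seg.record(2, 0, 100)
seg.record(3, 0, 200)
# A proofreading split at time 5 gives each half a new segment ID.
seg.record(1, 5, 102)
seg.record(2, 5, 101)

voxel_to_supervoxel = {(0, 0, 0): 2, (9, 9, 9): 3}
synapses = [Synapse(pre_point=(0, 0, 0), post_point=(9, 9, 9))]

rows_before = synapse_table_at(synapses, voxel_to_supervoxel, seg, timestamp=3)
rows_after = synapse_table_at(synapses, voxel_to_supervoxel, seg, timestamp=7)
print(rows_before)  # [(100, 200)]
print(rows_after)   # [(101, 200)]
```

In this toy model the synapse record itself never changes when the split is applied. Only the segmentation history grows, so the same query can be repeated for any earlier or later time, which is the reproducibility property the abstract describes.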