Scipion3: A workflow engine for cryo-electron microscopy image processing and structural biology.
Pablo Conesa, Yunior C Fonseca, Jorge Jiménez de la Morena, Grigory Sharov, Jose Miguel de la Rosa-Trevín, Ana Cuervo, Alberto García Mena, Borja Rodríguez de Francisco, Daniel Del Hoyo, David Herreros, Daniel Marchan, David Strelak, Estrella Fernández-Giménez, Erney Ramírez-Aportela, Federico Pedro de Isidro-Gómez, Irene Sánchez, James Krieger, José Luis Vilas, Laura Del Cano, Marcos Gragera, Mikel Iceta, Marta Martínez, Patricia Losana, Roberto Melero, Roberto Marabini, José María Carazo, Carlos Oscar Sánchez Sorzano
Published in: Biological imaging (2023)
Image-processing pipelines require the design of complex workflows combining many different steps that bring the raw acquired data to a final result with biological meaning. In the image-processing domain of cryo-electron microscopy single-particle analysis (cryo-EM SPA), hundreds of steps must be performed to obtain the three-dimensional structure of a biological macromolecule by integrating data spread over thousands of micrographs containing millions of copies of allegedly the same macromolecule. Executing such complicated workflows demands a dedicated tool that keeps track of every step performed. Additionally, because of the extremely low signal-to-noise ratio (SNR), the estimation of any image parameter is heavily affected by noise, resulting in a significant fraction of incorrect estimates. Although the combination of low SNR and the processing of millions of images through hundreds of sequential steps, with substantial computational requirements, is specific to cryo-EM, these characteristics may be shared by other biological imaging domains. Here, we present Scipion, a generic open-source workflow engine written in Python and specifically adapted for image processing. Its main characteristics are: (a) interoperability, (b) a smart object model, (c) gluing operations, (d) comparison operations, (e) a wide set of domain-specific operations, (f) execution in streaming, (g) smooth integration in high-performance computing environments, (h) execution with and without graphical capabilities, (i) flexible visualization, (j) user authentication and private access to private data, (k) scripting capabilities, (l) high performance, (m) traceability, (n) reproducibility, (o) self-reporting, (p) reusability, (q) extensibility, (r) software updates, and (s) non-restrictive software licensing.
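To make the idea of traceability concrete, the following is a minimal Python sketch of a workflow runner that records every step it executes. It is not Scipion's API: `ToyWorkflow`, `StepRecord`, `threshold`, and `mean` are hypothetical names invented for this illustration, and the quality scores are made-up values.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StepRecord:
    """Provenance entry for one executed step (hypothetical structure)."""
    name: str
    params: Dict[str, Any]
    input_summary: str
    output_summary: str


@dataclass
class ToyWorkflow:
    """Runs steps one at a time and logs what each one did."""
    history: List[StepRecord] = field(default_factory=list)

    def run(self, name: str, func: Callable[..., Any], data: Any, **params: Any) -> Any:
        # Execute the step, then store its name, parameters, input, and output.
        result = func(data, **params)
        self.history.append(
            StepRecord(name, dict(params), repr(data), repr(result))
        )
        return result


def threshold(values: List[float], cutoff: float) -> List[float]:
    """Keep only the values at or above the cutoff."""
    return [v for v in values if v >= cutoff]


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Made-up quality scores standing in for per-image estimates.
wf = ToyWorkflow()
scores = [0.2, 0.9, 0.4, 0.8]
kept = wf.run("threshold", threshold, scores, cutoff=0.5)
avg = wf.run("mean", mean, kept)

# The history lists each step with its parameters, in execution order.
for record in wf.history:
    print(record.name, record.params, record.input_summary, "->", record.output_summary)
```

Because each record keeps the parameters alongside the inputs and outputs, the same sequence can be replayed or audited later, which is the sense in which traceability supports reproducibility.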