An acquisition, curation and management workflow for sustainable, terabyte-scale marine image analysis.
Timm Schoening, Kevin Köser, Jens Greinert. Published in: Scientific Data (2018)
Optical imaging is a common technique in ocean research. Diving robots, towed cameras, drop-cameras and TV-guided sampling gear all produce image data of the underwater environment. Technological advances such as 4K cameras, autonomous robots, high-capacity batteries and LED lighting now allow systematic optical monitoring at larger spatial scales and in shorter time, but with increased data volume and velocity. Volume and velocity are further increased by growing fleets and emerging swarms of autonomous vehicles that create big data sets in parallel. This generates a need for automated data processing to harvest the maximum information. Systematic data analysis benefits from calibrated, geo-referenced data with a clear metadata description, particularly for machine vision and machine learning. Hence, the expensive data acquisition must be documented, and the data should be curated as soon as possible, backed up and made publicly available. Here, we present a workflow towards sustainable marine image analysis. We describe guidelines for data acquisition, curation and management, and apply them to the use case of a multi-terabyte deep-sea data set acquired by an autonomous underwater vehicle.
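To make the notion of a calibrated, geo-referenced image with a clear metadata description more concrete, the following is a minimal sketch of what a per-image metadata record could look like. All field names (e.g. `latitude_deg`, `camera_id`) and the validation logic are illustrative assumptions, not the schema defined in the paper.

```python
# Minimal sketch of a geo-referenced image metadata record.
# NOTE: field names, example values and checks are illustrative
# assumptions, not the metadata schema defined in the paper.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass
class ImageRecord:
    file_name: str                  # relative path within the data set
    acquisition_time_utc: datetime  # time stamp of the exposure
    latitude_deg: float             # vehicle position when the image was taken
    longitude_deg: float
    depth_m: float                  # depth below the sea surface
    altitude_m: float               # camera altitude above the seafloor
    camera_id: str                  # links the image to a calibration record

    def validate(self) -> None:
        """Basic sanity checks before the record enters the curated archive."""
        if not -90.0 <= self.latitude_deg <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude_deg <= 180.0:
            raise ValueError("longitude out of range")
        if self.depth_m < 0 or self.altitude_m < 0:
            raise ValueError("depth and altitude must be non-negative")

    def to_json(self) -> str:
        """Serialize the record for storage alongside the image file."""
        d = asdict(self)
        d["acquisition_time_utc"] = self.acquisition_time_utc.isoformat()
        return json.dumps(d)


# Example usage with placeholder values
record = ImageRecord(
    file_name="dive_001/img_000123.jpg",
    acquisition_time_utc=datetime(2016, 4, 12, 10, 30, 5, tzinfo=timezone.utc),
    latitude_deg=11.85,
    longitude_deg=-117.03,
    depth_m=4150.0,
    altitude_m=2.1,
    camera_id="auv-cam-01",
)
record.validate()
print(record.to_json())
```

A record of this kind, stored next to each image and validated at ingest time, is one simple way to keep acquisition, curation and later machine-learning analysis consistent across a terabyte-scale collection.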