Zero-shot prompt-based video encoder for surgical gesture recognition.
Mingxing Rao, Yinhong Qin, Soheil Kolouri, Jie Ying Wu, Daniel Moyer
Published in: International journal of computer assisted radiology and surgery (2024)
Bridge-prompt and similar pre-trained, prompt-tuned video encoder models provide strong visual representations for surgical robotics, especially for gesture recognition tasks. Because surgical tasks (gestures) are so diverse, the ability of these models to transfer zero-shot, without any task-specific (gesture-specific) retraining, makes them invaluable.