Advancing Surgical Intelligence through Multi-Modal Representation Learning
Date: 2025/04/21
Academic Seminar: Advancing Surgical Intelligence through Multi-Modal Representation Learning
Speaker: Kun Yuan, Ph.D. student at the Technical University of Munich
Time: 1:30 p.m., April 21st, 2025 (Beijing Time)
Online:
Abstract
Understanding surgical scenes across multiple modalities is essential for building context-aware, intelligent surgical systems. This talk focuses on integrating visual information from both laparoscopic and external cameras, which capture the internal operative field and the surrounding operating room (OR) environment, to enable holistic perception and reasoning in surgery. Kun Yuan will present recent advances in surgical multi-modal representation learning that leverage surgical foundation models trained on large-scale video-text data. Emphasis will be placed on knowledge-guided adaptation strategies, cross-view alignment, and the challenges of sparse supervision. Applications include surgical phase recognition, team interaction analysis, and enhanced decision support, with an outlook on building generalizable and trustworthy AI tools for the OR.
Biography
Kun Yuan is a senior Ph.D. student jointly affiliated with the University of Strasbourg, France, and the Technical University of Munich, Germany, supervised by Prof. Nicolas Padoy and Prof. Nassir Navab. His research focuses on developing multi-modal learning methods for surgical video analysis, with applications in surgical workflow understanding and intelligent operating rooms. His work centers on cross-modal representation learning from laparoscopic and external OR video, contributing to the next generation of context-aware surgical AI systems.