This paper introduces a novel user-centered approach for generating confidence maps in ultrasound imaging. Existing methods, relying on simplified models, often fail to account for the full range of ultrasound artifacts and are limited by arbitrary boundary conditions, making frame-to-frame comparisons challenging. Our approach integrates sparse binary annotations into a physics-inspired probabilistic graphical model that estimates the likelihood of confidence maps. We then train convolutional neural networks to predict the most likely confidence map. The resulting approach is fast, copes with a wide range of artifacts, is temporally stable, and allows users to directly influence the algorithm’s behavior through annotations. We demonstrate our method’s ability to handle a variety of challenging artifacts and evaluate it quantitatively on two downstream tasks, bone shadow segmentation and multi-modal image registration, achieving superior performance to the state of the art. We make our training code publicly available.
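Below is a minimal, hypothetical PyTorch sketch of the general idea: a small CNN predicts a per-pixel confidence map and is trained from sparse binary annotations, with a simple depth-monotonicity penalty standing in for the physics-inspired graphical model. The toy architecture, variable names, and loss weighting are illustrative assumptions, not the paper’s implementation.

```python
# Sketch only: train a CNN to predict a confidence map from sparse 0/1
# annotations plus a hand-crafted prior (assumed stand-in for the paper's
# physics-inspired probabilistic graphical model).
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(                       # toy encoder-decoder; sigmoid maps to [0, 1] confidence
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
)

image = torch.randn(4, 1, 128, 128)                       # batch of B-mode frames (synthetic here)
labels = torch.randint(0, 2, (4, 1, 128, 128)).float()    # sparse binary annotations...
mask = (torch.rand(4, 1, 128, 128) < 0.01).float()        # ...only valid where mask == 1

conf = net(image)

# Supervised term: binary cross-entropy evaluated only at annotated pixels.
bce = F.binary_cross_entropy(conf, labels, reduction="none")
sup_loss = (bce * mask).sum() / mask.sum().clamp(min=1)

# Prior term (assumption): penalize confidence that increases with depth,
# since confidence should generally decay along the beam direction.
depth_increase = (conf[:, :, 1:, :] - conf[:, :, :-1, :]).clamp(min=0)
prior_loss = depth_increase.mean()

loss = sup_loss + 0.1 * prior_loss          # weight 0.1 is an arbitrary illustrative choice
loss.backward()
print(float(loss))
```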
DualTrack: Sensorless 3D Ultrasound needs Local and Global Context
Paul Wilson, Matteo Ronchetti, Rüdiger Göbl, Viktoria Markova, Sebastian Rosenzweig, Raphael Prevost, Parvin Mousavi, and Oliver Zettinig
In MICCAI 2025 - International Workshop on Advances in Simplifying Medical UltraSound (ASMUS), Sep 2025
Three-dimensional ultrasound (US) offers many clinical advantages over conventional 2D imaging, yet its widespread adoption is limited by the cost and complexity of traditional 3D systems. Sensorless 3D US, which uses deep learning to estimate the 3D probe trajectory from a sequence of 2D US images, is a promising alternative. Local features such as speckle patterns help predict frame-to-frame motion, while global features, such as coarse shapes and anatomical structures, situate the scan relative to the anatomy and help predict its overall shape. In prior approaches, global features are either ignored or tightly coupled with local feature extraction, restricting the ability to robustly model these two complementary aspects. We propose DualTrack, a novel dual-encoder architecture with decoupled local and global encoders, each specializing in its own scale of feature extraction. The local encoder uses dense spatiotemporal convolutions to capture fine-grained features, while the global encoder uses an image backbone, such as a 2D CNN or a foundation model, together with temporal attention layers to embed high-level anatomical features and long-range dependencies. A lightweight fusion module then combines these features to estimate the trajectory. Experimental results on a large public benchmark show that DualTrack achieves state-of-the-art accuracy and globally consistent 3D reconstructions, outperforming previous methods and yielding an average reconstruction error below 5 mm.
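As a rough illustration of the dual-encoder layout, here is a hypothetical PyTorch sketch: a 3D-convolutional local encoder, a per-frame 2D backbone with temporal self-attention as the global encoder, and a small fusion head regressing per-frame 6-DoF motion. Module names, dimensions, and the fusion strategy are assumptions for exposition, not the authors’ implementation.

```python
# Sketch of the dual-encoder idea described in the abstract (illustrative only).
import torch
import torch.nn as nn


class LocalEncoder(nn.Module):
    """Dense spatiotemporal convolutions over the sweep (fine, speckle-level features)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, dim, kernel_size=3, padding=1),   # input: (B, 1, T, H, W)
            nn.ReLU(),
            nn.Conv3d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),            # pool space, keep time axis
        )

    def forward(self, x):                                  # x: (B, 1, T, H, W)
        return self.net(x).flatten(2).transpose(1, 2)      # (B, T, dim)


class GlobalEncoder(nn.Module):
    """Per-frame 2D backbone plus temporal self-attention (coarse anatomical context)."""
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(                     # toy stand-in for a 2D CNN / foundation model
            nn.Conv2d(1, dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):                                  # x: (B, 1, T, H, W)
        b, _, t, h, w = x.shape
        frames = x.transpose(1, 2).reshape(b * t, 1, h, w)  # fold time into the batch
        feats = self.backbone(frames).reshape(b, t, -1)     # (B, T, dim)
        return self.temporal(feats)                         # long-range temporal mixing


class DualTrackSketch(nn.Module):
    """Concatenate both feature streams and regress per-frame 6-DoF motion."""
    def __init__(self, dim=64):
        super().__init__()
        self.local = LocalEncoder(dim)
        self.global_ = GlobalEncoder(dim)
        self.fusion = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 6))

    def forward(self, x):
        fused = torch.cat([self.local(x), self.global_(x)], dim=-1)
        return self.fusion(fused)                           # (B, T, 6): tx, ty, tz, rx, ry, rz


seq = torch.randn(2, 1, 16, 128, 128)                       # two 16-frame sweeps
print(DualTrackSketch()(seq).shape)                         # torch.Size([2, 16, 6])
```

The concatenation-plus-MLP fusion here is the simplest possible choice; the paper describes a lightweight fusion module without this sketch committing to its exact form.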
Update: The ASMUS paper was the Best Paper Award runner-up! :-)