CS-UNet

A DPLR-like 3D Segmentation Model for Multimodal Brain Images

Jonathan Cui 1, Suman Saha 1,
David A. Araujo 2, & Md Faisal Kabir 2
1 College of Engineering
The Pennsylvania State University
University Park, PA 16802, USA
{jpc6988, szs339}@psu.edu
2 School of Science, Engineering, and Technology
The Pennsylvania State University, Harrisburg
Middletown, PA 17057, USA
{daa5724, mpk5904}@psu.edu

Introduction

Recent research [1, 2] in brain image segmentation has made significant progress in coupling local, data-efficient convolution operations with global, expressive spatial-mixing layers such as Transformers [3] and Mamba [4]. Enhanced by the latest general-domain backbone architectures, these models overcome the locality of convolutions and effectively capture long-range spatial dependencies. However, they lack an inherent structure that attends to cross-scale visual information beyond what the backbone already encodes, forfeiting powerful inductive biases that are particularly important and relevant to medical segmentation.

Theoretical Work

Theoretically, we explore and generalize diagonal plus low-rank (DPLR) linear maps [5, 6] to a larger class of parametrized transformations and interpret ResNet-like architectures [7] through this lens.
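
For reference, the DPLR form studied in [5, 6] writes a dense linear map as a diagonal matrix plus a low-rank correction. The statement below is a minimal sketch of that standard form and of the residual-connection reading it suggests; the rank r and the factors P and Q follow the usual formulation and are not the exact parametrization adopted in CS-UNet.

    A = \Lambda + P Q^{*}, \qquad
    \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N), \qquad
    P, Q \in \mathbb{C}^{N \times r}, \quad r \ll N.

Under this reading, a ResNet-style block y = x + F(x) can be viewed as an identity (trivially diagonal) term plus a learned, structured correction F(x), which is one concrete instance of the interpretation described above.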

Empirical Observations

Empirically, we propose CS-UNet, a UNet-like [8] 3D medical segmentation model that transforms latent features with strong visual priors, and we evaluate its throughput and segmentation performance on BraTS 2023 [9].
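
To make the input and output conventions concrete, the following is a minimal, self-contained sketch of a generic UNet-like [8] 3D encoder-decoder on BraTS-shaped tensors (4 MRI modalities in, 3 tumor sub-region channels out). It is written in PyTorch purely for illustration; the class TinyUNet3D, its widths, and its blocks are hypothetical placeholders and do not reproduce the CS-UNet architecture or its DPLR-inspired feature transforms.

# Minimal UNet-like 3D encoder-decoder sketch for BraTS-shaped inputs.
# Illustrative skeleton only; not the CS-UNet architecture.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # Two 3x3x3 convolutions with instance normalization, as is common in 3D UNets.
    return nn.Sequential(
        nn.Conv3d(cin, cout, 3, padding=1), nn.InstanceNorm3d(cout), nn.GELU(),
        nn.Conv3d(cout, cout, 3, padding=1), nn.InstanceNorm3d(cout), nn.GELU(),
    )

class TinyUNet3D(nn.Module):
    def __init__(self, in_ch=4, out_ch=3, width=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, width)       # full-resolution encoder stage
        self.enc2 = conv_block(width, 2 * width)   # half-resolution bottleneck
        self.down = nn.MaxPool3d(2)
        self.up = nn.ConvTranspose3d(2 * width, width, 2, stride=2)
        self.dec1 = conv_block(2 * width, width)   # operates on skip + upsampled features
        self.head = nn.Conv3d(width, out_ch, 1)    # per-voxel classification head

    def forward(self, x):
        s1 = self.enc1(x)                          # skip-connection features
        s2 = self.enc2(self.down(s1))
        d1 = self.dec1(torch.cat([self.up(s2), s1], dim=1))
        return self.head(d1)                       # per-voxel logits

if __name__ == "__main__":
    model = TinyUNet3D()
    x = torch.randn(1, 4, 64, 64, 64)              # (batch, modalities, D, H, W)
    print(model(x).shape)                          # torch.Size([1, 3, 64, 64, 64])

A forward pass on a (1, 4, 64, 64, 64) volume produces (1, 3, 64, 64, 64) logits, matching the multi-channel voxel-wise targets used in BraTS-style evaluation.
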
References
[1]
Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H. R., & Xu, D. (2021). Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. In International MICCAI Brainlesion Workshop (pp. 272-284). Cham: Springer International Publishing. 
[2]
Liu, J., Yang, H., Zhou, H. Y., Xi, Y., Yu, L., Yu, Y., ... & Wang, S. (2024). Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining. arXiv preprint arXiv:2402.03302.
[3]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
[4]
Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.
[5]
Zhao, Y., Li, J., Kumar, K., & Gong, Y. (2017). Extended low-rank plus diagonal adaptation for deep and recurrent neural networks. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5040-5044). IEEE. 
[6]
Gu, A., Goel, K., & Ré, C. (2021). Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396.
[7]
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778). 
[8]
Falk, T., Mai, D., Bensch, R., Çiçek, Ö., Abdulkadir, A., Marrakchi, Y., ... & Ronneberger, O. (2019). U-Net: deep learning for cell counting, detection, and morphometry. Nature Methods, 16(1), 67-70.
[9]
Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J. S., ... & Davatzikos, C. (2017). Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data, 4(1), 1-13.
[10]
Rogozhnikov, A. (2021). Einops: Clear and reliable tensor manipulations with Einstein-like notation. In International Conference on Learning Representations.
[11]
Xing, Z., Ye, T., Yang, Y., Liu, G., & Zhu, L. (2024). SegMamba: Long-range sequential modeling Mamba for 3D medical image segmentation. arXiv preprint arXiv:2401.13560.
[12]
Lee, H. H., Bao, S., Huo, Y., & Landman, B. A. (2022). 3D UX-Net: A large kernel volumetric convnet modernizing hierarchical transformer for medical image segmentation. arXiv preprint arXiv:2209.15076.
Copyright © Authors 2024