Medical 3D computer vision: research advances and challenges
Abstract
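The diagnosis of medical images is the basis of many clinical decisions, and the intelligent analysis of medical images is an important part of medical artificial intelligence. Meanwhile, 3D computer vision is becoming increasingly important as more and more 3D spatial sensors emerge and become widespread. This paper focuses on the intersection of medical image analysis and 3D computer vision, namely medical 3D computer vision (medical 3D vision). We divide medical 3D computer vision systems into three levels (tasks, data, and representation) and present the research progress at each level based on recent literature. At the task level, we introduce classification, segmentation, detection, registration, and imaging reconstruction in medical 3D computer vision, together with the roles and characteristics of these tasks in clinical diagnosis and medical image analysis. At the data level, we briefly introduce the most important modalities of medical 3D data, including computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), as well as other data formats proposed in emerging research. On this basis, we compile the important research datasets in medical 3D computer vision and annotate their data modalities and main vision tasks. At the representation level, we introduce and discuss the advantages and disadvantages of 2D networks, 3D networks, and hybrid networks for representation learning on medical 3D data. In addition, because small data is a pervasive problem in medical imaging, we focus on pretraining for representation learning on medical 3D data. Finally, we summarize the current state of research in medical 3D computer vision and point out the open research challenges, problems, and directions.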
Advances and challenges in medical 3D computer vision
Yang Jiancheng, Ni Bingbing (Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)
Abstract
Medical imaging is an important tool for medical diagnosis and clinical decision support that enables clinicians to view the interior of the human body. Medical image analysis, an important part of healthcare artificial intelligence, provides fast, smart, and accurate decision support for clinicians and radiologists. Meanwhile, 3D computer vision has become an emerging research area with the rapid development and popularization of 3D sensors (e.g., light detection and ranging (LIDAR) and RGB-D cameras) and of computer-aided design in the game industry and smart manufacturing. This paper focuses on the intersection of medical image analysis and 3D computer vision, which we call medical 3D computer vision. We introduce the research advances and challenges in medical 3D computer vision at three levels: tasks (medical 3D computer vision tasks), data (data modalities and datasets), and representation (efficient and effective representation learning for 3D images). First, at the task level, we introduce classification, segmentation, detection, registration, and reconstruction in medical 3D computer vision. Classification, such as malignancy stratification and symptom estimation, is an everyday task for clinicians and radiologists. Segmentation assigns each voxel (pixel) a semantic label. Detection localizes key objects in medical images. Both segmentation and detection cover organs and lesions. Registration, that is, calculating the spatial transformation from one image to another, plays an important role in medical imaging scenarios such as spatially aligning multiple images from serial examinations of a follow-up patient. Reconstruction is also a key task in medical imaging; it aims at fast and accurate imaging to reduce patients' costs.
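To make the voxel-labeling definition of segmentation concrete, here is a minimal NumPy sketch, assuming a network has already produced a score for every class at every voxel. The names `voxel_labels` and `dice` are illustrative, and the Dice coefficient is added only as a commonly used overlap metric; the abstract itself does not prescribe it.

```python
import numpy as np

def voxel_labels(scores: np.ndarray) -> np.ndarray:
    """Assign each voxel the class with the highest score.

    scores: shape (C, D, H, W), per-class scores (e.g. softmax outputs).
    Returns an integer label map of shape (D, H, W).
    """
    return np.argmax(scores, axis=0)

def dice(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice overlap 2|P & T| / (|P| + |T|) for a single class label."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(p, t).sum() / denom)

# Toy two-class volume: class 1 wins inside a 2x2x2 cube, class 0 elsewhere.
scores = np.zeros((2, 4, 4, 4))
scores[0] = 0.6
scores[1, 1:3, 1:3, 1:3] = 0.9
pred = voxel_labels(scores)

# Hypothetical ground truth: the lesion extends one voxel further along x.
target = np.zeros((4, 4, 4), dtype=int)
target[1:3, 1:3, 1:4] = 1
print(int(pred.sum()), dice(pred, target, label=1))
```

Here 8 predicted voxels overlap a 12-voxel ground truth, so Dice is 2·8/(8+12) = 0.8.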
Second, at the data level, we introduce the important data modalities in medical 3D computer vision, such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), and briefly discuss the principle and clinical scenarios of each imaging modality. We then compile a comprehensive list of medical 3D image research datasets that cover classification, segmentation, detection, registration, and reconstruction tasks in CT, MRI, and graphics (mesh) formats. Third, we discuss representation learning for medical 3D computer vision. 2D convolutional neural networks, 3D convolutional neural networks, and hybrid approaches are the commonly used methods for 3D representation learning. 2D approaches, which represent 3D medical images with triplanar or trislice 2D inputs, can benefit from large-scale 2D pretraining, but they are generally weak at capturing large 3D contexts. 3D approaches natively capture 3D context; however, few publicly available 3D medical datasets are large and diverse enough for universal 3D pretraining. For hybrid (2D + 3D) approaches, we introduce multistream and multistage designs. Although these are empirically effective, they still inherit the intrinsic disadvantages of their 2D and 3D parts. To address the small-data issue in medical 3D computer vision, we discuss pretraining approaches for medical 3D images. Pretraining 3D convolutional neural networks (CNNs) on videos is straightforward to implement, but there is a significant domain gap between medical images and videos. Aggregating many medical datasets for pretraining is feasible in principle; however, tens of medical datasets together yield only thousands of 3D cases, far fewer than natural 2D image datasets contain. Several studies have explored unsupervised (self-supervised) learning to obtain pretrained 3D models.
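The triplanar and trislice 2D representations of a 3D image can be sketched as follows, assuming a volume stored as a NumPy array indexed (z, y, x), with z along the axial direction. The helper names are hypothetical. In practice, triplanar inputs are usually taken from a cubic patch so that the three views share one size; the toy volume below is deliberately non-cubic to make the resulting shapes distinguishable.

```python
import numpy as np

def triplanar(volume: np.ndarray, center):
    """Three orthogonal 2D slices through `center` = (z, y, x)."""
    z, y, x = center
    axial = volume[z, :, :]     # fixed z: (H, W)
    coronal = volume[:, y, :]   # fixed y: (D, W)
    sagittal = volume[:, :, x]  # fixed x: (D, H)
    return axial, coronal, sagittal

def trislice(volume: np.ndarray, z: int) -> np.ndarray:
    """Stack axial slices z-1, z, z+1 as a 3-channel 2D image (H, W, 3).

    Indices are clipped at the volume boundary, so edge slices repeat.
    The 3 channels mimic RGB input for a 2D network pretrained on photos.
    """
    zs = np.clip([z - 1, z, z + 1], 0, volume.shape[0] - 1)
    return np.stack([volume[i] for i in zs], axis=-1)

vol = np.arange(8 * 16 * 32).reshape(8, 16, 32)  # toy (D, H, W) volume
a, c, s = triplanar(vol, (4, 8, 16))
print(a.shape, c.shape, s.shape)
print(trislice(vol, 0).shape)
```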
Although their results are impressive, current unsupervised methods still fall short of fully supervised learning in model performance. Moreover, unsupervised representation learning on medical 3D images cannot leverage the power of massive supervised 2D datasets. We therefore introduce several techniques for 2D-to-3D transfer learning, including inflated 3D (I3D), axial-coronal-sagittal (ACS) convolutions, and AlignShift. I3D enables 2D-to-3D transfer learning by inflating 2D convolution kernels into 3D, whereas ACS convolutions and AlignShift introduce novel operators that rearrange features so that 2D (pretrained) kernels can process 3D receptive fields. Finally, we discuss several research challenges, problems, and directions for medical 3D computer vision. We first identify the anisotropy issue in medical 3D images, which can be a source of domain gap, for example, between thick- and thin-slice data. We then discuss data privacy and information silos in medical imaging, which are important factors behind the small-data issue in medical 3D computer vision. Federated learning is highlighted as a possible solution to information silos. However, many problems remain open, such as how to develop efficient systems and algorithms for federated learning, how to deal with adversarial participants, and how to handle unaligned and missing data. We also identify the data imbalance and long-tail issues in medical 3D computer vision: because real-world patient populations are imbalanced and long-tailed, learning representations efficiently and effectively from such noisy data is extremely challenging in practice. Lastly, we point to automated machine learning (AutoML) as a future direction for medical 3D computer vision. End-to-end deep learning has made the development and deployment of medical image applications feasible.
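The kernel inflation behind I3D can be sketched in NumPy as below. Following the bootstrapping idea of the original I3D work, a 2D kernel is repeated along a new depth axis and divided by the depth. As a result, a "boring" volume made of one 2D slice repeated along depth produces the same response as the 2D kernel does on that slice. Random weights stand in for pretrained 2D weights, and the single-position `response_2d`/`response_3d` helpers are hypothetical simplifications of a full convolution.

```python
import numpy as np

def inflate_kernel(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Inflate (C_out, C_in, kH, kW) into (C_out, C_in, depth, kH, kW).

    The 2D weights are copied `depth` times along the new axis and
    rescaled by 1/depth to preserve the response on repeated-slice inputs.
    """
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], depth, axis=2)
    return w3d / depth

def response_2d(patch: np.ndarray, w2d: np.ndarray) -> np.ndarray:
    """Cross-correlation at one position: patch (C_in, kH, kW) -> (C_out,)."""
    return np.einsum("ihw,oihw->o", patch, w2d)

def response_3d(patch: np.ndarray, w3d: np.ndarray) -> np.ndarray:
    """Cross-correlation at one position: patch (C_in, kD, kH, kW) -> (C_out,)."""
    return np.einsum("idhw,oidhw->o", patch, w3d)

rng = np.random.default_rng(0)
w2d = rng.normal(size=(4, 2, 3, 3))   # stand-in for pretrained 2D weights
img = rng.normal(size=(2, 3, 3))      # one 2D input patch
w3d = inflate_kernel(w2d, depth=3)

# "Boring" volume: the same 2D patch repeated 3 times along depth.
vol = np.repeat(img[:, np.newaxis], 3, axis=1)
print(w3d.shape, np.allclose(response_2d(img, w2d), response_3d(vol, w3d)))
```

The 1/depth rescaling is what keeps the inflated network's activations on the same scale as the 2D network, so the pretrained weights remain a sensible initialization.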
However, considerable engineering effort is still needed to tune a new medical image task, such as designing deep neural networks, choosing data augmentation, performing data preprocessing, and tuning the learning procedure. These choices can be treated as hyperparameters and tuned by a hand-crafted or intelligent system, reducing the effort required from researchers and engineers. In summary, medical 3D computer vision is an emerging research area. With increasingly large-scale datasets, easy-to-use and reproducible methodology, and innovative tasks, it is an exciting field that can bring healthcare to a new level.
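The idea of tuning these engineering choices with an automated system can be illustrated with plain random search, one of the simplest strategies such a system might use. The search space, the `toy_validation_score` objective, and every value below are hypothetical stand-ins. A real system would train and validate a model for each configuration, and it would typically use a smarter search strategy.

```python
import random

# Hypothetical search space over the choices named in the text:
# network design, augmentation, preprocessing, and learning procedure.
SEARCH_SPACE = {
    "backbone": ["2d", "3d", "acs"],
    "augmentation": ["none", "flip", "flip+rotate"],
    "spacing_mm": [1.0, 1.5, 2.0],      # resampling during preprocessing
    "learning_rate": [1e-4, 3e-4, 1e-3],
}

def sample_config(space: dict, rng: random.Random) -> dict:
    """Draw one value uniformly at random for every hyperparameter."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(space: dict, evaluate, n_trials: int = 20, seed: int = 0):
    """Return the best (config, score) among n_trials random samples."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(space, rng)
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_validation_score(cfg: dict) -> float:
    """Hypothetical stand-in for 'train, then measure validation accuracy'."""
    score = {"2d": 0.70, "3d": 0.74, "acs": 0.78}[cfg["backbone"]]
    score += 0.02 if "flip" in cfg["augmentation"] else 0.0
    score -= abs(cfg["learning_rate"] - 3e-4) * 10
    return score

best, score = random_search(SEARCH_SPACE, toy_validation_score, n_trials=50)
print(best, round(score, 3))
```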
Keywords
medical image analysis; 3D computer vision; deep learning; convolutional neural networks (CNN); pretraining