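Optic disc and cup segmentation by combining context and attention
Abstract
Objective Glaucoma causes irreversible damage to vision, and accurately segmenting the optic disc and optic cup from fundus images is an important task in glaucoma diagnosis and treatment. To improve the segmentation accuracy of the optic disc and optic cup, this paper proposes an optic disc and cup segmentation method that combines context and attention (context attention U-Net, CA-Net). Method A polar coordinate transformation is applied first, because segmenting in the polar coordinate system balances the data distribution. A modified pre-trained ResNet is used as the feature extraction network to strengthen feature extraction. A context aggregation module (CAM) aggregates image context information at multiple levels, and an attention guidance module (AGM) recalibrates the fused feature maps to enhance useful features. Deep supervision is used so that the weights of the shallow layers are trained as well, and prior knowledge is introduced into the optic cup segmentation network to constrain optic cup segmentation. Result Comparative experiments against other methods were conducted on three datasets. On the Drishti-GS1 dataset, the Dice coefficient (Dice) and intersection-over-union (IOU) were 0.981 4 and 0.963 5 for optic disc segmentation and 0.926 6 and 0.863 3 for optic cup segmentation. On the RIM-ONE-v3 (retinal image database for optic nerve evaluation) dataset, the Dice and IOU were 0.976 8 and 0.954 6 for optic disc segmentation and 0.864 2 and 0.760 9 for optic cup segmentation. On the Refuge dataset, the Dice and IOU were 0.975 8 and 0.952 7 for optic disc segmentation and 0.887 1 and 0.797 2 for optic cup segmentation. All of these results are better than those of the compared algorithms. Ablation experiments verified the effectiveness of each module, cross-dataset experiments further demonstrated the generalization of CA-Net, and visualized results show that CA-Net produces segmentations closer to the annotations. Conclusion Test results on the Drishti-GS1, RIM-ONE-v3, and Refuge datasets show that CA-Net achieves the best optic disc and optic cup segmentation results on all three, and cross-dataset tests further indicate that CA-Net generalizes well.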
Optic disc and cup segmentation by combining context and attention
Liu Hongpu¹,², Zhao Yihao¹,², Hou Xiangdan¹,², Guo Hongyong², Ding Mengyuan²
(1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; 2. Hebei Provincial Key Laboratory of Big Data Computing, Tianjin 300401, China)
Abstract
Objective Glaucoma can cause irreversible damage to vision, and it is often diagnosed on the basis of the cup-to-disc ratio (CDR): a CDR greater than 0.65 is considered indicative of glaucoma. Accurately segmenting the optic disc (OD) and optic cup (OC) from fundus images is therefore an important task. Traditional OD and OC segmentation methods are mainly based on deformable models, graph cuts, edge detection, and superpixel classification. These methods rely on manually extracted image features that are easily affected by illumination and contrast, so their segmentation accuracy is often low. They also require careful tuning of model parameters to improve performance, which makes them unsuitable for large-scale deployment. In recent years, with the development of deep learning, OD and OC segmentation methods based on convolutional neural networks (CNNs), which extract image features automatically, have become the main research direction and achieve better segmentation performance than traditional methods. OC segmentation is more difficult than OD segmentation because the OC boundary is not distinct. Existing CNN-based methods fall into two categories. The first is joint segmentation, in which the same network segments the OD and OC simultaneously. The second is two-stage segmentation, in which the OD is segmented first and the OC second. Previous studies have shown that joint segmentation is often less accurate than two-stage segmentation and can bias the optimization toward either the OD or the OC. However, two-stage segmentation often ignores the relationship between the OD and the OC. In this study, we improve U-Net and propose a two-stage segmentation network, context attention U-Net (CA-Net), that segments the OD and OC sequentially.
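The CDR criterion mentioned above can be made concrete with a short sketch. It assumes the common vertical CDR definition (vertical cup diameter divided by vertical disc diameter), which the abstract does not specify, and stores binary masks as lists of rows; the function names are hypothetical, and only the 0.65 threshold comes from the text.

```python
def vertical_extent(mask):
    """Height, in pixels, of the foreground in a binary mask (list of rows)."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vertical_cdr(od_mask, oc_mask):
    """Vertical cup-to-disc ratio (assumed definition): cup height / disc height."""
    disc_height = vertical_extent(od_mask)
    if disc_height == 0:
        raise ValueError("optic disc mask is empty")
    return vertical_extent(oc_mask) / disc_height


def exceeds_cdr_threshold(cdr, threshold=0.65):
    """Screening rule stated in the abstract: a CDR above 0.65 suggests glaucoma."""
    return cdr > threshold


# Toy 10 x 10 masks: the disc spans rows 1-8, the cup spans rows 3-6.
od = [[1 if 1 <= y <= 8 else 0 for x in range(10)] for y in range(10)]
oc = [[1 if 3 <= y <= 6 and 3 <= x <= 6 else 0 for x in range(10)] for y in range(10)]
cdr = vertical_cdr(od, oc)
print(cdr, exceeds_cdr_threshold(cdr))  # 0.5 False
```

Because the CDR is a ratio of OC to OD size, errors in the OC boundary feed directly into it, which is one reason the harder OC segmentation matters for diagnosis.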
Prior knowledge is introduced into the OC segmentation network to further improve OC segmentation accuracy. Method First, we locate the OD center and crop a 512×512 region of interest (ROI) around it from the whole fundus image, which removes irrelevant regions. The cropped ROI is then transformed from Cartesian to polar coordinates. Because the OC always occupies a small proportion of the image, training a deep model on Cartesian images easily leads to overfitting and bias; the polar transformation balances the proportions of disc and cup. Finally, the transformed images are fed into CA-Net to predict the OD or OC segmentation maps. Specifically, we train two segmentation networks with the same structure to segment the OD and the OC, respectively. The OC lies inside the OD, so every pixel that belongs to the OC also belongs to the OD; to segment the OC more accurately, we use this relationship between the OD and the OC as prior information. A modified pre-trained ResNet34 is used as the feature extraction network to enhance feature extraction: the initial max pooling layer, the final average pooling layer, and the fully connected layer of the original ResNet34 are removed. Compared with training the model from scratch, loading parameters pre-trained on ImageNet (the ImageNet Large-Scale Visual Recognition Challenge dataset) helps prevent overfitting. Moreover, a context aggregation module (CAM) is proposed to aggregate image context information at multiple scales, using atrous convolutions with different dilation rates to encode rich semantic information. Because fusing shallow and deep feature maps introduces a lot of irrelevant information, an attention guidance module (AGM) is proposed to recalibrate the fused feature maps and enhance the useful feature information.
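The preprocessing steps described above (cropping an ROI around the OD center, transforming it to polar coordinates, and the prior that the OC lies inside the OD) can be illustrated with a pure-Python sketch. The helper names, nearest-neighbour sampling, zero padding, and polar grid size are assumptions for illustration; the abstract specifies only the 512×512 ROI. The abstract also does not say how CA-Net injects the prior into the network, so the hypothetical `restrict_cup_to_disc` shows only the underlying constraint as a simple mask intersection, not the authors' mechanism.

```python
import math


def crop_roi(image, center, size):
    """Crop a size x size window centred on center = (cy, cx); zero-pad outside the image."""
    h, w = len(image), len(image[0])
    top, left = center[0] - size // 2, center[1] - size // 2
    return [[image[y][x] if 0 <= y < h and 0 <= x < w else 0
             for x in range(left, left + size)]
            for y in range(top, top + size)]


def to_polar(image, center, max_radius, n_radius, n_theta):
    """Resample an image onto an n_theta x n_radius polar grid around center.

    Row t is the angle 2*pi*t/n_theta; column r is the radius r*max_radius/n_radius.
    Uses nearest-neighbour sampling; samples falling outside the image are 0.
    """
    h, w = len(image), len(image[0])
    cy, cx = center
    polar = []
    for t in range(n_theta):
        theta = 2 * math.pi * t / n_theta
        row = []
        for r in range(n_radius):
            rho = r * max_radius / n_radius
            y = round(cy + rho * math.sin(theta))
            x = round(cx + rho * math.cos(theta))
            row.append(image[y][x] if 0 <= y < h and 0 <= x < w else 0)
        polar.append(row)
    return polar


def restrict_cup_to_disc(oc_mask, od_mask):
    """Enforce the prior that the OC lies inside the OD (pixel-wise AND)."""
    return [[1 if c and d else 0 for c, d in zip(oc_row, od_row)]
            for oc_row, od_row in zip(oc_mask, od_mask)]


def foreground_fraction(mask):
    """Share of pixels in a binary mask that are foreground."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


# A 64 x 64 ROI containing a cup of radius 12 at its center.
cup = [[1 if (y - 32) ** 2 + (x - 32) ** 2 <= 12 ** 2 else 0 for x in range(64)]
       for y in range(64)]
polar_cup = to_polar(cup, (32, 32), max_radius=32, n_radius=32, n_theta=90)
print(round(foreground_fraction(cup), 3), round(foreground_fraction(polar_cup), 3))
```

In this toy case the cup covers roughly 11% of the Cartesian ROI but close to 40% of the polar image, because in polar coordinates its share grows with the radius ratio rather than the area ratio. That is the class-balancing effect the text attributes to the polar transformation.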
In addition, deep supervision is used to train the weights of the shallow layers. Finally, CA-Net outputs a probability map, and the largest connected region is selected as the final segmentation result to remove noise; no other post-processing, such as ellipse fitting, is applied. Dice loss is used as the loss function to train CA-Net. We train and test CA-Net on an NVIDIA GeForce GTX 1080 Ti. Result We conducted experiments on three commonly used public datasets (Drishti-GS1, RIM-ONE-v3, and Refuge) to verify the effectiveness and generalization of CA-Net. We trained the model on each training set and report its performance on the corresponding test set. The Drishti-GS1 dataset was split into 50 training images and 51 test images, the RIM-ONE-v3 dataset was randomly split into 99 training images and 60 test images, and the Refuge dataset was randomly split into 320 training images and 80 test images. Two measures were used to evaluate the results: the Dice coefficient (Dice) and intersection-over-union (IOU). For OD segmentation, CA-Net obtains a Dice of 0.981 4 and an IOU of 0.963 5 on the Drishti-GS1 dataset and 0.976 8 and 0.954 6 on the RIM-ONE-v3 (retinal image database for optic nerve evaluation) dataset. For OC segmentation, it obtains a Dice of 0.926 6 and an IOU of 0.863 3 on Drishti-GS1 and 0.864 2 and 0.760 9 on RIM-ONE-v3. On the Refuge dataset, CA-Net achieves a Dice of 0.975 8 and an IOU of 0.952 7 for OD segmentation and a Dice of 0.887 1 and an IOU of 0.797 2 for OC segmentation, further demonstrating its effectiveness and generalization. We also trained CA-Net on the Refuge training set and evaluated it directly on the Drishti-GS1 and RIM-ONE-v3 test sets.
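The evaluation measures, the Dice loss, and the largest-connected-region post-processing mentioned above can be sketched as follows. This is a minimal pure-Python illustration on masks stored as lists of rows; the function names, the smoothing term `eps`, and the use of 4-connectivity are assumptions, since the abstract does not give these details.

```python
from collections import deque


def _flatten(mask):
    return [v for row in mask for v in row]


def dice(pred, gt):
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks."""
    p, g = _flatten(pred), _flatten(gt)
    inter = sum(1 for a, b in zip(p, g) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in g if b)
    return 1.0 if total == 0 else 2 * inter / total


def iou(pred, gt):
    """IOU = |P ∩ G| / |P ∪ G| for binary masks."""
    p, g = _flatten(pred), _flatten(gt)
    inter = sum(1 for a, b in zip(p, g) if a and b)
    union = sum(1 for a, b in zip(p, g) if a or b)
    return 1.0 if union == 0 else inter / union


def soft_dice_loss(prob, gt, eps=1e-6):
    """1 - soft Dice computed on predicted probabilities; eps (assumed) avoids 0/0."""
    p, g = _flatten(prob), _flatten(gt)
    inter = sum(a * b for a, b in zip(p, g))
    return 1 - (2 * inter + eps) / (sum(p) + sum(g) + eps)


def largest_component(mask):
    """Keep only the largest 4-connected foreground region (noise removal)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, component = deque([(sy, sx)]), []
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out


pred = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
gt = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
print(dice(pred, gt), iou(pred, gt))  # 0.8 0.6666666666666666
```

For a single pair of masks the two measures are linked by IOU = Dice / (2 - Dice), so they rank predictions identically. The threshold used to binarize the probability map before selecting the largest region is not stated in the abstract.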
In addition, ablation experiments on the three datasets showed the effectiveness of each component of the network, such as the AGM, the CAM, the polar transformation, and deep supervision. The experiments also showed that CA-Net achieves higher segmentation accuracy in terms of Dice and IOU than U-Net, M-Net, and DeepLab v3+, and the visual segmentation results showed that CA-Net produces segmentations closer to the ground truth. Conclusion This study presents a new two-stage OD and OC segmentation method based on U-Net and demonstrates its effectiveness. The experiments showed that CA-Net obtains better results than other methods on the Drishti-GS1, RIM-ONE-v3, and Refuge datasets. In future work, we will focus on domain adaptation and on OD and OC segmentation when training samples are insufficient.
Keywords
glaucoma; optic disc (OD); optic cup (OC); context aggregation module (CAM); attention guidance module (AGM); deep supervision; prior knowledge