Diagnosis of diabetic retinopathy based on multi-kernel multi-instance learning
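Ren Fulong1,2, Cao Peng1,3, Yang Jinzhu1,3, Wan Chao4, Zhao Dazhe1,3 (1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819; 2. State Key Laboratory for New Technology of Software Architecture, Shenyang 110179; 3. Key Laboratory of Medical Image Computing of Ministry of Education, Shenyang 110819; 4. Department of Ophthalmology, the First Hospital of China Medical University, Shenyang 110001) Abstract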
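Objective In conventional diabetic retinopathy (DR) diagnosis systems, the accuracy of detecting microaneurysm and hemorrhage lesions determines the final diagnostic performance. Current detection and diagnosis methods generate a large number of false-positive samples in order to maintain high sensitivity, and because the datasets do not annotate lesion regions, a supervised classification model cannot be built effectively to remove these false positives. To address this limitation of supervised learning in DR diagnosis, a DR diagnosis method based on multi-kernel multi-instance learning is proposed. Method First, suspected microaneurysm and hemorrhage lesion regions are detected and treated as instances in a multi-instance learning model, while the whole image is treated as a bag, so that DR diagnosis is recast as a multi-instance learning problem. Second, features are extracted from the lesion regions to describe the instances, and an extreme learning machine (ELM) classifier filters out irrelevant instances to improve the classification performance of the subsequent multi-instance learning. Finally, a multi-instance learning model based on a multi-kernel graph is constructed to classify images as healthy or DR, thereby diagnosing DR. Result In DR diagnosis evaluation experiments on the international public dataset MESSIDOR, the method achieved an accuracy of 90.1%, a sensitivity of 92.4%, a specificity of 91.4%, and an area under the receiver operating characteristic (ROC) curve of 0.932, a considerable performance advantage over other algorithms. Conclusion Without requiring lesion annotations, the multi-kernel multi-instance learning method can diagnose DR efficiently and automatically, which avoids the time and labor of annotating lesions in medical images and removes the need for false-positive removal in the classification algorithm, achieving good results.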
Keywords
Multi-kernel multi-instance learning based diabetic retinopathy diagnosis
Ren Fulong1,2, Cao Peng1,3, Yang Jinzhu1,3, Wan Chao4, Zhao Dazhe1,3 (1. Computer Science and Engineering, Northeastern University, Shenyang 110819, China; 2. The State Key Laboratory for New Technology of Software Architecture, Shenyang 110179, China; 3. Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang 110819, China; 4. Department of Ophthalmology, the First Hospital of China Medical University, Shenyang 110001, China) Abstract
Objective Diabetic retinopathy (DR) is a complication of diabetes that can cause severe vision loss and, in advanced cases, blindness if left untreated. Regular eye examination is important for initial diagnosis and early treatment. Changes in the retinal blood vessels are the leading cause of DR. Red lesions, such as hemorrhages and microaneurysms (HMAs), are the first explicit sign and an important symptom of DR. Hence, in a traditional DR diagnosis system, the accuracy of HMA lesion detection determines the final diagnosis performance. Current diagnosis methods produce a large number of false-positive samples in order to maintain high sensitivity, and a supervised classification model cannot be trained effectively to remove them because the dataset does not label the lesion areas. A new algorithm based on multi-kernel and multi-instance learning is proposed to solve this problem of supervised learning in DR diagnosis. Method First, a multi-scale morphological top-hat transformation is applied to the green-channel image to enhance blood vessels and red lesions. The enhanced image is binarized to obtain a mask image, and the main retinal vessels are segmented from this mask by thresholding. All regions of interest are generated by subtracting the main vessels from the mask image, and connected-component labeling based on region growing is used to detect the suspicious HMAs. The detected HMA areas are treated as instances, and the entire image is treated as a bag. Thus, DR diagnosis is formulated as a multi-instance learning problem. Second, a 37-dimensional feature vector based on color, texture, and shape is extracted for each candidate HMA to describe the instance in multi-instance learning. Numerous suspected HMAs are generally retained to ensure high sensitivity in the initial lesion detection, but too many candidate HMAs degrade the performance of the multi-instance learning. 
An extreme learning machine (ELM)-based classifier is accordingly constructed to filter irrelevant instances and improve the multi-instance learning performance. However, no publicly available database contains both the ground truth of DR diagnosis at the image level and HMA segmentation at the lesion level. For example, the MESSIDOR dataset contains diagnosis information for DR but not the ground-truth location of HMAs, whereas the E-ophtha dataset contains the location information of HMAs without the diagnosis label. Consequently, the ELM-based classifier trained on the E-ophtha dataset cannot be applied directly to the MESSIDOR dataset because of the differences between the datasets. Instead, a threshold on the output probability of the ELM-based classifier is used to filter the irrelevant instances, and the best threshold is obtained by cross-validation on the training set. Finally, a multi-instance learning method, mi-Graph, which assumes that the instances in a bag are not independently and identically distributed, is combined with a multi-kernel learning framework for DR diagnosis. The method implicitly constructs graphs by deriving affinity matrices and defines an efficient graph kernel that takes clique information into account. The kernel in the multi-kernel learning is defined as a linear combination of multiple kernels, including Gaussian, polynomial, and linear kernels. As a result, a multi-instance learning model based on a multi-kernel graph is constructed to classify the input retinal image as DR or no-DR. Result The evaluation is performed on 1,200 images from the publicly available MESSIDOR dataset, which provides the DR diagnosis results. We verify the effectiveness of the proposed method and of the irrelevant-instance filtering method. The contributions of different features to DR diagnosis in the multi-kernel learning framework are analyzed. 
We compare our method with other multi-instance learning methods, namely iterative axis-parallel rectangle, expectation-maximization diverse density, citation-k-nearest neighbor, and multiple-instance support vector machine. We also compare our method with other DR diagnosis methods on the MESSIDOR dataset. The proposed method achieved an accuracy of 90.1%, a sensitivity of 92.4%, a specificity of 91.4%, and an area under the receiver operating characteristic curve of 0.932. The results show that the proposed method performs better than the other multi-instance learning methods and is comparable to previous DR diagnosis methods. Conclusion A multi-instance learning algorithm is introduced into DR diagnosis in this study. The detected HMAs and the entire image are treated as the instances and the bag of multi-instance learning, respectively. The relationship among the instances in a bag is established by using a kernel graph. A multi-kernel learning framework is adopted to improve the generalization performance of the classifier. Consequently, a multi-instance learning model based on a multi-kernel graph is constructed for DR diagnosis. The experimental results indicate that the proposed approach can diagnose DR efficiently without label information for suspicious lesions, which avoids both the time-consuming labeling of lesions by specialists and a separate false-positive reduction step.
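The bag-level kernel described in the Method part can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the affinity threshold (mean pairwise Euclidean distance within a bag), the kernel hyperparameters, and the fixed combination weights `betas` are all hypothetical choices, whereas a multi-kernel learning framework would learn the weights from data. The function names are likewise illustrative.

```python
import numpy as np

def instance_weights(bag, delta=None):
    """mi-Graph style weights W_a = 1 / (number of instances within delta of a).

    bag: array of shape (n_instances, n_features).
    delta: affinity threshold. Assumption: defaults to the mean of the full
    pairwise distance matrix (self-distances included).
    """
    diff = bag[:, None, :] - bag[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if delta is None:
        delta = dist.mean()
    # Self-distance is 0, so every instance is always affine to itself.
    affinity = (dist <= delta).astype(float)
    return 1.0 / affinity.sum(axis=1)

def gaussian_kernel(X, Y, gamma=0.5):
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    return (X @ Y.T + coef0) ** degree

def linear_kernel(X, Y):
    return X @ Y.T

def combined_kernel(X, Y, betas=(0.5, 0.3, 0.2)):
    """Linear combination of Gaussian, polynomial, and linear kernels.

    Assumption: betas are fixed, non-negative weights chosen for illustration.
    """
    return (betas[0] * gaussian_kernel(X, Y)
            + betas[1] * polynomial_kernel(X, Y)
            + betas[2] * linear_kernel(X, Y))

def bag_kernel(bag_i, bag_j, betas=(0.5, 0.3, 0.2)):
    """Weighted, normalized sum of instance-level kernel values between two bags."""
    w_i = instance_weights(bag_i)
    w_j = instance_weights(bag_j)
    K = combined_kernel(bag_i, bag_j, betas)
    return float(w_i @ K @ w_j / (w_i.sum() * w_j.sum()))

def gram_matrix(bags, betas=(0.5, 0.3, 0.2)):
    """Symmetric bag-level Gram matrix for a list of bags."""
    n = len(bags)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            G[i, j] = G[j, i] = bag_kernel(bags[i], bags[j], betas)
    return G
```

Here each bag would hold the 37-dimensional feature vectors of the candidate HMAs retained for one image. The resulting Gram matrix could then be passed to any kernel classifier that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Instances that lie in a dense cluster receive smaller weights, so a group of near-duplicate candidates does not dominate the bag-level similarity.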
Keywords