Cross-modal attention YOLOv5 for PET/CT lung tumor detection

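Zhou Tao1,2, Ye Xinyu1,2, Zhao Yanan1,2, Lu Huiling3, Liu Fengzhen1,2 (1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021; 2. The Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021; 3. College of Medical Information and Engineering, Ningxia Medical University, Yinchuan 750004)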

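Abstract
Objective Early symptoms of lung tumors are atypical, so the optimal treatment window is easily missed, and effective, accurate lung tumor detection is becoming increasingly important in computer-aided diagnosis. However, in multimodal PET/CT (positron emission computed tomography/computed tomography) images of lung tumors, adhesion between the tumor and surrounding tissue causes blurred edges and low contrast, and lesions are small and unevenly distributed in size. To address these problems, a cross-modal attention YOLOv5 (cross-modal attention you only look once v5, CA-YOLOv5) model for lung tumor detection is proposed. Method First, a two-branch parallel self-learning attention is designed in the backbone network. It uses instance normalization to learn scaling factors and uses the difference between each feature value and the mean to compute the amount of information each feature carries, which enhances tumor features and improves contrast. Second, to fully learn the complementary information in multimodal images, a cross-modal attention is designed for interactive learning of multimodal features, in which a Transformer models the long-range dependencies between deep and shallow features and learns functional and anatomical information to improve lung tumor recognition. Finally, to handle small lesions and uneven size distribution, a dynamic feature enhancement module is designed. It uses multibranch grouped dilated convolutions and grouped deformable convolutions with different receptive fields so that the network mines the multiscale semantic information of lung tumor features fully and efficiently. Result In a comparison with 10 other methods on the lung tumor PET/CT dataset, CA-YOLOv5 achieved the best performance, with 97.37% precision, 94.01% recall, 96.36% mAP (mean average precision), and 95.67% F1, and it had the shortest training time on the same device. On the LUNA16 (lung nodule analysis 16) dataset, the proposed model likewise achieved the best performance, with 97.52% precision and 97.45% mAP. Conclusion Based on complementary multimodal features, this paper proposes a cross-modal attention YOLOv5 detection model. By using attention mechanisms and multiscale semantic information, the model identifies lung tumors effectively in multimodal images, making recognition more accurate and more robust.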
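The second branch of the self-learning attention is described only as using the difference between each feature value and the mean to estimate how much information a feature carries. A minimal sketch of that idea follows, assuming a SimAM-style parameter-free weighting on a single 2-D channel. The function name `information_attention`, the `eps` constant, and the exact energy formula are assumptions for illustration and are not taken from the paper; the instance-normalization branch is omitted.

```python
import math


def information_attention(fmap, eps=1e-4):
    """Reweight one 2-D channel by how far each value lies from the channel mean.

    Hypothetical sketch: values far from the mean are treated as more
    informative and receive a weight closer to 1.
    """
    values = [v for row in fmap for v in row]
    n = len(values)
    mean = sum(values) / n

    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in fmap]

    # Variance estimate over the channel; max(..., 1) guards a 1x1 map.
    var = sum(d for row in sq_dev for d in row) / max(n - 1, 1)

    # Energy grows with the deviation; a sigmoid maps it to a weight in (0.5, 1).
    weights = [
        [1.0 / (1.0 + math.exp(-(d / (4.0 * (var + eps)) + 0.5))) for d in row]
        for row in sq_dev
    ]

    # Scale the original features by their weights.
    out = [[v * w for v, w in zip(row, wrow)] for row, wrow in zip(fmap, weights)]
    return out, weights


# A flat background with one bright spot, standing in for a small lesion.
fmap = [
    [0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1],
]
enhanced, weights = information_attention(fmap)
print(weights[1][1] > weights[0][0])  # the bright spot gets the larger weight
```

Because the weights depend only on channel statistics, a branch of this kind adds no learnable parameters, which is consistent with the abstract's description of the attention as lightweight.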
Keywords
Cross-modal attention YOLOv5 PET/CT lung cancer detection

Zhou Tao1,2, Ye Xinyu1,2, Zhao Yanan1,2, Lu Huiling3, Liu Fengzhen1,2 (1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; 2. The Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China; 3. College of Medical Information and Engineering, Ningxia Medical University, Yinchuan 750004, China)

Abstract
Objective Cancer is the second leading cause of death worldwide, and lung cancer accounts for nearly one in five cancer deaths. Many cancers have a high chance of cure through early detection and effective therapeutic care. However, the atypical early symptoms of lung cancer mean that the optimal treatment window is easily missed. Correctly distinguishing benign from malignant tumors allows treatment that reduces the risk of death. Manual determination of lung cancer is time-consuming and error-prone, so effective and accurate lung cancer detection techniques are becoming increasingly important in computer-aided diagnosis. Method Computed tomography (CT) is a common clinical modality for examining lung conditions; it localizes lesion structures through anatomical information. Positron emission computed tomography (PET) reveals the pathophysiological features of lesions by detecting glucose metabolism. Combined PET/CT has been shown to be effective in cases where conventional imaging is inadequate, because it identifies lesions while also pinpointing their location, which improves accuracy and clinical value. However, in PET/CT images of lung cancer, adhesion of the tumor to surrounding tissues leads to blurred edges and low contrast, and lesions are small and unevenly distributed in size. A cross-modal attention YOLOv5 (CA-YOLOv5) model for lung cancer detection is proposed in this paper to address these problems. The model has three components. First, a two-branch parallel self-learning attention is designed in the backbone network. It learns scaling factors using instance normalization and calculates the amount of information contained in each feature from the difference between the feature value and the average value. This self-learning attention enhances tumor features and improves contrast. Second, cross-modal attention is designed to support interactive learning of multimodal features, so that the complementary information in 3D multimodal images is fully learned. A Transformer models the long-range interdependence of deep and shallow layer features and learns key functional and anatomical information to improve lung cancer recognition. Third, a dynamic feature enhancement module is established to address small lesion areas and uneven size distribution. It uses multibranch grouped dilated and deformable convolutions with different receptive fields, enabling the network to mine the multiscale semantic information of lung cancer features fully and efficiently. Result In a comparison with 10 other methods on the lung cancer PET/CT dataset, CA-YOLOv5 obtained the best performance, with 97.37% precision, 94.01% recall, 96.36% mean average precision (mAP), and 95.67% F1 score, and its training time on the same device was the shortest. Compared with YOLOv5, these four indices improved by 2.55%, 4.84%, 3.53%, and 3.49%, respectively. On the precision-recall (PR) curve, the area under the curve of the proposed model is the largest for every category, and on the F1 curve the proposed model encloses the largest area, with a high F1 score at high confidence levels. The heat maps show that the proposed model not only attends to all labeled lesions but also localizes them accurately. On the LUNA16 dataset, the proposed model obtained the highest performance, with 97.52% precision and 97.45% mAP, and its overall coverage on the PR curve was the largest. Conclusion This paper establishes CA-YOLOv5, a lung cancer detection model. Lightweight and effective self-learning attention mechanisms are designed to enhance tumor features and improve contrast. A Transformer is placed at the end of the backbone network to combine the advantages of convolution and self-attention and to extract local and global information from deep and shallow layer features. Dynamic feature enhancement modules are constructed in the feature enhancement neck to mine the multiscale semantic information of lung cancer features fully and efficiently. Experimental results on the two datasets show that the proposed model has superior lung cancer recognition and strong representation capabilities, which effectively improve detection accuracy and reduce the missed-detection rate. The model therefore supports computer-aided diagnosis and improves the efficiency of preoperative preparation. Its effectiveness and robustness are further verified using heat map visualization and the LUNA16 dataset, respectively.
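The reported F1 score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this for the PET/CT results. The helper name `f1_score` is an illustrative choice, and the check assumes the abstract's figures are aggregate values, which the abstract does not state.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for CA-YOLOv5 on the PET/CT dataset (percent).
precision, recall, reported_f1 = 97.37, 94.01, 95.67

recomputed_f1 = f1_score(precision, recall)
print(f"recomputed F1: {recomputed_f1:.2f}%, reported F1: {reported_f1:.2f}%")
```

The recomputed value is about 95.66%, within 0.01 percentage points of the reported 95.67%. A gap of that size could come from rounding of the inputs or from averaging F1 per category, neither of which the abstract specifies.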
Keywords
