Cooperative suppression network for bimodal breast ultrasound data

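Yang Ziqi1, Gong Xun1, Zhu Dan1, Guo Ying2 (1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031; 2. North China University of Science and Technology Affiliated Hospital, Tangshan 063000)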

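Abstract
Objective Correct early diagnosis of breast cancer through deep learning can greatly improve patient survival. At present, most researchers use only B-mode ultrasound images as experimental data, but the inherent limitations of B-mode ultrasound make it hard to improve classification performance. To address this problem, a network model is proposed that jointly uses B-mode ultrasound and contrast-enhanced ultrasound (CEUS) video to improve classification accuracy. Method A dual-branch architecture is designed around the characteristics of the bimodal data (B-mode ultrasound images and CEUS video). Because traditional video feature extraction uses only a single label, pathological multilabel pretraining is introduced. A new bilinear cooperative mechanism is also designed to better fuse B-mode ultrasound and CEUS features, extracting the pathological information and suppressing irrelevant noise. Result To verify the effectiveness of the proposed method, three experiments were designed. The first two pretrain on B-mode ultrasound and CEUS, respectively; in the CEUS branch, pretraining uses pathological multilabels designed from medical domain knowledge. The third experiment uses the pretrained models from the first two: accuracy improves by 6.5% over B-mode ultrasound images alone and by 7.9% over CEUS video alone. Among methods that use bimodal data, the proposed method achieves the highest accuracy, 2.7% higher than the second-ranked result. Conclusion The proposed cooperative suppression network processes each modality differently to extract its pathological features. On the one hand, multimodal data do show the same lesion area from different angles and give the classification model more pathological features, which improves its classification accuracy. On the other hand, a suitable fusion method is also crucial, since it makes the most of the features while suppressing noise.
Keywords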
Cooperative suppression network for bimodal data in breast cancer classification

Yang Ziqi1, Gong Xun1, Zhu Dan1, Guo Ying2(1.School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China;2.North China University of Science and Technology Affiliated Hospital, Tangshan 063000, China)

Abstract
Objective Computer-aided breast cancer diagnosis is a fundamental problem in medical imaging. Correct diagnosis of breast cancer through deep learning can greatly improve patients' survival rate. At present, most researchers use only B-mode ultrasound images as experimental data, but the limitations of B-mode ultrasound make it difficult to achieve high classification accuracy. Contrast-enhanced ultrasound (CEUS) video can provide accurate pathological information by showing the dynamic enhancement of the lesion area over time. To address the limitations of B-mode ultrasound, this paper proposes a network model that jointly uses B-mode ultrasound video data and CEUS video data to improve classification accuracy. Method First, a dual-branch model architecture is designed on the basis of the two-stream structure and the characteristics of the dual-modal data. One branch uses a single frame of B-mode ultrasound video and a ResNet34 network to extract pathological features. The other branch uses CEUS video and an R(2+1)D network to extract temporal information. Second, because traditional video feature extraction relies on a single label, pathological multilabel pretraining is designed for the CEUS branch using 10 kinds of pathological information in the CEUS video data. The two branches yield the features of the B-mode ultrasound data and the CEUS video data. Bilinear fusion is performed on these features to better integrate B-mode ultrasound and CEUS. To extract pathological information and suppress irrelevant noise, the extracted and fused features are passed through an attention mechanism to obtain an attention weight for each feature, and each weight is applied to the corresponding original feature, giving weighted ultrasound and contrast features. 
Finally, the attention-weighted features are bilinearly fused to obtain the final features. Result Three experiments were designed. The first two pretrain on B-mode ultrasound and CEUS, respectively, to verify the effectiveness of the proposed method and to select the network with the strongest feature extraction ability for each kind of data. In the B-mode ultrasound pretraining experiment, the classic VGG16-BN (visual geometry group, batch normalization), VGG19-BN, ResNet13, ResNet34, and ResNet50 networks were each trained as the backbone of the ultrasound branch. Their final classification results are 74.2%, 75.6%, 80.5%, 81.0%, and 92.1%, respectively. Because the accuracy of ResNet50 on the test set is only 79.3%, well below its training-set accuracy and a sign of serious overfitting, ResNet34 is used as the backbone for B-mode ultrasound data. In the CEUS pretraining experiment, the current mainstream P3D, R3D, CM3, and R(2+1)D convolutional networks were each trained as the backbone of the CEUS branch. Their final classification results are 75.2%, 74.6%, 74.1%, and 78.4%, respectively, and R(2+1)D, which performed best, is selected as the backbone of the CEUS branch. In this branch, pretraining uses pathological multilabels designed from medical domain knowledge. In the experiment that combines the two modalities, accuracy improves by 6.5% over B-mode ultrasound images alone and by 7.9% over CEUS video alone. The proposed method also achieves the highest accuracy among methods that use bimodal data, 2.7% higher than the second-best result. 
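The pathological multilabel pretraining mentioned above can be illustrated with a multilabel loss. The sketch below is a minimal illustration, not the authors' implementation: it assumes each of the 10 pathological labels is an independent binary target scored with sigmoid binary cross-entropy, a common choice that the abstract does not confirm. The function names and the example values are hypothetical.

```python
import math

NUM_LABELS = 10  # the abstract mentions 10 kinds of pathological information


def sigmoid(value):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-value))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent labels.

    logits  -- raw scores, one per pathological label (hypothetical outputs)
    targets -- 0/1 indicators, one per pathological label
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for score, target in zip(logits, targets):
        prob = sigmoid(score)
        total -= target * math.log(prob + eps)
        total -= (1 - target) * math.log(1.0 - prob + eps)
    return total / len(logits)


# Hypothetical example: a CEUS clip annotated with three positive labels.
example_targets = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
uninformed_logits = [0.0] * NUM_LABELS
informed_logits = [3.0 if t else -3.0 for t in example_targets]

loss_uninformed = multilabel_bce(uninformed_logits, example_targets)
loss_informed = multilabel_bce(informed_logits, example_targets)
```

Unlike a single benign/malignant label, every label here contributes its own term to the loss, so the video branch is pushed to encode each pathological attribute separately before the final classification stage.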
Conclusion The proposed cooperative suppression network processes each modality differently to extract its pathological features. On the one hand, multimodal data show the same lesion area from different angles and provide more pathological features for the classification model, thereby improving its classification accuracy. On the other hand, a proper fusion method is crucial because it makes the most of the features while suppressing noise.
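The fusion pipeline described in the Method (bilinear fusion, attention weights applied to the original features, then a second bilinear fusion) can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' implementation: bilinear fusion is taken to be a flattened outer product, and the attention weights are assumed to be sigmoid gates computed from the fused vector by linear maps. The function names and the parameters `w_us` and `w_ceus` are hypothetical.

```python
import math


def sigmoid(value):
    """Logistic function used here as an attention gate."""
    return 1.0 / (1.0 + math.exp(-value))


def bilinear_fuse(feat_a, feat_b):
    """Flattened outer product of two feature vectors."""
    return [a * b for a in feat_a for b in feat_b]


def attention_gates(fused, weight_rows):
    """One gate per feature channel: sigmoid(row . fused). Assumed form."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, fused)))
        for row in weight_rows
    ]


def cooperative_fusion(us_feat, ceus_feat, w_us, w_ceus):
    """Attention-weighted bilinear fusion of the two branch features.

    us_feat, ceus_feat -- features from the B-mode and CEUS branches
    w_us, w_ceus       -- hypothetical gate parameters, one row per channel,
                          each row as long as the fused vector
    """
    # Step 1: fuse the raw branch features.
    fused = bilinear_fuse(us_feat, ceus_feat)
    # Step 2: derive attention weights for each branch from the fused vector.
    gate_us = attention_gates(fused, w_us)
    gate_ceus = attention_gates(fused, w_ceus)
    # Step 3: apply the weights to the original features.
    us_weighted = [g * x for g, x in zip(gate_us, us_feat)]
    ceus_weighted = [g * x for g, x in zip(gate_ceus, ceus_feat)]
    # Step 4: fuse the weighted features to obtain the final features.
    return bilinear_fuse(us_weighted, ceus_weighted)


# Toy example: zero gate parameters make every gate equal to 0.5.
us = [1.0, 2.0]
ceus = [3.0, 4.0, 5.0]
zero_us = [[0.0] * 6 for _ in us]
zero_ceus = [[0.0] * 6 for _ in ceus]
final = cooperative_fusion(us, ceus, zero_us, zero_ceus)
```

Because each gate is computed from the fused vector, the weight on an ultrasound channel depends on the CEUS features and vice versa, which is how one modality can suppress noisy channels in the other under this reading.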
Keywords
