A review of deep learning methods for chromosome karyotype analysis

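Luo Chunlong1,2, Zhao Yi1 (1. Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. University of Chinese Academy of Sciences, Beijing 100049)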

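Abstract
Karyotype analysis is an important experimental technique in cytogenetics and is increasingly used in many modern clinical fields, including reproductive medicine, and in scientific research. However, even experienced cytogeneticists need a great deal of time to complete a karyotype analysis. Automated karyotyping methods based on traditional techniques have low accuracy, so cytogeneticists must still spend considerable time and effort on corrections. Many deep-learning-based automatic karyotyping methods have now been proposed, but a summary of the state of the field and an outlook on its future development are lacking. This paper therefore reviews deep-learning-based automatic karyotyping methods: it summarizes the analysis tasks studied so far, selects representative methods and organizes their solutions, and discusses future directions. The survey shows that deep-learning-based automated karyotyping has achieved many results, but some problems remain. First, existing Chinese-language reviews focus on only one subfield or are not sufficiently comprehensive and in-depth. Second, karyotype analysis tasks are closely tied to clinical practice and constrained by many factors; the task types are numerous and the solutions complex, so the whole picture is hard to infer from any single part. Finally, existing methods concentrate on chromosome classification and chromosome segmentation, whereas tasks such as chromosome counting and chromosome preprocessing have received little research; these problems need to be clarified to attract more researchers. In summary, deep-learning-based automatic karyotyping still has considerable room for development.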
Review of deep learning methods for karyotype analysis

Luo Chunlong1,2, Zhao Yi1 (1. Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract
Chromosomal abnormalities can lead to serious diseases, such as chronic myeloid leukemia and Down syndrome. Karyotyping involves counting the chromosomes in metaphase images, segmenting them from the background, arranging them according to certain rules, and examining them to issue diagnostic results. Karyotype analysis has therefore been widely used in many modern clinical fields and in scientific research. However, even an experienced cytogeneticist requires much time to complete karyotyping. Although machine learning and traditional geometric methods have been used to automate karyotype analysis, most of them perform poorly and do not satisfy clinical requirements, so cytogeneticists still spend much time on manual intervention. Many deep-learning-based methods have been proposed, but systematic reviews are lacking. This paper reviews the recent literature and groups it into six tasks: chromosome counting, chromosome segmentation, chromosome cluster classification, chromosome preprocessing, chromosome classification, and chromosome anomaly analysis. First, chromosome counting methods are summarized; they are based on bounding box detection and aim to accurately identify each chromosome in the metaphase image. Specifically, these methods need to find candidate object proposals, classify them into different classes, and refine their locations. However, they must solve the self-similarity, over-deletion, and inaccurate localization problems that result from overlapping chromosomes. Researchers have also attempted to speed up model inference through lightweight backbones. Methods for the chromosome segmentation task can be divided into semantic and instance segmentation methods. On the one hand, semantic segmentation methods can only segment chromosome clusters formed by two or more overlapping chromosomes, and postprocessing is needed to splice the chromosomes back together. 
On the other hand, instance segmentation methods can automate chromosome segmentation, and additional supervision information, such as key points or orientation information, can further improve their performance. Because some chromosome segmentation methods can only handle a specific type of chromosome cluster, the cluster type must be identified first. Existing methods roughly classify chromosome clusters according to two criteria: the number of overlapping chromosomes and the interrelationship between the touching and overlapping chromosomes. However, from a methodological perspective, previous studies are mostly based on simple convolutional neural networks (CNNs), so further innovative studies on chromosome cluster classification are required. As for the chromosome preprocessing task, existing methods mainly address two subtasks: metaphase image denoising and chromosome straightening. Metaphase image denoising is solved as a segmentation problem, where the chromosomes are regarded as a single region to be segmented from the background and the impurities present in the image. Existing chromosome straightening methods rely on generative adversarial networks to straighten curved chromosomes and generally follow an image translation or motion transformation framework. Benefiting from the rapid development of deep-learning-based image classification networks, chromosome classification has also received much attention among karyotype-analysis-related tasks. 
According to their properties, the available methods can be divided into 1) simple CNN-based methods, which redesign the network for chromosome instances instead of directly using well-known CNN models proposed for the ImageNet dataset; 2) feature-contrastive-based methods, which extract representative features in a contrastive manner and then classify them with a simple classifier; 3) image-preprocessing-based methods, which apply super-resolution before classification to unify the size of chromosome images or enhance the banding pattern features using different filters; 4) global- and local-feature-fusion-based methods, which explicitly crop and extract features from local but important image parts and then fuse them for the final classification; and 5) complex-strategy-based methods, which solve the chromosome classification task by detecting chromosomes in metaphase images and improve performance using an ensemble learning framework. The final reviewed task is chromosome anomaly analysis, which includes detection and generation subtasks. Although anomalies are a major concern for clinical experts, previous studies can only detect a specific type of chromosome anomaly with a basic CNN or roughly discriminate between normal and abnormal chromosomes using a generative adversarial network framework. Meanwhile, the available approaches for the generation subtask are based on generative adversarial networks. At the end of this paper, the various tasks and main methodologies are summarized, and feasible future developments are proposed. First, multiple advanced solution paradigms, such as multi-modality and image question answering, should be introduced to fulfill these tasks. Second, chromosomal abnormality diagnosis has not yet been addressed because it involves the extraction of band-level features and relational reasoning. Third, pretraining models in a self-supervised manner is worth further research. 
Although high-quality labeled chromosome data are unavailable, the large amount of unlabeled clinical data can still reduce the cost of data labeling and improve the performance of downstream tasks through the self-supervised learning paradigm. In sum, deep-learning-based automatic karyotyping methods deserve further review to attract additional research interest.
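The over-deletion problem mentioned above for chromosome counting can be illustrated with a small sketch. It assumes, which the abstract does not state, that the detectors in question filter their candidate boxes with standard greedy non-maximum suppression (NMS). The box coordinates, scores, thresholds, and function names below are hypothetical and are not taken from any reviewed method.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates; this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, visiting boxes from highest score down."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # A box survives only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 cover two crossing chromosomes,
# box 2 covers an isolated chromosome.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (30, 30, 40, 40)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, iou_threshold=0.5))  # box 1 is suppressed
print(greedy_nms(boxes, scores, iou_threshold=0.7))  # all three boxes are kept
```

In this toy case the two crossing chromosomes have an IoU of 81/119 (about 0.68), so a threshold of 0.5 removes a true chromosome and the count comes out one short, while a looser threshold keeps it at the risk of admitting duplicate boxes elsewhere.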
