Rapid road extraction from high-resolution quick view data under transfer learning
Abstract
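Objective Traditional road extraction methods have a low degree of automation and cannot meet the need for rapid acquisition of road information. Deep learning road extraction methods mostly focus on improving accuracy, and their networks are highly redundant. Transfer learning moves knowledge from a source domain to a target domain, so it can complete the target learning task quickly. This paper therefore exploits the rapid availability of quick view data from high-resolution satellites and constructs a deep neural network for rapid road extraction based on transfer learning. Method We use transfer learning based on a pretrained network, which divides the entire road extraction process into two stages. First, the source network is trained on the large open ImageNet database, and the best model of this stage is saved. In the second stage, the saved model is transferred to the target network, and its pretrained weight parameters guide the continued training of the target network. Quick view data serve as the input at this stage, and only directed fine-tuning for the target task is performed, which accelerates network training. Overall, pretraining extracts general-purpose feature parameters, whereas target training specializes the network for road extraction. Result In the proposed transfer learning network for rapid road extraction, transferring the pretrained model improves validation accuracy by 6.0% compared with no transfer and reduces the test time for a single 256×256-pixel image by 49.4%. The average accuracy on the quick view test set reaches 88.3%. Roads in a 7 304×6 980-pixel quick view scene of Tianjin Binhai New Area, cropped from one orbit, are extracted within 54 s. Compared with other transfer models, our method predicts roads quickly while still achieving high accuracy. Conclusion The experimental results show that, for high-resolution satellite quick view data, initializing the network with a pretrained model makes effective use of the weight parameters and keeps the model lightweight. This improves accuracy and speeds up extraction at the same time, so road information can be acquired quickly and accurately.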
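The second stage of the two-stage scheme, in which the pretrained weights are transferred to the target network, can be pictured as a name-and-shape matching copy. Every pretrained tensor whose name and shape match a target tensor is copied. Everything else keeps its random initialization, for example the ImageNet classifier head, which is dropped, and the new decoder. This is a framework-free sketch: the helper `init_from_pretrained` and all parameter names are hypothetical stand-ins. In PyTorch the analogous step is usually to filter a pretrained state dict and load it with `load_state_dict(..., strict=False)`.

```python
import numpy as np


def init_from_pretrained(target, pretrained):
    """Copy every pretrained tensor whose name and shape match a target tensor.

    target, pretrained: dicts mapping (hypothetical) parameter names to arrays.
    Returns the initialized parameters and the list of names that were loaded.
    Unmatched target tensors (e.g. the decoder) keep their random values.
    """
    params = {name: value.copy() for name, value in target.items()}
    loaded = []
    for name, weight in pretrained.items():
        if name in params and params[name].shape == weight.shape:
            params[name] = weight.copy()
            loaded.append(name)
    return params, loaded


rng = np.random.default_rng(0)
# Hypothetical stand-in for ImageNet-pretrained ResNet34 weights.
pretrained = {
    "conv1.weight": rng.normal(size=(64, 3, 7, 7)),
    "layer1.0.conv1.weight": rng.normal(size=(64, 64, 3, 3)),
    "fc.weight": rng.normal(size=(1000, 512)),  # classifier head, unused by the segmenter
}
# Hypothetical randomly initialized target network (encoder + decoder).
linknet = {
    "conv1.weight": rng.normal(size=(64, 3, 7, 7)),
    "layer1.0.conv1.weight": rng.normal(size=(64, 64, 3, 3)),
    "decoder4.deconv.weight": rng.normal(size=(512, 256, 3, 3)),
}
params, loaded = init_from_pretrained(linknet, pretrained)
print(loaded)  # ['conv1.weight', 'layer1.0.conv1.weight']
```

Matching on shape as well as name keeps a modified layer from silently receiving incompatible weights. Such a layer falls back to random initialization and is learned during fine-tuning.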
Keywords
Rapid road extraction from quick view imagery of high-resolution satellites with transfer learning
Zhang Junjun1,2, Wan Guangtong1, Zhang Hongqun1, Li Shanshan1, Feng Xuxiang1
(1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract
Objective Quick view data from high-resolution satellites are received in real time and imaged at full resolution. They therefore offer a timely data source for practical applications such as fire detection, moving window display, disaster observation, and military information acquisition. Road extraction from remote sensing images has long been a popular research topic in remote sensing image analysis. Traditional object-oriented methods are poorly automated, and their road features must be selected and designed manually using prior knowledge, which hinders real-time acquisition of road information. Popular deep learning road extraction methods focus mainly on improving precision and pay little attention to the timeliness of road extraction. Transfer learning can rapidly complete the task in the target domain by sharing weights across domains, and it makes the model highly task-specific. We therefore construct a transfer learning deep network that uses quick view data from high-resolution satellites to extract roads rapidly. Method First, we propose a least-squares fitting method for devignetting. It removes the TDICCD (time delay and integration charge coupled device) vignetting that appears in raw quick view data, which is the most serious radiometric problem in these data. The preprocessed quick view data serve as our training dataset. Then, after comparing the performance of real-time semantic segmentation networks such as ENet, U-Net, LinkNet, and D-LinkNet, we choose LinkNet as the target network. LinkNet uses computation memory efficiently and can learn from a relatively small training set, and its residual units ease the training of deep networks. Rich bypass connections link each encoder with its decoder, so the network can be designed with few parameters. The encoder starts with a 7×7 kernel.
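The abstract names least-squares linear fitting as the devignetting method but does not give the fitting model. The sketch below is one plausible reading under stated assumptions. Vignetting is modeled as a per-column gain and offset, and each column is fitted by least squares (`numpy.polyfit`, degree 1) against a reference profile, here the row-wise median across all columns. The fit is then inverted. The function name `devignette_ls` and the median reference are hypothetical choices. The model also assumes that columns share similar statistics, so on real scenes the reference would need to be chosen more carefully.

```python
import numpy as np


def devignette_ls(img):
    """Per-column least-squares gain/offset correction (illustrative sketch).

    Assumed model: column j satisfies img[:, j] ~= gain_j * ref + offset_j,
    where ref is the row-wise median over all columns. The fitted linear
    mapping is inverted to bring every column back to the reference scale.
    """
    img = np.asarray(img, dtype=np.float64)
    ref = np.median(img, axis=1)  # reference value for each row
    out = np.empty_like(img)
    for j in range(img.shape[1]):
        gain, offset = np.polyfit(ref, img[:, j], deg=1)  # least-squares line fit
        out[:, j] = (img[:, j] - offset) / gain  # undo the fitted gain/offset
    return out


# Synthetic check: all columns share one row profile; columns 3-5 are darkened.
profile = np.linspace(50.0, 200.0, 64)
clean = np.tile(profile[:, None], (1, 20))
raw = clean.copy()
raw[:, 3:6] = 0.7 * clean[:, 3:6] + 5.0  # simulated vignetting strip
corrected = devignette_ls(raw)
print(np.allclose(corrected, clean))  # True
```

The median is used as the reference because it is unaffected by a strip as long as fewer than half of the columns are vignetted.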
In each subsequent encoder block, the contracting path uses 3×3 full convolutions to capture context. We apply batch normalization in each convolutional layer, followed by ReLU nonlinearity. Reflection padding extrapolates the missing context in the training data so that pixels in the border region of the input image can be predicted. The input of each LinkNet encoder layer is bypassed to the output of its corresponding decoder, so the decoder and its upsampling operations can recover spatial information lost during max pooling. Finally, to accelerate training, we modify LinkNet so that its layers are consistent with those of ResNet34 and then fine-tune it. Fine-tuning is a useful and efficient transfer learning method: initializing LinkNet34 with ResNet34 weights pretrained on ImageNet accelerates network convergence and improves performance at almost no additional cost. Result For devignetting the quick view data, the proposed least-squares linear fitting method efficiently removes the vignetting strips of the original image and meets the needs of practical applications. In our road extraction experiment, LinkNet34 with a pretrained ResNet34 encoder achieves a 6% higher Dice accuracy on the validation set than with a ResNet34 encoder that is not pretrained. The test time for a single feature map is reduced by 39 ms, and the test Dice accuracy reaches 88.3%. The pretrained network substantially reduces training time, which also helps prevent overfitting. Consequently, we achieve over 88% test accuracy and a test time of 40 ms per feature map on the quick view dataset. With 3×256×256-pixel input feature maps, a 7 304×6 980-pixel scene of Tianjin Binhai is processed in 54 s. The original LinkNet, which uses ResNet18 as its encoder, reaches a Dice coefficient of only 85.7%. We also evaluate ResNet50 and ResNet101 as pretrained encoders.
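The reported figures of about 40 ms per 3×256×256 input and 54 s for a 7 304×6 980-pixel scene imply that the scene is processed tile by tile. The abstract does not say how it is cut up. The sketch below assumes non-overlapping 256×256 tiles, with the scene padded by reflection at its bottom and right edges up to a multiple of the tile size. The helper names `tile_grid` and `split_into_tiles` are hypothetical. The sketch also works out the tile count for the Tianjin Binhai scene.

```python
import math

import numpy as np


def tile_grid(height, width, tile=256):
    """Rows and columns of non-overlapping tiles needed to cover the scene."""
    return math.ceil(height / tile), math.ceil(width / tile)


def split_into_tiles(img, tile=256):
    """Reflect-pad an (H, W, C) image to a multiple of `tile` and cut it up.

    Returns an array of shape (rows * cols, tile, tile, C) in row-major order.
    """
    h, w = img.shape[:2]
    rows, cols = tile_grid(h, w, tile)
    padded = np.pad(img, ((0, rows * tile - h), (0, cols * tile - w), (0, 0)),
                    mode="reflect")
    return (padded.reshape(rows, tile, cols, tile, -1)
                  .transpose(0, 2, 1, 3, 4)  # -> (rows, cols, tile, tile, C)
                  .reshape(rows * cols, tile, tile, -1))


rows, cols = tile_grid(7304, 6980)
n_tiles = rows * cols          # 29 * 28 = 812 tiles
inference_s = n_tiles * 0.040  # about 32.5 s at 40 ms per tile
```

Under these assumptions, pure network inference accounts for roughly 32.5 s of the reported 54 s. The abstract does not break down how the remaining time is spent.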
The Dice accuracy of the former does not improve, whereas the latter requires too much test time. We compare LinkNet34 with three other popular deep transfer models for segmentation: two U-Net modifications, TernausNet and AlbuNet, which use VGG11 (Visual Geometry Group) and ResNet34 as encoders, respectively; and a modification of D-LinkNet. The two U-Net modifications tend to misclassify roads as background, or nonroad objects such as trees as roads. D-LinkNet achieves a higher Dice coefficient than LinkNet34 on the validation set, but its test time is 59 ms longer. LinkNet34 avoids the weaknesses of TernausNet and AlbuNet and makes better predictions. It also preserves small nonroad gaps between adjacent roads, which many methods merge into a single road. In general, the proposed method extracts entire roads completely and locates them precisely, with good connectivity, accurate edges, and clear outlines. It is especially suitable for linear rural roads and for roads in towns. However, extraction of complex road networks in urban areas remains incomplete. Conclusion In this study, we build a deep transfer learning neural network, LinkNet34, which uses the pretrained network ResNet34 as its encoder. ResNet34 lets LinkNet34 learn without a significant increase in the number of parameters. It also addresses the insufficiently rich low-level features that result from randomly initialized network weights, and it accelerates network convergence. Our experiments show that the pretrained encoder improves LinkNet34 and that LinkNet34 outperforms other real-time segmentation architectures. LinkNet34 can also, to some extent, handle road properties such as narrowness, connectivity, complexity, and long spans.
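The comparisons above are stated in terms of the Dice coefficient. For a predicted road mask P and a ground-truth mask G, Dice = 2|P ∩ G| / (|P| + |G|). The sketch below computes it for binary masks with NumPy. The small `eps` term is an assumption added to handle empty masks. The abstract does not say how the authors treat that case or whether they average Dice per tile or over the whole dataset.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|P & G| / (|P| + |G|) for binary masks (eps avoids 0/0)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


# One true positive, one false positive, one false negative: 2*1 / (2+2) = 0.5
print(round(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]), 4))  # 0.5
```

Unlike pixel accuracy, Dice ignores true-negative background pixels. This matters for road masks, where background usually dominates the tile.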
This architecture proves useful for binary classification with limited data and enables fast and accurate acquisition of road information. Future research should consider enlarging the quick view database: LinkNet34 could be pretrained on the expanded quick view database and then transferred. This would reduce the “semantic gap” between the source and target networks and make their data distributions more similar, both of which are conducive to model initialization.
Keywords