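Skeleton extraction for handwritten Chinese characters with optimal local correlation degree
Abstract
Objective The skeleton is one of the most common starting points in the study of handwritten Chinese character images. Extracting the skeleton of a handwritten Chinese character with a traditional thinning algorithm tends to produce distortions in complex regions such as stroke intersections. To address this problem, a skeleton extraction algorithm for handwritten Chinese characters based on the local correlation degree is proposed.
Method The handwritten character image is first thinned to obtain the original skeleton, and each skeleton point is labeled as one of three classes: end point, common point, or complex point. An 8-neighborhood window scans for mutually connected complex points to detect and extract complex regions. The complex regions are then deleted, which splits the original skeleton into several simple stroke segments and removes the distorted parts along with them. Local sub-segments are extracted, and the local correlation degree is computed from the degree of direction difference and the degree of curvature change between stroke segments. A connection strategy that optimizes the local correlation degree is formulated: stroke segments that satisfy the connection condition are compensated by interpolation, which corrects the distortions and yields a complete character skeleton.
Result For the 600 experimental samples, detecting complex regions directly from the skeleton gives results very close to the ideal case, whereas the contour method yields 2.5 times the theoretical number. After the stroke segments are reassembled according to the local correlation degree, the vast majority of distortions are corrected and the reassembled skeleton conforms to the true topology. With the standard skeleton as the reference, the skeleton extraction accuracy reaches 98.41%.
Conclusion The skeleton extraction algorithm for handwritten Chinese characters with optimal local correlation degree detects complex regions effectively and corrects distortions well. The extracted skeleton correctly reflects the positional and structural relationships among complex strokes, making the algorithm a practical and effective skeleton extraction method.
Keywords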
Skeleton extraction algorithm based on optimum local correlation degree for handwritten Chinese characters
Zhou Zhengyang1,2, Zhan Enqi1, Zheng Jianbin1, Hu Huacheng1
(1. School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China; 2. Key Laboratory of Fiber Optic Sensing Technology and Information Processing (Wuhan University of Technology), Ministry of Education, Wuhan 430070, China)
Abstract
Objective Handwritten Chinese characters have been studied for many years in applications such as signature verification and text recognition. The skeleton is a key element in these studies because it reduces redundant information while retaining the complete topological structure. Extracting a skeleton from a handwritten Chinese character image with a thinning algorithm is the traditional approach. However, the extracted skeleton contains distortions, primarily because complex areas are neither well detected nor well processed. Complex areas are the intersections and junctions of strokes. Because characters are saved as static images, a computer cannot tell that such an area contains more than one stroke and treats it as a single whole, so the thinning algorithm performs poorly there. To correct these distortions, this study proposes a skeleton extraction algorithm for handwritten Chinese characters based on the optimum local correlation degree.
Method A simple and effective method to extract complex areas is designed. This method uses a thinning algorithm to obtain the original skeleton. The points on the skeleton are classified as end, common, and complex points. Complex areas are extracted by detecting connected complex points with an eight-neighbor window. Afterward, the information on complex areas is used to modify the original skeleton. The modification algorithm follows a split-and-reconstruction strategy. Removing all complex areas splits the skeleton into several stroke segments and eliminates the distortions at the same time. The reconstruction step focuses on reconnecting the stroke segments; it analyzes the relationships among stroke segments to restore the skeleton. The directional relationship is considered first. Because stroke segments are not always straight, the slope between the two end points of a segment may not represent its direction accurately. 
Sub-segments adjacent to a complex area can provide the required directional information. In most cases, two stroke segments that were originally connected have similar directions. In several situations, however, direction alone is insufficient to determine whether two stroke segments belong to one natural stroke, so the curvature relationship is also considered. A local correlation degree is defined from the direction and curvature relationships between sub-segments, and it is designed to be sensitive to changes in direction. The correlation degree is calculated for every pair of stroke segments that meet at a complex area. The two stroke segments with the optimal local correlation degree are regarded as a pair of continuous segments. The connection step uses interpolation to restore the removed part between continuous segments. Discontinuous segments are given a proper extension to prevent an incorrect connection. By connecting the stroke segments, the split skeleton is reconstructed and the distortions are corrected.
Result Twenty people were asked to write 600 Chinese character samples for the experiment using different pens. All images were denoised and binarized. Detecting complex areas in the skeleton with the eight-neighbor window works well: the number of complex areas detected in the 600 samples is close to the theoretical value, whereas the number obtained with the contour method is 2.5 times the theoretical value. Most distortions are corrected with the local correlation degree, and the reconstructed skeleton approximates the real topology. With the standard skeleton as the criterion, the accuracy of skeleton extraction is 98.41%.
Conclusion The proposed skeleton extraction algorithm for handwritten Chinese characters uses a split-and-reconstruction strategy. Reconstruction is based on the optimum local correlation degree. 
The proposed method has two main advantages over other methods. First, complex area detection is considerably improved. Other methods detect complex areas mainly by analyzing turning points on the contour, whereas the proposed method detects them directly from the skeleton, which is simple and avoids excessive detection. Second, the skeleton extraction algorithm corrects distortions well. Removing complex areas together with their distortions and reconnecting the stroke segments through interpolation is an efficient solution. The extracted skeletons retain good shapes, and the positional relationships among strokes are correct. To conclude, the proposed skeleton extraction method demonstrates high accuracy and processing speed. It is an effective and useful method for applications dealing with handwritten Chinese characters.
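As an illustration of the complex-area detection step described in the Method paragraph, the following is a minimal Python sketch, not the authors' implementation. It assumes the thinned skeleton is a binary grid (a list of rows of 0/1 values) and that points are classified purely by their number of 8-neighbors (at most one: end point; two: common point; three or more: complex point). The abstract does not state the exact classification rule, so this rule and the names `classify_points` and `complex_areas` are assumptions made for illustration.

```python
from collections import deque

# Offsets (row, col) of the eight neighbours of a pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def classify_points(skeleton):
    """Label every skeleton pixel as 'end', 'common' or 'complex'.

    Assumed rule (not taken from the paper): count the skeleton pixels
    in the 8-neighbourhood; <=1 is an end point, 2 a common point,
    >=3 a complex point.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    labels = {}
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            n = sum(
                1
                for dr, dc in NEIGHBOURS
                if 0 <= r + dr < rows
                and 0 <= c + dc < cols
                and skeleton[r + dr][c + dc]
            )
            if n <= 1:
                labels[(r, c)] = "end"
            elif n == 2:
                labels[(r, c)] = "common"
            else:
                labels[(r, c)] = "complex"
    return labels


def complex_areas(labels):
    """Group 8-connected complex points into complex areas (BFS)."""
    pending = {p for p, kind in labels.items() if kind == "complex"}
    areas = []
    while pending:
        seed = pending.pop()
        area, queue = {seed}, deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr, dc in NEIGHBOURS:
                q = (r + dr, c + dc)
                if q in pending:
                    pending.remove(q)
                    area.add(q)
                    queue.append(q)
        areas.append(area)
    return areas
```

Under these assumptions, deleting every pixel that belongs to a returned area would split the skeleton into the simple stroke segments that the reconstruction step later reconnects.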
Keywords