Research on a Fault-Tolerant Coding Method for Off-line Handwritten Chinese Character Recognition

Wang Jianping, Zhao Lixin, Wang Jinling (School of Electrical Engineering and Automation, Hefei University of Technology, Hefei 230009)

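Abstract
Handwritten Chinese character recognition is a difficult problem in the field of character recognition. To make machine recognition robust to factors such as the deformation of handwritten characters, a fault-tolerant Chinese character coding method for machine recognition is proposed, based on the fault-tolerance mechanism by which humans recognize Chinese characters, with the aim of improving the recognition rate for handwritten characters. The method first gives a fuzzy representation of the shapes of the horizontal, vertical, left-falling, and right-falling strokes. It then defines a set of character elements that imitates the way humans decompose characters, and assigns multi-category fault-tolerant codes to stroke elements that are easily confused. Next, it gives rules for determining the order of stroke elements, summarizes the radical substructures of 36 classes of simple, commonly used characters, and assigns them redundant fault-tolerant codes. It further establishes character coding rules that imitate the way humans compose characters, together with a fault-tolerant multi-template dictionary, and assigns standard codes to the more than 10,000 single characters collected in the Xinhua Dictionary (《新华字典》), with a duplicate-code rate of 0.48%. Finally, simulated recognition is carried out on 100 handwritten characters from the HCCORG and NKIM handwritten Chinese character libraries, with a recognition accuracy of 96%. The experimental results show that the coding method can generate a multi-template dictionary, tolerates the deformation of handwritten characters well, and has low duplicate-code and misrecognition rates.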
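The abstract does not give the actual fuzzy representation of stroke shapes, so the following is only a minimal sketch of one way such a scheme could look. It assumes that a stroke is approximated by a straight segment in image coordinates (y pointing down), that each stroke class has a prototype orientation, and that membership falls off linearly with angular distance. The labels `H`, `V`, `P`, `N`, the prototype angles, the 45-degree half-width, and the 0.4 threshold are all hypothetical choices made for illustration.

```python
import math

# Hypothetical prototype orientations in degrees, for undirected segments in
# image coordinates (y grows downward). These values are assumptions.
PROTOTYPES = {
    "H": 0.0,    # horizontal stroke
    "V": 90.0,   # vertical stroke
    "P": 135.0,  # left-falling stroke (top-right to bottom-left)
    "N": 45.0,   # right-falling stroke (top-left to bottom-right)
}


def orientation(x0, y0, x1, y1):
    """Undirected orientation of a segment in degrees, in [0, 180)."""
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


def membership(angle, centre, half_width=45.0):
    """Triangular membership: 1 at the prototype, 0 at half_width away."""
    diff = abs(angle - centre) % 180.0
    diff = min(diff, 180.0 - diff)  # orientations wrap around at 180 degrees
    return max(0.0, 1.0 - diff / half_width)


def stroke_labels(x0, y0, x1, y1, threshold=0.4):
    """Every stroke class whose membership reaches the threshold.

    A clean stroke gets one label; an ambiguous stroke gets several, which
    is what allows it to be coded under more than one category.
    """
    angle = orientation(x0, y0, x1, y1)
    scores = {name: membership(angle, c) for name, c in PROTOTYPES.items()}
    keep = [name for name, score in scores.items() if score >= threshold]
    return sorted(keep, key=lambda name: -scores[name])
```

With these assumed parameters, a clean left-to-right segment such as `stroke_labels(0, 0, 10, 0)` receives only the `H` label, while a steep left-leaning segment such as `stroke_labels(10, 0, 0, 24)` receives both `V` and `P`.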
Keywords
A Study of Chinese Characters Code of Bearable Mistakes Method for Off-line Handwritten Chinese Characters Recognition


Abstract
Handwritten Chinese character recognition is a difficult problem in character recognition. Based on the fault-tolerance mechanism of human character recognition, this paper presents a Chinese character coding method for machine recognition that accommodates the deformation of handwritten characters and improves the recognition rate. The shapes of the horizontal, vertical, left-falling, and right-falling strokes are defined in a fuzzy way. A set of character elements is defined for machine recognition, imitating the way humans decompose characters. Fault-tolerant codes spanning multiple categories are assigned to elements that are easily confused. Rules for judging the order of stroke elements are given. Codes for 36 kinds of radical substructures, together with redundant fault-tolerant codes, are constructed. Coding rules that imitate the way humans compose characters and a fault-tolerant multi-template dictionary are established. More than 10,000 Chinese characters in the Xin Hua Dictionary are given standard codes, with a duplicate-code rate of 0.48%. In a simulated recognition test on 100 handwritten Chinese characters from the HCCORG and NKIM handwritten Chinese character libraries, the recognition rate is 96%. The experimental results show that this coding method tolerates the deformation of handwritten Chinese characters well and that the duplicate-code and misrecognition rates are low.
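The abstract does not describe how the multi-template dictionary is built or how the duplicate-code rate is computed, so the following is a rough sketch under stated assumptions rather than the authors' method. It assumes that a standard code is a tuple of element codes, that each easily confused element expands to a small set of alternatives, and that the duplicate-code rate is the fraction of characters whose standard code is shared with at least one other character. The confusion sets in `CONFUSABLE` and the toy codes in `STANDARD` are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical confusion sets: an element code and the codes it may be read as.
CONFUSABLE = {"P": ("P", "V"), "N": ("N", "H")}


def templates(standard_code):
    """All fault-tolerant variants of one standard code (a tuple of elements)."""
    options = [CONFUSABLE.get(element, (element,)) for element in standard_code]
    return set(product(*options))


def build_dictionary(standard_codes):
    """Map every template code to the set of characters it may denote."""
    table = defaultdict(set)
    for char, code in standard_codes.items():
        for variant in templates(code):
            table[variant].add(char)
    return table


def duplicate_rate(standard_codes):
    """Fraction of characters whose standard code is shared with another one.

    This definition is an assumption; the abstract reports only the figure.
    """
    counts = defaultdict(int)
    for code in standard_codes.values():
        counts[code] += 1
    shared = sum(1 for code in standard_codes.values() if counts[code] > 1)
    return shared / len(standard_codes)


# Toy standard codes, invented purely to exercise the functions above.
STANDARD = {
    "十": ("H", "V"),
    "人": ("P", "N"),
    "八": ("P", "N"),
    "二": ("H", "H"),
}
```

In this toy setup, looking up a deformed code such as `("V", "N")` in `build_dictionary(STANDARD)` still returns the candidates 人 and 八, because the variant was generated from their standard code. The price of that tolerance is a larger dictionary and more candidates per lookup, which is why a low duplicate-code rate matters.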
Keywords
