A Fast and Effective Printed Character Recognition Algorithm

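Ren Jinchang1, Zhao Rongchun1, Zhang Wei1 (Department of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072)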

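Abstract
To achieve fast recognition of printed characters on low-cost hardware, this paper proposes a fast printed-character recognition algorithm based on multi-stage classification. The algorithm makes reasonable improvements over traditional methods at every stage, from preprocessing and feature extraction to pattern matching. First, characters are normalized to a 36×36 dot matrix rather than the traditional 48×48, which effectively reduces both the amount of computation and the dictionary size. Second, an improved coarse peripheral feature with two-fold division is used to improve the stability of the features. Finally, different decision criteria, including absolute-value distance, Euclidean distance, and a similarity criterion, are applied at the different classification stages to suit differing requirements on time and accuracy. Experiments on 7000 samples of level-1 Chinese characters show an actual correct recognition rate of 95% and a cumulative top-5 correct recognition rate of 98%, laying a solid theoretical foundation for the development of the "Electronic Reading Pen".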
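As a rough illustration of the 36×36 normalization step, the sketch below rescales a binary character bitmap to a fixed grid and computes how much smaller a 36×36 grid is than a 48×48 one. The abstract does not state how the rescaling is done, so the nearest-neighbour sampling, the bounding-box cropping, and the names `normalize_bitmap` and `grid_cost_ratio` are all assumptions made for this example, not the authors' method.

```python
def normalize_bitmap(bitmap, size=36):
    """Rescale a binary bitmap (list of rows of 0/1) to size x size.

    Assumption: crop to the foreground bounding box, then use
    nearest-neighbour sampling. The paper's abstract does not say
    which scaling scheme is actually used.
    """
    # Rows and columns that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    cols = [c for c in range(len(bitmap[0])) if any(row[c] for row in bitmap)]
    if not rows or not cols:
        # Empty image: return an all-background grid.
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1
    out = []
    for y in range(size):
        src_y = top + (y * height) // size
        out.append([bitmap[src_y][left + (x * width) // size]
                    for x in range(size)])
    return out


def grid_cost_ratio(small=36, large=48):
    """Ratio of cell counts between two square grids."""
    return (small * small) / (large * large)
```

A 36×36 grid has 1296 cells against 2304 for 48×48, so any per-pixel work or storage shrinks to 56.25% of the original. How this carries over to the feature dictionary depends on the feature dimension, which the abstract does not give.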
Keywords
A Fast and Effective Algorithm for Printed Chinese Character Recognition


Abstract
To achieve fast recognition of printed characters on low-cost hardware, this paper proposes a fast algorithm for machine-printed character recognition based on multi-stage classification. The method makes reasonable improvements over traditional approaches at each stage, from preprocessing and feature extraction to pattern classification. First, characters are normalized to a 36×36 matrix rather than the traditional 48×48 matrix, which reduces the computational cost of feature matching and the storage required for the dictionary. Second, an improved coarse peripheral feature with overlapping division is introduced to strengthen the stability of recognition. Third, different decision criteria, including absolute distance, Euclidean distance, and similarity matching, are adopted to meet different requirements on recognition speed or accuracy. In experiments on 7000 samples of first-level Chinese characters, the top-1 and top-5 correct recognition rates are 95 percent and 98 percent, respectively. This work lays a solid theoretical foundation for the research and development of our "Electronic Reading Pen" project.
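The idea of using a different criterion at each classification stage can be sketched as a candidate-pruning cascade. The abstract names the three criteria but not their order, the number of candidates kept per stage, or the exact similarity measure. The ordering below (cheap absolute distance first, Euclidean distance second, similarity last), the use of cosine similarity, and the parameters `keep_coarse` and `keep_fine` are therefore assumptions for illustration only.

```python
import math


def abs_distance(a, b):
    """Absolute-value (city-block) distance; cheapest of the three."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity; an assumed stand-in for the similarity criterion."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(feature, dictionary, keep_coarse=50, keep_fine=5):
    """Return candidate labels, best first.

    dictionary maps each label to its reference feature vector.
    Each stage re-ranks the survivors of the previous stage.
    """
    # Stage 1: prune the whole dictionary with the cheapest measure.
    stage1 = sorted(dictionary,
                    key=lambda k: abs_distance(feature, dictionary[k]))
    stage1 = stage1[:keep_coarse]
    # Stage 2: re-rank the survivors by Euclidean distance.
    stage2 = sorted(stage1,
                    key=lambda k: euclidean_distance(feature, dictionary[k]))
    stage2 = stage2[:keep_fine]
    # Stage 3: final ordering by similarity (higher is better).
    return sorted(stage2,
                  key=lambda k: cosine_similarity(feature, dictionary[k]),
                  reverse=True)
```

With `keep_fine=5`, the returned list corresponds to a top-5 candidate list, and its first element is the top-1 decision. These are the two quantities the reported recognition rates refer to.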
Keywords
