Skew Correction of Text Images Based on the Profile Projection Method

Li Cunhua (Department of Computer Science, Huaihai Institute of Technology, Lianyungang 222005)

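Summary
Capturing and processing printed documents is a heavy but unavoidable task in text information processing applications, particularly in building digital libraries. Because the optical character recognition (OCR) systems in wide use today cannot automatically process scanned text images that are skewed, a great deal of manual intervention is needed: the skew has to be corrected by hand. Scanned document sets therefore cannot be batch processed effectively, and processing efficiency is hard to improve. To address this problem, this paper discusses the properties of the profile projection of text images and uses the statistical dependence between its correlation coefficient and the text skew angle to construct an automatic skew correction method for text images.
Keywords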
Deflect Correction for Document Image Based on its Schema Histogram


Abstract
Using OCR tools to convert scanned document images into editable text files is an important step in processing printed documents, for example in text retrieval applications and digital library projects. However, the OCR systems in general use cannot work correctly or efficiently on document images that are deflected (skewed). To automate deflection correction, we study the properties of an image's schema histogram and its correlation series. The results show that for small deflection angles (less than 8°), the horizontal correlation series decays as a negative exponential of the deflection angle. Based on this, we construct a scheme that corrects the deflection automatically from the image's histogram pattern. We first choose a non-deflected sample image from the image set and compute its correlation series, which is in turn used to construct the negative exponential function. This empirical function is then used to determine the deflection angles of the whole set of document images. In practice, the method has performed very well in automatic deflection correction.
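The calibrate-then-invert idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define the correlation series or how the exponential function is built from the sample image. The sketch assumes a single scalar correlation value per image and a model of the form c = a * exp(-b * angle), fitted from calibration pairs obtained in some way (for instance by rotating the non-deflected sample through known angles). All function names and all numbers are hypothetical.

```python
import math


def horizontal_profile(image):
    """Row sums of a binary image given as a list of rows (1 = ink, 0 = background)."""
    return [sum(row) for row in image]


def fit_negative_exponential(angles, correlations):
    """Fit c = a * exp(-b * angle) by least squares on log(c).

    Assumption: every correlation value is positive, so the logarithm exists.
    Returns the pair (a, b).
    """
    n = len(angles)
    logs = [math.log(c) for c in correlations]
    mean_x = sum(angles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in angles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(angles, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def estimate_skew_magnitude(correlation, a, b):
    """Invert the fitted model to get the size (not the sign) of the angle in degrees."""
    return math.log(a / correlation) / b


# Synthetic calibration data, invented for illustration only.
calibration_angles = [0, 1, 2, 4, 6, 8]
calibration_corrs = [0.9 * math.exp(-0.25 * t) for t in calibration_angles]

a, b = fit_negative_exponential(calibration_angles, calibration_corrs)
measured = 0.9 * math.exp(-0.25 * 3)  # correlation of a hypothetical skewed page
print(estimate_skew_magnitude(measured, a, b))
```

A correlation value alone gives only the magnitude of the angle, so a real pipeline would still need some way to decide the direction of rotation, for example by trying both signs and keeping the one that yields the higher correlation. The fit should also only be trusted inside the small-angle range the abstract reports (less than 8°).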
Keywords