Segmentation and Classification of Text Page Images Based on Pattern-List Analysis

Li Yanling1, Wang Jiajun1 (School of Electronics and Information, Soochow University, Suzhou 215021)

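Abstract
To segment and classify the text and pictures in document images with complex layouts (for example, pages containing irregularly shaped picture regions embedded in the text), a new page segmentation and classification algorithm based on pattern-list analysis is proposed. The algorithm first encloses all black pixels in the image with bounding rectangles and stores them in a linked rectangle list, then merges all adjacent rectangles to form patterns, and finally classifies the patterns by their statistical features, outputting two kinds of regions: text regions and picture regions. In addition, the few uncertain patterns around large picture patterns are classified a second time with a context-based classification. Experimental results show that the algorithm is fast and that it correctly segments and classifies text and pictures in page images with complex layouts.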
Keywords
Document Page Segmentation and Classification Based on Pattern-list Analysis


Abstract
This paper introduces a new algorithm, based on pattern-list analysis, for page segmentation and classification of document images in which irregularly shaped halftone regions are embedded in the text. The algorithm consists of three steps. First, all black pixels are enclosed in bounding boxes, which are stored in a linked rectangle list. Second, all connected rectangles are grouped to form patterns, which make up the pattern list. Third, the patterns are classified as text regions or halftone regions according to their statistical features. Patterns that remain uncertain after these three steps are further classified according to the types of their contextual (neighboring) patterns. Experimental results show that the proposed algorithm is fast at segmenting text and halftone regions and that it classifies document images with complex layouts correctly.
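The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: each horizontal run of black pixels is boxed as one rectangle, rectangles are merged when they touch (within `gap` pixels), and a pattern is labeled by comparing its bounding-box size with two hypothetical thresholds, `max_text_height` and `max_text_width`. The names `Rect`, `extract_runs`, `group_patterns`, and `classify` are illustrative only, and the contextual reclassification of uncertain patterns is left out.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x0: int      # left column (inclusive)
    y0: int      # top row (inclusive)
    x1: int      # right column (inclusive)
    y1: int      # bottom row (inclusive)
    black: int   # number of black pixels covered


def extract_runs(img):
    """Step 1 (assumed form): box every horizontal run of black pixels (1 = black)."""
    rects = []
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                rects.append(Rect(start, y, x - 1, y, x - start))
            else:
                x += 1
    return rects


def touches(a, b, gap=1):
    """True when two rectangles overlap or lie within `gap` pixels of each other."""
    return (a.x0 <= b.x1 + gap and b.x0 <= a.x1 + gap and
            a.y0 <= b.y1 + gap and b.y0 <= a.y1 + gap)


def group_patterns(rects, gap=1):
    """Step 2: merge touching rectangles into patterns with a union-find structure."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Quadratic pairwise check, kept simple for illustration.
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if touches(rects[i], rects[j], gap):
                parent[find(i)] = find(j)

    groups = {}
    for i, r in enumerate(rects):
        groups.setdefault(find(i), []).append(r)

    # Each pattern is summarized by its overall bounding box and pixel count.
    return [
        Rect(min(r.x0 for r in members), min(r.y0 for r in members),
             max(r.x1 for r in members), max(r.y1 for r in members),
             sum(r.black for r in members))
        for members in groups.values()
    ]


def classify(pattern, max_text_height=20, max_text_width=20):
    """Step 3 (hypothetical rule): small patterns are text, large ones halftone."""
    height = pattern.y1 - pattern.y0 + 1
    width = pattern.x1 - pattern.x0 + 1
    if height <= max_text_height and width <= max_text_width:
        return "text"
    return "halftone"
```

Calling `group_patterns(extract_runs(img))` on a binary image given as a list of rows yields one `Rect` per pattern, and `classify` then labels each one. A real system would replace the size test with the statistical features the paper refers to, and would replace the quadratic merge with a spatially indexed search.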
Keywords
