Abstract:
Classifying document blocks as text or non-text is an important problem in document analysis. This paper addresses text/non-text classification of medical prescription images, a step that plays a major role in the subsequent stages of Optical Character Recognition (OCR). The system consists of binarization using Otsu’s method, noise removal using a median filter, skew detection and correction using the Radon transform, segmentation, feature extraction, and text/non-text classification. Each prescription image is segmented into blocks using a run-length smearing algorithm and projection techniques. The blocks are then classified by combining two techniques: a decision rule based on density features and binary Support Vector Machines (SVMs) using Histogram of Oriented Gradients (HOG) features. Experiments on a dataset of 50 medical prescription images achieved a classification rate of 92.47% using the combined density decision rule and HOG-based SVMs.
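The pixel-level steps named above are simple enough to sketch. The following pure-Python example illustrates Otsu's threshold selection, the classic run-length smearing algorithm (RLSA) that smears rows and columns separately and combines them with a logical AND, and a block density feature. The smearing constants `c_h`/`c_v`, the density bounds `lo`/`hi`, and the function `is_text_block` are hypothetical placeholders: the abstract does not give the paper's parameter values or the exact form of its decision rule. The HOG/SVM stage is not shown.

```python
# Minimal sketch, not the paper's implementation. Images are lists of rows;
# grayscale values are ints in 0..255; binary pixels are 1 = ink, 0 = background.

def otsu_threshold(pixels):
    """Return the threshold t maximizing Otsu's between-class variance.
    Class 0 is pixels <= t, class 1 is pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue  # no pixels in class 0 yet
        w1 = total - w0
        if w1 == 0:
            break  # class 1 is empty from here on
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, t):
    """Dark pixels (value <= t) become ink (1)."""
    return [[1 if p <= t else 0 for p in row] for row in image]


def rlsa_row(row, c):
    """Fill background runs of length <= c that lie between two ink pixels."""
    out = list(row)
    last_black = None
    for i, v in enumerate(row):
        if v == 1:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= c:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def rlsa_2d(binary, c_h, c_v):
    """Smear rows with c_h and columns with c_v, then AND the two results."""
    horiz = [rlsa_row(r, c_h) for r in binary]
    vert_cols = [rlsa_row(col, c_v) for col in zip(*binary)]
    vert = [list(r) for r in zip(*vert_cols)]
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]


def density(block):
    """Fraction of ink pixels in a binary block."""
    area = sum(len(r) for r in block)
    return sum(sum(r) for r in block) / area if area else 0.0


def is_text_block(block, lo=0.05, hi=0.5):
    """Hypothetical density rule: blocks with moderate ink density are labeled
    text; very sparse or very dense blocks are labeled non-text."""
    return lo <= density(block) <= hi


if __name__ == "__main__":
    t = otsu_threshold([10] * 50 + [200] * 50)
    print(t, binarize([[10, 200]], t))
```

Running the script prints `10 [[1, 0]]`. In a full pipeline the smeared image would be split into blocks (for example by projection profiles), and each block's density would feed the decision rule alongside the SVM's prediction.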