Digital Repository

Feature Extraction from Old Tamil Newspapers Using Histogram Minima

Show simple item record

dc.contributor.author Kasthuri, S.
dc.contributor.author Darsha, M.
dc.contributor.author Ranathunga, L.
dc.date.accessioned 2018-08-09T07:39:48Z
dc.date.available 2018-08-09T07:39:48Z
dc.date.issued 2018
dc.identifier.citation Kasthuri, S., Darsha, M. and Ranathunga, L. (2018). Feature Extraction from Old Tamil Newspapers Using Histogram Minima. 3rd International Conference on Advances in Computing and Technology (ICACT ‒ 2018), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka. p4. en_US
dc.identifier.uri http://repository.kln.ac.lk/handle/123456789/18972
dc.description.abstract Archaeological records which provide information about the history of human cultures and past events. Newspapers can be considered as one of the main sources of gathering archaeological data. It can be seen that there exist only a few numbers of systems for the processing of old Tamil newspaper articles. An automated image processing system proposed as a suitable solution to the way of efficient and flexible searching approach, which can be used for old Tamil newspapers. In this paper is presented image processing technique to extract the features such as headlines and sub-headlines from old Tamil newspaper scanned images. Historical newspapers become damaged over time. The images of these newspapers become difficult to read the contents. The quality of the image improved by preprocessing techniques such as grayscale dilation, median filtering, and adaptive binarization. It helps to easily extract needed information on the image. Segment the article and identify the heading of the article will help to improve data manipulation. Feature extraction from old Tamil newspaper images followed these step processes; Horizontal smoothing is necessary to distinguish the paragraphs and empty space between each column; Vertical smoothing is implemented to distinguish between each paragraph and headlines; Logical AND operation combines the outcome of horizontal smoothing and vertical smoothing using AND operation; Height measurement of each block is followed by horizontal projection, that involves scanning of pixels through horizontal arrays to measure the black pixel density against index of each row by using horizontal histogram minima. This step identified horizontals breaking points of individual regions within an article. The four major horizontal regions are headlines, sub-headline, text, and graphics. The irregular block may contain images within texts. Vertical projection can be carried out to distinguish the images among text. In the evaluation process used fifty articles which have different format of paragraph arrangements and also include images. First, identified and got the count of regions manually. After that compared the result from identified regions and got the measurements. The region was identified with articles in the efficiency of 80.09%, headline extraction accuracy was 81.616%. en_US
dc.language.iso en en_US
dc.publisher 3rd International Conference on Advances in Computing and Technology (ICACT ‒ 2018), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka. en_US
dc.subject archaeological records en_US
dc.subject image processing en_US
dc.subject headline extraction en_US
dc.subject histogram analysis en_US
dc.title Feature Extraction from Old Tamil Newspapers Using Histogram Minima en_US
dc.type Article en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search Digital Repository


Browse

My Account