Digital Repository

A Comparative Study of Clustering English News Articles Using Clustering Algorithms

dc.contributor.author Disayiram, N.
dc.contributor.author Rupasingha, R. A. H. M.
dc.date.accessioned 2022-10-31T08:50:09Z
dc.date.available 2022-10-31T08:50:09Z
dc.date.issued 2022
dc.identifier.citation Disayiram N.; Rupasingha R. A. H. M. (2022), A Comparative Study of Clustering English News Articles Using Clustering Algorithms, International Research Conference on Smart Computing and Systems Engineering (SCSE 2022), Department of Industrial Management, Faculty of Science, University of Kelaniya Sri Lanka. 108-113. en_US
dc.identifier.uri http://repository.kln.ac.lk/handle/123456789/25411
dc.description.abstract The news informs us of what is going on in the world. People nowadays read the news that interests them on news websites. There are numerous categories of news, and each newsreader has a different preference among them. Sportspeople prioritize sports news, whereas technology fans pay attention to the technology segment of the news. Ultimately, each news category is important. Every day, a large amount of information is released on news websites. News sites usually categorize the news; however, not all of the categories are published on those sites. Some categories are given more attention by news outlets, while others receive less coverage. As a result, it is difficult for newsreaders and content seekers to find the relevant sections on news websites. Clustering English news articles by their relevant category offers a solution to these issues. This research aims to use clustering algorithms to cluster news articles according to their relevant domain. We consider five news categories: politics, sports, health, technology, and business. The data collected online was converted into a vector format using term frequency-inverse document frequency (TF-IDF) vectorization. Then three clustering algorithms, Expectation-Maximization (EM), Simple K-means, and Hierarchical Clustering based on an agglomerative approach, were applied individually to the body of the news and to the news heading. The classes-to-clusters evaluation mode of the Waikato Environment for Knowledge Analysis (WEKA) tool was used to calculate the accuracy. The EM method achieved the highest accuracy, 88.5%, with the best results in terms of correctly clustered instances. The comparison between the news heading and the news body shows that clustering on the body grouped the news items better than clustering on the heading. en_US
dc.publisher Department of Industrial Management, Faculty of Science, University of Kelaniya Sri Lanka en_US
dc.subject clustering, domain, Machine Learning, news article en_US
dc.title A Comparative Study of Clustering English News Articles Using Clustering Algorithms en_US
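
The abstract describes two steps that are easy to illustrate: TF-IDF vectorization of the article text and a classes-to-clusters accuracy check. The Python sketch below is a minimal, standard-library illustration of those two ideas only. It is not the authors' WEKA workflow, the clustering step itself (EM, K-means, or agglomerative) is left out, and the function names `tfidf` and `classes_to_clusters_accuracy`, the whitespace tokenizer, the `log(N / df)` weighting, and the one-to-one cluster-to-class matching are all assumptions made for this example; WEKA's exact procedure may differ.

```python
import math
from collections import Counter
from itertools import permutations


def tfidf(docs):
    """Return (vocab, vectors) with weights tf * log(N / df).

    Assumption: lowercase whitespace tokenization, no stop-word removal.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def classes_to_clusters_accuracy(cluster_ids, labels):
    """Best accuracy over one-to-one assignments of classes to clusters.

    Brute force over permutations, which is fine for a handful of
    categories such as the five used in the study.
    """
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(labels))
    counts = Counter(zip(cluster_ids, labels))
    # Pad with None so surplus clusters can be left without a class.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(padded, len(clusters)):
        correct = sum(counts[(c, k)] for c, k in zip(clusters, perm))
        best = max(best, correct)
    return best / len(labels)


# Hypothetical toy data, not taken from the study.
docs = ["goal match team", "election vote team"]
vocab, vectors = tfidf(docs)

cluster_ids = [0, 0, 1, 1, 1, 2]
labels = ["sports", "sports", "politics", "politics", "sports", "health"]
accuracy = classes_to_clusters_accuracy(cluster_ids, labels)
```

In this toy example the term "team" occurs in both documents, so its TF-IDF weight is zero, and five of the six hypothetical articles fall into the cluster matched to their category.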

