Topic modeling in automatic categorization of news texts


Topic modeling is a text mining method for discovering the underlying semantic structure of large document collections. In this paper, we propose a novel approach to the automatic categorization of news texts that combines topic modeling with automatic topic label assignment. Topic modeling is performed with several algorithms, including latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and biterm topic modeling (BTM). Topic labels are then generated with the ChatGPT language model, and the candidate labels are evaluated through human assessment. Our experiments demonstrate that the proposed approach can serve as an effective tool for automatic text categorization. The results may be of interest to specialists in applied and computational linguistics, media communications, and science journalism.