<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="ru">
  <front xmlns:xlink="http://www.w3.org/1999/xlink">
    <journal-meta>
      <journal-id journal-id-type="elibrary">80301</journal-id>
      <journal-title-group>
        <journal-title>Terra Linguistica</journal-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Terra Linguistica</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2782-5450</issn>
    </journal-meta>
    <article-meta xmlns:xlink="http://www.w3.org/1999/xlink">
      <article-id pub-id-type="publisher-id">7</article-id>
      <article-id pub-id-type="doi">10.18721/JHSS.14207</article-id>
      <title-group>
        <article-title>Topic modeling in automatic categorization of news texts</article-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Тематическое моделирование в задаче автоматической рубрикации новостных текстов</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Ten</surname>
            <given-names>Lia</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>lia.ten136@gmail.com</email>
        </contrib>
      </contrib-group>
      <aff id="aff1">St. Petersburg State University</aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2023-06-30">
        <day>30</day>
        <month>06</month>
        <year>2023</year>
      </pub-date>
      <volume>14</volume>
      <issue>2</issue>
      <fpage>77</fpage>
      <lpage>91</lpage>
      <self-uri xmlns:xlink="http://www.w3.org/1999/xlink" content-type="pdf" xlink:href="https://human.spbstu.ru/userfiles/files/articles/2023/2/77-91.pdf"/>
      <abstract xml:lang="en">
        <p>Topic modeling is a text mining method for discovering the underlying semantic structure of large document collections. In this paper, we propose a novel approach to the automatic categorization of news texts that combines topic modeling with automatic topic label assignment. Topic modeling is performed with several algorithms, including latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and biterm topic modeling (BTM). Topic labels are assigned using the ChatGPT language model, and candidate labels are evaluated through human assessment. Our experiments demonstrate that the proposed algorithm can serve as an effective tool for automatic text categorization. The results may be of interest to experts in applied and computational linguistics, media communications, and science journalism.</p>
      </abstract>
      <kwd-group xml:lang="en">
        <kwd>text categorization</kwd>
        <kwd>topic modeling</kwd>
        <kwd>topic label assignment</kwd>
        <kwd>news texts</kwd>
        <kwd>ChatGPT</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
