<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="elibrary">80301</journal-id>
      <journal-title-group>
        <journal-title>Terra Linguistica</journal-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Terra Linguistica</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2782-5450</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">7</article-id>
      <article-id pub-id-type="doi">10.18721/JHSS.14107</article-id>
      <title-group>
        <article-title>Dynamic topic modelling of the Russian legal text corpus</article-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Динамическое тематическое моделирование русскоязычного корпуса юридических документов</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-3008-5514</contrib-id>
          <name>
            <surname>Mitrofanova</surname>
            <given-names>Olga A.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>o.mitrofanova@spbu.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Athugodage</surname>
            <given-names>Mark</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>m.athugodage@yahoo.com</email>
        </contrib>
      </contrib-group>
      <aff id="aff1">St. Petersburg State University</aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2023-03-31">
        <day>31</day>
        <month>03</month>
        <year>2023</year>
      </pub-date>
      <volume>14</volume>
      <issue>1</issue>
      <fpage>70</fpage>
      <lpage>87</lpage>
      <abstract xml:lang="en">
        <p>The article is devoted to the dynamic topic modelling of legislative acts, decrees of senior officials, and resolutions of the Supreme and Constitutional Courts dated 2008–2022, which are included in the research corpus of Russian legal documents. The article describes the procedures for constructing and preprocessing the corpus and for training topic models on it. We consider both a standard topic model and a dynamic topic model that takes into account changes in topics over time. After training the models under various conditions, we determined a set of optimal training parameters. The main tool for topic modelling was the BERTopic library, which combines algorithms for constructing topic models with contextualized neural network models of distributed vector representations. The research data may be of interest both to specialists in computational linguistics and to sociologists, political scientists, and lawyers working with legislative documents.</p>
      </abstract>
      <kwd-group xml:lang="en">
        <kwd>topic modelling</kwd>
        <kwd>dynamic topic model</kwd>
        <kwd>BERTopic</kwd>
        <kwd>Russian corpus of legal documents</kwd>
        <kwd>Russian gazette</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
