<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="elibrary">80301</journal-id>
      <journal-title-group>
        <journal-title>Terra Linguistica</journal-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Terra Linguistica</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2782-5450</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">7</article-id>
      <article-id pub-id-type="doi">10.18721/JHSS.14307</article-id>
      <title-group>
        <article-title>Statistical methods in lexicographic research: representing frequency vocabulary</article-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Статистические методы в лексикографических исследованиях: представление частотной лексики</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0001-9085-0284</contrib-id>
          <contrib-id contrib-id-type="scopus">56088078800</contrib-id>
          <contrib-id contrib-id-type="researcherid">M-9533-2013</contrib-id>
          <name>
            <surname>Khokhlova</surname>
            <given-names>Maria</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>m.khokhlova@spbu.ru</email>
        </contrib>
      </contrib-group>
      <aff id="aff1">St. Petersburg State University</aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2023-09-29">
        <day>29</day>
        <month>09</month>
        <year>2023</year>
      </pub-date>
      <volume>14</volume>
      <issue>3</issue>
      <fpage>80</fpage>
      <lpage>93</lpage>
      <abstract xml:lang="en">
        <p>Statistical methods have long been used in linguistics. Recently, however, information technologies have driven the development of statistical tools, which are now used more actively for applied tasks, including the processing and presentation of text data. The paper describes a number of statistical measures used in lexicographic research that draws on a frequency dictionary of the Russian language, text corpora, and databases presenting information on lexical collocability. These measures are applied to differentiate vocabulary on various grounds and to identify key words and phrases characteristic of texts of a particular style or topic. The paper also gives a brief historical overview of the application of quantitative methods to text analysis.</p>
      </abstract>
      <kwd-group xml:lang="en">
        <kwd>statistical methods</kwd>
        <kwd>text corpora</kwd>
        <kwd>frequency dictionaries</kwd>
        <kwd>collocations</kwd>
        <kwd>databases</kwd>
        <kwd>Russian language</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
