Selection of term extractors to identify nominations of language policy concepts in the texts of official documents of the European Union

Applied Linguistics
Authors:
Abstract:

Term extractors are automatic tools that help identify term candidates in a corpus. Because many such tools exist, criteria are needed for selecting among them for a specific research task. This article presents a comparative analysis of term extractors with respect to their accessibility and their effectiveness in extracting term candidates from corpora for one specific research problem: the inventory of nominations of language policy concepts in the texts of official documents of the European Union. The study relies on the taxonomic method, explanatory description, generalization, and comparative analysis. It examines five term extractors (AntConc, fivefilters.org, OneClick Terms, TerMine, and Terminology Extraction) together with the corpus query tool Sketch Engine. The taxonomic analysis identified the tools that best met the criteria: the online extractor OneClick Terms and the corpus query tool Sketch Engine. These two tools were then compared on the research problem described above. To test their effectiveness, their output was compared with a list of manually extracted terms, which made it possible to apply the criteria of completeness (recall) and accuracy (precision) traditionally used in information retrieval. Because the research objective is a term inventory, completeness was the most important characteristic, and in this respect the corpus query tool Sketch Engine proved to be the optimal extractor. Thus, the paper presents a comprehensive approach to assessing terminological extractors not by how well they extract terms that reflect the concepts of a particular subject area, but by how effective they are for solving a specific research problem.
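
The comparison against a manually extracted term list can be illustrated with a minimal sketch. It assumes that completeness corresponds to recall and accuracy to precision in the usual information retrieval sense, and that candidates are matched to the manual list by exact string comparison after lowercasing and whitespace normalization. The function names and the sample terms are hypothetical and are not taken from the study's data.

```python
from typing import Iterable, Tuple


def normalize(term: str) -> str:
    """Lowercase a term and collapse internal whitespace (an assumed matching rule)."""
    return " ".join(term.lower().split())


def precision_recall(candidates: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) of extracted candidates against a manual term list.

    precision (accuracy)  = matched candidates / all candidates
    recall (completeness) = matched candidates / all manually extracted terms
    """
    cand = {normalize(t) for t in candidates}
    ref = {normalize(t) for t in gold}
    hits = cand & ref
    precision = len(hits) / len(cand) if cand else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
manual_terms = ["language policy", "official language", "multilingualism", "minority language"]
extractor_output = ["Language policy", "multilingualism", "member state", "official  language", "regulation"]

precision, recall = precision_recall(extractor_output, manual_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

For an inventory task, where missing a term costs more than reviewing a false candidate, the extractor with the higher recall would be preferred even if its precision is lower.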