Searching for multicomponent terms in comparable scientific corpora

Applied Linguistics

The paper proposes using full-text parallel/comparable corpora with a “built-in” component of machine translation (MT) output for term extraction, harmonization and translation, since analyzing and comparing these texts makes it possible to identify terminological units for dictionary entries. We focus on the complex, non-parallel structure of English multicomponent terminological noun phrases (NPs) and on their variants and modifications within the same text, which together motivate a three-part text corpus comprising parallel/comparable texts and their MT translations. The research shows that multicomponent terminological NPs are not only characteristic of scientific text but also exhibit ambiguous dependency relations caused by their syntactic compression, which normally results from the convolution of a sentence or of another NP. These modifications result from a number of standard procedures described in the paper.