Linguistic features for detecting hidden network communities


Modern linguistic diagnostic procedures for studying social network texts need improvement. One unresolved problem is identifying the linguistic features that are significant for profiling members of hidden communities. The aim of this research is to develop a hybrid algorithm for detecting hidden network communities that takes into account users' interests and the topics of their posts and is based on contextualized language models. This approach was chosen because algorithms that detect hidden communities by purely mathematical methods rely on formal parameters rather than linguistic ones, which may distort the number of detected communities and their properties. The research material is a corpus of more than 10,000 VK posts in Russian. Applying the hybrid algorithm to this corpus, the authors detected 34 hidden communities. The proposed methodology for identifying and profiling hidden communities is of interest to media researchers who study the architecture of social networks. The approach can be integrated into existing automatic group moderation systems and network trend forecasting systems.
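
The general idea of combining formal and linguistic parameters can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the blending weight `alpha`, the `threshold`, and the toy two-dimensional vectors (which stand in for post embeddings that a contextualized language model would produce) are all assumptions made for illustration. Each pair of users receives a score that blends a formal signal (an explicit link) with a linguistic one (cosine similarity of their post vectors), and users whose score passes the threshold are merged into one community.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_communities(edges, embeddings, alpha=0.5, threshold=0.45):
    """Group users by a blend of link structure and text similarity.

    edges      -- iterable of (user, user) pairs, the formal signal
    embeddings -- dict user -> vector, a stand-in for language-model output
    alpha      -- hypothetical weight of the formal signal (0..1)
    threshold  -- hypothetical minimum blended score for merging two users
    """
    users = sorted(embeddings)
    linked = {frozenset(edge) for edge in edges}
    parent = {user: user for user in users}  # union-find forest

    def find(user):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for i, a in enumerate(users):
        for b in users[i + 1:]:
            structural = 1.0 if frozenset((a, b)) in linked else 0.0
            textual = cosine(embeddings[a], embeddings[b])
            score = alpha * structural + (1 - alpha) * textual
            if score >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for user in users:
        groups.setdefault(find(user), []).append(user)
    return sorted(groups.values())


# Toy data: only ann and bob are explicitly linked, but cy and dee
# write about the same topic (their vectors point the same way).
toy_embeddings = {
    "ann": [1.0, 0.0],
    "bob": [0.9, 0.1],
    "cy": [0.0, 1.0],
    "dee": [0.1, 0.9],
}
toy_edges = [("ann", "bob")]

communities = detect_communities(toy_edges, toy_embeddings)
print(communities)
```

In this toy run, cy and dee end up in one group although no explicit link connects them, which is the kind of community that formal parameters alone would miss. With `alpha=1.0` the linguistic signal is ignored and the same two users remain separate.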