Domain-specific Expertise in Economics, Business and Finance of Research Institutions in Poland
DOI:
https://doi.org/10.12775/%20EiP.2025.23
Keywords
topic models, text analysis, latent Dirichlet allocation, research topics, research expertise
Abstract
Motivation: The efficacy of research institutions, including those active in economics, business and finance, is often measured by the high-quality scientific articles they publish. Whether scientific output depends on thematic specialization is an open question. The scientific profile of a research entity can be estimated from the publications affiliated with it using topic modelling techniques. The resulting profiles can be used to compare the research mixes of different institutions and to measure their degree of specialization.
Aim: We apply the latent Dirichlet allocation (LDA) model to evaluate research diversity in economics, business and finance at major scientific institutions in Poland. The importance of individual research areas is evaluated using two metrics.
Results: The resulting topic rankings show how expert knowledge and specialization are distributed across research entities in Poland.
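The idea of deriving institutional research profiles from topic shares can be illustrated with a minimal sketch. It assumes that document-topic proportions have already been estimated by an LDA model. The numbers, the institution labels "A" and "B", and the function names are hypothetical. The Herfindahl index is used here only as one possible specialization measure, since the abstract does not state which two metrics the authors use.

```python
from collections import defaultdict


def institution_profiles(doc_topics, affiliations):
    """Average the document-topic proportions over each institution's documents."""
    sums = {}
    counts = defaultdict(int)
    for theta, inst in zip(doc_topics, affiliations):
        if inst not in sums:
            sums[inst] = [0.0] * len(theta)
        sums[inst] = [s + t for s, t in zip(sums[inst], theta)]
        counts[inst] += 1
    return {inst: [s / counts[inst] for s in total] for inst, total in sums.items()}


def herfindahl(profile):
    """Sum of squared topic shares: 1/K for an even mix of K topics, 1 for one topic."""
    return sum(p * p for p in profile)


def rank_topics(profile):
    """Topic indices ordered from the largest to the smallest share."""
    return sorted(range(len(profile)), key=lambda k: profile[k], reverse=True)


# Hypothetical document-topic proportions for five papers and three topics,
# as they might come out of a fitted LDA model (each row sums to 1).
doc_topics = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
    [0.5, 0.2, 0.3],
]
# Hypothetical affiliation of each paper.
affiliations = ["A", "A", "B", "B", "B"]

profiles = institution_profiles(doc_topics, affiliations)
for inst, profile in sorted(profiles.items()):
    shares = [round(p, 3) for p in profile]
    print(inst, shares, rank_topics(profile), round(herfindahl(profile), 3))
```

In this toy example, institution "A" concentrates on the first topic and therefore has a higher Herfindahl index than institution "B", whose papers are spread more evenly across topics.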
License
Copyright (c) 2025 Anna Staszewska-Bystrova, Victor Bystrov

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.