The fourth V, as in evolution: How evolutionary linguistics can contribute to data science

Maciej Pokornowski



This paper argues for closer interaction between data science and evolutionary linguistics and points to the potential benefits for both disciplines. In the context of big data, the microblogging service Twitter can be treated as a source of empirical input for analyses of language evolution. To make use of this disciplinary interplay, I propose a model, adapted from the Iterated Learning framework, for investigating the glossogenetic evolution of sublanguages.



Data science; evolutionary linguistics; natural language processing; Twitter; glossogeny; Iterated Learning framework







ISSN 2392-1196 (online)
