Registers in Contact

Registers in contact: linguistic evolution of specialized scientific registers
(supported by DFG 2010 - 2013)
The topic of the present project is the linguistic evolution of functional variation in highly specialized scientific domains. The project aims to analyze domain-specific variation that has emerged through register contact in a corpus of English scientific texts. Our focus lies on scientific domains or disciplines at the boundaries of computer science (e.g. computational linguistics, bioinformatics). The central question of the project is: what linguistic means are used to create a distinctive identity for disciplines that have emerged through register contact? From a linguistic point of view, the subject of study is a phenomenon of recent language change that is related not to the language system but to language use. To study it, we take an existing synchronic corpus (DaSciTex), which consists of English scientific texts from the early 2000s, and extend it diachronically (1970s/1980s) to form the SciTex corpus (English Scientific Text Corpus). Our aim is to use the corpus to analyze empirically the linguistic evolution of selected scientific registers. Methodologically, the study is based on English register theory. We employ methods already established in corpus linguistics as well as new probabilistic methods for identifying text similarities.
Corpus structure and corpus size
SciTex contains the following subcorpora:
- computer science (A subcorpus)
- four contact disciplines (B subcorpus)
(computational linguistics, bioinformatics, digital construction and microelectronics)
- four disciplines of origin (C subcorpus)
(linguistics, biology, mechanical engineering and electrical engineering)
SciTex is divided into DaSciTex (early 2000s) and SaSciTex (1970s/1980s). The corpus as a whole amounts to approx. 34 million tokens.
The corpus consists of:
- two small, cleaned-up corpora with approx. 1 million tokens each for grammatical analyses (one for the 1970s/1980s and one for the early 2000s) and
- two big corpora (1970s/1980s and early 2000s) with approx. 17 million tokens each for lexical analyses (the small corpora are included in the big ones)
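The size figures above can be cross-checked with a short sketch. It is illustrative only: the dictionary and function names are hypothetical, and the token counts are the approximate values stated above, in millions. It shows why the total is about 34 million rather than 36 million: the small corpora are subsets of the big ones and must not be counted twice.

```python
# Approximate token counts in millions, taken from the description above.
# All names here are hypothetical and used only for illustration.
BIG_CORPORA = {
    "SaSciTex (1970s/1980s)": 17.0,
    "DaSciTex (early 2000s)": 17.0,
}
SMALL_CORPORA = {
    "SaSciTex (1970s/1980s)": 1.0,
    "DaSciTex (early 2000s)": 1.0,
}


def total_tokens_millions(big, small, small_included_in_big=True):
    """Return the overall corpus size in millions of tokens.

    If the small corpora are subsets of the big ones, they add nothing
    to the total; otherwise their sizes are added on top.
    """
    total = sum(big.values())
    if not small_included_in_big:
        total += sum(small.values())
    return total


if __name__ == "__main__":
    print(total_tokens_millions(BIG_CORPORA, SMALL_CORPORA))
```

Calling the function with `small_included_in_big=False` gives the figure that would result from double counting the small corpora.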
Project-related publications
Teich, E., Degaetano-Ortlieb, S., Fankhauser, P., Kermes, H., and Lapshinova-Koltunski, E. (2014). The Linguistic Construal of Disciplinarity: A Data Mining Approach Using Register Features. Journal of the Association for Information Science and Technology (JASIST). To appear.
Degaetano-Ortlieb, S., Fankhauser, P., Kermes, H., Lapshinova-Koltunski, E., Ordan, N. and Teich, E. (2014). Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers. Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014). Reykjavik, Iceland.
Degaetano-Ortlieb, S. and Teich, E. (2014). Register diversification in evaluative language: The case of scientific writing. In Geoff Thompson and Laura Alba-Juez (eds). Evaluation in Context. John Benjamins Publishing Company, pp. 241-258.
Fankhauser, P., Knappen, J. and Teich, E. (2014). Exploring and Visualizing Variation in Language Resources. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland.
Fankhauser, P., Kermes, H. and Teich, E. (2014). Combining Macro- and Microanalysis for Exploring the Construal of Scientific Disciplinarity. Digital Humanities Conference. Lausanne, Switzerland.
Degaetano-Ortlieb, S., Kermes, H., Lapshinova-Koltunski, E. and Teich, E. (2013). SciTex - A Diachronic Corpus for Analyzing the Development of Scientific Registers. In Bennett, P., Durrell, M., Scheible, S. and Whitt, R. J., eds. New Methods in Historical Corpus Linguistics. Corpus Linguistics and Interdisciplinary Perspectives on Language - CLIP, Volume 3. Tübingen, Narr.
Degaetano-Ortlieb, S., Kermes, H. and Teich, E. (2013). The notion of importance in academic writing: detection, linguistic properties and targets. Proceedings of the 2nd Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis (PATHOS). Darmstadt, Germany.
Degaetano-Ortlieb, S., Kermes, H., Lapshinova-Koltunski, E. and Teich, E. (2013). Automatic text classification for diachronic register analysis. 24th European Systemic Functional Linguistics Conference and Workshop (ESFLCW2013). Coventry, UK.
Degaetano-Ortlieb, S., Lapshinova-Koltunski, E., Kermes, H. and Teich, E. (2013). Procedures for Automatic Corpus Enrichment. Corpus Linguistics. Lancaster, UK.
Degaetano-Ortlieb, S. and Teich, E. (2013). A methodology to analyze evaluation across scientific disciplines - feature detection, extraction and annotation. Corpus Linguistics - Workshop Evaluative Language and Corpus Linguistics. Lancaster, UK.
Degaetano-Ortlieb, S. and Teich, E. (2013). Detection, extraction and annotation of evaluative expressions in a corpus of academic writing. Deutsche Gesellschaft für Sprachwissenschaft (DGfS) Sektion Computerlinguistik - Postersession. Potsdam, Germany.
Lapshinova-Koltunski, E., Degaetano-Ortlieb, S., Kermes, H. and Teich, E. (2013). Linguistic evolution of conjunctive relations in emerging scientific registers. In: F. Poppi and W. Cheng (eds). The three waves of globalization: winds of change in Professional, Institutional and Academic Genres. Cambridge Scholars Publishing, Cambridge, UK.
Lapshinova-Koltunski, E., Degaetano-Ortlieb, S., Kermes, H. and Teich, E. (2013). Usefulness of Corpora Enriched with Annotations on Abstract Linguistic Levels. Genre- and Register-related Text and Discourse Features in Multilingual Corpora. Brussels, Belgium.
Teich, E., Degaetano-Ortlieb, S., Kermes, H. and Lapshinova-Koltunski, E. (2013). Scientific registers and disciplinary diversification: a comparable corpus approach. Proceedings of 6th Workshop on Building and Using Comparable Corpora (BUCC). Sofia, Bulgaria.
Degaetano-Ortlieb, S., Lapshinova-Koltunski, E. and Teich, E. (2012). Domain-specific variation of sentiment expressions: exploring a model of analysis for academic writing. 1st Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis (PATHOS) at Konvens2012. Vienna.
Degaetano-Ortlieb, S., Lapshinova-Koltunski, E. and Teich, E. (2012). Feature Discovery for Diachronic Register Analysis: a Semi-Automatic Approach. Proceedings of the LREC2012. Istanbul.
Kermes, H. (2012). Formulaic expressions: in this paper but where? Proceedings of ICAME 33. Leuven.
Kermes, H. (2012). A methodology for the extraction of information about the usage of formulaic expressions in scientific texts. Proceedings of LREC 2012. Istanbul.
Kermes, H. and Teich, E. (2012). Formulaic expressions in scientific texts: Corpus design, extraction and exploration. Lexicographica, Volume 28(1). De Gruyter, pages 99-120.
Lapshinova-Koltunski, E., Teich, E. and Degaetano-Ortlieb, S. (2012). Tracing 'hybridity' in academic discourse: a corpus-based approach. Proceedings of the European Systemic Functional Linguistics Conference and Workshops (ESFLCW 2012). Bertinoro.
Lapshinova-Koltunski, E., Teich, E. and Degaetano-Ortlieb, S. (2012). Terminology now and then: changes across periods in academic writing. Proceedings of the ICAME 33. Leuven.
Lyding, V., Lapshinova-Koltunski, E., Degaetano-Ortlieb, S., Dittmann, H. and Culy, C. (2012). Visualising Linguistic Evolution in Academic Discourse. Proceedings of the EACL 2012. Avignon.
Teich, E., Degaetano-Ortlieb, S., Lapshinova-Koltunski, E. and Kermes, H. (2012). Register contact: an exploration of recent linguistic trends in the scientific domain. Proceedings of Historical Corpora 2012. Frankfurt.
Bartsch, Sabine, and Elke Teich, 2011. Register profiling of scientific texts: Experiences in linguistic description and corpus-based methods. ICAME32: Trends and Traditions in English Corpus Linguistics, In Honour of Stig Johansson. 127-128. Oslo, Norway. 1-5 June.
Bartsch, Sabine, and Elke Teich, 2011. Register Profiling for Highly Specialized Domains: Methods and Techniques. Anglistentag 2011, Sektion 'Approaches to Linguistic Variation'. Freiburg Brsg., Germany. September.
Bartsch, Sabine, Teich, Elke, Tragl, Christoph, 2011. Patterns of cohesion in informationally dense texts. Corpus Linguistics (CL2011). Birmingham. 20-22 July.
Degaetano-Ortlieb, Stefania, Hannah Kermes, Ekaterina Lapshinova-Koltunski and Elke Teich, 2012. SciTex – A Diachronic Corpus for Analyzing the Development of Scientific Registers. In: P. Bennett, M. Durrell, S. Scheible and R. J. Whitt, editors. Corpus Linguistics and Interdisciplinary Perspectives on Language - CLIP, Vol. 2. New Methods in Historical Corpus Linguistics. Narr. Tübingen, Germany.
Degaetano-Ortlieb, Stefania, Ekaterina Lapshinova-Koltunski, Elke Teich, 2012. Feature Discovery for Diachronic Register Analysis: a Semi-Automatic Approach. In Proceedings of the LREC-2012. Istanbul. 21-27 May.
Degaetano, Stefania, and Elke Teich, 2011. The lexico-grammar of stance: an exploratory analysis of scientific texts. In: St. Dipper and H. Zinsmeister, editors. Bochumer Linguistische Arbeitsberichte 3 - Beyond Semantics: Corpus-based Investigations of Pragmatic and Discourse Phenomena. 23-25 February.
Degaetano, Stefania, Teich, Elke, 2011. Exploring a semi-automatic approach for the analysis of interpersonal meaning in large corpora. International Systemic Functional Linguistics Conference (ISFC38). Lisbon. 25-29 July.
Degaetano, Stefania, 2011. Evaluative options and their choice - modal adjuncts vs. evaluative patterns in academic writing. International Evaluation Conference 2011 (IntEval). Madrid, Spain. October.
Degaetano, Stefania, 2011. Evaluation across scientific disciplines - a corpus-based analysis. Corpus Linguistics (CL2011). Birmingham. 20-22 July.
Kermes, Hannah, 2012. Methodology for the extraction of information about the usage of formulaic expressions in scientific texts. In Proceedings of the LREC 2012. June.
Kermes, Hannah, 2012. Formulaic expressions: in this paper but where? In Proceedings of the ICAME 33. May.
Kermes, Hannah, 2011. Usage and function of formulaic expressions in scientific texts. Corpus Linguistics 2011. 86-87. Birmingham, UK. July.
Kermes, Hannah and Elke Teich. Formulaic expressions in scientific texts: Corpus design and extraction pipeline. Lexicographica. to appear.
Lapshinova-Koltunski, Ekaterina, Stefania Degaetano-Ortlieb, Elke Teich and Hannah Kermes, 2012. Usefulness of Corpora Enriched with Annotations on Abstract Linguistic Levels. In Proceedings of Genre- and Register-related Text and Discourse Features in Multilingual Corpora. 11-12 January.
Lapshinova-Koltunski, Ekaterina, Elke Teich and Stefania Degaetano-Ortlieb, 2012. Terminology now and then: changes across periods in academic writing. In Proceedings of the ICAME 33. May.
Lapshinova-Koltunski, Ekaterina, Elke Teich and Stefania Degaetano-Ortlieb, 2012. Tracing 'hybridity' in academic discourse: a corpus-based approach. In Proceedings of the ESFLCW2012. July.
Lapshinova, Ekaterina, Degaetano, Stefania, Teich, Elke, 2011. Interdisciplinarity in academic discourse - a corpus-based analysis. Interdisciplinary Linguistics Conference (ILinC2011). Belfast. 14-15 October.
Lyding, Verena, Ekaterina Lapshinova-Koltunski, Stefania Degaetano-Ortlieb, Henrik Dittmann and Christopher Culy, 2012. Visualising Linguistic Evolution in Academic Discourse. In Proceedings of the EACL2012. Avignon, France. 23-27 April.
Teich, Elke, Ekaterina Lapshinova, Stefania Degaetano, 2012. Terminology now and then: changes across periods in academic writing. In Proceedings of the ICAME-33. Leuven. June.
Teich, Elke, Ekaterina Lapshinova, Hannah Kermes and Stefania Degaetano, 2011. Linguistic evolution of emerging scientific registers. Clavier2011. Modena, Italy. November.