Bachelor/Master Thesis

Topics

You can sometimes find open topics posted here. These are topics for which we are actively looking for students with a matching interest. Other topics are also possible, as long as they fit the research topics of the group. You may also get in touch with Professor Demberg or one of the Postdocs or PhDs to inquire about current topics. Or, if you have your own idea, please feel free to suggest it to Prof. Demberg.

Variance of discourse relation interpretation in translation between English and German/French

Discourse relations are semantic links between text segments. They can be marked explicitly by a discourse connective such as “because” or “however”, or expressed implicitly, i.e. without a specific discourse marker. Sometimes the relations are ambiguous and allow for multiple readings, particularly in the case of implicit relations.

Consider the relation between the following two sentences, taken from the novel Animal Farm:
"I have little more to say. I merely repeat, remember always your duty of enmity towards Man and all his ways". 
"Whatever goes upon two legs is an enemy. Whatever goes upon four legs, or has wings, is a friend".
The second sentence can be interpreted as providing detail about the “duty of enmity towards Man”, but this detail can also be interpreted as the reason why Man is the enemy. As a result, some people will label this a "specification" relation, while others will label it "reason".

There is a considerable body of research on English about the interpretation of co-occurring relations, but it is not clear whether the findings generalize well to other languages. In particular, the comparison is difficult because the analysis of each language is often based on different texts.

The recently released DiscoGeM 2.0 corpus contains discourse relation annotations in four languages (English, German, French, Czech) on parallel texts and thus allows a more direct cross-lingual comparison. In addition, each instance is annotated by 10 people, which also allows comparing different interpretations within the same language.

This thesis topic focuses on the analysis of discourse relation interpretation variance WITHIN the same language, specifically German and French. We are looking for native or near-native speakers of German or French who are interested in discourse analysis.

Effective reference to groups of objects

When referring to a subgroup of a greater number of objects, we sometimes have multiple options:
We could enumerate the objects one by one (the ball on the upper left and the ball on the upper right), or we could refer to them by a common feature (the red balls). Enumerating locations has the advantage that the listener knows how many objects to attend to and does not have to engage in an exhaustive visual search. A common feature, on the other hand, can be expressed more efficiently, i.e. with a shorter utterance. But which type of reference is more effective in the end? How quickly does the listener establish reference to all members of the group for each of the two types of reference? We have some preliminary results suggesting that group size and the type of common feature play a role in this.

You will further investigate effective ways of referring to groups of objects. You will learn how to conduct online and eye-tracking experiments and how to analyse the results in order to answer your research questions. For this topic, it is beneficial if you know the basics of experimental design and have some experience using R for inferential analysis.

Guidelines

If you are an MSc student in the LST or LCT program, please see these guidelines:

https://www.uni-saarland.de/fileadmin/upload/fachrichtung/lst/pruefsek-science/Checkliste_MSc_01.pdf

Bachelor / Master Seminar and Thesis

  • You need to do the Bachelor/Master Seminar before you can register for the thesis.
  • The seminar is used to further specify the topic of the thesis, perform a literature review, identify suitable methods and formulate the hypotheses you want to test as part of your thesis. It consists of two deliverables:
        – a talk (ca. 30 min, followed by questions)
        – a seminar paper including the introduction to your topic, a literature review and the specification of methods and hypotheses (10-20 pages).
  • You should finish your thesis seminar (including handing in the seminar paper) within 6 months of starting to work on your thesis topic. If you think you will not be able to meet this deadline, please select a different group for your thesis.
  • Additionally, you are required to take part in each of our thesis writing sessions (held once a month, 3 sessions in total) and to attend at least 5 presentations (thesis seminars and thesis colloquia) of other MSc and BSc students who are doing their thesis at our chair (3 of them before your own seminar presentation).
  • After having completed both parts of the seminar, you (!) need to write an email to sek-vd(at)lst.uni-saarland.de. In this email, please put Prof. Demberg and your advisor in CC and provide the following information: your full name, your matriculation number, the date of the seminar talk, and the title of the talk that you gave. Only then will the processing of the LSF data be started.
  • LangSci students: for you the seminar has only one deliverable, namely the talk (no paper needed).
    This seminar can be taken instead of the Kolloquium BA Language Science.
  • By choosing to do your thesis with the chair of Computer Science and Computational Linguistics, you agree to us using plagiarism software on your thesis document in order to detect cases of plagiarism.

Thesis and thesis colloquium

  • You need to register your thesis. This is only possible once you have finished the Bachelor / Master seminar and the data has been entered in the LSF.
  • You need to write up your thesis (German and English templates for this can be found on TEAMS).
  • Once you have completed your thesis (shortly before or after handing in the written version), we will schedule the thesis colloquium, where you will defend your thesis.
  • You should announce it yourself to the group and also invite other students if you wish.
  • The presentation should take at most 30 minutes, followed by approximately 15 minutes of discussion. Please make sure not to exceed the 30-minute presentation time!

Etiquette for Presentations

  • Preferably come to presentations (thesis writing seminar and student presentations) in person.
  • If you need to join online, we expect you to turn on your camera.
  • If you are presenting, please make sure to be there ahead of time and test the equipment/online connection beforehand.

Grading Scheme

The following questions are considered when grading a thesis, as far as they apply to the specific thesis topic; further questions may be considered if they are relevant to it. This list is intended to give you an overview of aspects that are important for a thesis. If you have any further questions, please ask your advisor.

  • General
    • Is the thesis topic (as agreed upon initially) properly addressed?
    • Does the thesis show that the student applied appropriate scientific methods (i.e. decisions were made in an informed manner and documented properly, etc.)?
  • Related work
    • Is the selection of related work applicable and comprehensive?
    • Was the feedback on the related work during the bachelor / master seminar properly integrated in the thesis?
    • Is the related work appropriately presented (i.e., it was described in a focused way what constitutes the related work, it was clearly shown why this work is relevant for one's own work and which aspects have flowed into one's own work, etc.)?
    • Are citations used correctly and wherever needed?
    • Is the bibliography complete with consistent formatting?
  • Execution of the written part
    • Does the abstract properly describe the thesis?
    • Is the thesis structured correctly and comprehensibly?
    • Is the motivation of the thesis clearly elaborated on?
    • Does the thesis contain a clear summary of the results achieved?
    • Is there a critical discussion on the performance and the limitations of the work to reflect on the choices made?
    • Is future work thoroughly described and are connections to one's own work well presented?
    • Is the language appropriate and free of spelling mistakes?
    • Is the thesis internally consistent (e.g., special terms are always written in the same form)?
    • Is the thesis consistent and free of incorrect descriptions (i.e., there are no contradictions within the thesis, etc.)?
    • Is the thesis presented clearly and are the means of presentation appropriate (e.g., short sentences, images are used where reasonable, images are easy to understand, etc.)?
    • Is the layout of the thesis appropriate (i.e., all images are referenced, no widows and orphans, tables are properly formatted, etc.)?
  • Concept
    • Is the concept (in relation to the thesis topic) presented thoroughly in the thesis?
    • Are the hypotheses formulated clearly?
    • Is the chosen and described solution novel?
    • Is the concept appropriately presented with a motivation why this solution is the correct one to target the goal of the work?
  • For theses addressing an NLP task:
    • Has the task been addressed comprehensively?
    • Has the dataset been chosen appropriately?
    • Is the chosen method / algorithm suitable for the task?
    • Were training and testing conducted correctly? Were hyperparameters chosen based on the dev set (if applicable)? Have different random initializations been tried (if applicable)?
    • Were evaluation measures chosen appropriately?
    • Is an error analysis provided?
    • Were statistical tests conducted to test whether the obtained differences are statistically significant?
    • Is the code base made available (github or similar) and is it documented following ACL guidelines / best practices?
    • Are the descriptions in the thesis sufficient to allow for replicability?
    • Have the results of the evaluation been discussed with respect to the hypotheses of the thesis?
  • For theses in experimental psycholinguistics:
    • Are the experimental materials of good quality and well documented (well designed, no confounds) (if applicable)?
    • Is the methodological approach appropriate (addresses the hypotheses, suitable experimental design)?
    • Was the experiment implemented correctly (wrt. randomization, counter-balancing, choice of fillers, task instructions, practice trials etc.)?
    • Was the number of participants in the study chosen in a well-motivated way?
    • Were the participants selected appropriately?
    • Has the study been pre-registered?
    • Did the study follow ethical guidelines? 
    • Was the data handled in a way that is in line with data protection (pseudonymization or anonymization; appropriate storage etc)?
    • Was the data analysed correctly (statistics)?
    • Is the study described well enough so that it could be replicated?
    • Are the results presented clearly, and discussed with respect to the hypotheses?
  • For theses that contain data set collection:
    • Was the pre-processing and/or post-processing of the data performed appropriately and correctly?
    • Were the instructions given to annotators clear (annotation scheme / instructions to crowd-workers)?
    • Was the data source chosen in a well-motivated way?
    • Is the quality of the data good and have data quality checks been performed?
    • Is the dataset described using descriptive statistics?