Incorporate Peter's feedback

2024-05-24 11:13:47 +02:00 · 2024-05-24 11:13:47 +02:00 · 92f6138ebf
commit 92f6138ebf
parent d94a9f78fb
3 changed files with 42 additions and 250 deletions
--- a/bibliography.bib
+++ b/bibliography.bib
--- a/book.org
+++ b/book.org
@@ -104,7 +104,7 @@ Christophe, Wesley, Frank, Kim en Raija, bedankt!
 Verder zou ik graag Dominique willen bedanken om de rol als voorzitter van mijn jury op te nemen.

 Een eerder atypische bedanking gaat uit naar alle artiesten waarvan ik de muziek gebruikt heb om tijdens het schrijven van mijn doctoraat de concentratie te behouden.[fn::
-Ik limiteer met tot de periode van het schrijven van mijn doctoraat, want als ik alles had opgelijst dat die rol vervuld heeft in de voorbije zes jaar zou dit boek een stuk dikker geworden zijn.
+Ik limiteer me tot de periode van het schrijven van mijn doctoraat, want als ik alles had opgelijst dat die rol vervuld heeft in de voorbije zes jaar zou dit boek een stuk dikker geworden zijn.
 ]
 Dit zijn Anohni, Boygenius[fn::En ook het solo-werk van Lucy Dacus, Phoebe Bridgers en Julien Baker.], Charlotte Cardin, Eliza McLamb, Jan Swerts, Katy Kirby, Marika Hackman, Pinegrove, SOPHIE, Spinvis en Tate McRae.

@@ -657,11 +657,12 @@ Early in its development, we met with the Data Protection Officer of Ghent Unive
 We also only keep the data required for running the platform.
 This results in very little personal information being stored; only the users' names, usernames, and email addresses are stored in their profile.
 The only other data stored is data generated in the platform: submissions, evaluations, questions, answers, etc.
-In this case also, we only keep the information required for the correct functioning of these features.
+In this case too, we only keep the information required for the correct functioning of these features.
 The development of Dodona is also done in the open: the platform has been open-source since August 2019.

 The same philosophy has been extended to our research.
-All data used in Chapter\nbsp{}[[#chap:passfail]] was pseudonymized before the analysis was started.
+All data used in Chapter\nbsp{}[[#chap:passfail]] was pseudonymized before the analysis was started and no data was collected specifically to enable this research.
+Conversely, the research was restricted to data that was already collected by Dodona for its regular operations.
 The data used in the study was also not published.
 This is of course not conducive to the verifiability of the research, which is why we were very happy to see that our method could be reproduced in another context.
 The research presented in Chapter\nbsp{}[[#chap:feedback]] also doesn't rely on any personal information: only the IDs and locations of the saved feedback items were used, in addition to the relevant code.
@@ -677,7 +678,7 @@ In this chapter, we will give an overview of Dodona's most important features.
 This chapter answers the question what features a platform like Dodona needs.
 The most important feature is automated assessment, but as we show in this chapter, a lot more features than that are needed.

-This chapter is partially based on *Van Petegem, C.*, Maertens, R., Strijbol, N., Van Renterghem, J., Van der Jeugt, F., De Wever, B., Dawyndt, P., Mesuere, B., 2023. Dodona: Learn to code with a virtual co-teacher that supports active learning. /SoftwareX/ 24, 101578. https://doi.org/10.1016/j.softx.2023.101578
+This chapter is partially based on *Van Petegem, C.*, Maertens, R., Strijbol, N., Van Renterghem, J., Van der Jeugt, F., De Wever, B., Dawyndt, P., Mesuere, B., 2023. Dodona: Learn to code with a virtual co-teacher that supports active learning. /SoftwareX/ 24, 101578.
 The work described in this chapter was performed by the whole Dodona team.
 It is difficult to pinpoint who did what.
 The code and its history can be looked at[fn:: https://github.com/dodona-edu/dodona/commits/main/], but it will never give a full view of the true collaborative effort of Dodona.
@@ -991,7 +992,7 @@ We start by mentioning some facts and figures, and discussing a user study we pe
 We then explain how Dodona can be used on the basis of a case study.
 This case study also provides insight into the educational context for the research described in Chapters\nbsp{}[[#chap:passfail]]\nbsp{}and\nbsp{}[[#chap:feedback]].

-This chapter is partially based on *Van Petegem, C.*, Maertens, R., Strijbol, N., Van Renterghem, J., Van der Jeugt, F., De Wever, B., Dawyndt, P., Mesuere, B., 2023. Dodona: Learn to code with a virtual co-teacher that supports active learning. /SoftwareX/ 24, 101578. https://doi.org/10.1016/j.softx.2023.101578
+This chapter is partially based on *Van Petegem, C.*, Maertens, R., Strijbol, N., Van Renterghem, J., Van der Jeugt, F., De Wever, B., Dawyndt, P., Mesuere, B., 2023. Dodona: Learn to code with a virtual co-teacher that supports active learning. /SoftwareX/ 24, 101578.
 The course described in this chapter was mostly developed by prof. Peter Dawyndt, but has also seen numerous contributions by teaching assistants.

 ** Facts and figures
@@ -1373,7 +1374,7 @@ Given that cohort sizes are large enough, historical data from a single course e

 Dodona has grown into a widely used automated assessment platform.
 As we have shown in this chapter, students and teachers alike appreciate its extensive feature set and user-friendliness.
-By exploiting all Dodona features, it is possible to build out a highly activating course.
+By exploiting all Dodona features, it is possible to design and implement a highly activating course.
 While there is still a lot of time invested in running a course like this, the time Dodona saves can be reinvested in hands-on guidance of students and giving manual feedback on evaluations and examinations.

 * Under the hood: technical architecture and design
@@ -1900,7 +1901,7 @@ Further work then developed this proof of concept into the full judge we will pr

 We will expand on TESTed using an example exercise.
 In this exercise, students need to rotate a list.
-For example, in Python, ~rotate([0, 1, 2, 3, 4], 2)~ should return ~[3, 4, 0, 1, 2]~.
+For example, in Python, ~rotate([0, 1, 2, 3, 4], 2)~ should return the list ~[3, 4, 0, 1, 2]~.
 The goal is that teachers can write their exercises as in Listing\nbsp{}[[lst:technicaltesteddsl]].

 #+CAPTION: Example of a TESTed test plan, showing statements and expressions.
@@ -2191,8 +2192,8 @@ The infrastructure and tooling required for supporting the assessment of many su

 We now shift to the chapters where we make use of the data provided by Dodona to perform educational data mining research.

-This chapter is based on *Van Petegem, C.*, Deconinck, L., Mourisse, D., Maertens, R., Strijbol, N., Dhoedt, B., De Wever, B., Dawyndt, P., Mesuere, B., 2022. Pass/Fail Prediction in Programming Courses. /Journal of Educational Computing Research/, 68–95. https://doi.org/10.1177/07356331221085595
-It also briefly discusses the work reproduction of this research performed in Zhidkikh, D., Heilala, V., *Van Petegem, C.*, Dawyndt, P., Järvinen, M., Viitanen, S., De Wever, B., Mesuere, B., Lappalainen, V., Kettunen, L., & Hämäläinen, R., 2024. Reproducing Predictive Learning Analytics in CS1: Toward Generalizable and Explainable Models for Enhancing Student Retention. /Journal of Learning Analytics/, 1-21. https://doi.org/10.18608/jla.2024.7979
+This chapter is based on *Van Petegem, C.*, Deconinck, L., Mourisse, D., Maertens, R., Strijbol, N., Dhoedt, B., De Wever, B., Dawyndt, P., Mesuere, B., 2022. Pass/Fail Prediction in Programming Courses. /Journal of Educational Computing Research/, 68–95.
+It also briefly discusses the work reproduction of this research performed in Zhidkikh, D., Heilala, V., *Van Petegem, C.*, Dawyndt, P., Järvinen, M., Viitanen, S., De Wever, B., Mesuere, B., Lappalainen, V., Kettunen, L., & Hämäläinen, R., 2024. Reproducing Predictive Learning Analytics in CS1: Toward Generalizable and Explainable Models for Enhancing Student Retention. /Journal of Learning Analytics/, 1-21.

 The work presented in this chapter was part of the master thesis by Louise Deconinck, with the reproduction being led by Denis Zhidkikh.

@@ -2822,6 +2823,7 @@ We will then expand on some further experiments using data mining techniques we
 Section\nbsp{}[[#sec:feedbackprediction]] is based on an article that is currently being prepared for submission.

 Comments and evaluations were added to Dodona by myself.
+Niko Strijbol implemented the addition of grades to evaluations.
 Jorg Van Renterghem finalized the addition of feedback reuse.
 The work on feedback prediction was started by myself and further developed in collaboration with Kasper Demeyere during his master's thesis.

@@ -3493,8 +3495,8 @@ A skill profile would be more complicated though, since we would want some kind
 This leads right into another possibility for future research: exercise recommendation.
 Right now, learning paths in Dodona are static, determined by the teacher of the course the student is following.
 Dodona has a rich library of extra exercises, which some courses point to as opportunities for extra practice, but it is not always easy for students to know what exercises would be good for them.
-Using a skill profile, we could recommend exercise that only contain one skill where the student is behind on, allowing them to focus their practice on that skill specifically.
-We would again need to infer what skills are tested by exercises, but this was already required for the skill estimation itself as well.
+Using a skill profile, we could recommend exercises that only contain one skill the student has not fully attained, allowing them to focus their practice on that skill specifically.
+We would again need to infer what skills are tested by exercises, but this was already required for the skill estimation itself.

 The research from Chapter\nbsp{}[[#chap:passfail]] could also be used to help solve this problem in another way.
 If we know a student has a higher chance of failing the course, we might want to recommend some easier exercises.
@@ -3503,13 +3505,13 @@ Estimating the difficulty of an exercise is a problem unto itself though (and ho

 The use of LLMs in Dodona could also be an opportunity.
 As mentioned in Section\nbsp{}[[#subsec:feedbackpredictionconclusion]], a possibility for using LLMs could be to generate feedback while grading.
-By feeding an LLM with the student's code, an indication of the failed test cases (although doing this in a good format is an issue to solve in itself) and the type of issues that the teacher wants to remark upon it should be able to give a good starting point for the feedback.
+By feeding an LLM with the student's code, an indication of the failed test cases (although doing this in a good format is an issue to solve in itself) and the type of issues that the teacher wants to address, it should be able to give a good starting point for the feedback.
 This could also kickstart the process explained in Section\nbsp{}[[#subsec:feedbackpredictionconclusion]].
 By making generated feedback reusable, the given feedback can still remain consistent and fair.

 Another option is to integrate an LLM as an AI tutor (as, for example, Khan Academy has done with Khanmigo[fn:: https://www.khanmigo.ai/]).
 This way, it could interactively help students while they are learning.
-Instead of tools like ChatGPT or Bard which are typically used to get a correct answer immediately, an AI tutor can guide students to find the correct answer to an exercise by themselves.
+Instead of tools like ChatGPT or Bard which are typically used to get a correct answer immediately, an AI tutor can guide students to find the correct answer to an exercise gradually by giving hints.

 The final possibility we will present here is to prepare suggestions for answers to student questions on Dodona.
 At first glance, LLMs should be quite good at this.
@@ -3562,7 +3564,7 @@ The same might be necessary when learning to program: to learn the basics, stude
 :END:

 In this appendix, we give an overview of the most important Dodona releases, and the changes they introduced, organized per academic year.
-This is not a full overview of all Dodona releases, and does not mention all changes in a particular release.[[fn::
+This is not a full overview of all Dodona releases, and does not mention all changes in a particular release.[fn::
 A full overview of all Dodona releases, with their full changelog, can be found at https://github.com/dodona-edu/dodona/releases/.
 ]

--- a/rebuttal.md
+++ b/rebuttal.md
@@ -1,10 +1,12 @@
 # Rebuttal

-## Common remarks
+To view the actual changes made in response to the comments, see the version of the PhD that shows the difference with the version submitted in March.
+
+## Recurring remarks

 > The global research question should be more clearly stated in the text. This should also help with connecting the two parts of the dissertation.

-The section "Structure of this dissertation" has been edited to explicitly mention a global research question, and a research question for each chapter.
+The section "Structure of this dissertation" in the introduction has been edited to explicitly mention a global research question, and a research question for each chapter.

 > There should be more focus on the lessons learned when building an assessment platform like Dodona.

@@ -12,11 +14,17 @@ By adding a research question and conclusion to chapters 2, 3, and 4, this remar

 > Chapter 6 is of lower quality than the rest of the chapters.

-The work on the included article was continued after the dissertation was submitted, and that article has now been submitted to the Journal of Artificial Intelligence in Education. This version of the article is now included, which should solve the remarks on this chapter.
+The work on the included article was continued after the dissertation was submitted, and that article has now been submitted to the Journal of Artificial Intelligence in Education (currently under review). 
+The submitted version of the article is now included, which should solve the remarks on this chapter.
+In addition to textual improvements, the accuracy and the performance of the model have also been improved.

 > It would be good to compare Dodona and other modern platforms, perhaps via a comparative table.

-This comparative table has already been created by Sven Strickroth at https://systemscorpus.strickroth.net/. I tried including a selection from this table in the dissertation, but found it did not add much to the text.
+Such a comparative table has already been created by Sven Strickroth at https://systemscorpus.strickroth.net/.
+I tried including a selection from this table in the dissertation, but found it did not add much to the text.
+As a result, I finally decided not to include such a table.
+In general, a comparison should not merely be done based on the major features of most platforms (as most platforms share more or less the same features).
+Differences appear in the detailed ways these features are implemented, which require a much more in-depth discussion than simply a comparative table.

 > The final chapter is short and could be expanded upon.

@@ -74,13 +82,17 @@ This suggestion was applied.

 > It may be useful to include an appendix with an example that illustrates the templating system adopted by TESTed.

-While this would be interesting, TESTed is not the main focus of this PhD. For more detail on this, I would like to refer to the PhD of my friend and colleague Niko Strijbol, which goes much more in detail on TESTed.
+While this would be interesting, TESTed is not the main focus of this PhD. 
+For more detail on this, I would like to refer to the PhD of my friend and colleague Niko Strijbol, which goes much more in detail on TESTed.
+It is also discussed in detail in an article that is currently under review (Strijbol et al., 2024).

 ### Chapter 5

 > The fact that the fourth series is an exception can perhaps also be attributed to students facing the combined use of conditional and repetitive execution for the first time, which challenges their skills in terms of keeping track of the flow of execution.

-While true that this is challenging for students, most exercises in the third series (while repetitive execution is introduced) also contain conditional execution. Therefore I did not change the text related to this.
+While true that this is challenging for students, most exercises in the third series (while repetitive execution is introduced) also contain conditional execution. 
+What is definitely new in the fourth series are nested loops, which challenge the students.
+Therefore I did not change the text related to this.

 ## Frank Neven

@@ -94,7 +106,10 @@ This has been done.

 > In the list of publications on pages ix and x it is not very clear what the difference between the different publications is (1 and 6, 2 and 8, 4 and 5). Some seem to have the same title but published elsewhere? Is it really a different publication then?

-Some of these were publications of conference posters; these have been removed. The others are in fact different publications.
+Number 5 was the publication of a conference poster; this has been removed. 
+The others are in fact different publications.
+1 and 6 are two publications on TESTed, but on very different aspects of it.
+2 and 8 are two publications on Dolos, but also present different aspects of it.

 > In chapter 4 before explaining all implementation choices and details of TESTed in section 4.4 maybe first show a motivating example first to keep the reader's attention. (For example the example of Listing 4.5 could come much earlier.)

@@ -111,11 +126,13 @@ The mathematics has been taken out of the running text and placed in its own env
 > LLMs are becoming more important, and the possible link with auto-graders is clear. You do mention them to some extent in chapters 6 and 7 but a slightly more thorough section on that would have been preferable. (This remark is a suggestion and not a blocking factor.)

 The possibilities for future work have been expanded upon in the final chapter, including the LLM sections.
+This will also be the focus of the PhD of a new PhD student joining Team Dodona in September, Thomas Van Mullem.

 ## Raija Hämäläinen

 > The differences and similarities between learning analytics and educational data mining could be more clearly described, both theoretically and in the practical section.

+This was indeed an oversight.
 Definitions of LA and EDM have been added to the introduction to make clear what is meant by these terms in this PhD.

 > It would be interesting to know more about the division of labor between Van Petegem and the rest of the Dodona team.