Spelling and dash fixes

Charlotte Van Petegem 2023-12-05 09:48:09 +01:00
parent 510a9922f7
commit 8dc8ecdbc8
No known key found for this signature in database
GPG key ID: 019E764B7184435A

@@ -234,7 +234,7 @@ While working on a programming assignment, students will also see a clear warnin
Courses also provide an iCalendar link that students can use to publish course deadlines in their personal calendar application.
Because Dodona logs all student submissions and their metadata, including feedback and grades from automated and manual assessment, we use that data to integrate reports and learning analytics in the course page\nbsp{}[cite:@fergusonLearningAnalyticsDrivers2012].
-We also provide export wizards that enable the extraction of raw and aggregated data in CSV-format for downstream processing and educational data mining\nbsp{}[cite:@romeroEducationalDataMining2010; @bakerStateEducationalData2009].
+We also provide export wizards that enable the extraction of raw and aggregated data in CSV format for downstream processing and educational data mining\nbsp{}[cite:@romeroEducationalDataMining2010; @bakerStateEducationalData2009].
This allows teachers to better understand student behaviour, progress and knowledge, and might give deeper insight into the underlying factors that contribute to student actions\nbsp{}[cite:@ihantolaReviewRecentSystems2010].
Such understanding, knowledge and insights can be used to make informed decisions about courses and their pedagogy, to increase student engagement, and to identify at-risk students\nbsp{}[cite:@vanpetegemPassFailPrediction2022].
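As a purely hypothetical sketch of such downstream processing (the file name and column names are assumptions, not Dodona's actual export format), an exported submissions CSV could be aggregated per student with pandas:

#+BEGIN_SRC python
  # Hypothetical example of downstream processing of an exported CSV file;
  # "submissions_export.csv" and its column names are assumptions.
  import pandas as pd

  submissions = pd.read_csv("submissions_export.csv")
  per_student = submissions.groupby("student_id").agg(
      total=("status", "size"),
      correct=("status", lambda s: (s == "correct").sum()),
  )
  print(per_student.head())
#+END_SRC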
@@ -272,7 +272,7 @@ Where automatic assessment and feedback generation is outsourced to the judge li
This frees judge developers from putting effort into feedback rendering and gives a coherent look-and-feel, even for students who solve programming assignments assessed by different judges.
Because the way feedback is presented is very important\nbsp{}[cite:@maniBetterFeedbackEducational2014], we took great care in designing how feedback is displayed to make its interpretation as easy as possible (Figure\nbsp{}[[fig:whatfeedback]]).
Differences between generated and expected output are automatically highlighted for each failed test\nbsp{}[cite:@myersAnONDDifference1986], and users can swap between displaying the output lines side-by-side or interleaved to make differences more comparable.
-We even provide specific support for highlighting differences between tabular data such as CSV-files, database tables and data frames.
+We even provide specific support for highlighting differences between tabular data such as CSV files, database tables and data frames.
Users have the option to dynamically hide contexts whose test cases all succeeded, allowing them to immediately pinpoint reported mistakes in feedback that contains many successful test cases.
To ease debugging the source code of submissions for Python assignments, the Python Tutor\nbsp{}[cite:@guoOnlinePythonTutor2013] can be launched directly from any context with a combination of the submitted source code and the test code from the context.
Students typically report this as one of the most useful features of Dodona.
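As a minimal illustration of line-based difference highlighting (using Python's standard =difflib= rather than the Myers algorithm cited above, so this is not Dodona's actual implementation):

#+BEGIN_SRC python
  # Minimal illustration: compare expected and generated output line by line,
  # similar in spirit to the interleaved diff view described above.
  import difflib

  expected = ["1", "1", "2", "3", "5", "8"]
  generated = ["1", "1", "2", "3", "5", "7"]

  for line in difflib.unified_diff(expected, generated,
                                   fromfile="expected", tofile="generated",
                                   lineterm=""):
      print(line)
#+END_SRC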
@@ -468,7 +468,7 @@ Each course edition has a fixed structure, with 13 weeks of educational activiti
The final exam at the end of the term evaluates all topics covered in the entire course.
Students who fail the course during the first exam in January can take a resit exam in August/September that gives them a second chance to pass the exam.
-#+CAPTION: *Top*: Structure of the Python course that runs each academic year across a 13-week term (September-December).
+#+CAPTION: *Top*: Structure of the Python course that runs each academic year across a 13-week term (September--December).
#+CAPTION: Programming assignments from the same Dodona series are stacked vertically.
#+CAPTION: Students submit solutions for ten series with six mandatory assignments, two tests with two assignments and an exam with three assignments.
#+CAPTION: There is also a resit exam with three assignments in August/September for students who failed the first exam in January.
@@ -871,12 +871,12 @@ To ensure that the system is robust to sudden increases in workload and when ser
More specifically, the web server, database (MySQL) and caching system (Memcached) each run on their own machine.
In addition, a scalable pool of interchangeable worker servers is available to automatically assess incoming student submissions.
The deployment of the Python Tutor also saw a number of changes over the years.
-The Python Tutor itself is written in Python, so could not be part of Dodona itself
+The Python Tutor itself is written in Python, so could not be part of Dodona itself.
It started out as a Docker container on the same server as the main Dodona web application.
Because it is used mainly by students who made mistakes, the service responsible for running student code could become overwhelmed and in extreme cases even make the entire server unresponsive.
After we identified this issue, the Python Tutor was moved to its own server.
This did not prevent the Tutor itself from becoming overwhelmed, however, which meant that students who depended on the Tutor were sometimes unable to use it.
-This of course happened more during periods were the Tutor was being used a lot, such as evaluations and exams.
+This of course happened more during periods where the Tutor was being used a lot, such as evaluations and exams.
One can imagine that having the Tutor suddenly fail was not a pleasant experience for students who were already quite stressed out about the exam they were taking.
In the meantime, we had started to experiment with running Python code client-side in the browser (see section\nbsp{}[[Papyros]] for more info).
Because these experiments were successful, we migrated the Python Tutor from its own server to being run by students in their own browser using Pyodide.
@@ -1104,7 +1104,7 @@ The results are discussed from a methodological and educational perspective with
:END:
This study uses data from two introductory programming courses (referenced as course A and course B) collected during 3 editions of each course in academic years 2016--2017, 2017--2018 and 2018--2019.
-Both courses run once per academic year across a 12-week semester (September-December).
+Both courses run once per academic year across a 12-week semester (September--December).
They have separate lecturers and teaching assistants, and are taken by students of different faculties.
The two courses are structured differently, but each edition of a course follows the same structure.
Table\nbsp{}[[tab:passfailcoursestatistics]] summarizes some statistics on the course editions included in this study.
@@ -1187,8 +1187,8 @@ We did not use the actual source code submitted by students, but the status desc
Comparison of student behaviour between different editions of the same course is enabled by computing snapshots for each edition at series deadlines.
Because course editions follow the same structure, we can align their series and compare snapshots for corresponding series.
Corresponding snapshots represent student performance at intermediate points during the semester and their chronology also allows longitudinal analysis.
-Course A has snapshots for the five series on topics covered in the first unit (labelled S1-S5), a snapshot for the evaluation of the first unit (labelled E1), snapshots for the five series on topics covered in the second unit (labelled S6-S10), a snapshot for the evaluation of the second unit (labelled E2) and a snapshot for the exam (labelled E3).
-Course B has snapshots for the first ten lab sessions (labelled S1-S10), a snapshot for the first evaluation (labelled E1), snapshots for the next series of seven lab sessions (labelled S11-S17), a snapshot for the second evaluation (labelled E2), snapshots for the last three lab sessions (S18-S20) and a snapshot for the exam (labelled E3).
+Course A has snapshots for the five series on topics covered in the first unit (labelled S1--S5), a snapshot for the evaluation of the first unit (labelled E1), snapshots for the five series on topics covered in the second unit (labelled S6--S10), a snapshot for the evaluation of the second unit (labelled E2) and a snapshot for the exam (labelled E3).
+Course B has snapshots for the first ten lab sessions (labelled S1--S10), a snapshot for the first evaluation (labelled E1), snapshots for the next series of seven lab sessions (labelled S11--S17), a snapshot for the second evaluation (labelled E2), snapshots for the last three lab sessions (S18--S20) and a snapshot for the exam (labelled E3).
A snapshot of a course edition measures student performance only from information available when the snapshot was taken.
As a result, the snapshot does not take into account submissions after its timestamp.
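A minimal sketch of this snapshot restriction, assuming a pandas data frame of submission metadata with a =timestamp= column (column names are assumptions):

#+BEGIN_SRC python
  # Sketch only: a snapshot may only use submissions made up to its timestamp.
  import pandas as pd

  def snapshot(submissions: pd.DataFrame, taken_at: pd.Timestamp) -> pd.DataFrame:
      """Return the submissions that were available when the snapshot was taken."""
      return submissions[submissions["timestamp"] <= taken_at]
#+END_SRC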
@@ -1222,7 +1222,7 @@ To investigate the impact of deadline-related features, we also made predictions
:END:
We evaluated four classification algorithms to make pass/fail predictions from student behaviour: stochastic gradient descent\nbsp{}[cite:@fergusonInconsistentMaximumLikelihood1982], logistic regression [cite:@kleinbaumIntroductionLogisticRegression1994], support vector machines [cite:@cortesSupportVectorNetworks1995], and random forests [cite:@svetnikRandomForestClassification2003].
-We used implementations of the algorithms from scikit-learn\nbsp{}[cite:@pedregosaScikitlearnMachineLearning2011] and optimized model parameters for each algorithm by cross-validated grid-search over a parameter grid.
+We used implementations of the algorithms from =scikit-learn=\nbsp{}[cite:@pedregosaScikitlearnMachineLearning2011] and optimized model parameters for each algorithm by cross-validated grid-search over a parameter grid.
Readers unfamiliar with machine learning can think of these specific algorithms as black boxes, but we briefly explain the basic principles of classification to aid their understanding.
Supervised learning algorithms use a dataset that contains both inputs and desired outputs to build a model that can be used to predict the output associated with new inputs.
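The following sketch shows the general approach of cross-validated grid search with =scikit-learn=; the estimator, parameter grid and scoring metric shown here are illustrative assumptions, not the exact settings used in the study:

#+BEGIN_SRC python
  # Illustrative sketch of cross-validated grid search (assumed settings).
  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import GridSearchCV

  rng = np.random.default_rng(42)
  X = rng.normal(size=(200, 10))     # placeholder snapshot feature matrix
  y = rng.integers(0, 2, size=200)   # placeholder pass/fail labels

  search = GridSearchCV(
      estimator=LogisticRegression(max_iter=1000),
      param_grid={"C": [0.01, 0.1, 1, 10]},
      cv=5,
      scoring="balanced_accuracy",
  )
  search.fit(X, y)
  print(search.best_params_, search.best_score_)
#+END_SRC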
@@ -1387,14 +1387,14 @@ The second feature type we want to highlight is =correct_after_15m=: the number
Note that we can't directly measure how long students work on an exercise, as they may write, run and test their solutions on their local machine before their first submission to the learning platform.
Rather, this feature type measures how long it takes students to find and remedy errors in their code (debugging), after they start getting automatic feedback from the learning platform.
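An illustrative sketch of how such a feature could be computed from submission metadata (the column names and the exact definition are assumptions made for this example):

#+BEGIN_SRC python
  # Sketch: count exercises where a correct submission follows the first
  # submission within 15 minutes; column names are assumptions.
  import pandas as pd

  def correct_after_15m(submissions: pd.DataFrame) -> int:
      count = 0
      for _, group in submissions.groupby("exercise_id"):
          first = group["timestamp"].min()
          correct = group.loc[group["status"] == "correct", "timestamp"]
          if not correct.empty and correct.min() - first <= pd.Timedelta(minutes=15):
              count += 1
      return count
#+END_SRC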
-For exercise series in the first unit of course A (series 1-5), we generally see that the features have a positive impact (red).
+For exercise series in the first unit of course A (series 1--5), we generally see that the features have a positive impact (red).
This means that students will more likely pass the course if they are able to quickly remedy errors in their solutions for these exercises.
The first and fourth series are an exception here.
The fact that students need more time for the first series might reflect that learning something new is hard at the beginning, even if the exercises are still relatively easy.
Series 4 of course A covers strings as the first compound data type of Python in combination with nested loops, where (non-nested) loops themselves are covered in series 3.
This complex combination might mean that students generally need more time to debug the exercises in series 4.
-For the series of the second unit (series 6-10), we observe two different effects.
+For the series of the second unit (series 6--10), we observe two different effects.
The impact of these features is zero for the first few snapshots (grey bottom left corner).
This is because the exercises from these series were not yet published at the time of those snapshots, whereas all series of the first unit were available from the start of the semester.
For the later snapshots, we generally see a negative (blue) weight associated with the features.