I was sorry to have missed Raymond Pettit, John Homer and Roger Gee presenting the latest installment of what is becoming the enhanced compiler error messages saga at SIGCSE earlier this month. Their paper “Do Enhanced Compiler Error Messages Help Students?: Results Inconclusive” was two-pronged. It contrasted the work of Denny et al. (which provided evidence that compiler error enhancement does not make a difference to students) with my SIGCSE 2016 paper (which provided evidence that it does). It also provided fresh evidence that supports the work of Denny et al. I must say that I like Pettit et al.’s subtitle: “Results Inconclusive”. I don’t think that this is the final chapter in the saga. We need to do a lot more work on this.
Early studies in this area often didn’t include much quantifiable data. More recent studies haven’t really been measuring the same things, and they have been measuring those things in different ways. In other words, the metrics and the methodologies differ. It’s great to see work like that of Pettit et al. that is more comparable to previous work such as that of Denny et al.
One of the biggest differences between my editor, Decaf, and Pettit et al.’s tool, Athene, is that Decaf was used by students for all of their programming: practicing, working on assignments, programming for fun, even programming in despair. For most of my students it was the only compiler they used, so they made a lot of errors, and all of them were logged. Unlike Denny et al.’s students, my students did not receive skeleton code; they were writing programs, often from scratch. Athene, on the other hand, was often used by students only after they had developed their code on their own local (un-monitored) compilers. Thus, many errors generated by the students in the Pettit et al. study were not captured, and the code submitted to Athene was often already fairly refined. Pettit et al. even have evidence from some of their students that at times the code submitted to Athene contained only those errors that the students absolutely could not rectify without help.
As outlined in this post, Denny et al. and I were working towards the same goal but measuring different things. This may not be apparent at first read, but under the hood, comparing studies like these is often a little more complicated than it first looks. Of course these differences have big implications when trying to compare results. I’m afraid that the same is true when comparing my work with that of Pettit et al.: we are trying to answer the same question, but measuring different things (in different ways) in order to do so.
Specifically, Pettit et al. measured:
- the number of non-compiling submissions; as did Denny et al., but unlike me
- the number of successive non-compiling submissions that produced the same error message; Denny et al. measured the number of consecutive non-compiling submissions regardless of why the submission didn’t compile, and I measured the number of consecutive errors generating the same error message, on the same line of the same file
- the number of submission attempts (in an effort to measure student progress)
- time between submissions; neither Denny et al. nor I measured time-based metrics
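To make the contrast between these run-length metrics concrete, here is a minimal sketch. The event fields and names are hypothetical illustrations, not the log format of any of the studies discussed; it simply shows how a Denny et al.-style count of consecutive non-compiling submissions can differ from a count of consecutive occurrences of the same error message on the same line of the same file:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One logged compilation attempt (hypothetical log schema)."""
    compiled: bool
    message: str = ""   # compiler error message, empty if it compiled
    file: str = ""
    line: int = 0

def consecutive_noncompiling(events):
    """Longest run of non-compiling attempts, regardless of which
    error caused each failure (in the spirit of Denny et al.)."""
    best = run = 0
    for e in events:
        run = run + 1 if not e.compiled else 0
        best = max(best, run)
    return best

def repeated_error_runs(events):
    """Longest run of consecutive errors producing the same message
    on the same line of the same file (in the spirit of my metric)."""
    best = run = 0
    prev = None
    for e in events:
        if e.compiled:
            prev, run = None, 0
            continue
        key = (e.message, e.file, e.line)
        run = run + 1 if key == prev else 1
        prev = key
        best = max(best, run)
    return best
```

For example, two failed attempts with different error messages count as a run of two non-compiling submissions under the first metric, but only a run of one under the second, since the message changed. The two metrics can therefore tell quite different stories about the same session.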
I also did a fairly detailed comparison between my work and Denny et al. in  (page 10). In that study we directly compared some effects of enhanced and non-enhanced error messages:
In this study we directly distinguish between two sets of compiler error messages (CEMs), the 30 that are enhanced by Decaf and those that are not. We then explore if the control and intervention groups respond differently when they are presented with these. For CEMs enhanced by Decaf the control and intervention groups experience different output. The intervention group, using Decaf in enhanced mode, see the enhanced and raw javac CEMs. The control group, using Decaf in pass-through mode, only see the raw javac CEMs. Thus for CEMs not enhanced by Decaf, both groups see the same raw CEMs. This provides us with an important subgroup within the intervention group, namely when the intervention group experiences errors generating CEMs not enhanced by Decaf. We hypothesized that there would be no significant difference between the control and intervention groups when looking at these cases for which both groups receive the same raw CEMs. On the other hand, if enhancing CEMs has an effect on student behavior, we would see a significant difference between the two groups when looking at errors generating the 30 enhanced CEMs (due to the intervention group receiving enhanced CEMs and the control group receiving raw CEMs).
As mentioned, the metrics used by Pettit et al. and Denny et al. have more in common with each other than with mine. Both used metrics based on the submissions (that is, programs) submitted by students, or on the number of submission attempts. This certainly makes comparing their studies more straightforward. However, it is possible that these metrics are too ‘far from the data’ to be significantly influenced by enhanced error messages. Metrics based simply on the programming errors committed by students, and on the error messages those errors generate, may be more ‘basic’ and more sensitive.
Another consideration when measuring submissions is that the fact that a submission compiles does not mean that it is correct or does what was intended. Some students may continue to edit (and possibly generate errors) after their first compiling version, or after they submit an assignment, and these errors should also be analyzed. I think that in order to measure whether enhancing error messages makes a difference to students we should focus on all programming activity. Otherwise, I’m afraid the results may say more about the tool (that enhances error messages) and the way that tool was used by students than about the effects of the enhanced error messages themselves. I am sure that this is also true of some of my research; after all, my students were using a tool too, and that tool has its own workings which must generate effects of their own. Isolating the effects of the tool from the effects of the messages is challenging.
I am very glad to see more work in this area. I think it is important, and I don’t think it is even close to being settled. I have to say I really feel that the community is working together on this. It’s great! In addition, there may be more to do than determine whether enhanced compiler error messages make a difference to students. We have overwhelming evidence that syntax poses barriers to students. We have a good amount of evidence that students think that enhancing compiler error messages makes a positive difference. Some researchers think it should, too. If enhancing compiler error messages doesn’t make a difference, we need to find out why, and we need to explain the contradiction this would pose. On the other hand, if enhancing compiler error messages does make a difference, we need to figure out how to do it best, which would also be a significant challenge.
I hope to present some new evidence on this soon. I haven’t analyzed the data yet, and I don’t know which way this study is going to go. The idea for this study came from holding my previous results up to the light and looking at them from quite a different angle. I feel that one of the biggest weaknesses in my previous work was that the control and treatment groups were separated by a year, so that is what I eliminated. The new control and treatment groups were taking the same class, on the same day, separated only by a lunch break. Fortuitously, due to a large intake, CP1 was split into two groups for the study semester but was taught by the same lecturer in exactly the same way. Sometimes things just work out!
I will be at ITiCSE 2017 and SIGCSE 2018 (and 2019 for that matter – I am happy to be serving a two-year term as workshop co-chair). I hope to attend some other conferences also but haven’t committed yet. I look forward to continuing the discussion on the saga of enhancing compiler error messages with anyone who cares to listen! In the meantime here are a few more posts where I discuss enhancing compiler error messages – comments are welcome…
 Raymond S. Pettit, John Homer, and Roger Gee. 2017. Do Enhanced Compiler Error Messages Help Students?: Results Inconclusive. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education (SIGCSE ’17). ACM, New York, NY, USA, 465-470. DOI: https://doi.org/10.1145/3017680.3017768
 Paul Denny, Andrew Luxton-Reilly, and Dave Carpenter. 2014. Enhancing syntax error messages appears ineffectual. In Proceedings of the 2014 conference on Innovation & technology in computer science education (ITiCSE ’14). ACM, New York, NY, USA, 273-278. DOI: http://dx.doi.org/10.1145/2591708.2591748
 Brett A. Becker. 2016. An Effective Approach to Enhancing Compiler Error Messages. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education (SIGCSE ’16). ACM, New York, NY, USA, 126-131. DOI: https://doi.org/10.1145/2839509.2844584
full-text available to all with link available at www.brettbecker.com/publications
 Brett A. Becker, Graham Glanville, Ricardo Iwashima, Claire McDonnell, Kyle Goslin, and Catherine Mooney. 2016. Effective Compiler Error Message Enhancement for Novice Programming Students. Computer Science Education 26(2-3), 148-175. DOI: http://dx.doi.org/10.1080/08993408.2016.1225464
full-text available to all at www.brettbecker.com/publications