Enhancing compiler error messages: the saga continues

I was sorry to have missed Raymond Pettit, John Homer and Roger Gee presenting the latest installment of what is becoming the enhanced compiler error messages saga at SIGCSE earlier this month. Their paper “Do Enhanced Compiler Error Messages Help Students?: Results Inconclusive” [1] was two-pronged. It contrasted the work of Denny et al. [2] (which provided evidence that compiler error enhancement does not make a difference to students) and my SIGCSE 2016 paper [3] (which provided evidence that it does). It also provided fresh evidence that supports the work of Denny et al. I must say that I like Pettit et al.’s subtitle: “Results Inconclusive”. I don’t think that this is the final chapter in the saga. We need to do a lot more work on this.

Early studies in this area often didn’t include much quantifiable data. More recent studies haven’t really been measuring the same things – and they have been measuring those things in different ways. In other words, both the metrics and the methodologies differ. It’s great to see work like that of Pettit et al. that is more comparable to previous work such as that of Denny et al.

One of the biggest differences between my editor, Decaf, and Pettit et al.’s tool, Athene, is that Decaf was used by students for all of their programming – practicing, working on assignments, programming for fun, even programming in despair. For most of my students it was the only compiler they used – so they made a lot of errors, and all of them were logged. Unlike Denny et al.’s students, mine did not receive skeleton code – they were writing programs, often from scratch. Athene, on the other hand, was often used by students only after developing their code on their own local (un-monitored) compilers. Thus, many errors generated by the students in the Pettit et al. study were not captured, and the code submitted to Athene was often already fairly refined. Pettit et al. even have evidence from some of their students that at times the code submitted to Athene contained only those errors that the students absolutely could not rectify without help.

As outlined in this post, Denny et al. and I were working towards the same goal but measuring different things. This may not be apparent at first read, but comparing studies like these is often more complicated than it first looks, and these differences have big implications when trying to compare results. I’m afraid that the same is true when comparing my work with Pettit et al.’s – we are trying to answer the same question, but measuring different things (in different ways) in order to do so.

Specifically, Pettit et al. measured:

  1. the number of non-compiling submissions, as did Denny et al., but unlike me
  2. the number of successive non-compiling submissions that produced the same error message; Denny et al. measured the number of consecutive non-compiling submissions regardless of why the submission didn’t compile, and I measured the number of consecutive errors generating the same error message, on the same line of the same file
  3. the number of submission attempts (in an effort to measure student progress)
  4. time between submissions; neither Denny et al. nor I measured time-based metrics
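To make metric 2 above concrete, here is a minimal sketch of how one might count runs of consecutive errors that generate the same error message on the same line of the same file (the metric I used). The data shapes are hypothetical – this is not the actual instrumentation from any of these studies, just an illustration of the counting logic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class RepeatedErrors {
    // One failed compilation event: which file, which line, which message.
    record ErrorEvent(String file, int line, String message) {}

    // Returns the lengths of runs in which the same message recurred
    // on the same line of the same file across consecutive failures.
    static List<Integer> repeatedRuns(List<ErrorEvent> events) {
        List<Integer> runs = new ArrayList<>();
        ErrorEvent prev = null;
        int length = 0;
        for (ErrorEvent e : events) {
            if (Objects.equals(e, prev)) {
                length++;                       // same error repeated
            } else {
                if (length > 1) runs.add(length); // close off the previous run
                prev = e;
                length = 1;
            }
        }
        if (length > 1) runs.add(length);
        return runs;
    }

    public static void main(String[] args) {
        List<ErrorEvent> events = List.of(
            new ErrorEvent("Hello.java", 4, "';' expected"),
            new ErrorEvent("Hello.java", 4, "';' expected"),     // a repeat
            new ErrorEvent("Hello.java", 9, "';' expected"),     // same message, different line: new run
            new ErrorEvent("Hello.java", 9, "cannot find symbol")
        );
        System.out.println(repeatedRuns(events)); // one run of length 2
    }
}
```

A submission-based metric (closer to Denny et al. and Pettit et al.) would instead count consecutive non-compiling submissions regardless of which error caused each failure – which is exactly why the numbers from these studies are not directly comparable.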

I also did a fairly detailed comparison between my work and Denny et al. in [4] (page 10). In that study we directly compared some effects of enhanced and non-enhanced error messages:

In this study we directly distinguish between two sets of compiler error messages (CEMs), the 30 that are enhanced by Decaf and those that are not. We then explore if the control and intervention groups respond differently when they are presented with these. For CEMs enhanced by Decaf the control and intervention groups experience different output. The intervention group, using Decaf in enhanced mode, see the enhanced and raw javac CEMs. The control group, using Decaf in pass-through mode, only see the raw javac CEMs. Thus for CEMs not enhanced by Decaf, both groups see the same raw CEMs. This provides us with an important subgroup within the intervention group, namely when the intervention group experiences errors generating CEMs not enhanced by Decaf. We hypothesized that there would be no significant difference between the control and intervention groups when looking at these cases for which both groups receive the same raw CEMs. On the other hand, if enhancing CEMs has an effect on student behavior, we would see a significant difference between the two groups when looking at errors generating the 30 enhanced CEMs (due to the intervention group receiving enhanced CEMs and the control group receiving raw CEMs).

As mentioned, the metrics used by Pettit et al. and Denny et al. are more similar to each other than to mine. Both used metrics based on the submissions (that is, programs) submitted by students, or on the number of submission attempts. This certainly makes comparing their studies more straightforward. However, it is possible that these metrics are too ‘far from the data’ to be significantly influenced by enhanced error messages. Metrics based simply on the programming errors committed by students, and on the error messages those errors generate, may be more ‘basic’ and more sensitive.

Another consideration when measuring submissions is that the fact that a submission compiles does not mean that it is correct or does what was intended. It is possible that some students continue to edit (and possibly generate errors) after their first compiling version, or after they submit an assignment. These errors should also be analyzed. I think that in order to measure whether enhancing error messages makes a difference to students we should focus on all programming activity. I’m afraid that otherwise, the results may say more about the tool (that enhances error messages) and the way that tool was used by students than about the effects of the enhanced error messages themselves. I am sure that in some of my research this is also true – after all, my students were also using a tool, and this tool has its own workings which must generate effects. Isolating the effects of the tool from the effects of the messages is challenging.

I am very glad to see more work in this area. I think it is important, and I don’t think it is even close to being settled. I have to say I really feel that the community is working together to do this. It’s great! In addition there may be more to do than determine if enhanced compiler errors make a difference to students. We have overwhelming evidence that syntax poses barriers to students. We have a good amount of evidence that students think that enhancing compiler error messages makes a positive difference. Some researchers think it should too. If enhancing compiler error messages doesn’t make a difference, we need to find out why, and we need to explain the contradiction this would pose. On the other hand, if enhancing compiler error messages does make a difference we need to figure out how to do it best, which would also be a significant challenge.

I hope to present some new evidence on this soon. I haven’t analyzed the data yet, and I don’t know which way this study is going to go. The idea for this study came from holding my previous results up to the light and looking at them from quite a different angle. I feel that one of the biggest weaknesses in my previous work was that the control and treatment groups were separated by a year – so that is what I eliminated. The new control and treatment groups were taking the same class, on the same day – separated only by lunch break. Fortuitously, due to a large intake, CP1 was split into two groups for the study semester but was taught by the same lecturer in exactly the same way – sometimes things just work out!

I will be at ITiCSE 2017 and SIGCSE 2018 (and 2019 for that matter – I am happy to be serving a two-year term as workshop co-chair). I hope to attend some other conferences also but haven’t committed yet. I look forward to continuing the discussion on the saga of enhancing compiler error messages with anyone who cares to listen! In the meantime here are a few more posts where I discuss enhancing compiler error messages – comments are welcome…

[1] Raymond S. Pettit, John Homer, and Roger Gee. 2017. Do Enhanced Compiler Error Messages Help Students?: Results Inconclusive. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education (SIGCSE ’17). ACM, New York, NY, USA, 465-470. DOI: https://doi.org/10.1145/3017680.3017768

[2] Paul Denny, Andrew Luxton-Reilly, and Dave Carpenter. 2014. Enhancing syntax error messages appears ineffectual. In Proceedings of the 2014 conference on Innovation & technology in computer science education (ITiCSE ’14). ACM, New York, NY, USA, 273-278. DOI: http://dx.doi.org/10.1145/2591708.2591748

[3] Brett A. Becker. 2016. An Effective Approach to Enhancing Compiler Error Messages. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education (SIGCSE ’16). ACM, New York, NY, USA, 126-131. DOI: https://doi.org/10.1145/2839509.2844584

full-text available to all at www.brettbecker.com/publications

[4] Brett A. Becker, Graham Glanville, Ricardo Iwashima, Claire McDonnell, Kyle Goslin, and Catherine Mooney. 2016. Effective Compiler Error Message Enhancement for Novice Programming Students. Computer Science Education 26(2-3), 148-175. DOI: http://dx.doi.org/10.1080/08993408.2016.1225464

full-text available to all at www.brettbecker.com/publications

‘Supercomputing’ in the curriculum

A recent article on ComputerWeekly.com is calling for supercomputing to be put ‘in the curriculum’. In it, Tim Stitt, head of scientific computing at the Earlham Institute, a life science institute in Norwich, UK, says children should be learning supercomputing and data analysis concepts from a young age.

Although I agree in principle, the article doesn’t specify a particular curriculum, although it does seem to be aimed at pre-university ages. In the article, Stitt claims that current initiatives, such as the new computing curriculum introduced in the UK in 2014 which makes it mandatory for children between the ages of five and 16 to be taught computational thinking, may “compound the issue”, as children will be taught serial rather than parallel programming skills, making supercomputing concepts harder to learn later on. Again, I agree in principle, but the extent to which parallel programming is harder to learn after learning ‘normal’ sequential programming first is debatable, and will certainly vary considerably from student to student.

I have mixed feelings about the word supercomputing. I can imagine someone saying “Really? You are going to teach supercomputing to kids? Don’t you think that’s a bit much?” I couldn’t blame them for being skeptical. The word itself sounds, well, super. Personally I think that High Performance Computing (HPC) is more down to earth, but I concede that it may still sound a little ‘super’. I have some experience with this. I am one of many who maintain the Irish Supercomputer List. That project didn’t start off as the Irish Supercomputer List, but we changed the name in order to, quite frankly, be more media ‘friendly’. (Side note – there is an interesting discussion on disseminating scientific work to the media here.) Additionally, the Indian and Russian lists also have the word supercomputing in their names and/or URLs. The Top500 list also used the word supercomputing before it rebranded a few years back. Anyway…

So, what we are really talking about is putting Parallel Computing (or parallel programming) in the curriculum, and therefore opening the door to supercomputing, as almost all HPC installations require parallel programming. In fact the current Top500 Supercomputer List is composed entirely of clusters (86.2%) and Massively Parallel Processors (MPPs – 13.8%). Clusters are parallel computer systems comprising an integrated collection of independent nodes, each of which is a system in its own right, capable of independent operation and derived from products developed and marketed for other stand-alone purposes [1]. MPPs (such as the IBM Blue Gene), on the other hand, are more tightly integrated. Individual nodes cannot run on their own, and they are frequently connected by custom high-performance networks. The key here is that in both cases memory is distributed (as are the cores), thus requiring parallel algorithms (and therefore parallel programming). Before switching gears I would like to return to the point that opened this paragraph – we are talking about parallel programming, not necessarily supercomputing – although learning parallel programming is indeed the essential prerequisite to eventually programming supercomputers.
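To give a flavor of the serial-versus-parallel distinction in a classroom-sized example, here is a minimal sketch in Java using parallel streams. Note the caveat: this illustrates shared-memory parallelism on one machine only; the distributed-memory machines on the Top500 list require explicit message passing (e.g. MPI), which this sketch does not show:

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        long n = 10_000_000L;

        // Serial: one thread walks the range 1..n and accumulates the sum.
        long serial = LongStream.rangeClosed(1, n).sum();

        // Parallel: the range is partitioned across worker threads and the
        // partial sums are combined - same algorithm, different execution.
        long parallel = LongStream.rangeClosed(1, n).parallel().sum();

        System.out.println(serial == parallel); // same answer either way
    }
}
```

Even this toy example surfaces the conceptual shift the article is worried about: the parallel version only works because summation is associative, so partial results can be combined in any order – exactly the kind of reasoning that is foreign to students taught only sequential programming.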

At the university level, there is certainly awareness of the issues at the core of the article that opened this post. In particular, two conferences/workshops directly address HPC education at the university level:

  1. Workshop on Education for High-Performance Computing (EduHPC-16), held in conjunction with SC-16: The International Conference on High Performance Computing, Networking, Storage, and Analysis
  2. The Parallel and Distributed Computing Education for Undergraduate Students Workshop (Euro-EDUPAR 2016), held in conjunction with Euro-Par 2016, the 22nd International European Conference on Parallel and Distributed Computing.

[1] Dongarra, J., Sterling, T., Simon, H. and Strohmaier, E., 2005. High-performance computing: clusters, constellations, MPPs, and future directions. Computing in Science and Engineering, 7(2), pp.51-59

ITiCSE 2016 mini report & photos

It’s the final day of the 21st Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE) here in Arequipa, Peru. It has been a really informative conference in a stunning location. There was great diversity this year, with attendees from 37 countries, and it was the first time that ITiCSE was held outside Europe. I met a lot of Latin/South American delegates who normally can’t make it to ITiCSE in Europe, which was great. Alison, Ernesto, and the rest of the committee really pulled out all the stops. From top to bottom – venue, organization, amazingly thought-out excursions, and the spectacular conference dinner – they covered every detail. And of course the program was packed full of great CSEd research.

The venue is the Universidad Católica San Pablo, which has also put a load of pictures up on its Facebook page, and has a nice article on its webpage (in Spanish) – headline translation: Experts from more than 35 countries meet in Arequipa to analyze education in Computer Science.

For most of the conference I have been occupied with my working group (WG6) – see here for the abstract. I also presented a paper, A New Metric to Quantify Repeated Compiler Errors for Novice Programmers. Being pretty busy, I haven’t been able to put together much of an in-person report on many papers, but here is my take on three before we get to the photos. All three of these talks really impacted (and at times challenged) my perceptions of some CSEd topics.

Mehran Sahami‘s keynote Statistical Modeling to Better Understand CS Students (abstract here) was very insightful. It considered developing statistical models to give insight into the dynamics of student populations. The first case study focused on gender balance and demonstrated that focusing on simple metrics such as percentages can be misleading, and that there are better ways to capture how program changes are impacting the dynamics of gender balance. The second looked at the performance of populations that are experiencing rapid growth. This case study showed one answer to the common statement/observation “the number of weak students is increasing”, and the answer was somewhat surprising – performance during a stage of unprecedented growth was quite stable.

Andrew Luxton-Reilly gave an excellent talk on his paper Learning to Program is Easy. This really challenges the notion that ‘programming is hard’ which is upheld by much of the literature and he argues, the community. This talk really caused me to reconsider my own beliefs about teaching programming, and to question my own expectations, assessment, module learning outcomes, and even program learning outcomes.

Mark Zarb presented a paper he authored with Roger McDermott, Mats Daniels, Asa Cajander and Tony Clear titled Motivation, Optimal Experience and Flow in First Year Computing Science. The authors examined motivation from the perspective of Self-Determination Theory and also considered the optimal state known as Flow – also colloquially known as being in the zone. After discussing how these concepts can be measured, they presented preliminary results looking at motivation and flow in a first year computing class. The results were extremely encouraging and made me realize that there is a lot we don’t know about how students think ‘unconsciously’ while learning.

And now, the promised pictures. Don’t forget that there are many more here. I also included some photos from the conference bus tour – to say it was all-encompassing is an understatement. We stopped about a half-dozen times at places like town squares, miniature farms (with alpacas of course) and a charming restored colonial-era mansion.

See you in Bologna for ITiCSE 2017!

Photo captions:

  - Welcome
  - Mehran Sahami’s keynote
  - Working Group 6!
  - Main auditorium
  - Lunch tent (three photos)
  - View from outside the lunch tent with the help of a little zoom (two photos)
  - Bus tour (six photos, including two of La Mansion del Fundador)
  - Llama? Alpaca? I really should know by now…
  - Tour of Santa Catalina Monastery (two photos)
  - Conference Dinner and Peruvian entertainment
  - Basílica Catedral de Arequipa (two photos)
  - Plaza de Armas de Arequipa (two photos)
  - Alley off Plaza de Armas de Arequipa
  - Conference closing – more Peruvian entertainment!
  - Conference closing – the conference committee joins in the dancing
  - Adios! Off to Cusco and Machu Picchu!

Update – I couldn’t resist adding a few photos I took in Cusco today:


And a parting shot from Machu Picchu Mountain…


Misleading, cascading Java error messages

I have been working with enhancing Java error messages for a while now, and I have stared at a lot of them. Today I came across one that I don’t think I’ve consciously seen before, and it’s quite a doozy if you are a novice programmer. Below is the code, with a missing bracket on line 2:

public class Hello {
       public static void main(String[] args)  //missing {
              double i;
              i = 1.0;
              System.out.println(i);
       }
}

The standard Java output in this case is:

C:\Users\bbecker\Desktop\Junk\Hello.java:2: error: ';' expected
       public static void main(String[] args)
                                             ^

C:\Users\bbecker\Desktop\Junk\Hello.java:4: error: <identifier> expected
              i = 1.0;
               ^

C:\Users\bbecker\Desktop\Junk\Hello.java:5: error: <identifier> expected
              System.out.println(i);
                                ^

C:\Users\bbecker\Desktop\Junk\Hello.java:5: error: <identifier> expected
              System.out.println(i);
                                  ^

C:\Users\bbecker\Desktop\Junk\Hello.java:7: error: class, interface, or enum expected
}
^

5 errors

Process Terminated ... there were problems.

Amazing. This is telling the student that there were 5 errors (not one), and none of the five reported errors are even close to telling the student that there is a missing bracket on line 2. If the missing bracket is supplied, all five “errors” are resolved.

During my MA in Higher Education I developed an editor that enhances some Java error messages, and I have recently published some of this work at SIGCSE (see brettbecker.com/publications). I hope to do some more work on this front soon, and in addition I would like to look more deeply at what effects cascading error messages have on novices. I can imagine that if I had no programming experience, was learning Java, and came across the above, I would probably be pretty discouraged.

The enhanced error that my editor would provide for the above code, which would be reported side-by-side with the above Java error output is:

Looks like a problem on line number 2.

Class Hello has 1 fewer opening brackets '{' than closing brackets '}'.
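The message above suggests a simple brace-balance check behind the scenes. As an illustration only – this is a naive sketch, not Decaf’s actual implementation, and it ignores complications like braces inside string literals and comments – the core idea could look something like this:

```java
// Naive sketch of a brace-balance check that could produce an enhanced
// message like the one above. NOT Decaf's actual implementation: a real
// tool must skip braces inside string literals, char literals, and comments.
public class BraceBalance {
    // Positive result: more '{' than '}'; negative: fewer '{' than '}'.
    static int braceBalance(String source) {
        int balance = 0;
        for (char c : source.toCharArray()) {
            if (c == '{') balance++;
            else if (c == '}') balance--;
        }
        return balance;
    }

    public static void main(String[] args) {
        // The Hello example from above, with the opening brace
        // missing at the end of the main method header.
        String hello =
            "public class Hello {\n" +
            "    public static void main(String[] args)\n" +  // missing {
            "        double i;\n" +
            "        i = 1.0;\n" +
            "        System.out.println(i);\n" +
            "    }\n" +
            "}\n";
        int balance = braceBalance(hello);
        if (balance < 0)
            System.out.println("Class Hello has " + (-balance)
                + " fewer opening brackets '{' than closing brackets '}'.");
    }
}
```

Even a crude check like this localizes the real problem far better than javac’s five cascading messages – which is precisely the argument for enhancing these messages in the first place.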