Analysis While the legal and ethical implications of assistive AI models like GitHub's Copilot continue to be sorted out, computer scientists are finding new uses for large language models and urging educators to adapt.
Brett A. Becker, assistant professor at University College Dublin in Ireland, provided The Register with pre-publication copies of two research papers exploring the educational risks and opportunities of AI tools for generating programming code.
The papers have been accepted at the 2023 SIGCSE Technical Symposium on Computer Science Education, to be held March 15 to 18 in Toronto, Canada.
In June, GitHub Copilot, a machine learning tool that automatically suggests programming code in response to contextual prompts, emerged from a year-long technical preview, just as concerns about the way its OpenAI Codex model was trained and the implications of AI models for society coalesced into focused opposition.
Beyond the unresolved copyright and software licensing issues, other computer scientists, such as University of Massachusetts Amherst computer science professor Emery Berger, have raised the alarm about the need to reevaluate computer science pedagogy in light of the expected proliferation and improvement of automated assistive tools.
In “Programming Is Hard – Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation” [PDF], Becker and co-authors Paul Denny, James Finnie-Ansley, and Andrew Luxton-Reilly (all of the University of Auckland, New Zealand), James Prather (Abilene Christian University, USA), and Eddie Antonio Santos (University College Dublin) argue that the educational community needs to deal with the immediate opportunities and challenges presented by AI-driven code generation tools.
They say it’s safe to assume that computer science students are already using these tools to complete programming assignments. Hence, policies and practices that reflect the new reality have to be hashed out sooner rather than later.
“Our view is that these tools stand to change how programming is taught and learned – potentially significantly – in the near-term, and that they present multiple opportunities and challenges that warrant immediate discussion as we adapt to the use of these tools proliferating,” the researchers state in their paper.
The paper looks at several of the assistive programming models currently available, including GitHub Copilot, DeepMind AlphaCode, and Amazon CodeWhisperer, as well as less publicized tools such as Kite, Tabnine, Code4Me, and FauxPilot.
Observing that these tools are moderately competitive with human programmers – eg, AlphaCode ranked among the top 54 percent of the 5,000 developers participating in Codeforces programming competitions – the boffins say AI tools can help students in various ways. This includes generating exemplar solutions to help students check their work, generating solution variations to expand how students understand problems, and improving student code quality and style.
The authors also see advantages for educators, who could use assistive tools to generate better student exercises, to generate explanations of code, and to provide students with more illustrative examples of programming constructs.
In addition to potential opportunities, there are challenges that educators need to address. These problem-solving, code-emitting tools could help students cheat more easily in assignments; the private nature of AI tool usage reduces some of the risk of enlisting a third party to do one’s homework.
The researchers also observe that how we think about attribution – central to the definition of plagiarism – may need to be revised because assistive options can provide varying degrees of help, making it difficult to separate allowable from excessive assistance.
“In other contexts, we use spell-checkers, grammar-checking tools that suggest rewording, predictive text and email auto-reply suggestions – all machine-generated,” the paper reminds us. “In a programming context, most development environments support code completion that suggests machine-generated code.
“Distinguishing between different forms of machine suggestions may be challenging for academics, and it is unclear if we can reasonably expect introductory programming students who are unfamiliar with tool support to distinguish between different forms of machine-generated code suggestions.”
The authors say this raises a key philosophical issue: “How much content can be machine-generated while still attributing the intellectual ownership to a human?”
They also highlight how AI models fail to meet the attribution requirements spelled out in software licenses, and leave unanswered ethical and environmental concerns about the energy consumed to train them.
The benefits and harms of AI tools in education need to be addressed, the researchers conclude, or educators will lose the opportunity to influence the evolution of this technology.
And they have little doubt it's here to stay.

The second paper, “Using Large Language Models to Enhance Programming Error Messages” [PDF], offers an example of the potential value of large language models like OpenAI's Codex, the foundation of Copilot.
Authors Juho Leinonen (Aalto University), Arto Hellas (Aalto University), Sami Sarsa (Aalto University), Brent Reeves (Abilene Christian University), Paul Denny (University of Auckland), James Prather (Abilene Christian University), and Becker have applied Codex to typically cryptic computer error messages and found that the AI model can make errors easier to understand, by offering a plain English description – which benefits both teachers and students.
“Large language models can be used to create useful and novice-friendly enhancements to programming error messages that sometimes surpass the original programming error messages in interpretability and actionability,” the boffins state in their paper.
For example, Python might emit the error message: “SyntaxError: unexpected EOF while parsing.” Codex, given the context of the code involved and the error, would suggest this description to help the developer: “The error is caused because the block of code is expecting another line of code after the colon. To fix the issue, I would add another line of code after the colon.”
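To see where that message comes from, here is a minimal sketch of code that triggers it. Note that the exact wording varies by interpreter version: "unexpected EOF while parsing" is what Python 3.9 and earlier report, while newer versions produce a more specific message for the same mistake.

```python
# A function definition whose body is missing after the colon --
# the situation described in Codex's suggested explanation above.
broken_source = "def add(a, b):\n"

try:
    # compile() parses the source without running it, so the
    # SyntaxError surfaces here rather than at import time.
    compile(broken_source, "<example>", "exec")
except SyntaxError as err:
    # On Python <= 3.9 this prints: SyntaxError: unexpected EOF while parsing
    print(f"{type(err).__name__}: {err.msg}")
```

The error arises during parsing, before any code runs, which is part of why the stock message is so terse: the interpreter only knows the input ended earlier than its grammar allowed.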
However, the findings of this study say more about promise than present utility. The researchers fed broken Python code and corresponding error messages into the Codex model to generate explanations of the issues, and evaluated those descriptions for: comprehensibility; unnecessary content; having an explanation; having a correct explanation; having a fix; the correctness of the fix; and value added from the original code.
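The paper does not publish the authors' pipeline here, but the basic shape of the experiment can be sketched: combine the broken source and its error message into a single text prompt for the model to complete. The `build_prompt` helper below is hypothetical, purely to illustrate that structure.

```python
def build_prompt(source: str, error: str) -> str:
    """Assemble an LLM prompt from broken code and its error message.

    Illustrative only -- not the researchers' actual prompt format.
    """
    return (
        f"{source}\n"
        f"# Running the code above produces this error:\n"
        f"# {error}\n"
        f"# Plain-English explanation of the error and a suggested fix:\n"
    )

prompt = build_prompt(
    "def add(a, b):\n",
    "SyntaxError: unexpected EOF while parsing",
)
print(prompt)
```

The model's completion of such a prompt is then the candidate explanation that the evaluators scored against the criteria listed above.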
The results varied significantly across these categories. Most explanations were comprehensible and contained an explanation, but the model handled certain errors far more successfully than others. For example, the error “can't assign function call” was explained correctly 83 percent of the time, while “unexpected EOF while parsing” was explained properly only 11 percent of the time. Overall, the suggested fixes were correct only 33 percent of the time, on average.
“Altogether, the evaluators considered that the Codex-created content, i.e. the explanation of the error message and the proposed fix, were an improvement over the original error message in slightly over half of the cases (54 percent),” the paper states.
The researchers conclude that while programming error message explanations and suggested fixes generated by large language models are not yet ready for production use and may mislead students, they believe AI models could become adept at addressing code errors with further work.
Expect that work to occupy the tech industry, academia, government, and other interested parties for years to come. ®