The topics below are possible student projects for a coursework masters (MIT program) or MRes degree. Contact the relevant academic (Anthony Sloane, Matt Roberts or Dominic Verity as noted) if you would like to discuss the possibility of undertaking one of these projects. Also contact us if you are interested in a programming languages project but do not see one here that is of interest.
Kiama and Scala-related projects
Kiama is a Scala-based library for software language processing that has been under development at Macquarie since 2008. The aim of Kiama is to make it easier to build software such as compilers, interpreters, static analysis tools, or other software that processes structured information.
These projects would suit students with an interest in languages and programming. Good skills in object-oriented programming and background in language implementation (for example, from COMP332 Programming Languages) will be useful. Scala and Kiama are the implementation platform used in COMP332.
A better CTags via Island Grammars in Kiama (Roberts)
Island grammars are a way of robustly extracting information from source code which may not be completely correct and where only part of the structure of the source code is required. CTags is a tool for extracting the location of object/class/variable definitions from source code, which is most often used for source code navigation in advanced text editors. CTags suffers from a number of deficiencies due to its reliance on regular expressions to find source code features. In this project you would implement island grammars in Kiama for a few important languages (at least including Scala) and build a tool around those grammars which could be used to:
- find the definition of objects/classes/variables in source code (as in CTags).
- return the complete definition of those source features when asked (which CTags can't do).
Optionally, you can integrate this into an existing editor such as Vim, TextMate, Emacs or Sublime Text for demonstration purposes, but even without this the tool can be demonstrated using "reverse literate programming", which Matt can explain more if you are interested.
Abstract State Machines (Sloane)
Abstract State Machines (http://www.eecs.umich.edu/gasm/) are a formal method for specifying computation and have been applied to a wide range of problem areas. From our perspective one of the most interesting applications is to the specification of the Java language and the Java virtual machine, published in a landmark 2001 book by Stark, Schmid, and Borger (http://www.eecs.umich.edu/gasm/papers/jbook.html). This project will use a Scala embedding of abstract state machines that we have developed for Kiama to encode all or some of the Java/JVM specification.
Attributes for HTML layout (Roberts)
We have a prototype implementation of HTML rendering using attribute grammars, but we don't even come close to covering all HTML's capabilities. There are still outstanding research questions related to layout and rendering of floats, positioned elements, including a z-index, etc. This project will attempt to find attribute grammars to implement each of the features which exist in HTML, but don't yet exist in our prototype. The project is done in Scala with the Kiama attribute grammar library.
Bottom-up Rewriting (Sloane)
There is a quite a bit of theory and some nice tools built around the idea of bottom-up rewriting of trees (sometimes called tree parsing). The idea is to match patterns against a tree in order to "cover" it in the least possible way (relative to some cost measure). One application of this is to select assembly language instructions in a compiler, where the tree is a representation of an expression and the patterns encode the instructions that are available. The cost measure in this case might by the estimated execution time of the chosen instructions. This project will develop and evaluate an extension to the Kiama library that supports bottom-up rewriting.
Concrete object syntax for Scala (Sloane)
When processing abstract syntax trees in Kiama, we must use standard Scala notations for building trees and pattern matching against them to extract components. While the Scala notations are quite terse and convenient, for some applications it is desirable to use concrete syntax to express these trees. For example, instead of saying
Add (Var ("v"), Num (1)) to express an integer addition node in the tree, we might prefer to say something like
v + 1. This kind of notation is called concrete object syntax because it allows us to use the concrete syntax for some language (in this case, the language of arithmetic expressions) to express objects in another language (in this case, Scala). The brackets
... are used in this case to indicate that the code enclosed is in the expression language. Alternative notations are possible. This project will develop an approach and implementation for concrete object syntax support for Scala in Kiama and evaluate it using typical Kiama examples.Most likely the best approach will involve new features of Scala including macros and string interpolation.
Declarative Debugging with Kiama Profiling (Roberts)
Debugging declarative computations (such as pure functional, attribute grammars or term rewriting) is difficult because evaluation order is implicit. It can be difficult, or even impossible, for the developer to determine the actual flow of information in a particular execution. Kiama has sophisticated profiling capabilities which can be used to show the execution flow. In this project, you would expand the capabilities of Kiama's profiling to include
- animations of run-time behaviour
- break-point debugging of run-time behaviour
- different types of static representations of run time behaviour.
These tools will greatly enhance the ability of term-rewriters and attribute grammar authors to debug their code, and would be best-of-breed debugging tools for declarative representations of computation.
Execution Monitoring for Kiama
Execution monitoring is a general term for processes that watch the execution of a program and report about that execution. Included in the term are debugging and profiling. This project will design and build a modern execution monitoring system for the Kiama library. The aim would be to present information about the execution of Kiama-based programs in domain-specific terms, i.e., at the level of the specifications such as grammars and tree structure that a Kiama programmer uses, rather than at the level of basic Scala data and code.
We have previous support with building this sort of system for the Eli language processing system in the Noosa tool. Initially the plan would be to analyse that previous system and decide which ideas can transfer to the Kiama setting. Then we would extend the ideas as necessary to develop a full spectrum tool. Since we use Kiama in our undergraduate unit COMP332 Programming Languages we can use that unit as a testbed for the ideas developed by this project.
Indentation Sensitive Packrat Parsers (Roberts)
Michael Adams has shown how to better integrate indentation sensitivity into a range of parsers. His latest work shows how to do this for PEG parsers. However, this only applies to PEG parsers without memorisation. In this project the student will seek to extend Adams' work to include such "packrat" parsers. The student will implement their work in the Kiama language processing library.
Polymorphic Recursion and Higher Ranked types for Generic Programming (Roberts)
We believe that polymorphic recursion and higher ranked types are necessary to achieve generic programming in statically typed functional programming languages, but no one has conclusively shown this to be so. This project would survey existing approached to generic programming and attempt to discern where both these features are used. If we get a 100% success rate for the chosen approaches we will have created some evidence of our hypothesis.
Scala-based dependently-typed programming language (Sloane)
Interest in so-called dependently-typed programming languages is rising (see also projects under the Software Verification heading). These languages allow many program constraints to be expressed in powerful types so that programs can be statically verified to obey them.
At the moment dependent typing is still a niche concept but languages that support it (e.g., Agda and Idris) are gaining increasing exposure. Dependent typing is also at the core of many proof assistants including Coq and Isabelle. To date, the majority of dependently-typed languages have had a functional programming background and their syntax and semantics is often based on an existing functional language such as Haskell or ML. This project will investigate bringing these ideas more into the mainstream by developing a dependently-typed language based on the ideas of the popular Scala programming language. At least aspects of the Scala syntax would be adopted but each of the main semantic features of Scala would be considered for use in the new language where appropriate. Of particular interest would be an investigation of the utility of an object-oriented view of data as seen in Scala for a dependently-typed system.
Constraining types in an explicitly typed calculus (Roberts)
System Fc is a very influential abstract machine for expressing functional programs with strong typing constraints. It is used as the core language of GHC and there has been proven to be an extremely successful way of encoding explicitly typed functional programs. The aim of this project is to experiment with alternative ways of encoding the type equality constraints in Fc. These alternate encodings are likely to have uses in data type generic programming in particular.
Software Development Practices and Tools
Author annotation effectiveness in code reviews (Roberts)
Previous work has shown that author annotation can reduce the number of defects found during code review. This project aims to answer the question "To what extent is this because the annotations reduce the efficacy of the code review and to what extent is this because the code itself has fewer defects?"
Lightweight Integrated Development Environments (Sloane)
Integrated development environments (IDEs) are widely used in software development. Unfortunately, it is non-trivial to extend an IDE to add support for new languages, analyses, or code generators. Usually such extensions can only be done by large groups with significant resources or with special access to the internals of the IDE framework such as Eclipse, Netbeans or Intellij IDEA. It is common for IDE support for one language to be largely unusable in the IDE support for another language without significant effort, even if the languages are similar.
This project will develop techniques for building lightweight IDEs where the key components of the IDE are easily replaceable and adapted instead of being part of a large, complex infrastructure. The aim is to make it easy for developers to add their own IDE operations that cooperate with others without having to invest large amounts of time becoming experts in a large IDE infrastructure. A prototype will be built to enable our Kiama language processing library to be used to help build these IDE components.
Formal Verification of Programs and Languages
Formal verification involves using discrete mathematical notations to specify what software is supposed to do. Tools such as proof assistant, checkers or theorem provers are then used to formally check that the software specifications have desired properties and establish a connection between the specification and an implementation.
These projects aim to conduct real verification of non-trivial software. Some prior experience with software verification of some kind is useful. For example, students wishing to undertake these projects should strongly consider taking the MRes unit current called COMP796 Advanced Topics in Theory and Practice of Software which teaches the Coq proof assistant.
Verification of Software Language Processing (Sloane)
Many systems and tools exist to build software language processors such as formatter, compilers and static analysis tools from high-level notations. E.g., we can use techniques such as attribute grammars to describe semantic analyses such as name and type analysis. We are developing our Kiama library to implement these kind of notation (see above).
But how do we know that our attribute grammars etc are doing the right thing? We can test, but can we prove that they are correct ? What does it mean for them to be correct? What is the specification? For name analysis we might specify that various static properties of name scoping would be obeyed such as constraints on the way in which nested scopes are handled.
This topic will investigate this area by developing specification techniques for language processing tasks. We will connect these specifications to a formal proof system such as Coq to enable us to write language processing implementations that can be proved to be correct. Ideally we would obtain a way to formally prove that Kiama-based components were doing the correct things.
Evaluating Dependently Typed Programming in AGDA (Verity)
On the one hand, AGDA is a dependently typed functional programming language, which owes much of its design to popular mainstream strongly typed languages such as ML and Haskell. On the other, AGDA is a proof assistant in which users may formulate mathematical propositions and construct machine verified proofs.
In the past dependently typed languages have mainly been used to construct formal mathematical proofs. For example the Coq language language has even been used (by Georges Gonthiers of Microsoft Research) to provide a completely machine checked proof of the infamous four colour theorem. Consequently, they have tended to be designed with the needs of mathematicians rather than programmers in mind. The AGDA designers, however, have consciously taken a different approach and have designed their language to meet the needs of functional programmers first and foremost. As such, it is claimed that AGDA provides a more comfortable and flexible environment for the development of formally verified programs.
Now this is an important claim, since formal program verification is known to be a very difficult art to master while being a vitally important component of mission critical applications development. After all, who wants to live next to a nuclear reactor whose software has not been engineered to the very highest standard or trust their life to an aircraft whose fly-by-wire systems have not undergone the most rigorous formal analysis?
We would like to examine the extent to which AGDA lives up to its design goals. In particular, we propose to put AGDA through its paces by using it to develop some simple verified software examples drawn from a range of different application areas.
The outcome of such a project would be a fully documented and verified piece of example software written in AGDA. Beyond the development of this code, you will be required to document your expected outcomes, comment fully upon any engineering decisions you made or pitfalls you may have encountered and describe the extent to which your code met your initial expectations. Examples of the kinds of example developments that we are thinking of may be found at http://web.student.chalmers.se/groups/datx02-dtp and at http://web.student.chalmers.se/~stevan/types.
If you would like to find out more about AGDA itself before picking this project, you might like to download its implementation from http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Main.HomePage and have a go at working through a little of Ulf Norell's tutorial at http://www.cse.chalmers.se/~ulfn/papers/afp08/tutorial.pdf.
Implementing Functional Programming Languages
Quantify the benefits of maximal letrec sharing (Roberts)
Maximal sharing in letrec expressions (video) claims to be able to "speed up execution". While we don't doubt this claim at all, compiler optimisations are routinely exposed to real world benchmark suites to help quantify the extent to which they are effective and in what types of programs they are most effective. In this project the student will implement maximal letrec sharing in the dgen compiler and run standard benchmark programs to generate just this kind of quantitative data.