
Student Projects

Undergraduate students in the third and fourth years of the Final Honour School of Computer Science, and students for the MSc in Computer Science are required to undertake a project. Mathematics & Computer Science undergraduates are required to undertake a Computer Science project or a Mathematics dissertation in their fourth year. Computer Science & Philosophy undergraduates may choose to undertake a Computer Science project or a Philosophy thesis in their fourth year.

This is your chance to work on your own project, something that interests and inspires you. We have put together a brief guidance document on how to do it; please click on the link below. On this site, you will also find a sortable and searchable list of projects and potential supervisors. You are welcome to choose one of the projects on the list, or approach a potential supervisor and negotiate your own topic.

You will be expected to make arrangements with potential supervisors between weeks 5 and 7 in Hilary Term, for project work for the following year. Please don't contact potential supervisors outside this window.

Limited equipment is available for use in projects, namely LEAP equipment. If you have any requests, please contact the Head of Academic Administration.

Important deadlines are:

All students:
Monday of Week 7, Hilary Term: your online project registration survey must be completed. You will be sent a link.
MSc students:
Monday of Week 1, Trinity Term: submission deadline for your project proposal.
3rd/4th-year students:
12pm on Monday of Week 4, Trinity Term: project submission deadline.
MSc students:
Tuesday of Week -5 (minus 5), Michaelmas Term: project submission deadline.

Project writing handbook

Sample projects

3rd year

4th year

List of projects

Project | Supervisors | Themes | Suitable for (B / C / MSc) | Description
Aggregation of Photovoltaic Panels Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C MSc

The increased relevance of renewable energy sources has modified the behaviour of the electrical grid. Some renewable energy sources affect the network in a distributed manner: whilst each unit has little influence, a large population can have a significant impact on the global network, particularly in the case of synchronised behaviour. This work investigates the behaviour of a large, heterogeneous population of photovoltaic panels connected to the grid. We employ Markov models to represent the aggregated behaviour of the population, while the rest of the network (and its associated consumption) is modelled as a single equivalent generator, accounting for both inertia and frequency regulation. Analysis and simulations of the model show that it is a realistic abstraction, and quantitatively indicate that heterogeneity is necessary to enable the overall network to function in safe conditions and to avoid load shedding. This project will provide extensions of this recent research. In collaboration with an industrial partner.
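As a flavour of the modelling involved, here is a minimal simulation sketch of a heterogeneous population of two-state panels whose aggregate behaviour one would then abstract as a Markov model; the states, switching probabilities and population size are illustrative assumptions, not values from the cited research.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state panel: 0 = disconnected, 1 = feeding the grid.
# Heterogeneity: each panel draws its own switching probabilities.
N = 10_000
p_connect = rng.uniform(0.05, 0.15, size=N)     # per-step prob. of reconnecting
p_disconnect = rng.uniform(0.01, 0.05, size=N)  # per-step prob. of dropping out

state = np.ones(N, dtype=int)  # all panels initially connected
fraction_connected = []
for t in range(200):
    u = rng.random(N)
    # Each panel follows its own two-state Markov chain, independently.
    state = np.where(state == 1,
                     (u >= p_disconnect).astype(int),
                     (u < p_connect).astype(int))
    fraction_connected.append(state.mean())

# The aggregate trajectory (fraction of connected panels) is the quantity
# whose lumped Markov abstraction the project would study.
print(fraction_connected[-1])
```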

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking

Analysis and verification of stochastic hybrid systems Alessandro Abate Automated Verification C MSc

Stochastic Hybrid Systems (SHS) are dynamical models that are employed to characterize the probabilistic evolution of systems with interleaved and interacting continuous and discrete components.

Formal analysis, verification, and optimal control of SHS models are relevant goals, both for their theoretical generality and for their applicability to a wealth of studies in the sciences and in engineering.

In a number of practical instances, the presence of a discrete number of continuously operating modes (e.g., in fault-tolerant industrial systems), the effect of uncertainty (e.g., in safety-critical air-traffic systems), or both (e.g., in models of biological entities) advocates the use of a mathematical framework, such as that of SHS, which is structurally predisposed to model such heterogeneous systems.

In this project, we plan to investigate and develop new analysis and verification techniques (e.g., based on abstractions) that are directly applicable to general SHS models, while being computationally scalable.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification

Automated verification of complex systems in the energy sector Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C MSc

Smart microgrids are small-scale versions of centralized electricity systems, which locally generate, distribute, and regulate the flow of electricity to consumers. Among other advantages, microgrids have shown positive effects on the reliability of distribution networks.

These systems present heterogeneity and complexity arising from (1) local and volatile generation from renewables, and (2) the presence of nonlinear dynamics over both continuous and discrete variables. These factors call for the development of proper quantitative models. This framework provides the opportunity to employ formal methods to verify properties of the microgrid. The goal of the project is in particular to focus on energy production via renewables, such as photovoltaic panels.

The project can benefit from a paid visit/internship to industrial partners.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification; no specific knowledge of smart grids is needed.

Bayesian Reinforcement Learning: Robustness and Safe Training Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C In this project we shall build on recent work on ``Safe Learning'' [2], which extends classical RL algorithms to synthesise policies that abide by complex tasks or objectives, whilst training safely (that is, without violating given safety requirements). Tasks/objectives for RL-based synthesis can be goals expressed as logical formulae, and thus richer than standard reward-based goals. We plan to frame recent work by OXCAV [2] in the context of Bayesian RL, as well as to leverage modern robustness results, as in [3]. We shall pursue both model-based and model-free approaches. [2] M. Hasanbeig, A. Abate and D. Kroening, ``Cautious Reinforcement Learning with Logical Constraints,'' AAMAS 2020, pp. 483-491, 2020. [3] B. Recht, ``A Tour of Reinforcement Learning: The View from Continuous Control,'' Annual Review of Control, Robotics, and Autonomous Systems, Vol. 2, 2019.
Development of software for the verification of MPL models Alessandro Abate Automated Verification C MSc

This project is aimed at enhancing the software toolbox VeriSiMPL (``very simple''), which has been developed to enable the abstraction of Max-Plus-Linear (MPL) models. MPL models are specified in MATLAB and abstracted to Labeled Transition Systems (LTS). The LTS abstraction is formally put in relationship with its MPL counterpart via a (bi)simulation relation. The abstraction procedure runs in MATLAB and leverages sparse representations, fast manipulations based on vector calculus, and optimised data structures such as Difference-Bound Matrices. The LTS can be pictorially represented via the Graphviz tool and exported to the PROMELA language. This enables the verification of MPL models against temporal specifications within the SPIN model checker.
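For context, MPL dynamics are linear in the max-plus algebra, where addition is replaced by max and multiplication by +. A minimal sketch of one simulation step, in Python rather than the toolbox's MATLAB and with a toy matrix, purely for illustration:

```python
import numpy as np

NEG_INF = -np.inf  # the max-plus "zero" element (absence of an edge)

def maxplus_step(A, x):
    """One step of a Max-Plus-Linear system: x'[i] = max_j (A[i, j] + x[j])."""
    return np.max(A + x[np.newaxis, :], axis=1)

# Toy 2-dimensional MPL model (entries are illustrative, e.g. processing delays).
A = np.array([[2.0, 5.0],
              [3.0, NEG_INF]])
x = np.array([0.0, 0.0])
for k in range(5):
    x = maxplus_step(A, x)
    print(k + 1, x)
```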

Courses: Computer-Aided Formal Verification, Numerical Solution of Differential Equations

Prerequisites: Some familiarity with dynamical systems, working knowledge of MATLAB and C

Formal verification of a software tool for physical and digital components Alessandro Abate Automated Verification MSc

We are interested in working with existing commercial simulation software that is targeted at the modelling and analysis of physical, multi-domain systems. It further encompasses the integration with related software tools, as well as the interfacing with devices and the generation of code. We are interested in enriching the software with formal verification features, envisioning the extension of the tool towards capabilities that might enable the user to raise formal assertions or guarantees on model properties, or to synthesise correct-by-design architectures. Within this long-term plan, this project shall target the formal generation of fault warnings, namely of messages to the user that are related to ``bad (dynamical) behaviours'' or to unwanted ``modelling errors''. The student will be engaged in developing algorithmic solutions towards this goal, while reframing them within a formal and general approach. The project is inter-disciplinary in dealing with hybrid models involving digital and physical quantities, and in connecting the use of formal verification techniques from the computer sciences with more classical analytical tools from control engineering.

Courses: Computer-Aided Formal Verification, Software Verification.

Prerequisites: Knowledge of basic formal verification.

Innovative Sensing and Actuation for Smart Buildings Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C MSc Sensorisation and actuation in smart buildings and the development of smart HVAC (heating, ventilation and air-conditioning) control strategies for energy management allow for optimised energy usage, leading to the reduction in power consumption or to optimised demand/response strategies that are key in a rather volatile market. This can further lead to optimised maintenance for the building devices. Of course the sensorisation of buildings leads to heavy requirements on the overall infrastructure: we are interested in devising new approaches towards the concept of using ``humans as sensors''. Further, we plan to investigate approaches to perform meta-sensing, namely to extrapolate the knowledge from physical sensors towards that of virtual elements (as an example, to infer the current building occupancy from correlated measurements of temperature and humidity dynamics). On the actuation side, we are likewise interested in engineering non-invasive minimalistic solutions, which are robust to uncertainty and performance-certified. The plan for this project is to make the first steps in this direction, based on recent results in the literature. The project can benefit from a visit to Honeywell Labs (Prague). Courses: Computer-Aided Formal Verification. Prerequisites: Some familiarity with dynamical systems.
Model learning and verification Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C MSc

This project will explore connections of techniques from machine learning with successful approaches from formal verification. The project has two sides: a theoretical one, and a more practical one; it will be up to the student to emphasise either of the two sides, depending on their background and/or interests. The theoretical part will develop existing research, for instance in one of the following two inter-disciplinary domain pairs: learning & repair, or reachability analysis & Bayesian inference. On the other hand, a more practical project will apply the above theoretical connections to a simple modelling setup in the area of robotics and autonomy.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Precise simulations and analysis of aggregated probabilistic models Alessandro Abate Automated Verification C MSc

This project shall investigate a rich research line, recently pursued by several researchers within the Department, on the development of quantitative abstractions of Markovian models. Abstractions come in the form of lumped, aggregated models, which are beneficial in being easier to simulate or to analyse. Key to the novelty of this work, the proposed abstractions are quantitative, in that precise error bounds with respect to the original model can be established. As such, whatever can be shown over the abstract model can likewise be formally claimed over the original one.

This project, grounded on existing literature, will pursue (depending on the student's interests) extensions of this recent work, or its implementation as a software tool.
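A minimal sketch of the kind of state-space aggregation involved, on a toy chain with a fixed partition; the formal error bounds that are central to the cited research are not computed here.

```python
import numpy as np

# Toy Markov chain: states 0 and 1 behave almost identically, so we lump them.
P = np.array([[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.1, 0.1, 0.8]])
partition = [[0, 1], [2]]

k = len(partition)
P_lumped = np.zeros((k, k))
for i, block_i in enumerate(partition):
    for j, block_j in enumerate(partition):
        # Average outgoing probability mass from block_i into block_j.
        P_lumped[i, j] = P[np.ix_(block_i, block_j)].sum(axis=1).mean()

print(P_lumped)  # a 2-state lumped chain; rows still sum to 1
```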

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Reinforcement Learning for Space Operations Alessandro Abate, Licio Romao Artificial Intelligence and Machine Learning, Automated Verification C OXCAV has an ongoing collaboration with the European Space Agency (ESA) that involves applying Reinforcement Learning (RL) algorithms to a satellite called OPS-SAT, which was launched in 2019 and is a flying laboratory that allows ESA’s partners to test and validate new techniques (more information can be found at https://www.esa.int/Enabling_Support/Operations/OPS-SAT). This project aims at designing controllers that will be used to perform nadir pointing and sun-tracking of OPS-SAT, while meeting given specifications (e.g., admissible nadir pointing errors). The focus will be on data-driven methods that leverage data from the available sensors (gyroscopes, GPS, fine sun sensor, magnetometer) and actuators within an RL architecture, to come up with a safe policy that yields adequate performance. The main tasks of the project will consist of (1) exploring an ESA platform called MUST to collect all the necessary data, and (2) implementing an RL scheme that will later be deployed on the satellite. Throughout the project you will have the opportunity to work with state-of-the-art data-driven techniques developed at OXCAV, under the supervision of Prof. Alessandro Abate and Dr. Licio Romao.
Safe Reinforcement Learning Alessandro Abate Automated Verification C MSc

Reinforcement Learning (RL) is a well-known architecture for synthesising policies for Markov Decision Processes (MDPs). We work on extending this paradigm to the synthesis of ‘safe policies’, or, more generally, of policies such that a linear-time property is satisfied. We convert the property into an automaton, then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP, according to the accepting conditions of the automaton. With this reward function, RL synthesises a policy that satisfies the property: as such, the policy synthesis procedure is `constrained' by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP. We evaluate the performance of the algorithm on numerous numerical examples. This project will provide extensions of these novel and recent results.
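A minimal sketch of the product construction described above, on a toy MDP and a toy automaton for the property "eventually goal"; all structures and probabilities are illustrative.

```python
import random

random.seed(0)

# (state, action) -> [(next_state, probability)]
mdp_trans = {("s0", "a"): [("s0", 0.5), ("s1", 0.5)],
             ("s0", "b"): [("s0", 1.0)],
             ("s1", "a"): [("s1", 1.0)],
             ("s1", "b"): [("s0", 1.0)]}
label = {"s0": None, "s1": "goal"}  # atomic proposition observed in each state

# Toy automaton for "eventually goal": q0 --goal--> q_acc (absorbing).
def dfa_step(q, obs):
    return "q_acc" if q == "q_acc" or obs == "goal" else "q0"

accepting = {"q_acc"}

def sample_next(state, action):
    succs, probs = zip(*mdp_trans[(state, action)])
    return random.choices(succs, probs)[0]

def product_step(q, next_state):
    """Advance the automaton on the successor's label; reward entry into acceptance."""
    q_next = dfa_step(q, label[next_state])
    reward = 1.0 if q_next in accepting and q not in accepting else 0.0
    return q_next, reward

# One step of the product MDP from (s0, q0) under action "a":
s_next = sample_next("s0", "a")
q_next, r = product_step("q0", s_next)
print(s_next, q_next, r)
```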

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Safety verification for space dynamics via neural-based control barrier functions Alessandro Abate Automated Verification C Barrier functions are Lyapunov-like functions that serve as certificates for the safety verification of dynamical and control models. The OXCAV group has recently worked on the automated and sound synthesis of barrier functions structured as neural nets, with an approach that uses SAT modulo theory. In this project, we shall pursue two objectives: 1. Apply recent results [1] on sound and automated synthesis of barrier certificates on models that are pertinent to the Space domain, such as models for attitude dynamics. Airbus will support this part. 2. Develop new results that extend theory and algorithms to models encompassing uncertainty, such as probabilistic models or models that are adaptive to sensed data. Airbus will support this effort providing environments and models for experiments and testing. [1] A. Abate, D. Ahmed and A. Peruffo, ``Automated Formal Synthesis of Neural Barrier Certificates for Dynamical Models,'' TACAS, To appear, 2021.
Software development for abstractions of stochastic hybrid systems Alessandro Abate Automated Verification C MSc

Stochastic hybrid systems (SHS) are dynamical models capturing the interaction of continuous and discrete states. The probabilistic evolution of the continuous and discrete parts of the system is coupled, which makes the analysis and verification of such systems compelling. Among specifications of SHS, probabilistic invariance and reach-avoid have received considerable attention recently. Numerical methods have been developed to compute these two specifications. These methods are mainly based on partitioning the state space and abstracting the SHS by Markov chains, which are optimal in the sense of minimising the abstraction error for a given number of Markov states.

The goal of the project is to combine the codes that have been developed for these methods. The student should also design a clean user interface (for the choice of dynamical equations, parameters, methods, etc.).
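A minimal sketch of the partition-based abstraction these methods build on, for a toy 1-D system x' = a·x + w with Gaussian noise, abstracted over a uniform grid on a safe set; the dynamics, grid and parameters are illustrative assumptions (scipy is used for the Gaussian CDF).

```python
import numpy as np
from scipy.stats import norm

a, sigma = 0.8, 0.2
edges = np.linspace(-1, 1, 11)          # 10 cells partitioning the safe set [-1, 1]
centres = (edges[:-1] + edges[1:]) / 2

# Transition probabilities of the abstract Markov chain, evaluated at cell centres.
P = np.zeros((len(centres), len(centres)))
for i, c in enumerate(centres):
    mean = a * c
    P[i] = norm.cdf(edges[1:], mean, sigma) - norm.cdf(edges[:-1], mean, sigma)

# Row sums below 1 correspond to probability mass leaving the safe set,
# which is exactly what probabilistic invariance computations track.
print(P.sum(axis=1))
```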

Courses: Probabilistic Model Checking, Probability and Computing, Numerical Solution of Differential Equations

Prerequisites: Some familiarity with stochastic processes, working knowledge of MATLAB and C

Using transformers to recommend complementary products based on transactional data Damien Arnol MSc

Recommender systems are a classic field for the application of machine learning techniques. Approaches developed in this area over the 10+ years since the Netflix Prize have relied almost exclusively on collaborative filtering models, which learn a latent representation of products and customers that best reconstructs observed customer-product interactions. We believe that recently developed Transformer architectures represent a unique opportunity to revisit the domain, as these models, applied directly to purchase data, may enable learning a more powerful contextual representation of products. In a recent project, we successfully demonstrated the capabilities of this type of approach in an application related to product substitution, based on product embeddings generated by a BERT-like model (Bidirectional Encoder Representations from Transformers; Devlin et al., 2018). We propose to explore this approach in the context of a different application: autocompleting the content of a shopping cart in the manner of a text-generation task.

Objectives: The student will extend our existing Transformer-based product embedding package with a new model architecture and/or training method enabling the generation of relevant and non-trivial complementary products given a customer basket (for example, avoiding the trap of recommending frequently purchased products irrespective of the content of the input basket). Depending on the student’s interest and progress, directions of research may include the implementation of transformer architectures better suited for generation (e.g., GPT-like), the implementation of fine-tuning methods better suited to generating basket complements, and the implementation of loss functions encouraging the prediction of less frequent and more niche products complementing specific items within a basket. The student will then study the limitations of the proposed model to assess whether it could be used for real-world applications and to identify possible areas of further improvement. Experiments will be run on publicly available data such as the Instacart dataset, as well as a much larger dataset belonging to a major retail brand, a client of J2-Reliance. Computational resources will be available for model training and tuning on Google Cloud Platform.

Prerequisites: The project requires a good mastery of object-oriented programming (preferably in Python), solid foundations in deep learning, and a basic understanding of Transformer models.

Learning objectives and skills involved: The student will enhance their creativity in machine learning and their capacity to solve original and concrete problems. They will also acquire an advanced understanding of Transformer models and their application to representation learning outside of the NLP realm. Finally, they will strengthen their coding skills and be exposed to the challenges related to productionising machine learning models.

Supervision: Company supervisor: Damien Arnol, PhD (damien.arnol@j2reliance.co.uk). Damien has a PhD in Machine Learning and Bioinformatics from the University of Cambridge and expertise in unsupervised representation learning in contexts such as dimensionality reduction and recommender systems. In the past three years, he has led multiple R&D projects for big groups in the retail industry and advised digital and tech start-ups developing AI and data-driven products. Weekly meetings will be organised between the industry supervisor and the student, either remotely or in person depending on the student’s preference. The student will also be added to the company’s Slack group and encouraged to interact as much as needed with the company’s engineers, to receive guidance both from an ML perspective and from a coding perspective.

Company: J2-Reliance Ltd. Founded by two machine learning experts (Cambridge PhD and Mila PhD) and a former industry executive, J2-Reliance is an R&D consultancy with an obsession for transforming data and technologies into business value. We are experts in identifying what and how technology can solve a specific business problem, and we develop the corresponding solutions end-to-end. Our philosophy is to leverage cutting-edge technologies in an ambitious and creative way and to rely on technical insights to adapt innovations from various research fields to solve hard business problems. In the retail space, we have previously used language modelling techniques (Transformers) to analyse customer behaviours, learn meaningful product representations, and as a result develop powerful recommender systems.
Complex Object Querying and Data Science Michael Benedikt Data and Knowledge MSc We will look at query languages for transforming nested collections (collections that might contain collections). Such languages can be useful for preparing large-scale feature data for machine learning algorithms. We have a basic implementation of such a language, built on top of the big-data framework Spark. The goal of the project is to extend the language with iteration. One goal will be to look at how to adapt processing techniques for nested data to support iteration. Another, closer to applications, is to utilize iteration to support additional steps of a data science pipeline, such as sampling.
Genomic analysis using machine learning and large scale data management techniques Michael Benedikt Data and Knowledge MSc

We will investigate novel risk analysis -- the likelihood of a patient having some medical condition -- using statistical analysis of a variety of genomics data sources. This will make use of some new infrastructure for data management -- a query language for nested data -- along with the use of the Spark framework, coupled with some basic statistics and machine learning algorithms. No background in genomics or statistics is necessary, but the project does require knowledge of the basics of data management (e.g. an undergraduate database course or some experience with SQL) and good programming skills.

Interpolation Michael Benedikt Data and Knowledge C MSc

Let F1 and F2 be sentences (in first-order logic, say) such that F1 entails F2: that is, any model of F1 is also a model of F2. An interpolant is a sentence G such that F1 entails G, and G entails F2, but G only uses relations and functions that appear in *both* F1 and F2.
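A minimal propositional instance, included purely for illustration (it is not taken from the project description):

```latex
F_1 = p \land q, \qquad F_2 = p \lor r, \qquad G = p
% F_1 \models G: any model of p \land q satisfies p.
% G \models F_2: any model of p satisfies p \lor r.
% G mentions only p, the sole symbol occurring in both F_1 and F_2
% (q appears only in F_1, and r only in F_2).
```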

The goal in this project is to explore and implement procedures for constructing interpolants, particularly for certain decidable fragments of first-order logic. It turns out that finding interpolants like this has applications in some database query rewriting problems.

Prerequisites: Logic and Proof (or equivalent)

Optimization of Web Query Plans Michael Benedikt Data and Knowledge MSc

This project will look at how to find the best plan for a query, given a collection of data sources with access restrictions.

We will look at logic-based methods for analyzing query plans, taking into account integrity constraints that may exist on the data.

Optimized reasoning with guarded logics Michael Benedikt Data and Knowledge C MSc Inference in first-order logic is undecidable, but a number of logics have appeared in the last decade that achieve decidability. Many of them are guarded logics, which include the guarded fragment of first-order logic. Although the decidability has been known for some decades, no serious implementation has emerged. Recently we have developed new algorithms for deciding some guarded logics, based on resolution, which are more promising from the perspective of implementation. The project will pursue this both in theory and experimentally. Prerequisites: A knowledge of first-order logic, e.g. from the Foundations of CS or Knowledge Representation and Reasoning courses, would be important.
Frontiers in Graph Representation Learning Ismail Ilkan Ceylan Artificial Intelligence and Machine Learning C MSc

There are various projects available in the theme of ‘Graph Representation Learning’ (e.g., graph neural networks); please get in touch for the most up-to-date information. There are multiple projects aiming at (i) designing new graph neural network models to overcome some well-known existing limitations of graph neural networks (e.g., oversmoothing, oversquashing, inexpressiveness), (ii) improving the inductive bias of graph neural networks for machine learning over different types of graphs (e.g., directed, relational graphs), (iii) exploring tasks beyond the widely studied ones, such as graph generation, and (iv) novel applications of graph representation learning.

Improving the Relational Inductive Bias of Graph Neural Networks Ismail Ilkan Ceylan, Ralph Abboud Artificial Intelligence and Machine Learning C MSc

Graph neural networks (GNNs) have achieved state-of-the-art results on a wide range of tasks, e.g., node and graph classification, on various types of graphs, owing to their natural encoding of graph structure and properties. However, their potential remains yet to be unlocked for multi-relational data (i.e., graphs where edges can have multiple labels). In particular, while these models account for relation types during message passing (see, e.g., rGCN [1]), they typically do not incorporate any explicit relation-specific inductive bias, e.g., inherent relational properties such as symmetries or relational hierarchies. The goal of this project is to design a novel GNN model with relation-specific inductive bias and study its relational properties, both theoretically and empirically.

Prerequisites: Good programming skills, experience with Pytorch/Tensorflow

References: A classical base GNN model for handling relational data is rGCN [1], an extension of GCN [2] with relation-specific message passing.

[1] Schlichtkrull et al. Modeling Relational Data with Graph Convolutional Networks. ESWC 2018.

[2] Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017.
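A minimal sketch of the relation-specific message passing underlying rGCN [1], with one weight matrix per relation type; the dimensions and toy graph are illustrative, and the full model additionally uses basis decomposition, normalisation and self-loops.

```python
import torch
import torch.nn as nn

class RelationalLayer(nn.Module):
    """One rGCN-style layer: messages h[src] @ W[rel] summed into each target node."""

    def __init__(self, num_relations, in_dim, out_dim):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.01)

    def forward(self, h, edges):
        # h: (num_nodes, in_dim); edges: list of (source, relation, target) triples.
        out = torch.zeros(h.size(0), self.weights.size(2))
        for src, rel, dst in edges:
            out[dst] += h[src] @ self.weights[rel]  # relation-specific transform
        return torch.relu(out)

# Toy multi-relational graph: 3 nodes, 2 relation types.
h = torch.randn(3, 4)
edges = [(0, 0, 1), (1, 1, 2), (2, 0, 0)]
layer = RelationalLayer(num_relations=2, in_dim=4, out_dim=8)
print(layer(h, edges).shape)  # torch.Size([3, 8])
```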

Temporal Reasoning with Graph Neural Networks Ismail Ilkan Ceylan, Ralph Abboud Artificial Intelligence and Machine Learning C MSc

Graph neural networks (GNNs) are a popular class of models for learning over relational data. The goal of this project is to develop novel GNN models which can handle relational data evolving over time. While the applications are manifold, the incorporation of temporal information brings many additional challenges to GNNs. In this project, the student will explore the space of possible models (including existing ones) and develop a model which can learn adequate continuous-time representations that can extrapolate from existing data. The student will conduct a rigorous analysis of the resulting model's properties in comparison with the existing literature, and carry out a detailed empirical evaluation.

Prerequisites: Good programming skills, experience with Pytorch/Tensorflow

References: A time-aware message-passing neural network [1], and an interesting way of incorporating time [2]

[1] Wu et al. TeMP: Temporal Message Passing for Temporal Knowledge Graph Completion. EMNLP 2020

[2] Han et al. Temporal Knowledge Graph Forecasting with Neural ODE. arXiv:2101.05151

General game playing using inductive logic programming Andrew Cropper Artificial Intelligence and Machine Learning B C MSc In the general game playing competition, an agent is given the rules of a game (described as a logic program) and then starts playing the game (i.e. the agent generates traces of behaviour). This project will invert the task so that an agent is given traces of behaviour and then has to learn a set of rules that could produce the behaviour. The project will use techniques from inductive logic programming, a form of machine learning which learns computer programs from input/output examples. This work is mainly implementation and experimentation. Prerequisites: familiarity with logic programming (Prolog)
Learning to program by debugging Andrew Cropper Artificial Intelligence and Machine Learning MSc Abstract: The goal of program synthesis/induction is to develop machines that can write programs for us. For instance, given examples of unsorted/sorted lists, the goal is to induce a sorting algorithm. This project focuses on inductive logic programming (ILP) [1,2], a form of machine learning based on mathematical logic. Given examples and background knowledge (BK), the goal of ILP is to induce a logic program (a set of logical rules) that with the BK generalises the examples. The goal of this project is to develop techniques to learn programs by debugging faulty programs. The project is based on a generate/test/constrain loop approach to ILP [3], where the constrain stage derives constraints from faulty programs to guide future generate stages. This project will modify recent work [4] to apply in a Datalog setting, where we can extract more information from faulty programs, such as identifying unsatisfiable pairs of literals. This project is a mix of theory, implementation, and experimentation. Prerequisites: First-order logic, and preferably logic programming concepts (Prolog and/or Answer Set Programming), or at least an interest to learn about them. [1] Andrew Cropper, Sebastijan Dumancic: Inductive logic programming at 30: a new introduction. CoRR abs/2008.07912 (2020) [2] Andrew Cropper, Sebastijan Dumancic, Stephen H. Muggleton: Turning 30: New Ideas in Inductive Logic Programming. IJCAI 2020: 4833-4839 [3] Andrew Cropper, Rolf Morel: Learning programs by learning from failures. Mach. Learn. 110(4): 801-856 (2021) [4] Rolf Morel, Andrew Cropper: Learning Logic Programs by Explaining Failures. CoRR abs/2102.12551 (2021)
Machine learning efficient time-complexity programs Andrew Cropper Artificial Intelligence and Machine Learning C MSc Metaopt [1,2] is an inductive logic programming (ILP) system that learns efficient logic programs from examples. For instance, given input/output examples of unsorted/sorted lists, Metaopt learns quicksort (rather than bubblesort). However, Metaopt does not identify the complexity class of learned programs. The goals of this project are to (1) develop methods to identify the complexity class of a program during learning, and (2) see whether the complexity information can improve the proof search. This work is a mix of theory, implementation, and experimentation. [1] A. Cropper and S.H. Muggleton. Learning efficient logic programs. Machine Learning (2018). https://doi.org/10.1007/s10994-018-5712-6 [2] A. Cropper and S.H. Muggleton. Learning efficient logical robot strategies involving composable objects. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), pages 3423-3429. IJCAI, 2015. Prerequisites: familiarity with general machine learning (and preferably computational learning theory) and ideally logic programming
Projects on logic-based machine learning (inductive logic programming) Andrew Cropper Artificial Intelligence and Machine Learning B C MSc Description: Inductive logic programming (ILP) [1,2] is a form of machine learning based on mathematical logic. Given examples and background knowledge (BK), the goal of ILP is to induce a logic program (a set of logical rules) that with the BK generalises the examples. For instance, given examples of unsorted/sorted lists, the goal of ILP is to induce a sorting algorithm. I am happy to supervise projects on ILP. These projects will particularly suit students interested in constraint satisfaction, symbolic machine learning, and knowledge representation. Prerequisites: ideally you will have taken the courses Logic and Proof, Intelligent Systems, and Knowledge Representation & Reasoning
Relevancy of background knowledge in inductive logic programming Andrew Cropper Artificial Intelligence and Machine Learning B C MSc Inductive logic programming (ILP) is a form of machine learning which learns computer programs from input/output examples of a target program. To improve learning efficiency, ILP systems use background knowledge (i.e. auxiliary functions such as partition and append). However, most ILP systems cannot handle large amounts of background knowledge, and overcoming this limitation is a key challenge in ILP. The goal of this project is to explore techniques to identify relevant background knowledge. There is much freedom with this project, where one could focus on logical aspects, such as finding logically redundant background knowledge, or one could instead focus on statistical aspects, such as finding background knowledge most likely to be relevant for a given task. This work is a mix of theory, implementation, and experiments. Prerequisites: familiarity with statistics, statistical machine learning, and ideally logic programming
Reasoning about Temporal Knowledge Bernardo Cuenca Grau, Przemysław Wałęga Artificial Intelligence and Machine Learning, Data and Knowledge MSc

"Our world produces nowadays huge amounts of time stamped data, say measurements from meteorological stations, recording of online payments, GPS locations of your mobile phone, etc. To reason on top of such massive temporal datasets effectively we need to provide a well-structured formalisation of temporal knowledge and to devise algorithms with good computational properties. This, however, is highly non-trivial; in particular logical formalisms for temporal reasoning often have high computational complexity.

This project provides an opportunity to join the Knowledge Representation and Reasoning group and participate in exciting EPSRC-funded research on temporal reasoning, temporal knowledge graphs, and reasoning over streaming data. There are opportunities to engage both in theoretically-oriented and practically-oriented research in this area. For example, in recent months, we have been investigating the properties and applications of DatalogMTL---a temporal extension of the well-known Datalog rule language which is well-suited for reasoning over large and frequently-changing temporal datasets; the project could focus on analysing the theoretical properties of DatalogMTL and its fragments such as its complexity and expressiveness, or alternatively in more practical aspects such as optimisation techniques for existing reasoning algorithms. There are many avenues for research in this area and we would be more than happy to discuss possible concrete alternatives with the student(s)."

The theoretical part of the project requires a good understanding of logic (completing the Knowledge Representation and Reasoning course would be beneficial), whereas the practical part is suited to those with strong programming skills.

Analysis of Schelling segregation models Edith Elkind Artificial Intelligence and Machine Learning MSc In Schelling's model of strategic segregation, agents are placed on a highly regular graph (such as a line or a grid), and each agent belongs to one of k types. Agents have a preference towards being surrounded by agents who belong to their own type, and may change locations if they are not happy at their current location (by moving to an empty location or swapping with another discontent agent). Many variants of this basic model have been considered over the years. The goal of this project is to investigate, theoretically and empirically, the degree of diversity of stable outcomes in Schelling's model, as well as to explore novel variants of the model where agents' preferences may evolve over time.
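A minimal simulation sketch of the basic model on a line, with discontent agents jumping to random empty cells; the threshold, grid size and update rule are illustrative choices among the many variants mentioned above.

```python
import random

random.seed(0)
n, threshold = 60, 0.5  # an agent wants >= 50% of its neighbours to share its type
grid = [random.choice([0, 1, None]) for _ in range(n)]  # None = empty cell

def happy(i):
    nbrs = [grid[j] for j in (i - 1, i + 1) if 0 <= j < n and grid[j] is not None]
    return not nbrs or sum(t == grid[i] for t in nbrs) / len(nbrs) >= threshold

for _ in range(10_000):
    i = random.randrange(n)
    if grid[i] is not None and not happy(i):
        empties = [j for j in range(n) if grid[j] is None]
        if empties:
            j = random.choice(empties)
            grid[j], grid[i] = grid[i], None  # move the discontent agent

print(grid)  # segregated runs of 0s and 1s typically emerge
```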
Stable roommates problem with structured preferences Edith Elkind Artificial Intelligence and Machine Learning MSc In the stable roommates problem, there are k rooms of varying sizes and n agents, who need to be allocated places in these rooms; it is sometimes assumed that the total number of places is exactly n. Agents may have preferences over rooms as well as potential roommates, and may move between rooms so as to improve their assignment. The goal of the project is to understand the complexity of finding stable outcomes in such settings, assuming that agents' preferences over assignments have a simple structure (e.g., each agent wants to maximize the number of friends in their room), and to compare the best social welfare achievable by a centralized assignment with that achievable in a stable assignment.
Topics in Randomised Algorithms and Computational Complexity Andreas Galanis Algorithms and Complexity Theory C MSc Description: Andreas Galanis is willing to supervise projects in the areas of randomised algorithms and computational complexity. Problems of interest include (i) the analysis of average case instances of hard combinatorial problems (example: can we satisfy a random Boolean formula?), and (ii) the computational complexity of counting problems (example: can we count approximately the number of proper colourings of a graph G?). The projects would suit mathematically oriented students, especially those with an interest in applying probabilistic methods to computer science.
Data Science and Machine Learning Techniques for Model Parameterisation in Biomedical and Environmental Applications David Gavaghan, Martin Robinson, Michael Clerx Computational Biology and Health Informatics B C MSc Time series data arise as the output of a wide range of scientific experiments and clinical monitoring techniques. Typically the system under study will either be undergoing time-varying changes which can be recorded, or the system will have a time-varying signal as input and the response signal will be recorded. Familiar everyday examples of the former include ECG and EEG measurements (which record the electrical activity in the heart or brain as a function of time), whilst examples of the latter occur across scientific research, from cardiac cell modelling to battery testing. Such recordings contain valuable information about the underlying system under study, and gaining insight into the behaviour of that system typically involves building a mathematical or computational model of that system, which will have embedded within it key parameters governing system behaviour. The problem that we are interested in is inferring the values of these key parameters through applications of techniques from machine learning and data science. Currently used methods include Bayesian inference (Markov Chain Monte Carlo (MCMC), Approximate Bayesian Computation (ABC)) and non-linear optimisation techniques, although we are also exploring the use of other techniques such as probabilistic programming and Bayesian deep learning. We are also interested in developing techniques that will speed up these algorithms, including parallelisation and the use of Gaussian Process emulators of the underlying models. Application domains of current interest include modelling of the cardiac cell (for assessing the toxicity of new drugs) and understanding how biological enzymes work (for application in developing novel fuel cells), as well as a range of basic science problems. Prerequisites: some knowledge of Python
Topics in Continuous Maths, Discrete Maths, Linear Algebra, Object-oriented Programming, Machine Learning and Probability and Computing David Gavaghan Computational Biology and Health Informatics B C Prof Dave Gavaghan is willing to supervise in the area of Topics in Continuous Maths, Discrete Maths, Linear Algebra, Object-oriented Programming, Machine Learning and Probability and Computing.
Cake-cutting with low envy Paul Goldberg Algorithms and Complexity Theory C Cake-cutting refers to the design of protocols to share a divisible good amongst a collection of agents. A standard starting-point for cake-cutting protocols is the classical "I cut, you choose" rule. This rule is said to be "envy-free" since each player can ensure that they value their own share at least as much as they value the other player's share. Well-known further work has extended this idea to more than 2 players. In the paper at the URL below, we identify various classes of protocols, and show that one can convert protocols from one kind to another so as to maintain the worst-case level of envy that results. https://arxiv.org/abs/2108.03641 The project is to look at the computational complexity of evaluating the worst-case level of envy that can result from using a given protocol, and related questions about protocols belonging to classes of interest. The project is mainly based on mathematical analysis as opposed to computational experiment. But there is scope for some computational experiment, for example in searching for value functions that result in high envy.
Learning probabilistic automata Paul Goldberg Algorithms and Complexity Theory C Probabilistic automata are a means of producing randomly-generated strings that are accepted/recognised by a finite automaton. (They are conceptually similar to hidden Markov models.) The general topic of the project is to find algorithms to reconstruct an automaton based on observations of strings generated by that automaton. It's a topic that has led to a steady stream of papers in the research literature, and the project would focus on one of the many variants of the problem. For example, "labelled automata" in which partial information is provided about the current state of the automaton in the form of random labels associated with states. Another general issue is how to efficiently extract extra information present in long strings generated by the unknown automaton. Within this project topic, there is scope for focusing either on experiments, or on mathematical analysis of algorithms. In the latter case, it would be helpful to know the basic concepts of computational learning theory.
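A minimal sketch of the generation side of the problem: a toy two-state probabilistic automaton emitting symbols, whose output strings are the only data a learning algorithm would see. The automaton and parameters are illustrative.

```python
import random

random.seed(1)
# trans[state] = list of (symbol, next_state, probability); probs per state sum to 1.
trans = {0: [("a", 0, 0.7), ("b", 1, 0.3)],
         1: [("a", 0, 0.4), ("b", 1, 0.6)]}

def sample_string(length=10, state=0):
    out = []
    for _ in range(length):
        r, acc = random.random(), 0.0
        for symbol, nxt, p in trans[state]:
            acc += p
            if r < acc:
                out.append(symbol)
                state = nxt
                break
    return "".join(out)

observations = [sample_string() for _ in range(5)]
print(observations)  # the reconstruction task: recover `trans` from such samples
```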
Medical information extraction with deep neural networks Paul Goldberg, Alejo Nevado-Holgado Algorithms and Complexity Theory MSc

Background. Electronic Health Records (EHRs) are the databases used by hospitals and general practitioners to log daily all the information they record from patients (i.e. disorders, medications taken, symptoms, medical tests…). Most of the information held in EHRs is in the form of natural-language text (written by the physician during each session with each patient), making it inaccessible for research. Unlocking all this information would bring a very significant advancement to biomedical research, multiplying the quantity and variety of scientifically usable data, which is the reason why major efforts have relatively recently been initiated towards this aim (e.g. the I2B2 challenges https://www.i2b2.org/NLP/ or the UK-CRIS network of EHRs https://crisnetwork.co/uk-cris-programme)

Project. Recent Deep Neural Network (DNN) architectures have shown remarkable results in traditionally unsolved NLP problems, including some information extraction tasks such as slot filling [Mesnil et al] and relation classification [dos Santos et al]. When transferring this success to EHRs, DNNs offer the advantage of not requiring well-formatted text, while the problem remains that labelled data is scarce (ranging in the hundreds for EHRs, rather than the tens of thousands used in typical DNN studies). However, ongoing work in our lab has shown that certain extensions of recent NLP-DNN architectures can reproduce the typical remarkable success of DNNs in situations with limited labelled data (paper in preparation). Namely, incorporating interaction terms into feed-forward DNN architectures [Denil et al] can raise the performance of relation classification on I2B2 datasets from 0.65 F1 score to 0.90, while the highest performance previously reported on the same dataset was 0.74.

We therefore propose to apply DNNs to the problem of information extraction in EHRs, using I2B2 and UK-CRIS data as a testbed. More specifically, the DNNs designed and implemented by the student should be able to extract medically relevant information, such as prescribed drugs or diagnoses given to patients. This corresponds to some of the challenges proposed by I2B2 in recent years (https://www.i2b2.org/NLP/Medication/), and to objectives of high interest in UK-CRIS which have sometimes been addressed with older techniques such as rules [Iqbal et al, Jackson et al]. The student is free to use the extension of the feed-forward DNN developed in our lab, or to explore other feed-forward or recurrent (e.g. RNN, LSTM or GRU) alternatives. The DNN should be implemented in Keras, Theano, TensorFlow, or PyTorch.
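As a flavour of the interaction-term idea mentioned above, here is a minimal sketch of a feed-forward relation classifier over a pair of entity embeddings; the dimensions, the specific (elementwise-product) interaction term and the toy inputs are illustrative assumptions, not the lab's architecture.

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, emb_dim=50, hidden=64, num_relations=4):
        super().__init__()
        # Inputs: two entity embeddings plus their elementwise product
        # (a simple multiplicative interaction term).
        self.net = nn.Sequential(
            nn.Linear(3 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_relations))

    def forward(self, e1, e2):
        return self.net(torch.cat([e1, e2, e1 * e2], dim=-1))

model = RelationClassifier()
e1, e2 = torch.randn(8, 50), torch.randn(8, 50)  # a batch of entity-pair embeddings
print(model(e1, e2).shape)  # torch.Size([8, 4]) relation scores
```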

Bibliography

G. Mesnil et al., Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding, IEEE/ACM Trans. Audio Speech Lang. Process. 23 (2015) 530–539. doi:10.1109/TASLP.2014.2383614.

C.N. dos Santos, B. Xiang, B. Zhou, Classifying Relations by Ranking with Convolutional Neural Networks, CoRR. abs/1504.06580 (2015). http://arxiv.org/abs/1504.06580.

M. Denil, A. Demiraj, N. Kalchbrenner, P. Blunsom, N. de Freitas, Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network, CoRR. abs/1406.3830 (2014). http://arxiv.org/abs/1406.3830.

E. Iqbal, R. Mallah, R.G. Jackson, M. Ball, Z.M. Ibrahim, M. Broadbent, O. Dzahini, R. Stewart, C. Johnston, R.J.B. Dobson, Identification of Adverse Drug Events from Free Text Electronic Patient Records and Information in a Large Mental Health Case Register, PLOS ONE. 10 (2015) e0134208. doi:10.1371/journal.pone.0134208.

R.G. Jackson MSc, M. Ball, R. Patel, R.D. Hayes, R.J. Dobson, R. Stewart, TextHunter – A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research, AMIA. Ann. Symp. Proc. 2014 (2014) 729–738.

Neural network genomics Paul Goldberg, Alejo Nevado-Holgado Algorithms and Complexity Theory MSc

Background. For more than 10 years, GWAS studies have represented a revolution in the study of human disorders and human phenotypes. By measuring how your risk of suffering any given disease changes according to SNP mutations, GWAS studies can measure how relevant each gene is to the disease under study. However, this SNP-disease association is calculated “one SNP at a time”, and ignores all gene-gene interactions that are often crucially important to the causation of the disorder. If the interaction between a number of genes (e.g. insulin and its receptor in type 2 diabetes; or APP and alpha-, beta- and gamma-secretase in Alzheimer…) is what produces a given disorder, this interaction will not be detected in a GWAS analysis. This shortcoming may not be a problem in monogenetic hereditable disorders, such as Huntington disease, where mutations in a single gene by themselves are enough to cause the disease. However, GWAS studies will likely not uncover the mechanisms of complex disorders, where the disease emerges from the interaction of a number of genes. This is likely the source of the so-called “missing heritability” problem observed in most complex traits or diseases, where all the SNP variations taken together cannot account for most of the heritability of a given trait or disease [1]. In addition, it has been demonstrated that complex traits such as height and BMI are clearly and strongly heritable [2], but GWAS studies simply cannot detect most of this heritability. In summary, GWAS analyses detect simple individual genetic factors, but not interactions between genetic factors.

Project. In this project, we propose to identify the interacting genetic factors behind Alzheimer’s disease and other neurodegenerations with neural networks, which are known for exploiting the interactions present in the to-be-analysed data. While the linear models used in GWAS studies are able to identify only linear and monovariate contributions of each gene to a disorder, neural networks can analyse how genes interact with each other to explain the studied disorder. This versatility is what has allowed neural networks to find such widespread use in industry in the last decade, where they are revolutionizing image, sound and language analysis [3–5]. In our laboratories we already have the hardware and the expertise required to build neural networks, and we have used them in other research areas relevant to Alzheimer’s disease. In addition, we have access to and experience using UK Biobank, which is the ideal dataset for this project. UK Biobank has measured wide-array SNPs, all disorders and demographics in ~500,000 participants between the ages of 37 and 73, and more than 5000 of them suffer from Alzheimer’s disease, Parkinson’s disease or other neurodegenerations. We propose that the MSc student build a neural network to predict either diagnoses or disease-related endophenotypes (i.e. brain volumes of affected areas, cognitive scores…) for each one of these participants, using only the information present in the wide-array SNPs and relevant demographics. The student is free to use the extension of the feed-forward DNN developed in our lab, or to explore other feed-forward or recurrent (e.g. RNN, LSTM or GRU) alternatives. The DNN should be implemented in Keras (https://keras.io/), Theano (http://deeplearning.net/software/theano/), TensorFlow (https://www.tensorflow.org/), or PyTorch (http://pytorch.org/).
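A minimal sketch of the kind of network proposed, on synthetic data: SNP genotypes coded as 0/1/2 minor-allele counts, concatenated with demographics, feeding a small feed-forward net whose hidden layers can capture gene-gene interactions. The dimensions and data are illustrative assumptions.

```python
import torch
import torch.nn as nn

num_snps, num_demo = 1000, 5
model = nn.Sequential(
    nn.Linear(num_snps + num_demo, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1))  # e.g. a regional brain volume or a cognitive score

# Synthetic batch: 32 participants, random genotypes and demographics.
x = torch.cat([torch.randint(0, 3, (32, num_snps)).float(),
               torch.randn(32, num_demo)], dim=1)
target = torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), target)
loss.backward()  # one illustrative training step (optimiser omitted)
print(loss.item())
```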
Bibliography

[1] T.A. Manolio, F.S. Collins, N.J. Cox, D.B. Goldstein, L.A. Hindorff, D.J. Hunter, M.I. McCarthy, E.M. Ramos, L.R. Cardon, A. Chakravarti, J.H. Cho, A.E. Guttmacher, A. Kong, L. Kruglyak, E. Mardis, C.N. Rotimi, M. Slatkin, D. Valle, A.S. Whittemore, M. Boehnke, A.G. Clark, E.E. Eichler, G. Gibson, J.L. Haines, T.F.C. Mackay, S.A. McCarroll, P.M. Visscher, Finding the missing heritability of complex diseases, Nature. 461 (2009) 747–753. doi:10.1038/nature08494.

[2] D. Cesarini, P.M. Visscher, Genetics and educational attainment, Npj Sci. Learn. 2 (2017). doi:10.1038/s41539-017-0005-6.

[3] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature. 521 (2015) 436–444. doi:10.1038/nature14539.

[4] A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature. 542 (2017) 115–118. doi:10.1038/nature21056.

[5] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning, Nature. 518 (2015) 529–533. doi:10.1038/nature14236.
A Conceptual Model for Assessing Privacy Risk Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc Privacy is not a binary concept; the level of privacy enjoyed by an individual or organisation will depend upon the context within which it is being considered: the more data an attacker has access to, the more potential there may be for privacy compromise. We lack a model which considers the different contexts that exist in current systems and which would underpin a measurement system for determining the level of privacy risk that might be faced. This project would seek to develop a prototype model, based on a survey of known privacy breaches and common practices in data sharing. The objective is to propose a method by which privacy risk might be considered, taking into consideration the variety of (threat and data-sharing) contexts that any particular person or organisation might be subjected to. It is likely that a consideration of the differences and similarities between the individual and organisational points of view will need to be made, since the nature of the contexts faced could be quite diverse.
Computer Vision for Physical Security Michael Goldsmith, Sadie Creese Security B C MSc

Computer Vision allows machines to recognise objects in real-world footage. In principle, this allows machines to flag potential threats in an automated fashion based on historical and present images in video footage of real-world environments. Automated threat detection mechanisms to help security guards identify threats would be of tremendous help to them, especially if they have to temporarily leave their post, or have to guard a significant number of areas. In this project, students are asked to implement a system that is able to observe a real environment over time and attempt to identify potential threats, independent of a number of factors, e.g. lighting conditions or wind conditions. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. One approach might be to implement a camera system using e.g. a web camera or a Microsoft Kinect to conduct anomaly detection on real-world environments, and flag any issues related to potential threats.

Requirements: Programming skills required

Considering the performance limiters for cybersecurity controls Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

The behaviour, and so the effectiveness, of security controls depends upon how they are used. One obvious example is that the protection afforded by a firewall depends upon the maintenance of the rules that determine what the firewall stops and what it does not. The benefit (to the organisation or user seeking to manage risk) of various technical controls in an operational context lacks good evidence and data, so there is scope to consider the performance of controls in a lab environment. This mini-project would select one or more controls from the CIS Top 20 Critical Security Controls (CSC) (version 6.1) and seek to develop laboratory experiments (and implement them) to gather data on how the effectiveness of the control is impacted by its deployment context (including, for example, configuration, dependence on other controls, and the nature of the threat faced).

Requirements: Students will need an ability to develop a test-suite and deploy the selected controls.

Cybersecurity visualization Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

Cybersecurity visualization helps analysts and risk owners alike to make better decisions about what to do when the network is attacked. In this project the student will develop novel cybersecurity visualizations. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the visualizations they develop. These projects tend to have a focus on network traffic visualization, but the student is encouraged to visualize datasets they would be most interested in. Other interesting topics might for instance include: host-based activities (e.g. CPU/RAM/network usage statistics), network scans (incl. vulnerabilities scanning). Past students have visualized network traffic patterns and android permission violation patterns. Other projects on visualizations are possible (not just cybersecurity), depending on interest and inspiration.

Requirements: Programming skills required.

Detecting disguised processes using Application-Behaviour Profiling Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security MSc

In order to avoid detection, malware can disguise itself as a legitimate program or hijack system processes to reach its goals. Commonly used signature-based Intrusion Detection Systems (IDS) struggle to distinguish between these processes and are thus only of limited use in detecting these kinds of attacks. They also have the shortcoming that they need to be updated frequently to possess the latest malware definitions, which makes them inherently prone to missing novel attacks. Anomaly-based IDSs, however, overcome this problem by maintaining a ground truth of normal application behaviour and reporting deviations as anomalies. In this project, students will be tasked to investigate how a process's behaviour can be profiled, in an attempt to identify whether it is behaving anomalously and whether it can then be correctly identified as malware. This project will build on existing research in this area.

Requirements: Programming skills required.

Experimenting with anomaly detection features for detecting insider attacks Michael Goldsmith, Ioannis Agrafiotis, Sadie Creese, Arnau Erola Security B C MSc

This project will use an anomaly detection platform being developed by the Cyber Security Analytics Group to consider relative detection performance using different feature sets, and different anomalies of interest, in the face of varying attacks. This research would be experimental in nature and conducted with a view to exploring the minimal sets that would result in detection of a particular threat. Further reflection would then be given to how generalisable this result might be and what we might determine about the critical datasets required for this kind of control to be effective.

Eyetracker data analysis Michael Goldsmith, Ivan Martinovic Security MSc

Eyetracking allows researchers to identify where observers look on a monitor. A challenge in analysing eyetracker output, however, is identifying patterns of viewing behaviour in the raw eyetracker logs over time and what they mean in a research context, particularly as no semantic information about the pixel being viewed is captured. In practice, this means that we know where someone is looking, but nothing about what it means to look at that pixel or at a particular sequence of pixels. From a research perspective, being able to tag areas of interest and what those areas mean, or to conduct other statistical analyses of viewing patterns, would be of significant use in analysing results, particularly if such analyses could be automated. The purpose of this project is to conduct an eyetracker experiment, and to design and implement useful methods to analyse eyetracker data. We will be able to provide training for the student, so they are able to use the eyetracking tools themselves. Other eyetracking projects are also possible, depending on interest and inspiration.
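As one hedged example of the kind of method a student might implement, the sketch below performs dispersion-based fixation detection (I-DT); the column names `t`, `x`, `y` are assumptions about the log format rather than a given:

```python
import pandas as pd

# Simplified I-DT: grow a window of gaze samples until its spatial dispersion
# exceeds a threshold; windows that lasted long enough count as fixations.
# (The trailing window is ignored for brevity.)
def fixations(gaze: pd.DataFrame, max_dispersion=25, min_duration=0.1):
    """gaze has columns t (seconds) and x, y (pixels) -- an assumed format.
    Returns a list of (start_time, end_time, mean_x, mean_y) tuples."""
    out, start = [], 0
    for end in range(len(gaze)):
        window = gaze.iloc[start:end + 1]
        dispersion = ((window.x.max() - window.x.min())
                      + (window.y.max() - window.y.min()))
        if dispersion > max_dispersion:          # window stopped being a fixation
            window = gaze.iloc[start:end]
            if len(window) and window.t.iloc[-1] - window.t.iloc[0] >= min_duration:
                out.append((window.t.iloc[0], window.t.iloc[-1],
                            window.x.mean(), window.y.mean()))
            start = end
    return out
```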

Skills: Programming, Statistics

Formal threat and vulnerability analysis of a distributed ledger model Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

This project would utilise the process algebra CSP and associated model checker FDR to explore various types of threat and how they might successfully compromise a distributed ledger. This type of modelling would indicate possible attacks on a distributed ledger, and could guide subsequent review of actual designs and testing strategies for implementations. The modelling approach would be based on the crypto-protocol analysis techniques already developed for this modelling and analysis environment, and would seek to replicate the approach for a distributed ledger system. Novel work would include developing the model of the distributed ledger, considering which components are important, formulating various attacker models and also formulating the security requirements / properties to be assessed using this model-checking based approach.

Requirements: In order to succeed in this project students would need to have a working knowledge of the machine readable semantics of CSP and also the model checker FDR. An appreciation of threat models and capability will need to be developed.

High Dynamic Range Imaging for Documentation Applications Michael Goldsmith, Sadie Creese Security B C MSc High dynamic range imaging (HDRI) allows more accurate information about light to be captured, stored, processed and displayed to observers. In principle, this allows viewers to obtain more accurate representations of real-world environments and objects. Naturally, HDRI would be of interest to museum curators to document their objects, particularly non-opaque objects or those whose appearance alters significantly depending on the amount of lighting in the environment. Currently, few tools exist that aid curators, archaeologists and art historians in studying object surfaces under user-defined parameters in meaningful ways. In this project the student is free to approach the challenge as they see fit, but would be expected to design, implement and assess any tools and techniques they develop. The student will develop techniques to study these objects under user-specified conditions, enabling curators and researchers to study the surfaces of these objects in novel ways. These methods may involve tone mapping or other modifications of light exponents to view objects under non-natural viewing conditions, so that surface details stand out in ways that are meaningful to curators.
Intelligent user activity timelines Michael Goldsmith Security B C MSc Operating systems store temporal data in multiple locations. Digital investigators are often tasked with reconstructing timelines of user activities. Timeline generation tools such as log2timeline can aid in extracting temporal data; similarly, 'professional' tools such as EnCase and Autopsy build and visualise low-level timelines. Collectively, these tools: (1) provide large volumes of low-level data, and (2) are not able to apply any form of reasoning. This project involves the extraction of temporal data and the application of reasoning algorithms to develop reliable event sequences of interest to an investigator. Useful references: Olsson, J. and Boldt, M., 2009. Computer forensic timeline visualization tool. Digital Investigation, 6, pp. S78-S87. Buchholz, F.P. and Falk, C., 2005, August. Design and Implementation of Zeitline: a Forensic Timeline Editor. In DFRWS.
International Cybersecurity Capacity Building Initiatives Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc There is a large investment being made by the international community aimed at helping nations and regions to develop their capacity in cybersecurity. The work of the Global Cyber Security Capacity Centre (based at the Oxford Martin School) studies and documents this: https://www.sbs.ox.ac.uk/cybersecurity-capacity/content/front. There is scope to study in more detail the global trends in capacity building in cybersecurity, the nature of the work and the partnerships that exist to support it. An interesting analysis might be to identify what is missing (through comparison with the Cybersecurity Capacity Maturity Model, a key output of the Centre), and also to consider how strategic, or not, such activities appear to be. An extension of this project, or indeed a second parallel project, might seek to compare the existing efforts with the economic and technology metrics that exist for countries around the world, exploring whether the data shows any relationships between those metrics and the capacity building activities underway. This analysis would involve regression techniques.
Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs Michael Goldsmith Security B C MSc (Joint with Sadie Creese) Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue.
Organisational Resilience and Self-* re-planning Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security MSc

Key Performance Indicators (KPIs) are a type of performance measurement used in organisations. They evaluate the success of activities in which an organisation or system engages. Often success is measured in terms of whether an activity can be completed and repeated (e.g. zero defects, customer satisfaction or similar); at other times it is measured in terms of progress towards a strategic goal. Well-chosen KPIs allow us to reflect upon the performance of an organisation, and possibly to identify potential future degradation issues in systems. In this project, students are tasked to investigate how KPIs can be used to measure and improve cyber-resilience in an organisation, particularly with respect to security as well as mission performance. After identifying which aspects of an organisation or mission are prone to error, it may be beneficial to propose solutions for how these issues can be addressed. With self-* re-planning, we believe it would be possible for a system to suggest and automate certain aspects of how the mission, infrastructure or organisation can be repurposed so as not to fail. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. For instance, the project may be network-security focused, mission-process focused or otherwise. It may also investigate more formal approaches to self-* re-planning methods.

Requirements: Programming and/or Social Science Research Methods.

Penetration testing for harm Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

Current penetration testing is typically utilised for discovering how organisations might be vulnerable to external hacks, and testing methods are driven by using techniques determined to be similar to the approaches used by hackers. The result is a report highlighting various exploitable weak-points and how they might result in unauthorised access should a malign entity attempt to gain access to a system. Recent research within the cybersecurity analytics group has been studying the relationship between these kinds of attack surfaces and the kinds of harm that an organisation might be exposed to. An interesting question is whether an orientation around intent, or harm, might result in a different test strategy: would a different focus be given to the kinds of attack vectors explored in a test if a particular harm were aimed at? This mini-project would aim to explore these and other questions by designing penetration test strategies based on a set of particular harms, and then seek to consider potential differences with current penetration practices by consultation with the professional community.

Requirements: Students will need to have a working understanding of penetration testing techniques.

Photogrammetry Michael Goldsmith Security B C MSc Photogrammetry is a set of techniques that allows for 3D measurements from 2D photographs, especially those measurements pertaining to geometry or surface colours. The purpose of this project is to implement one or more photogrammetry techniques from a series of 2D photographs. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the tool they develop.
Predicting exposure to risk for active tasks Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

Prior research has considered how we might better understand and predict the consequences of cyber-attacks based on knowledge of the business processes, people and tasks, and how they utilise the information infrastructure / digital assets that might be exposed to specific attack vectors. However, this can clearly be refined by moving to an understanding of those tasks that are live or active at the time of an attack propagating across a system. If this can be calculated, then an accurate model of where risk may manifest and the harm that may result can be constructed. This project would explore the potential for such a model through practical experimentation and the development of software monitors to be placed on a network, aimed at inferring the tasks and users that are active based on network traffic. If time allows then host-based sensors might also be explored (such as on an application server) to further refine the understanding of which users are live on which applications, etc.

Requirements: Students must be able to construct software prototypes and have a working knowledge of network architectures and computer systems.

Procedural Methods in Computer Graphics Michael Goldsmith Security B C MSc

Procedural methods in computer graphics help us develop content for virtual environments (geometry and materials) using formal grammars. Common approaches include fractals and L-systems. Examples of content may include the creation of cities, planets or buildings. In this project the student will develop an application to create content procedurally. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the methods they develop. These projects tend to have a strong focus on designing and implementing existing procedural methods, but also include scope for creativity. The project can be based on reality, e.g. developing content that is based on how the real-world equivalent objects were created (physically-based approaches), or it may be entirely creative in how it creates content. Past students, for instance, have built tools to generate cities based on real-world examples and non-existent city landscapes; other examples include procedural planets, including asteroids and earth-like planetary bodies.

Resilience – metrics and tools for understanding organisational resilience Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security MSc

Resilience in the face of cyber-attack is considered a key requirement for organisations. Prior work within the cybersecurity analytics group has been developing a resilience model for organisations, but there remains no commonly agreed set of metrics that organisations can use to determine just how resilient they are (at a given point in time and continuously). This mini-project would seek to propose a set of metrics, and if time allows tools for applying them (although the latter may well better suit a following DPhil). The approach will be to consider the development of metrics to measure the various characteristics or organisational capabilities / behaviours that are determined to be necessary in order to be resilient. It will be necessary to consider how the metrics might vary according to, or take account of, different threat environments. It may also be interesting to consider whether there are refinements of the resilience requirements that apply to different types of organisations. Data analytics approaches will need to be considered in order to apply the metrics, and students might consider the use of visualisation to help with this. The resulting metrics will be validated in consultation with security professionals and organisations possessing resilience-related experience. In the context of the mini-project this is most likely achievable via a small number of interviews, with a more detailed validation and iterative design approach being supported by a longitudinal study working with 2 or 3 organisations which might adopt and test the metrics.

Smartphone security Michael Goldsmith Security B C MSc

(Joint with Sadie Creese) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); this uses skills in formal specification and understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc. Other projects within this domain are possible, according to interest and inspiration.

Prerequisites:

Concurrency, Concurrent Programming, Computer Security all possibly an advantage.

Technology-layer social networks Michael Goldsmith Security C MSc (Joint with Sadie Creese) Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata - which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc.
Trip-wires Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc At Oxford we have developed a framework for understanding the components of an attack, and documenting known attack patterns can be instrumental in developing trip-wires aimed at detecting the presence of insiders in a system. This project will seek to develop a library of such trip-wires based on a survey of openly documented and commented-upon attacks, using the Oxford framework. There will be an opportunity to deploy the library into a live-trial context, which should afford an opportunity to study the relative utility of the trip-wires within large commercial enterprises. The mini-project would also need to include experimenting with the trip-wires in a laboratory environment, and this would involve the design of appropriate test methods.
Understanding Enterprise Infrastructure Dependencies Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

There are many tools available for detecting and monitoring cyber-attacks based on network traffic, and these are accompanied by a wide variety of tools designed to make alerts tangible to security analysts. By comparison, the impact of these attacks at an organisational level has received little attention. One aspect that could be enhanced further is a tool facilitating the management and updating of our understanding of business processes, and of how those processes depend on a network infrastructure. This tool could facilitate the mapping between company strategies and the activities needed to accomplish company goals, and map these down to the network and people assets. At the top of the hierarchy lies the board, responsible for strategic decisions. These decisions are interpreted at the managerial level and could be captured and analysed with business objective diagrams. These diagrams in turn could be refined further to derive business processes and organisational charts, ensuring that decisions made at the top level will be enforced at the lower levels. The combination of business processes and organisational charts could eventually be mapped down to the network infrastructure. For this project we suggest a student could develop novel algorithms for mapping business processes to network infrastructures in an automated way (given the updated business process files). That said, the student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. Other projects on business process modelling are also possible, depending on interest and inspiration.

Vulnerability analysis and threat models for Distributed Ledgers Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

This project would seek to study the general form of distributed ledgers to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. A number of different approaches might be considered, for example: formal modelling of protocols used to establish agreement or resolve conflicts; design of test suites using attack-graphs and then validation in a laboratory setting.

Bioinformatics Projects Jotun Hein Computational Biology and Health Informatics B C MSc

Jotun Hein has been in Oxford since 2001 and has supervised many students from Computer Science. His main interest is computational biology and combinatorial optimization problems in this area, especially in phylogenetics, biosequence analysis, population genetics, grammars and the origin of life. Some ideas for projects can be obtained by browsing these pages:

https://heingroupoxford.com/learning-resources/topics-in-computational-biology/

https://heingroupoxford.com/previous-projects/

A student is also welcome to propose his or her own project.

Distributional prior in Inductive Logic Programming Celine Hocquette MSc Inductive Logic Programming (ILP) is a form of Machine Learning based on Computational Logic. Given examples and some background knowledge, the goal of an ILP learner is to search for a hypothesis that generalises the examples. However, the search space in ILP is a function of the size of the background knowledge. All predicates are treated as equally likely, and current ILP systems cannot make use of distributional assumptions to improve the search. This project focuses on making use of an assumed given prior probability over the background predicates to learn more efficiently. The idea is to order subsets of the background predicates in increasing order of probability of appearance. This project is a mix of theory, implementation, and experimentation. Prerequisites: logic programming, probability theory, or an interest in learning about them
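Purely as an illustration of the search-ordering idea (the priors below are invented, and the direction of the ordering is easy to flip), subsets of predicates can be enumerated best-first by their joint prior probability:

```python
import heapq
from itertools import count
from math import log

# Best-first enumeration of predicate subsets by joint prior probability.
# Cost of a subset = sum of -log p(b), so the cheapest subset is the most
# probable one; a counter breaks ties so the heap never compares sets.
def subsets_by_prior(priors):
    names = sorted(priors, key=priors.get, reverse=True)
    tie = count()
    heap = [(0.0, next(tie), 0, frozenset())]
    while heap:
        cost, _, idx, subset = heapq.heappop(heap)
        yield subset
        for j in range(idx, len(names)):   # extend only "rightwards", so each
            heapq.heappush(heap,           # subset is generated exactly once
                           (cost - log(priors[names[j]]),
                            next(tie), j + 1, subset | {names[j]}))

priors = {"head": 0.9, "tail": 0.7, "succ": 0.2}   # invented prior values
for s in subsets_by_prior(priors):
    print(sorted(s))
```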
Identifying relevant background knowledge in Inductive Logic Programming Celine Hocquette MSc

Inductive Logic Programming (ILP) is a form of Machine Learning based on Computational Logic. Given examples and some background knowledge, the goal of an ILP learner is to search for a hypothesis that generalises the examples. The search space in ILP is a function of the size of the background knowledge. To improve search efficiency, we want to identify relevant background knowledge predicates, which correspond to relevant areas of the search space. We propose to evaluate and compare several relevance identification methods, such as compression of the examples or statistical approaches. This project is a mix of theory, implementation, and experimentation.

Prerequisites: logic programming, statistical learning, or an interest in learning about them

Compilation of a CSP-like language Geraint Jones Programming Languages B C MSc

This is a compiler project, also requiring familiarity with concurrency.

The parallel programming language occam is essentially an implementable sublanguage of CSP. The aim of this project is to produce a small portable implementation of a subset of occam; the proposed technique is to implement a virtual machine based on the Inmos transputer, and a compiler which targets that virtual machine.

One of the aims of the project is to implement an extension to occam which permits recursion; more ambitious projects might produce a distributed implementation with several communicating copies of the virtual machine. Other possibilities are to produce separate virtual machines, optimised for displaying a simulation or for efficiency of implementation, or to translate the virtual machine code into native code for a real machine.

Logic circuit workbench Geraint Jones Programming Languages B C MSc

This is an interactive programming project with some logic simulation behind it.

The idea is to produce a simulator for a traditional logic breadboard. The user should be able to construct a logic circuit by drawing components from a library of standard gates and latches and so on, and connecting them together in allowable ways. It should then be possible to simulate the behaviour of the circuit, to control its inputs in various ways, and to display its outputs and the values of internal signals in various ways.

The simulator should be able to enforce design rules (such as those about not connecting standard outputs together, or limiting fan-out) but should also cope with partially completed circuits; it might be able to implement circuits described in terms of replicated sub-circuits; it should also be able to read or produce some sort of standard netlist.

It might be that this would make a project for two undergraduates: one implementing the interface, the other implementing a simulator that runs the logic.

Modelling of arithmetic circuits Geraint Jones Programming Languages B C MSc

This is a project in the specification of hardware, which I expect to make use of functional programming.

There is a great deal of knowledge about ways (that are good by various measures) of implementing standard arithmetic operations in hardware. However, most presentations of these circuits are at a very low level, involving examples, diagrams, and many subscripts, for example these descriptions.

The aim of this project is to describe circuits like this in a higher-level way by using the higher order functions of functional programming to represent the structure of the circuit. It should certainly be possible to execute instances of the descriptions as simulations of circuits (by plugging in simulations of the component gates), but the same descriptions might well be used to generate circuit netlists for particular instances of the circuit, and even to produce the diagrams.
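To give a flavour of the idea (a toy sketch in Python; the project itself would more naturally use a functional language such as Haskell), a ripple-carry adder can be expressed as a higher-order combinator applied to a one-bit full adder, and then executed as a simulation:

```python
# A one-bit full adder and a higher-order "ripple" combinator that lifts any
# such cell to whole words; executing the description is a simulation.
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple(cell):
    """Lift a one-bit cell (a, b, carry_in) -> (sum, carry_out) to words."""
    def adder(xs, ys, carry=0):
        sums = []
        for a, b in zip(xs, ys):                 # least significant bit first
            s, carry = cell(a, b, carry)
            sums.append(s)
        return sums, carry
    return adder

add4 = ripple(full_adder)
print(add4([1, 0, 1, 0], [1, 1, 0, 0]))          # 5 + 3 = 8 -> ([0, 0, 0, 1], 0)
```

The same combinator could, with a different cell, produce a netlist or a diagram instead of output values, which is the point the project description makes about reusing one description for several purposes.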

Toys for Animating Mathematics Geraint Jones Programming Languages B C

The aim is to take some mathematics that would be within the grasp of a mathematically-inclined sixth-former and turn it into some attention-grabbing web pages. Ideally this reveals a connection with computing science. I imagine that this project necessarily involves some sort of animation, and I have visions of Open University television maths lectures.

The programming need not be the most important part of this project, though, because some of the work is in choosing a topic and designing problems and puzzles and the like around it. There's a lot of this sort of thing about, though, so it would be necessary to be a bit original.

Think of GeomLab (and then perhaps think of something a little less ambitious). It might involve logic and proof, it might be about sequences and series, it might be about graphs, it might be about the mathematics of cryptography... you might have something else in mind.

Time-Bandits Varun Kanade, Thomas Orton Artificial Intelligence and Machine Learning C Consider the following problem: at each time step, you receive a dataset D_t you would like to fit (using e.g. a neural network or statistical model). You need to choose a hyper-parameter p_t for the model, and then you can run an optimization procedure (e.g. stochastic gradient descent) for time T_{p_t} with this hyper-parameter. Afterwards, you can measure the validation loss l_t of the fitted model on the dataset. If you are happy with the validation loss, you can move to the next time step. Otherwise, you can try to choose a new hyper-parameter (spending more compute time) and re-fit the model until you have an acceptable validation loss.

The field of Multi-Armed Bandits (MAB) is a rich research area which broadly answers the question of how to make decisions while minimizing regret. This has many applications in Machine Learning, Reinforcement Learning, Portfolio Optimization and Economics. However, there is an obstacle in applying MAB solutions to the above problem: MAB is typically formalized by considering a loss function over actions and asking how to pick a single action at each time step to minimize regret. In contrast, in our problem you may pick as many actions as you like (provided you pay for them with time), and you receive the minimum loss over the actions which you took at each time step. The idea of modelling time trade-offs is also a fundamental issue in practical machine learning which has only more recently been receiving attention in a theoretical context.

The project is ideal for a mathematically inclined 4th year student, with the following objectives:

1. Develop some practically motivated formalisms for modelling time resources in the MAB setting.
2. Develop a good understanding of the existing MAB literature and how it relates to the time-resource MAB variants considered.
3. Provide algorithmic solutions and theoretical lower bounds for the time-resource MAB variants considered.
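As a purely illustrative sketch (not part of the project brief), one naive baseline is a UCB1-style rule that scores each arm by an optimistic reward estimate per unit of time spent; the cost values and reward model below are invented:

```python
import math
import random

# Toy UCB1 variant: each arm (hyper-parameter) has a fixed time cost, and
# arms are scored by optimistic reward estimate per unit of time spent.
class TimeCostUCB:
    def __init__(self, costs):
        self.costs = costs                       # time cost T_p of arm p
        self.n = [0] * len(costs)                # pull counts
        self.mean = [0.0] * len(costs)           # empirical mean rewards
        self.t = 0

    def select(self):
        self.t += 1
        for p in range(len(self.costs)):         # play every arm once first
            if self.n[p] == 0:
                return p
        def score(p):
            bonus = math.sqrt(2 * math.log(self.t) / self.n[p])
            return (self.mean[p] + bonus) / self.costs[p]
        return max(range(len(self.costs)), key=score)

    def update(self, p, reward):
        self.n[p] += 1
        self.mean[p] += (reward - self.mean[p]) / self.n[p]

bandit = TimeCostUCB(costs=[1.0, 3.0, 10.0])     # three invented settings
for _ in range(1000):
    p = bandit.select()
    reward = random.gauss(0.5 + 0.1 * p, 0.1)    # toy stand-in for a model fit
    bandit.update(p, reward)
print(bandit.n)                                  # time-cheap arms get pulled more
```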
A fast numerical solver for multiphase flow David Kay Computational Biology and Health Informatics B C MSc

The Allen-Cahn equation is a differential equation used to model the phase separation of two, or more, alloys. This model may also be used to model cell motility, including chemotaxis and cell division. The numerical approximation, via a finite difference scheme, ultimately leads to a large system of linear equations. In this project, using numerical linear algebra techniques, we will develop a computational solver for these linear systems. We will then investigate the robustness of the proposed solver.
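For illustration only, here is a minimal sketch of the kind of solver pipeline involved, assuming one common form of the 1D Allen-Cahn equation u_t = eps^2 u_xx + u - u^3 with Neumann boundaries; the diffusion term is treated implicitly, so each time step solves one sparse linear system (factorised once and reused):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Semi-implicit finite differences for u_t = eps^2 u_xx + u - u^3 on [0,1]
# with Neumann boundaries: (I - dt eps^2 L) u_{k+1} = u_k + dt (u_k - u_k^3).
N, eps, dt, steps = 200, 0.02, 1e-3, 500
h = 1.0 / (N - 1)

main = np.full(N, -2.0 / h**2)                   # standard 1D Laplacian stencil
off = np.full(N - 1, 1.0 / h**2)
L = sp.diags([off, main, off], [-1, 0, 1]).tolil()
L[0, 1] = L[N - 1, N - 2] = 2.0 / h**2           # reflecting (Neumann) rows

A = (sp.eye(N) - dt * eps**2 * L).tocsc()        # system matrix, fixed in time
solve = spla.factorized(A)                       # factorise once, reuse per step

u = np.random.uniform(-0.1, 0.1, N)              # small random initial data
for _ in range(steps):
    u = solve(u + dt * (u - u**3))               # one sparse solve per step
```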

Efficient solution of one dimensional airflow models David Kay Computational Biology and Health Informatics B C MSc In this project we will investigate the use of numerical and computational methods to efficiently solve the linear system of equations arising from a one dimensional airflow model within a network of tree like branches. This work will build upon the methods presented within the first year Linear Algebra and Continuous Maths courses. All software to be developed can be in the student’s preferred programming language.  General area: Design and computational implementation of fast and reliable numerical models arising from biological phenomena.
General topics in the area of Steganography and Cryptography Andrew Ker Security B C Supervision of general topics in the area of steganography and cryptography.
Bots for a Board Game Stefan Kiefer Automated Verification B C MSc The goal of this project is to develop bots for the board game Hex. In a previous project, an interface was created to allow future students to pit their game-playing engines against each other. In this project the goal is to program a strong Hex engine. The student may choose the algorithms that underlie the engine, such as alpha-beta search, Monte-Carlo tree search, or neural networks. The available interface allows a comparison between different engines. It is hoped that these comparisons will show that the students' engines become stronger over the years. The project involves reading about game-playing algorithms, selecting promising algorithms and data structures, and the design and development of software (in Java or Scala).
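For a flavour of one candidate algorithm, here is a minimal generic negamax alpha-beta sketch (in Python for brevity, though the project targets Java or Scala); the `game` object with `moves`, `apply`, `is_terminal` and `evaluate` is a hypothetical interface, not the provided one:

```python
import math

# Generic negamax alpha-beta; `game.evaluate` scores a state from the
# player-to-move's perspective. Hex-specific details are left open.
def alphabeta(state, depth, alpha, beta, game):
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    value = -math.inf
    for move in game.moves(state):
        value = max(value, -alphabeta(game.apply(state, move),
                                      depth - 1, -beta, -alpha, game))
        alpha = max(alpha, value)
        if alpha >= beta:                        # cut-off: refutation found
            break
    return value
```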
Small matrices for irrational nonnegative matrix factorisation Stefan Kiefer Automated Verification B C MSc Recently it was shown (https://arxiv.org/abs/1605.06848) that there is a 6x11 matrix M with rational nonnegative entries such that for any factorisation M = W H (where W is a 6x5 matrix with nonnegative entries, and H is a 5x11 matrix with nonnegative entries) some entries of W and H need to be irrational. The goal of this project is to explore whether the number of columns of M can be chosen below 11, perhaps by dropping some columns of M. This will require some geometric reasoning. The project will likely involve the use of tools (such as SMT solvers) that check systems of nonlinear inequalities for solvability. Prerequisites: The project is mathematical in nature, and a good recollection of linear algebra is needed. Openness towards programming and the use of tools is helpful.
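As a hedged illustration of the kind of tool use envisaged, the sketch below asks the Z3 SMT solver whether a given matrix admits an exact nonnegative factorisation of a chosen inner dimension (toy sizes only; Z3 may be slow or return unknown on larger nonlinear instances):

```python
from z3 import And, Real, Solver, sat

# Ask Z3 whether M (m x n, nonnegative) equals W * H with W m x r and H r x n,
# both nonnegative. This is nonlinear real arithmetic, so larger instances
# may be slow or come back as unknown (treated as "no" here).
def nmf_feasible(M, r):
    m, n = len(M), len(M[0])
    W = [[Real(f"w_{i}_{k}") for k in range(r)] for i in range(m)]
    H = [[Real(f"h_{k}_{j}") for j in range(n)] for k in range(r)]
    s = Solver()
    s.add(And([x >= 0 for row in W + H for x in row]))   # nonnegativity
    for i in range(m):
        for j in range(n):
            s.add(sum(W[i][k] * H[k][j] for k in range(r)) == M[i][j])
    return s.check() == sat

# Tiny sanity check: a rank-1 matrix has an inner-dimension-1 factorisation.
print(nmf_feasible([[1, 2], [2, 4]], 1))         # expected: True
```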
Using Virtual Reality to predict how we use memory in natural behaviour: collaborative interdisciplinary projects. Stefan Kiefer Automated Verification B C MSc

Co-supervised with Dr Dejan Draschkow, dejan.draschkow@psy.ox.ac.uk;

Although human knowledge and memories represent the past, we form and use them to support future behavior (Nobre & Stokes, 2019). Understanding which factors contribute to learning about the world and to successfully finding the learned information in mind is of critical importance for developing methods to support this behavior in healthy individuals, but also in individuals with a range of neurocognitive and psychiatric conditions, such as stroke, Alzheimer's, and Parkinson's disease. Our novel virtual reality task (Draschkow, Kallmayer, & Nobre, 2021) combines the ecological validity, experimental control, and sensitive measures required to investigate the naturalistic interplay between memory and perception, and opens the doors to investigating and supporting complex cognitive functions (https://www.youtube.com/watch?v=GT2kLkCJQbY). In the proposed interdisciplinary projects, computer science and experimental psychology students will be able to develop and validate sophisticated virtual reality protocols for measuring and supporting complex cognitive mechanisms. Specific projects will focus on selected sub-topics and vary in scope, depending on students' interests and the kind of project (3rd, 4th, or MSc). These include:

  • Programming and refining game-like cognitive VR tasks in C# (Unity)
  • Developing protocols for online-based assessments of cognitive functions in C#/JavaScript
  • Developing algorithms for detecting markers of neurocognitive symptoms (such as tremor for Parkinson's disease) in multivariate VR data (R/Python)
  • Developing proof-of-concept multimodal (voice, visual, and touch) protocols for supporting learning and memory in VR, with implications for supporting dementia patients (C#/JavaScript/R/Python)

The projects are suitable for students who feel comfortable with highly interdisciplinary work/teams and have experience with (or are open to learning) scientific programming in C#/JavaScript/R/Python. Students will be fully integrated in a successful and collaborative research group and get hands-on experience with an interactive product-development cycle, including multiple stakeholders. Further related readings are: Ballard et al., 1997; Hayhoe, 2017; Hayhoe & Ballard, 2014.

Relevant readings from cognitive science:

Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4), 723–767. https://doi.org/10.1017/S0140525X97001611
Draschkow, D., Kallmayer, M., & Nobre, A. C. (2021). When natural behavior engages working memory. Current Biology, 31(4), 869–874.e5. https://doi.org/10.1016/j.cub.2020.11.013
Hayhoe, M. M. (2017). Vision and action. Annual Review of Vision Science, 3(1), 389–413. https://doi.org/10.1146/annurev-vision-102016-061437
Hayhoe, M. M., & Ballard, D. (2014). Modeling task control of eye movements. Current Biology, 24(13), R622–R628. https://doi.org/10.1016/j.cub.2014.05.020
Nobre, A. C., & Stokes, M. G. (2019). Premembering experience: A hierarchy of time-scales for proactive attention. Neuron, 104(1), 132–146. https://doi.org/10.1016/j.neuron.2019.08.030

Truthful scheduling for graphs Elias Koutsoupias Algorithms and Complexity Theory B C MSc The aim of the project is to advance our understanding of the limitations of mechanism design for the scheduling problem, the "holy grail" of algorithmic mechanism design. Specifically, we will consider the graphical scheduling problem, in which every task can be only allocated to two machines, and study the approximation ratio of mechanisms for this setting. The aim is to prove both lower and upper bounds. Both directions are hard problems, and we plan to try to gain some understanding by experimentally searching for lower bounds or trying to verify potentially useful statements about the upper bound. Of particular importance is the case of trees and their special case of stars, i.e., when every task can be given either to the root or to a particular leaf. We plan to study not only the standard objective of the makespan, but the more general class of objectives in which the mechanism minimizes the L^p norm of the times of the machines. The case L^infinity is to minimize the makespan, L^1 is to minimize the welfare, and the case L^0 corresponds to the Nash Social Welfare problem, all of which are interesting problems. Further possible directions include considering fractional scheduling and mechanisms without money. Bibliography: George Christodoulou, Elias Koutsoupias, Annamária Kovács: Truthful Allocation in Graphs and Hypergraphs. ICALP 2021: 56:1-56:20 (https://arxiv.org/abs/2106.03724)
Probabilistic Model Checking Marta Kwiatkowska Automated Verification B C MSc

Professor Marta Kwiatkowska is happy to supervise projects involving probabilistic modelling, verification and strategy synthesis. This is of interest to students taking the Probabilistic Model Checking course and/or those familiar with probabilistic programming.

Below are some concrete project proposals, but students’ own suggestions will also be considered:

  • Synthesis of driver assistance strategies in semi-autonomous driving. Safety of advanced driver assistance systems can be improved by utilising probabilistic model checking. Recently (http://qav.comlab.ox.ac.uk/bibitem.php?key=ELK+19) a method was proposed for correct-by-construction synthesis of driver assistance systems. The method involves cognitive modelling of driver behaviour in ACT-R and employs PRISM. This project builds on these techniques to analyse complex scenarios of semi-autonomous driving such as multi-vehicle interactions at road intersections.
  • Equilibria-based model checking for stochastic games. Probabilistic model checking for stochastic games enables formal verification of systems where competing or collaborating entities operate in a stochastic environment. Examples include robot coordination systems and the Aloha protocol. Recently (http://qav.comlab.ox.ac.uk/papers/knps19.pdf) probabilistic model checking for stochastic games was extended to enable synthesis of strategies that are subgame perfect social welfare optimal Nash equilibria, soon to be included in the next release of PRISM-games (www.prismmodelchecker.org). This project aims to model and analyse various coordination protocols using PRISM-games.

  • Probabilistic programming for affective computing. Probabilistic programming facilitates the modelling of cognitive processes (http://probmods.org/). In a recent paper (http://arxiv.org/abs/1903.06445), a probabilistic programming approach to affective computing was proposed, which enables cognitive modelling of emotions and executing the models as stochastic, executable computer programs. This project builds on these approaches to develop affective models based on, e.g., this paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=PK18).
Safety Assurance for Deep Neural Networks Marta Kwiatkowska Automated Verification B C MSc

Professor Marta Kwiatkowska is happy to supervise projects in the area of safety assurance and automated verification for deep learning, including Bayesian neural networks. For recent papers on this topic see  http://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19, http://qav.comlab.ox.ac.uk/bibitem.php?key=RHK18 and http://qav.comlab.ox.ac.uk/bibitem.php?key=CKLPPW+19, and also https://www.youtube.com/watch?v=XHdVnGxQBfQ.

Below are some concrete project proposals, but students’ own suggestions will also be considered:

  • Robustness of attention-based sentiment analysis models to substitutions. Neural network models for NLP tasks such as sentiment analysis are susceptible to adversarial examples. In a recent paper (https://www.aclweb.org/anthology/D19-1419/) a method was proposed for verifying robustness of NLP tasks to symbol and word substitutions. The method was evaluated on CNN models. This project aims to develop similar techniques for attention-based NLP models (www-nlp.stanford.edu/pubs/emnlp15_attn.pdf).
  • Attribution-based safety testing of deep neural networks. Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. In a recent paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19) a game-based method was proposed for robustness evaluation, which can be used to provide saliency analysis. This project aims to extend these techniques with the attribution method (http://arxiv.org/abs/1902.02302) to produce a methodology for computing the causal effect of each feature and evaluate it on image data.
  • Uncertainty quantification for end-to-end neural network controllers. NVIDIA has created a deep learning system for end-to-end driving called PilotNet (http://devblogs.nvidia.com/parallelforall/explaining-deep-learning-self-driving-car/). It inputs camera images and produces a steering angle. The network is trained on data from cars being driven by real drivers, but it is also possible to use the Carla simulator. In a recent paper (http://arxiv.org/abs/1909.09884) a robustness analysis with statistical guarantees for different driving conditions was carried out for a Bayesian variant of the network. This project aims to develop a methodology based on these techniques and semantic transformation of weather conditions (see http://proceedings.mlr.press/v87/wenzel18a/wenzel18a.pdf) to evaluate the robustness of PilotNet or similar end-to-end controllers in a variety of scenarios.

 

Dashcam analysis Harjinder Lallie B C

There exist no current guidelines and very few tools to aid the investigation of a dashcam device, particularly for the purpose of extracting and mapping the geospatial data contained therein. This project aims to extract, map and chart geospatial data from a dashcam device and provide insights into the routes and speeds taken by the vehicle. You will be provided with a number of dashcam forensic images (.E01 files, which can easily be extracted back to MOV/MP4 files). You will be expected to develop the solution using Python, and if possible, integrate the solution with Autopsy, an open-source digital forensic tool. You might consider using an open-source tool such as ExifTool to process the EXIF data. You could choose to take this project in a somewhat different direction by choosing to extract watermark data from the dashcam footage and map that instead.
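As a hedged illustration of the ExifTool route (tag names and the availability of embedded per-frame GPS vary by dashcam model, so the details below are assumptions), one can shell out to ExifTool and parse its JSON output:

```python
import json
import subprocess

# Requires the ExifTool CLI on PATH. -ee extracts embedded (per-frame) data,
# -n gives signed decimal coordinates, -G3 prefixes tags by sub-document.
def gps_points(video_path):
    out = subprocess.run(
        ["exiftool", "-json", "-ee", "-n", "-G3", video_path],
        capture_output=True, text=True, check=True,
    ).stdout
    points = []
    for record in json.loads(out):
        for key, lat in record.items():
            if key.endswith("GPSLatitude"):
                prefix = key[: -len("GPSLatitude")]
                lon = record.get(prefix + "GPSLongitude")
                if lon is not None:
                    points.append((lat, lon, record.get(prefix + "GPSSpeed")))
    return points

# e.g. feed the points into folium or matplotlib to map the route / chart speed
print(gps_points("clip0001.MP4"))                # hypothetical file name
```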

To aid in understanding the background to this project, please see the reference below and the video here: https://warwick.ac.uk/fac/sci/wmg/mediacentre/events/dfls/previousevents/dashcamforensics/

Lallie, H.S., 2020. Dashcam forensics: A preliminary analysis of 7 dashcam devices. Forensic Science International: Digital Investigation, 33, p.200910.

Prerequisite. You may want to use tools such as EXIFTOOL, an open-source EXIF data extraction tool. Although EXIFTOOL might serve your needs, you will most probably want to develop an excellent knowledge of the EXIF standard. Additional support can be provided by providing you with access to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures. That will comprise around 10 hours of learning. I will also provide you with sufficient background in the topic prior to you beginning the study. You will need good programming skills, preferably in Python.

Keyword searching audio/video files Harjinder Lallie B C Digital forensic investigators often search for the existence of keywords on hard disks or other storage media. Keywords are easily searchable in PDF/Word/text/other popular formats; however, current digital forensic tools do not allow for keyword searching through movies/audio. This is essential in cases which involve dashcam footage, recorded conversations, etc. The aim of this project is to produce a tool which auto-transcribes audio data and then performs a keyword search on the transcribed file, pinpointing the point(s) in the file where the keyword(s) appear. You will be expected to develop the solution using Python, and if possible, integrate the solution with Autopsy, an open-source digital forensic tool. Prerequisite. Additional support can be provided by providing you with access to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures. That will comprise around 10 hours of learning. You are likely to use the Python SpeechRecognition and possibly the PyAudio libraries. For convenience and demonstration of practicality, you may want to integrate the solution with the open-source forensics tool Autopsy, and hence will need to develop a good understanding of this tool, particularly its keyword search facility.
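As a rough illustration of the SpeechRecognition route (chunk-level timing is an approximation, and `recognize_google` needs network access; the file name is hypothetical), one could transcribe fixed-length chunks and report the chunk start time of each keyword hit:

```python
import speech_recognition as sr

# Transcribe a WAV file in fixed-length chunks and report the chunk start time
# of every keyword hit. A real tool would want word-level timestamps instead.
def keyword_hits(wav_path, keywords, chunk_seconds=15):
    recognizer = sr.Recognizer()
    hits, offset = [], 0
    with sr.AudioFile(wav_path) as source:
        while offset < source.DURATION:
            audio = recognizer.record(source, duration=chunk_seconds)
            try:
                text = recognizer.recognize_google(audio).lower()
            except sr.UnknownValueError:         # no intelligible speech here
                text = ""
            for kw in keywords:
                if kw.lower() in text:
                    hits.append((kw, offset))    # offset in seconds into file
            offset += chunk_seconds
    return hits

print(keyword_hits("interview.wav", ["meeting", "transfer"]))
```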
Reasoning with Causality in Digital Forensic Cases. Harjinder Lallie B C Unlike traditional 'wet' forensics, digital forensics is heavily reliant on circumstantial evidence. Quite often a digital investigator is tasked with outlining the sequence of events that took place on a computer system to establish causality and a chain of events. These event sequences are reasonably straightforward to map conceptually, however, there is no solution which 'auto-links' these events. This project applies probabilistic techniques using an attack tree (or alternative) methodology to determining probable event sequences from sequence fragments available on a computer system. You will be supplied with a digital forensic image of a used hard disk, or you may choose to develop your own. You will most likely organize event sequences into categories (e.g. software installation, execution, file download, browsing etc) and then test your proposals on further hard disk images and possibly supplement your findings through qualitative feedback from law enforcement experts (I can connect you). Prerequisite. Knowledge of Autopsy, an open-source forensic tool. Additional support will be given, if required, by providing access to my online wiki self-guided tutorials and by providing access to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures. That will comprise around 10 hours of learning. I will also provide you with sufficient background in the topic prior to you beginning the study. You will need good programming skills, preferably in Python.
Analysing concurrent datatypes in CSP Gavin Lowe Automated Verification C The aim of this project would be to model some concurrent datatypes in CSP, and to analyse them using the model checker FDR. The concurrent datatypes could be some of those studied in the Concurrent Algorithms and Datatypes course. Typical properties to be proved would be linearizability (against a suitable sequential specification) and lock freedom. Prerequisites: Concurrency and Concurrent Algorithms and Datatypes. Reading: Analysing Lock-Free Linearizable Datatypes using CSP, Gavin Lowe.
Exploration of a Better Semantic Representation for Multimodal Information Thomas Lukasiewicz, Bowen Li Artificial Intelligence and Machine Learning MSc

1. Introduction
Interaction between different modality information means that the representations interacting with each other come from different domains (such as text/voice and image/video). This is a very promising research area, as it can enable many potential real-world applications. For example, generating an image from a given text can help non-artists easily create visually appealing images and enable many new visual effects not possible before, simply by using natural language descriptions; manipulating an original image using a given text can allow users to edit the image to satisfy their preferences; and object detection and image captioning techniques can help disabled people to better understand their surroundings.

There exist many different research directions involving the interaction between different-domain information. These include visual interpretation methods such as (1) object/scene classification [26, 8, 18], (2) object detection [7, 10, 16], (3) image captioning [5, 20], and (4) visual question answering [2, 3, 1, 22], which aim at transferring visual data, such as videos or images, into abstract representations, such as texts. They also include visual synthesis, such as (1) text-to-image generation [17, 21, 24, 25, 15, 23, 19, 27], possibly with the help of scene graphs [4, 11] and semantic layouts (e.g., bounding boxes and segmentation masks) [9, 12], where scene graphs and layouts contain semantic information about the desired objects to ease the generation process, and (2) image manipulation using natural language descriptions [6, 14, 13].

In this project proposal, we aim to explore a better semantic representation of multimodal information, such as knowledge graphs, as current approaches ignore the internal semantic relations within each piece of information, and simply feed the original information into a network in the hope that the network is able to capture these semantic relations. If, however, we can first convert the given source information from different domains into the same semantic representation, then we are able to easily interact (e.g., combine or filter) this information to achieve a better interaction between the modalities.

2. Approach
This project mainly focuses on the exploration of a better common semantic representation for multimodal information, which may involve the investigation of different semantic data structures, such as knowledge graphs and scene graphs. We then apply the proposed new representation to different downstream tasks (such as text-to-image generation, visual question answering, etc.) to verify the effectiveness of the proposed method.

References
[1] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6077–6086, 2018.
[2] Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 39–48, 2016.
[3] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 2425–2433, 2015.
[4] Oron Ashual and Lior Wolf. Specifying object attributes and relations in interactive scene generation. In Proceedings of the IEEE International Conference on Computer Vision, pages 4561–4569, 2019.
[5] Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2625–2634, 2015.
[6] Hao Dong, Simiao Yu, Chao Wu, and Yike Guo. Semantic image synthesis via adversarial learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 5706–5714, 2017.
[7] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
[9] Tobias Hinz, Stefan Heinrich, and Stefan Wermter. Semantic object accuracy for generative text-to-image synthesis. arXiv preprint arXiv:1910.13321, 2019.
[10] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
[11] Justin Johnson, Agrim Gupta, and Li Fei-Fei. Image generation from scene graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1219–1228, 2018.
[12] Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, and Jianfeng Gao. Object-driven text-to-image synthesis via adversarial training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12174–12182, 2019.
[13] Seonghyeon Nam, Yunji Kim, and Seon Joo Kim. Text-adaptive generative adversarial networks: Manipulating images with natural language. In Advances in Neural Information Processing Systems, pages 42–51, 2018.
[14] Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. StyleCLIP: Text-driven manipulation of StyleGAN imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2085–2094, 2021.
[15] Tingting Qiao, Jing Zhang, Duanqing Xu, and Dacheng Tao. MirrorGAN: Learning text-to-image generation by redescription. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1505–1514, 2019.
[16] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[17] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.
[18] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[19] Ming Tao, Hao Tang, Songsong Wu, Nicu Sebe, Xiao-Yuan Jing, Fei Wu, and Bingkun Bao. DF-GAN: Deep fusion generative adversarial networks for text-to-image synthesis. arXiv preprint arXiv:2008.05865, 2020.
[20] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057, 2015.
[21] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, and Xiaodong He. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1316–1324, 2018.
[22] Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. Stacked attention networks for image question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 21–29, 2016.
[23] Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, and Yinfei Yang. Cross-modal contrastive learning for text-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 833–842, 2021.
[24] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 5907–5915, 2017.
[25] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N. Metaxas. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1947–1962, 2018.
[26] Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. Learning deep features for scene recognition using places database. In Advances in Neural Information Processing Systems, pages 487–495, 2014.
[27] Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5802–5810, 2019.

Reinforcement Learning via Predictive Coding Thomas Lukasiewicz, Tommaso Salvatori, Yuhang Song Artificial Intelligence and Machine Learning MSc Learning an internal representation of the outside world is a challenging but key task for any agent that aims to interact with it. The subfield of machine learning that tackles this task, reinforcement learning (RL), has achieved impressive performance on a large number of benchmarks by using an internal model of the world (or dataset) to determine a specific policy. However, agents trained with classic algorithms often learn and behave in ways that are not human-like, and natural behaviours that are vital for reaching specific goals are often hard-coded. In parallel, neuroscientists have developed various computational models of action and planning in the brain. Merging progress in AI and neuroscience has the potential to give new insights and more human-like behaviour to RL algorithms. In a series of recent publications, we have presented initial evidence that a neuroscience-inspired learning method, called predictive coding, is potentially more powerful than the current foundation of AI, backpropagation. The goal of this project is to use predictive coding in place of backpropagation to perform RL experiments. In particular, the student will study and analyse the difference between this method and those already present in the literature on a variety of tasks, both theoretically and empirically. Prerequisites: Good programming skills, experience with reinforcement learning experiments. References: [1] David Ha, Jürgen Schmidhuber, World models, NeurIPS 2018. [2] Karl Friston et al., World model learning and inference, Neural Networks, 2021. [3] Rajesh Rao, Dana Ballard, Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects, Nature Neuroscience, 1999.
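For flavour, here is a minimal numpy sketch (our own illustration, not the project's codebase) of a supervised predictive-coding update: inference first relaxes the hidden activities to reduce local prediction errors, and only then are the weights adjusted by a purely local rule, in contrast to backpropagation's global backward pass.

    import numpy as np

    def f(x): return np.tanh(x)
    def df(x): return 1.0 - np.tanh(x) ** 2

    rng = np.random.default_rng(0)
    sizes = [4, 8, 2]  # input, hidden, output widths (arbitrary toy choices)
    W = [rng.normal(0, 0.1, (sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]

    def pc_step(x_in, target, W, n_infer=20, gamma=0.1, alpha=0.01):
        x = [x_in]
        for Wl in W:  # initialise value nodes with a feed-forward pass
            x.append(Wl @ f(x[-1]))
        x[-1] = target  # clamp the output layer to the label
        for _ in range(n_infer):  # inference: relax hidden nodes against local errors
            e = [x[l + 1] - W[l] @ f(x[l]) for l in range(len(W))]
            for l in range(1, len(x) - 1):
                x[l] += gamma * (-e[l - 1] + df(x[l]) * (W[l].T @ e[l]))
        e = [x[l + 1] - W[l] @ f(x[l]) for l in range(len(W))]
        for l in range(len(W)):  # local, Hebbian-like weight update
            W[l] += alpha * np.outer(e[l], f(x[l]))

    pc_step(rng.normal(size=4), np.array([1.0, 0.0]), W)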
Safe Neural Networks for Autonomous Driving Thomas Lukasiewicz, Eleonora Giunchiglia Artificial Intelligence and Machine Learning MSc Neural networks have proven to be very powerful at computer vision tasks. However, they often exhibit unexpected behaviours, violating known requirements that express background knowledge. This can be particularly dangerous in safety-critical scenarios such as autonomous driving. In this project, we will explore how to develop post-processing modules that can be built on top of multi-label classification neural networks and that guarantee the final outputs are always compliant with the requirements. The post-processing modules will need to (i) improve the performance of the model (in terms of the relevant metric), and (ii) run in linear time with respect to the number of labels. The proposed modules will be tested on the ROad event Awareness Dataset with logical Requirements (ROAD-R), the first multi-label dataset for autonomous driving with arbitrary n-ary constraints on the labels.
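As a toy illustration of the flavour of such a module (not the project's method), consider requirements given as implication constraints between labels; if the implications are listed in an order consistent with their dependency DAG, a single pass makes the thresholded predictions compliant in time linear in the number of labels and constraints. The labels below are hypothetical, not taken from ROAD-R.

    def enforce_requirements(scores, implications, thr=0.5):
        # scores: {label: probability}; implications: [(a, b), ...] meaning
        # "predicting a requires predicting b"; assumed topologically ordered
        pred = {label: s >= thr for label, s in scores.items()}
        for a, b in implications:
            if pred[a]:
                pred[b] = True  # force compliance with the requirement
        return pred

    # e.g. enforce_requirements({"car": 0.9, "vehicle": 0.4}, [("car", "vehicle")])
    # yields {"car": True, "vehicle": True}, although "vehicle" scored below threshold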
Video Captioning Thomas Lukasiewicz, Louis Mahon Artificial Intelligence and Machine Learning MSc Video captioning means automatically generating a sentence describing what’s happening in a video. Deep learning methods have improved greatly at this task over the last 3-4 years, but the use of natural language to describe a video has several disadvantages. Some work in our lab has proposed using a formal language, that can describe videos in terms of objects and relations between them: for example, “throws(person,ball)” instead of “a person is throwing a ball” (see http://www.cs.ox.ac.uk/publications/publication14259-abstract.html). This project will extend the above paper, which could just mean scaling it to larger datasets with the latest deep learning techniques, or could involve extending the idea to make it more efficient or performant. Advanced Machine Learning is a prerequisite, and proficiency in logic and reasoning could also be useful, depending on the direction the project was taken in.
Effective Categorical Reasoning Dan Marsden Foundations, Structures, and Quantum MSc

Category theory is an important tool in theoretical computer science. In introductory courses, proofs are generally conducted by pasting commuting diagrams together or by simple equational reasoning. There are further proof styles in category theory, for example using string diagrams, “Yoneda style” proofs, and the use of internal logics. The aim of this project is to investigate and contrast these different approaches, showing how, and when, they can be used effectively on realistic problems. A concrete starting point would be to understand the techniques involved, and then apply them to non-trivial proofs from the literature in order to demonstrate their relative benefits. An ideal outcome would be an example-based account for computer scientists of how to reason efficiently in category theory.
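As a small taste of “Yoneda style” reasoning (our illustration): to see that binary products are determined up to isomorphism, one argues entirely at the level of hom-sets. Since

    \mathrm{Hom}(X, A \times B) \;\cong\; \mathrm{Hom}(X, A) \times \mathrm{Hom}(X, B) \quad \text{naturally in } X,

any object $C$ with $\mathrm{Hom}(X, C) \cong \mathrm{Hom}(X, A) \times \mathrm{Hom}(X, B)$ naturally in $X$ represents the same functor, and the full faithfulness of the Yoneda embedding gives $C \cong A \times B$, with no diagram pasting required.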

Prerequisites: A good understanding of elementary category theory and comfort with mathematical proofs are essential. Some experience with string diagrams, as covered for example in the department's quantum computer science courses, would also be helpful.

EV Charging Testbed Ivan Martinovic Security B C MSc Co-supervised by Systems Security Lab Right now, there is a format war for methods of charging electric vehicles (EVs). One of the most developed systems is the Combined Charging System (CCS). It is underpinned by an international standard called ISO 15118, which describes a complex web of communication that allows vehicle charging to take place.  The car talks to the charger it is connected to, in order to report status and place requests for current. The charger, in turn, talks to the backend of the company that runs the charging site (the so-called charging provider), in order to authorise charging sessions and report any errors. It may also talk to a site controller at the same physical location, which is responsible for coordinating multiple chargers and ensuring capacity is shared appropriately between them. The charging provider's backend may federate to other charging providers to permit interoperability between networks.  In the near future, it is also expected that the charging provider will be able to talk to energy companies for automated billing, or even to allow electricity to flow the other way -- from vehicle-to-grid (V2G).  Exploring the operations and security of this interconnected system is crucially important, while it is still being developed, to avoid building insecure infrastructure that we have to live with for many years to come. A number of problems have already been discovered (see links below) and many more are still to come. Having access to realistic testbeds will accelerate this work substantially, by allowing researchers worldwide to evaluate ideas on a small scale, without having to gain access to costly real-world infrastructure.  Our lab has already built a physical testbed for the car-to-charger communication, and successfully attacked it. Other researchers have built a system that can simulate multiple such links, but only virtually. However, neither of these existing testbeds implements the full system end-to-end, particularly the backend or federation aspects, so it is hard to understand what all the effects of an attack would be.  In this project, a student would work towards extending the existing testbed implementations to integrate backend functionality and therefore simulate the whole charging ecosystem. When complete, a user could create a virtual charging provider and register some customers, spawn charging sites with chargers, spawn vehicles -- and then simulate full charging sessions. In the ideal case, the implementation would be containerised so that it can be deployed on real hardware, in virtual machines, or a combination of both. It is important to say that there is a lot of scope in this project, and it is not expected that all of it would be implemented in any one year. In fact, there are many parts to suit different interests, across system development and scientific investigation. 
Meaningful contributions might include: * Build a working, low-power CCS charger for a toy car  * Containerise and integrate several open projects into a connected system  * Implement functionality for a charging provider as an open-source reference  * Model the PKI operations described in ISO 15118  * Investigate the capability of attackers at various stages of the charging process  * Make something crazy, like a car-to-car charger with offline crypto payments  Some research links:  * Our first paper on CCS (testbed in appendix) : https://www.usenix.org/conference/usenixsecurity19/presentation/baker  * The virtual testbed from Padua : https://spritz.math.unipd.it/projects/MiniV2G/  * An overview of CCS : https://en.wikipedia.org/wiki/Combined_Charging_System  Some open implementations of parts of the ecosystem: * OpenV2G : http://openv2g.org/  * RISE-V2G : https://github.com/SwitchEV/RISE-V2G  * OCPP : https://www.openchargealliance.org/
Hacking satellite image processing systems Ivan Martinovic Security B C MSc Co-supervised by Systems Security Lab Recent work by the Systems Security Lab has been investigating image injection attacks on satellite downlink protocols, measuring the extent to which stealthy modifications can be made to an image whilst it is being transmitted in flight. In this project, the student will measure the impact of such an attack on real-world systems through the analysis of the FIRMS forest fire detection algorithm. By considering its implementation, the student will identify the most effective mechanisms for manipulating the image to fool the result of the algorithm, starting fictitious fires and masking existing ones. The student will require knowledge of the C programming language, and ideally have an existing interest in either space systems security or image processing. Intended for either third- or fourth-year projects.
HomePlugSDR Implementation Ivan Martinovic Security B C MSc Co-supervised by Systems Security Lab Our lab built a software-defined radio (SDR) receiver for the HomePlug family of powerline communication (PLC) standards. It can receive messages over a wire -- or even wirelessly -- and has been used to eavesdrop on communications between an electric vehicle and a charger. In its current form, the SDR receiver has been used for security testing of electric vehicles, in academia and by national laboratories. However, the SDR implementation is quite limited at present. There are several clear ways to improve it: * Integrate the reception layer with standard tools like Wireshark, GNURadio and scapy. * Implement a transmission pathway, to send messages back to the network. This could allow a student to fuzz real implementations and find undisclosed hardware/firmware bugs.  * Performance optimisations to make the implementation real-time and usable on lower-power hardware. Some research links:  * HomePlug overview : https://en.wikipedia.org/wiki/HomePlug  * HomePlug receiver : https://gitlab.com/rbaker/hpgp-emis-rx  * Eavesdropping on vehicle charging : https://www.usenix.org/conference/usenixsecurity19/presentation/baker
Improved Intent Recognition Ivan Martinovic Security B C MSc Co-supervised by Systems Security Lab Using Markov chains to improve gesture recognition on our current dataset. In recent work, we conducted a user study in which users made mobile payments using a smartwatch, and we collected the inertial sensor (accelerometer, gyroscope, etc.) data generated on the smartwatch. Using this data, we showed that the tap gesture performed by the user while making such payments can be used to authenticate the user. We also found that the tap gesture is sufficiently distinct from other hand movements that, by recognising it, we can infer the user’s intention to pay (as opposed to it being an accidental payment or a skimming attack). We achieved our results by using random forest classifiers on tap gestures represented by sliding windows consisting of between 0.5 and 4 seconds of inertial sensor data. It may be possible to strengthen our intent recognition model by treating the tap gesture as a 3-tuple of three separate parts, namely: (i) reaching towards the payment terminal, (ii) aligning the watch face with the terminal to establish an NFC connection, and (iii) withdrawing afterwards. The plan would be to attempt to recognise each constituent part separately and use Markov chains to stitch together 3-tuples that form a complete gesture. Our dataset will be provided. The research question is whether this approach is more or less effective at recognising tap gestures than our unseparated approach.
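As a sketch of the stitching idea (assuming hypothetical per-window scores from the three part-classifiers; the transition probabilities below are illustrative placeholders, not values from our study), a Viterbi pass over a left-to-right Markov chain recovers the most likely state sequence, and a tap is recognised when the decoded path contains reach, align, withdraw in order:

    import numpy as np

    STATES = ["other", "reach", "align", "withdraw"]
    T = np.array([                # transition matrix: rows = from, cols = to
        [0.90, 0.10, 0.00, 0.00],
        [0.05, 0.60, 0.35, 0.00],
        [0.05, 0.00, 0.60, 0.35],
        [0.30, 0.00, 0.00, 0.70],
    ])

    def viterbi(emissions, T, start=0):
        # emissions: (n_windows, n_states) scores from the per-part classifiers
        n, k = emissions.shape
        logT = np.log(T + 1e-12)
        score = np.full(k, -np.inf)
        score[start] = 0.0
        back = np.zeros((n, k), dtype=int)
        for t in range(n):        # forward pass: best score for each state at each window
            cand = score[:, None] + logT + np.log(emissions[t] + 1e-12)[None, :]
            back[t] = cand.argmax(axis=0)
            score = cand.max(axis=0)
        path = [int(score.argmax())]
        for t in range(n - 1, 0, -1):   # backtrack the best path
            path.append(int(back[t][path[-1]]))
        return [STATES[s] for s in path[::-1]]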
Key Management for Multi-Tenant Spacecraft Ivan Martinovic Security B C MSc Co-supervised by Systems Security Lab CCSDS standards describe the Space Data Link Security (SDLS) protocol for securing communications between spacecraft and ground stations for Telemetry, Tracking, and Command (TT&C). The extended procedures for this protocol detail systems for key management and security association management. This protocol has been implemented for use in NASA's Core Flight System (cFS), an open-source satellite operating system designed to run on a variety of hardware platforms. However, no consideration has been given to spacecraft shared between multiple operators, which is an emerging new model. In this project a student would design and implement extensions to SDLS (either platform-agnostic or specific to cFS) to facilitate key management in the context of a multi-tenant spacecraft, in which the holder of a key should only be permitted access to certain components of the system. This extension could also enable support for allowing holders of different keys to access the same telemetry packets. Students undertaking this project should be familiar with network security concepts and be comfortable writing in C. Useful links: - NASA Core Flight System: https://github.com/nasa/cFS  - NASA CryptoLib: https://github.com/nasa/CryptoLib  - CCSDS standard for SDLS: https://public.ccsds.org/Pubs/355x0b1.pdf  - CCSDS standard for SDLS-EP: https://public.ccsds.org/Lists/CCSDS%203551R1/355x1r1.pdf
Physical-layer Fingerprinting of Satellite Signals Ivan Martinovic Security B C MSc Co-Supervised by Systems Security Lab Physical-layer fingerprinting of radio signals has historically been used to identify radio devices in environments where the addition of cryptographic signatures may be infeasible, such as in low-power IoT devices or avionic systems with long software life cycles. Satellite communications are a good candidate for this type of identification due to their often limited onboard processing power. Furthermore, certain forms of satellite communications such as weather imagery are designed to be able to be received by anyone, making encryption redundant. Recent work by the Systems Security Lab has developed an end-to-end pipeline for fingerprinting aircraft using physical-layer characteristics of signals sent using the ADS-B protocol. This system automates the detection, capture, and decoding of messages, then compares against existing samples to determine whether the source of the message is legitimate. Applying fingerprinting to existing larger-scale systems is interesting as it can provide protection against signal forgery and replay at the receiver end without redesigning the entire system. In this project a student would adapt the existing fingerprinting pipeline to work with satellite communications. This will require changes to the message capture components of the pipeline in addition to the fingerprinting model itself. This pipeline can then be assessed for its effectiveness at correctly identifying devices, and identifying signals injected by an attacker. Useful links: - Existing fingerprinting pipeline: https://github.com/ssloxford/auto-phy-fingerprint - Paper introducing physical-layer radio fingerprinting: https://ethz.ch/content/dam/ethz/special-interest/infk/inst-infsec/system-security-group-dam/research/identification/CSUR-danev.pdf - TDOA satellite fingerprinting: https://dl.acm.org/doi/pdf/10.1145/3448300.3469132
Physically realizable adversarial machine learning models against on-screen classifiers Ivan Martinovic Security B C MSc Co-supervised by Systems Security Lab Given the security-critical contexts in which computer vision models are likely to be used in mixed reality / the metaverse, it is of great interest to us to explore ways of crafting defences against malicious actors using adversarial attacks. Your objective will be to test and improve the robustness of existing computer vision models by crafting state-of-the-art adversarial examples, in the form of phishing websites, that remain effective on photographs of computer screens captured by a smartphone or XR headset camera.
Security Monitoring and Control for the cFS Satellite Operating System Ivan Martinovic Security B C MSc Co-supervised by Systems Security lab Core Flight System (cFS) is an open-source satellite operating system provided by NASA, providing a standard interface and set of applications which can run on a variety of hardware platforms thanks to its abstraction of the underlying operating systems. Applications and devices within cFS communicate using a publish-subscribe model on a shared bus interface. These messages are not logged or authenticated so a malicious component could affect parts of the satellite it should not have access to. In this project a student would design and implement a security layer over the cFS software bus to provide security monitoring and per-application access control. Such a system would ideally be backwards-compatible with existing cFS applications. Students undertaking this project should be comfortable writing in C. Useful links: - NASA Core Flight System: https://github.com/nasa/cFS
Using state-of-the-art computer vision models to build novel applications on recent XR platforms Ivan Martinovic Security B C MSc Co-supervised by Systems Security Lab In this project, you will be using state-of-the-art AI computer vision models that understand their environments (e.g. detect computer screens) and deploying them in novel scenarios on the latest XR (AR/VR) technologies: from NReal glasses to Microsoft HoloLens and Oculus Quest 2. Your task will be to test and improve the robustness of computer vision models by deploying them in novel environments, evaluating their robustness, and building interesting applications around them. For example, this would allow you to build a “physical ad blocker” that blocks out any digital advertisement, or a privacy-preserving remote support application that allows sharing.
Applied Formal Verification Tom Melham Automated Verification B C MSc

I am happy to supervise projects at all levels in applied formal verification. These typically involve hands-on investigation and innovation to develop new, practical methods of correctness analysis and verification for various kinds of computer systems – either hardware, software, or a combination. They are ideally suited to students who have an interest in computer systems, logic, and hands-on practical work that has a solid theoretical basis. You don’t have to have taken the course in Computer-Aided Formal Verification, if you are willing to learn the theories and technologies required.

All my projects are designed to have a strong element of research and sufficient challenge to allow a motivated student to make an excellent contribution. Projects are usually therefore quite ambitious, but they are also designed realistically to fit into the time available. And they always have some fall-back options that are less challenging but can still result in an excellent achievement.

Rather than offer ready-made project ideas, I encourage students with an interest in this area to meet me and together discuss what might align best with their background and interests. I always have several project ideas that link to my current research or research being done by my group. Often projects will have a connection to real-world verification problems in industry, and many of my students will have contact through their projects with leading verification researchers and engineers in industry.

If you are interested in discussing a project, please contact me.

Developing computational tools to aid the design of CRISPR/Cas9 gene editing experiments Peter Minary Computational Biology and Health Informatics B C MSc

At present, the most versatile and widely used gene-editing tool is the CRISPR/Cas9 system, which is composed of a Cas9 nuclease and a short oligonucleotide guide RNA (or guide) that directs the Cas9 nuclease to the targeted DNA sequence (on-target) through complementary binding.  There are a large number of computational tools for designing highly specific and efficient CRISPR/Cas9 guides, but there is great variation in performance and a lack of consensus among the tools.  We aim to use ensemble learning to combine the strengths of a selected set of guide design tools, with the goal of predicting guide efficiency (on guides for which experimental efficiency data is available) more accurately than any single method.
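A minimal sketch of the ensemble idea, assuming scikit-learn and SciPy are available and using random stand-in data in place of real per-tool guide scores and measured efficiencies:

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((500, 6))   # one row per guide, one column per tool's score (stand-in)
    y = X @ rng.random(6) + 0.1 * rng.standard_normal(500)  # stand-in measured efficiency

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_tr, y_tr)   # learns how to weight the tools
    rho, _ = spearmanr(model.predict(X_te), y_te)         # rank correlation on held-out guides
    print(f"Spearman rho: {rho:.2f}")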

Recommended for students who have done the Machine Learning and the Probability and Computing courses.

 

Developing machine learning models for off-target prediction in CRISPR/Cas9 gene editing Peter Minary Computational Biology and Health Informatics B C MSc The CRISPR/Cas9 gene editing system is composed of a Cas9 nuclease and a short oligonucleotide guide RNA (or guide) that directs the Cas9 nuclease to the targeted DNA sequence (on-target) through complementary binding. However, the Cas9 nuclease may also cleave off-target genomic DNA sequences, which contain mismatches compared to the gRNA; therefore, undesired cleavage can occur. The obvious factors influencing the off-target cleavage activity of the CRISPR/Cas9 system are the sequence identities of the guide RNA and the off-target DNA. Various 'basic features' derived from these sequences have been fuelling the development of procedural and machine learning models for predicting off-target cleavage activity, but there are numerous 'non-basic features' (such as the sequence context around the off-target DNA) that may also influence off-target cleavage activity. The project will aim to develop novel off-target cleavage activity prediction models using approaches that include, but are not limited to, combining 'basic features' and 'non-basic features' to increase the accuracy with which models predict experimental off-target cleavage activities. Prerequisites: Recommended for students who have done a Machine Learning course and have an interest in molecular biology.
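One common way (an assumption for illustration, not necessarily the project's choice) of turning the guide/off-target sequence pair into 'basic features' is a position-wise one-hot encoding of (guide base, DNA base) pairs, in which mismatches light up off-diagonal channels:

    import numpy as np

    NT = "ACGT"

    def encode_pair(guide, dna):
        # (length, 16) matrix; channel 4*g + d marks guide base g paired with DNA base d
        x = np.zeros((len(guide), 16))
        for i, (g, d) in enumerate(zip(guide, dna)):
            x[i, NT.index(g) * 4 + NT.index(d)] = 1.0
        return x

    # e.g. encode_pair("GACT", "GTCT") flags the A/T mismatch at position 1

Such matrices can then be fed to a classifier, with 'non-basic features' appended as extra inputs.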
Implementing a Datalog Reasoner Boris Motik Data and Knowledge B The objective of this project would be for a student to implement a simple Datalog reasoner. Depending on the student’s ambitions, the reasoner could run in main memory (easier version) or on disk (more challenging). The student would be expected to design the data structures needed to store the data (in RAM or on disk, as agreed with me). On top of this, the student would implement the seminaive algorithm and evaluate the system on a couple of medium-sized datasets. We have in our group ready-made datasets that could be used for evaluation. In a variation of this project, the student would reuse an existing relational database to store the data. Then, the seminaive algorithm would be implemented either as an extension of the database (assuming the database is open source), or on top of the database (by running SQL queries). This last variant would arguably be much easier, as it would involve less design and more reuse of existing technologies. More advanced students could extend their system to implement various incremental reasoning algorithms. For example, I would give them one of the many papers I’ve written on this topic, and they would have to (a) understand the formal material and (b) implement and evaluate the algorithms. Hence, this project would also give students the opportunity to go as far as they can. Having attended the Databases or Database System Implementation courses, and perhaps to a lesser extent the KRR course, would be a prerequisite for doing this project.
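For orientation, a minimal in-memory sketch of seminaive evaluation, specialised (for brevity) to the transitive-closure program path(x,y) <- edge(x,y); path(x,z) <- path(x,y), edge(y,z):

    from collections import defaultdict

    def transitive_closure(edges):
        adj = defaultdict(set)
        for x, y in edges:
            adj[x].add(y)
        total = set(edges)   # path(x,y) <- edge(x,y)
        delta = set(edges)   # facts newly derived in the previous round
        while delta:
            # seminaive: the recursive rule joins only the delta, never all of total
            new = {(x, z) for (x, y) in delta for z in adj[y]}
            delta = new - total
            total |= delta
        return total

    assert transitive_closure({(1, 2), (2, 3)}) == {(1, 2), (2, 3), (1, 3)}

A real reasoner generalises the join to arbitrary rules and backs it with indexed data structures, which is where the design work of this project lies.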
Implementing a Tableaux Reasoner for Description Logics Boris Motik Data and Knowledge B The objective of this project would be for a student to implement a simple tableau-based reasoner for description logics. The project would involve the student designing the core data structures, implementing the basic algorithm, realising backtracking, and implementing various optimisations typically used in practical reasoners. The student could use a preexisting API (e.g., the OWL API) to load a description logic ontology, so they could focus on solving the core problem. The student would be expected to evaluate their implementation on the ontologies from the repository that we’re maintaining in the KRR group (http://www.cs.ox.ac.uk/isg/ontologies/). The project seems to me to provide a good opportunity for the student to demonstrate how well they have absorbed the CS material (e.g., algorithms, data structures, computational complexity analysis, etc.) taught in our degrees. Also, the project is sufficiently open-ended that the student can go quite a long way before running out of options of things to do. Having attended the KRR course would be a prerequisite for doing this project.
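As a toy illustration of the core loop (concept satisfiability for ALC in negation normal form, with no TBox, no blocking, and none of the optimisations the project is really about):

    def satisfiable(label):
        # concepts: ("atom", A), ("not", ("atom", A)), ("and"/"or", C, D), ("some"/"all", r, C)
        label = set(label)
        for c in label:                                    # clash check
            if c[0] == "atom" and ("not", c) in label:
                return False
        for c in list(label):
            if c[0] == "and" and not {c[1], c[2]} <= label:
                return satisfiable(label | {c[1], c[2]})   # deterministic and-rule
            if c[0] == "or" and not ({c[1], c[2]} & label):
                return satisfiable(label | {c[1]}) or satisfiable(label | {c[2]})  # backtracking
        for c in label:                                    # some-rule: spawn an r-successor
            if c[0] == "some":
                succ = {c[2]} | {d[2] for d in label if d[0] == "all" and d[1] == c[1]}
                if not satisfiable(succ):
                    return False
        return True

    assert not satisfiable({("and", ("atom", "A"), ("not", ("atom", "A")))})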
Topics in Automata Theory, Program Verification and Programming Languages (including lambda calculus and categorical semantics) Andrzej Murawski Programming Languages C MSc

Prof Murawski is willing to supervise in the area of automata theory, program verification and programming languages (broadly construed, including lambda calculus and categorical semantics).

For a taste of potential projects, follow this link.

Concurrent Algorithms and Data Structures Hanno Nickau Foundations, Structures, and Quantum C MSc Projects in the area of Concurrent Algorithms and Data Structures
Concurrent Programming Hanno Nickau Foundations, Structures, and Quantum B C MSc Projects in the area of Concurrent Programming
Branching Temporal Logics, Automata and Games Luke Ong Programming Languages B C MSc

Model checking has emerged as a powerful method for the formal verification of programs. Temporal logics such as CTL (computational tree logic) and CTL* are widely used to specify programs because they are expressive and easy to understand. Given an abstract model of a program, a model checker (which typically implements the acceptance problem for a class of automata) verifies whether the model meets a given specification. A conceptually attractive method for solving the model checking problem is by reducing it to the solution of (a suitable subclass of) parity games. These are a type of two-player infinite game played on a finite graph. The project concerns the connexions between the temporal logics CTL and / or CTL*, automata, and games. Some of the following directions may be explored. 1. Representing CTL / CTL* as classes of alternating tree automata. 2. Inter-translation between CTL / CTL* and classes of alternating tree automata 3. Using Büchi games and other subclasses of parity games to analyse the CTL / CTL* model checking problem 4. Efficient implementation of model checking algorithms 5. Application of the model checker to higher-order model checking.
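As one concrete flavour of directions 3 and 4, the CTL operators have fixpoint characterisations over the state graph that lead directly to simple model checking code; a minimal illustrative sketch (trans maps each state to its set of successors, and a formula is represented by the set of states satisfying it):

    def ex(trans, S):
        # EX S: states with at least one successor in S
        return {s for s, succs in trans.items() if succs & S}

    def eu(trans, S1, S2):
        # E[S1 U S2] is the least fixpoint of Z = S2 ∪ (S1 ∩ EX Z)
        Z = set(S2)
        while True:
            Z2 = S2 | (S1 & ex(trans, Z))
            if Z2 == Z:
                return Z
            Z = Z2

    def eg(trans, S):
        # EG S is the greatest fixpoint of Z = S ∩ EX Z
        Z = set(S)
        while True:
            Z2 = S & ex(trans, Z)
            if Z2 == Z:
                return Z
            Z = Z2

The game-based approach replaces these fixpoint iterations with the solution of Büchi or parity games.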

References:

Orna Kupferman, Moshe Y. Vardi, Pierre Wolper: An automata-theoretic approach to branching-time model checking. J. ACM 47(2): 312-360 (2000).
http://dx.doi.org/10.1145/333979.333987

Rachel Bailey: A Comparative Study of Algorithmics for Solving Büchi Games. University of Oxford MSc Dissertation, 2010.
http://www.cs.ox.ac.uk/people/luke.ong/personal/publications/RachelBailey_MScdissertation.pdf

Foundational Structures for Concurrency Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Game Semantics and Linear Logic Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Lambda Calculus Luke Ong Programming Languages MSc

See http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Luke Ong's Projects Luke Ong Programming Languages B C MSc

See http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html for more information

Semantics of Programming Languages Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Types, Proofs and Categorical Logic Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Building databases of mathematical objects in Sagemath (Python) Dmitrii Pasechnik Algorithms and Complexity Theory B C MSc

There is an enormous amount of information on constructing various sorts of mathematical objects that are ``interesting'' in one way or another, e.g. block designs, linear and non-linear codes, Hadamard matrices, elliptic curves, etc. There is considerable interest in having this information available in computer-ready form. However, usually the only available form is a paper describing the construction, while no computer code (and often no detailed description of a possible implementation) is provided. This provides interesting algorithmic and software engineering challenges in creating verifiable implementations; properly structured and documented code, supplemented by unit tests, has to be provided, preferably in functional programming style (although performance is important too).

 

Sagemath project aims in part to remedy this, by implementing such constructions, see e.g. Hadamard matrices in Sagemath: http://doc.sagemath.org/html/en/reference/combinat/sage/combinat/matrices/hadamard_matrix.html and http://arxiv.org/abs/1601.00181.
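As a small illustration (assuming numpy) of the kind of construction, plus the unit test that makes it verifiable, that such contributions involve, here is Paley's construction I of a Hadamard matrix of order q + 1 for a prime q ≡ 3 (mod 4):

    import numpy as np

    def paley_hadamard(q):
        # quadratic-residue character via Euler's criterion
        chi = lambda a: 0 if a % q == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)
        Q = np.array([[chi(i - j) for j in range(q)] for i in range(q)])  # Jacobsthal matrix
        H = np.ones((q + 1, q + 1), dtype=int)
        H[1:, 0] = -1
        H[1:, 1:] = Q + np.eye(q, dtype=int)
        return H

    H = paley_hadamard(11)
    assert (H @ H.T == 12 * np.eye(12, dtype=int)).all()  # the defining Hadamard property

The Sagemath implementation linked above covers more general cases and shows the expected documentation and doctest style.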

 

The project will contribute to such implementations.

There might be a possibility of participating in Google Summer of Code (GSoC), with Sagemath as a GSoC organisation, partially funded by the EU project ``Open Digital Research Environment Toolkit for the Advancement of Mathematics''

http://opendreamkit.org/.

Prerequisites: Interest in open source software, some knowledge of Python, some maths background.

Computing with semi-algebraic sets in Sage (math) Dmitrii Pasechnik Algorithms and Complexity Theory B C MSc

Semi-algebraic sets are subsets of R^n specified by polynomial inequalities. The project will extend the capabilities of Sage (http://www.sagemath.org) to deal with them, such as CAD (cylindrical algebraic decomposition) computations or sums-of-squares based (i.e. semidefinite programming based) methods. There might be a possibility of participating in Google Summer of Code (GSoC) or Google Semester of Code with Sage as a GSoC organisation.

Prerequisites

Interest in open source software, some knowledge of Python, appropriate maths background.

Implementing and experimenting with variants of Weisfeiler-Leman graph stabilisation Dmitrii Pasechnik Algorithms and Complexity Theory B C MSc Weisfeiler-Leman graph stabilisation is a procedure that plays an important role in modern graph isomorphism algorithms (L. Babai 2015) and has recently enjoyed attention in the machine learning community (cf. "Weisfeiler-Lehman graph kernels"). While it is relatively easy to implement, efficient portable implementations seem to be hard to find. In this project we will work on producing such an open-source implementation, either by reworking our old code from http://arxiv.org/abs/1002.1921 or by producing a new one; we will also work on providing Python/Cython interfaces for possible inclusion in the SageMath (http://sagemath.org) library, and/or elsewhere.
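For reference, the 1-dimensional variant (colour refinement, as used in the Weisfeiler-Lehman graph kernels) fits in a few lines; the classical 2-dimensional stabilisation that the project targets refines colours on pairs of vertices instead, which is what makes an efficient implementation challenging:

    def wl_colours(adj):
        # adj: dict mapping each vertex to its set of neighbours
        colour = {v: 0 for v in adj}
        while True:
            palette, new = {}, {}
            for v in adj:
                # signature: own colour plus sorted multiset of neighbour colours
                sig = (colour[v], tuple(sorted(colour[u] for u in adj[v])))
                new[v] = palette.setdefault(sig, len(palette))
            if new == colour:      # partition stable: stop
                return colour
            colour = new

Comparing the sorted colour histograms of two graphs gives a cheap (though incomplete) non-isomorphism test.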
Automatic translation to GPGPU Joe Pitt-Francis Computational Biology and Health Informatics B C

This project involves running cardiac cell models on a high-end GPU card. Each model simulates the electrophysiology of a single heart cell and can be subjected to a series of computational experiments (such as being paced at particular heart rates). For more information about the science and to see it in action on CPU, see "Cardiac Electrophysiology Web Lab" at https://travis.cs.ox.ac.uk/FunctionalCuration/ An existing compiler (implemented in Python) is able to translate from a domain-specific XML language (http://models.cellml.org) into a C++ implementation. The goal of the project is to add functionality to the compiler in order to generate OpenCL or CUDA implementations of the same cell models and thus increase the efficiency of the "Web Lab".

General GPGPU and high performance computing projects Joe Pitt-Francis Computational Biology and Health Informatics B C

I am willing to supervise projects which fit into the general areas of General Purpose Graphics Processing Unit (GPGPU) programming and High-Performance Computing (HPC). Specific technologies used are likely to based around

  • NVIDIA CUDA for GPU programming;
  • MPI for distributed-memory cluster computing.

All application areas considered although geometric algorithms are favourites of mine.

General graphics projects Joe Pitt-Francis Computational Biology and Health Informatics B C

I am interested in supervising general projects in the area of computer graphics.  If you have a particular area of graphics-related research that you are keen to explore then we can tailor a bespoke project for you.  Specific projects I have supervised in the past include

  • "natural tree generation" which involved using Lindenmayer systems to grow realistic looking bushes and trees to be rendered in a scene;
  • "procedural landscape generation" in which an island world could be generated on-the-fly using a set of simple rules as a user explored it;
  • "gesture recognition" where a human could control a simple interface using hand-gestures;
  • "parallel ray-tracing" on distributed-memory clusters and using multiple threads on a GPU card;
  • "radiosity modelling" used for analysing the distribution of RFID radio signal inside a building; and
  • "non-photorealistic rendering" where various models were rendered with toon/cel shaders and a set of pencil-sketch shaders.

(Such a project is generally not suitable for MSc students.  They should note that in order for this option to work as a potential MSc project then it should be combined with a taught-course topic such as machine learning, concurrent programming, linguistics etc.)

Graphics pipeline animator Joe Pitt-Francis Computational Biology and Health Informatics B

Pre-requisites: Computer graphics, Object-oriented programming

The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then moves the slider down the pipeline, watching an animation of the process.

Intuitive exploration through novel visualisation Joe Pitt-Francis Computational Biology and Health Informatics B C I am interested in novel visualisation as a way to represent things in a more appealing and intuitive way. For example, the Gnome disk usage analyzer (Baobab) uses either a "ring chart" or a "treemap chart" representation to show us which sub-folders are using the most disk space. In the early 1990s the IRIX file system navigator used a 3D skyscraper representation to show similar information. There are plenty more ways of representing disk usage: from DAGs to centroidal Voronoi diagrams. What kind of representation is most intuitive for finding a file which is hogging disk space, and which is most intuitive for helping us to remember where something is located in the file-system tree? The aim is to explore other places where visualisation gives intuition: for example, to visualise the output of a profiler to find bottlenecks in software, to visualise the output of a code-coverage tool in order to check that test suites are testing the appropriate functionality, or even to visualise the prevalence of diabetes and heart disease in various regions of the country.
Unsupervised Domain Adaptation with Adversarial Networks Joe Pitt-Francis Computational Biology and Health Informatics MSc Co-supervised by Julian Godding - j.godding@gardin.co.uk At www.gardin.co.uk "Object counting and segmentation are necessary tasks in image-based plant phenotyping, notably for counting plant organs and segmenting different plant matter. One challenge for computer vision tasks in plant phenotyping is that, unlike large image datasets of general objects, plant image datasets usually include highly self-similar images with a small amount of variation among images within the dataset. An individual plant image dataset is often acquired under similar conditions (single crop type, same field), and therefore trying to directly use a CNN trained on a single dataset on new images from a different crop, field, or growing season will fail. This is because a model trained to count objects on one dataset (the source dataset) will not perform well on a different dataset (the target dataset) when these datasets have different prior distributions. This challenge is called domain shift. An extreme case of domain shift in plant phenotyping is using a source dataset of plant images collected in an indoor controlled environment, which can be more easily annotated because plants are separated with controlled lighting on a blank background, and attempting to apply the model to a target dataset of outdoor field images with multiple, overlapping plants with variable lighting and backgrounds, and blur due to wind motion. Giuffrida et al. proposed a method to adapt a model trained to count leaves in a particular dataset to an unseen dataset in an unsupervised manner. Their method uses the representations of images from the source domain (from a pretrained network) to adapt to the target domain. They employed the Adversarial Discriminative Domain Adaptation approach to solving the domain shift problem in the leaf counting task across different datasets. Their method aligns the predicted leaf count distribution with the source domain’s prior distribution, which limits the adapted model to learning a leaf count distribution similar to the source domain. Ayalew et al. proposed a custom Domain-Adversarial Neural Network architecture and trained it using the unsupervised domain adaptation technique for density estimation-based organ counting. These approaches have delivered strong results in reducing the impact of domain shift. In this project, the data scientist will perform further research into the use of adversarial networks for domain adaptation and apply them to our unique plant image dataset." Prerequisites
mmWAVE TECHNOLOGY Kasper Rasmussen Security MSc

One of Stofl’s primary goals is to help facilitate the expansion of the Internet of Things out into the nooks and crannies of the world. The context of this project revolves around our vision for wildlife technology and anti-poaching security. Having secured a partner in the Moholoholo Rehabilitation Centre in South Africa (home to Stoffel the Honey Badger, the inspiration for our company, Stofl), we are providing them with a LoRaWAN network to deploy these technologies. In discussions with the Director of the facility and his son, a well-known anti-poacher in the area, covering a large number of possible LoRa-based solutions for the centre, from soil moisture monitoring to feed monitoring, it became clear that the most pressing issue they were facing was poachers breaking in and viciously killing the animals for their skins or tusks. This project expands on a project started by a student at Imperial College London examining the use of low-power mmWave sensors to detect people, movement and concealed weapons, and to distinguish between them using AI algorithms. The purpose of this project is to design a lightweight, portable and battery-operated sensor to assist in anti-poaching exercises in South Africa. While the initial solution concerns the differentiation of various objects, it would need to consider future requirements such as the detection of, and differentiation between, weapons made of metal and of other materials. Objectives: The study should carry forward the research completed through the previous project, with the goal of developing a lightweight, low-power device capable of detecting the metal casing of a bullet from a minimum range of 2 metres. Once a baseline is set for what is currently available on the market, the project focus should be on optimising the solution by increasing detection range, capabilities and power efficiency. Our overall objective is to provide a framework for potential solutions to be trialled at the Moholoholo Rehabilitation Centre to prevent poachers from killing the animals. Initial objectives involve research into and categorisation of what is currently available on the market, comparing the options against their inherent limitations. As the project develops, further investigation into potential avoidance methods and security threats of the solutions should be considered. Support: Stofl’s concept is openness and transparency, and our schedule is available for anyone to book a meeting with us should a formal time be required. We would invite any interested persons to reach out to us and engage in a discussion; we are available through any communications platform, whether by email, WhatsApp, Zoom, Google Meet, FaceTime or face to face. We understand that each individual is unique, and we are ready to engage with anyone who we feel is a good fit for us. We will provide regularly scheduled meetings, availability through WhatsApp, Discord, email, etc., as well as access to our extensive network of partners, companies and experts.

Project slides can be viewed here

 

Statistical shape atlas of cardiac anatomy Blanca Rodriguez, Pablo Lamata Computational Biology and Health Informatics MSc

Description: Cardiac remodelling is the change in shape of the anatomy due to disease processes. 3D computational meshes encode shape variation in cardiac anatomy, and offer higher diagnostic value than conventional geometrical metrics. Better shape metrics will enable earlier detection, more accurate stratification of disease, and more reliable evaluation of remodelling response. This project will contribute to the development of a toolkit for the construction of anatomical atlases of cardiac anatomy, and its translation to clinical adoption. The student will learn about the challenges and opportunities of the cross-disciplinary field between image analysis, computational modelling and cardiology, and be part of a project with a big potential impact on the management of cardiovascular diseases.

Prerequisites: Motivation. Good programming skills. Experience with computational graphics and image analysis is an advantage.

A High-Level Scripting Language for AudioMoth Alex Rogers Cyber Physical Systems B C MSc AudioMoth is a low-cost open-source smart acoustic sensor used for biodiversity and environmental sensing (https://www.openacousticdevices.info). There are 10,000 AudioMoth devices in use today, the majority of which run the standard firmware providing scheduled recording. We want to open up AudioMoth to ecologists and environmental scientists who wish to use it as a generic data-logging platform, and to do so we need to provide a high-level scripting language that is easy to use and familiar to users of Python and R (currently all AudioMoth code must be written in low-level embedded C). This project will deliver this language (possibly a new language, or the porting of an existing language, such as MicroPython, to the AudioMoth platform) and an associated standard library that make reprogramming and reconfiguring AudioMoth simple and straightforward.
A WebGL framework for viewing pre-recorded and live audio data Alex Rogers Cyber Physical Systems B C MSc Conservation biologists commonly use passive acoustic recorders, such as AudioMoth (https://www.openacousticdevices.info), to understand which species of birds, bats and insects are living within an environment. The software tools to analyse these recordings have traditionally been distributed as native applications. However, support for the File System Access API in the Chrome and Edge browsers now allows browser-based tools to offer the same functionality. These browser-based tools can also go further, using frameworks such as WebGL to accelerate rendering and computation. In this project, you will be required to develop a set of WebGL tools to manipulate and visualise acoustic recordings within the browser. The tools will allow the user to explore waveform and sonogram views of pre-recorded WAV files, and should be sufficiently optimised to allow live audio, captured by USB microphones, to be viewed in real-time. A stretch goal is to develop additional tools that allow various filters and recognition algorithms to be applied in real-time; for example, to automatically annotate and recognise bat calls.
A browser-based interpreter for occam Alex Rogers Cyber Physical Systems B C MSc occam is an ideal programming language for running short introductory workshops on concurrent programming. Originally developed as the native programming language of the transputer, occam allows parallel processes to be declared with minimal boilerplate code, and all communication between processes is through named channels. Unfortunately, it is not possible to easily compile or run occam today. This project will address this shortcoming by developing a browser-based interpreter for concurrent occam code. You will be required to research the occam language, choosing the appropriate language version to address, and develop the necessary tools to allow occam code to run in the browser. A stretch goal is to develop additional tools that allow occam code to run natively on a modern processor by exploiting the existing concurrency models and toolchains of either Java or Golang. Prerequisites: Compilers
A functional/dataflow programming language for acoustic signal processing Alex Rogers Cyber Physical Systems B C MSc Many detection tasks for acoustic sensors deployed in environmental monitoring (for example, triggering an extended recording after analysing a small sample of sound in order to capture the sound of a particular species of animal, or to detect human activity such as hunting and logging) can be readily described using functional or dataflow programming languages. Prototyping algorithms at this level is intuitive, and with the correct tools can be interactive, enabling non-specialist programmers to make rapid progress. However, to run on small embedded devices the resulting program typically needs to be converted to low-level C code. In this project, you will build a toolchain to automate this process. The resulting system will allow a functional or dataflow programming language (possibly one of your own design) to be used to prototype and evaluate acoustic algorithms. It will then automatically generate low level C code that can be compiled to run on an open-source resource-constrained acoustic sensor (www.openacousticdevices.info). Prerequisites: Compilers and Principles of Programming Languages useful but not essential
A scripting language for a low-cost acoustic recorder Alex Rogers Cyber Physical Systems B C MSc AudioMoth (https://www.openacousticdevices.info) is an open-source low-cost acoustic recorder that is widely used for biodiversity and environmental monitoring. Most users deploy it for extended periods using a desktop configuration tool to set a simple static recording schedule (such as record each day at 384kHz sample rate from 18:00 to 22:00). However, a growing number of users require more complex schedules. For example, they may wish to track sunrise and sunset, recording for an hour either side. Or they may wish to record on alternate days, or only on weekdays. Encapsulating all these possibilities in a static GUI is a challenge. In this project, you will design and implement a scripting language that will allow users to program complex recording schedules through a script that is interpreted or compiled on the device itself. The solution must work within the constraints of the device (approximately 80KB of program space and 32KB of memory) and should carefully consider the usability of the language by users who are more likely to be conservation biologists than computer programmers.
Deploying TensorFlow Lite on AudioMoth Alex Rogers Cyber Physical Systems MSc

Co-supervised by Johnny Austin jonny@microbit.org

At www.microbit.org

The use of convolutional and deep neural networks on computationally constrained hardware has received lots of attention recently with the development of TensorFlow Lite, which targets small ARM devices. This project will explore the use of this framework for audio detection tasks. The intended platform is AudioMoth -- a low-cost open-source smart acoustic sensor used for biodiversity and environmental sensing (https://www.openacousticdevices.info) -- and a typical application would involve training a neural network to recognise the sound of a particular species of bird such that only recordings of that species are captured by the device. This project will develop a complete toolchain to allow a model to be trained and tested on a desktop machine using library recordings, and then ported to run on the AudioMoth device (constrained by its limited RAM and flash storage).
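A minimal sketch of the desktop half of such a toolchain, using the standard TensorFlow Lite converter with post-training int8 quantisation (the tiny model and random calibration data below are placeholders for a real audio classifier and library recordings):

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(40,)),                       # e.g. 40 spectral features per frame
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),    # target species / other
    ])

    def rep_dataset():                                     # calibration samples for quantisation
        for _ in range(100):
            yield [np.random.rand(1, 40).astype("float32")]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    open("model.tflite", "wb").write(converter.convert())

The on-device half would then run the resulting model with TensorFlow Lite for Microcontrollers from C, within AudioMoth's RAM and flash budget.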

Prerequisites - Digital Systems (or equivalent)

Developing a low-cost open-source acoustic sensor for coral health monitoring Alex Rogers Cyber Physical Systems MSc AudioMoth (https://www.openacousticdevices.info) is an open-source low-cost acoustic recorder that is widely used for environmental monitoring. Most recently it has been deployed with partners to monitor the health of coral reefs (https://zslpublications.onlinelibrary.wiley.com/doi/10.1002/rse2.249) by using existing analysis software to post-process recordings in order to count individual invertebrate snaps and to extract ‘acoustic indices’ which measure the complexity of the soundscape. In this project, you will attempt to avoid this post-processing stage by calculating the relevant parameters of interest directly on the AudioMoth’s 48MHz ARM Cortex M4F processor. This will reduce the energy consumption of the device, facilitating longer deployments on standard batteries, and will avoid the cumbersome handling of large audio files. You will work with the end-users to understand their requirements, and you will develop the appropriate device firmware and associated configuration tools to allow a trial deployment of the solution to be performed toward the end of the project.
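For orientation, one index frequently used in this literature is the Acoustic Complexity Index; a desktop numpy prototype is only a few lines (our sketch of the published formula), and the project's challenge is computing something like it incrementally, in fixed point, on the device itself:

    import numpy as np

    def aci(samples, n_fft=512):
        # magnitude spectrogram from short non-overlapping windows
        frames = samples[: len(samples) // n_fft * n_fft].reshape(-1, n_fft)
        spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T  # (freq, time)
        # per frequency bin: total frame-to-frame intensity change over total intensity
        d = np.abs(np.diff(spec, axis=1)).sum(axis=1)
        return float((d / (spec.sum(axis=1) + 1e-12)).sum())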
Resurrecting Extinct Computers Alex Rogers Cyber Physical Systems B C MSc

While the architecture of current reduced instruction set processors is well established, and relatively static, the early days of computing saw extensive experimentation and exploration of alternative designs. Commercial processors developed during the 1960s, 1970s and 1980s included stack machines, LISP machines and massively parallel machines, such as the Connection Machine (CM-1) consisting of 65,536 individual one-bit processors connected together as a 12-dimensional hypercube. This period also saw the development of the first single-chip microprocessors, such as the Intel 4004, and the first personal computers, such as the Altair 8800 using the Intel 8080 microprocessor. This project will attempt to resurrect one of these extinct designs (or a scaled-down version if necessary) using a modern low-cost field-programmable gate array (FPGA). You will be required to research the chosen processor, using both original and modern sources, and then use Verilog to develop a register-level description of the device that can be implemented on an FPGA. The final device should be able to run the software of the original and could be realised in a number of different forms depending on the chosen processor (e.g. an Altair 8800 on a small USB stick running Microsoft BASIC). Prerequisites: Digital Systems or Computer Architecture useful but not essential

Using the BBC micro:bit and TinyML to sense and respond to the world Alex Rogers Cyber Physical Systems MSc Co-supervised by Johnny Austin Jonny@microbit.org At www.microbit.org "The BBC micro:bit V2 is capable of running TinyML workloads, and there have been some interesting demo applications. However, little work has been done to push the limits of what is possible with the device's built-in motion sensor and microphone. This is an open-ended project to look at pushing the envelope of the machine learning tasks possible on the micro:bit. Because the device is widely available and inexpensive, it makes an excellent and accessible platform for data science and TinyML work. For example, could the micro:bit be used as a wrist-worn device to recognise different activity types, or could it be trained to distinguish bat species, or birdsong? Could we teach it to recognise a range of 'command keywords' for controlling other projects? By demonstrating creative and effective uses for machine learning with the micro:bit, you will be feeding into the future feature roadmap of the Micro:bit Educational Foundation as we work out how to teach students around the world about AI and ML with physical computing" Prerequisites Digital Systems (or equivalent)
Using the BBC micro:bit for conservation research Alex Rogers Cyber Physical Systems MSc Co-supervised by Johnny Austin Jonny@microbit.org At www.microbit.org "The BBC micro:bit V2 is capable of running TinyML workloads. Because the device is widely available, consumes little power, and is inexpensive, it makes an excellent and accessible platform for data science and TinyML work. This project aims to see how the BBC micro:bit could be used for conservation studies - could it be used to detect the presence of different species using the built-in microphone, or deployed to recognise logging activities, either with sound or motion? How else could the device be used for conservation fieldwork? As well as creating valuable tools for those in the conservation space, by demonstrating creative and effective uses for machine learning with the micro:bit you will be feeding into the future feature roadmap of the Micro:bit Educational Foundation as we work out how to teach students around the world about AI and ML with physical computing." Prerequisites Digital Systems (or equivalent)
Applications of ad hoc security Bill Roscoe Security MSc My group have developed novel technology for establishing security between pairs and groups of devices such as mobile phones. This technology clearly has many potential applications: the objective of this project is to build an implementation directed at an application of the student's choosing, for example email, healthcare or social networking. See my papers with Bangdao Chen, Ronald Kainda and Long Nguyen on my web page. Prerequisites: Security
Developing and using SVA Bill Roscoe Automated Verification MSc SVA is a front end for FDR that analyses shared-variable programs. See Chapters 18 and 19 of Understanding Concurrent Systems. It has the potential to support a number of projects, including: (a) building a version that accepts a subset of some standard programming language as opposed to the custom language it presently uses; (b) extending its functionality; (c) case studies using it. (a) and (b) would both involve programming in CSP and Java, the two languages SVA is written in. Prerequisites: Concurrency for (a) and (b)
Modelling and verifying systems in Timed CSP and FDR Bill Roscoe Automated Verification B C MSc

Timed CSP reinterprets the CSP language in a real-time setting and has a semantics in which the exact times of events are recorded as well as their order. Originally devised in the 1980s, it has only just been implemented as an alternative mode for FDR. The objective of this project is to take one or more examples of timed concurrent systems from the literature, implement them in Timed CSP, and where possible compare the performance of these models with similar examples running on other analysis tools such as Uppaal.

References:

Understanding Concurrent Systems, especially Chapter 15; and Model Checking Timed CSP, from A. W. Roscoe's web list of publications.

Category-theoretic syntactic models of programming languages Philip Saville, Sam Staton Programming Languages C MSc

Syntactic models -- categories constructed from the syntax of a programming language -- play a key role in category-theoretic denotational semantics. Showing such models exist and satisfy a suitable universal property amounts to giving a sound and complete semantic interpretation for the language in question. Often this involves carefully studying the interplay between program features and categorical structures.

The three main aims of the project are as follows. Firstly, to construct syntactic models for two idealised effectful functional programming languages, namely Moggi's monadic metalanguage [1] and computational lambda calculus [2]. Next, to prove their universal properties, and finally to use these to give syntactic translations between the languages.

The starting point would be to understand the categorical semantics of the simply-typed lambda calculus, the monadic metalanguage and the computational lambda calculus. Extensions would include exploring extensionality / well-pointedness and constructing fully abstract syntactic models of these languages.
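For orientation, one standard presentation (following Moggi) of the interpretation of the computational lambda calculus in a cartesian closed category with a strong monad \((T, \eta, \mu, t)\) is:

\[
\begin{aligned}
\llbracket A \rightarrow B \rrbracket &= (T\llbracket B \rrbracket)^{\llbracket A \rrbracket},\\
\llbracket \Gamma \vdash e : A \rrbracket &\;:\; \llbracket \Gamma \rrbracket \longrightarrow T\llbracket A \rrbracket,\\
\llbracket \mathrm{let}\ x = e_1\ \mathrm{in}\ e_2 \rrbracket &= \mu \circ T\llbracket e_2 \rrbracket \circ t \circ \langle \mathrm{id}, \llbracket e_1 \rrbracket \rangle,
\end{aligned}
\]

where \(t\) is the strength and \(\mu\) the monad multiplication. Making interpretations of this shape precise, and showing the syntactic model is the universal one among them, is the substance of the project.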

[1] E. Moggi, "Notions of computation and monads," Information and Computation, 1991.
[2] E. Moggi, "Computational lambda-calculus and monads," LICS 1989.

Pre-requisite courses:

     - Categories, proofs and processes

Useful background courses:

     - Principles of programming languages

     - Lambda calculus and types


A better JIT for Keiko Michael Spivey Programming Languages B C MSc The existing JIT for Keiko is very simple-minded, and does little more than translate each bytecode into the corresponding machine code. Either improve the translation by using one of the many JIT libraries now available, or adjust the Oberon compiler and the specification of the bytecode machine to free it of restrictive assumptions and produce a better pure-JIT implementation. One idea: the compiler now knows too much about the layout of stack frames. In particular, it knows that the arguments of a procedure are on the stack at a certain offset from the frame base, effectively fixing the size of the frame head. By introducing appropriate instructions into the VM, it should be possible to make the code more independent of the exact stack layout, and that may make it possible to fit in with the native calling convention of some platforms. This should make it possible to merge the Oberon stack and the native subroutine stack, saving space and freeing up registers.
A fast, portable grep Michael Spivey Programming Languages B C MSc

Grep is a unix tool that takes a regular expression on the command line and prints all lines from a file that contain a string matching the regular expression. Various ways of implementing regular expression matching are possible, but in a 1968 article, Ken Thompson describes an implementation that dynamically translates regular expressions into machine code. Thunder is a small library (a replacement for GNU Lightning) that makes dynamic code generation possible in a portable way, with backends for the x86 and the ARM at present. Thunder works by taking assembly-level code for an invented RISC-like register machine and transliterating each instruction into one or two instructions for the host machine. The goal of this project is to implement a fast version of Grep that uses Thompson's method of matching.

Thompson's paper contains a stack-based compiler written in Algol 60 that generates machine code for the IBM 7094, one of the first transistorised computers, with bizarre features such as index registers that are subtracted from the address rather than added. It's worth deciphering to appreciate the wicked cleverness of Thompson's program, which although it does not mention nondeterministic finite automata explicitly, was nevertheless the source of the construction that turns regular expressions into NFAs. Russ Cox has written a cheat sheet that contains just enough detail about the 7094 to read Thompson's paper, and I have some notes of my own that transliterate the machine code into a more modern style. Thompson's compiler uses a fiendish stack-based algorithm that exploits quirks of the way subroutines work on the 7094, but a simpler compiler can work from an AST and use Thunder's system of code labels to avoid explicit patching of branch destinations.

To learn to use Thunder, it's best to begin with the program fact.c that is included with the source of the Oxford Oberon Compiler, and contains both iterative and recursive implementations of the factorial function written as Thunder code. There is a tutorial that explains how it works.

For the project, it would be possible to prototype a regular expression compiler with OCaml using Lex and Yacc, and generate the object code in the form of C source like the fact.c program. A smoother implementation would be written entirely in C, parsing regexps by recursive descent to build an AST, then using a recursive function written in C to translate to Thunder code. Thompson's algorithm cannot handle certain regular expressions like (a*)* without going into an infinite loop, but it is always possible to simplify these expressions (in this case to a*) to avoid the problem. This can be done during parsing or as a separate pass between parsing and code generation.
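For a feel of the matching semantics independent of code generation, here is a minimal Python sketch of Thompson's construction and NFA simulation; the AST encoding is illustrative, and the project itself would emit Thunder code rather than interpret the NFA:

```python
# Minimal sketch of Thompson's construction and NFA simulation.
# AST forms: ('chr', c), ('cat', l, r), ('alt', l, r), ('star', e).

def compile_nfa(ast, states=None):
    """Return (start, accept, states); states maps id -> list of
    ('eps', target) or ('chr', c, target) edges."""
    if states is None:
        states = {}
    def new():
        states[len(states)] = []
        return len(states) - 1
    kind = ast[0]
    if kind == 'chr':
        s, a = new(), new()
        states[s].append(('chr', ast[1], a))
        return s, a, states
    if kind == 'cat':
        s1, a1, _ = compile_nfa(ast[1], states)
        s2, a2, _ = compile_nfa(ast[2], states)
        states[a1].append(('eps', s2))
        return s1, a2, states
    if kind == 'alt':
        s, a = new(), new()
        s1, a1, _ = compile_nfa(ast[1], states)
        s2, a2, _ = compile_nfa(ast[2], states)
        states[s] += [('eps', s1), ('eps', s2)]
        states[a1].append(('eps', a))
        states[a2].append(('eps', a))
        return s, a, states
    if kind == 'star':
        s, a = new(), new()
        s1, a1, _ = compile_nfa(ast[1], states)
        states[s] += [('eps', s1), ('eps', a)]
        states[a1] += [('eps', s1), ('eps', a)]
        return s, a, states

def eps_closure(states, frontier):
    seen, stack = set(frontier), list(frontier)
    while stack:
        for edge in states[stack.pop()]:
            if edge[0] == 'eps' and edge[1] not in seen:
                seen.add(edge[1]); stack.append(edge[1])
    return seen

def matches(ast, word):
    start, accept, states = compile_nfa(ast)
    current = eps_closure(states, {start})
    for c in word:
        step = {e[2] for s in current for e in states[s]
                if e[0] == 'chr' and e[1] == c}
        current = eps_closure(states, step)
    return accept in current

# (a|b)*abb
ast = ('cat', ('star', ('alt', ('chr', 'a'), ('chr', 'b'))),
       ('cat', ('chr', 'a'), ('cat', ('chr', 'b'), ('chr', 'b'))))
print(matches(ast, "aabb"), matches(ast, "abab"))  # True False
```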

A regular expression toolkit Michael Spivey Programming Languages B C MSc Various constructions on regular expressions and finite automata are nice to contemplate, but hard to describe informally without sinking in a mire of nested subscripts, and hard to implement in a conventional way without becoming lost in a maze of twisty little pointers, all alike. The goal in this project is to come up with a perspicuous implementation of some of these algorithms in a suitable programming language such as Haskell or OCaml, abstracting shared patterns such as fixpoint computations as higher-order functions. A suitable target would be a program for marking Models of Computation homework: it would accept two regular expressions on the command line, and either verify that they were equivalent or print a word that matched one expression but not the other. More ambitious would be a program that could also prove the equivalence from the Kozen axioms.
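One perspicuous route to the equivalence checker uses Brzozowski derivatives and a bisimulation over pairs of expressions; a hedged sketch follows. The smart constructors below do only light normalisation, and guaranteed termination in general needs identification of expressions up to associativity, commutativity and idempotence of alternation:

```python
# Sketch of regular-expression equivalence via Brzozowski derivatives.
EMPTY, EPS = ('empty',), ('eps',)

def alt(a, b):
    if a == EMPTY: return b
    if b == EMPTY: return a
    return a if a == b else ('alt', a, b)

def cat(a, b):
    if a == EMPTY or b == EMPTY: return EMPTY
    if a == EPS: return b
    if b == EPS: return a
    return ('cat', a, b)

def star(a):
    if a in (EMPTY, EPS): return EPS
    return a if a[0] == 'star' else ('star', a)

def nullable(r):
    if r[0] in ('eps', 'star'): return True
    if r[0] in ('empty', 'chr'): return False
    if r[0] == 'alt': return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])      # 'cat'

def deriv(r, c):
    if r[0] in ('empty', 'eps'): return EMPTY
    if r[0] == 'chr': return EPS if r[1] == c else EMPTY
    if r[0] == 'alt': return alt(deriv(r[1], c), deriv(r[2], c))
    if r[0] == 'star': return cat(deriv(r[1], c), r)
    head = cat(deriv(r[1], c), r[2])              # 'cat'
    return alt(head, deriv(r[2], c)) if nullable(r[1]) else head

def alphabet(r, acc=None):
    acc = set() if acc is None else acc
    if r[0] == 'chr': acc.add(r[1])
    for sub in r[1:]:
        if isinstance(sub, tuple): alphabet(sub, acc)
    return acc

def equivalent(r1, r2):
    """Check a bisimulation: matched nullability at every derivative."""
    sigma = alphabet(r1) | alphabet(r2)
    seen, todo = set(), [(r1, r2)]
    while todo:
        a, b = todo.pop()
        if (a, b) in seen: continue
        if nullable(a) != nullable(b): return False
        seen.add((a, b))
        todo += [(deriv(a, c), deriv(b, c)) for c in sigma]
    return True

A = ('chr', 'a')
print(equivalent(star(star(A)), star(A)))   # True: (a*)* == a*
print(equivalent(A, cat(A, A)))             # False: a != aa
```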
A simple functional language Michael Spivey Programming Languages B C MSc

GeomLab is based on a simple functional language called Fun, and a still simpler language with the same name (but without the pattern-matching) is implemented by interpreter in the Principles of Programming Languages course. (GeomLab has a self-hosting compiler that targets the JVM). The goal of this project is to produce another prototype implementation of Fun, or a language like it, via a compiler that targets some other platform. One possibility is to modify Keiko by allowing heap-allocated activation records; another is to compile to the existing Keiko machine but use closure conversion.
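To illustrate the closure-conversion idea in miniature (the AST encoding is an assumption; a compiler would rewrite terms rather than interpret them, but the environment-capture step is the same):

```python
# Minimal sketch of closure capture for a tiny lambda language.
# AST: ('lit', n), ('var', x), ('lam', x, body), ('app', f, e).

def free_vars(t, bound=frozenset()):
    if t[0] == 'lit':
        return set()
    if t[0] == 'var':
        return set() if t[1] in bound else {t[1]}
    if t[0] == 'lam':
        return free_vars(t[2], bound | {t[1]})
    return free_vars(t[1], bound) | free_vars(t[2], bound)

def evaluate(t, env):
    if t[0] == 'lit':
        return t[1]
    if t[0] == 'var':
        return env[t[1]]
    if t[0] == 'lam':
        # The key step: capture only the free variables, so the code
        # part of the closure is independent of the ambient environment.
        captured = {v: env[v] for v in free_vars(t)}
        return ('closure', t[1], t[2], captured)
    _, x, body, captured = evaluate(t[1], env)   # 'app'
    arg = evaluate(t[2], env)
    return evaluate(body, {**captured, x: arg})

# (\x. \y. x) 1 2  evaluates to 1
konst = ('lam', 'x', ('lam', 'y', ('var', 'x')))
prog = ('app', ('app', konst, ('lit', 1)), ('lit', 2))
print(evaluate(prog, {}))   # 1
```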


A simple object-oriented language Michael Spivey Programming Languages B C MSc Use Keiko to implement a simple language that is purely object-oriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala. Almost any variation on the theme is acceptable: a different language, object-oriented or not; or a target other than Keiko. A similar thing already exists, in that Oberon-2 is object-oriented (in an interesting way), and the design of Keiko was expressly made so that it would provide what's needed to implement it. Since this project idea is so broad, successful completion of it depends on quickly reaching a precise definition of the language that you are going to implement, and doing that will be the first task.
Better register allocation Michael Spivey Programming Languages B C MSc The compiler we currently study in the last part of the Compilers course allocates registers greedily during the instruction selection phase. Even on the ARM, this can lead to unnecessary register-to-register moves, but a variant of the compiler for AMD64 suffers from the problem more severely. The goal of this project is to implement a better register allocation scheme that operates as a separate pass after instruction selection. The pass will work from an explicit representation of the instruction sequence without assigned registers, and will be able to take into account move instructions marked in the instruction sequence, wherever possible assigning the same register to both the source and target of the move, so that the instruction itself can be eliminated. Source material for the project can be found in the book, A retargetable C compiler by Fraser and Hanson, and in articles referenced in that book.
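A sketch of the shape of such a pass follows; the instruction format, register set, and the crude liveness and coalescing heuristics are illustrative, and real allocators must also handle spilling:

```python
# Sketch of a post-selection allocation pass over instructions on
# virtual registers: ('op', dst, (srcs...)). Assumes sources are read
# before the destination is written; no spilling is handled.

def intervals(code):
    """First and last instruction index at which each vreg is live."""
    iv = {}
    for i, (op, dst, srcs) in enumerate(code):
        for r in (dst, *srcs):
            lo, hi = iv.get(r, (i, i))
            iv[r] = (min(lo, i), max(hi, i))
    return iv

def allocate(code, phys=('r0', 'r1', 'r2')):
    iv = intervals(code)
    free, mapping, active = list(phys), {}, []
    for r in sorted(iv, key=lambda v: iv[v][0]):
        start = iv[r][0]
        # Expire intervals ending at or before this def (read-before-write).
        for other in [a for a in active if iv[a][1] <= start]:
            active.remove(other)
            free.append(mapping[other])
        # Move preference: if r is defined by a move whose source's
        # register just became free, reuse it so the move is deletable.
        pref = None
        for op, dst, srcs in code:
            if op == 'mov' and dst == r and srcs and mapping.get(srcs[0]) in free:
                pref = mapping[srcs[0]]
        if pref is not None:
            reg = pref
            free.remove(reg)
        else:
            reg = free.pop(0)
        mapping[r] = reg
        active.append(r)
    return mapping

code = [('li', 'v1', ()), ('li', 'v2', ()),
        ('add', 'v3', ('v1', 'v2')), ('mov', 'v4', ('v3',)),
        ('add', 'v5', ('v4', 'v2'))]
print(allocate(code))   # v3 and v4 share a register: the mov vanishes
```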
Eliminating array bounds checks Michael Spivey Programming Languages B C MSc

The Oberon compiler inserts code into every array access and every pointer dereference to check for runtime errors, like a subscript that is out of bounds or a pointer that is null. In many cases, it is possible to eliminate the checks because it is possible to determine from the program that no error can occur. For example, an array access inside a FOR loop may be safe given the bounds of the loop, and several uses of the same pointer in successive statements may be able to share one check that the pointer is non-null. Modify the Oberon compiler (or a simpler one taken from the Compilers labs) so that it represents the checks explicitly in its IR, and introduce a pass that removes unnecessary checks, so speeding up the code without compromising safety.
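A sketch of what such a pass might look like on a toy IR; the instruction forms, and the convention that ('for', v, lo, hi) ranges v over [lo, hi) against an array of length n, are illustrative assumptions:

```python
# Sketch of redundant bounds-check elimination over a tiny IR.
# ('for', v, lo, hi): loop v over [lo, hi); ('check', v, n): bounds
# check of index v against array length n; ('endfor',) closes a loop.

def eliminate_checks(code):
    """Drop a ('check', v, n) when an enclosing FOR loop guarantees
    lo >= 0 and hi <= n for the loop variable v."""
    ranges, stack, out = {}, [], []
    for instr in code:
        if instr[0] == 'for':
            _, v, lo, hi = instr
            ranges[v] = (lo, hi)
            stack.append(v)
            out.append(instr)
        elif instr[0] == 'endfor':
            ranges.pop(stack.pop(), None)
            out.append(instr)
        elif instr[0] == 'check':
            _, v, n = instr
            bounds = ranges.get(v)
            if bounds and bounds[0] >= 0 and bounds[1] <= n:
                continue            # provably safe: drop the check
            out.append(instr)
        else:
            out.append(instr)
    return out

prog = [('for', 'i', 0, 10),
        ('check', 'i', 10), ('load', 'a', 'i'),
        ('check', 'j', 10), ('load', 'a', 'j'),
        ('endfor',)]
print(eliminate_checks(prog))   # the check on i disappears, j's stays
```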

References

AT+98: A. Adl-Tabatabai et al., Fast, effective code generation in a just-in-time Java compiler, SIGPLAN Notices 33, 5 (May 1998), pp. 280–90.

Mos16: A. Mosoi, Bounds check elimination for Go, 2016. https://docs.google.com/document/d/1vdAEAjYdzjnPA9WDOQ1e4e05cYVMpqSxJYZT33Cqw2g

QHV02: F. Qian, L. Hendren and C. Verbrugge, A comprehensive approach to array bounds elimination for Java. In Compiler Construction (R. Nigel Horspool, ed.), pp. 325–41, Springer, 2002.

WWM09: T. Würthinger, C. Wimmer and H. Mössenböck, Array bounds check elimination in the context of deoptimization, Science of Computer Programming 74, 5 (March 2009), pp. 279–95.


Generational garbage collection for Keiko Michael Spivey Programming Languages B C MSc The existing implementation of Keiko uses a stop-the-world copying collector that could cause long pauses in some applications. Modify the instruction set so pointer stores can be identified, and enhance the garbage collector to make it incremental and perhaps generational.
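A sketch of the core mechanism such a modification enables: once pointer stores are identifiable, a write barrier can maintain a remembered set of old-to-young references, so that minor collections scan only the young generation plus that set. The object representation here is an illustrative assumption:

```python
# Sketch of a generational write barrier and remembered set.

class Heap:
    def __init__(self):
        self.young, self.old = set(), set()
        self.remembered = set()      # old objects holding young refs

    def alloc(self, fields):
        obj = {'fields': dict(fields)}
        self.young.add(id(obj))
        return obj

    def store(self, obj, field, value):
        """Every pointer store goes through here: the write barrier."""
        obj['fields'][field] = value
        if id(obj) in self.old and id(value) in self.young:
            self.remembered.add(id(obj))

    def promote(self, obj):
        self.young.discard(id(obj))
        self.old.add(id(obj))

heap = Heap()
parent = heap.alloc({})
heap.promote(parent)                 # parent survives to the old gen
child = heap.alloc({})
heap.store(parent, 'next', child)    # barrier records the old->young ref
print(len(heap.remembered))          # 1: a minor GC must scan parent
```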
Implementing a BURS generator generator Michael Spivey Programming Languages B C MSc In the Compilers course, we selected instructions by a greedy pattern-matching algorithm that works on operator trees, and expressed the algorithm by means of a family of recursive functions. For CISC machines especially, this greedy algorithm is inadequate, and it becomes necessary to use an algorithm based on dynamic programming to find an optimal tiling of the operator tree with instructions. Code generator generators exist that are capable of translating a machine grammar annotated with costs and instruction templates into efficient pattern matching code in C, in the form of a Bottom Up Rewriting System. The aim in this project is to implement a similar tool based on OCaml, so that the code generation phase of our compilers can be expressed directly as a tree grammar.
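A sketch of the dynamic-programming core of such a tool, with an illustrative rule set, costs and templates: each node's cheapest cover is computed bottom-up, so a fused addressing mode is chosen only when it is genuinely cheaper than composing smaller tiles:

```python
# Sketch of bottom-up DP tiling for instruction selection.
# Patterns: 'reg' matches any subtree computed into a register;
# '#' matches an immediate leaf; otherwise operators match by name.
RULES = [
    (('const', '#'), 1, 'li   %r, #imm'),
    (('add', 'reg', 'reg'), 1, 'add  %r, %a, %b'),
    (('add', 'reg', ('const', '#')), 1, 'addi %r, %a, #imm'),
    (('load', 'reg'), 2, 'ldw  %r, [%a]'),
    (('load', ('add', 'reg', ('const', '#'))), 2, 'ldw  %r, [%a, #imm]'),
]

def match(pattern, node, cost_of):
    """Cost of covering `node` with `pattern` (sub-costs included),
    or None if the pattern does not match."""
    if pattern == 'reg':
        return cost_of(node)
    if pattern == '#':
        return 0
    if not isinstance(node, tuple) or pattern[0] != node[0] \
            or len(pattern) != len(node):
        return None
    total = 0
    for p, n in zip(pattern[1:], node[1:]):
        sub = match(p, n, cost_of)
        if sub is None:
            return None
        total += sub
    return total

def best(node, memo=None):
    memo = {} if memo is None else memo
    if id(node) in memo:
        return memo[id(node)]
    choices = []
    for pattern, cost, template in RULES:
        sub = match(pattern, node, lambda n: best(n, memo)[0])
        if sub is not None:
            choices.append((cost + sub, template))
    memo[id(node)] = min(choices)
    return memo[id(node)]

# load(add(load(const 4), const 8)): the fused addressing mode wins
tree = ('load', ('add', ('load', ('const', 4)), ('const', 8)))
print(best(tree))   # (5, 'ldw  %r, [%a, #imm]')
```

A real BURS tool generates the matcher from the grammar ahead of time rather than interpreting patterns at compile time, but the cost computation is the same.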
Native code from Oberon Michael Spivey Programming Languages B C MSc I have an Oberon compiler that targets a portable bytecode, and a native-code compiler for a small Pascal-like language that uses a language of instructions similar to the bytecode for its tree-based IR. It would be nice to combine the two to make a native code compiler for Oberon, targeting (perhaps) the ARM or maybe the Thumb variant of ARM. Note that there is already the native code compiler used for the course, and also an extended, multi-target compiler that contains partial solutions for at least some of the issues involved.
Porting micro:bian Michael Spivey Programming Languages B C MSc micro:bian is a very simple embedded OS that is used for teaching the first year course on Digital Systems. Unlike most other embedded operating systems, it is based on the single synchronisation primitive of synchronous message passing, with all interrupts converted to messages. At present, micro:bian is specific to the BBC micro:bit. But plenty of other inexpensive ARM-based development boards exist, such as the Freescale FRDM-KL25Z and FRDM-K64F boards, and many other boards that are part of the MBED ecosystem. The aim of this project is to port micro:bian to one or more of these other boards, identifying and isolating the parts that are board-specific and providing multiple implementations of device driver processes so that micro:bian applications are as far as possible portable between boards. Other boards don't have the switches and lights of the micro:bit but some, including the FRDM boards, support Arduino-compatible add-on 'shield' boards, and we could easily make a shield that replicates the features of the micro:bit. Work to extend and port micro:bian is ongoing, and another page has notes and details of progress.
micro:bian under pthreads Michael Spivey Programming Languages B C MSc For experimentation and debugging, it would be really nice to have an implementation of the micro:bian API in terms of pthreads. Then micro:bian programs could run on a Linux (or other) host instead of running on the micro:bit. Dummy device drivers could be written so that a micro:bian program could communicate using standard input and output.
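A sketch of the key primitive in a thread-based setting: a rendezvous channel on which send blocks until the receiver has taken the message. The API names are illustrative, not micro:bian's actual C interface:

```python
# Sketch of synchronous message passing (rendezvous) between threads,
# the kind of primitive a pthreads port would build on.
import threading, queue

class Channel:
    """send() blocks until a receiver has taken the message."""
    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def send(self, msg):
        self._slot.put(msg)
        self._ack.get()          # wait for the receiver

    def receive(self):
        msg = self._slot.get()
        self._ack.put(None)      # release the sender
        return msg

ch = Channel()

def server():
    while True:
        n = ch.receive()
        print('server got', n)
        if n == 0:
            return

threading.Thread(target=server, daemon=True).start()
for i in (3, 2, 1, 0):
    ch.send(i)                   # completes only after receipt
```

Dummy device-driver processes would sit behind channels like this one, reading from standard input and writing to standard output in place of real hardware.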
MSc and Undergraduate Projects Bernard Sufrin Programming Languages MSc

Bernard Sufrin's most recent collection of detailed project descriptions can always be found at http://www.cs.ox.ac.uk/people/bernard.sufrin/personal/projects.html. In addition to the projects described there, he is happy to discuss supervising projects in concurrent programming, user-interface design, programming language implementation, program transformation, and proof support.

"Parallel reasoning in Sequoia" David Tena Cucala, Bernardo Cuenca Grau Artificial Intelligence and Machine Learning, Data and Knowledge C MSc

Sequoia is a state-of-the-art system developed in the Information Systems Group of the University of Oxford for automated reasoning in OWL 2, a standard ontology language in the Semantic Web. Sequoia uses a consequence-based calculus, a novel and promising approach towards fast and scalable reasoning. Consequence-based algorithms are naturally amenable to parallel reasoning; however, the current version of Sequoia does not take advantage of this possibility. This project aims at implementing and testing parallel reasoning capabilities in Sequoia. The student involved in this project will acquire knowledge of state-of-the-art techniques for reasoning in OWL 2. They will develop an implementation that exploits the characteristics of Sequoia's underlying algorithm to achieve parallel reasoning, and they will evaluate the system's performance against other state-of-the-art reasoners.

Prerequisites: only suitable for someone who has done the Knowledge Representation and Reasoning course. Experience with concurrent programming and/or Scala is desirable.

ARETHA: A Transparent and Respectful Virtual Assistant for the Home Max Van Kleek Human Centered Computing B C MSc Virtual assistants like Alexa and Siri have become hugely popular conveniences in homes worldwide. However, they also pose unprecedented privacy risks by being black-box vehicles of corporate surveillance by the likes of Amazon, Apple and others. Is it possible to design a virtual assistant that can be guaranteed to protect user privacy, and be fully transparent and controllable by end-users? This project will involve designing and developing a fully modular, open-source virtual assistant called ARETHA from the ground up, using open-source components, which can provide strong guarantees about being privacy-preserving and pro-user.
Algorithmic Explanations Max Van Kleek, Jun Zhao Human Centered Computing B C MSc Description: Machine learning is a range of methods for classifying or predicting real-world phenomena using data. When ML models are used to predict employee success, or set insurance prices, they can have significant effects on people's lives. Understandably, in such circumstances people might want explanations for why a system has given them a negative score. Various methods have been proposed to generate interpretable ML models, including systematically perturbing the inputs to observe effects on the output, or providing power indices for each input. However, few of these techniques have been translated into user-friendly explanation interfaces. In this project, you would explore ways to improve explanation systems. This could involve generating novel explanation methods; translating existing ones into graphical or natural language formats; and/or testing them out on users for different purposes.
App and IoT X-Ray Max Van Kleek, Jun Zhao Human Centered Computing B C MSc The X-Ray project enables people to see all of the first and third parties that receive data from the smartphone apps and IoT devices in their homes, so that they can make better privacy decisions. There are several pieces to the X-Ray: the Observatory, which discovers and retrieves smartphone software such as smartphone apps; the analyser, a static analysis method for analysing these apps to derive information flow models; and end-user clients which provide visualisations and ways for people to understand these models in the context of specific privacy choices. In this project, you will be invited to extend the X-Ray in some way: such as by introducing a new module for the observatory (to ingest new kinds of content), the analyser, or a new way for end-users to view and use the models derived.
Privopticon: A privacy-preserving mesh OS for self-observation Max Van Kleek Human Centered Computing B C MSc

Self-monitoring has many potential applications within the home, such as the ability to understand important health and activity rhythms within a household automatically. But such monitoring activities also carry extreme privacy risks. Can we design new kinds of sensing architectures that are designed to preserve inhabitants' privacy? We have made initial inroads with a new mesh operating system prototype for Raspberry Pi hardware for new classes of privacy-preserving self-monitoring applications for the home, which will provide user-configurable degrees of information fidelity, have built-in forgetting, and are accountable by design. We need your help to try and evaluate different kinds of methods for achieving these goals.

3D demos of geometric concepts Irina Voiculescu Computational Biology and Health Informatics B The Geometric Modelling course (Part B) deals with several interesting concepts which would be best visualised in a suite of applications. A coherent suite of 3D demos could easily become a useful tool for this course, as well as for users worldwide. This project would be most suitable for a candidate who already has some experience using a 3D graphics library of their choice and wants to improve this skill. The mathematical concepts are well-documented.
3D environment for Hand Physiotherapy Irina Voiculescu Computational Biology and Health Informatics B C After hand surgery, it is almost always necessary for patients to have physiotherapy afterwards to help with their recovery. As part of this, the patient will need to perform hand exercises at home. However, the patient may not always do the exercises correctly, or they might forget to do their exercises. The goal of this project is to use the Leap Motion to create a user-friendly GUI which a patient could use to aid them with their home exercises. The interface would show the user where their hand should be, and they would then need to follow the movements. It could be delivered as web-based or downloadable software. It would need to be tailored to the patient so that it contained their specific required exercises, which could be input by the physiotherapist. It would need to store data on how the patient is doing and feed this data back to the patient, and possibly also to the physiotherapist via the internet. If internet-based, patient confidentiality and security would need to be considered. This project would be performed in close collaboration with a physiotherapist, an orthopaedic hand surgeon, and a post-doctoral researcher based at the Nuffield Orthopaedic Centre.
3D printing medical scan data Irina Voiculescu Computational Biology and Health Informatics B C MSc

Computed tomography (CT) scanning is a ubiquitous scanning modality. It produces volumes of data representing internal parts of a human body. Scans are usually output in a standard imaging format (DICOM) and come as a series of axial slices (i.e. slices across the length of the person's body, in planes perpendicular to the imaginary straight line along the person's spine.)

The slices most frequently come at a resolution of 512 x 512 voxels, achieving an accuracy of about 0.5 to 1mm of tissue per voxel, and can be viewed and analysed using a variety of tools. The distance between slices is a parameter of the scanning process and is typically much larger, about 5mm.

During the analysis of CT data volumes it is often useful to correct for the large spacing between slices. For example, when preparing a model for 3D printing, the axial voxels would appear elongated; these can be corrected through an interpolation process along the spinal axis.

This project is about the interpolation process, either in the raw data output by the scanner, or in the post-processed data which is being prepared for further analysis or 3D printing.

The output models would ideally be files in a format compatible with 3D printing, such as STL. The main aesthetic feature of the output would be measurable as a smoothness factor, parameterisable by the user.
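A minimal sketch of the simplest form of the interpolation step, linear blending between neighbouring slices to approximate isotropic voxels; the array sizes and spacings are illustrative, and smoother spline-based schemes would be a natural refinement:

```python
# Sketch of resampling a CT volume along the slice (z) axis so the
# voxels become roughly isotropic before meshing or 3D printing.
import numpy as np

def resample_z(volume, slice_spacing_mm, target_mm):
    """volume: (nz, ny, nx) array of intensities, slices z apart."""
    nz = volume.shape[0]
    z_old = np.arange(nz) * slice_spacing_mm
    z_new = np.arange(0, z_old[-1], target_mm)
    out = np.empty((len(z_new),) + volume.shape[1:], volume.dtype)
    for i, z in enumerate(z_new):
        k = min(int(z // slice_spacing_mm), nz - 2)
        t = (z - k * slice_spacing_mm) / slice_spacing_mm
        out[i] = (1 - t) * volume[k] + t * volume[k + 1]  # blend slices
    return out

vol = np.random.default_rng(0).random((20, 64, 64))   # 5 mm slice gap
iso = resample_z(vol, slice_spacing_mm=5.0, target_mm=1.0)
print(vol.shape, '->', iso.shape)   # (20, 64, 64) -> (95, 64, 64)
```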

Existing DICOM image analysis software designed within the Spatial Reasoning Group at Oxford is available to use as part of the project.

3D stereo display of medical scan data Irina Voiculescu, Stuart Golodetz Computational Biology and Health Informatics, Cyber Physical Systems B C The Medical Imaging research group has been working with a variety of data sourced from CT and MRI scans. This data comes in collections of (generally greyscale) slices which together make up 3D images. Our group has developed software to generate 3D models of the major organs in these images. This project aims to develop a simple augmented reality simulation for the Oculus Rift which will render these organs within a transparent model of a human and allow the user to walk around the model so as to view the organs from any angle. This has a number of possible applications, including to train medical students and to help surgeons to explain medical procedures to their patients.
Exact Algorithms for Complex Root Isolation Irina Voiculescu Computational Biology and Health Informatics B C MSc

Not available in 2013/14

Isolating the complex roots of a polynomial can be achieved using subdivision algorithms. Traditional Newton methods can be applied in conjunction with interval arithmetic. Previous work (jointly with Prof Chee Yap and MSc student Narayan Kamath) has compared the performance of three operators: Moore's, Krawczyk's and Hansen-Sengupta's. This work makes extensive use of the CORE library, which is a collection of C++ classes for exact computation with algebraic real numbers and arbitrary precision arithmetic. CORE defines multiple levels of operation over which a program can be compiled and executed. Each of these levels provides stronger guarantees on exactness, traded against efficiency. Further extensions of this work can include (and are not limited to): (1) Extending the range of applicability of the algorithm at CORE's Level 1; (2) Making an automatic transition from CORE's Level 1 to the more detailed Level 2 when extra precision becomes necessary; (3) Designing efficiency optimisations to the current approach (such as confirming a single root or analysing areas potentially not containing a root with a view to discarding them earlier in the process); (4) Tackling the isolation problem using a continued fraction approach. The code has been included and is available within the CORE repository. Future work can continue to be carried out in consultation with Prof Yap at NYU.
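A hedged floating-point sketch of the subdivision idea follows; the exclusion test is a simple Taylor bound, whereas the project proper would use the interval operators above over CORE's exact arithmetic:

```python
# Sketch of complex root isolation by box subdivision with a Taylor
# exclusion test: p has no root in the disc |z - c| <= r whenever
# |p(c)| > sum_k |p^(k)(c)| r^k / k!.
import numpy as np

def excluded(coeffs, c, r):
    p = np.polynomial.Polynomial(coeffs)
    bound, deriv, fact, k = 0.0, p.deriv(), 1.0, 1
    while any(deriv.coef):
        fact *= k
        bound += abs(deriv(c)) * r**k / fact
        deriv, k = deriv.deriv(), k + 1
    return abs(p(c)) > bound

def isolate(coeffs, lo, hi, min_size=1e-3):
    """Subdivide the square [lo, hi]^2 in the complex plane; return
    small boxes never excluded (candidate root regions)."""
    boxes, candidates = [(lo, hi, lo, hi)], []
    while boxes:
        x0, x1, y0, y1 = boxes.pop()
        c = complex((x0 + x1) / 2, (y0 + y1) / 2)
        r = abs(complex(x1 - x0, y1 - y0)) / 2
        if excluded(coeffs, c, r):
            continue
        if x1 - x0 < min_size:
            candidates.append((x0, x1, y0, y1))
            continue
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        boxes += [(x0, xm, y0, ym), (xm, x1, y0, ym),
                  (x0, xm, ym, y1), (xm, x1, ym, y1)]
    return candidates

# z^2 + 1 (coefficients in increasing degree): roots at +/- i
print(len(isolate([1, 0, 1], -2.0, 2.0)))   # a few boxes near +/- i
```

In exact arithmetic the surviving boxes can be certified to contain roots; in floating point they are only candidates, which is precisely the gap the CORE levels address.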

Gesture recognition using Leap Motion Irina Voiculescu Computational Biology and Health Informatics B C MSc

Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users being asked to carry out static gestures above a Leap Motion sensor, and these gestures being scored according to a variety of criteria. A prototype has been constructed, and some data has been gathered from a few controls and patients. In order to deploy this as a clinical tool into the NHS, there is a need for a systematic data collection and analysis tool, based on machine learning algorithms, to help classify the data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia. Also, the graphical user interface needs to be extended to give particular kinds of feedback to the patient in the form of home exercises, as part of a rehabilitation programme.

This project was originally set up in collaboration with Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.

Identifying features in MRI scan data Irina Voiculescu Computational Biology and Health Informatics B C MSc In recent years, medical diagnosis using a variety of scanning modalities has become quasi-universal and has brought about the need for computer analysis of digital scans. Members of the Spatial Reasoning research group have developed image processing software for CT (tomography) scan data. The program partitions (segments) images into regions with similar properties. These images are then analysed further so that particular features (such as bones, organs or blood vessels) can be segmented out. The team's research continues to deal with each of these two separate meanings of medical image segmentation. The existing software is written in C++ and features carefully-crafted and well-documented data structures and algorithms for image manipulation. In certain areas of surgery (e.g. orthopaedic surgery involving hip and knee joints) the magnetic resonance scanning modality (MRI) is preferred, both because of its safety (no radiation involved) and because of its increased visualisation potential. This project is about converting MRI scan data into a format that can become compatible with existing segmentation algorithms. The data input would need to be integrated into the group's analysis software in order then to carry out 3D reconstructions and other measurements. This project is co-supervised by Professor David Murray MA, MD, FRCS (Orth), Consultant Orthopaedic Surgeon at the Nuffield Orthopaedic Centre and the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), and by Mr Hemant Pandit MBBS, MS (Orth), DNB (Orth), FRCS (Orth), DPhil (Oxon), Orthopaedic Surgeon / Honorary Senior Clinical Lecturer, Oxford Orthopaedic Engineering Centre (OOEC), NDORMS.
Reinforcement learning techniques for games Irina Voiculescu Computational Biology and Health Informatics B C

This project is already taken for 2017-2018


Psychology has inspired and informed a number of machine learning methods. Decisions within an algorithm can be made so as to improve an overall aim of maximising a (cumulative) reward. Machine learning methods of this kind are known as Reinforcement Learning. A basic reinforcement learning model consists of establishing a number of environment states, a set of valid actions, and rules for transitioning between states. Applying this model to the rules of a board game means that the machine can be made to learn how to play a simple board game by playing a large number of games against itself. The goal of this project is to set up a reinforcement learning environment for a simple board game with a discrete set of states (such as Backgammon). If time permits, this will be extended to a simple geometric game (such as Pong) where the states may have to be parameterised in terms of geometric actions to be taken at each stage in the game.
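A minimal sketch of the flavour of self-play learning involved, using Monte-Carlo value updates on a tiny Nim-like game; the hyperparameters and the game itself are illustrative, and a game like Backgammon would need a far richer state representation:

```python
# Sketch of self-play value learning on a tiny Nim variant: players
# alternately remove 1 or 2 stones; whoever takes the last stone wins.
# Positions that are multiples of 3 are losing for the player to move.
import random

def legal(stones):
    return [m for m in (1, 2) if m <= stones]

Q = {}   # (stones, move) -> value estimate, from the mover's view

def choose(stones, eps):
    moves = legal(stones)
    if random.random() < eps:
        return random.choice(moves)             # explore
    return max(moves, key=lambda m: Q.get((stones, m), 0.0))

def train(episodes=20000, alpha=0.1, eps=0.2):
    for _ in range(episodes):
        stones, history = 10, []
        while stones > 0:
            m = choose(stones, eps)
            history.append((stones, m))
            stones -= m
        # The player who moved last won; alternate the reward back
        # through the history (zero-sum, undiscounted).
        reward = 1.0
        for s, m in reversed(history):
            old = Q.get((s, m), 0.0)
            Q[(s, m)] = old + alpha * (reward - old)
            reward = -reward

train()
# From 4 stones, taking 1 (leaving a losing 3) should score higher.
print({m: round(Q.get((4, m), 0.0), 2) for m in (1, 2)})
```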

Simple drawing analysis Irina Voiculescu Computational Biology and Health Informatics B C MSc

Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users drawing simple figures on a tablet, and these figures being scored according to a variety of criteria. Data has already been gathered from 200 or so controls, and is being analysed for a range of parameters in order to assess what a neurotypical person could achieve when drawing such simple figures. Further machine learning analysis could help classify such data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia.

This project was originally co-supervised by Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.

Preconditioning techniques for sparse linear systems Jonathan Whiteley Computational Biology and Health Informatics B Many large systems of linear equations are sparse: if the matrix that describes the linear system is of size N times N, where N is large, there are very few non-zero entries in each row. Under these conditions there may not be enough memory to store the whole matrix, and so only the non-zero entries are stored. This prevents techniques such as LU decomposition from being used to solve the linear system; instead an iterative technique is used, such as the conjugate gradient method for symmetric positive definite matrices, or GMRES for more general matrices. The number of iterations needed with these techniques can be large, rendering them inefficient. To prevent this, preconditioning techniques are used: if the linear system is defined by Ax = b, then a preconditioner P is used and the system solved is instead PAx = Pb, where P is cheap to calculate and both PAx and Pb are cheap to evaluate.

In this project we will investigate matrices of the block form

    A = ( B    C )
        ( C^T  0 )

which arise in many fields, such as constrained optimisation and continuum mechanics. We will utilise the block structure of these matrices to heuristically derive candidate preconditioners, and compare their performance.
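As a hedged sketch of the setting (the sizes, random test matrices, and use of the exact Schur complement are illustrative assumptions; a practical preconditioner would approximate both solves cheaply):

```python
# Sketch of a block-diagonal preconditioner for the saddle-point
# system A = [[B, C], [C^T, 0]], applied via MINRES (A is symmetric
# indefinite, so CG does not apply but MINRES does).
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(0)
n, m = 50, 10
M0 = rng.normal(size=(n, n))
B = M0 @ M0.T + n * np.eye(n)        # symmetric positive definite
C = rng.normal(size=(n, m))
A = np.block([[B, C], [C.T, np.zeros((m, m))]])
b = rng.normal(size=n + m)

# Preconditioner: apply diag(B, S)^{-1}, with S = C^T B^{-1} C the
# Schur complement (computed exactly here, for illustration only).
S = C.T @ np.linalg.solve(B, C)
def apply_p(v):
    return np.concatenate([np.linalg.solve(B, v[:n]),
                           np.linalg.solve(S, v[n:])])
P = LinearOperator((n + m, n + m), matvec=apply_p)

x, info = minres(A, b, M=P)
print(info, np.linalg.norm(A @ x - b))   # 0 and a small residual
```

With the exact Schur complement this block-diagonal preconditioner is known to give convergence in a handful of MINRES iterations; the practical interest, and the heart of the project, is in replacing the solves with B and S by cheap approximations.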
The efficiency of numerical algorithms Jonathan Whiteley Computational Biology and Health Informatics B C MSc Many numerical algorithms have error bounds that depend on some user-provided input.  For example, the error in a numerical method for solving a differential equation is bounded in terms of the step-size h, and so the user may change the step-size h until a desired accuracy is attained.  Although useful, these error bounds do not take account of computational efficiency.  For example, a numerical method for solving a differential equation may have a very impressive bound with respect to step-size h, but may require significantly more computational effort than other methods with less impressive error bounds.  The field of scientific computing is a rich source of algorithms such as these, for example the numerical solution of differential equations, the numerical solution of linear systems, and interpolation of functions.  The aim of this project is to work with a variety of algorithms for solving a given problem, and to assess the computational efficiency of these algorithms.
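As a toy illustration of the trade-off (the ODE, step sizes and choice of methods are for illustration only):

```python
# Compare error against work for two ODE methods on y' = -y, y(0) = 1
# (exact solution e^{-t}): a higher-order method can reach a given
# accuracy with far fewer function evaluations despite its higher
# per-step cost.
import numpy as np

def euler(f, y0, t_end, h):
    y, t, evals = y0, 0.0, 0
    while t < t_end - 1e-12:
        y += h * f(t, y); t += h; evals += 1
    return y, evals

def rk4(f, y0, t_end, h):
    y, t, evals = y0, 0.0, 0
    while t < t_end - 1e-12:
        k1 = f(t, y)
        k2 = f(t + h/2, y + h/2 * k1)
        k3 = f(t + h/2, y + h/2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h/6 * (k1 + 2*k2 + 2*k3 + k4)
        t += h; evals += 4
    return y, evals

f = lambda t, y: -y
exact = np.exp(-1.0)
for h in (0.1, 0.01, 0.001):
    ye, ne = euler(f, 1.0, 1.0, h)
    yr, nr = rk4(f, 1.0, 1.0, h)
    print(f"h={h}: Euler err={abs(ye-exact):.2e} ({ne} evals), "
          f"RK4 err={abs(yr-exact):.2e} ({nr} evals)")
```

Counting function evaluations is one simple proxy for cost; a fuller study would also measure wall-clock time and memory.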
The efficient storage of sparse matrices Jonathan Whiteley Computational Biology and Health Informatics B Many matrix applications involve sparse matrices, i.e. matrices that have a very large number of rows and columns, but only a small number of non-zero entries in each row.  On a given computer we may only store a finite number of matrix entries.  When working with sparse matrices we usually store only the non-zero entries and their locations.  There are several established techniques for storing sparse matrices.  These methods all have individual strengths and weaknesses - some allow efficient multiplication of sparse matrices by vectors, others allow entries to be modified effectively, while others allow the sparsity pattern to be modified dynamically.  The aim of this project is to investigate these sparse storage methods, and to highlight the strengths and weaknesses of each approach.
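A minimal sketch of one standard format, compressed sparse row (CSR), together with the operation it supports efficiently, matrix-vector multiplication (the example matrix is illustrative):

```python
# Sketch of CSR storage: three arrays record the non-zero values,
# their column indices, and where each row starts.
import numpy as np

def to_csr(dense):
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v); col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        y[i] = values[lo:hi] @ x[col_idx[lo:hi]]
    return y

dense = np.array([[4., 0., 0.],
                  [0., 0., 2.],
                  [1., 0., 3.]])
vals, cols, ptrs = to_csr(dense)
print(csr_matvec(vals, cols, ptrs, np.array([1., 2., 3.])))  # [4. 6. 10.]
```

The weakness that motivates the comparison in the project is visible in the layout: inserting a new non-zero into CSR means shifting every later entry, whereas coordinate or linked formats trade away fast row access to make such modifications cheap.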
Model Checking LTL on Markov Chains James Worrell Automated Verification B C

The goal of this project is to write a program that model checks a Markov chain against an LTL formula, i.e., calculates the probability that the formula is satisfied. The two main algorithmic tasks are to efficiently compile LTL formulas into automata and then to solve systems of linear equations arising from the product of the Markov chain and the automaton. An important aspect of this project is to make use of an approach that avoids determinising the automaton that represents the LTL formula. This project builds on material contained in the Logic and Proof and Models of Computation courses.

Reading: J-M. Couvreur, N. Saheb and G. Sutre. An optimal automata approach to LTL model checking of probabilistic systems. Proceedings of LPAR'03, LNCS 2850, Springer 2003.
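For the linear-equations half of the task, the sketch below computes reachability probabilities in a toy Markov chain; in the full project the same equations arise on the product of the chain with the automaton, with the target set supplied by its acceptance condition (the transition matrix here is illustrative):

```python
# Probabilities of reaching a target set T in a finite Markov chain:
# x_s = 1 for s in T, x_s = 0 for states that cannot reach T, and
# x_s = sum_t P(s,t) x_t otherwise -- a linear system.
import numpy as np

P = np.array([[0.5, 0.3, 0.2, 0.0],
              [0.0, 0.2, 0.0, 0.8],
              [0.0, 0.0, 1.0, 0.0],    # absorbing, not target
              [0.0, 0.0, 0.0, 1.0]])   # absorbing target
target = {3}

# States that can reach the target: a simple graph fixpoint.
can_reach, changed = set(target), True
while changed:
    changed = False
    for s in range(len(P)):
        if s not in can_reach and any(P[s, t] > 0 for t in can_reach):
            can_reach.add(s); changed = True

x = np.zeros(len(P))
unknown = sorted(can_reach - target)     # solve only for these states
A = np.eye(len(unknown)) - P[np.ix_(unknown, unknown)]
b = P[np.ix_(unknown, sorted(target))].sum(axis=1)
x[unknown] = np.linalg.solve(A, b)
x[list(target)] = 1.0
print(x)   # [0.6, 1.0, 0.0, 1.0]
```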

Topics in Linear Dynamical Systems James Worrell Automated Verification C MSc A linear dynamical system is a discrete- or continuous-time system whose dynamics is given by a linear function of the current state. Examples include Markov chains, linear recurrence sequences (such as the Fibonacci sequence), and linear differential equations. This project involves investigating the decidability and complexity of various reachability problems for linear dynamical systems. It would suit a mathematically oriented student. Linear algebra is essential, and number theory or complexity theory is desirable. A relevant paper is Ventsislav Chonev, Joël Ouaknine, James Worrell: The orbit problem in higher dimensions. STOC 2013: 941-950.
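As a concrete toy instance of these objects (values illustrative): the Fibonacci sequence is the orbit of its companion matrix, and a typical reachability question asks whether the orbit ever meets a given hyperplane. The sketch below checks this only up to a bounded horizon; deciding the unbounded case is exactly the kind of hard decision problem the project studies:

```python
# A linear recurrence sequence as the orbit of a linear map, with a
# bounded-horizon test for hitting a hyperplane c.x = 0.
import numpy as np

M = np.array([[1, 1], [1, 0]])      # companion matrix of Fibonacci
x = np.array([1, 0])                # state [F(n+1), F(n)], starting at n=0
c = np.array([1, -1])               # hyperplane F(n+1) = F(n)

for n in range(10):
    if c @ x == 0:
        print("orbit hits the hyperplane at step", n)   # n = 1
    x = M @ x
```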
Analysing user-algorithm interaction Jun Zhao Human Centered Computing MSc

"Members of the Human Centred Computing Group are currently involved in the UnBias research project ( http://unbias.wp.horizon.ac.uk/). This investigates the user experience of algorithm-driven internet platforms. As part of this project we have been conducting observational data collection in which pairs of volunteers are video recorded whilst browsing online. The on screen activity is also captured and the data are being analysed to identify instances of user-algorithm interaction; in particular instances in which an algorithmic outcome – such as the auto complete suggestion in a search bar or the filtering mechanisms on a recommendation platform – might shape the trajectory of browsing activity. We will shortly undertake a second wave of data collection in which users browse online but are placed in situations in which their usual experience of certain algorithmic processes will be disrupted. We are looking for a student to conduct analysis on the collected data and to then build on this analysis by designing and conducting their own disruption experiment. The student can set their own specific research questions for the analysis, and determine what particular kind of methodological approach they would like to undertake. :There are no prerequisites. The study will suit a student interested in developing  skills in social research methods and human computer interaction approaches. We will assist in securing ethical clearance and recruiting research participants for the data collection activities. 

Children, Well-Being, and Apps Jun Zhao Human Centered Computing MSc Description: Tablet computers are becoming the primary means for young children to go online. However, few studies have examined how young children perceive and cope with personal data privacy during their interactions with these mobile technologies. Our current research has shown that young children and their parents largely have little awareness of personal data collection by mobile apps. Parents' primary concerns are misplaced, and remain focused on familiar risks, like content appropriateness or in-app purchases. As a result, children are regularly exposed to personal data risks and have few skills to recognise or cope with them. In the KOALA project, we will explore different ways to mitigate these risks, including recognising and detecting apps that are designed for children in general app stores, and analysing the personal data collection practices of existing mobile apps from leading kids' app stores.
Topics in Algorithms, Complexity, and Combinatorial Optimisation Standa Živný Algorithms and Complexity Theory C MSc

Prof Živný is willing to supervise in the area of algorithms, complexity, and combinatorial optimisation; in particular, on problems related to convex relaxations (linear and semidefinite programming relaxations), submodular functions, and the algorithms for and complexity of homomorphism problems and Constraint Satisfaction Problems. Examples of supervised projects involve extensions of min-cuts in graphs, analysis of randomised algorithms for graph and hypergraph colourings, and sparsification of graphs and hypergraphs. The projects would suit mathematically oriented students with an interest in rigorous analysis of algorithms and applications of combinatorics and probabilistic methods to computer science.