
Undergraduate student projects

List of projects

Project Supervisors Parts Description
Aggregation of Photovoltaic Panels Alessandro Abate C

The increased relevance of renewable energy sources has modified the behaviour of the electrical grid. Some renewable energy sources affect the network in a distributed manner: whilst each unit has little influence, a large population can have a significant impact on the global network, particularly in the case of synchronised behaviour. This work investigates the behaviour of a large, heterogeneous population of photovoltaic panels connected to the grid. We employ Markov models to represent the aggregated behaviour of the population, while the rest of the network (and its associated consumption) is modelled as a single equivalent generator, accounting for both inertia and frequency regulation. Analysis and simulations of the model show that it is a realistic abstraction, and quantitatively indicate that heterogeneity is necessary to enable the overall network to function in safe conditions and to avoid load shedding. This project will provide extensions of this recent research. In collaboration with an industrial partner.

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking

Analysis and verification of stochastic hybrid systems Alessandro Abate C

Stochastic Hybrid Systems (SHS) are dynamical models that are employed to characterize the probabilistic evolution of systems with interleaved and interacting continuous and discrete components.

Formal analysis, verification, and optimal control of SHS models represent relevant goals because of their theoretical generality and for their applicability to a wealth of studies in the Sciences and in Engineering.

In a number of practical instances the presence of a discrete number of continuously operating modes (e.g., in fault-tolerant industrial systems), the effect of uncertainty (e.g., in safety-critical air-traffic systems), or both occurrences (e.g., in models of biological entities) advocate the use of a mathematical framework, such as that of SHS, which is structurally predisposed to model such heterogeneous systems.

In this project, we plan to investigate and develop new analysis and verification techniques (e.g., based on abstractions) that are directly applicable to general SHS models, while being computationally scalable.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification

Automated verification of complex systems in the energy sector Alessandro Abate C

Smart microgrids are small-scale versions of centralized electricity systems, which locally generate, distribute, and regulate the flow of electricity to consumers. Among other advantages, microgrids have shown positive effects over the reliability of distribution networks.

These systems present heterogeneity and complexity arising from (1) local and volatile renewable generation, and (2) the presence of nonlinear dynamics over both continuous and discrete variables. These factors call for the development of proper quantitative models. This framework provides the opportunity to employ formal methods to verify properties of the microgrid. In particular, the goal of the project is to focus on energy production via renewables, such as photovoltaic panels.

The project can benefit from a paid visit/internship to industrial partners.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification, whereas no specific knowledge of smart grids is needed.

Development of software for the verification of MPL models Alessandro Abate B 2017-18  C

This project aims to enhance the software toolbox VeriSiMPL ("very simple"), which has been developed to enable the abstraction of Max-Plus-Linear (MPL) models. MPL models are specified in MATLAB, and abstracted to Labeled Transition Systems (LTS). The LTS abstraction is formally related to its MPL counterpart via a (bi)simulation relation. The abstraction procedure runs in MATLAB and leverages sparse representations, fast manipulations based on vector calculus, and optimized data structures such as Difference-Bound Matrices. The LTS can be pictorially represented via the Graphviz tool and exported to the PROMELA language. This enables the verification of MPL models against temporal specifications within the SPIN model checker.
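For readers unfamiliar with MPL models, the following is a minimal sketch of their dynamics, x(k+1) = A ⊗ x(k), where the max-plus product replaces addition by max and multiplication by +. It is written in Python purely for illustration (VeriSiMPL itself runs in MATLAB); the function names and the 2x2 matrix are hypothetical and are not taken from the toolbox.

```python
import math

NEG_INF = -math.inf  # the max-plus "zero" element: it never wins a max


def maxplus_matvec(A, x):
    """Max-plus product: result[i] = max over j of (A[i][j] + x[j])."""
    return [max(a_ij + x_j for a_ij, x_j in zip(row, x)) for row in A]


def simulate(A, x0, steps):
    """Iterate x(k+1) = A (x) x(k) and return the list [x(0), ..., x(steps)]."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(maxplus_matvec(A, trajectory[-1]))
    return trajectory


# Hypothetical example: entries can be read as processing/transport delays.
A = [[2, 5],
     [3, 3]]
trajectory = simulate(A, [0, 0], 3)
print(trajectory)
```

Each state component can be read as the time of the k-th occurrence of an event, which is why such models suit timing and synchronisation problems.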

Courses: Computer-Aided Formal Verification, Numerical Solution of Differential Equations

Prerequisites: Some familiarity with dynamical systems, working knowledge of MATLAB and C

Innovative Sensing and Actuation for Smart Buildings Alessandro Abate C

Sensorisation and actuation in smart buildings and the development of smart HVAC (heating, ventilation and air-conditioning) control strategies for energy management allow for optimised energy usage, leading to a reduction in power consumption or to optimised demand/response strategies that are key in a rather volatile market. This can further lead to optimised maintenance for the building devices. Of course, the sensorisation of buildings leads to heavy requirements on the overall infrastructure: we are interested in devising new approaches towards the concept of using "humans as sensors". Further, we plan to investigate approaches to perform meta-sensing, namely to extrapolate the knowledge from physical sensors towards that of virtual elements (as an example, to infer the current building occupancy from correlated measurements of temperature and humidity dynamics). On the actuation side, we are likewise interested in engineering non-invasive, minimalistic solutions, which are robust to uncertainty and performance-certified. The plan for this project is to make the first steps in this direction, based on recent results in the literature. The project can benefit from a visit to Honeywell Labs (Prague).

Courses: Computer-Aided Formal Verification

Prerequisites: Some familiarity with dynamical systems

Model learning and verification Alessandro Abate, Daniel Kroening C

This project will explore connections between techniques from machine learning and successful approaches from formal verification. The project has two sides, a theoretical one and a more practical one: it will be up to the student to emphasise either of the two depending on his/her background and/or interests. The theoretical part will develop existing research, for instance in one of the following two inter-disciplinary domain pairs: learning & repair, or reachability analysis & Bayesian inference. On the other hand, a more practical project will apply the above theoretical connections to a simple model setup in the area of robotics and autonomy.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Precise simulations and analysis of aggregated probabilistic models Alessandro Abate C

This project shall investigate a rich research line, recently pursued by a few within the Department of CS, looking at the development of quantitative abstractions of Markovian models. Abstractions come in the form of lumped, aggregated models, which are beneficial in being easier to simulate or to analyse. Key to the novelty of this work, the proposed abstractions are quantitative in that precise error bounds with the original model can be established. As such, whatever can be shown over the abstract model, can be as well formally discussed over the original one.

This project, grounded on existing literature, will pursue (depending on the student's interests) extensions of this recent work, or its implementation as a software tool.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Safe Reinforcement Learning Alessandro Abate C

Reinforcement Learning (RL) is a known architecture for synthesising policies for Markov Decision Processes (MDP). We work on extending this paradigm to the synthesis of ‘safe policies’, or more generally of policies such that a linear time property is satisfied. We convert the property into an automaton, then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP, according to the accepting conditions of the automaton. With this reward function, RL synthesises a policy that satisfies the property: as such, the policy synthesis procedure is ‘constrained’ by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP. We evaluate the performance of the algorithm on numerous numerical examples. This project will provide extensions of these novel and recent results.
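The product construction can be sketched as follows, under strong simplifying assumptions: the property is given as a deterministic finite automaton (the research itself may use a richer automaton class), the MDP model is fully known, and plain value iteration stands in for the RL procedure. All names, the toy MDP and the toy automaton are hypothetical.

```python
from itertools import product as cartesian


def build_product(mdp, labels, delta, accepting):
    """mdp[s][a] is a list of (prob, s'); delta[(q, label)] is the next automaton state."""
    automaton_states = {q for (q, _) in delta} | set(delta.values())
    prod, reward = {}, {}
    for s, q in cartesian(mdp, automaton_states):
        prod[(s, q)] = {
            action: [(p, (s2, delta[(q, labels[s2])])) for p, s2 in outcomes]
            for action, outcomes in mdp[s].items()
        }
        # Reward 1 on product states whose automaton component is accepting.
        reward[(s, q)] = 1.0 if q in accepting else 0.0
    return prod, reward


def max_satisfaction_probability(prod, reward, sweeps=100):
    """Value iteration for the maximum probability of reaching an accepting state."""
    value = dict(reward)
    for _ in range(sweeps):
        for sq, actions in prod.items():
            if reward[sq] == 1.0:
                continue  # accepting states keep value 1
            value[sq] = max(sum(p * value[nxt] for p, nxt in outs)
                            for outs in actions.values())
    return value


# Toy MDP: from "start", action "go" reaches "goal" w.p. 0.8 and "trap" w.p. 0.2.
mdp = {
    "start": {"go": [(0.8, "goal"), (0.2, "trap")], "stay": [(1.0, "start")]},
    "goal": {"stay": [(1.0, "goal")]},
    "trap": {"stay": [(1.0, "trap")]},
}
labels = {"start": "none", "goal": "goal", "trap": "trap"}
# Toy automaton for "reach goal without ever visiting trap".
delta = {("q0", "none"): "q0", ("q0", "goal"): "acc", ("q0", "trap"): "rej"}
delta.update({(q, l): q for q in ("acc", "rej") for l in ("none", "goal", "trap")})

prod, reward = build_product(mdp, labels, delta, {"acc"})
value = max_satisfaction_probability(prod, reward)
print(value[("start", "q0")])
```

In this toy instance the best policy is to play "go" immediately, so the value at the initial product state equals the probability of reaching the goal before the trap.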

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Software development for abstractions of stochastic hybrid systems Alessandro Abate C

Stochastic hybrid systems (SHS) are dynamical models for the interaction of continuous and discrete states. The probabilistic evolution of the continuous and discrete parts of the system is coupled, which makes analysis and verification of such systems compelling. Among specifications of SHS, probabilistic invariance and reach-avoid have received quite some attention recently. Numerical methods have been developed to compute the probabilities associated with these two specifications. These methods are mainly based on state-space partitioning and on the abstraction of SHS by Markov chains, which are optimal in the sense of reducing the abstraction error with a minimum number of Markov states.

The goal of the project is to combine the codes that have been developed for these methods. The student should also design a user-friendly interface (for the choice of dynamical equations, parameters, methods, etc.).
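To make the partition-and-abstract idea concrete, here is a minimal sketch for a much simpler setting than a general SHS: a scalar linear system x' = a*x + w with Gaussian noise w, a safe interval, and a uniform grid (not the optimised partitioning mentioned above). Each cell is represented by its centre, transition probabilities between cells come from the Gaussian CDF, and the invariance probability is obtained by a backward recursion. All parameter values and function names are assumptions made for illustration.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def abstract_chain(a, std, low, high, n):
    """Markov chain over n equal cells of [low, high] for x' = a*x + noise.

    Rows may sum to less than 1: the missing mass is the probability of
    leaving the safe interval in one step.
    """
    width = (high - low) / n
    edges = [low + i * width for i in range(n + 1)]
    centres = [low + (i + 0.5) * width for i in range(n)]
    P = [[normal_cdf(edges[j + 1], a * c, std) - normal_cdf(edges[j], a * c, std)
          for j in range(n)]
         for c in centres]
    return centres, P


def invariance_probability(P, horizon):
    """Probability of staying inside the safe set for `horizon` steps, per start cell."""
    n = len(P)
    value = [1.0] * n
    for _ in range(horizon):
        value = [sum(P[i][j] * value[j] for j in range(n)) for i in range(n)]
    return value


# Hypothetical parameters: a = 0.9, noise std 0.3, safe set [-1, 1], 20 cells.
centres, P = abstract_chain(0.9, 0.3, -1.0, 1.0, 20)
short_horizon = invariance_probability(P, 1)
long_horizon = invariance_probability(P, 10)
print(long_horizon[len(long_horizon) // 2])
```

Refining the grid trades computation for accuracy; quantifying that trade-off is what the abstraction error mentioned above is about.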

Courses: Probabilistic Model Checking, Probability and Computing, Numerical Solution of Differential Equations

Prerequisites: Some familiarity with stochastic processes, working knowledge of MATLAB and C

Pebble games, monads and comonads in classical, probabilistic and quantum computation Samson Abramsky C Pebble games are an important and widely used tool in logic, algorithms and complexity, constraint satisfaction and database theory. The idea is that we can explore a pair of structures, e.g. graphs, by placing up to k pebbles on them, so we have a window of size at most k on the two structures. If we can always keep these pebbles in sync so that the two k-sized windows look the same (are isomorphic) then we say that Duplicator has a winning strategy for the k-pebble game. This gives a resource-bounded notion of approximation to graphs and other structures which has a wide range of applications. Monads and comonads are widely used in functional programming, e.g. in Haskell, and come originally from category theory. It turns out that pebble games, and similar notions of approximate or local views on data, can be captured elegantly by comonads, and this gives a powerful language for many central notions in constraints, databases and descriptive complexity. For example, k-consistency can be captured in these terms; another important example is treewidth, a key parameter which is very widely used to give “islands of tractability” in otherwise hard problems. Finally, monads can be used to give various notions of approximate or non-classical solutions to computational problems. These include probabilistic and quantum solutions. For example, there are quantum versions of constraint systems and games which admit quantum solutions when there are no classical solutions, thus demonstrating a “quantum advantage”. Monads and comonads can be used in combination, making use of certain “distributive laws”. The aim of this project is to explore a number of aspects of these ideas. 
Depending on the interests and background of the student, different aspects may be emphasised, from functional programming, category theory, logic, algorithms and descriptive complexity, probabilistic and quantum computation. Some specific directions include: 1. Developing Haskell code for the k-pebbling comonad and various non-classical monads, and using this to give a computational tool-box for various constructions in finite model theory and probabilistic or quantum computation. 2. Developing the category-theoretic ideas involved in combining monads and comonads, and studying some examples. 3. Using the language of comonads to capture other important combinatorial invariants such as tree-depth. 4. Developing the connections between category theory, finite model theory and descriptive complexity. Some background reading: 1. Leonid Libkin, Elements of finite model theory. (Background on pebble games and the connection with logic and complexity). 2. The pebbling comonad in finite model theory. S. Abramsky, A. Dawar and P. Wang. (Technical report describing the basic ideas which can serve as a starting point.)
Projects for Samson Abramsky Samson Abramsky C Samson Abramsky is happy to supervise projects in the following areas: - study of nonlocality and contextuality in quantum information and beyond - sheaf theory and contextual semantics - applications of coalgebra in game theory and economics - intensional forms of recursion in computation theory and applications Please contact him to discuss any of these in more detail
Sheaf theoretic semantics for vector space models of natural language Samson Abramsky B 2017-18  C

Contextuality is a fundamental feature of quantum physical theories and one that distinguishes them from classical mechanics. In a recent paper by Abramsky and Brandenburger, the categorical notion of sheaves has been used to formalize contextuality. This has resulted in generalizing and extending contextuality to other theories which share some structural properties with quantum mechanics. A consequence of this type of modeling is a succinct logical axiomatization of properties such as non-local correlations and, as a result, of classical no-go theorems such as those of Bell and Kochen-Specker. Like quantum mechanics, natural language has contextual features; these have been the subject of much study in distributional models of meaning, which originated in the work of Firth and were later advanced by Schutze. These models are based on vector spaces over the semiring of positive reals with an inner product operation. The vectors represent meanings of words, based on the contexts in which they often appear, and the inner product measures degrees of word synonymy. Despite their success in modeling word meaning, vector spaces suffer from two major shortcomings: firstly, they do not immediately scale up to sentences, and secondly, they cannot, at least not in an intuitive way, provide semantics for logical words such as 'and', 'or', 'not'. Recent work in our group has developed a compositional distributional model of meaning in natural language, which lifts vector space meaning to phrases and sentences. This has already led to some very promising experimental results. However, this approach does not deal so well with the logical words.

The goal of this project is to use sheaf theoretic models to provide both a contextual and logical semantics for natural language.  We believe that sheaves provide a generalization of the logical Montague semantics of natural language which did very well in modeling logical connectives, but did not account for contextuality. The project will also aim to combine these ideas with those of the distributional approach, leading to an approach which combines the advantages of Montague-style and vector-space semantics.

Prerequisites: The interested student should have taken the category theory and computational linguistics courses, or be familiar with the contents of these.

Classical simulation of quantum systems Jonathan Barrett C

Description

For many information theoretic tasks, an advantage can be gained if quantum systems are used as the basic carrier of information, rather than classical variables. For example, it may be the case that with quantum systems, fewer resources are required. The project will investigate the classical simulation of quantum systems in simple scenarios in which quantum systems are communicated from one party to another, or in which quantum systems are measured to produce correlated outcomes. The aim is to compare the resources required by the classical simulation with those required when quantum systems are used.

Prerequisites

Linear algebra. A student taking this project should also be taking the Quantum Computer Science course. Some extra reading to cover the basic formalism of quantum theory would be an advantage.

Device-independent quantum cryptography Jonathan Barrett C

Description

One of the most successful applications of quantum information science is quantum key distribution, which enables separated parties to send secret messages, with security guaranteed by the laws of quantum theory. The mysterious phenomenon of “quantum nonlocality”, wherein two quantum systems appear to influence one another even though they are separated in space, can be used to design a particularly strong kind of key distribution protocol. The idea is that the honest users do not need to trust that their quantum devices are behaving as advertised, or even that quantum theory is correct. The project will explore the relationship between different kinds of nonlocality and the possibilities for secure communication.
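One standard way to quantify nonlocality, used here only as an illustration (the project description does not single it out), is the CHSH expression S = E(0,0) + E(0,1) + E(1,0) - E(1,1), where E(x,y) is the correlation of the two parties' ±1 outcomes for measurement settings x and y. The sketch below enumerates all deterministic classical strategies and compares them with a quantum strategy. It assumes a maximally entangled pair measured in a common plane, for which the correlation is cos(alpha_x - beta_y); the specific angles are the usual textbook choice.

```python
import itertools
import math


def chsh(E):
    """CHSH expression for a correlation function E(x, y), with x, y in {0, 1}."""
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)


def best_classical():
    """Maximum CHSH value over all deterministic local strategies."""
    best = -math.inf
    for a0, a1, b0, b1 in itertools.product((-1, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        best = max(best, chsh(lambda x, y: a[x] * b[y]))
    return best


def quantum_value():
    """CHSH value for a maximally entangled pair, assuming E = cos(alpha_x - beta_y)."""
    alpha = (0.0, math.pi / 2)
    beta = (math.pi / 4, -math.pi / 4)
    return chsh(lambda x, y: math.cos(alpha[x] - beta[y]))


print(best_classical(), quantum_value())
```

Shared randomness cannot help the classical parties, because any randomised local strategy is a mixture of the deterministic ones enumerated here. The gap between the two values is the kind of resource that device-independent protocols rely on.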

Prerequisites

Linear algebra. A student taking this project should also be taking the Quantum Computer Science course. Some extra reading to cover the basic formalism of quantum theory would be an advantage.

Optimization of Web Query Plans Michael Benedikt

This project will look at how to find the best plan for a query, given a collection of data sources with access restrictions.

We will look at logic-based methods for analyzing query plans, taking into account integrity constraints that may exist on the data.

History of Science: Aligning the world of scholarship data from the Royal Society Reuben Binns B 2017-18  C One recent ambitious project from the Royal Society aims to digitise all transactions of the Royal Society over the years. As the earliest scientific journal publication in the world, this resource is an invaluable asset to the history of human scientific knowledge. However, given the historical span and sheer volume of this information, being able to uniquely and accurately identify each contributing fellow and their metadata is a real challenge. Furthermore, not all the contributors are Royal Society Fellows, and therefore their unique identity and metadata is not retained in the Royal Society knowledge. Both issues make the digitisation project a real challenge, to achieve high accuracy and coverage. This UG project will work together with the Royal Society, to design a novel algorithm that uses multiple features to align scholarship entities from different datasets, in order to enhance the quality of existing datasets and the coverage of the scholarship knowledgebase from the Royal Society
Mobile App X-Ray Specs Reuben Binns B 2017-18  C Smartphone applications often collect personal data and share it with various first and third party entities, for purposes like advertising and analytics. Using dynamic traffic analysis techniques, we have mapped many of the third-party data flows, including particular data types, from popular applications. The aim of this project would be to extend this work by deploying a mobile application that automatically reveals to the user what kinds of data are sent to whom via the apps they installed on their device. Alternatively, the project could look into extending our existing traffic analysis framework in order to scale up the analysis and improve the accuracy and coverage of the tracker detection.
Simulation / Learning Tool for Process Scheduling Stephen Cameron C

In domains such as manufacturing there may be a large number of individual steps required to complete an overall task, with various constraints between the steps and finite availability of resources. For example, an aircraft may require hundreds of thousands of steps to build, with constraints like "we cannot mount the engines before the wings", and resources like the number of workers and pieces of key machinery. Scheduling software exists that takes the lists of steps, constraints, and resources and generates feasible schedules; that is, produces lists of which steps should be performed at what times. Given the complexity of the problem it is impractical to generate optimal schedules, but in general close to optimal schedules ('good schedules') can be generated in a reasonable time. However the choice of which good schedule to use is often determined by factors that are not known early in the process or are difficult to quantify, such as the layout of a factory or the temporary loss of a worker due to illness. The goal of this project is to take an existing scheduling program and a class of real-life industrial problems and to develop a simulation program that could assist a process engineer to visualise the similarities and differences between a small number of good schedules, and hence to interactively adjust the scheduling parameters in order to improve the schedule. For example the set of feasible schedules can be displayed as a graph with the steps as graph nodes, and a visualisation might show the progress of the schedule as annotations on the graph nodes, whilst providing a set of on-screen controls to adjust the scheduling parameters. The scheduling program itself is written in C++, but this does not constrain the simulation program to be written in a particular language.
The skill-set required of a student taking this project would then be a mixture of two-dimensional graphics, plus a desire to find out more about two-dimensional animation and graphical user interface design. There is also the option of applying techniques from machine learning in order to automatically improve the schedule quality. The scheduling scenario to be used as an example in this project will be provided by an Oxford-based company who are interested in potentially applying these techniques in the future. On-line videos: https://www.promodel.com/solutionscafe/webinars/PCS_Quickstart/PCS_Quickstart.html gives a very simple example of the style of animation envisioned for this project, although we would expect the scheduling graph to be produced by the scheduling system rather than by hand as in this video. https://uk.mathworks.com/videos/simevents-for-operations-research-118566.html shows a MATLAB extension (Simulink) on a more realistic example; this is without any animation but includes the use of machine learning in the form of a genetic algorithm.

Visualization for Process Scheduling Stephen Cameron C

In domains such as manufacturing there may be a large number of individual steps required to complete an overall task, with various constraints between the steps and finite availability of resources. For example, an aircraft may require hundreds of thousands of steps to build, with constraints like "we cannot mount the engines before the wings", and resources like the number of workers and pieces of key machinery. Scheduling software exists that takes the lists of steps, constraints, and resources and generates feasible schedules; that is, produces lists of which steps should be performed at what times. Given the complexity of the problem it is impractical to generate optimal schedules, but in general close to optimal schedules ('good schedules') can be generated in a reasonable time. However the choice of which good schedule to use is often determined by factors that are not known early in the process or are difficult to quantify, such as the layout of a factory or the temporary loss of a worker due to illness. The goal of this project is to take an existing scheduling program and a class of real-life industrial problems and to develop a visualisation program that could help an end-user picture the running of a particular schedule as a three-dimensional animation. The scheduling program itself is written in C++, but this does not constrain the visualisation program to be written in a particular language. The skill-set required of a student taking this project would primarily be in three-dimensional animation, either using their own code or by bolting onto an existing animation tool such as Blender. It should also be possible for the end-user to easily vary the context of the visualisation (i.e., the placement of the equipment in the three-dimensional world and its relationship to the output of the scheduling program).
The scheduling scenario to be used as an example in this project will be provided by an Oxford-based company who are interested in potentially applying these techniques in the future. On-line video: An on-line example of such a visualisation is at https://www.youtube.com/watch?v=YWuM43yP09A, although this level of detail is not expected in the student project; rather we are seeking a proof of concept that integrates the scheduling program output directly.
Information processing and conservation laws in general probabilistic theories Giulio Chiribella B 2017-18  C Project description and suitability for different students considered. See the appendix. - Areas in which I could supervise student projects in quantum information theory and quantum foundations. - General areas for student supervision: quantum information and foundations of quantum mechanics.
Quantum Computing and Quantum Information, Logic, Category Theory, Fundamental Physics Bob Coecke B 2017-18  C

Bob Coecke is willing to supervise projects in the following areas. Please feel free to contact him.

- Development and applications of the categorical quantum mechanics formalism and corresponding graphical languages

http://en.wikipedia.org/wiki/Categorical_quantum_mechanics

- Development of the quantomatic automated graphical reasoning software

https://sites.google.com/site/quantomatic/

- Theory development and empirical experiments for meaning in natural language processing

http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf

http://www.cs.ox.ac.uk/industry/content/IndustryNewsSummer2011.pdf

Software development: quantomatic, automated graphical reasoning tools Bob Coecke B 2017-18  C Recent graph-based formalisms for computation provide an abstract and symbolic way to represent and simulate quantum information processing. Manual manipulation of such graphs is slow and error prone. This project employs a formalism, based on monoidal categories, that supports mechanised reasoning with open-graphs. This gives a compositional account of graph rewriting that preserves the underlying categorical semantics.
Kinect interface development for cybersecurity visual analytics tool Sadie Creese B 2017-18  C

(Joint with Michael Goldsmith) Kinect interface development for cybersecurity visual analytics tool: using the open development environment for Kinect to develop an alternative HCI for the Oxford CyberVis tool http://www.cs.ox.ac.uk/projects/cybervis/ (focusing on navigation of the CyberVis environment). As CyberVis is primarily developed in Java, a good understanding of that language is essential in order to interface Kinect to CyberVis. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel human-computer interfaces for security are possible, depending on interest and inspiration.

 

 

Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs Sadie Creese B 2017-18  C

(Joint with Michael Goldsmith)

Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue.

Smartphone security Sadie Creese B 2017-18  C

(Joint with Michael Goldsmith)

Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); uses skills in formal specification, understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc, Concurrency, Concurrent Programming, Computer Security all possibly an advantage. Other projects within this domain possible, according to interest and inspiration.

 

Prerequisites:

Concurrency, Concurrent Programming, Computer Security all possibly an advantage.

Technology-layer social networks Sadie Creese C

(joint with Michael Goldsmith)

Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata - which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc.

Information Disclosure in Data Integration Bernardo Cuenca Grau, Michael Benedikt, Egor Kostylev C Data integration systems allow users to effectively access data sitting in multiple datasources (typically relational databases) by means of queries over a global schema. In practice, datasources often contain sensitive information that the data owners want to keep inaccessible to users. In a recent research paper, the project supervisors have formalized and studied the problem of determining whether a given data integration system discloses sensitive information to an attacker. The paper studies the computational properties of the relevant problems and also identifies situations in which practical implementations are feasible. The goal of the project is to design and implement practical algorithms for checking whether information disclosure can occur in a data integration setting. These algorithms would be applicable to the aforementioned situations for which practical implementations seem feasible. Prerequisites: Familiarity with Databases. The students would also benefit from taking the Knowledge Representation and Reasoning Course and/or Theory of Data and Knowledge Bases.
Voting for facility location Edith Elkind B 2017-18  C Description: suppose that people living in a given geographic area would like to decide where to place a certain facility (say, a library or a gas station). There are several possible locations, and each person prefers to have the facility as close to them as possible. However, the central planner, who will make the final decision, does not know the voters' location, and, moreover, for various reasons (such as privacy or the design of the voting machine), the voters cannot communicate their location. Instead, they communicate their ranking over the available locations, ranking them from the closest one to the one that is furthest away. The central planner then applies a fixed voting rule to these rankings. The quality of each location is determined by the sum of distances from it to all voters, or alternatively the maximum distance. A research question that has recently received a substantial amount of attention is whether classic voting rules tend to produce good-quality solutions in this setting. The goal of the project would be to empirically evaluate various voting rules with respect to this measure, both for single-winner rules and for multi-winner rules (where more than one facility can be opened). In addition to purely empirical work, there are interesting theoretical questions that one could explore, such as proving worst-case upper and lower bounds of the performance of various rules. Prerequisites: basic programming skills.
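The empirical evaluation described in the voting project can be sketched as below, under simplifying assumptions that are not part of the project description: voters and candidate locations lie on a line, ties are broken towards the lowest candidate index, only single-winner plurality and Borda are compared, and quality is the sum of distances. The instance is hand-picked for illustration and the function names are hypothetical.

```python
def ranking(voter, candidates):
    """Candidate indices ordered from closest to furthest for one voter."""
    return sorted(range(len(candidates)),
                  key=lambda c: (abs(voter - candidates[c]), c))


def plurality(rankings, m):
    """Winner is the candidate ranked first by the most voters."""
    scores = [0] * m
    for r in rankings:
        scores[r[0]] += 1
    return max(range(m), key=lambda c: (scores[c], -c))


def borda(rankings, m):
    """Winner maximises the Borda score (m-1 points for first place, down to 0)."""
    scores = [0] * m
    for r in rankings:
        for position, c in enumerate(r):
            scores[c] += m - 1 - position
    return max(range(m), key=lambda c: (scores[c], -c))


def social_cost(voters, location):
    """Sum of distances from all voters to a location."""
    return sum(abs(v - location) for v in voters)


def cost_ratio(voters, candidates, rule):
    """Cost of the rule's winner divided by the cost of the best candidate."""
    rankings = [ranking(v, candidates) for v in voters]
    winner = rule(rankings, len(candidates))
    best = min(social_cost(voters, c) for c in candidates)
    chosen = social_cost(voters, candidates[winner])
    if best == 0:
        return 1.0 if chosen == 0 else float("inf")
    return chosen / best


voters = [0.0, 0.0, 0.0, 4.0, 4.0, 5.0, 5.0]
candidates = [0.0, 4.0, 5.0]
print(cost_ratio(voters, candidates, plurality), cost_ratio(voters, candidates, borda))
```

In this instance plurality picks the location at 0 (cost 18) while the location at 4 is optimal (cost 14), and Borda picks the optimum. A real study would average such ratios over many randomly generated instances and consider the maximum-distance objective as well.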
Form Corpus & Benchmark Tim Furche B 2017-18  C (Supervisor C Schallhart) Web pages are the past since interactive web application interfaces have reshaped the online world. With all their feature richness, they enrich our personal online experience and provide some great new challenges for research. In particular, forms have become much more complex in assisting the user during the filling, e.g., with completion options, or through structuring the form filling process by dynamically enabling or hiding form elements. Such forms are a very interesting research topic, but their complexity has so far prevented the establishment of a corpus of modern forms to benchmark different tools dealing with forms automatically. This MSc project will fill this gap by building a corpus of such forms: based on a number of production sites from one or two domains, we will build our corpus of web interfaces, connected to a (shared) database. Not only will the future evaluations in the DIADEM project rely on this corpus, but we will also publish the corpus, promoting it as a common benchmark for the research community working on forms. Knowledge of Java, HTML, CSS, JavaScript, and web application development is required.
Form Filling & Probing Tim Furche B 2017-18  C (Joint with C Schallhart) Unearthing the knowledge hidden in queryable web sites requires a good understanding of the involved forms. As part of DIADEM, we are developing OPAL (Ontology based web Pattern Analysis with Logic), a tool to recognize forms belonging to a parameterizable application domain, such as the real estate or used car market. OPAL determines the meaning of individual form elements, e.g., it identifies the field for the minimum or maximum price or for some location. This MSc project will build upon OPAL to deal not only with static forms but also with sequences of interrelated forms, as in the case of a rough initial form followed by a refinement form, or in the case of forms showing some options only after filling some other parts. Over the course of this MSc project, we will develop a tool which invokes OPAL to analyze a given form, explores all available submission mechanisms on this form, analyzes the resulting pages for forms continuing the initial query, and combines the outcome of all found forms into a single interaction description. Knowledge of Java, HTML, and CSS is required; prior experience in logic programming would be a strong plus.
Benchmarks for Bayesian Deep Learning: Astrophysics Yarin Gal B 2017-18  C Bayesian deep learning (BDL) is a field of Machine Learning where we develop tools that can reason about their confidence in their predictions. A main challenge in BDL is comparing different tools to each other, with common benchmarks being much needed in the field. In this project we will develop a set of tools to evaluate Bayesian deep learning techniques, reproduce common techniques in BDL, and evaluate them with the developed tools. The tools we will develop will rely on downstream tasks that have made use of BDL in real-world applications such as parameter estimation in Strong Gravitational Lensing with neural networks. Prerequisites: only suitable for someone who has done Probability Theory, has worked in Machine Learning in the past, and has strong programming skills (Python).
Benchmarks for Bayesian Deep Learning: Diabetes Diagnosis Yarin Gal B 2017-18  C Bayesian deep learning (BDL) is a field of Machine Learning where we develop tools that can reason about their confidence in their predictions. A main challenge in BDL is comparing different tools to each other, with common benchmarks being much needed in the field. In this project we will develop a set of tools to evaluate Bayesian deep learning techniques, reproduce common techniques in BDL, and evaluate them with the developed tools. The tools we will develop will rely on downstream tasks that have made use of BDL in real-world applications such as detecting diabetic retinopathy from fundus photos and referring the most uncertain decisions for further inspection. Prerequisites: only suitable for someone who has done Probability Theory, has worked in Machine Learning in the past, and has strong programming skills (Python).
Small Data Challenge in Reinforcement Learning Yarin Gal B 2017-18  C Reinforcement learning (RL) algorithms often require large amounts of data for training, data which is often collected from simulations of experiments of robotics systems. The requirement for large amounts of data forms a major hurdle in using RL algorithms for tasks in robotics though, where each real-world experiment would cost time and potential damage to the robot. In this project we will develop a mock "Challenge" similar to Kaggle challenges. In this challenge we will restrict the amount of data a user can query the system at each point in time, and try to implement simple RL baselines under this constraint. We will inspect the challenge definition and try to improve it. Prerequisites: only suitable for someone who has worked in Machine Learning in the past, is familiar with Reinforcement Learning, and has strong programming skills (Python).
Data Science and Machine Learning Techniques for Model Parameterisation in Biomedical and Environmental Applications David Gavaghan, Martin Robinson, Michael Clerx, Sanmitra Ghosh C Time series data arise as the output of a wide range of scientific experiments and clinical monitoring techniques. Typically the system under study will either be undergoing time varying changes which can be recorded, or the system will have a time varying signal as input and the response signal will be recorded. Familiar everyday examples of the former include ECG and EEG measurements (which record the electrical activity in the heart or brain as a function of time), whilst examples of the latter occur across scientific research from cardiac cell modelling to battery testing. Such recordings contain valuable information about the underlying system under study, and gaining insight into the behaviour of that system typically involves building a mathematical or computational model of that system which will have embedded within it key parameters governing system behaviour. The problem that we are interested in is inferring the values of these key parameters through applications of techniques from machine learning and data science. Currently used methods include Bayesian inference (Markov Chain Monte Carlo (MCMC), Approximate Bayesian Computation (ABC)), and non-linear optimisation techniques, although we are also exploring the use of other techniques such as probabilistic programming and Bayesian deep learning. We are also interested in developing techniques that will speed up these algorithms, including parallelisation and the use of Gaussian Process emulators of the underlying models. Application domains of current interest include modelling of the cardiac cell (for assessing the toxicity of new drugs), understanding how biological enzymes work (for application in developing novel fuel cells), as well as a range of basic science problems. Prerequisites: some knowledge of Python
Cake-cutting with low envy Paul Goldberg C

Description: I-cut-you-choose is the classical way for two people to share a divisible good. For three people, there exists a sequence of operations using 5 cuts that is also envy-free, but for 4 or more people, it is unknown whether you can share in an envy-free manner using a finite number of cuts. (This is with respect to a well-known class of procedures that can be represented using a tree whose nodes are labelled with basic "cut" and "choose" operations.) The general idea of this project is to generate and test large numbers of potential cake-cutting procedures, and measure the "extent of envy" in the case where they are not envy-free. (See Wikipedia's page on "cake-cutting problem".) It is of interest to find out the amount of envy that must exist in relatively simple procedures.

Prerequisites: competence and enthusiasm for program design and implementation; mathematical analysis and proofs.
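
Measuring the "extent of envy" of one candidate procedure might look like the sketch below. It makes several assumptions that are not part of the project description: the cake is the interval [0, 1], each agent's valuation is a piecewise-constant density over equal-width segments, and the procedure tested is a deliberately naive one for three agents (agent 0 cuts three pieces she values equally, agents 1 and 2 pick in turn). All names are hypothetical.

```python
def value(density, a, b):
    """Value of [a, b] under a piecewise-constant density, given as the values
    of equal-width segments of [0, 1] (the values sum to 1)."""
    k = len(density)
    total = 0.0
    for i, w in enumerate(density):
        lo, hi = i / k, (i + 1) / k
        overlap = max(0.0, min(b, hi) - max(a, lo))
        total += w * overlap * k  # fraction of segment i covered, times its value
    return total

def cut_point(density, start, target):
    """Point x >= start such that [start, x] is worth `target` (bisection)."""
    lo, hi = start, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if value(density, start, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

def cut_then_choose_in_turn(densities):
    """Agent 0 cuts three equal pieces (by her own valuation); agents 1 and 2
    pick their favourite remaining piece in turn; agent 0 takes what is left."""
    cutter = densities[0]
    x1 = cut_point(cutter, 0.0, 1 / 3)
    x2 = cut_point(cutter, x1, 1 / 3)
    remaining = [(0.0, x1), (x1, x2), (x2, 1.0)]
    allocation = {}
    for agent in (1, 2):
        best = max(remaining, key=lambda p: value(densities[agent], *p))
        allocation[agent] = best
        remaining.remove(best)
    allocation[0] = remaining[0]
    return allocation

def max_envy(densities, allocation):
    """Largest amount by which any agent values another's piece above her own."""
    envy = 0.0
    for i, d in enumerate(densities):
        own = value(d, *allocation[i])
        for piece in allocation.values():
            envy = max(envy, value(d, *piece) - own)
    return envy
```

A generate-and-test loop would replace `cut_then_choose_in_turn` with procedures drawn from the tree-structured class mentioned in the description and record `max_envy` over many sampled valuations.
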

Computing prices in a generalisation of the Product-Mix Auction Paul Goldberg C

The Product-Mix Auction was devised by Klemperer in 2008 for the purpose of providing liquidity to commercial banks during the financial crisis; it was used for a number of years by the Bank of England. See the following link:

https://www.nuffield.ox.ac.uk/economics/papers/2009/w6/BoeTarp28_7_09.pdf

It uses bids that represent the amounts a buyer is willing to pay for certain bundles of goods, and the "correct" prices (causing the available supply of goods to be released to the buyers) can be found by solving a linear program. The project investigates an extension to the original auction that allows buyers more flexibility to express their requirements. This extension, allowing "negative bids" to be made, allows a buyer to express any "strong substitutes" demand function. In this extension, the search for prices has the form of a submodular minimisation problem, and the project envisages applying algorithms such as Fujishige-Wolfe to this challenge. We envisage applying algorithms to simulated data, and obtaining experimental results about their runtime complexity. We also envisage testing local-search heuristics.

Construction of finite automata from examples Paul Goldberg B 2017-18  C

Description: The aim of the project is to allow a user to construct a finite automaton (or alternatively, a regular expression) by providing examples of strings that ought to be accepted by it, in addition to examples that ought not to be accepted. An initial approach would be to test simple finite automata against the strings provided by the user; more sophisticated approaches could be tried out subsequently. One possibility would be to implement an algorithm proposed in a well-known paper by Angluin, "Learning regular sets from queries and counterexamples".

Prerequisites: competence and enthusiasm for program design and implementation; familiarity with finite automata, context-free languages etc.; mathematical proofs.
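
The "initial approach" of testing simple finite automata against the user's strings can be sketched as a brute-force search. The code below is only an illustration under stated assumptions: state 0 is always the start state, the alphabet is fixed in advance, and the function names are hypothetical. It is not Angluin's algorithm, which learns from queries rather than by enumeration.

```python
from itertools import product

def accepts(delta, accepting, word, alphabet):
    """Run a DFA from start state 0; delta[state][symbol_index] is the next state."""
    state = 0
    for ch in word:
        state = delta[state][alphabet.index(ch)]
    return state in accepting

def smallest_consistent_dfa(positive, negative, alphabet="ab", max_states=3):
    """Return (delta, accepting) for the first DFA found with the fewest states
    that accepts every positive example and rejects every negative one.
    Returns None if no DFA with at most max_states states is consistent."""
    for n in range(1, max_states + 1):
        # One row per state: the successor state for each alphabet symbol.
        rows = list(product(range(n), repeat=len(alphabet)))
        for delta in product(rows, repeat=n):
            for bits in product([False, True], repeat=n):
                accepting = {q for q in range(n) if bits[q]}
                ok_pos = all(accepts(delta, accepting, w, alphabet) for w in positive)
                ok_neg = not any(accepts(delta, accepting, w, alphabet) for w in negative)
                if ok_pos and ok_neg:
                    return delta, accepting
    return None
```

The search space grows very quickly with the number of states, which is one reason the description suggests trying more sophisticated approaches afterwards.
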

Contest design Paul Goldberg C

Description

A "contest" refers to a widespread form of competition that has attracted a huge literature in Economics. In a contest, players expend costly effort that translates to some form of output, or score. Then they (or some of them) receive prizes associated with their ranking on output/score. The project would study a class of contests considered in the paper "Ranking games that have competitiveness-based strategies". One aspect of the project would be to study the convergence properties of the Fictitious Play procedure, applied to these games. We also envisage addressing, experimentally, the challenge of handicapping. Handicapping involves ranking the players on the values of monotonic functions of their outputs, rather than just raw outputs, in order to elicit greater effort; the challenge is to choose the best functions. The project is experimental, with some scope for analytical work (specifically, for the problem of optimal handicapping in the 2-player case).

Prerequisites

familiarity with basic probability theory.

Rank aggregation Paul Goldberg B 2017-18  C

Description: In social choice theory, a general theme is to take a set of rankings of a set of candidates (also known as alternatives) and compile an "overall" ranking that attempts to be as close as possible to the individual rankings. Each individual ranking can be thought of as a vote that we want to compile into an overall decision. The project involves taking some real-world data, for example university league tables, and computing aggregate rankings to see which individual votes are closest to the consensus. In the case of the Kemeny consensus, a rank aggregation rule that is NP-hard to compute, it is of interest to exploit heuristics that may be effective on real-world data, and to see how large a data set the Kemeny consensus can be computed for.

Prerequisites: familiarity with polynomial-time algorithms, NP-hardness; interest in computational experiments on data
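
As a baseline for the heuristics mentioned above, the Kemeny consensus can be computed by exhaustive search on very small inputs. The sketch below assumes every vote ranks the same candidates with no ties; the function names are hypothetical. It minimises the total Kendall tau distance (number of pairwise disagreements) to the votes.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of candidate pairs that the two rankings order differently."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b]))

def kemeny_consensus(votes):
    """Try every permutation and keep one with the smallest total distance.
    Takes time factorial in the number of candidates."""
    candidates = votes[0]
    return min(permutations(candidates),
               key=lambda r: sum(kendall_tau(r, v) for v in votes))

def closest_vote(votes):
    """Index of the vote nearest to the consensus ranking."""
    consensus = kemeny_consensus(votes)
    return min(range(len(votes)), key=lambda i: kendall_tau(consensus, votes[i]))
```

Because the search is factorial in the number of candidates, this is only usable as a correctness check for heuristics on small instances.
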

Rock-paper-scissors for the computerised strategist Paul Goldberg C

In the well-known game of rock-paper-scissors, it is clear that any player can "break even" by playing entirely at random. On the other hand, people do a poor job of generating random numbers, and expert players of the game can take advantage of predictable aspects of opponents' behaviour. In this project, we envisage designing algorithms that adapt to a human opponent's behaviour, using for example no-regret learning techniques, and modelling the opponent as a probabilistic automaton. Ideally, the student taking this project should manage to persuade some volunteers to test the software! This is needed to provide data for the algorithms, and the software should provide feedback to the opponent on what mistakes they are making. Depending on how it goes, we could also consider extending this approach to related games.

Prerequisite: Interest in working with probability is important.
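
One standard no-regret technique that could serve as a starting point is the multiplicative-weights (Hedge) update, sketched below. The class name, the learning rate `eta` and the payoff convention (+1 win, 0 draw, -1 loss) are assumptions made for this illustration; the sketch does not model the opponent as a probabilistic automaton.

```python
import math
import random

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

class HedgePlayer:
    """Keeps one weight per move and plays each move with probability
    proportional to its weight."""

    def __init__(self, eta=0.1, seed=0):
        self.eta = eta
        self.weights = {m: 1.0 for m in MOVES}
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights.values())
        return {m: w / total for m, w in self.weights.items()}

    def choose(self):
        probs = self.probabilities()
        return self.rng.choices(MOVES, weights=[probs[m] for m in MOVES])[0]

    def observe(self, opponent_move):
        """Scale each weight by what that move would have earned this round."""
        for m in MOVES:
            self.weights[m] *= math.exp(self.eta * payoff(m, opponent_move))
        total = sum(self.weights.values())
        for m in MOVES:
            self.weights[m] /= total  # renormalise to avoid overflow in long games
```

Against an opponent who favours one move, the weights shift towards the move that beats it; against a uniformly random opponent they carry no advantage, consistent with the "break even" remark in the description.
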

A Conceptual Model for Assessing Privacy Risk Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C Privacy is not a binary concept: the level of privacy enjoyed by an individual or organisation will depend upon the context within which it is being considered; the more data an attacker has access to, the more potential there may be for privacy compromise. We lack a model which considers the different contexts that exist in current systems, which would underpin a measurement system for determining the level of privacy risk that might be faced. This project would seek to develop a prototype model based on a survey of known privacy breaches and common practices in data sharing. The objective is to propose a method by which privacy risk might be considered, taking into consideration the variety of (threat and data-sharing) contexts that any particular person or organisation might be subjected to. It is likely that a consideration of the differences and similarities of the individual or organisational points of view will need to be made, since the nature of contexts faced could be quite diverse.
Computer Vision for Physical Security Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

Computer Vision allows machines to recognise objects in real-world footage. In principle, this allows machines to flag potential threats in an automated fashion based on historical and present images in video footage of real-world environments. Automated threat detection mechanisms to help security guards identify threats would be of tremendous help to them, especially if they have to temporarily leave their post, or have to guard a significant number of areas. In this project, students are asked to implement a system that is able to observe a real environment over time and attempt to identify potential threats, independent of a number of factors, e.g. lighting conditions or wind conditions. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. One approach might be to implement a camera system using e.g. a web camera or a Microsoft Kinect to conduct anomaly detection on real-world environments, and flag any issues related to potential threats.

Requirements: Programming skills required

Considering the performance limiters for controls Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

The behavior of controls will be dependent upon how they are used. One obvious example is that the protection afforded by a firewall is dependent upon the maintenance of the rules that determine what the firewall stops and what it does not. The benefit of various technical controls in operational context lacks good evidence and data, so there is scope to consider the performance of controls in a lab environment. This mini-project would select one or more controls from the CIS Top 20 Critical Security Controls (CSC) (version 6.1) and seek to develop laboratory experiments (and implement them) to gather data on how the effectiveness of the control is impacted by its deployment context (including, for example, configuration, dependence on other controls, threat faced).

Requirements: Students will need an ability to develop a test-suite and deploy the selected controls.

Considering where residual risk may result from differences in standards and regulatory requirements Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

While these controls might be advantageous in many regards, they are often up against changing regulatory frameworks which place differing demands with respect to information security and privacy. One potential challenge is where these security control sets suggest best practice that is not in-line with regulatory requirements in the company’s industry or jurisdiction. This would lead to companies following key standards but failing to reach requisite levels of security. This is especially the case where standards suggest that certain controls are optional, when actually, they may be critical for a certain locale. The aim of this project will be to consider these issues with special emphasis on the CIS Top 20 Critical Security Controls (CSC) (version 6.1), and their context of use within Europe – particularly with the existing Data Protection Act and upcoming General Data Protection Regulation. Version 6.1 is of interest given that it is structured to have ‘foundational’ and ‘advanced’ controls, which allow companies flexibility that might not actually be afforded with current regulation in mind.

Cybersecurity visualization Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

Cybersecurity visualization helps analysts and risk owners alike to make better decisions about what to do when the network is attacked. In this project the student will develop novel cybersecurity visualizations. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the visualizations they develop. These projects tend to have a focus on network traffic visualization, but the student is encouraged to visualize datasets they would be most interested in. Other interesting topics might for instance include: host-based activities (e.g. CPU/RAM/network usage statistics), network scans (incl. vulnerabilities scanning). Past students have visualized network traffic patterns and android permission violation patterns. Other projects on visualizations are possible (not just cybersecurity), depending on interest and inspiration.

Requirements: Programming skills required.

Data-poisoning Michael Goldsmith B 2017-18  C How worried should we be about a malign entity deliberately changing or poisoning our data? Is there risk? What is its nature? What harms might be effected? Can we build data-analytics that are resistant to such attacks, can we detect them? It is highly unlikely that techniques for handling erroneous data will be sufficient since we are likely to face highly targeted data-corruption.
Design/Architectural Vulnerability analysis of Distributed Ledgers Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

This project would seek to study the general form of distributed ledgers, and the claimed nuances and general form in implementations, and assess all the possible weak points that might make implementations open to compromise. The general approach will be to develop a detailed understanding of the security requirements and inter-dependencies of functionality, capturing the general security case for a distributed ledger and how it decomposes into lower level security requirements. Then an assessment will be made of each, and the potential for vulnerabilities in design and implementation considered. The ultimate result would be a broad analysis of potential weak points. If possible, these will then be practically investigated in a lab-based environment. One output might be a proposal for testing strategies.

Experimenting with anomaly detection features for performance Michael Goldsmith, Ioannis Agrafiotis, Sadie Creese, Jason Nurse, Arnau Erola B 2017-18  C This project considers relative detection performance using different feature sets, and different anomalies of interest, in the face of varying attacks. It would be conducted with a view to exploring the minimal sets that would result in threat detection, and to producing guidance aimed at determining the critical datasets required for the control to be effective.
Formal threat and vulnerability analysis of a distributed ledger model Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

This project would utilise the process algebra CSP and associated model checker FDR to explore various types of threat and how they might successfully compromise a distributed ledger. This type of modelling would indicate possible attacks on a distributed ledger, and could guide subsequent review of actual designs and testing strategies for implementations. The modelling approach would be based on the crypto-protocol analysis techniques already developed for this modelling and analysis environment, and would seek to replicate the approach for a distributed ledger system. Novel work would include developing the model of the distributed ledger, considering which components are important, formulating various attacker models and also formulating the security requirements / properties to be assessed using this model-checking based approach.

Requirements: In order to succeed in this project students would need to have a working knowledge of the machine readable semantics of CSP and also the model checker FDR. An appreciation of threat models and capability will need to be developed.

High Dynamic Range Imaging for Documentation Applications Michael Goldsmith, Jassim Happa B 2017-18  C High dynamic range imaging (HDRI) allows more accurate information about light to be captured, stored, processed and displayed to observers. In principle, this allows viewers to obtain more accurate representations of real-world environments and objects. Naturally, HDRI would be of interest to museum curators to document their objects, particularly non-opaque objects or those whose appearance alters significantly depending on the amount of lighting in the environment. Currently, few tools exist that help curators, archaeologists and art historians to study object surfaces in meaningful ways under user-defined parameters. In this project the student is free to approach the challenge as they see fit, but would be expected to design, implement and assess any tools and techniques they develop. The student will then develop techniques to study these objects under user-specified conditions to enable curators and researchers to study the surfaces of these objects in novel ways. These methods may involve tone mapping or other modifications of light exponents to view objects under non-natural viewing conditions to have surface details stand out in ways that are meaningful to curators.
High Dynamic Range Imaging for Physical Security Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

High dynamic range imaging (HDRI) allows more accurate information about light to be captured, stored, processed and displayed to observers. In principle, this allows viewers to obtain more accurate representations of real-world environments. Naturally, HDRI would be of interest to security personnel who use Closed-Circuit Television (CCTV) to identify potential threats or review security footage. Conceivably, it may be possible to identify threats or activities in shadows or overexposed areas. Another example is being able to better identify the facial features of someone who is wearing a hoodie. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. One approach might be to implement an HDR viewer, then conduct a user study in which participants attempt to identify how well low dynamic range content viewing compares to HDR viewing in a physical security context. Another approach might be to implement an HDR viewer that changes exposures based on where the viewer is looking. We have eyetrackers at our disposal that students would be able to use as part of their assessment. We will also be able to provide training for the student, so they are able to use the eyetracking tools themselves.

Requirements: Programming skills required

Human-computer interaction using motion gestures Michael Goldsmith, Jassim Happa B 2017-18  C Motion-gesture peripherals have become popular in recent years, with Leap Motion, Myo and Kinect being a few examples. In this project the student will develop an application to use such peripherals. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the tools they develop. The tool would also need to serve a meaningful purpose. These projects tend to have a strong focus on human-computer interaction elements, particularly on designing and implementing user-friendly and meaningful motion gestures for a variety of real-world applications. Past students, for instance, have developed support for Myo on Android devices (so one does not have to touch a tablet screen while cooking) as well as added Leap Motion and Kinect support to cyber security visualization tools. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel human-computer interfaces are possible, depending on interest and inspiration.
International Cybersecurity Capacity Building Initiatives Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C There is a large investment being made by the international community aimed at helping nations and regions to develop their capacity in cybersecurity. The work of the Global Cyber Security Capacity Centre (based at the Oxford Martin School) studies and documents this: https://www.sbs.ox.ac.uk/cybersecurity-capacity/content/front. There is scope to study in more detail the global trends in capacity building in cybersecurity, the nature of the work and the partnerships that exist to support it. An interesting analysis might be to identify what is missing (through comparison with the Cybersecurity Capacity Maturity Model, a key output of the Centre), and also to consider how strategic, or not, such activities appear to be. An extension of this project, or indeed a second parallel project, might seek to perform a comparison of the existing efforts with the economic and technology metrics that exist for countries around the world, exploring whether the data show any relationships between those metrics and the capacity building activities underway. This analysis would involve regression techniques.
Kinect interface development for cybersecurity visual analytics tool Michael Goldsmith B 2017-18  C

(Joint with Sadie Creese) Kinect interface development for cybersecurity visual analytics tool: using the open development environment for Kinect to develop an alternative HCI for the Oxford CyberVis tool http://www.cs.ox.ac.uk/projects/cybervis/ (focusing on navigation of the CyberVis environment). As CyberVis is primarily developed in Java, a good understanding of that language is essential in order to interface Kinect to CyberVis. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel human-computer interfaces for security possible, depending on interest and inspiration.

Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs Michael Goldsmith B 2017-18  C (Joint with Sadie Creese) Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue.
Penetration testing for harm Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C Current penetration testing is typically utilised for discovering how organisations might be vulnerable to external hacks, and testing methods are driven by using techniques determined to be similar to approaches used by hackers. The result being a report highlighting various exploitable weak-points and how they might result in unauthorised access should a malign entity attempt to gain access to a system. Recent research within the cybersecurity analytics group has been studying the relationship between these kinds of attack surfaces and the kinds of harm that an organisation might be exposed to. An interesting question would be whether an orientation around intent, or harm, might result in a different test strategy; would a different focus be given to the kinds of attack vectors explored in a test if a particular harm is aimed at. This mini-project would aim to explore this question by designing penetration test strategies based on a set of particular harms, and then seek to consider potential differences with current penetration practices by consultation with the professional community. Requirements: Students will need to have a working understanding of penetration testing techniques.
Photogrammetry Michael Goldsmith, Jassim Happa B 2017-18  C Photogrammetry is a set of techniques that allows for 3D measurements from 2D photographs, especially those measurements pertaining to geometry or surface colours. The purpose of this project is to implement one or more photogrammetry techniques from a series of 2D photographs. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the tool they develop.
Predicting exposure to risk for active tasks Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

Prior research has been considering how we might better understand and predict the consequences of cyber-attacks based on knowledge of the business processes, people and tasks and how they utilise the information infrastructure / digital assets that might be exposed to specific attack vectors. However, this can clearly be refined by moving to an understanding of those tasks live or active at the time of an attack propagating across a system. If this can be calculated, then an accurate model of where risk may manifest and the harm that may result can be constructed. This project would explore the potential for such a model through practical experimentation and development of software monitors to be placed on a network, aimed at inferring the tasks and users that are active based on network traffic. If time allows then host-based sensors might also be explored (such as on an application server) to further refine the understanding of which users are live on which applications etc.

Requirements: Students must be able to construct software prototypes and have a working knowledge of network architectures and computer systems.

Procedural Methods in Computer Graphics Michael Goldsmith, Jassim Happa B 2017-18  C

Procedural methods in computer graphics help us develop content for virtual environments (geometry and materials) using formal grammars. Common approaches include fractals and L-systems. Examples of content may include the creation of cities, planets or buildings. In this project the student will develop an application to create content procedurally. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the methods they develop. These projects tend to have a strong focus on designing and implementing existing procedural methods, but also include a portion of creativity. The project can be based on reality, e.g. looking at developing content that has some kind of basis in how the real-world equivalent objects were created (physically-based approaches), or the project may be entirely creative in how it creates content. Past students, for instance, have built tools to generate cities based on real-world examples and non-existent city landscapes; other examples include the building of procedural planets, including asteroids and earth-like planetary bodies.
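
To make the formal-grammar idea concrete, here is a minimal sketch of an L-system: a string is rewritten in parallel by production rules, and the result is then read as drawing commands. The function names and the turtle conventions (`F` moves forward, `+` turns left, `-` turns right) are assumptions chosen for this illustration.

```python
import math

def expand(axiom, rules, iterations):
    """Apply the rewriting rules to every symbol in parallel, `iterations` times.
    Symbols without a rule are copied unchanged."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def trace(commands, angle_degrees=90.0, step=1.0):
    """Turtle interpretation of a command string; returns the visited vertices."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for ch in commands:
        if ch == "F":
            x += step * math.cos(math.radians(heading))
            y += step * math.sin(math.radians(heading))
            points.append((x, y))
        elif ch == "+":
            heading += angle_degrees
        elif ch == "-":
            heading -= angle_degrees
    return points

# Lindenmayer's algae system: A -> AB, B -> A
algae = expand("A", {"A": "AB", "B": "A"}, 3)   # "ABAAB"

# A right-angled Koch-style curve: F -> F+F-F-F+F
outline = trace(expand("F", {"F": "F+F-F-F+F"}, 2))
```

The list of vertices returned by `trace` could be handed to any plotting or mesh-building code; richer grammars (branching, stochastic or parametric rules) follow the same rewrite-then-interpret pattern.
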

Reflectance Transformation Imaging Michael Goldsmith, Jassim Happa B 2017-18  C

Reflectance Transformation Imaging (RTI) is a powerful set of techniques (the first of which is known as Polynomial Texture Maps, PTMs) that enables us to capture photographs of objects under several lighting conditions. Combined, these RTI images form a single photograph in which users can relight the objects by moving the light sources around the hemisphere in front of the object, and can also specify user-defined parameters, such as removing colour or making the objects more specular or diffuse, in order to investigate the surface details in depth. It can be used for forensic investigation of crime scenes as well as cultural heritage documentation and investigation. The purpose of this project is for the student to implement RTI methods of their choice.

Smartphone security Michael Goldsmith B 2017-18  C

(Joint with Sadie Creese) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage). This uses skills in formal specification and an understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc. Other projects within this domain are possible, according to interest and inspiration.

Prerequisites: Concurrency, Concurrent Programming, Computer Security all possibly an advantage.

Technology-layer social networks Michael Goldsmith C

(Joint with Sadie Creese) Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata, i.e. which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc.

Trip-wires Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

At Oxford we have developed a framework for understanding the components of an attack, and documenting known attack patterns can be instrumental in developing trip-wires aimed at detecting the presence of insiders in a system. This project will seek to develop a library of such trip-wires based on a survey of openly documented and commented upon attacks, using the Oxford framework. There will be an opportunity to deploy the library into a live-trial context, which should afford an opportunity to study the relative utility of the trip-wires within large commercial enterprises. The mini-project would also need to include experimenting with the trip-wires in a laboratory environment, and this would involve the design of appropriate test methods.

Understanding Enterprise Infrastructure Dependencies Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

There are many tools available for detecting and monitoring cyber-attacks based on network traffic, and these are accompanied by a wide variety of tools designed to make alerts tangible to security analysts. By comparison, the impact of these attacks at an organisational level has received little attention. An aspect that could be enhanced further is the addition of a tool facilitating the management and updating of our understanding of business processes, and also of how those processes depend on a network infrastructure. This tool could facilitate the mapping between company strategies and the activities needed to accomplish company goals, and map these down to the network and people assets. At the top of the hierarchy lies the board, responsible for strategic decisions. These decisions are interpreted at the managerial level and could be captured and analysed with business objective diagrams. These diagrams in turn could be refined further to derive business processes and organisational charts, ensuring that decisions made at the top level will be enforced at the lower levels. The combination of business processes and organisation charts could eventually provide the network infrastructure. For this project we suggest a student could develop novel algorithms for mapping business processes to network infrastructures in an automated way (given the updated business process files). That said, the student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. Other projects on business process modelling are also possible, depending on interest and inspiration.

Visualising attack patterns Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jason Nurse B 2017-18  C

The use of visualisation techniques to better detect attacks and predict the harm that might result is a common approach. This project will provide the student with sample data-sets so that a novel visualisation might be developed (using open-source libraries) of one or more insider attacks. The aim is to help an analyst understand the nature of the attack and identify appropriate response techniques.

Bisimilarity in Logics for Strategic Reasoning Julian Gutierrez C

Nash equilibrium is the standard solution concept for multi-player games. Such games have multiple applications in Logic and Semantics, Artificial Intelligence and Multi-Agent Systems, and Verification and Computer Science. Unfortunately, Nash equilibria are not preserved under bisimilarity, one of the most important behavioural equivalences for concurrent systems. In a recent paper it was shown that this problem may not arise when certain models of strategies are considered. In this project the aim is to investigate further implications of considering the new model of strategies. For instance, whether a number of logics for strategic reasoning become invariant under bisimilarity if the new model of strategies is considered, whether some logics that are unable to express Nash equilibria can do so with respect to the new model of strategies, and whether the results already obtained still hold for more complex classes of systems, for instance, where nondeterminism has to be considered.

Prerequisites: Discrete Mathematics, Introduction to Formal Proof, Logic and Proof

Desirable: Computer-Aided Formal Verification, Computational Complexity

Project type: theory

Towards Tractable Strategic Reasoning in Strategy Logic Julian Gutierrez C

Strategy Logic (SL) is a temporal logic to reason about strategies in multi-player games. Such games have multiple applications in Logic and Semantics, Artificial Intelligence and Multi-Agent Systems, and Verification and Computer Science. SL is a very powerful logic for strategic reasoning -- for instance, Nash equilibria and many other solution concepts in game theory can be easily expressed -- which has an undecidable satisfiability problem and a non-elementary model checking problem. In this project, the aim is to study fragments of SL which can potentially have better results (decidability and complexity) with respect to the satisfiability and model checking problems. The fragments to be studied can be either syntactic fragments of the full language or semantic fragments where only particular classes of models are considered.

Prerequisites: Discrete Mathematics, Introduction to Formal Proof, Logic and Proof

Desirable: Computer-Aided Formal Verification, Computational Complexity

Project type: theory

Parallel Algorithms for Computing Hilbert Bases Christoph Haase B 2017-18  C

Given a homogeneous system of linear equations A x = 0, the Hilbert basis is the unique finite minimal set of non-negative integer solutions from which every non-negative integer solution of the system can be generated. Computing Hilbert bases is a fundamental problem encountered in various areas in computer science and mathematics, for instance in decision procedures for arithmetic theories, the verification of infinite-state systems and pure combinatorics. In this project, we plan to revisit an approach to computing Hilbert bases described in [1] that is highly parallelizable. With the ubiquity of multi-core architectures, it seems conceivable that, with proper engineering, this approach will outperform existing approaches. The goal of this project is to deliver an implementation of the algorithm of [1] and benchmark it against competing tools. If successful, an integration into the SageMath platform could be envisioned.

[1] https://pdfs.semanticscholar.org/6e84/f9ccfdaa37cb33cfc2024aec1dc7d13964c7.pdf

Prerequisites: good knowledge of concurrent programming and data structures, and linear algebra
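To make the definition of a Hilbert basis concrete, here is a naive brute-force sketch. It is not the parallel algorithm of [1]: it simply enumerates non-negative integer vectors up to an assumed search bound and keeps the non-zero solutions that are minimal in the componentwise order, which for a system A x = 0 are exactly the Hilbert basis elements lying inside the searched box. The function names and the `bound` parameter are illustrative assumptions.

```python
from itertools import product


def is_solution(matrix, x):
    """True if A x = 0 for the integer matrix A given as a list of rows."""
    return all(sum(a * v for a, v in zip(row, x)) == 0 for row in matrix)


def hilbert_basis_bruteforce(matrix, bound):
    """Minimal non-zero solutions of A x = 0 with all entries in 0..bound.

    Elements of the true Hilbert basis with an entry above `bound` are missed,
    so the bound must be chosen large enough for the system at hand.
    """
    n = len(matrix[0])
    solutions = [
        x for x in product(range(bound + 1), repeat=n)
        if any(x) and is_solution(matrix, x)
    ]

    def strictly_below(y, x):
        # y <= x componentwise and y != x
        return y != x and all(b <= a for a, b in zip(x, y))

    return sorted(
        x for x in solutions
        if not any(strictly_below(y, x) for y in solutions)
    )


if __name__ == "__main__":
    # Single equation x1 + x2 - 2*x3 = 0.
    print(hilbert_basis_bruteforce([[1, 1, -2]], bound=3))
```

The enumeration is exponential in the number of variables, which is precisely why more refined (and parallelisable) algorithms are of interest.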

The Complexity of Deciding Whether Ordering is Necessary in a Presburger Formula Christoph Haase C

It has been shown in [1] that the class of sets of integers definable in the first-order theory of the integers with addition and order, FO(Z,+,<=), is a strict superset of the class of sets of integers definable when only equality instead of order is allowed (i.e. those definable in FO(Z,+,=)). The goal of this project is to determine the computational complexity of deciding whether a given sentence of FO(Z,+,<=) is definable in FO(Z,+,=).

[1] C. Choffrut and A. Frigeri. Deciding whether the ordering is necessary in a Presburger formula. Discrete Mathematics and Theoretical Computer Science, 12(1):20–37, 2010. URL https://www.dmtcs.org/dmtcs-ojs/index.php/dmtcs/article/view/1300/0.html.

The Complexity of TensorFlow Christoph Haase, Stefan Kiefer B 2017-18  C

TensorFlow is an open-source library for machine learning released by Google in 2015. Though widely used for modelling neural networks, TensorFlow actually allows for expressing any computation that can be represented as a data flow graph. In computational complexity, such graphs have been studied for a long time in the context of arithmetic circuits. The combination of techniques from circuit complexity theory and number theory has in recent years led to algorithms that allow for evaluating arithmetic circuits more efficiently than standard methods. The goal of this project is to investigate whether we can apply those techniques in order to efficiently evaluate (subclasses of) data flow graphs present in TensorFlow. To this end, we plan to develop a formal model of TensorFlow data flow graphs, and to analyse the computational complexity of evaluating such graphs and computing gradients.

Prerequisites: Computational Complexity and a robust background in mathematics
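The connection between data flow graphs and arithmetic circuits can be sketched in a few lines. The toy model below does not use TensorFlow and is not the formal model the project would develop; the `Node` class and the functions `evaluate` and `gradient` are hypothetical names. It represents a circuit with addition and multiplication gates as a topologically ordered list, evaluates it in a forward pass, and computes gradients of the last node in a reverse pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str             # "input", "const", "add" or "mul"
    args: tuple = ()    # indices of operand nodes, which must appear earlier
    value: float = 0.0  # only used by "const" nodes
    name: str = ""      # only used by "input" nodes


def evaluate(graph, inputs):
    """Forward pass: compute the value of every node in list order."""
    values = []
    for node in graph:
        if node.op == "input":
            values.append(inputs[node.name])
        elif node.op == "const":
            values.append(node.value)
        elif node.op == "add":
            values.append(values[node.args[0]] + values[node.args[1]])
        elif node.op == "mul":
            values.append(values[node.args[0]] * values[node.args[1]])
        else:
            raise ValueError(f"unknown op {node.op!r}")
    return values


def gradient(graph, inputs):
    """Reverse pass: derivative of the last node with respect to each input."""
    values = evaluate(graph, inputs)
    adjoint = [0.0] * len(graph)
    adjoint[-1] = 1.0
    for i in reversed(range(len(graph))):
        node = graph[i]
        if node.op == "add":
            a, b = node.args
            adjoint[a] += adjoint[i]
            adjoint[b] += adjoint[i]
        elif node.op == "mul":
            a, b = node.args
            adjoint[a] += adjoint[i] * values[b]
            adjoint[b] += adjoint[i] * values[a]
    return {n.name: adjoint[i] for i, n in enumerate(graph) if n.op == "input"}


# f(x, y) = (x + y) * x
GRAPH = [
    Node("input", name="x"),
    Node("input", name="y"),
    Node("add", (0, 1)),
    Node("mul", (2, 0)),
]

if __name__ == "__main__":
    point = {"x": 3.0, "y": 4.0}
    print(evaluate(GRAPH, point)[-1], gradient(GRAPH, point))
```

Both passes visit each gate once, so on this model evaluation and gradient computation take time linear in the size of the graph (counting each arithmetic operation as one step).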

The Complexity of the First-Order-Theory of the Integers with Addition Christoph Haase B 2017-18  C

The goal of this project is to determine the computational complexity of the first-order theory of the integers with addition and equality FO(Z,+,=), but without order. This theory is a strict fragment of Presburger arithmetic, an arithmetic theory that finds numerous applications, for instance, in the verification of infinite-state systems. We aim to make an original scientific contribution by determining what time and space resources are needed in order to decide a sentence in FO(Z,+,=). If time permits, an implementation of the decision procedure developed in this project could be envisioned.

Prerequisites: good familiarity with first-order logic, linear algebra and computational complexity

Bioinformatics Projects Jotun Hein B 2017-18  C

Bioinformatics Projects can be found here.

If you choose any of these projects you must find a joint supervisor from the Department of Computer Science. Pete Jeavons can advise on a suitable choice.

Compilation of a CSP-like language Geraint Jones B 2017-18  C

This is a compiler project, also requiring familiarity with concurrency.

The parallel programming language occam is essentially an implementable sublanguage of CSP. The aim of this project is to produce a small portable implementation of a subset of occam; the proposed technique is to implement a virtual machine based on the inmos transputer, and a compiler which targets that language.

One of the aims of the project is to implement an extension to occam which permits recursion; more ambitious projects might implement a distributed implementation with several communicating copies of the virtual machine. Other possibilities are to produce separate virtual machines, optimised for displaying a simulation, or for efficiency of implementation, or translating the virtual machine code into native code for a real machine.

Logic circuit workbench Geraint Jones B 2017-18  C

This is an interactive programming project with some logic simulation behind it.

The idea is to produce a simulator for a traditional logic breadboard. The user should be able to construct a logic circuit by drawing components from a library of standard gates and latches and so on, and connecting them together in allowable ways. It should then be possible to simulate the behaviour of the circuit, to control its inputs in various ways, and to display its outputs and the values of internal signals in various ways.

The simulator should be able to enforce design rules (such as those about not connecting standard outputs together, or limiting fan-out) but should also cope with partially completed circuits; it might be able to implement circuits described in terms of replicated sub-circuits; it should also be able to handle some sort of standard netlist.

It might be that this would make a project for two undergraduates: one implementing the interface, the other implementing a simulator that runs the logic.

Modelling of arithmetic circuits Geraint Jones B 2017-18  C

This is a project in the specification of hardware, which I expect to make use of functional programming.

There is a great deal of knowledge about ways (that are good by various measures) of implementing standard arithmetic operations in hardware. However, most presentations of these circuits are at a very low level, involving examples, diagrams, and many subscripts, for example these descriptions.

The aim of this project is to describe circuits like this in a higher-level way by using the higher order functions of functional programming to represent the structure of the circuit. It should certainly be possible to execute instances of the descriptions as simulations of circuits (by plugging in simulations of the component gates), but the same descriptions might well be used to generate circuit netlists for particular instances of the circuit, and even to produce the diagrams.
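As a rough sketch of the idea, the following describes a ripple-carry adder by passing a full-adder cell to a higher-order combinator that captures the "row of cells threading a carry" structure. Python is used here only for illustration (the project itself anticipates a functional language), and the names `row`, `full_adder` and `add` are assumptions made for this example.

```python
from functools import reduce


def and_gate(a, b):
    return a & b


def or_gate(a, b):
    return a | b


def xor_gate(a, b):
    return a ^ b


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return xor_gate(a, b), and_gate(a, b)


def full_adder(carry_in, bits):
    """One adder cell: takes a carry and a pair of bits, returns (carry_out, sum)."""
    a, b = bits
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return or_gate(c1, c2), s2


def row(cell):
    """Higher-order combinator: chain copies of `cell`, threading a carry through them."""
    def circuit(carry_in, inputs):
        def step(state, x):
            carry, outputs = state
            carry, out = cell(carry, x)
            return carry, outputs + [out]
        return reduce(step, inputs, (carry_in, []))
    return circuit


# The structure of the adder is just "a row of full adders".
ripple_carry_adder = row(full_adder)


def add(x, y, width):
    """Simulate the circuit on two `width`-bit numbers (least significant bit first)."""
    bits = [((x >> i) & 1, (y >> i) & 1) for i in range(width)]
    carry, sums = ripple_carry_adder(0, bits)
    return sum(bit << i for i, bit in enumerate(sums)) + (carry << width)


if __name__ == "__main__":
    print(add(5, 3, 4))
```

Because the structure lives in `row` rather than in subscripted diagrams, the same description could in principle be reinterpreted, for instance by plugging in cells that emit netlist entries instead of computing bits.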

Mosaic player Geraint Jones B 2017-18

This is an interactive graphics programming project.

This is an idea based on a child's toy, consisting of a collection of acrylic pieces. There are several colours, and two shapes: isosceles right-angled triangles, and rhombuses with 45 degree and 135 degree angles and sides to match the equal sides of the triangles. The player arranges (all) the pieces in one of the many ways that produce a square of the right size, with a pattern that has several (reflectional or rotational) symmetries.

The project is to design and build a small program that can be used to simulate the play: perhaps the user specifies symmetries, and then places a single piece which is replicated in several places as required by the symmetry.

There is a small subtlety in that the target square has a side which is an irrational multiple of the unit length (edge of the rhombus), so there is no integer grid on which the pieces lie. (It is very easy to make a mistake with the real toy which almost produces a square of the wrong size.)

Toys for Animating Mathematics Geraint Jones B 2017-18  C

The aim is to take some mathematics that would be within the grasp of a mathematically-inclined sixth-former and turn it into some attention-grabbing web pages. Ideally this reveals a connection with computing science. I imagine that this project necessarily involves some sort of animation, and I have visions of Open University television maths lectures.

The programming need not be the most important part of this project, though, because some of the work is in choosing a topic and designing problems and puzzles and the like around it. There's a lot of this sort of thing about, though, so it would be necessary to be a bit original.

Think of GeomLab (and then perhaps think of something a little less ambitious). It might involve logic and proof, it might be about sequences and series, it might be about graphs, it might be about the mathematics of cryptography... you might have something else in mind.

Translating OWL Ontologies into Rules Mark Kaminski C

Background: Many reasoning tasks for expressive ontology languages such as OWL 2 DL can be reduced to or approximated by rule-based reasoning in formalisms such as OWL 2 RL, Datalog, or Disjunctive Datalog. A key step of any such reduction is a structural transformation of the original ontology into a set of rules in the target formalism that are in a certain sense equivalent to the ontology. The goal of the project is to implement an algorithm for transforming OWL 2 DL ontologies into rules that can then serve as input to Datalog and OWL 2 RL reasoners, as well as ASP solvers.

Prerequisites: Good programming skills in Java.

The following courses are relevant for this project: Logic and Proof, Knowledge Representation and Reasoning.

Effect of Noise in NN-Training Varun Kanade B 2017-18  C

One of the interesting approaches to reducing overfitting in neural networks is to add noise to the inputs and activations before performing a gradient step. The key insight is that this noise injection prevents the learnt weights from being too delicately balanced to fit the data; some kind of robustness is necessary to fit noisy data. Another interesting consequence, shown in recent work, is that learning algorithms using noisy data may be better at protecting the privacy of the data. Thus, there may be twin advantages to this approach. This project will involve understanding the background in this topic, performing simulations to understand the behaviour and hopefully developing new theories. This project may involve collaboration with Mr. Alexis Poncet and Dr. Thomas Steinke.

Evolution as Computational Learning Varun Kanade B 2017-18  C

Evolution is basically a form of learning, where the search happens through variation caused by mutations, recombination and other factors, and (natural) selection is a feedback mechanism. There has been much recent work in understanding evolution through a computational lens. One of the fundamental building blocks of life is circuits in which the production of a protein is controlled by other proteins (transcription factors); these circuits are known as transcription networks. Mathematical models of transcription networks have been proposed using continuous-time Markov processes. The focus of the project is to use these models to understand the expressive power of these networks and whether simple evolutionary algorithms, through suitably guided selection, can result in complex expressive patterns. The work will involve both simulations and theory.

Predicting Election Results Using Social Media & Fundamentals Varun Kanade B 2017-18  C

With the upcoming French elections and the mixed performance of polls in recent months in mind, the goal of this project will be to see to what extent fundamentals (such as economic indicators, demographic indicators, incumbents, etc.) and the use of social media (such as Twitter, Facebook, etc.) can be used to predict election results. The project will involve a survey of past methods used, as well as data collection and model fitting. This project is open-ended (and hence potentially risky); the main aim is to get good results using historical and current data. It would be helpful if the student has good programming experience as well as knowledge of various different machine learning techniques. The project will involve collaboration with Dr. Vincent Cohen-Addad (Copenhagen).

A fast numerical solver for multiphase flow David Kay B 2017-18  C

The Allen-Cahn equation is a differential equation used to model the phase separation of two, or more, alloys. This model may also be used to model cell motility, including chemotaxis and cell division. The numerical approximation, via a finite difference scheme, ultimately leads to a large system of linear equations. In this project, using numerical linear algebra techniques, we will develop a computational solver for the linear systems. We will then investigate the robustness of the proposed solver.
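To give a feel for the kind of linear system involved, the sketch below assumes the simplest setting: one space dimension and a time step in which the diffusion term is treated implicitly, which yields a tridiagonal matrix with diagonal entries 1 + 2r and off-diagonal entries -r (here r stands for the time step times the diffusion coefficient divided by the squared grid spacing). These modelling choices, and all function names, are assumptions for illustration; the project's actual discretisation and solver may differ. The system is solved with the standard Thomas algorithm.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system.

    lower[i] multiplies x[i-1] (lower[0] is unused) and
    upper[i] multiplies x[i+1] (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_diffusion_matrix(n, r):
    """Bands of the matrix I - r * (second-difference operator), zero boundary values."""
    lower = [0.0] + [-r] * (n - 1)
    diag = [1.0 + 2.0 * r] * n
    upper = [-r] * (n - 1) + [0.0]
    return lower, diag, upper


def apply_tridiagonal(lower, diag, upper, x):
    """Multiply the tridiagonal matrix by x (used to check the residual)."""
    n = len(x)
    return [
        (lower[i] * x[i - 1] if i > 0 else 0.0)
        + diag[i] * x[i]
        + (upper[i] * x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


if __name__ == "__main__":
    bands = implicit_diffusion_matrix(5, 0.5)
    rhs = [1.0, 2.0, 3.0, 4.0, 5.0]
    solution = solve_tridiagonal(*bands, rhs)
    print(solution)
    print(apply_tridiagonal(*bands, solution))  # should reproduce rhs
```

A direct tridiagonal solve like this only works in one dimension; in two or three dimensions the matrix has a wider band structure, which is where iterative methods and preconditioning typically become relevant.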

Efficient solution of one dimensional airflow models David Kay B 2017-18  C

In this project we will investigate the use of numerical and computational methods to efficiently solve the linear system of equations arising from a one dimensional airflow model within a network of tree-like branches. This work will build upon the methods presented within the first year Linear Algebra and Continuous Maths courses. All software to be developed can be in the student’s preferred programming language.

General area: Design and computational implementation of fast and reliable numerical models arising from biological phenomena.

The Square Root Law of Steganography: Empirical Validation Andrew Ker B 2017-18  C

Steganography means hiding a hidden payload within an apparently-innocent cover, usually an item of digital media (in this project: images). Steganalysis is the art of detecting that hiding took place. A key question is how the amount of information that can be securely hidden (i.e. such that detectors have a high error rate) scales with the size of the cover. In 2008 I co-authored a paper showing that my theoretical "square root law" was observed experimentally, using state-of-the-art (for 2008) hiding and detection methods. This project is to run similar experiments using methods 10 years more modern. It would involve combining off-the-shelf code (some in MATLAB, some in Python) from various researchers and running fairly large scale experiments to measure detection accuracy versus cover size and payload size in tens of thousands of images, then graphing the results suitably.

Prerequisites: No particular prerequisites. Ability to piece together others' code and draw graphs nicely.

Transparent session-layer steganography Andrew Ker B 2017-18  C

Steganography means hiding a hidden payload within an apparently-innocent cover, usually an item of digital media. This project is to implement a transparent proxy which uses something like a webcam video stream to hide steganographic packets. A local process should receive communication on a local socket and merge it with the webcam data stream, a process which can be reversed at the receiver.

Prerequisites: It is necessary to understand something of video formats, for example experience with working on the H.264/5 codec, in order to place the payload in a webcam stream. It is definitely NOT possible to learn the format within the time available. Don't ask if it is possible to read up on it instead: it isn't.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Ker but should note that the response may be delayed as he is on sabbatical.

Twitter Pictures as a Steganographic Channel Andrew Ker B 2017-18  C

Steganography means hiding a hidden payload within an apparently-innocent cover, usually an item of digital media. Most pure research focuses on bitmap or JPEG images, or simple video codecs. In practice, there are often further constraints: the carrier might recompress or damage the object. This project is primarily programming. First we must characterize the properties of image transmission through Twitter, and then provide an implementation of image steganography through it. This might be a complete piece of software with a nice interface, if the channel is straightforward, or a proof-of-concept if the channel causes difficult noise (in which case we will need suitable error-correcting codes).

Prerequisites: Linear algebra. You will need a Twitter account and to learn the Twitter API without help from me. It would be useful to know a little about the JPEG image format before starting the project.

Visualisation of Steganalysis Andrew Ker B 2017-18  C

Steganography means hiding a hidden payload within an apparently-innocent cover, usually an item of digital media (in this project: images). Steganalysis is the art of detecting that hiding took place. The most effective ways to detect steganography are machine learning algorithms applied to "features" extracted from the images, trained on massive sets of known cover and stego objects. The images are thus turned into points in high-dimensional space. We have little intuition as to the geometrical structure of the features (do images form a homogeneous cluster? do they scale naturally with image size? how much and in what ways do images from different cameras differ?), or how they are altered under embedding (do they move in broadly the same direction? is there a linear separator of cover from stego?). This is a programming project that creates a visualization tool for features, extracting them from images and then projecting them onto 2-dimensional space in interesting ways, while illustrating the effects of embedding.

Prerequisites: Linear Algebra, Computer Graphics. Machine Learning an advantage.

An environment for board game competitions in Scala suitable for parts B and C Stefan Kiefer B 2017-18  C

The goal of this project is to set up an environment that allows for machine-vs-machine competitions in the board game Hex. The long-term objective of this project is to provide a reusable interface where future students can pit their game-playing engines against each other. The focus here is on good object-oriented design, reusability, easy-to-use interfaces for game engines, and an appealing graphical user interface. The student should also create two (possibly simple) engines to test the software. Time permitting, those engines might involve advanced techniques in order to play better.

Prerequisites: good Scala skills

Bots for a Board Game Stefan Kiefer B 2017-18  C

The goal of this project is to develop bots for the board game Hex. In a previous project, an interface was created to allow future students to pit their game-playing engines against each other. In this project the goal is to program a strong Hex engine. The student may choose the algorithms that underlie the engine, such as alpha-beta search, Monte-Carlo tree search, or neural networks. The available interface allows a comparison between different engines. It is hoped that these comparisons will show that the students' engines become stronger over the years. The project involves reading about game-playing algorithms, selecting promising algorithms and data structures, and design and development of software (in Java or Scala).

Small matrices for irrational nonnegative matrix factorisation Stefan Kiefer B 2017-18  C

Recently it was shown (https://arxiv.org/abs/1605.06848) that there is a 6x11-matrix M with rational nonnegative entries so that for any factorisation M = W H (where W is a 6x5-matrix with nonnegative entries, and H is a 5x11-matrix with nonnegative entries) some entries of W and H need to be irrational. The goal of this project is to explore if the number of columns of M can be chosen below 11, perhaps by dropping some columns of M. This will require some geometric reasoning. The project will likely involve the use of tools (such as SMT solvers) that check systems of nonlinear inequalities for solvability.

Prerequisites: The project is mathematical in nature, and a good recollection of linear algebra is needed. Openness towards programming and the use of tools is helpful.

Bilateral trade Elias Koutsoupias B 2017-18  C

One of the fundamental problems at the interface of Algorithms and Economics is the problem of designing algorithms for simple trade problems that are incentive compatible. An algorithm is incentive compatible when the participants have no reason to lie or deviate from the protocol. In general, we have a good understanding when the problem involves only buyers or only sellers, but poor understanding when the market has both buyers and sellers. This project will investigate and implement optimal algorithms for this problem. It will also consider algorithms that are natural and simple but may be suboptimal.

Prerequisites: Mathematical and algorithmic maturity. Knowledge of the fundamentals of game theory is useful, but not essential.

Bitcoin mining games Elias Koutsoupias B 2017-18  C

Bitcoin is a new digital currency that is based on distributed computation to maintain a consistent account of the ownership of coins. The main difficulty that the bitcoin protocol tries to address is to achieve agreement in a distributed system run by selfish participants. To address this difficulty, the bitcoin protocol relies on a proof-of-work scheme. Since the participants in this proof-of-work scheme want to maximize their own utility, they may have reasons to diverge from the prescribed protocol. The project will address some of the game-theoretic issues that arise from the bitcoin protocol.

Prerequisites: Mathematical and algorithmic maturity. Knowledge of the fundamentals of game theory is useful, but not essential.

Learning algorithms for linear programming Elias Koutsoupias B 2017-18  C

Linear programming (LP) is an important algorithmic problem. There are algorithms for LP that are efficient in practice, such as the Simplex Algorithm, but do not perform well in the worst case, and there are algorithms, such as the Ellipsoid Algorithm, that are good in theory but inefficient in practice. Learning algorithms, such as the Multiplicative Weight Update Algorithm, have the potential to be good both in theory and in practice. The project will address this question with theoretical analysis and implementation.

Prerequisites: Mathematical and algorithmic maturity. Knowledge of the fundamentals of learning theory is useful, but not essential.
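For readers unfamiliar with the Multiplicative Weight Update idea, the sketch below shows it in its basic "learning from experts" form: each expert's weight is multiplied by (1 - eta * loss) after every round, so consistently poor experts lose influence quickly. This is only the core update rule; how it is turned into an LP solver is part of what the project would study. The function name, the learning rate `eta` and the assumption that losses lie in [0, 1] are choices made for this illustration.

```python
def multiplicative_weights(loss_rounds, eta=0.5):
    """Run the multiplicative weights update over a sequence of loss vectors.

    loss_rounds: list of rounds, each a list with one loss in [0, 1] per expert.
    Returns the final probability distribution over experts and the total
    expected loss incurred by following the distribution in every round.
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss when picking an expert according to the current weights.
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # Penalise each expert in proportion to the loss it just suffered.
        weights = [w * (1.0 - eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights], total_loss


if __name__ == "__main__":
    # Expert 0 is always wrong, expert 1 is always right.
    rounds = [[1.0, 0.0]] * 10
    distribution, loss = multiplicative_weights(rounds)
    print(distribution, loss)
```

In the example the distribution concentrates on the expert that never errs, and the total expected loss stays bounded even as the number of rounds grows.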
Applications of deductive machine learning Daniel Kroening C

The goal of deductive machine learning is to provide computers with the ability to automatically learn a behaviour that provably satisfies a given high-level specification. As opposed to techniques that generalise from incomplete specifications (e.g. examples), deductive machine learning starts with a complete problem description and develops a behaviour as a particular solution.

Potential applications of deductive machine learning are detailed below, and a student would focus on one of these items for their project. We envisage applying existing algorithms, with potential to develop new ones.

- Game playing strategy: given the specification of the winning criteria for a two-player game, learn a winning strategy.

- Program repair: given a program that is buggy with respect to a correctness specification, learn a repair that makes the program correct.

- Lock-free data structures: learn a data structure that guarantees the progress of at least one thread when executing multi-threaded procedures, thereby helping to avoid deadlock.

- Security exploit generation: learn code that takes advantage of a security vulnerability present in a given software in order to cause unintended behaviour of that software.

- Security/cryptographic protocol: learn a protocol that performs a security-related function and potentially applies cryptographic methods.

- Compression: learn an encoding for some given data that uses fewer bits than the original representation. This can apply to both lossless and lossy compression.

Projects being offered by D Kroening Daniel Kroening B 2017-18  C

 Daniel Kroening is happy to supervise projects related to

  • automated hardware and software verification
  • program analysis
  • SAT and decision procedures
  • applications of SAT or SMT in other areas, e.g., systems biology or finance

For a list of sample projects, see here.

 

The Galileo Phone Daniel Kroening, Tom Melham B 2017-18  C

The Intel Galileo boards are microcontroller boards that can be networked, and also conform to the Arduino standard so that devices (sensors, actuators, screens, I/O devices) can be attached to them and controlled. The task is to build a device resembling a system-on-chip, i.e., a big system with several devices that work in concert. An obvious example would be to build your own smartphone. It should provide the ability to initiate and receive calls, feature an MP3 player, and possibly some other phone features such as an alarm clock, GPS, and so on. You will be given a small budget to add the devices you need. Part of the challenge is restricted battery power: how long can you run your phone off a largish capacitor?

Quantitative/Probabilistic Modelling, Verification and Synthesis Marta Kwiatkowska C

Professor Marta Kwiatkowska is happy to supervise projects in the area of quantitative/probabilistic modelling, verification and synthesis, particularly those relating to the PRISM model checker. PRISM is an open source formal verification tool for analysis of probabilistic systems. PRISM has an extensive website which includes software for download, tutorial, manual, publications and many case studies. Students' own proposals in the broad area of theory, algorithms and implementation techniques for software verification/synthesis will also be considered.

Below are some concrete project proposals:

  • Modelling trust in human-robot interaction. When human users interact with autonomous robots, appropriate notions of computational trust are needed to ensure that their interactions are safe and effective: too little trust can lead to user disengagement, and too much trust may cause damage. Trust management systems have been introduced for autonomous agents on the Internet, but need to be adapted to the setting of mobile robots, taking into account intermittent connectivity and uncertainty on sensor readings. Recently, a logic and model checking algorithm were formulated for reasoning about trust (http://qav.comlab.ox.ac.uk/bibitem.php?key=HK170; see also https://www.hayfestival.com/p-12297-marta-kwiatkowska.aspx?skinid=16). This project aims to develop an implementation of a simplified model checking algorithm. The project will suit a student interested in theory and/or software implementation.
  • Autonomous urban driving. This project is concerned with synthesising strategies for autonomous driving directly from requirements expressed in temporal logic, so that they are correct by construction. Probability is used to quantify information about hazards, such as accident hotspots. Inspired by the DARPA Urban Challenge, a method for synthesising strategies (controllers) from multi-objective requirements was developed and validated on map data for villages in Oxfordshire ( http://www.prismmodelchecker.org/bibitem.php?key=CKSW13 ). The idea is to develop the techniques further, to allow high-level navigation based on waypoints, and to develop strategies for avoiding threats, such as road blockage, at runtime. In the longer term, the goal is to validate the methods on realistic scenarios in collaboration with the Mobile Robotics Group. The project will suit a student interested in theory or software implementation. For more information about the project see http://www.veriware.org/autonomous.php
  • Controller synthesis for robot coordination. Autonomous robots have numerous applications in scenarios such as warehouse management, planetary exploration, or search and rescue. In view of environmental uncertainty, such scenarios are modelled using Markov decision processes. The goals (e.g. the goods should be delivered to location A while avoiding the hazardous location B) can be conveniently specified using temporal logic, from which correct-by-construction controllers (strategies) can be generated. This project aims to develop a PRISM model of a system of robots for a particular scenario so that safety and effectiveness of their cooperation is guaranteed. Techniques based on machine learning, and specifically real-time dynamic programming ( http://www.prismmodelchecker.org/papers/atva14.pdf ), will be utilised to generate controllers directly from temporal logic goals. This project will suit a student interested in machine learning and software implementation.
  • Modelling and verification of DNA programs. DNA molecules can be used to perform complex logical computation. DNA computation differs from conventional digital computation and is sometimes referred to as ‘computing with soup’ (http://www.economist.com/node/21548488). Correct design of DNA devices is error-prone and the task is supported by tools such as the DNA Strand Displacement (DSD) programming language and simulator (http://research.microsoft.com/en-us/projects/dna/default.aspx) developed at Microsoft. The DSD tool enables the probabilistic model checking of DNA circuits and has been used to identify a flaw in a DNA transducer gate (see http://qav.comlab.ox.ac.uk/bibitem.php?key=LPC+12). The aim of this project is to model and analyse the DNA implementation of logic inference proposed in “Autonomous Resolution Based on DNA Strand Displacement”, DNA Computing and Molecular Programming, LNCS 6937, 2011. The project will suit a student interested in modelling DNA programs using DSD and/or PRISM. For more information about the DNA computing project see http://www.veriware.org/dna.php.

A Molecular Recorder (with Luca Cardelli). Recent technological developments allow massive parallel reading (sequencing) and writing (synthesis) of heterogeneous pools of DNA strands. We are no longer limited to simple circuits built out of a small number of different strands, nor to reading out a few bits of output by fluorescence. While these

Safety Assurance for Deep Neural Networks Marta Kwiatkowska C

Professor Marta Kwiatkowska is happy to supervise projects in the area of safety assurance and automated verification for deep learning. This is a new research topic initiated with this paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=HKW+17), see also https://www.youtube.com/watch?v=XHdVnGxQBfQ.

Below are some concrete project proposals:

  • Safety Testing of Deep Neural Networks. Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. In a recent paper (https://arxiv.org/abs/1710.07859) a method was proposed for searching for adversarial examples using the SIFT feature extraction algorithm. The method is based on a two-player turn-based stochastic game, where the first player's objective is to find an adversarial example by manipulating the features, and proceeds through Monte Carlo tree search. It was evaluated on various networks, including YOLO object recognition from camera images. This project aims to adapt the techniques to object detection in lidar images such as Vote3D (http://ori.ox.ac.uk/efficient-object-detection-from-3d-point-clouds/), utilising the Oxford Robotics Institute dataset (http://ori.ox.ac.uk/datasets/).
  • Safety Testing of End-to-end Neural Network Controllers. NVIDIA has created a deep learning system for end-to-end driving called PilotNet (https://devblogs.nvidia.com/parallelforall/explaining-deep-learning-self-driving-car/). It inputs camera images and produces a steering angle. The network is trained on data from cars being driven by real drivers, but it is also possible to use the Udacity simulator to train it. Safety concerns have been raised for neural network controllers because of their vulnerability to adversarial examples – an incorrect steering angle may force the car off the road. In a recent paper (https://arxiv.org/abs/1710.07859) a method was proposed for searching for adversarial examples using the SIFT feature extraction algorithm. The method is based on a two-player turn-based stochastic game, where the first player's objective is to find an adversarial example by manipulating the features, and proceeds through Monte Carlo tree search. This project aims to use these techniques to evaluate the robustness of PilotNet to adversarial examples.
  • Universal L0 Perturbations for Deep Neural Networks (with Wenjie Ruan). Since deep neural networks (DNNs) are deployed in autonomous driving systems, ensuring their safety, security and robustness is essential. Unfortunately, DNNs are vulnerable to adversarial examples: slightly perturbing an image may cause a misclassification, see CleverHans (https://github.com/tensorflow/cleverhans). Such perturbations can be generated by adversarial attackers. Most current adversarial perturbations are designed based on the L1, L2 or Linf norms. Recent work demonstrated the advantages of perturbations based on the L0-norm (http://nicholas.carlini.com/papers/2017_sp_nnrobustattacks.pdf). Given a well-trained deep neural network, this project aims to demonstrate the existence of a universal (image-agnostic) L0-norm perturbation that causes most images to be misclassified. The work will involve designing a systematic algorithm for computing universal perturbations and empirically analysing these to show whether they generalize well across neural networks. This project is suited to students familiar with neural networks and Python programming.

 

An Innovative Network Scanning Framework Harjinder Lallie, Michael Goldsmith B 2017-18  C

Networks often contain multiple vulnerabilities and weaknesses which can be exploited by attackers. Security analysts often find it necessary to perform network scanning, probing and vulnerability testing, which aid the process of discovering and correcting network vulnerabilities. There exist a number of challenges in gathering network data in this manner:

- Scalability. Scanning/probing techniques do not scale well to large networks (Lippmann et al., 2006, Bopche and Mehtre, 2014).

- Semantic problems. A number of approaches, such as those by Roschke et al. (2009) and Cheng et al. (2011), attempted to combine and correlate the results returned from multiple vulnerability databases. However, the databases return the results in different formats, some in textual format, others in XML format. This means that the results have to be unified into a common meaningful format.

- Data consolidation. Other than Roschke et al.'s (2009) study into combining network scanning methods, there have been very few studies which critically evaluate methods of combining the use and results of multiple scanning tools.

- Performance. Scanning and then combining the results from multiple scanners takes a lot of time (Cheng et al., 2011). Quite often the network configuration changes during the scan, which means that the results are often inaccurate.

This thesis addresses one or more of the challenges presented above.

Prerequisites: This project involves practical work which may involve setting up a virtual network and applying a range of scanning, probing and vulnerability testing mechanisms on the network.

Calculating Network Security Metrics Harjinder Lallie, Michael Goldsmith B 2017-18  C The calculation of network security metrics is a complex problem which involves understanding the state and configuration of network connections, devices and protocols. This research analyses the problem of calculating network security metrics and proposes a framework which can calculate security metrics for a typical small network comprising numerous devices, operating systems and hosts. The project will involve the configuration of virtual networks for testing the framework. CVSS or other metric scores may be used to aid the calculation of the metric. This dissertation involves some practical work, which may include setting up a virtual network with a number of machines and applying a range of scanning, probing and vulnerability testing mechanisms on the network.
Non-invasive analysis of blood pressure in the aorta Pablo Lamata, David Kay C

Description: Blood pressure is a widely used biomarker to stratify and diagnose several cardiovascular diseases. Recent advances in imaging and computational techniques now enable the non-invasive assessment of blood pressure gradients in our central circulatory system (heart chambers and main vessels), and have the potential to improve our capability to understand disease processes of our cardiovascular system. This project will contribute to the development of these novel technologies, and to the analysis of the distribution of pressure fields in the human aorta. The student will engage with a project at its early stage, having the opportunity to feel the thrill of exploring novel clinical data never analysed before. The student will work in a multidisciplinary team with cardiologists from the John Radcliffe Hospital in the search for explanations and justification of findings.

Prerequisites: Motivation. Good analytical skills. Experience with finite element methods (Navier Stokes) and image analysis is an advantage.

Symmetry Declarations for SAT-solvers Anthony Lin B 2017-18  C Constraint programming is a programming paradigm in which programmers write programs by specifying the problem description (in terms of constraints) and letting computers use their computational power to figure out the solution automatically. This is in contrast to typical programming paradigms, wherein programmers explicitly spell out procedures for carrying out the computation. Constraint programming has a wide range of applications (e.g. optimisation, planning) and relies on powerful constraint solvers to solve computational problems. SAT-solvers, claimed by some researchers to be among the greatest achievements in the past decade, are among the most powerful solvers in the constraint programming toolbox. In a nutshell, SAT-solvers are algorithms for solving satisfiability of boolean formulas, which is an NP-complete problem that can be used as a universal language for encoding many practical combinatorial problems. Although the problem of satisfiability of boolean formulas is difficult in theory, SAT-solvers have greatly advanced in the past two decades to the extent that large formulas (with millions of variables/clauses) can now be handled. The aim of the project is to investigate ways in which to improve the performance of SAT-solvers by embedding symmetry information (e.g. variable symmetry). Boolean formulas that arise in practice (e.g. as encodings of combinatorial problems) exhibit many symmetries, which the programmers typically know. Specific questions to explore include: (1) what is a convenient (but general) way to specify symmetry information in boolean formulas? (2) given the symmetry information, how do we engineer SAT-solvers to exploit such information to speed up the search for solutions?
Analysing concurrent datatypes in CSP Gavin Lowe C The aim of this project would be to model some concurrent datatypes in CSP, and to analyse them using the model checker FDR. The concurrent datatypes could be some of those studied in the Concurrent Algorithms and Datatypes course. Typical properties to be proved would be linearizability (against a suitable sequential specification) and lock freedom. Prerequisites: Concurrency and Concurrent Algorithms and Datatypes. Reading: Analysing Lock-Free Linearizable Datatypes using CSP, Gavin Lowe.
Understanding the Java Memory Model Gavin Lowe C Concurrent programs on modern architectures can often behave in surprising ways: the presence of caches means that writes can take time to propagate from one thread to another; the compiler can perform optimisations that cause operations to be re-ordered; similarly, the hardware may execute operations out of order. For example, given the Scala program

    var x = 0; var y = 0
    def P = { val r1 = x; y = 1 }
    def Q = { val r2 = y; x = 1 }

executing P and Q in parallel may, counter-intuitively, lead to a state where r1 = r2 = 1, for example as a result of P's two statements being re-ordered. The Java Memory Model (JMM) gives a formal definition of the behaviours that are allowed for such programs. Unfortunately, the definition is convoluted and hard to understand. The aim of this project is to aid better understanding of the JMM, by producing a tool that, given a small program (like the one above or the ones in [4]), returns all its valid executions.

References:
[1] Java Memory Model Pragmatics (transcript), Aleksey Shipilev, http://shipilev.net/blog/2014/jmm-pragmatics.
[2] Formalising Java's Data Race Free Guarantee, David Aspinall and Jaroslav Sevcik.
[3] Jeremy Manson, William Pugh and Sarita Adve, The Java Memory Model.
[4] William Pugh and Jeremy Manson, Java memory model causality test cases (2004), http://www.cs.umd.edu/~pugh/java/memoryModel/CausalityTestCases.html.
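To see why the outcome r1 = r2 = 1 in the Java Memory Model project above is counter-intuitive, it helps to enumerate what a naive interleaving semantics (sequential consistency) allows. The sketch below is only that baseline, not the JMM itself: it encodes P and Q as lists of steps (an encoding assumed here for illustration), merges them in every order that preserves each thread's program order, and collects the resulting register values.

```python
from itertools import combinations

# Each step is (kind, shared variable, register name or value), mirroring P and Q.
P = [("read", "x", "r1"), ("write", "y", 1)]
Q = [("read", "y", "r2"), ("write", "x", 1)]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in chosen else next(ib) for k in range(n)]

def run(schedule):
    """Execute one schedule against a single shared memory; return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, var, arg in schedule:
        if kind == "read":
            registers[arg] = memory[var]
        else:
            memory[var] = arg
    return registers["r1"], registers["r2"]

def sc_outcomes():
    """All (r1, r2) results reachable without any re-ordering inside a thread."""
    return {run(schedule) for schedule in interleavings(P, Q)}

print(sorted(sc_outcomes()))  # (1, 1) does not appear in this set
```

Because (1, 1) is unreachable by interleaving alone, a tool of the kind the project asks for has to model re-ordering explicitly rather than simply enumerate schedules.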
Assessing the performance of stochastic optimization methods in structural modelling Peter Minary B 2017-18  C

This project compares the efficiency of Monte Carlo (MC) based stochastic optimization methods (e.g. Kirkpatrick et al., Science, 220, 671–680 (1983)) in finding low-energy conformational states of a given system. We are particularly interested in MC protocols where the temperature varies as a series of impulses followed by relaxation and only the temperature of a given part (e.g. site of interest) of the system is changed.
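A toy version of such a protocol can be sketched as follows. Everything specific here is an assumption made for illustration: the one-dimensional `energy` landscape stands in for a molecular energy function, the impulse schedule parameters are arbitrary, and the temperature is applied to the whole system rather than to a selected part of it.

```python
import math
import random

def energy(x):
    """Hypothetical 1D energy landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

def impulse_temperature(step, period=200, t_high=2.0, t_low=0.05, decay=0.97):
    """Temperature jumps to t_high at the start of each period, then relaxes towards t_low."""
    phase = step % period
    return t_low + (t_high - t_low) * decay ** phase

def anneal(x0, steps=2000, step_size=0.5, seed=0):
    """Metropolis Monte Carlo search driven by the impulse schedule; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for step in range(steps):
        t = impulse_temperature(step)
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = anneal(x0=4.0)
```

Comparing schedules then amounts to swapping `impulse_temperature` for another function of `step` and measuring how quickly low-energy states are found.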

Prerequisites:  Recommended for students who have taken the Probability and Computing and Geometric Modelling courses and have an interest in randomized search methods and their applications in structural biology.

Software development for computational structural biology Peter Minary B 2017-18  C

Several projects are available to implement new algorithms or protocols in the MOSAICS (http://www.cs.ox.ac.uk/mosaics) software package. The first project would aim to develop new analysis tools to interpret simulation results. For example, the new protocol would take as input a structural similarity measure and a trajectory of simulated conformations, and would produce as output a measure of the structural diversity of the conformations visited. In the second project, students would implement and compare different physical models to describe hydrogen bonding, which is among the most important canonical interactions that stabilize the double helical DNA.

Prerequisites:  

The project is recommended for students who have taken the Geometric Modelling and Computer Animation courses and have some interest in Numerical Methods (e.g. solution of Ordinary Differential Equations) and their applications to atomistic simulations.

Tools and applications in computational epigenetics Peter Minary B 2017-18  C

Chemical modifications such as (hydroxy)methylation on nucleic acids are used by the cell for silencing and activation of genes. These so-called epigenetic marks can be recognized by ‘protein readers’ indirectly due to their structural ‘imprints’, the effects they impose on DNA structure. The project includes the development of computational protocols to assess the effect of epigenetic modifications on DNA structure. This research may shed light on how different epigenetic modifications affect the helical parameters of the double stranded DNA.

Prerequisites:

Strong interest in visualizing, analysing and comparing 3D objects and modelling molecular structures.  The project can be tailored to suit those from a variety of backgrounds, but students would benefit from having taken the following courses: Computer Graphics, Geometric Modelling and Computer Animation.

An Electronic Commerce Protocol Hanno Nickau B 2017-18  C

Commercial use of the Internet is becoming more and more common, with an increasing variety of goods becoming available for purchase over the Net. Clearly, we want such purchases to be carried out securely: a customer wants to be sure of what (s)he's buying and the price (s)he's paying; the merchant wants to be sure of receiving payment; both sides want to end up with evidence of the transaction, in case the other side denies it took place; the act of purchase should not leak secrets, such as credit card details, to an eavesdropper.

The aim of this project is to find out more about the protocols that are used for electronic commerce, and to implement a simple e-commerce protocol. In more detail:

  • Understand the requirements of e-commerce protocols;
  • Specify an e-commerce protocol, both in terms of its functional and security requirements;
  • Understand cryptographic techniques;
  • Understand how these cryptographic techniques can be combined together to create a secure protocol - and understand the weaknesses that allow some protocols to be attacked;
  • Design a protocol to meet the requirements identified;
  • Implement the protocol.

A variant of this project would be to implement a protocol for voting on the web (which would have a different set of security properties).

Prerequisites for this project include good program design and implementation skills, including some experience of object-oriented programming, and a willingness to learn about protocols and cryptography. The courses on concurrency and distributed systems provide useful background for this project.

Reading: Jonathan Knudsen, Java Cryptography, O'Reilly, 1998.

Efficient processing of nested collections in Apache Spark Milos Nikolic C

Apache Spark is today one of the most popular frameworks for large-scale data analysis. Spark offers a functional (Scala-like) API for processing data collections that are distributed over a cluster of machines. Its declarative approach, domain-specific libraries (e.g., for machine learning and graph processing), and high performance have enabled its wide adoption in the industry.

Although Spark can transform collections of arbitrary types, it can exhibit severe performance problems when processing nested data formats such as JSON and XML. In particular, distributed processing of datasets where nested collections have skewed cardinalities (e.g., one extremely large, others small nested collections) leads to uneven distribution of work among the machines. In such cases, developers typically have to undergo a painful process of manual query re-writing to avoid load imbalance for large inner collections in their workloads. This project aims to extend the Spark API with new functionality that would automatically transform user queries to avoid data skews. This project is a great opportunity for students to understand how Apache Spark works under the hood and to contribute to an open-source project.
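The load imbalance described for nested collections can be illustrated without Spark at all. The sketch below is plain Python and does not use or imitate the Spark API; the round-robin `partition_loads` helper and the dataset sizes are hypothetical. It compares assigning whole outer records to workers with assigning individual inner elements after flattening.

```python
def partition_loads(task_sizes, workers):
    """Assign tasks round-robin and return the total work per worker."""
    loads = [0] * workers
    for k, size in enumerate(task_sizes):
        loads[k % workers] += size
    return loads

# Hypothetical dataset: one outer record with a huge nested collection, seven small ones.
nested = [[0] * 1000] + [[0] * 10 for _ in range(7)]

# Strategy A: one task per outer record, so a nested collection stays with its parent.
per_record = partition_loads([len(inner) for inner in nested], workers=4)

# Strategy B: flatten first, so each inner element is its own unit of work.
flat = [item for inner in nested for item in inner]
per_element = partition_loads([1] * len(flat), workers=4)

print(per_record)   # [1010, 20, 20, 20]
print(per_element)  # [268, 268, 267, 267]
```

Flattening balances the work but loses the grouping by parent record, which then has to be restored; automating that kind of rewrite is what the project proposes.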

Experimental evaluation of hashmap implementations Milos Nikolic B 2017-18  C This project will study existing in-memory hashmap implementations (e.g., GCC, Boost, Google's sparsehash) and experimentally evaluate their performance. This evaluation aims to provide deeper insights into the behavior of these hashmaps under different types of workloads using performance profiling tools (e.g., Valgrind). Armed with this knowledge, the student will implement a hashmap with multi-index access optimised for use in real-time (streaming) environments. Prerequisites: solid programming skills (C++ preferable)
Parallel query evaluation in streaming environments Milos Nikolic C Many modern applications such as social web and IoT applications use stream processing engines to handle continuously arriving data in real time. In this project, we consider one such state-of-the-art engine, called DBToaster, which continuously evaluates (static) queries over changing (dynamic) data. The goal of this project is to enable multi-core query processing over streaming data in DBToaster, which requires implementing necessary primitives for parallelization of the existing single-threaded query evaluation procedures.
Factorised Databases Dan Olteanu B 2017-18  C

More details can be found at (http://www.cs.ox.ac.uk/people/dan.olteanu.html) and Dr Olteanu would be happy to discuss specific projects within the aforementioned topics with interested students.

 

Probabilistic databases (MayBMS, SPROUT) Dan Olteanu B 2017-18  C

Please see Dr Dan Olteanu's web page (http://www.cs.ox.ac.uk/people/dan.olteanu.html) for further details

Branching Temporal Logics, Automata and Games Luke Ong B 2017-18  C

Model checking has emerged as a powerful method for the formal verification of programs. Temporal logics such as CTL (computation tree logic) and CTL* are widely used to specify programs because they are expressive and easy to understand. Given an abstract model of a program, a model checker (which typically implements the acceptance problem for a class of automata) verifies whether the model meets a given specification. A conceptually attractive method for solving the model checking problem is by reducing it to the solution of (a suitable subclass of) parity games. These are a type of two-player infinite game played on a finite graph. The project concerns the connexions between the temporal logics CTL and / or CTL*, automata, and games. Some of the following directions may be explored.

1. Representing CTL / CTL* as classes of alternating tree automata.
2. Inter-translation between CTL / CTL* and classes of alternating tree automata.
3. Using Büchi games and other subclasses of parity games to analyse the CTL / CTL* model checking problem.
4. Efficient implementation of model checking algorithms.
5. Application of the model checker to higher-order model checking.
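For orientation, the textbook fixpoint characterisation of two CTL operators can be sketched in a few lines. This is the standard explicit-state labelling approach, not the automata- or game-based methods the project is about; the four-state transition system and the sets `goal` and `safe` are made up for illustration.

```python
def pre_exists(transitions, target):
    """States with at least one successor in target (the EX operator)."""
    return {s for s, succs in transitions.items() if succs & target}

def check_eu(transitions, left, right):
    """States satisfying E[left U right], computed as a least fixpoint."""
    result = set(right)
    while True:
        new = result | (left & pre_exists(transitions, result))
        if new == result:
            return result
        result = new

def check_eg(transitions, phi):
    """States satisfying EG phi, computed as a greatest fixpoint."""
    result = set(phi)
    while True:
        new = result & pre_exists(transitions, result)
        if new == result:
            return result
        result = new

# Hypothetical four-state Kripke structure; every state has a successor.
transitions = {0: {1, 2}, 1: {1}, 2: {3}, 3: {0}}
states = set(transitions)
goal = {3}
safe = {0, 1, 2}

ef_goal = check_eu(transitions, states, goal)  # EF goal is E[true U goal]
eg_safe = check_eg(transitions, safe)
print(ef_goal, eg_safe)  # {0, 2, 3} {0, 1}
```

State 1 can never reach the goal because it only loops on itself, while states 0 and 1 admit an infinite path that stays inside `safe`.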

References:

Orna Kupferman, Moshe Y. Vardi, Pierre Wolper: An automata-theoretic approach to branching-time model checking. J. ACM 47(2): 312-360 (2000).
http://dx.doi.org/10.1145/333979.333987

Rachel Bailey: A Comparative Study of Algorithms for Solving Büchi Games. University of Oxford MSc Dissertation, 2010.
http://www.cs.ox.ac.uk/people/luke.ong/personal/publications/RachelBailey_MScdissertation.pdf

Luke Ong's Projects Luke Ong B 2017-18  C See http://users.comlab.ox.ac.uk/luke.ong/projects/ for more information
Building databases of mathematical objects in Sagemath (Python) Dmitrii Pasechnik B 2017-18  C

There is an enormous amount of information on constructing various sorts of mathematical objects that are "interesting" in one way or another, e.g. block designs, linear and non-linear codes, Hadamard matrices, elliptic curves, etc.  There is considerable interest in having this information available in computer-ready form.  However, usually the only available form is a paper describing the construction, while no computer code (and often no detailed description of a possible implementation) is provided. This provides interesting algorithmic and software engineering challenges in creating verifiable implementations; properly structured and documented code, supplemented by unit tests, has to be provided, preferably in functional programming style (although performance is important too).

 

The Sagemath project aims in part to remedy this by implementing such constructions; see e.g. Hadamard matrices in Sagemath: http://doc.sagemath.org/html/en/reference/combinat/sage/combinat/matrices/hadamard_matrix.html and http://arxiv.org/abs/1601.00181.
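To give a flavour of what a verifiable implementation of such a construction looks like, here is a minimal sketch in plain Python (not the Sagemath code, and not using its API) of Sylvester's doubling construction for Hadamard matrices, paired with an independent checker. The function names are chosen for this illustration.

```python
def sylvester_hadamard(k):
    """Return the 2^k x 2^k Hadamard matrix built by Sylvester's doubling construction."""
    h = [[1]]
    for _ in range(k):
        # Doubling step: H -> [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def is_hadamard(h):
    """Check that h is square with entries +1/-1 and that H H^T = n I."""
    n = len(h)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in h):
        return False
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )

h8 = sylvester_hadamard(3)
print(len(h8), is_hadamard(h8))  # 8 True
```

Keeping the checker separate from the constructor means the same test can validate any other construction from the literature, which is the role unit tests play in the project.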

 

The project will contribute to such implementations.

There might be a possibility of participating in Google Summer of Code (GSoC), with Sagemath as a GSoC organisation, and of being partially funded by the EU project "Open Digital Research Environment Toolkit for the Advancement of Mathematics" (http://opendreamkit.org/).

Prerequisites: Interest in open source software, some knowledge of Python, some maths background.

Computing with semi-algebraic sets in Sage (math) Dmitrii Pasechnik B 2017-18  C

Semi-algebraic sets are subsets of R^n specified by polynomial inequalities. The project will extend the capabilities of Sage (http://www.sagemath.org) to deal with them, for example through cylindrical algebraic decomposition (CAD) computations or methods based on sums of squares (i.e. on semi-definite programming). There might be a possibility of participating in Google Summer of Code (GSoC) or Google Semester of Code with Sage as a GSoC organisation.

Prerequisites

Interest in open source software, some knowledge of Python, appropriate maths background.

Implementing and experimenting with variants of Weisfeiler-Leman graph stabilisation Dmitrii Pasechnik B 2017-18  C Weisfeiler-Leman graph stabilisation is a procedure that plays an important role in modern graph isomorphism algorithms (L. Babai 2015) and has recently attracted attention in the machine learning community (cf. "Weisfeiler-Lehman graph kernels"). While it is relatively easy to implement, efficient portable implementations seem to be hard to find. In this project we will work on producing such an open-source implementation, either by reworking our old code from http://arxiv.org/abs/1002.1921 or by producing a new one; we will also work on providing Python/Cython interfaces for possible inclusion in the SageMath (http://sagemath.org) library, and/or elsewhere.
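The simplest member of this family, one-dimensional colour refinement, can be sketched as below. This is an unoptimised illustration with names chosen here, assuming an undirected graph given as a dictionary of neighbour sets; it is not the implementation referred to in the project, which concerns more general and more efficient variants.

```python
from collections import Counter

def colour_refinement(adjacency):
    """Return stable vertex colours (small integers) for an undirected graph.

    adjacency maps each vertex to the set of its neighbours.
    """
    colours = {v: 0 for v in adjacency}
    while True:
        # A vertex's signature is its own colour plus the sorted multiset of neighbour colours.
        signatures = {
            v: (colours[v], tuple(sorted(colours[u] for u in adjacency[v])))
            for v in adjacency
        }
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        refined = {v: palette[signatures[v]] for v in adjacency}
        # Refinement only ever splits classes, so an unchanged class count means stability.
        if len(set(refined.values())) == len(set(colours.values())):
            return refined
        colours = refined

def colour_histogram(adjacency):
    """Multiset of stable colours, usable as a cheap isomorphism invariant."""
    return Counter(colour_refinement(adjacency).values())

path = {0: {1}, 1: {0, 2}, 2: {1}}
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
```

On `path` the middle vertex is separated from the two ends. On `hexagon` and `two_triangles` the histograms coincide even though the graphs are not isomorphic, which is the well-known limitation of the one-dimensional procedure on regular graphs.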
A user-friendly lung simulation for the inspired sinewave device to measure lung function and pulmonary blood flow Phi Phan C

The inspired sinewave device is a novel non-invasive medical device to measure lung function and pulmonary blood flow. It works by delivering small doses of tracer gases into the patient's breaths and measuring the responses in the expired breaths. The device is being developed towards commercialisation. A lung simulation that could be used by people who are not computer scientists (such as nurses and medical doctors) would be a useful addition to the technology.

Objectives: To develop a user-friendly lung simulation to help predict the responses of the inspired sinewave tests in various lung conditions, from healthy to diseased. The simulation consists of 2 key parts: the GUI and the lung model.

The GUI:

(i) The GUI would allow users to enter different inspired sinewave test settings and change parameters of the lung model.
(ii) The output of the simulation would be a visualisation of the simulated test and also an Excel file of simulated data for back-testing of the device algorithms (already developed).

The lung model: the mathematical equations for the model have already been established and will need to be implemented in either Matlab or Simulink. Details of the mathematical lung model are available upon request.

Prerequisites: The student should:

- be competent in Matlab (Simulink would be a bonus).
- be competent in implementing differential equations.
- have an interest in computational biology and mathematical modelling.

An interactive visual tutorial for Bayesian parameter estimation Joe Pitt-Francis, Michael Clerx B 2017-18  C The aim of this project is to build an educational tool which enables the progress of a Bayesian parameter estimation algorithm to be visualised. The model to be fitted might be (but is not limited to) a system of Ordinary Differential Equations and the Bayesian estimation tools might be built around an existing system such as Stan, PyML or Edward. A good tutorial system should let the user change the underlying model system, introduce noise to a system, visualise interactive updates to probability distributions, explore the progress of a chosen sampling method such as Metropolis-Hastings, and provide enough information that a novice student can gain an intuition for all aspects of the process.
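The kind of sampler whose progress such a tutorial would visualise can be sketched as follows. The model here is deliberately trivial and entirely assumed for illustration (estimating the mean of Gaussian data with known noise and a broad Gaussian prior); it is written in plain Python and does not use Stan, PyML or Edward.

```python
import math
import random

def log_posterior(theta, data, sigma=1.0, prior_sd=10.0):
    """Log posterior (up to a constant) for the mean of Gaussian data with a Gaussian prior."""
    log_prior = -0.5 * (theta / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - theta) / sigma) ** 2 for y in data)
    return log_prior + log_lik

def metropolis_hastings(data, steps=5000, proposal_sd=0.5, seed=1):
    """Random-walk Metropolis-Hastings; returns the full chain, including repeated states."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, data)
    samples = []
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, proposal_sd)
        candidate = log_posterior(proposal, data)
        # The proposal is symmetric, so the acceptance probability is min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

# Hypothetical observations scattered around 2.1.
observations = [1.8, 2.1, 2.4, 1.9, 2.3]
chain = metropolis_hastings(observations)
burned_in = chain[1000:]
posterior_mean = sum(burned_in) / len(burned_in)
```

Plotting `chain` against the iteration number, or a running histogram of `burned_in`, is the sort of interactive view the project describes.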
Automatic translation to GPGPU Joe Pitt-Francis B 2017-18  C

This project involves running cardiac cell models on a high-end GPU card. Each model simulates the electrophysiology of a single heart cell and can be subjected to a series of computational experiments (such as being paced at particular heart rates). For more information about the science and to see it in action on CPU see "Cardiac Electrophysiology Web Lab" at https://travis.cs.ox.ac.uk/FunctionalCuration/ An existing compiler (implemented in Python) is able to translate from a domain specific XML language (http://models.cellml.org) into a C++ implementation. The goal of the project is to add functionality to the compiler in order to get OpenCL or CUDA implementations of the same cell models and to thus increase the efficiency of the "Web Lab".

General graphics projects Joe Pitt-Francis B 2017-18  C

I am interested in supervising general projects in the area of computer graphics.  If you have a particular area of graphics-related research that you are keen to explore then we can tailor a bespoke project for you.  Specific projects I have supervised in the past include "natural tree generation" which involved using Lindenmayer systems to grow realistic looking bushes and trees to be rendered in a scene; "procedural landscape generation" in which an island world could be generated on-the-fly using a set of simple rules as a user explored it; "gesture recognition" where a human could control a simple interface using hand-gestures; "parallel ray-tracing" on distributed-memory clusters and using multiple threads on a GPU card; "radiosity modelling" used for analysing the distribution of RFID radio signal inside a building; and "non-photorealistic rendering" where various models were rendered with toon/cel shaders and a set of pencil-sketch shaders.
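For readers unfamiliar with Lindenmayer systems, the core idea is repeated parallel string rewriting. The sketch below is illustrative only and is not taken from the past project. The rule set is a hypothetical example using the conventional turtle symbols (`F` draw forward, `+`/`-` turn, `[`/`]` save and restore position).

```python
def expand_lsystem(axiom, rules, iterations):
    """Apply the rewriting rules to every symbol in parallel, `iterations` times.

    Symbols without a rule are copied unchanged.
    """
    current = axiom
    for _ in range(iterations):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


# Hypothetical branching rules: X is a growth point, F is a drawn segment.
PLANT_RULES = {"X": "F[+X][-X]FX", "F": "FF"}
```

Calling `expand_lsystem("X", PLANT_RULES, 4)` yields a long string that a turtle-graphics interpreter could turn into a bush-like shape. Rendering is a separate step not shown here.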

Graphics pipeline animator Joe Pitt-Francis B 2017-18  C

Pre-requisites: Computer graphics, Object-oriented programming

The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then she moves the slider down the pipeline, watching an animation of the process.

Intuitive exploration through novel visualisation Joe Pitt-Francis B 2017-18  C I am interested in novel visualisation as a way to represent things in a more appealing and intuitive way. For example, the Gnome disk usage analyzer (Baobab) uses either a "ring chart" or a "treemap chart" representation to show us which sub-folders are using the most disk space. In the early 1990s the IRIX file system navigator used a 3D skyscraper representation to show us similar information. There are plenty more ways of representing disk usage: from DAGs to centralised Voronoi diagrams. What kind of representation is most intuitive for finding a file which is hogging disk space, and which is most intuitive for helping us to remember where something is located in the file-system tree? The aim is to explore other places where visualisation can aid intuition: for example, to visualise the output of a profiler to find bottlenecks in software, to visualise the output of a code coverage tool in order to check that test suites are testing the appropriate functionality, or even to visualise the prevalence of diabetes and heart disease in various regions of the country.
Parsing and reinforcement learning Stephen Pulman C Incremental dependency parsers such as those described in http://www.mitpressjournals.org/doi/pdf/10.1162/coli.07-056-R1-07-027 typically try to predict what parsing action to take next by training a classifier which will look ahead in the input, and at the current parse state, and make a choice between actions. In many current non-linguistic applications, however, this kind of "what action do I perform next" question is answered by using reinforcement learning, where the system learns a reward function from training data that should bias it towards the action most likely to lead to a successful conclusion. This project aims to experiment to see whether a reinforcement learning decision component could lead to better parsing performance than the more usual classifier-based decisions.
Computation of particle interactions in n-dimensional space Martin Robinson C

Aboria (https://github.com/martinjrobins/Aboria) is a C++ library for evaluating and solving systems of equations that can be described as interactions between particles in n-dimensional space. It can be used as a high performance library to implement numerical methods such as Molecular Dynamics in computational chemistry, or Gaussian Processes for machine learning.

Project 1: Aboria features a radial neighbour search to find nearby particles in the n-dimensional space, in order to calculate their interactions. This project will implement a new algorithm based on calculating the interactions between neighbouring *clusters* of particles. Its performance will be compared against the existing implementation, and across the different spatial data structures used by Aboria (Cell-list, Octree, Kdtree). Prerequisites: C++

Project 2: Aboria features a serial Fast Multipole Algorithm (FMM) for evaluating smooth long range interactions between particles. This project will implement and profile a parallel FMM algorithm using CUDA and/or the Thrust library.

Prerequisites: C++, Knowledge of GPU programming using CUDA and/or Thrust

Project 3: The main bottleneck of the FMM is the interactions between well-separated particle clusters, which can be described as low-rank matrix operations. This project will explore different methods of compressing these matrices in order to improve performance, using either Singular Value Decomposition (SVD), Randomised SVD, or Adaptive Cross Approximation.

Prerequisites: C++, Linear Algebra
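To illustrate why the interaction between well-separated clusters is compressible, the following sketch builds a small interaction matrix and truncates its SVD. It is written in Python with NumPy for brevity and does not use Aboria's API. The 1/r kernel, the 1D point sets and the function names are assumptions chosen for the example.

```python
import numpy as np


def interaction_matrix(sources, targets):
    """Dense 1/r kernel matrix between two 1D point sets (illustrative kernel)."""
    return 1.0 / np.abs(targets[:, None] - sources[None, :])


def truncated_svd(matrix, rank):
    """Return factors (left, right) with matrix ~= left @ right, keeping `rank` terms."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Fold the singular values into the left factor.
    return u[:, :rank] * s[:rank], vt[:rank, :]


def relative_error(matrix, left, right):
    """Relative Frobenius-norm error of the low-rank approximation."""
    return np.linalg.norm(matrix - left @ right) / np.linalg.norm(matrix)
```

For two clusters that are far apart relative to their size, a handful of singular values typically captures the matrix to high accuracy, so storing the two factors costs `rank * (m + n)` numbers instead of `m * n`. Randomised SVD and Adaptive Cross Approximation aim at similar factorisations without computing a full SVD first.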

Verification of Concurrent Software based on Partial-Order Semantics César Rodríguez, Daniel Kroening B 2017-18  C

Description: A Boolean program is one where all variables are of Boolean type. In the context of formal verification of concurrent software, concurrent Boolean programs (CBPs) are generated by a process of abstracting the original program, in such a way that the original analysis problem reduces to an equivalent problem on the CBP. The unfolding technique is a well-studied verification method for another model of concurrency called Petri nets. Unfoldings represent the behaviour of a Petri net in a compact way, by means of a partial order enriched with additional information. Not only is this representation theoretically neat, it is also very efficient for practical verification. The goal of the project is to apply the unfolding technique to the verification of CBPs, and compare it with existing verification techniques for CBPs.

Prerequisites: suitable for students having followed the course "Computer-Aided Formal Verification"

A processor design toolkit for an open-source FPGA toolchain Alex Rogers B 2017-18  C Field-programmable gate arrays (FPGAs) are integrated circuits that can be configured after manufacture into almost any conceivable logic circuit combining both combinatorial and synchronous elements. They can be used to explore real hardware implementations of processor designs from simple accumulator machines, through to register machines, and more unusual stack machines. Their use outside industry has previously been limited due to the high cost and complexity of their associated development software. However, this is changing, with an open-source toolchain for popular FPGA devices becoming available in the last year (the equivalent of Linux in the OS world). This project will build on this toolchain to develop the additional software necessary to allow students to design and explore simple processor designs on a custom FPGA development board. The current toolchain requires the use of the Verilog hardware description language. Verilog is very powerful but also very general. This project will develop a high-level language (possibly a graphical language) focused on the development of simple processor designs from a small number of standard components (such as RAM, multiplexers and registers). Prerequisites: Digital Systems, Compilers & Computer Architecture useful but not essential.
Convolution neural networks for microcontrollers and constrained hardware Alex Rogers B 2017-18  C Convolution neural networks have made dramatic advances in recent years on many image and vision processing tasks. While training such networks is computationally expensive (typically requiring very large image datasets and exploiting GPU acceleration), they can often be deployed on much simpler hardware if simplifications such as integer or even binary weights are imposed on the network. This project will explore the deployment of trained convolution networks on microcontrollers (and possibly also FPGA-based hardware) with the intention of demonstrating useful image processing (perhaps recognising the presence of a face in the field of view of a low pixel camera) on low-power devices. Prerequisites: Machine Learning & Computer Architecture useful but not essential.
Resurrecting Extinct Computers Alex Rogers B 2017-18  C

While the architecture of current reduced instruction set processors is well established, and relatively static, the early days of computing saw extensive experimentation and exploration of alternative designs. Commercial processors developed during the 1960s, 1970s and 1980s included stack machines, LISP machines and massively parallel machines, such as the Connection Machine (CM-1) consisting of 65,536 individual one-bit processors connected together as a 12-dimensional hypercube. This period also saw the development of the first single chip microprocessors, such as the Intel 4004, and the first personal computers, such as the Altair 8800 using the Intel 8080 microprocessor. This project will attempt to resurrect one of these extinct designs (or a scaled down version if necessary) using a modern low-cost field-programmable gate array (FPGA). You will be required to research the chosen processor, using both original and modern sources, and then use Verilog to develop a register level description of the device that can be implemented on an FPGA. The final device should be able to run the software of the original and could be realised in a number of different forms depending on the chosen processor (e.g. an Altair 8800 on a small USB stick running Microsoft BASIC). Prerequisites: Digital Systems or Computer Architecture useful but not essential.

Units of Measure as Types within an Interactive Programming Environment Alex Rogers B 2017-18  C Being able to define the units of constants and variables in a programming language has great value in many applications. NASA's Mars Climate Orbiter was lost in 1999 due to software that calculated trajectory thruster firings in pound-seconds, rather than newton-seconds. In other cases, dimensional analysis (statically checking that the computed units match those that are expected) is sufficient to catch many errors in calculations. While F# supports units natively, and libraries exist in many other languages (e.g. Java, Haskell, Python), none are particularly easy to use, and they often introduce clumsy syntax. This project will improve on these approaches. You will be required to develop a unit-aware interactive programming environment enabling unit-safe physics based calculations to be performed. This might be a stand-alone solution, or a kernel for an existing interactive computing environment such as Project Jupyter (jupyter.org). Prerequisites: Compilers useful but not essential.
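The basic mechanism behind dimensional analysis can be sketched as values that carry a vector of unit exponents. The example below is a hypothetical minimal design, not an existing library. It checks units at run time, whereas the project description refers to static checking, which would require support from the language or environment.

```python
from dataclasses import dataclass

# Dimension vectors hold the exponents of (kilogram, metre, second).
NEWTON = (1, 1, -2)
SECOND = (0, 0, 1)
NEWTON_SECOND = (1, 1, -1)

LBF_IN_NEWTONS = 4.4482216152605  # one pound-force expressed in newtons


@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple

    def __add__(self, other):
        # Addition is only meaningful between identical dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add dimensions {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds the exponents of each base unit.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)


def from_pound_force_seconds(value):
    """Convert an impulse given in pound-force seconds to newton-seconds."""
    return Quantity(value * LBF_IN_NEWTONS, NEWTON_SECOND)
```

With this sketch, `Quantity(2.0, NEWTON) * Quantity(3.0, SECOND)` produces an impulse in newton-seconds, while adding a force to a time raises `TypeError`. Because a raw imperial figure must pass through an explicit conversion function, the unit mismatch is visible in the code.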
Modelling and verifying systems in Timed CSP and FDR Bill Roscoe B 2017-18  C

Timed CSP reinterprets the CSP language in a real-time setting and has a semantics in which the exact times of events are recorded as well as their order. Originally devised in the 1980s, it has only just been implemented as an alternative mode for FDR. The objective of this project is to take one or more examples of timed concurrent systems from the literature, implement them in Timed CSP, and where possible compare the performance of these models with similar examples running on other analysis tools such as Uppaal.

References:

Understanding Concurrent Systems (especially Chapter 15) and Model Checking Timed CSP, both from AWR's web list of publications.

A simple object-oriented language Michael Spivey B 2017-18  C

Use Keiko to implement a simple language that is purely object-oriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

Better JIT translation Michael Spivey B 2017-18  C

The existing JIT for Keiko is very simple-minded, and does little more than translate each bytecode into the corresponding machine code. Either improve the translation by using one of the many JIT libraries now available, or adjust the Oberon compiler and the specification of the bytecode machine to free it of restrictive assumptions and produce a better pure-JIT implementation.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

Better performance for GeomLab Michael Spivey B 2017-18  C

At present, GeomLab programs show a performance that competes favourably with Python, making it possible to address tasks like computing images of the Mandelbrot set using a purely functional program that calls a function once for each pixel. But there is still a gap between the performance of GeomLab programs and similar ones written in Java or C, and more ambitious image-processing tasks would be made possible by better performance, particularly in the area of arithmetic. Explore ways of improving performance, perhaps including the possibility of improving the performance of GeomLab by allowing numbers to be passed around without wrapping them in heap-allocated objects, or the possibility of compiling the code for Haskell-style pattern matching in a better way.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

Eliminating array bounds checks Michael Spivey B 2017-18  C

The Oberon compiler inserts code into every array access and every pointer dereference to check for runtime errors, like a subscript that is out of bounds or a pointer that is null. In many cases, it is possible to eliminate the checks because it is possible to determine from the program that no error can occur. For example, an array access inside a FOR loop may be safe given the bounds of the loop, and several uses of the same pointer in successive statements may be able to share one check that the pointer is non-null. Modify the Oberon compiler (or a simpler one taken from the Compilers labs) so that it represents the checks explicitly in its IR, and introduce a pass that removes unnecessary checks, so speeding up the code without compromising safety.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

GeomLab and Mindstorms Michael Spivey B 2017-18  C

GeomLab has a turtle graphics feature, but the pictures are drawn only on the screen. It should be possible to make a turtle out of Lego Mindstorms, then control it with an instance of GeomLab running on a host computer, with communication over Bluetooth.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

GeomLab on Android Michael Spivey B 2017-18  C

Produce an implementation of GeomLab's GUI and graphics library that works on the Android platform. Either use an interpreter for GeomLab's intermediate code to execute GeomLab programs, or investigate dynamic translation of the intermediate code into code for Android's virtual machine Dalvik.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

Heap-based activation records Michael Spivey B 2017-18  C

At present, Keiko supports only conventional Pascal-like language implementations that store activation records on a stack. Experiment with an implementation where activation records are heap-allocated (and therefore recovered by a garbage collector), procedures are genuinely first-class citizens that can be returned as results in addition to being passed as arguments, and tail recursion is optimised seamlessly.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

Keiko on Mindstorms Michael Spivey B 2017-18  C

Alternative firmware for the Mindstorms robot controller provides an implementation of the JVM, allowing Java programs to run on the controller, subject to some restrictions. Using this firmware as a guide, produce an interpreter for a suitable bytecode, perhaps some variant of Keiko, allowing Oberon or another robot language of your own design to run on the controller. Aim to support the buttons and display at first, and perhaps add control of the motors and sensors later.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

Type-checking for GeomLab Michael Spivey B 2017-18  C

The GeomLab language is untyped, leading to errors when expressions are evaluated that would be better caught at an earlier stage. Most GeomLab programs, however, follow relatively simple typing rules. The aim in this project is to write a polymorphic type checker for GeomLab and integrate it into the GeomLab system, which is implemented in Java. A simple implementation of the type-checker would wait until an expression is about to be evaluated, and type-check the whole program at that point. As an extension of the project, you could investigate whether it is possible to type-check function definitions one at a time, even when some of the functions they call have not yet been defined.

Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.

N.B. This project is not available to MSc students in 2016-17.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical.

A Machine Learning Approach to Personalizing Education: Improving Individual Learning through Tracking and Course Recommendation Mihaela van der Schaar, Edith Elkind C More students are enrolling in college and professional degree programs than ever before. However, current degree programs are often “one size fits all”; such programs ignore the heterogeneity of students in terms of backgrounds, abilities, learning styles and career goals. Moreover, because of ever-increasing student/teacher ratios, students are often left struggling to find their own pathways through degree programs. The combination leads to poor learning outcomes, low engagement, dissatisfaction and high dropout rates. In this project, an interactive electronic system will be built that is personalized for each student, is able to continuously track progress and goals, capitalize on the knowledge accumulated, and recommend suitable courses and activities in order to build skills, enhance interest and promote long-term goals. In effect, our personalized interactive system operates as if there were a dedicated mentor for each student. To build this system, the following modules will need to be developed: (1) student and course similarity discovery methods; (2) student performance prediction algorithms; (3) personalized course recommendation algorithms. To read more about the role of machine learning in education – see medianetlab.ee.ucla.edu/EduAdvance Prerequisites: This project is suitable for someone with at least basic knowledge of machine learning.
Detecting structure in voters' preferences Mihaela van der Schaar, Edith Elkind B 2017-18  C A large international community of researchers is trying to use computers to allow groups of people (or groups of automated agents) to make better joint decisions. This is a hard problem, since the preferences that agents report might contradict each other, and this leads to so-called voting paradoxes. Also, it can be computationally hard to calculate what decisions to make. A promising way to tackle this problem is by exploiting structure in the reported preferences. For this purpose, Australian researchers have collected an impressive amount of real-world preference data (PrefLib http://www.preflib.org/) comprising over 3,000 data sets coming from user preferences taken from places as diverse as rating systems at Netflix and TripAdvisor, and real political election data, with the aim of figuring out how we might use properties of preferences that actually occur in the real world. This project is about analysing this data to reveal how much structure is contained in these preferences. Technically, we are interested in figuring out how close the preferences reported are to what are known as single-peaked and/or single-crossing preferences (see, e.g., https://en.wikipedia.org/wiki/Single_peaked_preferences). There are different measures of closeness, and for many of them the associated decision problem is NP-hard; for others, the computational complexity is not known. Prerequisites: There are several ways in which this project can be pursued. On the one hand, one can consider the notions of closeness for which the complexity of the associated problem is not known, and try to develop an efficient algorithm or prove NP-hardness.
On the other hand, one can try to develop practical algorithms for detecting profiles that are almost single-peaked/single-crossing, by encoding the associated problem as an instance of SAT or an integer linear program and running a respective solver; such algorithms could then be applied to PrefLib data.
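To make the notion of closeness concrete, the sketch below checks whether a ranking is single-peaked with respect to a given left-to-right axis of candidates. It then computes one possible closeness measure, the minimum number of voters to delete, by brute force over all axes. This is only an illustration for very small candidate sets: it is neither the SAT nor the integer programming encoding mentioned above, and the function names are hypothetical.

```python
from itertools import permutations


def is_single_peaked(ranking, axis):
    """True if `ranking` (most preferred first) is single-peaked on `axis`.

    Equivalent test: every prefix of the ranking occupies a contiguous
    interval of the axis, so each new candidate must extend the interval
    by one position on the left or on the right.
    """
    position = {candidate: i for i, candidate in enumerate(axis)}
    lo = hi = position[ranking[0]]
    for candidate in ranking[1:]:
        p = position[candidate]
        if p == lo - 1:
            lo = p
        elif p == hi + 1:
            hi = p
        else:
            return False
    return True


def min_voter_deletions(profile):
    """Fewest rankings to remove so that the rest are single-peaked on some axis.

    Tries every axis, so the cost grows factorially with the number of candidates.
    """
    candidates = profile[0]
    best = len(profile)
    for axis in permutations(candidates):
        violations = sum(not is_single_peaked(r, axis) for r in profile)
        best = min(best, violations)
    return best
```

For example, the profile `["abc", "cba", "bac", "acb"]` is not single-peaked on any axis, but removing one voter is enough. A practical tool for PrefLib-sized data would replace the factorial search over axes with a solver-based encoding.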
Machine learning for finance Mihaela van der Schaar, Edith Elkind C This project aims to use machine learning techniques such as ensemble learning, convolutional neural networks etc. to predict spot prices for a variety of industries. Machine learning is increasingly used in finance to make predictions as well as to aggregate among existing strategies for making investments over time. We will use various free as well as proprietary data sets to assess the value of our newly developed methods in terms of both profit and risk, and compare them with state of the art techniques. This will also involve developing new “lucky factors” (features) that can be extracted from the data to inform and improve existing and new investment strategies. The expectation is that the work will lead to a conference publication. Prerequisites: This project is suitable for someone with at least basic knowledge of machine learning.
Revolutionizing medicine through machine learning – Using advanced graphical models for developing personalized policies for HIV screening and treatment Mihaela van der Schaar, Edith Elkind C The first part of this project aims to use advanced graphical models (including enhancements of Hidden Markov Models etc.) to discover personalized trajectories for HIV disease progression, using available electronic health record data. The second part of the project aims to learn how personalized treatment and screening plans can affect disease trajectories in the short run and in the long run, with the overall goal of identifying effective treatment plans for various types of patients. The dataset contains various types of patients and their responses to different medications over time. The project will also involve interacting with a renowned clinician specializing in HIV. In the short run, this work will lead to a publication in an important conference. In the longer run, this work – and more generally, the development of these methods – will change and advance the way medicine is practiced. To read more about the role of machine learning in medicine – see medianetlab.ee.ucla.edu/MedAdvance Prerequisites: This project is suitable for someone with at least basic knowledge of neural networks and/or machine learning.
Revolutionizing medicine through machine learning – Using convolutional neural networks for asthma Mihaela van der Schaar, Edith Elkind C This project aims to use convolutional neural networks to discover how to best treat patients with asthma, using available electronic health record data. The dataset contains information about various (types of) patients and their responses to different medications over time. The focus of the project is to train a convolutional neural network to identify how to best treat patients over time. The project will involve interacting with a renowned clinician specializing in asthma diagnosis and treatment. In the short run, this work will lead to a publication in an important conference. In the longer run, this work – and more generally, the development of these methods – will change and advance the way medicine is practiced. To read more about the role of machine learning in medicine – see medianetlab.ee.ucla.edu/MedAdvance Prerequisites: This project is suitable for someone with at least basic knowledge of neural networks and/or machine learning.
Deciding satisfiability of formulas in the guarded negation fragment Michael Vanden Boom B 2017-18  C

The guarded negation fragment of first-order logic is an expressive logic of interest in databases and knowledge representation. It has been shown to have a decidable satisfiability problem but, to the best of my knowledge, there is no tool actually implementing a decision procedure for it.

The goal would be to design a tool to determine whether or not a formula in this logic is satisfiable. Most likely, this would require designing and implementing a tableau-based algorithm, in the spirit of related tools for description logics and the guarded fragment.

Prerequisites: Logic and Proof (or equivalent). There are some connections to material in Knowledge Representation and Reasoning, but this is not essential background

Interpolation Michael Vanden Boom C

Let F1 and F2 be sentences (in first-order logic, say) such that F1 entails F2: that is, any model of F1 is also a model of F2. An interpolant is a sentence G such that F1 entails G, and G entails F2, but G only uses relations and functions that appear in *both* F1 and F2.

The goal in this project is to explore and implement procedures for constructing interpolants, particularly for certain decidable fragments of first-order logic. It turns out that finding interpolants like this has applications in some database query rewriting problems.

Prerequisites: Logic and Proof (or equivalent)

3D demos of geometric concepts Irina Voiculescu B (2018-19)  B 2017-18 The Geometric Modelling course (Part B) deals with several interesting concepts which would be best visualised in a suite of applications. A coherent suite of 3D demos could easily become a useful tool for this course, as well as for users worldwide. This project would be most suitable for a candidate who already has some experience using a 3D graphics library of their choice and wants to improve this skill. The mathematical concepts are well-documented.
3D environment for Hand Physiotherapy Irina Voiculescu B 2017-18  C After hand surgery, patients almost always need physiotherapy to help with their recovery. As part of this, the patient will need to perform hand exercises at home. However, the patient may not always do the exercises correctly, or they might forget to do their exercises. The goal of this project is to use the Leap Motion to create a user-friendly GUI which a patient could use to aid them with their home exercises. The interface would show the user where their hand should be and they would then need to follow the movements. It could be either web-based or downloaded software. It would need to be tailored to the patient so that it contained their specific required exercises, which could be input by the physiotherapist. It would need to store data on how the patient is doing and feed this data back to the patient, and possibly also to the physiotherapist via the internet. If internet-based, patient confidentiality and security would need to be considered. This project would be performed in close collaboration with a physiotherapist, an orthopaedic hand surgeon, and a post-doctoral researcher based at the Nuffield Orthopaedic Centre.
3D printing medical scan data Irina Voiculescu B 2017-18  C

Computed tomography (CT) scanning is a ubiquitous scanning modality. It produces volumes of data representing internal parts of a human body. Scans are usually output in a standard imaging format (DICOM) and come as a series of axial slices (i.e. slices across the length of the person's body, in planes perpendicular to the imaginary straight line along the person's spine.)

The slices most frequently come at a resolution of 512 x 512 voxels, achieving an accuracy of about 0.5 to 1mm of tissue per voxel, and can be viewed and analysed using a variety of tools. The distance between slices is a parameter of the scanning process and is typically much larger, about 5mm.

During the analysis of CT data volumes it is often useful to correct for the large spacing between slices. For example when preparing a model for 3D printing, the axial voxels would appear elongated. These could be corrected through an interpolation process along the spinal axis.

This project is about the interpolation process, either in the raw data output by the scanner, or in the post-processed data which is being prepared for further analysis or 3D printing.
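As a baseline for comparison, the simplest correction is linear interpolation between adjacent slices along the spinal axis. The sketch below assumes the volume is already loaded as a NumPy array with the slice index first. Reading DICOM files is not shown, and the function name and parameters are hypothetical.

```python
import numpy as np


def resample_slices(volume, slice_spacing_mm, target_spacing_mm):
    """Linearly interpolate a (slices, rows, cols) volume along axis 0.

    Returns a new volume whose slices are `target_spacing_mm` apart,
    covering the same extent as the input (needs at least two slices).
    """
    n_slices = volume.shape[0]
    old_z = np.arange(n_slices) * slice_spacing_mm
    # Small tolerance so the final original slice position is included.
    new_z = np.arange(0.0, old_z[-1] + 1e-9, target_spacing_mm)
    # Index of the original slice at or below each new position.
    lower = np.clip(np.searchsorted(old_z, new_z, side="right") - 1, 0, n_slices - 2)
    # Fractional distance from the lower slice towards the next one.
    t = ((new_z - old_z[lower]) / slice_spacing_mm)[:, None, None]
    return (1.0 - t) * volume[lower] + t * volume[lower + 1]
```

Resampling slices that are 5 mm apart to 1 mm spacing inserts four interpolated slices between each original pair. Smoother schemes, such as spline or shape-based interpolation, could be compared against this baseline with the user-controlled smoothness factor described below.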

The output models would ideally be files in a format compatible with 3D printing, such as STL. The main aesthetic feature of the output would be measurable as a smoothness factor, parameterisable by the user.

Existing DICOM image analysis software designed within the Spatial Reasoning Group at Oxford is available to use as part of the project.

3D stereo display of medical scan data Irina Voiculescu, Stuart Golodetz C The Medical Imaging research group has been working with a variety of data sourced from CT and MRI scans. This data comes in collections of (generally greyscale) slices which together make up 3D images. Our group has developed software to generate 3D models of the major organs in these images. This project aims to develop a simple augmented reality simulation for the Oculus Rift which will render these organs within a transparent model of a human and allow the user to walk around the model so as to view the organs from any angle. This has a number of possible applications, including training medical students and helping surgeons to explain medical procedures to their patients.
CUDA parallelisation of 3D reconstruction Irina Voiculescu, Stuart Golodetz B 2017-18 The Medical Imaging research group has been working with a variety of data sourced from CT and MRI scans. This data comes in collections of (generally greyscale) slices which together make up 3D images. Our group has developed software to generate 3D meshes of the major organs in these images. Because it is CPU-only, it is quite slow. The aim of this project is to speed it up with an optimised CUDA implementation on the GPU.
Exact Algorithms for Complex Root Isolation Irina Voiculescu C

Not available in 2013/14

Isolating the complex roots of a polynomial can be achieved using subdivision algorithms. Traditional Newton methods can be applied in conjunction with interval arithmetic. Previous work (jointly with Prof Chee Yap and MSc student Narayan Kamath) has compared the performance of three operators: Moore's, Krawczyk's and Hansen-Sengupta's. This work makes extensive use of the CORE library, which is a collection of C++ classes for exact computation with algebraic real numbers and arbitrary precision arithmetic. CORE defines multiple levels of operation over which a program can be compiled and executed. Each of these levels provides stronger guarantees on exactness, traded against efficiency. Further extensions of this work can include (and are not limited to): (1) Extending the range of applicability of the algorithm at CORE's Level 1; (2) Making an automatic transition from CORE's Level 1 to the more detailed Level 2 when extra precision becomes necessary; (3) Designing efficiency optimisations to the current approach (such as confirming a single root or analysing areas potentially not containing a root with a view to discarding them earlier in the process); (4) Tackling the isolation problem using a continued fraction approach. The code has been included and is available within the CORE repository. Future work can continue to be carried out in consultation with Prof Yap at NYU.
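
To give a flavour of how a Newton method combines with interval arithmetic, here is a deliberately simplified sketch. It handles a real (not complex) polynomial in one variable, uses ordinary floating point without outward rounding, and is written in Python rather than with the CORE library, so it offers none of the exactness guarantees the project is concerned with. All function names are made up for this illustration.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs are highest degree first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def interval_mul(a, b):
    """Product of two intervals given as (lo, hi) pairs."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def poly_deriv_range(coeffs, box):
    """Enclose the derivative over box = (lo, hi) using interval Horner."""
    n = len(coeffs) - 1
    result = (0.0, 0.0)
    for i, c in enumerate(coeffs[:-1]):
        d = c * (n - i)  # coefficient of the derivative polynomial
        prod = interval_mul(result, box)
        result = (prod[0] + d, prod[1] + d)
    return result

def newton_step(coeffs, box):
    """One interval Newton step: intersect box with mid - f(mid) / F'(box).

    Returns the contracted box, or None if the box cannot contain a root.
    """
    lo, hi = box
    mid = 0.5 * (lo + hi)
    d_lo, d_hi = poly_deriv_range(coeffs, box)
    if d_lo <= 0.0 <= d_hi:
        raise ValueError("derivative range contains zero: subdivide the box")
    f_mid = poly_eval(coeffs, mid)
    candidates = [mid - f_mid / d_lo, mid - f_mid / d_hi]
    new_lo, new_hi = max(lo, min(candidates)), min(hi, max(candidates))
    return (new_lo, new_hi) if new_lo <= new_hi else None
```

Starting from the box (1, 2) for x^2 - 2, i.e. `coeffs = [1.0, 0.0, -2.0]`, repeated calls to `newton_step` contract the box around the square root of 2. A subdivision algorithm would split any box on which the step raises the error and discard boxes for which it returns `None`.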

Gesture recognition using Leap Motion Irina Voiculescu C

Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users being asked to carry out static gestures above a Leap Motion sensor, and these gestures being scored according to a variety of criteria. A prototype has been constructed to gather data, and some data has been gathered from a few controls and patients. In order to deploy this as a clinical tool in the NHS, there is a need for a systematic data collection and analysis tool, based on machine learning algorithms to help classify the data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia. Also, the graphical user interface needs to be extended to give particular kinds of feedback to the patient in the form of home exercises, as part of a rehabilitation programme.

This project was originally set up in collaboration with Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.

Identifying features in MRI scan data Irina Voiculescu C In recent years, medical diagnosis using a variety of scanning modalities has become quasi-universal and has brought about the need for computer analysis of digital scans. Members of the Spatial Reasoning research group have developed image processing software for CT (tomography) scan data. The program partitions (segments) images into regions with similar properties. These images are then analysed further so that particular features (such as bones, organs or blood vessels) can be segmented out. The team's research continues to deal with each of these two separate meanings of medical image segmentation. The existing software is written in C++ and features carefully-crafted and well-documented data structures and algorithms for image manipulation. In certain areas of surgery (e.g. orthopaedic surgery involving hip and knee joints) the magnetic resonance scanning modality (MRI) is preferred, both because of its safety (no radiation involved) and because of its increased visualisation potential. This project is about converting MRI scan data into a format that is compatible with existing segmentation algorithms. The data input would need to be integrated into the group's analysis software in order then to carry out 3D reconstructions and other measurements. This project is co-supervised by Professor David Murray MA, MD, FRCS (Orth), Consultant Orthopaedic Surgeon at the Nuffield Orthopaedic Centre and the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), and by Mr Hemant Pandit MBBS, MS (Orth), DNB (Orth), FRCS (Orth), DPhil (Oxon), Orthopaedic Surgeon / Honorary Senior Clinical Lecturer, Oxford Orthopaedic Engineering Centre (OOEC), NDORMS.
Reinforcement learning techniques for games Irina Voiculescu B 2017-18  C

This project is already taken for 2017-2018.

Psychology has inspired and informed a number of machine learning methods. Decisions within an algorithm can be made so as to improve an overall aim of maximising a (cumulative) reward. Machine learning methods in this class are known as Reinforcement Learning; unlike supervised learning, they learn from reward signals rather than from labelled examples. A basic reinforcement learning model consists of establishing a number of environment states, a set of valid actions, and rules for transitioning between states. Applying this model to the rules of a board game means that the machine can be made to learn how to play a simple board game by playing a large number of games against itself. The goal of this project is to set up a reinforcement learning environment for a simple board game with a discrete set of states (such as Backgammon). If time permits, this will be extended to a simple geometric game (such as Pong) where the states may have to be parameterised in terms of geometric actions to be taken at each stage in the game.
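
The states, actions and transitions described above can be made concrete with a very small self-play example. The sketch below uses tabular Q-learning on single-heap Nim (players alternately take 1 to 3 stones; whoever takes the last stone wins). Nim is substituted here purely because it fits in a few lines; it is not the game proposed for the project, and the function name `train_nim` and all parameter values are assumptions of this illustration.

```python
import random

def train_nim(max_stones=12, episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    """Learn single-heap Nim by self-play with tabular Q-learning.

    State: number of stones left. Action: take 1, 2 or 3 stones.
    q[(stones, take)] estimates the outcome (+1 win, -1 loss) for the
    player about to move.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0
         for s in range(1, max_stones + 1) for a in (1, 2, 3) if a <= s}

    def greedy(s):
        return max((a for a in (1, 2, 3) if a <= s), key=lambda a: q[(s, a)])

    for _ in range(episodes):
        s = rng.randint(1, max_stones)  # random starting position
        while s > 0:
            legal = [a for a in (1, 2, 3) if a <= s]
            # Epsilon-greedy: mostly exploit, sometimes explore.
            a = rng.choice(legal) if rng.random() < epsilon else greedy(s)
            nxt = s - a
            if nxt == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next, so their best value is our loss.
                target = -max(q[(nxt, b)] for b in (1, 2, 3) if b <= nxt)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = nxt
    return q, greedy
```

After training, `greedy(s)` returns the move the learned table prefers from a heap of `s` stones. For this game the known winning strategy is to leave the opponent a multiple of four stones, which gives a simple way to check what has been learned.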

Simple drawing analysis Irina Voiculescu C

Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users drawing simple figures on a tablet, and these figures being scored according to a variety of criteria. Data has already been gathered from 200 or so controls, and is being analysed for a range of parameters in order to assess what a neurotypical person could achieve when drawing such simple figures. Further machine learning analysis could help classify such data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia.

This project was originally co-supervised by Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.

Surgery teaching and training tool Irina Voiculescu C

Knee replacement surgery involves a precise series of steps that a surgeon needs to follow. Trainee surgeons have traditionally mastered these steps by learning from textbooks or experienced colleagues. Surgeons at the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS) in Oxford have been working on a standardised method to help trainees internalise the sequence of events in an operation. It is proposed to construct a computer-based tool which would help with this goal. Apart from the choice of tools and materials, the tool would also feature a virtual model of the knee. The graphical user interface would present a 3D model of a generic knee to be operated on, and would allow the user to make the cuts necessary for the knee replacement procedure. There would be pre-defined parameters regarding the type and depth of each cut, and an evaluation tool showing how the virtual cuts compare against those parameters.

The project goals are quite extensive and so this would be suitable for an experienced programmer.

This project is co-supervised by Professor David Murray MA, MD, FRCS (Orth), Consultant Orthopaedic Surgeon at the Nuffield Orthopaedic Centre and the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), and by Mr Hemant Pandit MBBS, MS (Orth), DNB (Orth), FRCS (Orth), DPhil (Oxon), Orthopaedic Surgeon / Honorary Senior Clinical Lecturer, Oxford Orthopaedic Engineering Centre (OOEC), NDORMS.

Textile fabric detail simulation Irina Voiculescu B 2017-18  C Conventional computer-aided design (CAD) software uses methods such as extrusion and revolution tools that the user can apply to create the 3D shape of a part. These tools are based on traditional manufacturing methods and work very well for most CAD applications. One application for which these tools do not work well is creating 3-dimensional representations of textiles. The exact path of each fibre within the textile depends on the other fibres and on the flexibility of the fibres. The purpose of this project is to create a simple software tool/algorithm into which a user can input a weave pattern (flat) or a braid pattern (cylindrical), together with the flexibility of the fibres, and which will create a 3-dimensional representation of the structure.
An investigation of the solution of least squares problems using the QR factorisation Jonathan Whiteley B 2017-18  C Experimental data inevitably contains error.  These experimental observations are often compared to theoretical predictions by formulating a least squares problem, i.e. minimising the sum of squared differences between the experimental data and the theoretical predictions.  These least squares problems are often solved using a QR-factorisation of a known matrix, which uses the Gram-Schmidt method to write the columns of this matrix as linear combinations of orthonormal vectors.  This method, when used in practice, can exhibit numerical instabilities, where the (inevitable) numerical errors due to fixed precision calculations on a computer are magnified, and may swamp the calculation.  Instead, a modified Gram-Schmidt method is used for the QR-factorisation.  This modified Gram-Schmidt factorisation avoids numerical instabilities, but is less computationally efficient.  The first aim of this project is to investigate the relative computational efficiencies of the two methods for QR-factorisation.  The second aim is to use the QR-factorisation to identify: (i) what parameters can be recovered from experimental data; and (ii) whether the data can automatically be classified as "good" or "bad".
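
The difference between the two Gram-Schmidt variants is small in code, which the following sketch tries to show. It is a minimal pure-Python illustration and not the project's intended implementation: matrices are stored as lists of columns, the columns are assumed to be linearly independent, and the names `gram_schmidt_qr` and `orthogonality_error` are invented for this example.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt_qr(columns, modified=True):
    """Factorise A = QR, where A is given as a list of its columns.

    Returns (q_cols, r): the orthonormal columns of Q and the upper
    triangular matrix R as a list of rows.
    """
    n = len(columns)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = list(a)
        for i, q in enumerate(q_cols):
            # Classical GS projects the original column a;
            # modified GS projects the partially reduced vector v.
            r[i][j] = dot(q, v if modified else a)
            v = [vk - r[i][j] * qk for vk, qk in zip(v, q)]
        r[j][j] = math.sqrt(dot(v, v))
        q_cols.append([vk / r[j][j] for vk in v])
    return q_cols, r

def orthogonality_error(q_cols):
    """Largest deviation of Q^T Q from the identity matrix."""
    return max(abs(dot(u, v) - (1.0 if i == j else 0.0))
               for i, u in enumerate(q_cols) for j, v in enumerate(q_cols))
```

One way to see the instability is to factorise a nearly rank-deficient matrix, for example the columns `[1, e, 0, 0]`, `[1, 0, e, 0]`, `[1, 0, 0, e]` with `e = 1e-8`, and compare `orthogonality_error` for `modified=False` and `modified=True`.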
The efficiency of numerical algorithms Jonathan Whiteley B 2017-18  C Many numerical algorithms have error bounds that depend on some user-provided input.  For example, the error in a numerical method for solving a differential equation is bounded in terms of the step-size h, and so the user may change the step-size h until a desired accuracy is attained.  Although useful, these error bounds do not take account of computational efficiency.  For example, a numerical method for solving a differential equation may have a very impressive bound with respect to step size h, but may require significantly more computational effort than other methods with less impressive error bounds.  The field of scientific computing is a rich source of algorithms such as these, for example the numerical solution of differential equations, the numerical solution of linear systems, and interpolation of functions.  The aim of this project is to work with a variety of algorithms for solving a given problem, and to assess the computational efficiency of these algorithms.
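
A simple way to compare methods on cost as well as on step size is to count function evaluations. The sketch below does this for two standard methods, forward Euler and the classical fourth-order Runge-Kutta method, on a scalar differential equation. The choice of methods, the function name `integrate` and the use of evaluation counts as the cost measure are assumptions of this example rather than part of the project description.

```python
import math

def integrate(f, y0, t_end, n_steps, method):
    """Integrate y' = f(t, y) from t = 0 to t_end with a fixed step size.

    Returns (y at t_end, number of evaluations of f).
    """
    h = t_end / n_steps
    t, y, evals = 0.0, y0, 0
    for _ in range(n_steps):
        if method == "euler":
            y += h * f(t, y)
            evals += 1
        elif method == "rk4":
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h * k1 / 2)
            k3 = f(t + h / 2, y + h * k2 / 2)
            k4 = f(t + h, y + h * k3)
            y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            evals += 4
        else:
            raise ValueError("unknown method: " + method)
        t += h
    return y, evals

def decay(t, y):
    """Test problem y' = -y with exact solution exp(-t)."""
    return -y

# Give both methods the same budget of 400 function evaluations.
y_euler, cost_euler = integrate(decay, 1.0, 1.0, 400, "euler")
y_rk4, cost_rk4 = integrate(decay, 1.0, 1.0, 100, "rk4")
error_euler = abs(y_euler - math.exp(-1.0))
error_rk4 = abs(y_rk4 - math.exp(-1.0))
```

Comparing `error_euler` and `error_rk4` at equal cost, rather than at equal step size, is one possible way to frame the efficiency question for this particular test problem.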
The efficient storage of sparse matrices Jonathan Whiteley B 2017-18  C Many matrix applications involve sparse matrices, i.e. matrices that have a very large number of rows and columns, but only a small number of non-zero entries in each row.  On a given computer we may only store a finite number of matrix entries.  When working with sparse matrices we usually store only the non-zero entries and their locations.  There are several established techniques for storing sparse matrices.  These methods all have individual strengths and weaknesses - some allow efficient multiplication of sparse matrices by vectors, others allow entries to be modified efficiently, while others allow the sparsity pattern to be modified dynamically.  The aim of this project is to investigate these sparse storage methods, and to highlight the strengths and weaknesses of each approach.
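
As one example of such a technique, the sketch below uses compressed sparse row (CSR) storage, a widely used format that keeps the non-zero values, their column indices and a pointer to where each row starts. It is offered only as an illustration of the idea; the function names are invented here, and the project may well consider other formats.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_index, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_index.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_index.
        row_ptr.append(len(values))
    return values, col_index, row_ptr

def csr_matvec(values, col_index, row_ptr, x):
    """Multiply a CSR matrix by a vector, touching only non-zero entries."""
    return [sum(values[k] * x[col_index[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]
```

This layout makes matrix-vector products cheap, since each row's entries are contiguous, but inserting a new non-zero entry requires shifting the arrays, which illustrates the kind of trade-off mentioned above.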
Model Checking LTL on Markov Chains James Worrell B 2017-18  C

The goal of this project is to write a program that model checks a Markov chain against an LTL formula, i.e., calculates the probability that the formula is satisfied. The two main algorithmic tasks are to efficiently compile LTL formulas into automata and then to solve systems of linear equations arising from the product of the Markov chain and the automaton. An important aspect of this project is to make use of an approach that avoids determinising the automaton that represents the LTL formula. This project builds on material contained in the Logic and Proof and Models of Computation courses.
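
The second task, solving linear equations over a Markov chain, can be illustrated on its own with a plain reachability property. The sketch below computes the probability of reaching a target set by solving x = Px + b exactly over fractions. It leaves out the LTL-to-automaton translation and the product construction entirely, and the function names and the dictionary encoding of the chain are assumptions of this example.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact fractions.

    Assumes the system has a unique solution.
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def reach_probabilities(chain, target, avoid):
    """Probability of reaching `target` from each remaining state.

    chain: dict mapping state -> dict of successor -> probability.
    avoid: states known to reach the target with probability zero; it
    must include every such state, otherwise the system is singular.
    """
    unknown = [s for s in chain if s not in target and s not in avoid]
    index = {s: i for i, s in enumerate(unknown)}
    a = [[Fraction(0)] * len(unknown) for _ in unknown]
    b = [Fraction(0)] * len(unknown)
    for s in unknown:
        i = index[s]
        a[i][i] += 1
        for t, p in chain[s].items():
            if t in target:
                b[i] += p
            elif t in index:
                a[i][index[t]] -= p
    x = solve_linear(a, b)
    return {s: x[index[s]] for s in unknown}
```

For a symmetric random walk on states 0 to 4 with absorbing ends, calling `reach_probabilities(chain, {4}, {0})` gives the probability of hitting state 4 before state 0 from each interior state.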

Reading: J-M. Couvreur, N. Saheb and G. Sutre. An optimal automata approach to LTL model checking of probabilistic systems. Proceedings of LPAR'03, LNCS 2850, Springer 2003.

Topics in Linear Dynamical Systems James Worrell C A linear dynamical system is a discrete- or continuous-time system whose dynamics is given by a linear function of the current state. Examples include Markov chains, linear recurrence sequences (such as the Fibonacci sequence), and linear differential equations. This project involves investigating the decidability and complexity of various reachability problems for linear dynamical systems. It would suit a mathematically oriented student. Linear algebra is essential, and number theory or complexity theory is desirable. A relevant paper is Ventsislav Chonev, Joël Ouaknine, James Worrell: The orbit problem in higher dimensions. STOC 2013: 941-950.
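
To make the notion of a reachability problem concrete, the sketch below iterates a discrete-time linear system x, Ax, A^2 x, ... and searches for a target vector within a fixed number of steps, using the Fibonacci recurrence as the system. A bounded search like this is only an illustration: it is not a decision procedure, since it says nothing about steps beyond the bound. The function names are invented for this example.

```python
def mat_vec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def orbit_hits(a, x, target, max_steps):
    """Return the smallest n <= max_steps with A^n x == target, else None."""
    for n in range(max_steps + 1):
        if x == target:
            return n
        x = mat_vec(a, x)
    return None

# Fibonacci as a linear dynamical system: (F(n+1), F(n)) = A^n (1, 0).
FIB = [[1, 1], [1, 0]]
```

For example, `orbit_hits(FIB, [1, 0], [89, 55], 50)` finds the step at which the orbit reaches the pair (89, 55), while a target that never occurs, such as `[4, 3]`, yields `None` for any bound.
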
Extension of the interval program analysis for unknown library functions Hongseok Yang B 2017-18  C

The interval program analysis is a well-known algorithm for estimating the behaviour of programs without actually running them. The algorithm takes an imperative program, and returns, at each program point, interval constraints for variables in the program, such as 1 <= x <= 3 && 2 <= y. This algorithm assumes that all the functions called by the input program are defined in the program, so that the source code of every called function can be found in the program. However, in practice, this assumption is not necessarily met. Programs often use library functions whose source code is not available. The goal of this project is to lift this assumption. During the project, a student will develop an interval-analysis algorithm that works in the presence of calls to unknown library functions, implement the algorithm, and evaluate the algorithm experimentally.
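
A minimal sketch of the problem, under strong simplifying assumptions: the toy analysis below handles only straight-line programs with constants, additions and calls, and treats a call to a function without a known summary as returning the unconstrained interval. That fallback is only sound if the unknown function cannot modify other variables, and it is just the most naive baseline, not the algorithm the project is meant to develop. The tuple-based program encoding and all names are made up for this example.

```python
import math

TOP = (-math.inf, math.inf)  # the interval that constrains nothing

def add(a, b):
    """Interval addition on (lo, hi) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def analyse(program, summaries):
    """Compute an interval for every variable of a straight-line program.

    program: list of (target, op, args) with op in {"const", "add", "call"}.
    summaries: dict mapping a function name to a function on intervals;
    calls to functions missing from it are treated as unknown.
    """
    env = {}
    for target, op, args in program:
        if op == "const":
            env[target] = (args[0], args[0])
        elif op == "add":
            env[target] = add(env[args[0]], env[args[1]])
        elif op == "call":
            summary = summaries.get(args[0])
            if summary is None:
                # Unknown library function: assume nothing about its result.
                # Sound only if the callee has no effect on other variables.
                env[target] = TOP
            else:
                env[target] = summary(*(env[v] for v in args[1:]))
        else:
            raise ValueError("unknown operation: " + op)
    return env
```

With the hypothetical program `x = 1; y = mystery(x); z = x + x; w = y + z`, the analysis keeps the precise interval for `z` but loses all information about `y` and `w` unless a summary for `mystery` is supplied.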

Prerequisites: the Compiler course.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Yang but should note that the response may be delayed as he is on sabbatical.

Study of probabilistic programs using tools from programming languages Hongseok Yang C

Recently, researchers in machine learning have developed new Turing-complete languages, such as Infer.NET, Anglican and Church, for writing sophisticated probabilistic models and performing various inference tasks on those models, such as the computation of posterior probabilities. The goal of this project is to study these languages using tools from programming languages. Specifically, a student will work on developing a new inference algorithm for probabilistic programs that mixes techniques from program analysis with those from Monte Carlo simulation, a common method for performing inference on probabilistic programs. Alternatively, the student will explore the connection between the use of computational effects in higher-order functional probabilistic programming languages and the encoding of advanced probability models in those languages (in particular, nonparametric Bayesian models), which has been pointed out by the recent work of Dan Roy and his colleagues.
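
For readers unfamiliar with Monte Carlo inference, the sketch below estimates a posterior mean by likelihood weighting, one of the simplest such methods. The model (a coin with a uniform prior on its bias) is written directly in Python rather than in any of the languages named above, and the function name and default parameters are assumptions of this example.

```python
import random

def posterior_mean_bias(heads, tosses, samples=20000, seed=0):
    """Estimate E[bias | data] for a coin with a Uniform(0, 1) prior.

    Each sample draws a bias from the prior and is weighted by the
    likelihood of observing `heads` heads in `tosses` tosses.
    """
    rng = random.Random(seed)
    total_weight = 0.0
    weighted_sum = 0.0
    for _ in range(samples):
        p = rng.random()                            # sample from the prior
        w = p ** heads * (1 - p) ** (tosses - heads)  # likelihood weight
        total_weight += w
        weighted_sum += w * p
    return weighted_sum / total_weight
```

For this model the exact posterior is a Beta distribution, so the estimate can be checked against the closed-form mean (heads + 1) / (tosses + 2); for 8 heads in 10 tosses that is 0.75.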

Prerequisites: Compiler and Machine learning courses. The Programming Language course is not required, but useful for carrying out this project.

Undergraduate students who wish to enquire about a project for 2017-18 are welcome to contact Prof Yang but should note that the response may be delayed as he is on sabbatical.

Topics in Algorithms, Complexity, and Combinatorial Optimisation Standa Živný C Prof Živný is willing to supervise projects in the area of algorithms, complexity, and combinatorial optimisation, in particular on problems related to convex relaxations (linear and semidefinite programming relaxations), submodular functions, and algorithms for and the complexity of homomorphism problems and Constraint Satisfaction Problems.

