Student Projects
Undergraduate students in the third and fourth years of the Final Honour School of Computer Science, and students for the MSc in Computer Science are required to undertake a project. Mathematics & Computer Science undergraduates are required to undertake a Computer Science project or a Mathematics dissertation in their fourth year. Computer Science & Philosophy undergraduates may choose to undertake a Computer Science project or a Philosophy thesis in their fourth year.
This is your chance to work on your own project, something that interests and inspires you. We have put together a brief guidance document on how to do it; please click on the link below. On this site, you will also find a sortable and searchable list of projects and potential supervisors. You are welcome to choose one of the projects on the list, or approach a potential supervisor and negotiate your own topic.
Important Deadlines are:
 All:
 Monday of Week 7, Hilary Term : your online project registration form must be submitted.
 MSc students:
 Monday of Week 1, Trinity Term : submission deadline for your project proposal.
 3rd/4th year students:
 12pm on Monday of Week 4 of Trinity Term : Submission deadline.
 MSc students:
 Tuesday of Week 5, Michaelmas Term : Submission deadline
Project writing handbook
Sample projects
3rd year
4th year
List of projects
Suitable for
Research themes
Supervisors
Project  Supervisors  Themes  B  C  MSc  Description  

Aggregation of Photovoltaic Panels  Alessandro Abate  Automated Verification  C  The increased relevance of renewable energy sources has modified the behaviour of the electrical grid. Some renewable energy sources affect the network in a distributed manner: whilst each unit has little influence, a large population can have a significant impact on the global network, particularly in the case of synchronised behaviour. This work investigates the behaviour of a large, heterogeneous population of photovoltaic panels connected to the grid. We employ Markov models to represent the aggregated behaviour of the population, while the rest of the network (and its associated consumption) is modelled as a single equivalent generator, accounting for both inertia and frequency regulation. Analysis and simulations of the model show that it is a realistic abstraction, and quantitatively indicate that heterogeneity is necessary to enable the overall network to function in safe conditions and to avoid load shedding. This project will provide extensions of this recent research. In collaboration with an industrial partner. Prerequisites: ComputerAided Formal Verification, Probabilistic Model Checking 

Analysis and verification of stochastic hybrid systems  Alessandro Abate  Automated Verification  C  Stochastic Hybrid Systems (SHS) are dynamical models that are employed to characterize the probabilistic evolution of systems with interleaved and interacting continuous and discrete components. Formal analysis, verification, and optimal control of SHS models represent relevant goals because of their theoretical generality and for their applicability to a wealth of studies in the Sciences and in Engineering. In a number of practical instances the presence of a discrete number of continuously operating modes (e.g., in faulttolerant industrial systems), the effect of uncertainty (e.g., in safetycritical airtraffic systems), or both occurrences (e.g., in models of biological entities) advocate the use of a mathematical framework, such as that of SHS, which is structurally predisposed to model such heterogeneous systems. In this project, we plan to investigate and develop new analysis and verification techniques (e.g., based on abstractions) that are directly applicable to general SHS models, while being computationally scalable. Courses: ComputerAided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games Prerequisites: Familiarity with stochastic processes and formal verification 

Automated verification of complex systems in the energy sector  Alessandro Abate  Automated Verification  C  Smart microgrids are smallscale versions of centralized electricity systems, which locally generate, distribute, and regulate the flow of electricity to consumers. Among other advantages, microgrids have shown positive effects over the reliability of distribution networks. These systems present heterogeneity and complexity coming from 1. local and volatile renewables generation, and 2. the presence of nonlinear dynamics both over continuous and discrete variables. These factors calls for the development of proper quantitative models. This framework provides the opportunity of employing formal methods to verify properties of the microgrid. The goal of the project is in particular to focus on the energy production via renewables, such as photovoltaic panels. The project can benefit form a paid visit/internship to industrial partners. Courses: ComputerAided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games Prerequisites: Familiarity with stochastic processes and formal verification, whereas no specific knowledge of smart grids is needed. 

Development of software for the verification of MPL models  Alessandro Abate  Automated Verification  C  MSc  This project is targeted to enhance the software tollbox VeriSiMPL (''very simple''), which has been developed to enable the abstraction of MaxPlusLinear (MPL) models. MPL models are specified in MATLAB, and abstracted to Labeled Transition Systems (LTS). The LTS abstraction is formally put in relationship with its MPL counterpart via a (bi)simulation relation. The abstraction procedure runs in MATLAB and leverages sparse representations, fast manipulations based on vector calculus, and optimized data structures such as DifferenceBound Matrices. LTS can be pictorially represented via the Graphviz tool and exported to PROMELA language. This enables the verification of MPL models against temporal specifications within the SPIN model checker. Courses: ComputerAided Formal Verification, Numerical Solution of Differential Equations Prerequisites: Some familiarity with dynamical systems, working knowledge of MATLAB and C 

Formal verification of a software tool for physical and digital components  Alessandro Abate, Daniel Kroening  Automated Verification  MSc  We are interested in working with existing commercial simulation software that is targeted around the modelling and analysis of physical, multidomain systems. It further encompasses the integration with related software tools, as well as the interfacing with devices and the generation of code. We are interested in enriching the software with formal verification features, envisioning the extension of the tool towards capabilities that might enable the user to raise formal assertions or guarantees on models properties, or to synthesise correctbydesign architectures. Within this longterm plan, this project shall target the formal generation of faults warnings, namely of messages to the user that are related to ``bad (dynamical) behaviours'' or to unwanted ``modelling errors''. The student will be engaged in developing algorithmic solutions towards this goal, while reframing them within a formal and general approach. The project is interdisciplinary in dealing with hybrid models involving digital and physical quantities, and in connecting the use of formal verification techniques from the computer sciences with more classical analytical tools from control engineering Courses: ComputerAided Formal Verification, Software Verification. Prerequisites: Knowledge of basic formal verification. 

Innovative Sensing and Actuation for Smart Buildings  Alessandro Abate  Automated Verification  C  MSc  Sensorisation and actuation in smart buildings and the development of smart HVAC (heat, ventilation and airconditioning) control strategies for energy management allow for optimised energy usage, leading to the reduction in power consumption or to optimised demand/response strategies that are key in a rather volatile market. This can further lead to optimised maintenance for the building devices. Of course the sensitisation of buildings leads to heavy requirements on the overall infrastructure: we are interested in devising new approaches towards the concept of using ``humans as sensors''. Further, we plan to investigate approaches to perform metasensing, namely to extrapolate the knowledge from physical sensors towards that of virtual elements (as an example, to infer the current building occupancy from correlated measurements of temperature and humidity dynamics). On the actuation side, we are likewise interested in engineering noninvasive minimalistic solutions, which are robust to uncertainty and performancecertified. The plan for this project is to make the first steps in this direction, based on recent results in the literature. The project can benefit from a visit to Honeywell Labs (Prague). Courses: ComputerAided Formal Verification. Prerequisites: Some familiarity with dynamical systems.  
Model learning and verification  Alessandro Abate, Daniel Kroening  Automated Verification  C  MSc  This project will explore connections of techniques from machine learning with successful approaches from formal verification. The project has two sides: a theoretical one, and a more practical one: it will be up to the student to emphasise either of the two sides depending on his/her background and/of interests. The theoretical part will develop existing research, for instance in one of the following two interdisciplinary domain pairs: learning & repair, or reachability analysis & Bayesian inference. On the other hand, a more practical project will apply the above theoretical connections on a simple models setup in the area of robotics and autonomy. Courses: ComputerAided Formal Verification, Probabilistic Model Checking, Machine Learning 

Precise simulations and analysis of aggregated probabilistic models  Alessandro Abate  Automated Verification  C  MSc  This project shall investigate a rich research line, recently pursued by a few within the Department of CS, looking at the development of quantitative abstractions of Markovian models. Abstractions come in the form of lumped, aggregated models, which are beneficial in being easier to simulate or to analyse. Key to the novelty of this work, the proposed abstractions are quantitative in that precise error bounds with the original model can be established. As such, whatever can be shown over the abstract model, can be as well formally discussed over the original one. This project, grounded on existing literature, will pursue (depending on the student's interests) extensions of this recent work, or its implementation as a software tool. Courses: ComputerAided Formal Verification, Probabilistic Model Checking, Machine Learning 

Safe Reinforcement Learning  Alessandro Abate  Automated Verification  C  Reinforcement Learning (RL) is a known architecture for synthesising policies for Markov Decision Processes (MDP). We work on extending this paradigm to the synthesis of ‘safe policies’, or more general of policies such that a linear time property is satisfied. We convert the property into an automaton, then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton, according to accepting conditions of the automaton. With this reward function, RL synthesises a policy that satisfies the property: as such, the policy synthesis procedure is `constrained' by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP. We evaluate the performance of the algorithm on numerous numerical examples. This project will provide extensions of these novel and recent results. Prerequisites: ComputerAided Formal Verification, Probabilistic Model Checking, Machine Learning 

Software development for abstractions of stochastic hybrid systems  Alessandro Abate  Automated Verification  C  Stochastic hybrid systems (SHS) are dynamical models for the interaction of continuous and discrete states. The probabilistic evolution of continuous and discrete parts of the system are coupled, which makes analysis and verification of such systems compelling. Among specifications of SHS, probabilistic invariance and reachavoid have received quite some attention recently. Numerical methods have been developed to compute these two specifications. These methods are mainly based on the state space partitioning and abstraction of SHS by Markov chains, which are optimal in the sense of reduction in abstraction error with minimum number of Markov states. The goal of the project is to combine codes have been developed for these methods. The student should also design a nice user interface (for the choice of dynamical equations, parameters, and methods, etc.). Courses: Probabilistic Model Checking, Probability and Computing, Numerical Solution of Differential Equations Prerequisites: Some familiarity with stochastic processes, working knowledge of MATLAB and C 

Contextuality and quantum advantage  Samson Abramsky  Foundations, Structures, and Quantum  C  "The aim of the project is to explore the use of quantum resources to gain advantage over classical computational models in a range of tasks. The tasks which may be considered include: boundeddepth circuits, nonlocal games, measurementbased computation, communication complexity, and query complexity. Highlevel methods for describing quantum solutions for such problems will be studied, based on recent work on a structural theory of contextuality and its application to quantum advantage developed by Abramsky and coworkers. Both qualitative aspects  existence of perfect strategies or solutions  and quantitative aspects  contextuality monotones and resource inequalities  will be studied. References: Contextual fraction as a measure of contextuality, Abramsky, Barbosa and Mansfield PRL 2017 The quantum monad, Abramsky, Barbosa, de Silva and Zapata, MFCS 2017. "  
Pebble games, monads and comonads in classical, probabilistic and quantum computation  Samson Abramsky  Foundations, Structures, and Quantum  C  Pebble games are an important and widely used tool in logic, algorithms and complexity, constraint satisfaction and database theory. The idea is that we can explore a pair of structures, e.g. graphs, by placing up to k pebbles on them, so we have a window of size at most k on the two structures. If we can always keep these pebbles in sync so that the two ksized windows look the same (are isomorphic) then we say that Duplicator has a winning strategy for the kpebble game. This gives a resourcebounded notion of approximation to graphs and other structures which has a wide range of applications. Monads and comonads are widely used in functional programming, e.g. in Haskell, and come originally from category theory. It turns out that pebble games, and similar notions of approximate or local views on data, can be captured elegantly by comonads, and this gives a powerful language for many central notions in constraints, databases and descriptive complexity. For example, kconsistency can be captured in these terms; another important example is treewidth, a key parameter which is very widely used to give “islands of tractability” in otherwise hard problems. Finally, monads can be used to give various notions of approximate or nonclassical solutions to computational problems. These include probabilistic and quantum solutions. For example, there are quantum versions of constraint systems and games which admit quantum solutions when there are no classical solutions, thus demonstrating a “quantum advantage”. Monads and comonads can be used in combination, making use of certain “distributive laws”. The aim of this project is to explore a number of aspects of these ideas. Depending on the interests and background of the student, different aspects may be emphasised, from functional programming, category theory, logic, algorithms and descriptive complexity, probabilistic and quantum computation. Some specific directions include: 1. Developing Haskell code for the kpebbling comonad and various nonclassical monads, and using this to give a computational toolbox for various constructions in finite model theory and probabilistic or quantum computation. 2. Developing the categorytheoretic ideas involved in combining monads and comonads, and studying some examples. 3. Using the language of comonads to capture other important combinatorial invariants such as treedepth. 4. Developing the connections between category theory, finite model theory and descriptive complexity. Some background reading. 1. Leonid Libkin, Elements of finite model theory. (Background on pebble games and the connection with logic and complexity). 2. The pebbling comonad in finite mode theory. S. Abramsky, A. Dawar and P. Wang. (Technical report describing the basic ideas which can serve as a starting point.)  
Sheaf theoretic semantics for vector space models of natural language  Samson Abramsky  Foundations, Structures, and Quantum  B  C  Contextuality is a fundamental feature of quantum physical theories and one that distinguishes it from classical mechanics. In a recent paper by Abramsky and Brandenburger, the categorical notion of sheaves has been used to formalize contextuality. This has resulted in generalizing and extending contextuality to other theories which share some structural properties with quantum mechanics. A consequence of this type of modeling is a succinct logical axiomatization of properties such as nonlocal correlations and as a result of classical no go theorems such as Bell and KochenSoecker. Like quantum mechanics, natural language has contextual features; these have been the subject of much study in distributional models of meaning, originated in the work of Firth and later advanced by Schutze. These models are based on vector spaces over the semiring of positive reals with an inner product operation. The vectors represent meanings of words, based on the contexts in which they often appear, and the inner product measures degrees of word synonymy. Despite their success in modeling word meaning, vector spaces suffer from two major shortcomings: firstly they do not immediately scale up to sentences, and secondly, they cannot, at least not in an intuitive way, provide semantics for logical words such as `and', `or', `not'. Recent work in our group has developed a compositional distributional model of meaning in natural language, which lifts vector space meaning to phrases and sentences. This has already led to some very promising experimental results. However, this approach does not deal so well with the logical words. The goal of this project is to use sheaf theoretic models to provide both a contextual and logical semantics for natural language. We believe that sheaves provide a generalization of the logical Montague semantics of natural language which did very well in modeling logical connectives, but did not account for contextuality. The project will also aim to combine these ideas with those of the distributional approach, leading to an approach which combines the advantages of Montaguestyle and vectorspace semantics. Prerequisites ========== The interested student should have taken the category theory and computational linguistics courses, or be familiar with the contents of these. 

Joint states for temporally separated quantum systems  Jonathan Barrett  Foundations, Structures, and Quantum  B  C  In ordinary classical probability theory, and to a large extent in classical general relativity, time does not have a privileged role. Yet quantum theory treats space and time very differently. There have been various proposals in the literature for how to define a joint state of several quantum systems that exist at different moments in time. The aim of the project is to evaluate these proposals and to develop them further. The project will investigate, for example, whether there is an operational meaning for a proposed joint state in terms of experimental outcomes, and whether entropies and mutual informations can be defined for temporally separated systems. Prerequisites: Quantum Information  
Quantum causal inference  Jonathan Barrett  Foundations, Structures, and Quantum  C  The aim is to analyse the problem of quantum causal inference, in situations where an agent has only partial information, and the full causal structure is unknown. Develop techniques (more ambitiously, an algorithm), for quantum causal inference, and give examples of situations where quantum causal inference is different from classical causal inference. Prerequisites: Quantum Information. Desirable: Quantum Computer Science  
Interpolation  Michael Benedikt  Information Systems  C  MSc  Let F1 and F2 be sentences (in firstorder logic, say) such that F1 entails F2: that is, any model of F1 is also a model of F2. An interpolant is a sentence G such that F1 entails G, and G entails F2, but G only uses relations and functions that appear in *both* F1 and F2. The goal in this project is to explore and implement procedures for constructing interpolants, particularly for certain decidable fragments of firstorder logic. It turns out that finding interpolants like this has applications in some database query rewriting problems. Prerequisites: Logic and Proof (or equivalent) 

Optimization of Web Query Plans  Michael Benedikt  Information Systems  MSc  This project will look at how to find the best plan for a query, given a collection of data sources with access restrictions. We will look at logicbased methods for analyzing query plans, taking into account integrity constraints that may exist on the data. 

Optimized reasoning with guarded logics  Michael Benedikt  Information Systems  C  "Inference in firstorder logic is undecidable, but a number of logics have appeared in the last decade that achieve decidability. Many of them are guarded logics, which include the guarded fragment of firstorder logic. Although the decidability has been known for some decades, no serious implementation has emerged. Recently we have developed new algorithms for deciding some guarded logics, based on resolution, which are more promising from the perspective of implementation. The project will pursue this both in theory and experimentally." Prerequisites A knowledge of firstorder logic, e.g. from the Foundations of CS or Knowledge Representation and Reasoning courses, would be important.  
Highthroughput Molecular Algorithms  Luca Cardelli  Automated Verification  MSc  Recent technological developments allow massive parallel reading (sequencing) and writing (synthesis) of heterogeneous pools of DNA strands. We are no longer limited to circuits built out of a small number of different strands, nor to reading out a few bits of output by fluorescence. While these emerging capabilities are somewhat stochastic and errorprone, new classes of algorithms should feasibly take advantage of them. We plan to start by investigating new algorithms to record the order in which a series of molecular events occurs, e.g., the appearance of certain chemicals over time in a dynamic chemical soup. A large network of DNA gates acting in parallel records the relationships between the events; the result can be read out by massive DNA sequencing. We plan to study feasible DNA structures to implement this and similar algorithms, simulate them, and investigate their correctness and robustness. Prerequisites: background in distributed algorithms, verification, and/or synthetic biology.  
Noise in Molecular Logic Gates  Luca Cardelli  Automated Verification  MSc  A Boolean variable B that ranges over {true,false} can be represented by two molecular species {B_true,B_false}. Boolean gates like AND can then be described by chemical reactions over those species. These reactions can in turn be implemented by DNA molecules and physically executed. Networks of such Boolean gates can function as controllers for molecularscale devices, including devices we may want to insert into living organisms. We want to investigate, by mathematical analysis, modelchecking, and simulation, the noise behavior of these logical gates, due to both noisy inputs and to the intrinsic molecular fluctuations generated by chemical reactions. How we can we compute reliably in such a regime, and how can we design logic gates that are resistant to noise? Prerequisites: Background in verification and/or simulation 

Merging all the biomedical image processing power of ITK with Matlab  Ramón Casero Cañas, Vicente Grau  Computational Biology and Health Informatics  MSc  The Insight Toolkit (ITK), a C++ library, is the most advanced software base for biomedical image processing. Our group has produced a basic working prototype of a software interface to run ITK image filters from Matlab [1], the biomedical engineer's platform of choice. We are looking for a student who, as a minimum, integrates ~155 existing image filters into this interface to make them available for biomedical research. This is more or less straightforward by hand. The challenge for an exceptional student would be to make a parser to automatically link ITK filters to Matlab. [1] http://code.google.com/p/gerardus/source/browse/trunk/matlab/?r=1082#matlab%2FItkToolbox  
Information processing and conservation laws in general probabilistic theories  Giulio Chiribella  Foundations, Structures, and Quantum  B  C  "Project description and suitability for different students considered. See the appendix. areas in which I could supervise student projects in quantum information theory and quantum foundations. general areas for student supervision: quantum information and foundations of quantum mechanics.  
A generic evaluation of a categorical compositionaldistributional model of meaning  Bob Coecke  Foundations, Structures, and Quantum  MSc  The categorical compositional distributional model of meaning is an emerging field of Computational Linguistics and natural language processing. It has been pioneered in the Quantum and Computational Linguistic groups of the department. The general theoretical underpinnings of the model is based on compact closed categories, inspired by the categorical quantum models of Abramsky and Coecke. In order to apply the model to main stream tasks, one has to instantiate it on concrete linguistic models, and in particular on distributional (vector space) models of meaning. So far, these instantiations have been done on datasets with small sentences containing few language units such as verbs, adjectives, and relative pronouns, all separately. In this project, we aim to connect these single experiments and perform a unified task. This is a termclassification task, the goal of which is to successfully classify a number of dictionary terms to their definitions (or viceversa). Specifically, the project involves the following:
We seek candidates who know Computational Linguistics and Linear Algebra, as well as programming. A knowledge of category theory is encouraged, but not required. The project will be cosupervised by Dimitri Kartsaklis, who has performed the previous termclassification tasks, Mehrnoosh Sadrzadeh and Bob Coecke. The outcome may be considered for publication in a peerreviewed conference or journal. References:


Categorical quantum mechanics and Lie Algebras  Bob Coecke, Amar Hadzihasanovic  Foundations, Structures, and Quantum  MSc  Is there a way to define a category whose objects are root systems (with additional structure), so that the construction of Lie algebras from them is functorial? If so, what are its properties  e.g., is it monoidal, and in that case, is the functor (lax) monoidal? Prerequisites: categorical quantum mechanics and Lie algebras 

Developing conceptual space semantics for compositional distributional models of meaning  Bob Coecke, Martha Lewis  Foundations, Structures, and Quantum  MSc  The compositional distributional model of meaning (Coecke et al., 2010) models the meaning of sentences using a category such as pregroups to model the grammar of the sentences and another category with the same compact closed structure to model the meanings of the individual words. In practice, word meanings are often modelled using finitedimensional vector spaces, built via word cooccurrence in text corpora. The conceptual spaces approach to semantics (Gärdenfors, 2014) views word meanings as inhabiting regions of a conceptual space, with carefully chosen dimensions. This representation has some similarities but also key differences with the compositional distributional model of meaning. The project would examine a few key conceptual spaces such as 'colour', 'taste', or 'shape', for example, and investigate how these can be incorporated within the compositional distributional model of meaning. This might be undertaken by giving a categorytheoretic formalisation of these spaces, and identifying a suitable grammatical structure to implement composition, or alternatively, by analysing the behaviour of nouns, adjectives, and verbs within conceptual spaces, and fitting these to the pregroup grammar. Prerequisites: category theory References: 

Diagrammatic methods in computer security  Bob Coecke, Stefano Gogioso  Foundations, Structures, and Quantum  MSc  Throughout the last decade, categorical and diagrammatic methods have found enormous success in the field of quantum information theory, and are currently being extended to a variety of other disciplines. As part of this project, we will explore their application to classical cryptography and computer security. A number of distinct approaches are available, and the project can be tailored to each student's specific interests in the field. Prerequisites: One of Quantum Computer Science or Categorical Quantum Mechanics required. Computer Security (or equivalent experience) desirable, but not required. 

Effective Categorical Reasoning  Bob Coecke, Dan Marsden  Foundations, Structures, and Quantum  MSc  Category theory is an important tool in theoretical computer science. In introductory courses proofs are generally conducted by pasting commuting diagrams together, or simple equational reasoning. There are further proof styles in category theory, for example using string diagrams, “Yoneda style” proofs and the use of internal logics. The aim of this project would be to investigate and contrast these different approaches, showing how, and when, they can be used effectively, on realistic problems. A concrete starting point would be to understanding the techniques involved, and then apply them to non trivial proofs from the literature in order to demonstrate their relative benefits. An ideal outcome would be an example based account for computer scientists of how to reason efficiently in category theory. Prerequisites: A good understanding of elementary category theory and comfort with mathematical proofs is essential. Some experience with string diagrams, as for example in the quantum computer science courses in the department would also be helpful. 

Extending pregroup grammars for inflected languages  Bob Coecke, Martha Lewis  Foundations, Structures, and Quantum  MSc  Within the compositional distributional model of meaning developed by (Coecke et al., 2010), the model of grammar used is based on pregroup grammars, which have the feature that word order is important. In inflected languages, word order may become less important, and therefore pregroups may not be the best model of grammar. The project would involve extending the pregroup grammar to allow for different grammatical cases and to allow for more flexibility in word order. The model developed might simplify the approach taken in (Casadio and Lambek, 2005), which still uses the fundamentally ordered structure of pregroups as a basis for the grammar. Prerequisites: either category theory or quantum computer science would do References: 

Quantum Computing and Quantum Information, Logic, Category Theory, Fundamental Physics  Bob Coecke  Foundations, Structures, and Quantum  B  C  Bob Coecke is willing to supervise projects in the following areas. Please feel free to contact him.  Development and applications of the categorical quantum mechanics formalism and corresponding graphical languages http://en.wikipedia.org/wiki/Categorical_quantum_mechanics  Development of the quantomatic automated graphical reasoning software https://sites.google.com/site/quantomatic/  Theory development and empirical experiments for meaning in natural language processing http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf http://www.cs.ox.ac.uk/industry/content/IndustryNewsSummer2011.pdf 

Software development: quantomatic, automated graphical reasoning tools  Bob Coecke  Foundations, Structures, and Quantum  B  C  Recent graphbased formalisms for computation provide an abstract and symbolic way to represent and simulate quantum information processing. Manual manipulation of such graphs is slow and error prone. This project employs a formalism, based on monoidal categories, that supports mechanised reasoning with opengraphs. This gives a compositional account of graph rewriting that preserves the underlying categorical semantics.  
Syntax and semantics for nounnoun combinations  Bob Coecke, Martha Lewis  Foundations, Structures, and Quantum  MSc  In English differences in meaning of a compound noun phrase may be signalled by the intonation used. A 'TOY shop' is a shop that sells toys, whereas a 'toy SHOP' is a toy that is a shop. In the latter case the word 'toy' is used as an adjective and is more straightforwardly compositional, but in the former case the meaning of the compound is idiomatic (Kamp and Partee, 1995). The project would model these facets of meaning within the compositional distributional model of meaning (Coecke et al. 2010), and test the model thus developed within a semantic vector space based on text corpora. The project might use methods from (Coecke and Lewis, 2015), which models overextension in noun combinations, and (Kartsaklis and Sadrzadeh, 2015), which model differences in intonation. Prerequisites: Linear Algebra, either category theory or quantum computer science would do, Python (or similar language suitable for text processing) References: 

Tool Support for Graphical Categorical Reasoning  Bob Coecke, Dan Marsden  Foundations, Structures, and Quantum  MSc  Recent work has demonstrated how a suitable form of string diagrams can provide a powerful tool for reasoning in elementary category theory. Currently these diagrams are laid out and manipulated manually, as good layout choices are important for aiding the intuitions of readers. The project would involve investigating appropriate tool support for working with these diagrams, and ideally to provide a prototype implementation to demonstrate some of the ideas involved. The emphasis will be on convenient layout of practically useful diagrams that convey the correct geometric intuitions, rather than proof automation as provided by the Quantomatic tool for similar graphical calculi. Prerequisites: A good knowledge of elementary category theory is essential. Some experience with string diagrams, as for example in the quantum computer science courses in the department would be helpful. Good programming skills in at least one language will be needed for prototyping. 

Kinect interface development for cybersecurity visual analytics tool  Sadie Creese  Security  B  C  (Joint with Michael Goldsmith) Kinect interface development for cybersecurity visual analytics tool: using the open development environment for Kinect, to develop an alternative HCI for the Oxford CyberVis tool http://www.cs.ox.ac.uk/projects/cybervis/ (focusing on navigation of the CyberVis environment). As CyberVIs is primarily developed in JAVA a good understanding that language is essential in order to interface Kinect to CyberVis. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel humancomputer interfaces for security possible, depending on interest and inspiration.


Modelling of securityrelated interactions, in CSP or more evocative notations such as Milner's bigraphs  Sadie Creese  Security  C  (Joint with Michael Goldsmith) Modelling of securityrelated interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/TheSpaceMotionCommunicatingAgents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=82). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue. 

Smartphone security  Sadie Creese  Security  B  C  (Joint with Michael Goldsmith) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); uses skills in formal specification, understanding of app behaviour (by studying opensource apps), possibly leading to prototyping a software tool to perform runtime checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc, Concurrency, Concurrent Programming, Computer Security all possibly an advantage. Other projects within this domain possible, according to interest and inspiration.
Prerequisites:Concurrency, Concurrent Programming, Computer Security all possibly an advantage. 

Technologylayer social networks  Sadie Creese  Security  C  (joint with Michael Goldsmith) Technologylayer social networks: investigate the potential to identify relationships between people via technology metadata  which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensicstyle work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc. 

General game playing using inductive logic programming  Andrew Cropper  Artificial Intelligence and Machine Learning  B  C  In the general game playing competition, an agent is given the rules of a game (described as a logic program) and then starts playing the game (i.e. the agent generates traces of behaviour). This project will invert the task so that an agent is given traces of behaviour and then has to learn a set of rules that could produce the behaviour. The project will use techniques from inductive logic programming, a form of machine learning which learns computer programs from input/output examples. This work is mainly implementation and experimentation. Prerequisites: familiarity with logic programming (Prolog)  
Lifelong program induction  Andrew Cropper  Artificial Intelligence and Machine Learning  C  "The goal of program induction is to learn computer programs from input/output examples of a target program. This project will develop lifelong program induction techniques [1]. In this approach, rather than learn a single program in isolation, the goal is to learn many programs over time, allowing for solutions to be reused between tasks. The goal is to develop techniques to handle large amounts of background knowledge. This work is a mix of theory, implementation, and experimentation. [1] D. Lin, E. Dechter, K. Ellis, J.B. Tenenbaum, and S.H. Muggleton. Bias reformulation for oneshot function induction. In Proceedings of the 23rd European Conference on Artificial Intelligence (ECAI 2014), pages 525530, Amsterdam, 2014. IOS Press." Prerequisites: familiarity with statistical machine learning and ideally logic programming  
Logical reduction of higherorder Horn clauses  Andrew Cropper  Artificial Intelligence and Machine Learning  C  "The goal of this project is to study the logical redundancy of sets of higherorder Horn clauses. In particular, we want to know whether certain infinite sets of higherorder Horn clauses can be reduced to minimal finite sets. This work has implications for the field of program induction where higherorder Horn clauses are used as a form of inductive bias. This work is purely theoretical and builds on two existing papers [1,2] [1] Cropper A., Tourret S. (2018) Derivation Reduction of Metarules in Metainterpretive Learning. In: Riguzzi F., Bellodi E., Zese R. (eds) Inductive Logic Programming. ILP 2018. Lecture Notes in Computer Science, vol 11105. Springer, Cham [2] Cropper A., Muggleton S.H. (2015) Logical Minimisation of MetaRules Within MetaInterpretive Learning. In: Davis J., Ramon J. (eds) Inductive Logic Programming. Lecture Notes in Computer Science, vol 9046. Springer, Cham" Prerequisites: strong knowledge of mathematical logic and computational logic  
Machine learning efficient timecomplexity programs  Andrew Cropper  Artificial Intelligence and Machine Learning  C  "Metaopt [1,2] is an inductive logic programming (ILP) system that learns efficient logic programs from examples. For instance, given input/output examples of unsorted/sorted lists Metaopt learns quicksort (rather than bubblesort). However, Metaopt does not identify the complexity class of learned programs. The goals of this project are to (1) develop methods to identify the complexity class of a program during the learning, and (2) see whether the complexity information can improve the proof search. This work is a mix of theory, implementation, and experimentation. [1] A. Cropper and S.H. Muggleton. Learning efficient logic programs. Machine learning (2018). https://doi.org/10.1007/s1099401857126 [2] A. Cropper and S.H. Muggleton. Learning efficient logical robot strategies involving composable objects. In Proceedings of the 24th International Joint Conference Artificial Intelligence (IJCAI 2015), pages 34233429. IJCAI, 2015." Prerequisites: familiarity with general machine learning (and preferably computational learning theory) and ideally logic programming  
Machine learning functional programs  Andrew Cropper  Artificial Intelligence and Machine Learning  B  C  "Metainterpretive learning (MIL) [1,2] is a stateoftheart program induction system that learns computer programs from input/output examples of a target program. MIL is based on inductive logic programming, and learns logic programs (Prolog and ASP programs). The goal of this project is to see whether the MIL techniques can be adapted or translated into a functional paradigm as to learn functional programs. This work is a mix of theory, implementation, experimentation, but most of the work will be developing the system. [1] A. Cropper and S.H. Muggleton. Learning higherorder logic programs through abstraction and invention. In Proceedings of the 25th International Joint Conference Artificial Intelligence (IJCAI 2016), pages 14181424. IJCAI, 2016. [2] S.H. Muggleton, D. Lin, and A. TamaddoniNezhad. Metainterpretive learning of higherorder dyadic datalog: Predicate invention revisited. Machine Learning, 100(1):4973, 2015." Prerequisites: comfortable with functional programming and preferably comfortable with logic programming  
Relevancy of background knowledge in inductive logic programming  Andrew Cropper  Artificial Intelligence and Machine Learning  B  C  Inductive logic programming (ILP) is a form of machine learning which learns computer programs from input/output examples of a target program. To improve learning efficiency, ILP systems use background knowledge (i.e. auxiliary functions such as partition and append). However, most ILP systems cannot handle large amounts of background knowledge, and overcoming this limitation is a key challenge in ILP. The goal of this project is to explore techniques to identify relevant background knowledge. There is much freedom with this project, where one could focus on logical aspects, such as finding logically redundant background knowledge, or one could instead focus on statistical aspects, such as finding background knowledge most likely to be relevant for a given task. This work is a mix of theory, implementation, and experiments. Prerequisites: familiarity with statistics, statistical machine learning, and ideally logic programming  
Information Disclosure in Data Integration  Bernardo Cuenca Grau, Michael Benedikt, Egor Kostylev  Artificial Intelligence and Machine Learning, Information Systems  C  Data integration systems allow users to effectively access data sitting in multiple datasources (typically relational databases) by means of queries over a global schema. In practice, datasources often contain sensitive information that the data owners want to keep inaccessible to users. In a recent research paper, the project supervisors have formalized and studied the problem of determining whether a given data integration system discloses sensitive information to an attacker. The paper studies the computational properties of the relevant problems and also identifies situations in which practical implementations are feasible. The goal of the project is to design and implement practical algorithms for checking whether information disclosure can occur in a data integration setting. These algorithms would be applicable to the aforementioned situations for which practical implementations seem feasible. Prerequisites: Familiarity with Databases. The students would also benefit from taking the Knowledge Representation and Reasoning Course and/or Theory of Data and Knowledge Bases.  
Practical Confidentiality Enforcement Algorithms in Ontologybased Information Systems  Bernardo Cuenca Grau  Artificial Intelligence and Machine Learning  MSc  BACKGROUND Preserving confidentiality of information (i.e.,ensuring that sensitive data is only accessible to authorised users) is a critical requirement for the design of information systems. In recent years, Semantic Technologies have become widespread in many application domains. There is consequently a pressing need for suitable confidentiality enforcement infrastructure in information systems which rely on RDF as a data model, SPARQL as a query language, and OWL 2 as a language for describing background knowledge. Controlled Query Evaluation (CQE) is an approach to confidentiality enforcement where system administrators specify in a declarative way the information that cannot be disclosed to users by means of a confidentiality policy. When given a user query, a censor checks whether returning the answer would lead to a violation of the corresponding policy and thus to a disclosure of confidential information to unauthorised users; in that case, the censor returns a distorted answer. CQE has been recently studied in the context of ontologies and algorithms have been proposed in the case where the relevant ontologies are expressed in the OWL 2 RL profile—a fragment of OWL 2 for which query answering is known to be theoretically tractable in the size of both ontology and data, and efficiently implementable by means of rulebased technologies. PROJECT GOALS The main goals of this project are the following: • Implement the basic CQE algorithm for linear OWL 2 RL ontologies. This algorithm will be implemented on top of RDFox (http://www.cs.ox.ac.uk/isg/tools/RDFox/): a query answering engine for OWL 2 RL. • Study and implement potential optimisations. • Evaluate the implementation using both benchmark and realistic datasets. • Design and implement an extension of the basic algorithm that is applicable to a larger fragment of OWL 2 RL. PROJECT PREREQUISITES • Good programming skills in Java and/or C++ • The following courses are relevant for this project: o Knowledge representation and reasoning. o Theory of Data and Knowledge Bases o Databases o Computer Security 

Form Corpus & Benchmark  Tim Furche  Information Systems  B  C  (Supervisor C Schallhart) Web pages are the past since interactive web application interfaces have reshaped the online world. With all their feature richness, they enrich our personal online experience and provide some great new challenges for research. In particular, forms became much complex in assisting the user during the _lling, e.g., with completion options, or through structuring the form _lling process by dynamically enabling or hiding form elements. Such forms are a very interesting research topic but their complexity prevented so far the establishment of a corpus of modern forms to benchmark di_erent tools dealing with forms automatically. This MSC project will _ll this gap in building a corpus of such forms: Based on a number of production sites from one or two domains, we will build our corpus of web interfaces, connected to a (shared) database. Not only will the future evaluations in the DIADEM project rely on this corpus, but we will also publish the corpus { promoting it as a common benchmark for the research community working on forms. Knowledge in Java, HTML, CSS, Javascript, and web application development are required.  
Form Filling & Probing  Tim Furche  Information Systems  B  C  (Joint with C Schallhart) Unearthing the knowledge hidden in queryable web sites requires a good understanding of the involved forms. As part of DIADEM, we are developing OPAL (Ontology based web Pattern Analysis with Logic), a tool to recognize forms belonging to a parameterizable application domain, such as the real estate or used car market. OPAL determines the meaning of individual form elements, e.g., it identi_es the _eld for the minimum or maximum price or for some location. This MSC project will build upon OPAL to not only deal with static forms but also with sequences of interrelated forms, as in case of a rough initial form, followed by a re_nement form, or in case of forms showing some options only after _lling some other parts. Over the course of this MSC project, we will develop a tool which invokes OPAL to analyze a given form, to explore all available submission mechanisms on this form, analyze the resulting pages for forms continuing the initial query, and to combine the outcome all found forms into a single interaction description. Knowledge in Java, HTML, CSS are required, prior experience in logic programming would be a strong plus.  
Benchmarks for Bayesian Deep Learning: Astrophysics  Yarin Gal  Artificial Intelligence and Machine Learning  C  Bayesian deep learning (BDL) is a field of Machine Learning where we develop tools that can reason about their confidence in their predictions. A main challenge in BDL is comparing different tools to each other, with common benchmarks being much needed in the field. In this project we will develop a set of tools to evaluate Bayesian deep learning techniques, reproduce common techniques in BDL, and evaluate them with the developed tools. The tools we will develop will rely on downstream tasks that have made use of BDL in realworld applications such as parameter estimation in Strong Gravitational Lensing with neural networks. Prerequisites: only suitable for someone who has done Probability Theory, has worked in Machine Learning in the past, and has strong programming skills (Python).  
Data Science and Machine Learning Techniques for Model Parameterisation in Biomedical and Environmental Applications  David Gavaghan, Martin Robinson, Michael Clerx  Computational Biology and Health Informatics  B  C  "Time series data arise as the output of a wide range of scientific experiments and clinical monitoring techniques. Typically the system under study will either be undergoing time varying changes which can be recorded, or the system will have a time varying signal as input and the response signal will be recorded. Familiar everyday examples of the former include ECG and EEG measurements (which record the electrical activity in the heart or brain as a function of time), whilst examples of the latter occur across scientific research from cardiac cell modelling to battery testing. Such recordings contain valuable information about the underlying system under study, and gaining insight into the behaviour of that system typically involves building a mathematical or computational model of that system which will have embedded within in key parameters governing system behaviour. The problem that we are interested in is inferring the values of these key parameter through applications of techniques from machine learning and data science. Currently used methods include Bayesian inference (Markov Chain Monte Carlo (MCMC), Approximate Bayesian Computation (ABC)), and nonlinear optimisation techniques, although we are also exploring the use of other techniques such as probabilistic programming and Bayesian deep learning. We are also interested in developing techniques that will speed up these algorithms including parallelisation, and the use of Gaussian Process emulators of the underlying models Application domains of current interest include modelling of the cardiac cell (for assessing the toxicity of new drugs), understanding how biological enzymes work (for application in developing novel fuel cells), as well as a range of basic science problems. Application domains of current interest include modelling of the cardiac cell (for assessing the toxicity of new drugs), understanding how biological enzymes work (for application in developing novel fuel cells), as well as a range of basic science problems. " Prerequisites: some knowledge of Python  
Topics in Continuous Maths, Discrete Maths, Linear Algebra, Objectoriented Programming, Machine Learning and Probability and Computing  David Gavaghan  Computational Biology and Health Informatics  B  C  Prof Dave Gavaghan is willing to supervise in the area of Topics in Continuous Maths, Discrete Maths, Linear Algebra, Objectoriented Programming, Machine Learning and Probability and Computing.  
Benchmarking SATsolvers via necklace splitting  Paul Goldberg  Algorithms and Complexity Theory  C  We consider the problem of generating challengeinstances (or benchmarks) for SAT solvers.We propose to investigate the usage of the "NECKLACESPLITTING" problem https://en.wikipedia.org/wiki/Necklace_splitting_problem to create such challenges instances, since this problem is hard but has guaranteed solutions. The project involves generating instances of NECKLACESPLITTING, converting them to SAT instances, and passing them to various SATsolvers to test the SATsolvers' performance. It will also be of interest to know whether certain constrained kinds of solutions are harder to find via this approach. There is scope to experiment with alternative ways of generating the random instances of NECKLACESPLITTING.  
Cakecutting with low envy  Paul Goldberg  Algorithms and Complexity Theory  C  Description: Icutyouchoose is the classical way for two people to share a divisible good. For three people, there exists a sequence of operations using 5 cuts, that is also envyfree, but for 4 or more people, it is unknown whether you can share in an envyfree manner, using a finite number of cuts. (This is with respect to a wellknown class of procedures that can be represented using a tree whose nodes are labelled with basic "cut" and "choose" operations.) The general idea of this project is to generate and test large numbers of potential cakecutting procedures, and measure the "extent of envy" in the case where they are not envyfree. (See wikipedia's page on "cakecutting problem".) It is of interest to find out the amount of envy that must exist in relatively simple procedures. Prerequisites: competance and enthusiasm for program design and implementation; mathematical analysis and proofs. 

Computing prices in a generalisation of the ProductMix Auction  Paul Goldberg  Algorithms and Complexity Theory  C  The ProductMix Auction was devised by Klemperer in 2008 for the purpose of providing liquidity to commercial banks during
the financial crisis; it was used for a number of years by the Bank of England. See the following link: 

Construction of finite automata from examples  Paul Goldberg  Algorithms and Complexity Theory  C  Description: The aim of the project is to allow a user to construct a finite automaton (or alternatively, a regular expression) by providing examples of strings that ought to be accepted by it, in addition to examples that ought not to be accepted. An initial approach would be to test simple finite automata against the strings provided by the user; more sophisticated approaches could be tried out subsequently. One possibility would be to implement an algorithm proposed in a wellknown paper by Angluin, "Learning regular sets from queries and counterexamples". Prerequisites: competance and enthusiasm for program design and implementation; familiarity with finite automata, contextfree langauges etc.; mathematical proofs. 

Medical information extraction with deep neural networks  Paul Goldberg, Alejo NevadoHolgado  Algorithms and Complexity Theory  MSc  Background. Electronic Health Records (EHRs) are the databases used by hospital and general practitioners to daily log all the information they record from patients (i.e. disorders, taken medications, symptoms, medical tests…). In number of subjects (e.g. 50 million patients in the case of EMIF http://www.emif.eu/), EHRs are the largest source of empirical data in biomedical research, allowing for major scientific findings in central disorders such as cancer and Alzheimer’s disease [1]. However, most of the information held in EHRs is in the form of natural language text (written by the physician during each session with each patient), making it inaccessible for research. Unlocking all this information would bring a very significant advancement to biomedical research, multiplying the quantity and variety of scientifically usable data, which is the reason why major efforts have been relatively recently initiated towards this aim (e.g. I2B2 challenges https://www.i2b2.org/NLP/ or the UKCRIS network of EHRs https://crisnetwork.co/ukcrisprogramme) Project. Recent Deep Neural Networks (DNN)[2] architectures have shown remarkable results in traditionally unsolved NLP problems, including some IE tasks such as Slot Filling [3] and Relation Classification [4]. When transferring this success to EHRs, DNNs offer the advantage of not requiring well formatted text, while the problem remains of labelled data being scarce (ranging on the hundreds for EHRs, rather than the tens of thousands used in typical DNN studies). However, ongoing work in our lab has shown that certain extensions of recent NLPDNN architectures can reproduce the typical remarkable success of DNNs in situations with limited labelled data (paper in preparation). Namely, incorporating interaction terms to feed forwards DNN architectures [5] can rise the performance of relation classification in I2B2 datasets from 0.65 F1 score to 0.90, while the highest performance previously reported with the same dataset was 0.74. With an F1 score of 0.90, the quality of the extracted information meets the standards required for such information to be used in subsequent biomedical studies, promising to unlock the scientific data that at present is hidden in the free text of EHRs We therefore propose to apply DNNs to the problem of information extraction in EHRs, using I2B2 and UKCRIS data as a testbed. More specifically, the DNNs designed and implemented by the student should be able to extract medically relevant information, such as prescribed drugs or diagnoses given to patients. This corresponds to some of the challenges proposed by I2B2 during recent years (https://www.i2b2.org/NLP/Medication/), and are objectives of high interest in UKCRIS which have sometimes been addressed with older techniques such as rules [1,6,7]. The student is free to use the extension of the feed forward DNN developed in our lab, or to explore other feed forwards or recurrent (e.g. RNN, LSTM or GRU) alternatives. . The DNN should be implemented in Python’s Keras (https://keras.io/), Theano (http://deeplearning.net/software/theano/), Tensorflow (https://www.tensorflow.org/), or PyTorch (http://pytorch.org/). Bibliography: G. Perera, M. Khondoker, M. Broadbent, G. Breen, R. Stewart, Factors Associated with Response to Acetylcholinesterase Inhibition in Dementia: A Cohort Study from a Secondary Mental Health Care Case Register in London, PLOS ONE. 9 (2014) e109484. doi:10.1371/journal.pone.0109484. Y. LeCun, Y. Bengio, G. Hinton, DL  Deep learning, Nature. 521 (2015) 436–444. doi:10.1038/nature14539. G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. HakkaniTur, X. He, L. Heck, G. Tur, D. Yu, G. Zweig, Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding, IEEEACM Trans. Audio Speech Lang. Process. 23 (2015) 530–539. doi:10.1109/TASLP.2014.2383614. C.N. dos Santos, B. Xiang, B. Zhou, Classifying Relations by Ranking with Convolutional Neural Networks, CoRR. abs/1504.06580 (2015). http://arxiv.org/abs/1504.06580. M. Denil, A. Demiraj, N. Kalchbrenner, P. Blunsom, N. de Freitas, Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network, CoRR. abs/1406.3830 (2014). http://arxiv.org/abs/1406.3830. E. Iqbal, R. Mallah, R.G. Jackson, M. Ball, Z.M. Ibrahim, M. Broadbent, O. Dzahini, R. Stewart, C. Johnston, R.J.B. Dobson, Identification of Adverse Drug Events from Free Text Electronic Patient Records and Information in a Large Mental Health Case Register, PLOS ONE. 10 (2015) e0134208. doi:10.1371/journal.pone.0134208. R.G. Jackson MSc, M. Ball, R. Patel, R.D. Hayes, R.J. Dobson, R. Stewart, TextHunter – A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research, AMIA. Annu. Symp. Proc. 2014 (2014) 729–738.  
Neural network genomics  Paul Goldberg, Alejo NevadoHolgado  Algorithms and Complexity Theory  MSc  Background. For more than 10 years, GWAS studies have represented a revolution in the study of human disorders and human phenotypes. By measuring how your risk of suffering any given disease changes according to SNP mutations, GWAS studies can measure how relevant each gene is to the disease under study. However, this SNPdisease association is calculated “one SNP at a time”, and ignores all genegene interactions that are often crucially important to the causation of the disorder. If the interactions between a number of genes (e.g. insulin and its receptor in type 2 diabetes; or APP and alpha, beta and gamma secretase in Alzheimer…) is what produces a given disorder, this interaction will not be detected in a GWAS analysis. This shortcoming may not be a problem in monogenetic hereditable disorders, such as Huntington disease, where mutations in a single gene by itself are enough for causing the disease. However, GWAS studies will likely not uncover the mechanisms of complex disorders, where the disease emerges from the interaction of a number of genes. This is likely the source of the so called “missing hereditability” problem observed in most complex traits or diseases, where all the SNP variations taken together cannot account for most of the hereditability of a given trait or disease [1]. In addition, it has been demonstrated that complex traits such as height and BMI are clearly and strongly hereditable [2], but GWAS studies simply cannot detect most of this hereditability. In summary, GWAS analyses detect simple individual genetic factors, but not interactions between genetic factors. Project. In this project, we propose to identify the interacting genetic factors behind Alzheimer’s disease and other neurodegenerations with neural networks, which are known for exploiting the interactions present in the tobe analysed data. While the linear models used in GWAS studies are able to identify only linear and monovariated contributions of each gene to a disorder, neural networks can analyse how genes interact with each other to explain the studied disorder. This versatility is what has allowed neural networks to find such widespread use in industry in the last decade, where they are revolutionizing image, sound and language analysis [3–5]. In our laboratories we already have the hardware and the expertise required to build neural networks, and we have used them in other research areas relevant to Alzheimer’s disease. In addition, we have access and experience using UK Biobank, which is the ideal dataset to implement this project. In UK Biobank they have measured wide array SNPs, all disorders and demographics in ~500,000 participants between the ages of 37 and 73, and more than 5000 of them suffer from Alzheimer’s disease, Parkinson’s disease or other neurodegenerations. We propose the MSc student to build a neural network to predict either diagnoses or diseaserelated endophenotypes (i.e. brain volumes of affected areas, cognitive scores…) of each one of these participants, using only the information present in the wide array SNPs and relevant demographics. The student is free to use the extension of the feed forward DNN developed in our lab, or to explore other feed forwards or recurrent (e.g. RNN, LSTM or GRU) alternatives. The DNN should be implemented in Python’s Keras (https://keras.io/), Theano (http://deeplearning.net/software/theano/), Tensorflow (https://www.tensorflow.org/), or PyTorch (http://pytorch.org/).  
Rank aggregation  Paul Goldberg  Algorithms and Complexity Theory  C  Description: In social choice theory, a general theme is to take a set of rankings of a set of candidates (also known as alternatives) and compile an "overall" ranking that attempts to be as close as possible to the individual rankings. Each individual ranking can be thought of as a vote that we want to compile into an overall decision. The project involves taking some realworld data, for example university league tables, and computing aggregate rankings to see which individual votes are closest to the consensus. In the case of the Kemeny consensus, which is an NPcomplete rank aggregation rule, it is of interest to exploit heuristics that may be effective on realworld data, and see for how large a data set can the Kemeny consensus be computed. Prerequisites: familiarity with polynomialtime algorithms, NPhardness; interest in computational experiments on data 

Rockpaperscissors for the computerised strategist  Paul Goldberg  Algorithms and Complexity Theory  C  In the wellknown game of rockpaperscissors, it is clear that any player can "break even" by playing entirely at random. On the other hand, people do a poor job of generating random numbers, and expert players of the game can take advantage of predictable aspects of opponents' behaviour. In this project, we envisage designing algorithms that adapt to human opponent's behaviour, using for example noregret learning techniques, and modelling the opponent as a probabilistic automaton. Ideally, the student taking this project should manage to persuade some volunteers to test the software! This is needed to provide data for the algorithms, and should provide feedback to the opponent on what mistakes they are making. Depending on how it goes, we could also consider extending this approach to related games. Prerequisite: Interest in working with probability is important. 

Using behavioural data to model a gametheoretic adversary  Paul Goldberg  Algorithms and Complexity Theory  MSc  To take the project, a student should have done the courses in Computational Learning Theory, and Machine Learning. The project is related to the paper (Yang et al.) below, and uses a data set (studied in the paper) obtained from an online game that simulates an adversarial security setting, in which the player selects one of a number of targets to attack, based on information about their values, and the probabilities that they will be defended. The purpose of this game is to understand a player's "bounded rationality", i.e. the ways in which he fails to make the best decision. The project would study a model of player behaviour, related to ones studied in (Yang et al.), and aim to learn the ways in which a player fails to bestrespond. A related objective is to design optimal defender behaviour based on what is learned about attackers' behaviour. @article{YKOTJ13, author="R.\ Yang and C.\ Kiekintveld and F.\ Ord{\'o}{\~n}ez and M.\ Tambe and R.\ John", title="Improving resource allocation strategies against human adversaries in security games: An extended study", journal="Artificial Intelligence", volume="195", pages="440469", year="2013", }  
A Conceptual Model for Assessing Privacy Risk  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  Privacy is not a binary concept, the level of privacy enjoyed by an individual or organisation will depend upon the context within which it is being considered; the more data at attacker has access to the more potential there may be for privacy compromise. We lack a model which considers the different contexts that exist in current systems, which would underpin a measurement system for determining the level of privacy risk that might be faced. This project would seek to develop a prototype model – based on a survey of known privacy breaches and common practices in data sharing. The objective being to propose a method by which privacy risk might be considered taking into consideration the variety of (threat and datasharing) contexts that any particular person or organisation might be subjected to. It is likely that a consideration of the differences and similarities of the individual or organisational points of view will need to be made, since the nature of contexts faced could be quite diverse.  
Applying AI Techniques to Autocategorise Violent Images  Michael Goldsmith, Harjinder Lallie  Security  B  C  "The analysis of violent images in digital forensics cases can take a long time and be traumatising for digital investigators. Artificial intelligence (AI) techniques can aid in automating the categorisation of violent images, and datasets such as the Hollywood violent scenes dataset (https://www.technicolor.com/dream/researchinnovation/violentscenesdataset) can be used to aid the categorisation. This project involves developing a tool which uses AI techniques to categorise violent images. Please note that you will not be required to look at 'real' violent images, the images in the dataset referenced herein are from Hollywood films such as Reservoir Dogs, Armageddon, Harry Potter and the Order of the Phoenix, Billy Elliot, Kill Bill 1 and The Wizard of Oz Useful reference(s): C.H. Demarty,C. Penet, M. Soleymani, G. Gravier. VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation. In Multimedia Tools and Applications, May 2014. " Prerequisites: suitable for someone who has some AI experience. Knowledge/experience of digital forensics is not critical  
Applying AI Techniques to Estimate Suspect/Victim Age  Michael Goldsmith, Harjinder Lallie  Security  B  C  "Law enforcement agencies are often tasked with determining the age of a suspect or victim. In these circumstances, age determination is often guided by an investigator's professional judgement and can be subjectively biased. This project requires you to apply AI techniques to estimate the age of people in a dataset of images. Datasets such as DEX (available from https://data.vision.ee.ethz.ch/cvl/rrothe/imdbwiki/) provide exciting opportunities for testing age estimation algorithms. Useful reference(s): Rothe, R., Timofte, R. and Van Gool, L., 2015. Dex: Deep expectation of apparent age from a single image. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 1015)." Prerequisites: Suitable for someone who has some AI experience. Knowledge/experience of digital forensics is not critical  
Computer Vision for Physical Security  Michael Goldsmith, Sadie Creese, Jassim Happa  Security  B  C  Computer Vision allows machines to recognise objects in realworld footage. In principle, this allows machines to flag potential threats in an automated fashion based on historical and present images in video footage of realworld environments. Automated threat detection mechanisms to help security guards identify threats would be of tremendous help to them, especially if they have to temporarily leave their post, or have to guard a significant number of areas. In this project, students are asked to implement a system that is able to observe a real environment over time and attempt to identify potential threats, independent of a number of factors, e.g. lighting conditions or wind conditions. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. One approach might be to implement a camera system using e.g. a web camera or a Microsoft Kinect to conduct anomaly detection on realworld environments, and flag any issues related to potential threats. Requirements: Programming skills required 

Considering the performance limiters for controls  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  The behavior of controls will be dependent upon how they are used. One obvious example being the protection afforded by a firewall is dependent upon the maintenance of the rules that determine what the firewall stops and what it does not. The benefit of various technical controls in operational context lacks good evidence and data, so there is scope to consider the performance of controls in a lab environment. This miniproject would select one or more controls from the CIS Top 20 Critical Security Controls (CSC) (version 6.1) and seek to develop laboratory experiments (and implement them) to gather data on how the effectiveness of the control is impacted by its deployment context (including, for example, configuration, dependence on other controls, threat faced). Requirements: Students will need an ability to develop a testsuite and deploy the selected controls. 

Cybersecurity visualization  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jassim Happa  Security  B  C  Cybersecurity visualization helps analysts and risk owners alike to make better decisions about what to do when the network is attacked. In this project the student will develop novel cybersecurity visualizations. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the visualizations they develop. These projects tend to have a focus on network traffic visualization, but the student is encouraged to visualize datasets they would be most interested in. Other interesting topics might for instance include: hostbased activities (e.g. CPU/RAM/network usage statistics), network scans (incl. vulnerabilities scanning). Past students have visualized network traffic patterns and android permission violation patterns. Other projects on visualizations are possible (not just cybersecurity), depending on interest and inspiration. Requirements: Programming skills required. 

Datapoisoning  Michael Goldsmith  Security  B  C  How worried should we be about a malign entity deliberately changing or poisoning our data? Is there risk? What is its nature? What harms might be effected? Can we build dataanalytics that are resistant to such attacks, can we detect them? It is highly unlikely that techniques for handling erroneous data will be sufficient since we are likely to face highly targeted datacorruption.  
Design /Architectural Vulnerability analysis of Distributed Ledgers  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  This project would seek to study the general form of distributed ledgers, and the claimed nuances and general for in implementations, and assess all the possible week points that might make implementations open to compromise. The general approach will be to develop a detailed understanding of the security requirements and interdependencies of functionality – capturing the general security case for a distributed ledger and how it decomposes into lower level security requirements. Then an assessment will be made of each, and the potential for vulnerabilities in design and implementation considered. The ultimate result being a broad analysis of potential weakpoints. If possible these will then be practically investigated in a labbased environment. One output might be a proposal for testing strategies. 

Detecting disguised processes using ApplicationBehaviour Profiling  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  MSc  In order to avoid detection, malware can disguise itself as a legitimate program or hijack system processes to reach its goals. Commonly used signaturebased Intrusion Detection Systems (IDS) struggle to distinguish between these processes and are thus only of limited use to detect these kind of attacks. They also have the shortcoming that they need to be updated frequently to possess the latest malware definitions. This makes them inherently prone to missing novel attacks. Misuse detection IDSs however overcome this problem by maintaining a ground truth of normal application behaviour and reporting deviations as anomalies. In this project, students will be tasked to investigate how a process’ behaviours can be profiled in the attempt to identify whether it is behaving anomalously and if it can be correctly identified as malware. This project will build on existing research in this area. Requirements: Programming skills required. 

Experimenting with anomaly detection features for performance  Michael Goldsmith, Ioannis Agrafiotis, Sadie Creese, Arnau Erola  Security  B  C  Considering relative detection performance using different feature sets, and different anomalies of interest, in the face of varying attacks. Conducted with a view to exploring the minimal sets that would result in the threat detection, and producing guidance that is aimed at determining the critical datasets required for the control to be effective.  
Eyetracker data analysis  Michael Goldsmith, Jassim Happa, Ivan Martinovic  Security  MSc  Eyetracking allows researchers to identify where observers look on a monitor. A challenge in analysing eyetracker output
however is identifying patterns of viewing behaviour from the raw eyetracker logs over time and what they mean in a research
context, particularly as no semantic information about the pixel being viewed is considered. In practice, this means that
we know where someone is looking, but nothing about what it means to look at that pixel or the order of pixels. From a research
perspective, being able to tag areas of interest, and what those areas mean, or conduct other statistical analysis on viewing
patterns would be of significant use to analysing results, particularly if these results could be automated. The purpose of
this project is to conduct an eyetracker experiment, and design and implement useful methods to analyse eyetracker data. We
will be able to provide training for the student, so they are able to use the eyetracking tools themselves. Other eyetracking
projects also possible, depending on interest and inspiration. 

Formal threat and vulnerability analysis of a distributed ledger model  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  This project would utilise the process algebra CSP and associated model checker FDR to explore various types of threat and how they might successfully compromise a distributed ledger. This type of modelling would indicate possible attacks on a distributed ledger, and could guide subsequent review of actual designs and testing strategies for implementations. The modelling approach would be based on the cryptoprotocol analysis techniques already developed for this modelling and analysis environment, and would seek to replicate the approach for a distributed ledger system. Novel work would include developing the model of the distributed ledger, considering which components are important, formulating various attacker models and also formulating the security requirements / properties to be assessed using this modelchecking based approach. Requirements: In order to succeed in this project students would need to have a working knowledge of the machine readable semantics of CSP and also the model checker FDR. An appreciation of threat models and capability will need to be developed. 

High Dynamic Range Imaging for Documentation Applications  Michael Goldsmith, Jassim Happa, Sadie Creese  Security  B  C  High dynamic range imaging (HDRI) allows more accurate information about light to be captured, stored, processed and displayed to observers. In principle, this allows viewers to obtain more accurate representations of realworld environments and objects. Naturally, HDRI would be of interest to museum curators to document their objects, particularly, nonopaque objects or whose appearance significantly alter dependent on amount of lighting in the environment. Currently, few tools exist that aid curators, archaeologists and art historians to study objects under userdefined parameters to study those object surfaces in meaningful ways. In this project the student is free to approach the challenge as they see fit, but would be expected to design, implement and assess any tools and techniques they develop. The student will then develop techniques to study these objects under userspecified conditions to enable curators and researchers study the surfaces of these objects in novel ways. These methods may involve tone mapping or other modifications of light exponents to view objects under nonnatural viewing conditions to have surface details stand out in ways that are meaningful to curators.  
Intelligent user activity timelines  Michael Goldsmith  Security  B  C  "Operating system store temporal data in multiple locations. Digital investigators are often tasked with reconstructing timelines of user activities. Timeline generation tools such as log2timeline can aid in extracting temporal data, similarly, 'Professonal' tools such as Encase and Autopsy build and visualise low level timelines. Collectively, these tools: (1) provide (often high levels of) low level data, and (2) are not able to apply any form of reasoning. This project involves the extraction of temporal data and the application of reasoning algorithms to develop reliable event sequences of interest to an investigator. Useful references: Olsson, J. and Boldt, M., 2009. Computer forensic timeline visualization tool. digital investigation, 6, pp.S78S87. Buchholz, F.P. and Falk, C., 2005, August. Design and Implementation of Zeitline: a Forensic Timeline Editor. In DFRWS."  
International Cybersecurity Capacity Building Initiatives  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  There is a large investment being made by the international community aimed at helping nations and regions to develop their capacity in cybersecurity. The work of the Global Cyber Security Capacity Centre (based at the Oxford Martin School) studies and documents this: https://www.sbs.ox.ac.uk/cybersecuritycapacity/content/front. There is scope to study in more detail the global trends in capacity building in cybersecurity, the nature of the work and the partnerships that exist to support it. An interesting analysis might be to identify what is missing (through comparison with the Cybersecurity Capacity Maturity Model, a key output of the Centre), and also to consider how strategic, or not, such activities appear to be. An extension of this project, or indeed a second parallel project, might seek to perform a comparison of the existing efforts with the economic and technology metrics that exist for countries around the world, exploring if the data shows any relationships exist between those metrics and the capacity building activities underway. This analysis would involve regression techniques.  
Modelling of securityrelated interactions, in CSP or more evocative notations such as Milner's bigraphs  Michael Goldsmith  Security  B  C  (Joint with Sadie Creese) Modelling of securityrelated interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/TheSpaceMotionCommunicatingAgents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=82). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue.  
Organisational Resilience and Self* replanning  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jassim Happa  Security  MSc  Key Performance Indicator (KPI) is a type of performance measurement in organisations. They evaluate the success of activities in which an organisation or system engages. Often success is measured in terms of whether an activity has been able to complete and able to repeat (e.g. zero defects, customer satisfaction or similar), other times it is measured in terms of making progress towards a strategic goal. Wellchosen KPIs allows us to reflect upon performance of an organisation, and possibly identify potential future degradation issues in systems. In this project, students are tasked to investigate how KPIs can be used to measure and improve cyberresilience in an organisation. Particularly, we are interested in investigating how the performance of an organisation, particularly with respects to security as well as mission performance. After identifying which aspects of an organisation or the mission are prone to error, it may be beneficial to propose solutions for how these issues can be addressed. With "self*replanning", we believe it would be possible for a system to suggest and automate certain aspects of how the mission, infrastructure or organisation can be repurposed to not fail. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. For instance, the project may be networksecurity focused, missionprocesses focused or otherwise. It may also investigate more formal approaches to self* replanning methods. Requirements: Programming and/or Social Science Research Methods. 

Penetration testing for harm  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  Current penetration testing is typically utilised for discovering how organisations might be vulnerable to external hacks, and testing methods are driven by using techniques determined to be similar to approaches used by hackers. The result being a report highlighting various exploitable weakpoints and how they might result in unauthorised access should a malign entity attempt to gain access to a system. Recent research within the cybersecurity analytics group has been studying the relationship between these kinds of attack surfaces and the kinds of harm that an organisation might be exposed to. An interesting question would be whether an orientation around intent, or harm, might result in a different test strategy; would a different focus be given to the kinds of attack vectors explored in a test if a particular harm is aimed at. This miniproject would aim to explore this question by designing penetration test strategies based on a set of particular harms, and then seek to consider potential differences with current penetration practices by consultation with the professional community. Requirements: Students will need to have a working understanding of penetration testing techniques.  
Photogrammetry  Michael Goldsmith, Jassim Happa  Security  B  C  Photogrammetry is a set of techniques that allows for 3D measurements from 2D photographs, especially those measurements pertaining to geometry or surface colours. The purpose of this project is to implement one or more photogrammetry techniques from a series of 2D photographs. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the tool they develop.  
Predicting exposure to risk for active tasks  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  Prior research has been considering how we might better understand and predict the consequences of cyberattacks based on knowledge of the business processes, people and tasks and how they utilise the information infrastructure / digital assets that might be exposed to specific attack vectors. However, this can clearly be refined by moving to an understanding of those tasks live or active at the time of an attack propagating across a system. If this can be calculated, then an accurate model of where risk may manifest and the harm that may result can be constructed. This project would explore the potential for such a model through practical experimentation and development of software monitors to be placed on a network aimed at inferring the tasks and users that are active based from network traffic. If time allows then hostbased sensors might also be explored (such as on an application server) to further refine the understanding of which users and live on which applications etc. Requirements: Students must be able to construct software prototypes and have a working knowledge of network architectures and computer systems. 

Procedural Methods in Computer Graphics  Michael Goldsmith, Jassim Happa  Security  B  C  Procedural methods in computer graphics help us develop content for virtual environments (geometry and materials) using formal grammars. Common approaches include fractals and lsystems. Examples of content may include the creation of cities, planets or buildings. In this project the student will develop an application to use create content procedurally. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the methods they develop. These projects tend to have a strong focus designing and implementing existing procedural methods, but also includes a portion of creativity. The project can be based on reality  e.g. looking at developing content that has some kind of basis on how the real world equivalent objects were created (physicallybased approaches), or the project may be entirely creative in how it creates content. Past students for instance have built tools to generate cities based on real world examples and nonexistent city landscapes, another example include building of procedural planets, including asteroids and earthlike planetary bodies. 

Reflectance Transformation Imaging  Michael Goldsmith, Jassim Happa, Stefano Gogioso  Foundations, Structures, and Quantum, Security  B  C  Reflectance Transformation Imaging (RTI) is a powerful set of techniques (the first of which known as Polynomial Texture Maps, PTMs) that enables us to capture photographs of objects under a several lighting conditions. Combined, these RTI images form a single photograph in which users can relight these objects by moving the light sources around the hemisphere in front of the object, but also specify userdefined parameters, including removing colour, making the objects more specular or diffuse in order to investigate the surface details in depth. It can be used for forensic investigation of crime scenes as well as cultural heritage documentation and investigation. The purpose of this project is to implement RTI methods of their preference.  
Resilience – metrics and tools for understanding organisational resilience  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola, Jassim Happa  Security  MSc  Resilience in the face of cyberattack is considered a key requirement for organisations. Prior work within the cybersecurity analytics group has been developing a resilience model for organisations, but there remains a commonly agreed set of metrics that organisations can use to determine just how resilient they are (at a given point in time and continuously). This miniproject would seek to propose a set of metrics, and if time allows tools for applying them (although the latter may well better suit a following DPhil). The approach will be to consider the development of metrics to measure the various characteristics or organisational capabilities / behaviours that are determined to be necessary in order to be resilient. It will be necessary to consider how the metrics might vary according to, or take account of, different threat environments. It may also be interesting to consider if there are refinements of the resilience requirements that apply to different types of organisations. Data analytics approaches will need to be considered, in order to apply the metrics, and students might consider the use of visualisation to help with this. The resulting metrics will be validated in consultation with security professionals and organisations possessing resilience related experience. In the context of the miniproject this is most likely achievable via a small number of interviews, with a more detailed validation and iterative design approach being supported by a longitudinal study working with 2 or 3 organisations which might adopt and test the metrics 

Smartphone security  Michael Goldsmith  Security  B  C  (Joint with Sadie Creese) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); uses skills in formal specification, understanding of app behaviour (by studying opensource apps), possibly leading to prototyping a software tool to perform runtime checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc, Concurrency, Concurrent Programming, Computer Security all possibly an advantage. Other projects within this domain possible, according to interest and inspiration. Prerequisites:Concurrency, Concurrent Programming, Computer Security all possibly an advantage. 

Technologylayer social networks  Michael Goldsmith  Security  C  (Joint with Sadie Creese) Technologylayer social networks: investigate the potential to identify relationships between people via technology metadata  which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensicstyle work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc.  
Tripwires  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  At Oxford we have developed a framework for understanding the components of an attack, and documenting known attack patterns can be instrumental in developed tripwires aimed at detecting the presence of insiders in a system. This project will seek to develop a library of such tripwires based on a survey of openly documented and commented upon attacks, using the Oxford framework. There will be an opportunity to deploy the library into a livetrial context which should afford an opportunity to study the relative utility of the tripwires within large commercial enterprises. The miniproject would also need to include experimenting with the tripwires in a laboratory environment, and this would involve the design of appropriate test methods  
Understanding Enterprise Infrastructure Dependencies  Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola  Security  B  C  There are many tools available for detecting and monitoring cyberattacks based on network traffic and these are accompanied by a wide variety of tools designed to make alerts tangible to security analysts. By comparison, the impact of these attacks on an organisational level has received little attention. An aspect that could be enhanced further is the addition of a tool facilitating management and updating of our understanding of business processes, but also how those processes are dependent on a network infrastructure. This tool could facilitate the mapping between company strategies, activities needed to accomplish company goals and map these down to the network and people assets. At the top of the hierarchy lies the board, responsible for strategic decisions. These decision are interpreted in the managerial level and could be captured and analysed with business objective diagrams. These diagrams in return could be refined further to derive business processes and organisational charts, ensuring that decision made in the top level will be enforced in the lower levels. The combination of business processes and organisation charts could eventually provide the network infrastructure. For this project we suggest a student could develop novel algorithms for mapping of business processes to network infrastructures in an automated way (given the updated business process files). That said, the student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. Other projects on business process modelling also possible, depending on interest and inspiration. 

 Parity Games over Random Graphs  Julian Gutierrez  Artificial Intelligence and Machine Learning  C  Parity games are a tool for formal verification. For instance, many model checking problems can be reduced to the solution of parity games. The exact complexity of these games is unknown: they can be solved in NP and in coNP, but it is not known whether a polynomial time algorithm exists, despite more than 3 decades of research. However, in practice, most parity games are solved efficiently. This project aims at defining new algorithms for parity games and evaluating them in practice, in particular, over random graphs. Prerequisites: Discrete Mathematics, Intro. to Formal Proof, Logic and Proof Desirable: ComputerAided Formal Verification, Computational Complexity  
Bisimilarity in Logics for Strategic Reasoning  Julian Gutierrez  Artificial Intelligence and Machine Learning  C  Nash equilibrium is the standard solution concept for multiplayer games. Such games have multiple applications in Logic and Semantics, Artificial Intelligence and MultiAgent Systems, and Verification and Computer Science. Unfortunately, Nash equilibria is not preserved under bisimilarity, one of the most important behavioural equivalences for concurrent systems. In a recent paper it was shown that this problem may not arise when certain models of strategies are considered. In this project the aim is to investigate further implications of considering the new model of strategies. For instance, whether a number of logics for strategic reasoning become invariant under bisimilarity if the new model of strategies is considered, whether some logics that are unable to express Nash equilibria can do so with respect to the new model of strategies, and whether the results already obtained still hold more complex classes of systems, for instance, where nondeterminism has to be considered. Prerequisites: Discrete Mathematics, Introduction to Formal Proof, Logic and Proof Desirable: ComputerAided Formal Verification, Computational Complexity Project type: theory 

Rational Verification in EVE (replace Bisimilarity in Logics for Strategic Reasoning)  Julian Gutierrez  Artificial Intelligence and Machine Learning  C  EVE (Equilibrium Verification Environment) is a formal verification tool for the automated analysis of temporal equilibrium properties of concurrent and multiagent systems represented as multiplayer games. In EVE, systems are modelled using the Simple Reactive Module Language (SRML) as a collection of independent system components (players/agents in a game), which are assumed to have goals expressed using Linear Temporal Logic (LTL) formulae. EVE can be used to check the existence of Nash equilibria in such systems and verify which temporal logic properties are satisfied in the equilibria, problems that belong to the Rational Verification framework. The goal of this project is to extend the verification capabilities of EVE, either through the implementation of new or already existing algorithms for Rational Verification. Prerequisites: Discrete Mathematics; ComputerAided Formal Verification, Computational Complexity Desirable: Automata, Logic and Games; Computational Game Theory Project type: Practice  
Solving Parity Games in Practice (to replace Parity Games over Random Graphs)  Julian Gutierrez  Artificial Intelligence and Machine Learning  C  Parity games are a tool for formal verification. For instance, many model checking problems can be reduced to the solution of parity games. The exact complexity of these games is unknown: they can be solved in NP and in coNP, but it is not known whether a polynomial time algorithm exists, despite more than 3 decades of research. Recently, a quasipolynomial algorithm for parity games was developed, which reinforces the believe that parity games may be efficiently solved in practice. The goal of this project is to produce an implementation of the new algorithm and to evaluate its performance in practice, in particular, with respect to the set of algorithms already implemented, for instance, those that can be solved using PGSolver. The study should be conducted using a set of benchmarks as well as on random graphs. Prerequisites: Discrete Mathematics; ComputerAided Formal Verification, Computational Complexity Desirable: Automata, Logic and Games; Computational Game Theory Project type: Practice  
Towards Tractable Strategic Reasoning in Strategy Logic  Julian Gutierrez  Artificial Intelligence and Machine Learning  C  MSc  Strategy Logic (SL) is a temporal logic to reason about strategies in multiplayer games. Such games have multiple applications in Logic and Semantics, Artificial Intelligence and MultiAgent Systems, and Verification and Computer Science. SL is a very powerful logic for strategic reasoning  for instance, Nash equilibria and many other solution concepts in game theory can be easily expressed  which has an undecidable satisfiability problem and a nonelementary model checking problem. In this project, the aim is to study fragments of SL which can potentially have better results (decidability and complexitity) with respect to the satisfiability and model checking problems. The fragments to be studied can be either syntactic fragments of the full language or semantic fragments where only particular classes of models are considered. Prerequisites: Discrete Mathematics, Introduction to Formal Proof, Logic and Proof; ComputerAided Formal Verification, Computational Complexity Desirable: Automata, Logic and Games, Computational Game Theory. Project type: theory 

Boolean operations on semilinear sets  Christoph Haase  Automated Verification  C  Semilinear sets are the sets of integers definable in Presburger arithmetic, the firstorder theory of the integers with addition and order. Their property of being closed under all Boolean operations makes semilinear sets a prime choice for finitely representing infinite sets of integers. The goal of this project is to give an exposition of algorithms for performing Boolean operations on semilinear sets, as well as developing a prototype implementation of those algorithms and integrating them into the opensource SMTsolver Z3.  
Convex quadratic optimisation modulo linear arithmetic  Christoph Haase  Automated Verification  C  Optimisation modulo theories is a new paradigm in which one optimises an objective function subject to a Boolean combination of linear constraints. The opensource SMTsolver Z3 currently only supports optimising linear objective functions. The goal of this project is to add support for optimising convex quadratic functions to Z3 based on the classical FrankWolfe algorithm.  
Decision Procedures for Linear Arithmetic Theories  Christoph Haase  Automated Verification  C  Arithmetic theories are logical theories for reasoning about number systems, such as the integers, rationals, and reals, together with associated arithmetic operations such as addition and multiplication. The goal of this project is to give a comprehensive overview over different approaches to deciding linear arithmetic theories based on quantifier elimination, finitestate automata and generating sets  
Parallel Algorithms for Computing Hilbert Bases  Christoph Haase  Automated Verification  B  C  Given a homogeneous system of linear equations A x = 0, a Hilbert basis is a unique finite minimal set of nonnegative solutions from which every nonnegative solution of the system can be generated. Computing Hilbert bases is a fundamental problem encountered in various areas in computer science and mathematics, for instance in decision procedures for arithmetic theories, the verification of infinitestate systems and pure combinatorics. In this project. we plan to revisit an approach to computing Hilbert bases described in [1] that is highly parallelizable. With the ubiquity of multicore architectures, it seems conceivable that, with proper engineering, this approach will outperform existing approaches. The goal of this project is to deliver an implementation of the algorithm of [1] and benchmark it against competing tools. If successful, an integration into the SageMath platform could be envisioned. [1] https://pdfs.semanticscholar.org/6e84/f9ccfdaa37cb33cfc2024aec1dc7d13964c7.pdf Prerequisites: good knowledge of concurrent programming and data structures, and linear algebra  
The Complexity of Deciding Whether Ordering is Necessary in a Presburger Formula  Christoph Haase  Automated Verification  C  It has been shown in [1] that the sets of integers definable in the firstorder theory of the integers with addition and order, FO(Z,+,<=), is a strict superset of those sets of integers definable when only equality instead of order is allowed (i.e. those definable in FO(Z,+,=)). The goal of this project is to determine the computational complexity of deciding whether a given sentence of FO(Z,+,<=) is definable in FO(Z,+,=). [1] C. Choffrut and A. Frigeri. Deciding whether the ordering is necessary in a Presburger formula. Discrete Mathematics and Theoretical Computer Science, 12(1):20–37, 2010. URL https://www.dmtcs.org/dmtcsojs/index.php/dmtcs/article/view/1300/0.html.  
The Complexity of the FirstOrderTheory of the Integers with Addition  Christoph Haase  Automated Verification  B  C  The goal of this project is to determine the computational complexity of the firstorder theory of the integers with addition and equality FO(Z,+,=), but without order. This theory is a strict fragment of Presburger arithmetic, an arithmetic theory that finds numerous applications, for instance, in the verification of infinitestate systems. We aim for making an original scientific contribution by determining what time and space resources are needed in order to decide a sentence in FO(Z,+,=). If time permits, an implementation of the decision procedure developed in this project could be envisioned. Prerequisites: good familiarity with firstorder logic, linear algebra and computational complexity  
Bioinformatics Projects  Jotun Hein  Computational Biology and Health Informatics  B  C  Jotun Hein has been in Oxford since 2001 and has supervised many students from Computer Science. His main interest is computational biology and combinatorial optimization problems in this area, especially from phylogenetics, biosequence analysis, population genetics, grammars and the origin of life. Some idea for projects can be obtained by browsing these pages: https://heingroupoxford.com/learningresources/topicsincomputationalbiology/ https://heingroupoxford.com/previousprojects/ A student is also welcome to propose his or her own project. 

Analysing useralgorithm interaction  Marina Jirotka  Human Centred Computing  MSc  "Members of the Human Centred Computing Group are currently involved in the UnBias research project ( http://unbias.wp.horizon.ac.uk/). This investigates the user experience of algorithmdriven internet platforms. As part of this project we have been conducting observational data collection in which pairs of volunteers are video recorded whilst browsing online. The on screen activity is also captured and the data are being analysed to identify instances of useralgorithm interaction; in particular instances in which an algorithmic outcome – such as the auto complete suggestion in a search bar or the filtering mechanisms on a recommendation platform – might shape the trajectory of browsing activity. We will shortly undertake a second wave of data collection in which users browse online but are placed in situations in which their usual experience of certain algorithmic processes will be disrupted. We are looking for a student to conduct analysis on the collected data and to then build on this analysis by designing and conducting their own disruption experiment. The student can set their own specific research questions for the analysis, and determine what particular kind of methodological approach they would like to undertake. :There are no prerequisites. The study will suit a student interested in developing skills in social research methods and human computer interaction approaches. We will assist in securing ethical clearance and recruiting research participants for the data collection activities.  
User testing of ‘fairness toolkit’ to address algorithmic bias  Marina Jirotka, Helena Webb  Human Centred Computing  MSc  Members of the Human Centred Computing Group are currently involved in the UnBias research project ( http://unbias.wp.horizon.ac.uk/). This investigates the user experience of algorithmdriven internet platforms and explores contemporary concerns that algorithmic processes can lead to inaccurate, unfair or even discriminatory outcomes. As part of the project we are developing a ‘fairness toolkit’ of material resources that are designed to raise awareness and debate over key project themes. The toolkit will consist of: 1) Awareness Cards – a set of cards intended to help young people (secondary school age, roughly 1217) devise and explore scenarios that illustrate how bias in algorithmic systems can affect them; 2) TrustScapes – a tool for mapping (on large sheets of paper) how young people visualise how they perceive the issues of algorithmic bias, data protection and online safety and what they would like to see done to make the online world fair and trustworthy; and 3) Empathy MetaMaps – a tool for “stakeholders”  such as employees of online platforms, developers, technologists, researchers, policymakers, regulators, public agency officials, civil servants etc.  to collaboratively/cooperatively respond to the TrustScapes shared by young people as a sector. We are looking for a student to conduct several rounds of user testing of the tools. This testing will take the form of group activities, focus groups, interviews and surveys etc. In addition to conducting the testing the student will analyse the data collected in order to assess the quality of the toolkit, identify ways in which it might be refined and devise an impact strategy for the toolkit in practice. Prerequisites: There are no prerequisites. The study will suit a student interested in developing skills in social research methods and human computer interaction approaches, as well as in gaining experience in public engagement. It would therefore particularly suit a student interested in undertaking doctoral study. We will assist in providing training in the methods to be used in the study as well as assistance with securing ethical clearance and recruiting research participants 

Compilation of a CSPlike language  Geraint Jones  Programming Languages  B  C  This is a compiler project, also requiring familiarity with concurrency. The parallel programming language occam is essentially an implementable sublanguage of CSP. The aim of this project is to produce a small portable implementation of a subset of occam; the proposed technique is to implement a virtual machine based on the inmos transputer, and a compiler which targets that language. One of the aims of the project is to implement an extension to occam which permits recursion; more ambitious projects might implement a distributed implementation with several communicating copies of the virtual machine. Other possibilities are to produce separate virtual machines, optimised for displaying a simulation, or for efficiency of implementation, or translating the virtual machine code into native code for a real machine. 

Logic circuit workbench  Geraint Jones  Programming Languages  B  C  This is an interactive programming project with some logic simulation behind it. The idea is to produce a simulator for a traditional logic breadboard. The user should be able to construct a logic circuit by drawing components from a library of standard gates and latches and so on, and connecting them together in allowable ways. It should then be possible to simulate the behaviour of the circuit, to control its inputs in various ways, and to display its outputs and the values of internal signals in various ways. The simulator should be able to enforce design rules (such as those about not connecting standard outputs together, or limiting fanout) but should also cope with partially completed circuits; it might be able to implement circuits described in terms of replicated subcircuits; it should also be able to some sort of standard netlist. It might be that this would make a project for two undergraduates: one implementing the interface, the other implementing a simulator that runs the logic. 

Modelling of arithmetic circuits  Geraint Jones  Programming Languages  B  C  This is a project in the specification of hardware, which I expect to make use of functional programming. There is a great deal of knowledge about ways (that are good by various measures) of implementing standard arithmetic operations in hardware. However, most presentations of these circuits are at a very low level, involving examples, diagrams, and many subscripts, for example these descriptions. The aim of this project is to describe circuits like this in a higherlevel way by using the higher order functions of functional programming to represent the structure of the circuit. It should certainly be possible to execute instances of the descriptions as simulations of circuits (by plugging in simulations of the component gates), but the same descriptions might well be used to generate circuit netlists for particular instances of the circuit, and even to produce the diagrams. 

Toys for Animating Mathematics  Geraint Jones  Programming Languages  B  C  The aim is to take some mathematics that would be within the grasp of a mathematicallyinclined sixthformer and turn it into some attentiongrabbing web pages. Ideally this reveals a connection with computing science. I imagine that this project necessarily involves some sort of animation, and I have visions of Open University television maths lectures. The programming need not be the most important part of this project, though, because some of the work is in choosing a topic and designing problems and puzzles and the like around it. There's a lot of this sort of thing about, though, so it would be necessary to be a bit original. 

Extensible SearchBased Theorem Prover  Mark Kaminski  Information Systems  C  "Background: Searchbased proof methods such as those based on DPLL, tableau and connection calculi are widely used in automated theorem proving, ontological reasoning, and automated verification. As most implementations of such methods are developed with a particular logic, algorithm, and set of applications in mind, it is often difficult to extend them with new features. Project goals: The project aims to develop a modular, extensible base for implementing searchbased decision procedures for propositional, modal, and description logics. A key objective is to develop an abstract proof search engine that can be instantiated with different logics, proof rules, and search strategies. The capabilities of the engine should be demonstrated by implementing a simple tableaubased or DPLLbased reasoner for propositional or modal logic. Prerequisites: good functional programming skills, familiarity with logic and automated theorem proving.  
Effect of Noise in NNTraining  Varun Kanade  Artificial Intelligence and Machine Learning  B  C  One of the interesting approaches to reducing overfitting in neural networks is to add noise to the inputs and activations before performing a gradient step. The key insight is that this noise injection prevents the learnt weights from being too delicately balanced to fit the data; some kind of robustness is necessary to fit noisy data. Another interesting consequence of noisy data is that recent work shows that learning algorithms using noisy data may be better at protecting privacy of the data. Thus, there may be twin advantages to this approach. This project will involve understanding the backgroud in this topic, performing simulations to understand the behaviour and hopefully developing new theories. This project may involve collaboration with Mr. Alexis Poncet and Dr. Thomas Steinke.  
Evolution as Computational Learning  Varun Kanade  Artificial Intelligence and Machine Learning  B  C  Evolution is basically a form of learning, where the search happens through variation caused by mutations, recombination and other factors, and (natural) selection is a feedback mechanism. There has been much recent work in understanding evolution through a computational lens. One of the fundamental building blocks of life is circuits where the production of protein is controlled by other ones (transcription factors); these circuits are known as transcription networks. Mathematical models of transcription networks have been proposed using continuoustime Markov processes. The focus of the project is to use these models to understand the expressive power of these networks and whether simple evolutionary algorithms, through suitably guided selection, can result in complex expressive patterns. The work will involve both simulations and theory.  
Predicting Election Results Using Social Media & Fundamentals  Varun Kanade  Artificial Intelligence and Machine Learning  B  C  With the upcoming French elections and the mixed performance of polls in recent months in mind, the goal of this project will be to see to what extent fundamentals, such as economic indicators, demographic indicators, incumbents, etc. and the use of social media such as twitter, facebook etc. can be used to predict election results. The project will involve a survey of past methods used, as well as data collection and model fitting. This project is openended (and hence potentially risky); the main aim is to get good results using historical and current data. It would be helpful if the student has good programming experience as wel as knowledge of various different machine learning techniques. The project will involve collabortion with Dr. Vincent CohenAddad (Copenhagen)  
A fast numerical solver for multiphase flow  David Kay  Computational Biology and Health Informatics  B  C  The AllenCahn equation is a differential equation used to model the phase separation of two, or more, alloys. This model may also be used to model cell motility, including chemotaxis and cell division. The numerical approximation, via a finite difference scheme, ultimately leads to a large system of linear equation. In this project, using numerical linear algebra techniques, we will develop a computational solver for the linear systems. We will then investigate the robustness of the proposed solver. 

Efficient solution of one dimensional airflow models  David Kay  Computational Biology and Health Informatics  B  C  In this project we will investigate the use of numerical and computational methods to efficiently solve the linear system of equations arising from a one dimensional airflow model within a network of tree like branches. This work will build upon the methods presented within the first year Linear Algebra and Continuous Maths courses. All software to be developed can be in the student’s preferred programming language. General area: Design and computational implementation of fast and reliable numerical models arising from biological phenomena.  
The Square Root Law of Steganography: Empirical Validation  Andrew Ker  Security  B  C  Steganography means hiding a hidden payload within an apparentlyinnocent cover, usually an item of digital media (in this project: images). Steganalysis is the art of detecting that hiding took place. A key question is how the amount of information that can be securely hidden (i.e. such that detectors have a high error rate) scales with the size of the cover. In 2008 I coauthored a paper showing that my theoretical "square root law" was observed experimentally, using stateoftheart (for 2008) hiding and detection methods. This project is to run similar experiments using methods 10 years more modern. It would involve combining offtheshelf code (some in MATLAB, some in Python) from various researchers and running fairly large scale experiments to measure detection accuracy versus cover size and payload size in tens of thousands of images, then graphing the results suitably. Prerequisites: No particular prerequisites. Ability to piece together others' code and draw graphs nicely.  
Transparent SessionLayer Steganography  Andrew Ker  Security  B  C  Steganography means hiding a hidden payload within an apparentlyinnocent cover, usually an item of digital media. This project is to implement a transparent proxy which uses something like a webcam video stream to hide steganographic packets. A local process should receive communication on a local socket and merge it with the webcam data stream, a process which can be reversed at the receiver. Prerequisites: It is necessary to have some experience working with video codecs, for example experience contributing to the H.264/5 codec, in order to place the payload in a webcam stream. It is definitely NOT possible to learn the format within the time available. Don't ask if it is possible to read up on it instead: it isn't.  
Twitter Pictures as a Steganographic Channel  Andrew Ker  Security  B  C  Steganography means hiding a hidden payload within an apparentlyinnocent cover, usually an item of digital media. Most pure research focuses on bitmap or JPEG images, or simple video codes. In practice, there are often further constraints: the carrier might recompress or damage the object. This project is primiarly programming. First we must characterize the properties of image transmission through Twitter, and then provide an implementation of image steganography through it. This might be a complete piece of software with a nice interface, if the channel is straightforward, or a proofofconcept if the channel causes difficult noise (in which case we will need suitable errorcorrecting codes.) Prerequisites: Linear algebra. You will need a Twitter account and to learn the Twitter API without help from me. It would be useful to know a little about the JPEG image format before starting the project.  
Visualiation of Steganalysis  Andrew Ker  Security  B  C  Steganography means hiding a hidden payload within an apparentlyinnocent cover, usually an item of digital media (in this project: images). Steganalysis is the art of detecting that hiding took place. The most effective ways to detect steganography are machine learning algorithms applied to "features" extracted from the images, trained on massive sets of known cover and stego objects. The images are thus turned into points in highdimensional space. We have little intuition as to the geometrical structure of the features (do images form a homogeneous cluster? do they scale naturally with image size? how much and in what ways do images from different cameras differ?), or how they are altered under embedding (do they move in broadly the same direction? is there is linear separator of cover from stego?). This is a programming project that creates a visualization tool for features, extracting them from images and then projecting them onto 2dimensional space in interesting ways, while illustrating the effects of embedding. Prerequisites: Linear Algebra, Computer Graphics. Machine Learning an advantage.  
Faceted Search over (Semantic) Web  Evgeny Kharlamov  Information Systems  MSc  An increasing number of applications are relying on RDF, OWL and SPARQL for storing, publishing and querying data. SPARQL however, is not targeted towards endusers, and suitable query interfaces are needed. Faceted search is a prominent endusers oriented approach to data access; during the last decade it became a mainstream commercial technology ubiquitous in ecommerce (e.g., Amazon.com, Booking.com). This student’s project aims at development and implementation of novel algorithms and/or interfaces for faceted search over RDF databases enhanced with OWL ontologies. The student’s project will be conducted within the SemFacet project (www.cs.ox.ac.uk/isg/tools/SemFacet/). Preferable knowledge: Java, RDF, and HCI, 

Query formulation interfaces over semantically enhanced Big Data.  Evgeny Kharlamov  Information Systems  MSc  Enabling endusers access to industrial Big Data is a challenging and important problem in many large companies. A largescale EU funded project Optique (www.optiqueprojectspage) addresses this problem with Semantic Web technologies and aims at developing a system for Big Data access deployed at Statoil and Siemens. In Oxford (www.oxfpageforoptique) we contribute in both research and development for several components of this system. In particular, we develop query formulation interfaces (QFIs) powered with Semantic Web technologies that endusers, such as geologists or energy engineers, can use to access Big Data. This student’s project aims at development of novel solutions for QFIs over Semantically enhanced Big Data: development (and adaptation) of techniques for ranking query answers and components of visual query formulation interfaces; development of techniques to support visual formulation of aggregate queries; implementation and experiments with real and synthetic data. Preferable knowledge: Java, Databases, Semantic Web technologies. 

Semiautomatic installation support for OntologyBased Data Access systems (OBDA).  Evgeny Kharlamov  Information Systems  MSc  OBDA is a prominent approach for an easy, endusers oriented, access to collections of databases. The approach relies on the Semantic Web technologies and its key idea is to use an ontology to mediate between the user and databases. The ontology provides a semantic abstractions from the technical details of databases. It is connected to the databases by means of socalled mappings. This student’s project aims at development and implementation of novel techniques to (i) extract ontologies from relational metadata (ii) extract mappings to existing ontologies. The implemented techniques would be tested on different databases. The student’s project will be conducted within a large scale Optique project (www.optiqueprojectspage) that aims at development of a system for endusers access to industrial Big Data in Statoil and Siemens. Preferable knowledge: Java, Databases, Semantic Web technologies. 

Bots for a Board Game  Stefan Kiefer  Automated Verification  B  C  The goal of this project is to develop bots for the board game Hex. In a previous project, an interface was created to allow future students to pit their gameplaying engines against each other. In this project the goal is to program a strong Hex engine. The student may choose the algorithms that underly the engine, such as alphabeta search, MonteCarlo tree search, or neural networks. The available interface allows a comparison between different engines. It is hoped that these comparisons will show that the students' engines become stronger over the years. The project involves reading about gameplaying algorithms, selecting promising algorithms and datastructures, and design and development of software (in Java or Scala).  
Small matrices for irrational nonnegative matrix factorisation  Stefan Kiefer  Automated Verification  B  C  Recently it was shown (https://arxiv.org/abs/1605.06848) that there is a 6x11matrix M with rational nonnegative entries so that for any factorisation M = W H (where W is a 6x5matrix with nonnegative entires, and H is a 5x11matrix with nonnegative entries) some entries of W and H need to be irrational. The goal of this project is to explore if the number of columns of M can be chosen below 11, perhaps by dropping some columns of M. This will require some geometric reasoning. The project will likely involve the use of tools (such as SMT solvers) that check systems of nonlinear inequalities for solvability. Prerequisites: The project is mathematical in nature, and a good recollection of linear algebra is needed. Openness towards programming and the use of tools is helpful.  
"KLEEPROVER"  Daniel Kroening  Automated Verification  MSc  Alter the CBMC symbolic execution engine to create a dynamic symbolic execution tool similar to KLEE. Possible features include: • Distributed and incremental support using the caching scheme here • A feature to run BMC as you go to explore the traces 'around' the execution traces • Use path exploration as the witness generation part of an ACDL process. • Perform precondition inference, possibly using ACDL.  
Applications of deductive machine learning  Daniel Kroening  Automated Verification  C  The goal of deductive machine learning is to provide computers with the ability to automatically learn a behaviour that
provably satisfies a given highlevel specification. As opposed to techniques that generalise from incomplete specifications
(e.g. examples), deductive machine learning starts with a complete problem description and develops a behaviour as a particular
solution. 

Automatic Diverse Machine Code Generator  Daniel Kroening, Tom Melham  Automated Verification  MSc  Supervisors: Daniel Kroening, Tom Melham, Alastair Donaldson The target of this project is safety relevant, embedded, automotive applications (steering, braking, airbags, drive by wire controls) where the computational state of an embedded controller can be disturbed by transient effects (EM and RF fields, ESD and voltage transients). The suggested project is to demonstrate a system taking one image of machine code and using a binary to binary translator, transform it into a functionally equivalent image (a diverse image) which has a high degree of architectural diversity (differing opcode, differing intermediate values etc.). The two images can then be independently executed in realtime embedded controllers and the outputs crosschecked to hopefully detect disturbances to the machine state and operations. Having a diverse executable is advocated in the new ISO26262 automotive functional safety standard  but no tools or methods exist today to automatically derive diverse images. A related project could be to measure the degree of diversity between code images, although the best measure might be based on the hardware footprint of the software  at a lower level, it could be measured via some architectural properties, unique instructions/register resources/memory address ranges/etc. There is scope in this project for collaboration project with Infineon Technologies (Bristol, UK).  
Bug Hunting in Linux  Daniel Kroening  Automated Verification  MSc  Model Checkers for concurrent programs have made tremendous progress in the past years. It's time to revisit the Linux Kernel with these methods, and investigate scalability and applicability of automated program analysis in this context. This is an applied project. 

CORAL / Q  Daniel Kroening  Automated Verification  MSc  Build something similar to the Q tool in SLAM.  
Coding Standards Checker  Daniel Kroening  Automated Verification  MSc  Build instrumenters for various coding standards, e.g., AutoSar, MISRAC, CERT C HIC++, etc.  
Combine Uninitialized Variables Analysis with Alias Analysis  Daniel Kroening  Automated Verification  MSc  gotoinstrument features an instrumentation to track cases of usebeforedefine, i.e., reading of uninitialized variables. At present, the analysis does not take into account aliasing information. As such, the following code will yield a spurious counterexample: int main() { int x; int *ptr; ptr=&x; *ptr=42; if(x < 42) return 1; return 0; }  
Compute Program Dependence Graphs  Daniel Kroening  Automated Verification  MSc  Program dependence graphs (PDGs) form an essential element for slicing. The reaching definitions analysis is particularly well suited to derive a clean and precise implementation of program dependence graphs, because it is fieldsensitive. Make SAMATE Tests Work Read this paper http://www.sciencedirect.com/science/article/pii/S0950584913000384 and make the tests mentioned in there work.  
Connect to GDB  Daniel Kroening  Automated Verification  MSc  GDB has a client/server system to allow the debugger to be controlled from other programs. It would be awesome to have a feature for CBMC that allows a trace to be run via gdb. This would allow easy integration with existing debugging tools (an IDE plugin could simply drop the programmer into the debugger with an execution trace that causes the problem), some really cool CEGAR like tricks and some fun with parallel programs using multiple debuggers / the thread support and possibly reverse debugging to allow error traces to be explored. To do this will require working out which functions are nondeterministic models of real functions and accounting for this. This would also be necessary for some of the preceeding ideas.  
Constant Folding  Daniel Kroening  Automated Verification  MSc  With the recently added reaching definitions analysis, it is straightforward to implement a program transformation that performs constant folding.To ensure soundness in concurrent settings, it should be combined with the existing is_threaded analysis. (A concurrencyaware version of reaching definitions now exists and is awaiting a merge.) Along the way, assertions that become trivially true should be eliminated.  
Dalvik FrontEnd  Daniel Kroening  Automated Verification  MSc  Connect to an existing unpacker or add to the Java byte code frontend so that it can handle Dalvik / Android binaries. Some work implementing the Android libraries will be needed to make this useful.  
Decision Procedures  Daniel Kroening  Automated Verification  MSc  A list of projects can be found here  
Enhancing the CBMC symbolic execution tool to support mixed concrete + symbolic execution  Daniel Kroening  Automated Verification  MSc  The purpose of this enhancement is to solve cases where pure symbolic execution fails (incomplete decision procedures and handling of native libraries) while still maintaining its advantages. Previous work in [1] and [2] have mixed concrete and symbolic execution by using run time values of program variables to simplify constraints that could not be handled previously, or using concrete solutions of the solvable constraints in the current symbolic path condition. The project would involve implementing an approach similar to [2] into the existing CBMC tool. [1] P. Godefroid, N.Klarlund, and K.Sen. DART: Directed Automated Random Testing. SIGPLAN Not., 40(6):213223, 2005. [2] C. Pasareanu, N. Rungta, and W. Visser, Symbolic Execution with Mixed ConcreteSymbolic Solving. In International Symposium on Software Testing and Analysis (ISSTA), July 2011.  
Error Explanation  Daniel Kroening  Automated Verification  MSc  Reimplement the algorithm described in this paper. http://www.kroening.com/papers/STTTerrorexplanation2005.pdf  
FrontEnd Testing  Daniel Kroening  Automated Verification  MSc  Run the regressions mentioned here and the GCC C Torture tests with gotocc and a gotoprogram interpreter (mild dependency on the interpreterrelated task above).  
Hardware footprinting of software  Daniel Kroening, Tom Melham  Automated Verification  MSc  The target of this project is safety relevant, embedded, automotive applications (steering, braking, airbags, drive by wire controls) where the computational state of an embedded controller must be checked before the system can be enabled for each drive cycle. The suggested topic is to outline and demonstrate a system of determining the hardware (gates/transistors/memories) within a controller architecture that is required to execute a given binary image for all possible data input  the 'silicon footprint' of an executable. This 'silicon footprint' would be useful in verifying the diagnostic coverage of a given selfchecking code, for targeting error injection into the computation for test purposes, and to optimise power required for common used items. Such proof of diagnostic coverage (ability to detect static and transient faults in the processing core) is requested in the new ISO26262 automotive functional safety standard  but no tools or methods exist today to measure the 'silicon footprint' of an algorithm, so claims on diagnostic coverage are mostly derived from generic silicon failure rates, which is a suboptimal approach. There is scope in this project for collaboration project with Infineon Technologies (Bristol, UK). 

Improved Interpreter  Daniel Kroening  Automated Verification  MSc  CProver has an interpreter for goto programs but it could be improved, for example adding support for dynamic analysis, calls to external functions and possibly even mixed concrete and symbolic execution. It would also be good to be able to interpret traces found by BMC to catch mismatches / bugs between the logical model and the bit blasting models. This would also be useful for more advanced CEGAR like techniques  
Projects being offered by D Kroening  Daniel Kroening  Automated Verification  B  C  Daniel Kroening is happy to supervise projects related to
For a list of sample projects, see here.


Separation Logic in SMT Solvers  Daniel Kroening  Automated Verification  MSc  Reasoning about the heap is one of the biggest challenges in program analysis. The goal of this project is to evaluate ways of combining Satisfiability Modulo Theories with a decision procedure for a fragment of Separation Logic. 

Static Detection of Data Races  Daniel Kroening  Automated Verification  MSc  It would be good to have a tool that checks for data races in a similar fashion to DRD / Helgrind. The current pointer analysis can over approximate shared variables, so possibly some combination of this, trace generation, DRD / Helgrind and then refinement of which traces to pick? Perhaps this is even something ACDL like?  
Unspecified C Behaviours  Daniel Kroening  Automated Verification  MSc  Build instrumentation to check for unspecified behaviours in C, as per Section J.2 (and maybe also J.3) of the C standard. Talk to Michael Tautschnig or Martin Brain for this item. You can point to http://www.cprover.org/wiki/starter_projects for further information.  
kliveness  Daniel Kroening  Automated Verification  MSc  Implement kliveness for softwareliveness checking.  
Automated protocol synthesis from requirements  Marta Kwiatkowska  Automated Verification  MSc  Network protocols are sets of rules that define how information is exchanged between computers or agents on a network. Many communication protocols, to mention Bluetooth, can be modelled as timed automata or their extensions. Protocols typically have the same algorithmic structure but may differ in values of parameters, e.g. timing delays. This project aims to develop a method for synthesising protocols from requirements given in temporal logic so that they are correct by construction. The idea is to combine templatebased synthesis (“Templatebased Program Verification and Program Synthesis”, Journal on Software Tools for Technology Transfer, 2012) with parameter synthesis for timed automata developed and implemented recently (http://www.veriware.org/bibitem.php?key=DKKM14). This way, the algorithmic structure will be given as a parametric template, and appropriate parameter values can be selected automatically to optimise a given quantitative requirement. This project would suit a student interested in theory and/or constraint solving.  
Autonomous urban driving  Marta Kwiatkowska  Automated Verification  MSc  This project is concerned with synthesising strategies for autonomous driving directly from requirements expressed in temporal logic, so that they are correct by construction. Probability is used to quantify information about hazards, such as accidents hotspots. Inspired by the DARPA Urban Challenge, a method for synthesising strategies (controllers) from multiobjective requirements was developed and validated on map data for villages in Oxfordshire (http://www.prismmodelchecker.org/bibitem.php?key=CKSW13). The idea is to develop the techniques further, to allow highlevel navigation based on waypoints, and to develop strategies for avoiding threats, such as road blockage, at runtime. In the longer term, the goal is to validate the methods on realistic scenarios in collaboration with the Mobile Robotics Group. The project will suit a student interested in theory or software implementation. For more information about the project see http://www.veriware.org/autonomous.php  
Controller synthesis for robot coordination  Marta Kwiatkowska  Automated Verification  MSc  Autonomous robots have numerous applications in scenarios such as warehouse management, planetary exploration, or search and rescue. In view of environmental uncertainty, such scenarios are modelled using Markov decision processes. The goals (e.g. the goods should be delivered to location A while avoiding the hazardous location B) can be conveniently specified using temporal logic, from which correctbyconstruction controllers (strategies) can be generated. This project aims to develop a PRISM model of a system of robots for a particular scenario so that safety and effectiveness of their cooperation is guaranteed. Techniques based on machine learning, and specifically realtime dynamic programming (http://www.prismmodelchecker.org/papers/atva14.pdf), will be utilised to generate controllers directly from temporal logic goals. This project will suit a student interested in machine learning and software implementation.  
Driver modelling  Marta Kwiatkowska  Automated Verification  MSc  Semiautonomous driving is intended to improve safety, and involves, for example, taking control of the car from the driver if a collision is predicted. Probabilistic models of driver behaviour are needed in order to monitor whether the driver is attentive or distracted, and to predict if the manoeuvre, such as the driver swerving to the neighbouring lane, is safe. Recently, a driver model was developed and analysed using PRISM (http://www.prismmodelchecker.org/bibext.php?search=driver) and validated against data from the Carsim simulator. This project aims to extend this model in a number of ways, firstly, with more realistic threats, and, secondly, with active threats; the latter can be achieved using stochastic games and the PRISMgames model checker (http://www.prismmodelchecker.org/games/). In the longer term, the goal is to validate the model with real traffic data in collaboration with the Mobile Robotics Group. This project will suit a student interested in modelling.  
Heartbeat identification for smartphones  Marta Kwiatkowska  Automated Verification  MSc  Today’s smartphones and fitness gadgets provide a variety of apps for health monitoring. Monitoring the heartbeat has also been proposed for use as a password, to ensure security and privacy, and this idea is currently being developed in e.g. the Nymi wristband. As part of ongoing research, techniques have been developed to analyse the ECG signal to facilitate personalisation of pacemaker devices, see http://www.veriware.org/bibitem.php?key=KMP14. This project aims to implement the method proposed in the paper “Human Identification Using Heartbeat Interval Features and ECG Morphology”, Proceedings of Seventh International Conference on BioInspired Computing: Theories and Applications (BICTA 2012), Springer, 2013. The implementation will be evaluated using a hardware testbed to be provided. The project will suit a student interested in embedded software. For more information about the pacemaker project see http://www.veriware.org/pacemaker.php.  
Modelling and analysis of energy usage for RFID protocols  Marta Kwiatkowska  Automated Verification  MSc  Radio frequency identification (RFID) is used for object identification with application to devices connected to the Internet of Things. RFID protocols can benefit from formal verification techniques, and particularly quantitative verification with which a number of cost measures can be analysed. Recently, a Markov model was developed for performance analysis (“Performance Analysis of RFID Protocols: CDMA Versus the Standard EPC Gen2”, IEEE Transactions on Automation Science and Engineering, vol.11, no.4, pp.12501261, 2014). The idea is to develop a parametric PRISM model (www.prismmodelchecker.org) based on this Markov model and analyse its energy usage characteristics. The advantage of the parametric model is that it can be used to automatically determine optimal parameter values that guarantee that a given property is satisfied. The PRISM model of the ZigBee protocol (http://www.prismmodelchecker.org/casestudies/zigbee.php) may be useful for this purpose. This project would suit a student interested in probabilistic modelling and Internet of Things.  
Modelling and verification of DNA programs  Marta Kwiatkowska  Automated Verification  MSc  DNA molecules can be used to perform complex logical computation. DNA computation differs from conventional digital computation and is sometimes referred to as ‘computing with soup’ http://www.economist.com/node/21548488. Correct design of DNA devices is errorprone and the task is supported by tools such as the DNA Strand Displacement (DSD) programming language and simulator (http://research.microsoft.com/enus/projects/dna/default.aspx) developed at Microsoft. The DSD tool enables the probabilistic model checking of DNA circuits and has been used to identify a flaw in a DNA transducer gate (see http://qav.comlab.ox.ac.uk/bibitem.php?key=LPC+12). The aim of this project is to model and analyse DNA implementation of logic inference proposed in “Autonomous Resolution Based on DNA Strand Displacement”, DNA Computing and Molecular Programming, LNCS 6937, 2011. The project will suit a student interested in modelling DNA programs using DSD and/or PRISM. For more information about the DNA computing project see http://www.veriware.org/dna.php.  
Modelling trust in humanrobot interaction  Marta Kwiatkowska  Automated Verification  MSc  When human users interact with autonomous robots, appropriate notions of computational trust are needed to ensure that their interactions are safe and effective: too little trust can lead to user disengagement, and too much trust may cause damage. Trust management systems have been introduced for autonomous agents on the Internet, but need to be adapted to the setting of mobile robots, taking into account intermittent connectivity and uncertainty on sensor readings. Recently, a quantitative reputationbased trust model for usercentric networks has been modelled and analysed using PRISMgames, http://www.prismmodelchecker.org/games/, an extension of the PRISM model checker which supports stochastic multiplayer games as models and objectives expressed in temporal logic (http://www.prismmodelchecker.org/bibitem.php?key=KPS13). This project aims to develop a quantitative trust management system that is suitable for mobile robots. Initially, a simplified setting will be considered and an approach to modelling trust developed for this setting. The project will suit a student interested in modelling and/or software implementation.  
Probabilistic model checking for cellular microcolonies  Marta Kwiatkowska  Automated Verification  MSc  The gro language enables the programming, modelling, specifying and visually simulating the behaviour of cells in growing microcolonies of microorganisms (http://depts.washington.edu/soslab/gro/index.html). gro has been used for teaching of synthetic biology. Microcolonies develop through e.g. cell division and cell growth, and can be described in a simple guarded command language. This project aims to extend the gro framework with probabilistic model checking by translating the language to the PRISM modelling language (www.prismmodelchecker.org) and extending the functionality to allow for model checking queries specified in probabilistic temporal logic. The project would suit a student interested in programming and synthetic biology.  
Quantitative/Probabilistic Modelling, Verification and Synthesis  Marta Kwiatkowska  Automated Verification  C  Professor Marta Kwiatkowska is happy to supervise projects in the area of quantitative/probabilistic modelling, verification and synthesis, particularly those relating to the PRISM model checker. PRISM is an open source formal verification tool for analysis of probabilistic systems. PRISM has an extensive website which includes software for download, tutorial, manual, publications and many case studies. Students' own proposals in the broad area of theory, algorithms and implementation techniques for software verification/synthesis will also be considered. Below are some concrete project proposals:
A Molecular Recorder (with Luca Cardelli). Recent technological developments allow massive parallel reading (sequencing) and writing (synthesis) of heterogeneous pools of DNA strands. We are no longer limited to simple circuits built out of a small number of different strands, nor to reading out a few bits of output by fluorescence. While these 

Safety Assurance for Deep Neural Networks  Marta Kwiatkowska  Automated Verification  C  Professor Marta Kwiatkowska is happy to supervise projects in the area of safety assurance and automated verification for deep learning. This is a new research topic initiated with this paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=HKW+17), see also https://www.youtube.com/watch?v=XHdVnGxQBfQ. Below are some concrete project proposals:


An Innovative Network Scanning Framework  Harjinder Lallie, Michael Goldsmith  Security  B  C  Networks often contain multiple vulnerabilities and weaknesses which can be exploited by attackers. Security analysts often find it necessary to perform n Network scanning, probing and vulnerability testing aids the process of discovering and correcting network vulnerabilities. There exist a number of challenges in terms of gathering network data in this manner:  Scalability. Scanning/probing techniques do not scale well to large networks (Lippmann et al., 2006, Bopche and Mehtre, 2014).  Semantic problems. A number of approaches – such as that by Roschke et al. (2009) and Cheng et al. (2011) attempted to combine and correlate the results returned from multiple vulnerability databases. However, the databases return the results in different formats  some in textual format, others in XML format. This means that the results have to be unified into a common meaningful format  Data Consolidation. Other than Rosschke et al’s (2009) study into combining network scanning methods, there have been very few studies which critically evaluate methods of combining the use and results of multiple scanning tools  Performance. Scanning and then combining the results from multiple scanners takes a lot of time (Cheng et al., 2011). Quite often the network configuration changes during the scan – which means that the results are quite often inaccurate. This thesis addresses one or more of the challenges presented above. Prerequisites: This project involves practical work which may involve setting up a virtual network and applying a range of scanning, probing and vulnerability testing mechanisms on the network.  
Calculating Network Security Metrics  Harjinder Lallie, Michael Goldsmith  Security  B  C  The calculation of network security metrics is a complex problem which involves understanding the state and configuration of network connections, devices and protocols. This research analyses the problem of calculating network security metrics and proposes a framework which can calculate security metrics for a typical small network comprising of numerous devices, operating systems and hosts. The project will involve the configuration of virtual networks for testing the framework. CVSS or other metric scores may be used to aid the calculation of the metric. This dissertation involves some practical work which may involve setting up a virtual network which has a number of machines on the network and applying a range of scanning, probing and vulnerability testing mechanisms on the network.  
Analysing concurrent datatypes in CSP  Gavin Lowe  Automated Verification  C  The aim of this project would be to model some concurrent datatypes in CSP, and to analyse them using the model checker FDR. The concurrent datatypes could be some of those studied in the Concurrent Algorithms and Datatypes course. Typical properties to be proved would be linearizability (against a suitable sequential specification) and lock freedom. Prerequisites: Concurrency and Concurrent Algorithms and Datatypes. Reading: Analysing LockFree Linearizable Datatypes using CSP, Gavin Lowe.  
Analysis of Parameterised Systems using CSP and FDR  Gavin Lowe  Automated Verification  C  Many systems are built from some number N of similar components; a common verification problem is to verify such systems for all values of the system size N. Clearly one cannot perform such a verification directly using a model checker such as FDR. I recently proposed an approach to this problem [1]. The idea is to build a process that:
for all possible system sizes, and for all possible values of the underlying types. The abstraction is formed via an iterative approach: the abstraction is guessed, and the model checker used to test whether it is suitable for small values of the system size and the types; if not, the counterexample returned by the model checker is used to improve the guess. Once a suitable abstraction has been found that works for small values of the system size and types, techniques from data independence are used to generalise the result to all such values. This technique has been applied to verify a wellknown security protocol in a system with arbitrarily many honest agents. A paper describing this technique in more detail is available on request. The aim of this project would be to assess the generality of this approach, by using it to analyse more examples, for example common distributed algorithms, such as those described in [2]. The Concurrency course would be a prerequisite. References


Understanding the Java Memory Model  Gavin Lowe  Automated Verification  C  MSc  Concurrent programs on modern architectures can often behave in surprising ways: the presence of caches means that writes can take time to propagate from one thread to another; the compiler can perform optimisations that cause operations to be reordered; similarly, the hardware may execute operations out of order. For example, given the Scala program var x = 0; var y = 0 def P = { val r1 = x; y = 1 } def Q = { val r2 = y; x = 1 } Executing P and Q in parallel may lead to a state where r1 = r2 = 1, counterintuitively, for example as a result of P's two statements being reordered. The Java Memory Model (JMM) gives a formal definition of the behaviours that are allowed for such programs. Unfortunately, the definition is convoluted and hard to understand. The aim of this project is to aid better understanding of the JMM, by producing a tool that, given a small program (like the one above or the ones in [4]), returns all its valid executions. References: [1] Java Memory Model Pragmatics (transcript), Aleksey Shipilev, http://shipilev.net/blog/2014/jmmpragmatics. [2] Formalising Java's Data Race Free Guarantee, David Aspinall and Jaroslav Sevcik. [3] Jeremy Manson, William Pugh and Sarita Adve, The Java Memory Model. [4] William Pugh and Jeremy Manson, Java memory model causality test cases (2004), http://www.cs.umd.edu/~pugh/java/memoryModel/CausalityTestCases.html.  
Analysis of wildlife tracking data  Andrew Markham  Cyber Physical Systems  MSc  We have developed ultralow power wireless tracking tags which have been deployed on badgers, hares and swans in the wild. The sensors mainly measure 3D acceleration, but some data is also from our worldfirst underground 3D tracking system. The sensors are typically sampled at 8 Hz and collars run for between 2 and 3 months. This has generated a large dataset (> 500 million samples) of animal behaviour traces. For some of these deployments, there are secondary (ground truth) data including video and RFID. The challenge is to develop tools to analyse these datasets and present results to endusers (zoologists) in a userfriendly way. Such a tool would perform classification (typically unsupervised) on behaviour traces, generating classes of behaviour. At a very coarse level, this could simply be sleeping vs active, but the richer the classification, the better for automatic generation of ethograms. This classification can also be used online to alter the behaviour of the tracking tags themselves, such as to reduce the sampling rate when an animal is resting to prolong the lifetime of the collar. Suitable for MSc student interested in statistical analysis/data mining and interdisciplinary science  
Automatic Heap Layout Manipulation for Exploitation of Javascript Interpreters  Tom Melham, Daniel Kroening  Automated Verification  C  In order to exploit heapbased buffer overflows an exploit developer must place the target object relative to the source of the overflow. This has been automated for situations in which the starting state of the heap is known and the memory allocator behaves deterministically. However, the allocators found in the Javascript interpreters of modern web browsers behave nondeterministically. This project aims to develop automated heap layout manipulation for such situations. The goal will be to build a system that takes as input a proofofconcept exploit that triggers memory corruption on the heap, and produces as output a new exploit that places an 'interesting' corruption target after the overflow source with a high degree of probability. The proposed projects will expand on the work of the listed Dphil student (Sean Heelan). Prerequisites: * (Required) Ability to comprehend large systems written in C++ * (Required) Familiarity with operating systems concepts such as virtual memory, heap allocators and low level security vulnerabilities * (Optional) Some knowledge of modern web browser internals * (Optional) Some knowledge of exploitation of memory corruption vulnerabilities  
Converting Vulnerabilities into Exploitation Primitives  Tom Melham, Daniel Kroening  Automated Verification  C  Fuzzing is a popular approach for producing inputs that cause a program to crash. In the case of heap overflow vulnerabilities, to use them in an exploit it is necessary to determine what can be corrupted by the overflow, how to manipulate the heap to ensure that this corruption occurs, and then how to make use of the corrupted data. The goal of this project is to investigate how to automate this process and develop a system that can convert a vulnerability trigger into into a new input that can be used as part of an exploit to hijack the control flow of the target. The proposed projects will expand on the work of the lidted Dphil studen (Sean Heelan) Prerequisites: * (Required) Ability to comprehend C/C++ * (Required) Comfort developing an experimental system of moderate size * (Required) Familiarity with operating systems concepts such as virtual memory, heap allocators and low level security vulnerabilities * (Optional) Some knowledge of exploitation of memory corruption vulnerabilities 

Assessing the performance of stochastic optimization methods in structural modelling  Peter Minary  Computational Biology and Health Informatics  B  C  The efficiency of Monte Carlo (MC) based stochastic optimization methods (e.g. Kirkpatrick, et al. Science, 220, 671–680 (1983)) are compared to find low energy conformational states of a given system. We are particularly interested in MC protocols where the temperature varies as a series of impulses followed by relaxation and only the temperature of a given part (e.g. site of interest) of the system is changed. Prerequisites: Recommended for students who has done the Probability and Computing and Geometric Modelling courses and have interest in randomized search methods and their applications in structural biology. 

Software development for computational structural biology  Peter Minary  Computational Biology and Health Informatics  B  C  Several projects are available to implement new algorithms or protocols into the MOSAICS (http://www.cs.ox.ac.uk/mosaics) software package. The first project would aim for the development of new analysis tools to interpret the simulation result. For example, the new protocol would take (input) a structural similarity mearure and a trajectory of simulated conformations and would produce (output) a measure of structural diversity of conformations visited. In the second project, students would implement and compare different physical models to describe hydrogen bonding, which is among the most important canonical interactions that stabilize the double helical DNA. Prerequisites: The project is recommended for students who took the Geometric Modelling and Computer Animation courses as well as have some interest in Numerical Methods (e.g. solution of Ordinary Differential Equations) and their applications to atomistic simulations. 

Tools and applications in computational epigenetics  Peter Minary  Computational Biology and Health Informatics  C  Chemical modifications such as (hydroxy)methylation on nucleic acids are used by the cell for silencing and activation of genes. These so called epigenetic marks can be recognized by ‘protein readers’ indirectly due to their structural ‘imprints', the effects they impose on DNA structure. The project include the development of computational protocols to assess the effect of epigenetic modifications on DNA structure. This research may shed light on how different epigentic modifications affect the helical parameters of the double stranded DNA. Prerequisites: Strong interest in visualizing, analysing and comparing 3D objects and modelling molecular structures. The project can be tailored to suit those from a variety of backgrounds but would benefit from having taken the following courses: Computer Graphics, Geometric Modelling and Computer Animation. 

Topics in Automata Theory, Program Verification, Lambda Calculus and Types  Andrzej Murawski  Programming Languages  C  MSc  Prof Murawski is willing to supervise in the area of automata theory, program verification, lambda calculus and types. For a taste of potential projects, follow this link. 

An Electronic Commerce Protocol  Hanno Nickau  Foundations, Structures, and Quantum  B  C  Commercial use of the Internet is becoming more and more common, with an increasing variety of goods becoming available for purchase over the Net. Clearly, we want such purchases to be carried out securely: a customer wants to be sure of what (s)he's buying and the price (s)he's paying; the merchant wants to be sure of receiving payment; both sides want to end up with evidence of the transaction, in case the other side denies it took place; the act of purchase should not leak secrets, such as credit card details, to an eavesdropper. The aim of this project is to find out more about the protocols that are used for electronic commerce, and to implement a simple ecommerce protocol. In more detail: Understand the requirements of ecommerce protocols; Specify an ecommerce protocol, both in terms of its functional and security requirements;
A variant of this project would be to implement a protocol for voting on the web (which would have a different set of security properties). Prerequisites for this project include good program design and implementation skills, including some experience of objectoriented programming, and a willingness to learn about protocols and cryptography. The courses on concurrency and distributed systems provide useful background for this project. 1 Jonathan Knudsen, Java Cryptography, O'Reilly, 1998. 

Bootstrapping document annotation with Schema.org  Nadeschda Nikitina  Information Systems  MSc  The recently started Schema.org initiative of the major search engine providers aims at fostering semantic annotations across the Web. You can read about it here. Semiautomatic annotation of natural language documents is a longstanding problem area. The goal of this project would be to apply stateoftheart annotation techniques to a large corpus based on the Schema.org semantic model. Prerequisites Some familiarity with topics from Computational Linguistics and Knowledge, Representation and Reasoning. 

Factorised Databases  Dan Olteanu  Information Systems  B  C  More details can be found at (http://www.cs.ox.ac.uk/people/dan.olteanu.html) and Dr Olteanu would be happy to discuss specific projects within the aforementioned topics with interested students. 

Branching Temporal Logics, Automata and Games  Luke Ong  Programming Languages  B  C  Model checking has emerged as a powerful method for the formal verification of programs. Temporal logics such as CTL (computational tree logic) and CTL* are widely used to specify programs because they are expressive and easy to understand. Given an abstract model of a program, a model checker (which typically implements the acceptance problem for a class of automata) verifies whether the model meets a given specification. A conceptually attractive method for solving the model checking problem is by reducing it to the solution of (a suitable subclass of) parity games. These are a type of two player infinite game played on a finite graph. The project concerns the connexions between the temporal logics CTL and / or CTL*, automata, and games. Some of the following directions may be explored. 1. Representing CTL / CTL* as classes of alternating tree automata. 2. Intertranslation between CTL / CTL* and classes of alternating tree automata 3. Using B¨uchi games and other subclasses of parity games to analyse the CTL / CTL* model checking problem 4. Efficient implementation of model checking algorithms 5. Application of the model checker to higherorder model checking. References:Orna Kupferman, Moshe Y. Vardi, Pierre Wolper: An automatatheoretic approach to branchingtime model checking. J. ACM 47(2):
312360 (2000). Rachel Bailey: A Comparative Study of Alorithmics for Solving B¨uchi Games. University of Oxford MSc Dissertation,
2010. 

Foundational Structures for Concurrency  Luke Ong  Programming Languages  MSc  Dr Ong is willing to supervise projects in this area.  
Game Semantics and Linear Logic  Luke Ong  Programming Languages  MSc  Dr Ong is willing to supervise projects in this area.  
Lambda Calculus  Luke Ong  Programming Languages  MSc  Dr Ong is willing to supervise projects in this area  
Luke Ong's Projects  Luke Ong  Programming Languages  B  C  See http://users.comlab.ox.ac.uk/luke.ong/projects/ for more information  
Semantics of Programming Languages  Luke Ong  Programming Languages  MSc  Dr Ong is willing to supervise projects in this area  
Types, Proofs and Categorical Logic  Luke Ong  Programming Languages  MSc  Dr Ong is willing to supervise projects in this area.  
Building databases of mathematical objects in Sagemath (Python)  Dmitrii Pasechnik  Algorithms and Complexity Theory  B  C  MSc 
Prerequisites: Interest in open source software, some knowledge of Python, some maths background. 

Computing with semialgebraic sets in Sage (math)  Dmitrii Pasechnik  Algorithms and Complexity Theory  B  C  Semialgebraic sets are subsets of R^n specified by polynomial inequalities. The project will extend capabilities of Sage (http://www.sagemath.org) to deal with them, such as CADs computations or sums of squares based (i.e. semidefinite programming based) methods. There might be a possibility for participation in Google Summer of Code (GSoC) or Google Semester of Code with Sage as a GSoC organisation. Prerequisites Interest in open source software, some knowledge of Python, appropriate maths background. 

Implementing and experimenting with variants of WeisfeilerLeman graph stabilisation  Dmitrii Pasechnik  Algorithms and Complexity Theory  B  C  WeisfeilerLeman graph stabilisation is a procedure that plays an important role in modern graph isomorphism algorithms (L.Babai 2015) and enjoyed attention in machine learning community recently (cf. "WeisfeilerLehman graph kernels"). While it is relatively easy to implement, efficient portable implementations seem to be hard to find. In this project we will work on producing such an opensource implementation, either by reworking our old code from http://arxiv.org/abs/1002.1921 or producing new one; we will also work on providing Python/Cython interfaces for a possible inclusion in SageMath (http://sagemath.org) library, and/or elsewhere.  
An interactive visual tutorial for Bayesian parameter estimation  Joe PittFrancis, Michael Clerx  Computational Biology and Health Informatics  B  C  The aim of this project is to build an educational tool which enables the progress of a Bayesian parameter estimation algorithm to be visualised. The model to be fitted might be (but is not limited to) a system of Ordinary Differential Equations and the Bayesian estimation tools might be build around an existing system such as Stan, PyML or Edward. A good tutorial system should be able to let the user change the underlying model system, introduce noise to a system, visualise interactive updates to probability distributions, explore the progress of a chosen sampling method such as MetropolisHastings and provide enough information that a novice student can get an intuition into all aspects of the process.  
Automatic translation to GPGPU  Joe PittFrancis  Computational Biology and Health Informatics  B  C  This project involves running cardiac cell models on a highend GPU card. Each model simulates the electrophysiology of a single heart cell and can be subjected to a series of computational experiments (such as being paced at particular heart rates). For more information about the science and to see it in action on CPU see "Cardiac Electrophysiology Web Lab" at https://travis.cs.ox.ac.uk/FunctionalCuration/ An existing compiler (implemented in Python) is able to translate from a domain specific XML language (http://models.cellml.org) into a C++ implementation. The goal of the project is to add functionality to the compiler in order to get OpenCL or CUDA implementations of the same cell models and to thus increase the efficiency of the "Web Lab". 

General graphics projects  Joe PittFrancis  Computational Biology and Health Informatics  B  C  I am interested in supervising general projects in the area of computer graphics. If you have a particular area of graphicsrelated research that you are keen to explore then we can tailor a bespoke project for you. Specific projects I have supervised in the past include "natural tree generation" which involved using Lindenmayer systems to grow realistic looking bushes and trees to be rendered in a scene; "procedural landscape generation" in which an island world could be generated onthefly using a set of simple rules as a user explored it; "gesture recognition" where a human could control a simple interface using handgestures; "parallel raytracing" on distributedmemory clusters and using multiple threads on a GPU card; "radiosity modelling" used for analysing the distribution of RFID radio signal inside a building; and "nonphotorealistic rendering" where various models were rendered with toon/cel shaders and a set of pencilsketch shaders. 

Graphics pipeline animator  Joe PittFrancis  Computational Biology and Health Informatics  B  C  Prerequisites: Computer graphics, Objectoriented programming The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then she moves the slider down the pipeline, watching an animation of the process 

Intuitive exploration through novel visualisation  Joe PittFrancis  Computational Biology and Health Informatics  B  C  I am interested in novel visualisation as a way to represent things in a more appealing and intuitive way. For example the Gnome disk usage analyzer (Baobab) uses either a "ring chart" or "treemap chart" Representation to show us which subfolders are using the most disk. In the early 1990s the IRIX file system navigator used a 3D skyscraper representation to show us similar information. There are plenty more ways of representing disk usage: from DAGs to centralised Voronoi diagrams. What kind of representation is most intuitive for finding a file which hogging diskspace and which is most intuitive for helping us to remember where something is located in the filesystem tree? The aim is to explore other places where visualisation gain intuition: for example, to visualise the output of a profiler to find bottlenecks in software, to visual a code coverage tool in order to check that testsuites are are testing the appropriate functionality or even to visualise the prevalence of diabetes and heart disease in various regions of the country.  
Computation of particle interactions in ndimensional space  Martin Robinson  Computational Biology and Health Informatics  C  "Aboria (https://github.com/martinjrobins/Aboria) is a C++ library for evaluating and solving systems of equations that can be described as interactions between particles in ndimensional space. It can be used as a high performance library to implement numerical methods such as Molecular Dynamics in computational chemistry, or Gaussian Processes for machine learning. Project 1: Aboria features a radial neighbour search to find nearby particles in the ndimensional space, in order to calculate their interactions. This project will implement a new algorithm based on calculating the interactions between neighbouring *clusters* of particles. Its performance will be compared against the existing implementation, and across the different spatial data structures used by Aboria (Celllist, Octree, Kdtree). Prerequisites: C++ Project 2: Aboria features a serial Fast Multipole Algorithm (FMM) for evaluating smooth long range interactions between particles. This project will implement and profile a parallel FMM algorithm using CUDA and/or the Thrust library. Prerequisites: C++, Knowledge of GPU programming using CUDA and/or Thrust Project 3: The main bottleneck of the FMM is the interactions between wellseparated particle clusters, which can be described as lowrank matrix operations. This project will explore different methods compressing these matrices in order to improve performance, using either Singular Value Decomposition (SVD), Randomised" SVD, or Adaptive Cross Approximation Prerequisites: C++, Linear Algebra 

Statistical shape atlas of cardiac anatomy  Blanca Rodriguez, Pablo Lamata  Computational Biology and Health Informatics  MSc  Description: Cardiac remodelling is the change in shape of the anatomy due to disease processes. 3D computational meshes encode shape variation in cardiac anatomy, and render higher diagnostic than conventional geometrical metrics. Better shape metrics will enable an earlier detection, a more accurate stratification of disease, and more reliable evaluation of remodelling response. This project will contribute to the development of a toolkit for the construction of anatomical atlases of cardiac anatomy, and its translation to clinical adoption. The student will learn about the challenges and opportunities of the crossdisciplinary field between image analysis, computational modelling and cardiology, and be part of a project with a big potential impact on the management of cardiovascular diseases. Prerequisites: Motivation. Good programming skills. Experience with computational graphics and image analysis is an advantage. 

A functional/dataflow programming language for acoustic signal processing  Alex Rogers  Cyber Physical Systems  B  C  Many detection tasks for acoustic sensors deployed in environmental monitoring (for example, triggering an extended recording after analysing a small sample of sound in order to capture the sound of a particular species of animal, or to detect human activity such as hunting and logging) can be readily described using functional or dataflow programming languages. Prototyping algorithms at this level is intuitive, and with the correct tools can be interactive, enabling nonspecialist programmers to make rapid progress. However, to run on small embedded devices the resulting program typically needs to be converted to lowlevel C code. In this project, you will build a toolchain to automate this process. The resulting system will allow a functional or dataflow programming language (possibly one of your own design) to be used to prototype and evaluate acoustic algorithms. It will then automatically generate low level C code that can be compiled to run on an opensource resourceconstrained acoustic sensor (www.openacousticdevices.info). Prerequisites: Compilers and Principles of Programming Languages useful but not essential  
A processor design toolkit for an opensource FPGA toolchain  Alex Rogers  Cyber Physical Systems  B  C  Fieldprogrammable gate arrays (FPGA) are integrated circuits that can be configured after manufacture into almost any conceivable logic circuit combining both combinatorial and synchronous elements. They can be used to explore real hardware implementations of processor designs from simple accumulator machines, through to register machines, and more unusual stack machines. Their use outside industry has previously been limited due to the high cost and complexity of their associated development software. However, this is changing, with an opensource toolchain for popular FPGA devises becoming available in the last years. In this project, you will build on this toolchain to develop the additional software necessary to allow students to design and explore simple processor designs on a custom FPGA development board. The current toolchain requires the use of the Verilog hardware description language. Verilog is very powerful but also very general. This project will develop a highlevel language (possibly a graphical language) focused on the development of simple processor designs from a small number of standard components (such as RAM, multiplexers and registers). Prerequisites: Digital Systems, Compilers & Computer Architecture useful but not essential  
An operating system for embedded devices  Alex Rogers  Cyber Physical Systems  B  C  Most small embedded devices do not separate highlevel application code from the lowlevel devicespecific code that interacts with hardware peripherals. As a result, small changes to application logic often results in the entire firmware of a device being updated. In many settings this is inconvenient and results in inefficient use of constrained resources such as communication bandwidth. In this project, you will design and implement a lowlevel operating system for embedded devices that separates relocatable application logic from lowlevel I/O logic. The resulting operating system should allow easy access to only that hardware functionality that the device designer wishes to make available to application developers. The project will target a typical small microcontroller for embedded applications that uses an ARM M0+ core with 64KB flash and 8KB RAM. Prerequisites: Digital Systems and some knowledge of C programming is useful but not essential.  
Educational tools for micro:bit  Alex Rogers  Cyber Physical Systems  B  C  The micro:bit is an ARMbased embedded development board intended for STEM education (https://microbit.org). To date over two million micro:bit devices have been manufactured and distributed. In this project, you will work with the Micro:bit Educational Foundation to extend the capabilities of this device. Potential projects include (i) developing interaction mechanisms to facilitate local largescale multiplayer games and simulations (e.g. modeling virus contagion throughout a classroom or synchronisation of fireflies) using the existing I/O and peertopeer networking capabilities of the micro:bit board, (ii) a datalogging application that allows micro:bit devices to collect experimental data, exposes that data to a webpage through a WebUSB interface, and presents useful data analysis and plotting tools, or (ii) deploying small convolution neural networks on the device to perform image recognition tasks using a low resolution camera made from an optical mouse sensor. You will be encouraged to explore other ideas of your own. The focus is to provide a complete endtoend solution, balancing the needs of both students and instructors, that can ultimately be incorporated into publicly available micro:bit educational resources. Prerequisites: No specific prerequisites.  
Resurrecting Extinct Computers  Alex Rogers  Cyber Physical Systems  B  C  While the architecture of current reduced instruction set processors is well established, and relatively static, the early days of computing saw extensive experimentation and exploration of alternative designs. Commercial processors developed during the 1960s, 1970s and 1980s included stack machines, LISP machines and massively parallel machines, such as the Connection Machine (CM1) consisting of 65,536 individual onebit processors connected together as a 12dimensional hypercube. This period also saw the development of the first single chip microprocessors, such as the Intel 4004, and the first personal computers, such as the Altair 8800 using the Intel 8080 microprocessor. This project will attempt to resurrect one of these extinct designs (or a scaled down version if necessary) using a modern lowcost fieldprogrammable gate array (FPGA). You will be required research the chosen processor, using both original and modern sources, and then use Verilog to develop a register level description of the device that can be implemented on a FPGA. The final device should be able to run the software of the original and could be realised in a number of different forms depending on the chosen processor (e.g. an Altair 8800 on a small USB stick running Microsoft BASIC). Prerequisites: Digital Systems or Computer Architecture useful but not essential 

Applications of ad hoc security  Bill Roscoe  Security  MSc  My group have developed novel technology for developing security between pairs and groups of devices such as mobile phones. This technology clearly has many potential applications: the objective of this project is to build an implementation directed at an application of the student's choosing, for example email, healthcase or social networking. See my papers with Bangdao Chen, Ronald Kainda and Long Nguyen on my web page. Prerequisites: Security  
Developing and using SVA  Bill Roscoe  Security  MSc  SVA is a front end for FDR that analyses shared variable programs. See Chapters 18 and 19 of Understanding Concurrent Systems. It has the potential to support a number of projects including: (a) Building a version that accepts a subset of some standard programming language as opposed to the custom language it presently uses. (b) Extending its functionality (c) Case studies using it. (a) and (b) would both involve programming in CSP and JAVA, the two languages SVA is written in. Prerequisites: Concurrency for (a) and (b)  
Modelling and verifying systems in Timed CSP and FDR  Bill Roscoe  Security  B  C  Timed CSP reinterprets the CSP language in a realtime setting and has a semantics in which the exact times of events are recorded as well as their order. Originally devised in the 1980s, it has only just been implemented as an alternative mode for FDR. The objective of this project is to take one or more examples of timed concurrent system from the literature, implement them in Timed CSP, and where possible compare the performance of these models with similar examples running on other analysis tools such as Uppaal. References:(Reference Understanding Concurrent Systems, especially Chapter 15, and Model Checking Timed CSP, from AWR's web list of publications) 

A simple objectoriented language  Michael Spivey  Programming Languages  B  C  Use Keiko to implement a simple language that is purely objectoriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. NB. This project is not available to MSc students in 201617. Undergraduate students who wish to enquire about a project for 201718 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical. 

Better JIT translation  Michael Spivey  Programming Languages  B  C  The existing JIT for Keiko is very simpleminded, and does little more than translate each bytecode into the corresponding machine code. Either improve the translation by using one of the many JIT libraries now available, or adjust the Oberon compiler and the specification of the bytecode machine to free it of restrictive assumptions and produce a better pureJIT implementation. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. N.B This project is not available to MSc students in 201617. Undergraduate students who wish to enquire about a project for 201718 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical. 

Better performance for GeomLab  Michael Spivey  Programming Languages  B  C  At present, GeomLab programs show a performance that competes favourably with Python, making it possible to address tasks like computing images of the Mandelbrot set using a purely functional program that calls a function once for each pixel. But there is still a gap between the performance of GeomLab programs and similar ones written in Java or C, and more ambitious imageprocessing tasks would be made possible by better performance, particularly in the area of arithmetic. Explore ways of improving performance, perhaps including the possibility of improving the performance of GeomLab by allowing numbers to be passed around without wrapping them in heapallocated objects, or the possibility of compiling the code for Haskellstyle pattern matching in a better way. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. N.B. This project is not available to MSc students in 201617. Undergraduate students who wish to enquire about a project for 201718 are welcome to contact Prof Spivey but should
note that the response may be delayed as he is on sabbatical. 

Eliminating array bounds checks  Michael Spivey  Programming Languages  B  C  The Oberon compiler inserts code into every array access and every pointer dereference to check for runtime errors, like a subscript that is out of bounds or a pointer that is null. In many cases, it is possible to eliminate the checks because it is possible to determine from the program that no error can occur. For example, an array access inside a FOR loop may be safe given the bounds of the loop, and several uses of the same pointer in successive statements may be able to share one check that the pointer is nonnull. Modify the Oberon compiler (or a simpler one taken from the Compilers labs) so that it represents the checks explicitly in its IR, and introduce a pass that removes unnecessary checks, so speeding up the code without compromising safety. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. N.B. This project is not available to MSc students in 201617. Undergraduate students who wish to enquire about a project for 201718 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical. 

GeomLab and Mindstorms  Michael Spivey  Programming Languages  B  C  GeomLab has a turtle graphics feature, but the pictures are drawn only on the screen. It should be possible to make a turtle out of Lego Mindstorms, then control it with an instance of Geomlab running on a host computer, with communication over Bluetooth. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. N.B. This project is not available to MSc students in 201617. Undergraduate students who wish to enquire about a project for 201718 are welcome to contact Prof Spivey but should note that the response may be delayed as he is on sabbatical. 

GeomLab on Android  Michael Spivey  Programming Languages  B  C  Produce an implementation of GeomLab's GUI and graphics library that works on the Android platform. Either use an interpreter for GeomLab's intermediate code to execute GeomLab programs, or investigate dynamic translation of the intermediate code into code for Android's virtual machine Dalvik. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.


Heapbased activation records  Michael Spivey  Programming Languages  B  C  At present, Keiko supports only conventional Pascallike language implementations that store activation records on a stack. Experiment with an implementation where activation records are heapallocated (and therefore recovered by a garbage collector), procedures are genuinely firstclass citizens that can be returned as results in addition to being passed as arguments, and tail recursion is optimised seamlessly. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.


Keiko on Mindstorms  Michael Spivey  Programming Languages  B  C  Alternative firmware for the Mindstorms robot controller provides an implementation of the JVM, allowing Java programs to run on the controller, subject to some restrictions. Using this firmware as a guide, produce an interpreter for a suitable bytecode, perhaps some variant of Keiko, allowing Oberon or another robot language of your own design to run on the controller. Aim to support the buttons and display at first, and perhaps add control of the motors and sensors later. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.


Typechecking for GeomLab  Michael Spivey  Programming Languages  B  C  The GeomLab language is untyped, leading to errors when expressions are evaluated that would be better caught at an earlier stage. Most GeomLab programs, however, follow relatively simple typing rules. The aim in this project is to write a polymorphic type checker for GeomLab and integrate it into the GeomLab system, which is implemented in Java. A simple implementation of the typechecker would wait until an expression is about to be evaluated, and typecheck the whole program at that point. As an extension of the project, you could investigate whether it is possible to typecheck function definitions one at a time, even when some of the functions they call have not yet been defined. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details.


MSc and Undergraduate Projects  Bernard Sufrin  Programming Languages  MSc  Bernard Sufrin's most recent collection of detailed project descriptions can always be found at http://www.cs.ox.ac.uk/people/bernard.sufrin/personal/projects.html. In addition to the projects described there, he is happy to discuss supervising projects in concurrent programming, userinterface design, programming language implementation, program transformation, and proof support. 

Algorithmic Explanations  Max Van Kleek, Jun Zhao  Human Centred Computing  B  C  Description: Machine learning is a range of methods for classifying or predicting realworld phenomena using data. When ML models are used to predict employee success, or set insurance prices, they can have significant effects on people’s lives. Understandably, in such circumstances people might want explanations for why a system has given them a negative score. Various methods have been proposed to generate interpretable ML models; including systematically perturbing the inputs to observe effects on the output, or providing power indices for each input. However, few of these techniques have been translated into userfriendly explanation interfaces. In this project, you would explore ways to improve explanation systems. This could involve generating novel explanation methods; translating existing ones into graphical or natural language formats; and / or testing them out on users for different purposes.  
App and loT XRay  Max Van Kleek, Jun Zhao  Human Centred Computing  B  C  The XRay project enables people to see all of the first and third parties that receive data from the smartphone apps and IoT devices in their homes, so that they can make better privacy decisions. There are several pieces to the XRay: the Observatory, which discovers and retrieves smartphone software such as smartphone apps, the analyser, a static analysis method for analysing these apps to derive information flow models, and enduser clients which provide visualisations and ways for people to understand these models in the context of specific privacy choices. In this project, you will be invited to extend the XRay in some way: such as by introducing a new module for the observatory (to ingest new kinds of content), the analyser, or a new way for endusers to view and use the models derived.  
GDPR and Personal Data  Max Van Kleek  Human Centred Computing  MSc  Description: Data protection law imposes various duties on companies to protect enduser’s rights and freedoms with respect to their ‘personal data’ (any data that be used to identify a person). This project will explore ways to automatically assess whether organisations are complying with their duties, using static and dynamic analyses of websites, apps and privacy policy text documents. Your proposed compliance detection methods can then be validated on a database containing information about flows of personal data between ~1 million companies.  
Understanding App Data Collection in Chinese Smartphone  Max Van Kleek, Jun Zhao  Human Centred Computing  B  C  China has the world’s largest online population, and smartphones are used in every part of daily life, from social networking to digital payment in even the most remote corners of the country. Unlike the highly platformconsolidated app markets of western countries, China uniquely has more than 20 app stores operated by private enterprises. This raises unique challenges and questions regarding the app ecosystem and the data monetization model behind it. In this project, we will build upon our existing research experience of having collected and analysed >1million mobile apps from the UK app market to collect popular mobile apps from the top 10 mobile app stores in China, and analyse the data tracking activities behind the apps. In this way we will generate some firsthand understanding about the Chinese app market ecosystem. Prerequisites: Knowledge of Chinese language is beneficial but not required.  
3D demos of geometric concepts  Irina Voiculescu  Computational Biology and Health Informatics  B  The Geometric Modelling course (Part B) deals with several interesting concepts which would be best visualised in a suite of applications. A coherent suite of 3D demos could easily become a useful tool for this course, as well as for users worldwide. This project would be most suitable for a candidate who already have some experience using a 3D graphics library of their choice and want to improve this skill. The mathematical concepts are welldocumented.  
3D environment for Hand Physiotherapy  Irina Voiculescu  Computational Biology and Health Informatics  B  C  After hand surgery, it is almost always necessary for patients to have physiotherapy afterwards to help with their recovery. As part of this, the patient will need to perform hand exercises at home. However, the patient may not always do the exercises correctly, or they might forget to do their exercises. The goal of this project is to use the Leap Motion to create a userfriendly GUI which a patient could use to aid them with their home exercises. The interface would show the user where their hand should be and they would then need to follow the movements. It could work from a webbased software or a downloaded software. It would need to be tailored to the patient so it contained their specific required exercises, which could be input by the physiotherapist. It would need to store data on how the patient is doing and feedback this data to the patient, and possibly also to the physiotherapist via the internet. If internetbased, patient confidentiality and security would need to be considered. This project would be performed in close collaboration with a physiotherapist, an orthopaedic hand surgeon, and a postdoctoral researcher based at the Nuffield Orthopaedic Centre.  
3D printing medical scan data  Irina Voiculescu  Computational Biology and Health Informatics  B  C  Computed tomography (CT) scanning is a ubiquitous scanning modality. It produces volumes of data representing internal parts of a human body. Scans are usually output in a standard imaging format (DICOM) and come as a series of axial slices (i.e. slices across the length of the person's body, in planes perpendicular to the imaginary straight line along the person's spine.) The slices most frequently come at a resolution of 512 x 512 voxels, achieving an accuracy of about 0.5 to 1mm of tissue per voxel, and can be viewed and analysed using a variety of tools. The distance between slices is a parameter of the scanning process and is typically much larger, about 5mm. During the analysis of CT data volumes it is often useful to correct for the large spacing between slices. For example when preparing a model for 3D printing, the axial voxels would appear elongated. These could be corrected through an interpolation process along the spinal axis. This project is about the interpolation process, either in the raw data output by the scanner, or in the postprocessed data which is being prepared for further analysis or 3D printing. The output models would ideally be files in a format compatible with 3D printing, such as STL. The main aesthetic feature of the output would be measurable as a smoothness factor, parameterisable by the user. Existing DICOM image analysis software designed within the Spatial Reasoning Group at Oxford is available to use as part of the project. 

3D stereo display of medical scan data  Irina Voiculescu, Stuart Golodetz  Computational Biology and Health Informatics  B  C  The Medical Imaging research group has been working with a variety of data sourced from CT and MRI scans. This data comes in collections of (generally greyscale) slices which together make up 3D images. Our group has developed software to generate 3D models of the major organs in these images. This project aims to develop a simple augmented reality simulation for the Oculus Rift which will render these organs within a transparent model of a human and allow the user to walk around the model so as to view the organs from any angle. This has a number of possible applications, including to train medical students and to help surgeons to explain medical procedures to their patients.  
Exact Algorithms for Complex Root Isolation  Irina Voiculescu  Computational Biology and Health Informatics  B  C  Not available in 2013/14 Isolating the complex roots of a polynomial can be achieved using subdivision algorithms. Traditional Newton methods can be applied in conjunction with interval arithmetic. Previous work (jointly with Prof Chee Yap and MSc student Narayan Kamath) has compared the performance of three operators: Moore's, Krawczyk's and HansenSengupta's. This work makes extensive use of the CORE library, which is is a collection of C++ classes for exact computation with algebraic real numbers and arbitrary precision arithmetic. CORE defines multiple levels of operation over which a program can be compiled and executed. Each of these levels provide stronger guarantees on exactness, traded against efficiency. Further extensions of this work can include (and are not limited to): (1) Extending the range of applicability of the algorithm at CORE's Level 1; (2) Making an automatic transition from CORE's Level 1 to the more detailed Level 2 when extra precision becomes necessary; (3) Designing efficiency optimisations to the current approach (such as confirming a single root or analysing areas potentially not containing a root with a view to discarding them earlier in the process); (4) Tackling the isolation problem using a continued fraction approach. The code has been included and is available within the CORE repository. Future work can continue to be carried out in consultation with Prof Yap at NYU. 

Gesture recognition using Leap Motion  Irina Voiculescu  Computational Biology and Health Informatics  B  C  Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia  a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computerbased tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users being asked to carry out static gestures above a Leap Motion sensor, and these gestures being scored according to a variety of criteria. A prototype has been constructed to gather data, and some data has been gathered from a few controls and patients. In order to deploy this as a clinical tool into the NHS, there is need for a systematic data collection and analysis tool, based on machine learning algorithms to help classify the data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia. Also, the graphical user interface needs to be extended to give particular kinds of feedback to the patient in the form of home exercises, as part of a rehabilitation programme. This project was originally set up in collaboration with Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new cosupervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016. 

Identifying features in MRI scan data  Irina Voiculescu  Computational Biology and Health Informatics  B  C  In recent years, medical diagnosis using a variety of scanning modalities has become quasiuniversal and has brought about the need for computer analysis of digital scans. Members of the Spatial Reasoning research group have developed image processing software for CT (tomography) scan data. The program partitions (segments) images into regions with similar properties. These images are then analysed further so that particular features (such as bones, organs or blood vessels) can be segmented out. The team's research continues to deal with each of these two separate meanings of medical image segmentation. The existing software is written in C++ and features carefullycrafted and welldocumented data structures and algorithms for image manipulation. In certain areas of surgery (e.g. orthopaedic surgery involving hip and knee joint) the magnetic resonance scanning modality (MRI) is preferred, both because of its safety (no radiation involved) and because of its increased visualisation potential. This project is about converting MRI scan data into a format that can become compatible with existing segmentation algorithms. The data input would need to be integrated into the group's analysis software in order then to carry out 3D reconstructions and other measurements. This project is cosupervised by Professor David Murray MA, MD, FRCS (Orth), Consultant Orthopaedic Surgeon at the Nuffield Orthopaedic Centre and the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), and by Mr Hemant Pandit MBBS, MS (Orth), DNB (Orth), FRCS (Orth), DPhil (Oxon)Orthopaedic Surgeon / Honorary Senior Clinical Lecturer, Oxford Orthopaedic Engineering Centre (OOEC), NDORMS.  
Reinforcement learning techniques for games  Irina Voiculescu  Computational Biology and Health Informatics  B  C  This project is already taken for 20172018
Psychology has inspired and informed a number of machine learning methods. Decisions within an algorithm can be made so as to improve an overall aim of maximising a (cumulative) reward. Supervised learning methods in this class are known as Reinforcement Learning. A basic reinforcement learning model consists of establishing a number of environment states, a set of valid actions, and rules for transitioning between states. Applying this model to the rules of a board game means that the machine can be made to learn how to play a simple board game by playing a large number of games against itself. The goal of this project is to set up a reinforcement learning environment for a simple board game with a discrete set of states (such as Backgammon). If time permits, this will be extended to a simple geometric game (such as Pong) where the states may have to be parameterised in terms of geometric actions to be taken at each stage in the game. 

Simple drawing analysis  Irina Voiculescu  Computational Biology and Health Informatics  B  C  Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia  a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computerbased tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users drawing simple figures on a tablet, and these figures being scored according to a variety of criteria. Data has already been gathered from 200 or so controls, and is being analysed for a range of parameters in order to assess what a neurotypical person could achieve when drawing such simple figures. Further machine learning analysis could help classify such data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia. This project was originally cosupervised by Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new cosupervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016. 

Surgery teaching and training tool  Irina Voiculescu  Computational Biology and Health Informatics  B  C  Knee replacement surgery involves a precise series of steps that a surgeon needs to follow. Trainee surgeons have traditionally mastered these steps by learning from textbooks or experienced colleagues. Surgeons at the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS) in Oxford have been working on a standardised method to help trainees internalise the sequence of events in an operation. It is proposed to construct a computerbased tool which would help with this goal. Apart from the choice of tools and materials, the tool would also feature a virtual model of the knee. The graphical user interface will present a 3D model of a generic knee to be operated, and would have the ability for the user to make cuts necessary to the knee replacement procedure. There would be predefined parameters regarding the type and depth of each cut, and an evaluation tool on how the virtual cuts compared against the parameters. The project goals are quite extensive and so this would be suitable for an experienced programmer. This project is cosupervised by Professor David Murray MA, MD, FRCS (Orth), Consultant Orthopaedic Surgeon at the Nuffield Orthopaedic Centre and the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), and by Mr Hemant Pandit MBBS, MS (Orth), DNB (Orth), FRCS (Orth), DPhil (Oxon)Orthopaedic Surgeon / Honorary Senior Clinical Lecturer, Oxford Orthopaedic Engineering Centre (OOEC), NDORMS. 

Textile fabric detail simulation  Irina Voiculescu  Computational Biology and Health Informatics  B  C  Conventional computeraided design (CAD) software uses methods such as extrusion and revolution tools that the user can apply to create the 3D shape of a part. These tools are based on traditional manufacturing methods and work very well for most CAD applications. One application which these tools do not work well for is creating 3dimensional representations of textiles. The exact path of each fibre within the textile is dependent upon the other fibres, and the flexibility of the fibres. The purpose of this project is to create a simple software tool/algorithm into which a user can input a weave pattern (flat), or a braid pattern (cylindrical), and the flexibility of the fibres and it will create a 3dimensional representation of the structure.  
The efficiency of numerical algorithms  Jonathan Whiteley  Computational Biology and Health Informatics  B  C  Many numerical algorithms have error bounds that depend on some user provided input. For example, the error in a numerical method for solving a differential equation is bounded in terms of the stepsize h, and so the user may change the stepsize h until a desired accuracy is attained. Although useful, these error bounds do not take account of computational efficiency. For example, a numerical method for solving a differential equation may have a very impressive bound with respect to step size h, but may require significantly more computational effort than other methods with less impressive error bounds. The field of scientific computing is a rich source of algorithms such as these, for example the numerical solution of differential equations, the numerical solution of linear systems, and interpolation of functions. The aim of this project is to work with a variety of algorithms for solving a given problem, and to assess the computational efficiency of these algorithms.  
The efficient storage of sparse matrices  Jonathan Whiteley  Computational Biology and Health Informatics  B  Many matrix applications involve sparse matrices, i.e. matrices that have a very large number of rows and columns, but only a small number of nonzero entries in each row. On a given computer we may only store a finite number of matrix entries. When working with sparse matrices we usually store only the nonzero entries and their locations. There are several established techniques for storing sparse matrices. These methods all have individual strengths and weaknesses  some allow efficient multiplication of sparse matrices by vectors, others allow entries to be modified effectively, while others allow the sparsity pattern to be modified dynamically. The aim of this project is to investigate these sparse storage methods, and to highlight the strengths and weaknesses of each approach.  
Model Checking LTL on Markov Chains  James Worrell  Automated Verification  B  C  The goal of this project is to write a program that model checks a Markov chain against an LTL formula, i.e., calculates the probability that formula is satisfied. The two main algorithmic tasks are to efficiently compile LTL formulas into automata and then to solve systems of linear equations arising from the product of the Markov chain and the automaton. An important aspect of this project is to make use of an approach that avoids determinising the automaton that represents the LTL formula. This project builds on material contained in the Logic and Proof and Models of Computation Courses. Reading: JM. Couvreur, N. Saheb and G. Sutre. An optimal automata approach to LTL model checking of probabilistic systems. Proceedings of LPAR'03, LNCS 2850, Springer 2003. 

Topics in Linear Dynamical Systems  James Worrell  Automated Verification  C  A linear dynamical system is a discrete or continuoustime system whose dynamics is given by a linear function of the current state. Examples include Markov chains, linear recurrence sequences (such as the Fibonacci sequence), and linear differential equations. This project involves investigating the decidability and complexity of various reachability problems for linear dynamical systems. It would suit a mathematically oriented student. Linear algebra is essential, and number theory or complexity theory is desirable. A relevant paper is Ventsislav Chonev, Joël Ouaknine, James Worrell: The orbit problem in higher dimensions. STOC 2013: 941950.  
Children, WellBeing, and Apps  Jun Zhao  Human Centred Computing  MSc  Description: Tablet computers are becoming the primary means for young children to go online. However, few studies have examined how young children perceive and cope with personal data privacy during their interactions with these mobile technologies. Our current research has shown that young children and their parents largely have little awareness of personal data collection by mobile apps. Parents’ primary concerns are misplaced, and remain focused on familiar risks, like content appropriateness or inapp purchases. As a result, children are regularly exposed to personal data risks and have little skills to recognise or cope with them. In the KOALA project, we will explore different ways to mitigate these risks, including recognising and detecting apps that designed for children in the general app store, and analysing personal data collection practices of existing mobile apps from leading kids app stores.  
Topics in Algorithms, Complexity, and Combinatorial Optimisation  Standa Živný  Algorithms and Complexity Theory  C  Prof Zivny is willing to supervise in the area of algorithms, complexity, and combinatorial optimisation. In particular, on problems related to convex relaxations (linear and semidefinite programming relaxations), submodular functions, and algorithms for and complexity of homomorphisms problems and Constraint Satisfaction Problems. 