## List of projects

Project Supervisors Parts Description
Developing stochastic hybrid models for smart microgrid systems Alessandro Abate C

Smart microgrids are small-scale versions of centralized electricity systems, which locally generate, distribute, and regulate the flow of electricity to consumers. Among other advantages, microgrids have shown positive effects on the reliability of distribution networks. Modelling a microgrid requires special attention because of the coexistence of continuous and discrete variables. The presence of components with stochastic behaviour motivates the use of stochastic hybrid systems as a model of such microgrids. This framework provides the opportunity to employ formal methods to verify properties of the microgrid. The goal of the project is to model the evolution of the electrical and physical variables of the microgrid by stochastic hybrid systems.

Prerequisites: familiarity with stochastic processes and formal verification; no specific knowledge of smart grids is needed.

Development of software for the verification of MPL models Alessandro Abate B  C

This project aims to enhance the software toolbox VeriSiMPL ("very simple"), which has been developed to enable the abstraction of Max-Plus-Linear (MPL) models. MPL models are specified in MATLAB and abstracted to Labeled Transition Systems (LTS). The LTS abstraction is formally put in relationship with its MPL counterpart via a (bi)simulation relation. The abstraction procedure runs in MATLAB and leverages sparse representations, fast manipulations based on vector calculus, and optimized data structures such as Difference-Bound Matrices. LTS can be pictorially represented via the Graphviz tool and exported to the PROMELA language. This enables the verification of MPL models against temporal specifications within the SPIN model checker.
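By way of background, MPL models evolve linearly in the max-plus semiring, where "addition" is max and "multiplication" is +. The following toy Python sketch (illustrative only; it is not part of VeriSiMPL, which is MATLAB-based) shows the core state update x(k+1) = A ⊗ x(k):

```python
import numpy as np

def maxplus_matvec(A, x):
    """Max-plus matrix-vector product: (A ⊗ x)_i = max_j (A[i,j] + x[j])."""
    return np.max(A + x[np.newaxis, :], axis=1)

# Toy example: a two-resource scheduling step x(k+1) = A ⊗ x(k),
# where A[i,j] is the processing-plus-transport time from resource j to i
# and x holds the times of the latest events.
A = np.array([[2.0, 5.0],
              [3.0, 3.0]])
x0 = np.array([0.0, 0.0])
x1 = maxplus_matvec(A, x0)   # → [5.0, 3.0]
```

In this scheduling reading, each entry of the new state is the earliest time the corresponding event can fire given all its predecessors.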

Prerequisites: some familiarity with dynamical systems, working knowledge of MATLAB and C

Software development for abstractions of stochastic hybrid systems Alessandro Abate B  C

Stochastic hybrid systems (SHS) are dynamical models for the interaction of continuous and discrete states. The probabilistic evolution of the continuous and discrete parts of the system is coupled, which makes analysis and verification of such systems challenging. Among specifications of SHS, probabilistic invariance and reach-avoid have received considerable attention recently. Numerical methods have been developed to compute these two specifications. These methods are mainly based on state-space partitioning and abstraction of the SHS by Markov chains, which are optimal in the sense of reducing the abstraction error with a minimum number of Markov states. The goal of the project is to combine the codes that have been developed for these methods into a single tool. The student should also design a user-friendly interface (for the choice of dynamical equations, parameters, methods, etc.).
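To give a flavour of the abstraction step, the sketch below (an illustrative toy in Python; the project's actual codes are in MATLAB and C) grids a one-dimensional system x(k+1) = a·x(k) + w, with Gaussian noise w, and evaluates the transition kernel from each cell's centre point to obtain a finite Markov chain:

```python
import numpy as np
from math import erf, sqrt

def abstract_to_markov_chain(a, sigma, low, high, n):
    """Abstract x(k+1) = a*x(k) + w, w ~ N(0, sigma^2), on [low, high]
    into an n-state Markov chain by uniform gridding; transition
    probabilities are evaluated at each cell's centre point."""
    edges = np.linspace(low, high, n + 1)
    centres = (edges[:-1] + edges[1:]) / 2
    Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF
    P = np.zeros((n, n))
    for i, c in enumerate(centres):
        mean = a * c
        for j in range(n):
            # probability of landing in cell j from the centre of cell i
            P[i, j] = Phi((edges[j + 1] - mean) / sigma) \
                    - Phi((edges[j] - mean) / sigma)
    return P  # rows may sum to less than 1: mass can leave [low, high]
```

A finer grid reduces the abstraction error at the cost of more Markov states, which is exactly the trade-off the optimal-partitioning methods address.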

Prerequisites: some familiarity with stochastic processes, working knowledge of MATLAB and C

Computational Higher Algebra Samson Abramsky, Jamie Vicary B  C

Some modern approaches to algebraic topology and quantum information make use of compound structures, such as matrices of matrices of complex numbers, which have a rich and beautiful theory. We have a computer algebra system written in Mathematica to help us manipulate these structures, but there are many gaps, and much scope for finding ingenious algorithms to improve performance. This project involves looking at the existing code (which is really quite simple!), understanding the mathematical problems it is solving, and working out how to do things better. Alternatively, there are opportunities to implement new and exciting features. There are many directions in which this project could be taken. The emphasis can be shifted depending on the student; it is appropriate for students interested in algorithms, rewriting, topology, or quantum information.

Projects for Samson Abramsky Samson Abramsky C

Samson Abramsky is happy to supervise projects in the following areas:

- study of nonlocality and contextuality in quantum information and beyond
- sheaf theory and contextual semantics
- applications of coalgebra in game theory and economics
- intensional forms of recursion in computation theory and applications

Please contact him to discuss any of these in more detail.
Sheaf theoretic semantics for vector space models of natural language Samson Abramsky B  C

Contextuality is a fundamental feature of quantum physical theories and one that distinguishes them from classical mechanics. In a recent paper by Abramsky and Brandenburger, the categorical notion of sheaves has been used to formalize contextuality. This has resulted in generalizing and extending contextuality to other theories which share some structural properties with quantum mechanics. A consequence of this type of modelling is a succinct logical axiomatization of properties such as non-local correlations and, as a result, of classical no-go theorems such as those of Bell and Kochen-Specker. Like quantum mechanics, natural language has contextual features; these have been the subject of much study in distributional models of meaning, which originated in the work of Firth and were later advanced by Schütze. These models are based on vector spaces over the semiring of positive reals with an inner product operation. The vectors represent meanings of words, based on the contexts in which they often appear, and the inner product measures degrees of word synonymy. Despite their success in modelling word meaning, vector spaces suffer from two major shortcomings: firstly, they do not immediately scale up to sentences, and secondly, they cannot, at least not in an intuitive way, provide semantics for logical words such as 'and', 'or', 'not'. Recent work in our group has developed a compositional distributional model of meaning in natural language, which lifts vector-space meaning to phrases and sentences. This has already led to some very promising experimental results. However, this approach does not deal so well with the logical words.
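A minimal illustration of the distributional idea (a toy Python sketch with made-up data, not our group's compositional model): build co-occurrence vectors from the contexts in which words appear, and compare words with the inner product, here normalised to cosine similarity:

```python
import numpy as np

def cooccurrence_vector(word, corpus, vocab, window=2):
    """Count co-occurrences of `word` with each context word in `vocab`,
    within a fixed window, over a corpus of tokenised sentences."""
    vec = np.zeros(len(vocab))
    index = {w: i for i, w in enumerate(vocab)}
    for sent in corpus:
        for i, w in enumerate(sent):
            if w != word:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in index:
                    vec[index[sent[j]]] += 1
    return vec

def cosine(u, v):
    """Normalised inner product: a simple degree of word synonymy."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

On a toy corpus where "dog" and "cat" share the context word "runs" but differ on "barks"/"meows", the two vectors have cosine 0.5: similar, but not synonymous.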

The goal of this project is to use sheaf theoretic models to provide both a contextual and logical semantics for natural language.  We believe that sheaves provide a generalization of the logical Montague semantics of natural language which did very well in modeling logical connectives, but did not account for contextuality. The project will also aim to combine these ideas with those of the distributional approach, leading to an approach which combines the advantages of Montague-style and vector-space semantics.

Prerequisites: The interested student should have taken the category theory and computational linguistics courses, or be familiar with the contents of these.

Optimization of Web Query Plans Michael Benedikt

This project will look at how to find the best plan for a query, given a collection of data sources with access restrictions.

We will look at logic-based methods for analyzing query plans, taking into account integrity constraints that may exist on the data.

XML transformation and update languages Michael Benedikt B  C

Professor Michael Benedikt ( benedikt@cs.ox.ac.uk) would be delighted to supervise projects in XML transformation and update languages. Please feel free to contact him.

Building a population of human cardiac cell models using public participation to explore and compute the parameter space Kevin Burrage C

Computing a population of models (that is, a model with differing sets of valid parameters) is intensive, inherently parallel, and so well suited to the BOINC distributed volunteer computing environment, which allows many computational jobs to be run independently and in parallel. This project will focus on the construction of the BOINC project environment, which will allow a population of cardiac action potential models to be built by comparing the model dynamics against a set of experimental biomarkers. This project will suit a student with a computational background and an interest in using distributed computing, via public participation, on an important application problem in cardiac research.

Please note Professor Burrage is currently on sabbatical and will not be supervising any undergraduate projects at present.

Modelling and reasoning about agent-based, networked complex systems such as financial markets, biological systems and the Internet Ani Calinescu C

Complex systems are characterized by variety and uncertainty. A heterogeneous agent-based modelling approach to such systems would need to specify the types of agents, their properties, the structure of the network, and the information flows among them. Generic and domain-specific research questions could be investigated within this context. This topic could be adapted to the student's background and interests. From a theoretical perspective, the project could aim to produce a formal specification of properties such as adaptiveness, emergence and robustness. From a more applied perspective, the project could implement and extend existing approaches to modelling complex systems such as the Internet, biological systems and financial markets.

### Prerequisites:

Intelligent Systems or Machine Learning
At least one of: Introduction to Specification (MSc), Probability and Computing, or Probabilistic Model Checking

Please feel free to contact Dr Ani Calinescu to discuss any of these project topics in more detail.

Building an Animation or Simulation System Stephen Cameron B

The simulation of objects and beings within a computer has become a hot topic over the last few years, with applications to virtual reality, the design of safe stadia, synthetic actors and environments for films, and computer games, amongst others. This project would look at the design and construction of a relatively simple animation or simulation system that would probably sacrifice ease of animation in order to focus on the modelling of interactions between the actors and their environments.

Such a project would probably make use of existing collision detection code within the computing laboratory, coupled with algorithms that simulate the physical interaction of bodies. (An alternative would be to use one of the physical modelling APIs that are now becoming available.)

Developing the Robot Sheepdog Stephen Cameron B In 1998 a consortium consisting of the Computing Laboratory, Silsoe Research Institute, and the Universities of Leeds and Bristol, successfully demonstrated a robotic device that was capable of herding live ducks, and thus demonstrated the efficacy of a mathematical model of flocking behaviour. In order to carry on with this work (in a way that avoids having to muck out real animals!) we are repeating the work but this time with robot ducks. Projects to help continue this work are available, and suggested topics include (but are not limited to): building better sensing or (radio) communication hardware for existing ducks; designing (and possibly testing) novel algorithms to look at cooperation between sheepdogs; using machine learning to allow the sheepdogs to learn; and building better simulations of the robots.
Helicopter Robots Stephen Cameron B  C Small quad-rotor helicopters are now available that can fly indoors; for example, the AR-Drone (ardrone.parrot.com) carries a video camera and can be driven from an Android or iOS device, for which a suitable software library is now available. This project will develop software for such a helicopter. Possible specific projects might: attempt to fly the helicopter automatically; get the helicopter to look for open doorways; analyse the camera images; develop a game; etc.
Meet & Greet Robot Stephen Cameron B  C One interesting use of a small mobile robot is to go up to people in a room, and offer to provide them with information. For example, at an Admissions Open Day such a robot could supply them with information about where to find the lecture theatre, or how to find a particular College. We have a suitable small robot (www.turtlebot.com), which contains a Kinect sensor to allow it to sense its surroundings, and a netbook for its 'brain'. This project would either work with the real robot, or develop software for a simulated robot, and could either focus on the low-level navigation software; or more on the high-level functions; or even on speech recognition. The project has the potential to be split among a small group of students.
Object-Oriented Splines Library Stephen Cameron C Many objects are described to computer design and animation systems as a number of curved surface pieces, called spline patches. There are many different families of such splines (with different mathematical -- and thus aesthetic -- properties), including Bezier, B-spline, and NURBS. One way of describing these patches is to use (normally cubic) polynomial equations, and many implementations make this view foremost to the user. However an alternative viewpoint is to treat the patches as objects that are combined using a small number of operations to make arbitrarily complex objects. The aim of this project would be to design and construct an object-oriented software library for constructing and using splines in this form. The system would include a demonstration testbed (probably written in C++ or Java) that would allow spline curves to be traced out, combined to form one or more surface patches, and displayed in a graphics window. Experience of graphics or splines would be an advantage, and familiarity with object-oriented design techniques a necessity.
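For orientation, the polynomial view of a Bezier curve can be replaced by a small recursive operation, de Casteljau's algorithm, which an object-oriented library might expose as a method on curve objects. A hedged Python sketch (illustrative only; the project itself would use C++ or Java):

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve with the given control points at parameter t,
    by repeatedly interpolating between adjacent points until one remains."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

# Cubic Bezier arch: evaluate the midpoint of the curve.
mid = de_casteljau([(0, 0), (0, 1), (1, 1), (1, 0)], 0.5)  # → (0.5, 0.75)
```

The same interpolation step works in any dimension, which is one reason the object view composes more cleanly than explicit cubic polynomials.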
Robot Path Planning Stephen Cameron B

A robot cannot find its way by looking at a map designed for humans; rather, it must make a complex series of geometric calculations. Many algorithms have been developed for robot path planning, and the purpose of this project is to investigate (and improve upon) some of these. The 'robot' might be an arbitrary convex polygon that can slide in a world composed of polygonal obstacles; an industrial robot arm; or a human trying to find their way across a strange city.

Which algorithm(s) you investigate is largely up to you: they range from pragmatic 'suck-it-and-see' methods to mathematically-complicated methods that are guaranteed to succeed (after a lot of CPU time!). There is also considerable scope to think of novel robots. For example, previous groups have considered:

• a planetary exploration vehicle;
• a smart torpedo;
• and a model railway track.
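As a baseline for comparison, a simple planner discretises the world into an occupancy grid and searches it; the Python sketch below (illustrative only) finds a shortest 4-connected path with breadth-first search:

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (True = obstacle).
    Cells are (row, col) pairs; returns the path or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    q = deque([start])
    while q:
        cell = q.popleft()
        if cell == goal:
            path = []
            while cell is not None:          # walk back to the start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                q.append((nr, nc))
    return None
```

The cleverer methods in the literature trade this brute-force grid for geometric structures (visibility graphs, Voronoi diagrams, configuration-space decompositions) that scale to continuous worlds.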
Robot Sheepdog Stephen Cameron C In 1998 a consortium consisting of the Computing Laboratory, Silsoe Research Institute, and the Universities of Leeds and Bristol, successfully demonstrated a robotic device that was capable of herding ducks, and thus demonstrated the efficacy of a mathematical model of flocking behaviour. In order to carry on this work (in a way that avoids having to muck out real animals!) we are repeating the work but this time with robot ducks. Currently a group of undergraduate project students are building the first prototype devices, which should be commissioned by Easter. There is then scope for one or more MSc projects, using either these devices, or testing new devices, or just experimenting with new control regimes under simulation to improve the capabilities of the system. This project might suit students with an engineering bent, or an inclination towards AI or artificial life.
Robot Soccer Simulation Stephen Cameron B

For several years now FIRA (Federation of International Robot-Soccer Association) has promoted the development of robot soccer, with the stated goal of producing a team of robots that can play (and beat!) humans by the year 2050. Their main events are tournaments in which research teams play each other using small wheeled robots, but more recently they have introduced a simulation league ('Simurosot'), in which a simulation server program is provided as the 'playing surface'. The programmers then construct their teams as client programs, and two such teams can then play each other with the server acting as referee and providing an overhead view of the play.

The key to producing a (simulated) robot team is deciding how each player will move in response to (a) its position relative to the ball and goals, and (b) the positions of the other players --- this information is provided to both teams by the server on each cycle. Robot soccer is in its infancy, and most play is laughable; in order to produce interesting play it is necessary to encode *strategy* and *tactics*, and so it is expected that most of the effort in this project would focus on providing such a capability. Apart from that, the project is open-ended; a really interesting (but difficult!) extension of this project would have your team capable of learning about your opponent's tactics, and reacting to them!
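A first cut at per-player tactics can be surprisingly small. The hedged Python sketch below (illustrative only; it does not use the real FIRA client API) steers an attacker towards a point just behind the ball on the ball-to-goal line, so that pushing forward moves the ball goalwards:

```python
import math

def player_heading(player_pos, ball_pos, goal_pos, standoff=0.5):
    """Naive attacker policy: head for a point `standoff` behind the ball,
    on the line from ball to goal. Returns a heading angle in radians.
    Assumes the ball is not already at the goal centre."""
    px, py = player_pos
    bx, by = ball_pos
    gx, gy = goal_pos
    d = math.hypot(gx - bx, gy - by)
    if d == 0:
        return 0.0                      # degenerate case: ball on the goal
    ux, uy = (gx - bx) / d, (gy - by) / d   # unit vector ball -> goal
    tx, ty = bx - standoff * ux, by - standoff * uy  # target behind ball
    return math.atan2(ty - py, tx - px)
```

Real strategy starts where this ends: assigning roles (attacker, defender, keeper) per cycle and avoiding teammates all chasing the same ball.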

Requirements No physical robots need be touched to do this project. The FIRA server as supplied runs on PCs under Windows, and similar servers are available for Linux. Team strategy can be encoded in a standard language (e.g., C), or using a language that is designed for describing agents.

This project will be more fun if multiple teams are constructed, so that they can play each other!

Links I've constructed a web-page for robot soccer links at: http://www.comlab.ox.ac.uk/~cameron/soccer/, which includes links to the FIRA site and the basic rules of the game.

Three Dimensional Sculptured Surface Graphics Stephen Cameron B

The aim of this project is to produce a system on which it is possible to design and render (display) a complicated surface, of the sort used in many graphics systems. These surfaces are defined by a number of points in space, with the surface itself fitting smoothly between these points. Thus the required system must make it easy for the user to define these points. This can be done by starting with a regular array of points in a plane, which the user can tweak using the computer's mouse. To be useful as a design tool the system should be totally interactive: the user should be able to use the windowing system to view the grid of control points from any viewing angle, using icons to control the interaction. When the user is satisfied with the set of control points he should be able to see a display of the smooth surface itself.

Several variations of this project are possible, depending on the way that the graphical user interface is constructed, the types of surfaces considered, and the degree of realism in the display of the surface.

The system could be written on some machine that you have easy access to (e.g., a PC), or using a Sun (in which case it would probably have to be written in C or Java to make full use of the Sun graphics).

Quantum Computing and Quantum Information, Logic, Category Theory, Fundamental Physics Bob Coecke B  C

Bob Coecke is willing to supervise projects in the following areas. Please feel free to contact him.

- Development and applications of the categorical quantum mechanics formalism and corresponding graphical languages

http://en.wikipedia.org/wiki/Categorical_quantum_mechanics

- Development of the quantomatic automated graphical reasoning software

- Theory development and empirical experiments for meaning in natural language processing http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf

http://www.cs.ox.ac.uk/industry/content/IndustryNewsSummer2011.pdf

Software development: quantomatic, automated graphical reasoning tools Bob Coecke B  C Recent graph-based formalisms for computation provide an abstract and symbolic way to represent and simulate quantum information processing. Manual manipulation of such graphs is slow and error prone. This project employs a formalism, based on monoidal categories, that supports mechanised reasoning with open-graphs. This gives a compositional account of graph rewriting that preserves the underlying categorical semantics.
Kinect interface development for cybersecurity visual analytics tool Sadie Creese B  C

(Joint with Michael Goldsmith) Kinect interface development for cybersecurity visual analytics tool: using the open development environment for Kinect, to develop an alternative HCI for the Oxford CyberVis tool http://www.cs.ox.ac.uk/projects/cybervis/ (focusing on navigation of the CyberVis environment). As CyberVis is primarily developed in Java, a good understanding of that language is essential in order to interface Kinect to CyberVis. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel human-computer interfaces for security possible, depending on interest and inspiration.

Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs Sadie Creese B  C

(Joint with Michael Goldsmith)

Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue.

Smartphone security Sadie Creese B  C

(Joint with Michael Goldsmith)

Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); this uses skills in formal specification and an understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc. Other projects within this domain possible, according to interest and inspiration.

### Prerequisites:

Concurrency, Concurrent Programming, Computer Security all possibly an advantage.

Technology-layer social networks Sadie Creese C

(joint with Michael Goldsmith)

Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata - which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc.

Concurrent tableau construction for Description Logic concepts Andrew Bate, Bernardo Cuenca Grau B  C

Tableau model construction algorithms have seen a great deal of attention in Knowledge Representation and Reasoning research in recent years.

However, to the best of our knowledge, all of the well-known reasoners that implement such algorithms only do so in a single-threaded manner. This project would explore the possibility of constructing the models of ALC concepts concurrently.

This project has a well-focused core problem, but also offers many possible extensions, such as concurrent computation of the taxonomy of ALC concepts of a knowledge base.

Prerequisites: Logic and Proof, Knowledge Representation and Reasoning; Concurrent Programming

Exploiting background knowledge in ontology matching Bernardo Cuenca Grau B  C

LogMap is a highly optimised ontology matching system developed by members of the Knowledge Representation and Reasoning Group with EPSRC funding (see http://www.cs.ox.ac.uk/projects/LogMap/index.html for details). The NCBO BioPortal (http://bioportal.bioontology.org) is a large repository of bio-medical ontologies and cross-references between them. The goal of this project is to exploit the information provided by BioPortal as background knowledge to improve ontology matching results in LogMap. Main tasks associated with the project:

- Understanding how other matching tools (such as AgrMaker and GOMMA) exploit BioPortal to improve their results.
- Understanding communication with BioPortal web services to query ontologies, mappings and other resources.
- Understanding LogMap's algorithm and devising a suitable way to communicate with BioPortal without harming scalability.
- Implementing BioPortal access in LogMap.
- Comparison with other matching tools.

Interactive ontology matching in LogMap Bernardo Cuenca Grau B  C

LogMap is a highly optimised ontology matching system developed by members of the Knowledge Representation and Reasoning Group with EPSRC funding (see http://www.cs.ox.ac.uk/projects/LogMap/index.html for details). In applications requiring very accurate mappings, user intervention during the matching process becomes essential. LogMap implements basic features allowing domain experts to interactively contribute to the matching process. These features are, however, still rather rudimentary. The main goal of this project is to extend and improve LogMap's support for interactive ontology matching. From an algorithmic point of view, the main challenge is to keep to a minimum the number of questions asked of a human expert, while maximising the benefit of the expert's feedback. From an application development point of view, the main goal is to develop a graphical front-end for LogMap that facilitates user interaction.

Resolution-based Knowledge Compilation Algorithms for Ontology Reasoning Andrew Bate, Bernardo Cuenca Grau B  C In a seminal paper, Selman and Kautz proposed a resolution-based algorithm for compiling a set of propositional clauses S into a set of Horn clauses H such that every Horn consequence of S is derivable from H too. This algorithm was subsequently extended to first-order logic, but without any termination guarantees. Recently, this classical knowledge compilation technique based on resolution has found a novel application in the area of ontology reasoning. In particular, it can be used for “encoding away” disjunctive axioms in ontologies while preserving (for every data set) answers to practically relevant queries. The main goal of the project is to implement the classical resolution-based knowledge compilation algorithm for a given fragment of first-order logic, optimize it, and test it on a library of realistic ontologies. The results of the project could be used to assess the applicability of mainstream knowledge compilation techniques to ontology reasoning.

Prerequisites: Logic and Proof, Knowledge Representation and Reasoning; Optional: Theory of Data and Knowledge Bases
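A propositional sketch of the idea (toy Python, with clauses as sets of signed integers; illustrative only, and without the subsumption checks and ordering refinements a real implementation needs): saturate the clause set under resolution, then keep the Horn clauses, i.e., those with at most one positive literal:

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of signed integer literals,
    where a positive integer is a positive literal)."""
    out = []
    for lit in c1:
        if -lit in c2:
            out.append((c1 - {lit}) | (c2 - {-lit}))
    return out

def horn_compilation(clauses):
    """Naively saturate under resolution, discarding tautologies, then
    keep the Horn clauses (at most one positive literal)."""
    closure = {frozenset(c) for c in clauses}
    changed = True
    while changed:
        changed = False
        for c1, c2 in combinations(list(closure), 2):
            for r in map(frozenset, resolve(c1, c2)):
                if any(-l in r for l in r):    # tautology, drop it
                    continue
                if r not in closure:
                    closure.add(r)
                    changed = True
    return {c for c in closure if sum(1 for l in c if l > 0) <= 1}
```

For example, from S = {a ∨ b, ¬a ∨ c, ¬b ∨ c} the saturation derives the unit clause c, a Horn consequence that the original non-Horn clause a ∨ b helped entail.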
Clean up your Extraction: Deduplication for Data Extraction Tim Furche B  C Deduplication is the problem of identifying and removing duplicates in data. Though well studied in data integration and cleaning, nowadays data is increasingly extracted automatically from web sources. This yields noisy data that is not easily treated with existing deduplication algorithms. However, the extraction process can often supply strong clues on whether two data items are the same, e.g., the URL of a canonical page describing that item. In this project, we will investigate deduplication in the context of data extraction. We will first consider deduplication within a site, both in the presence and absence of unique item URLs. Second, we consider deduplication in results extracted from different sites, with the aim of aligning record structures from both pages where possible to find duplicates. For this project some knowledge of databases and data cleaning is useful. The project should be driven by extensive evaluation.
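A minimal sketch of the clue-based approach (toy Python with hypothetical record fields `url`, `title` and `price`, chosen here for illustration): key each record by its canonical item URL when the extractor found one, and fall back to a normalised fingerprint otherwise:

```python
import re

def dedupe_records(records):
    """Deduplicate extracted records: prefer the canonical item URL as the
    key; fall back to a normalised (title, price) fingerprint when the
    extractor could not recover a unique URL."""
    seen, out = set(), []
    for rec in records:
        key = rec.get("url") or (
            re.sub(r"\W+", " ", rec.get("title", "")).strip().lower(),
            rec.get("price"),
        )
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

Cross-site deduplication is harder precisely because neither clue transfers: URLs differ by construction, and record structures must first be aligned before fingerprints can be compared.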
Docu-Research. Object Annotation and Visualization for PDF Documents Tim Furche B  C (Joint with G Orsi) PDF is the de-facto standard for unstructured documents on the web. Being able to effectively analyse, annotate and manipulate PDF documents is a challenging area of research and a profitable commercial opportunity. In particular, current PDF processing tools (e.g., Adobe Acrobat) allow users to search the full content of the PDF but not to locate interesting objects in it (e.g., what is the contribution of this paper? Is there any budget in this project proposal?). Two particularly challenging aspects of this problem are the annotation and the visualization of such objects. This project is concerned with the semantic annotation of structured objects in scientific PDF documents (e.g., conference papers, journals, and project proposals) as well as their effective visualization. Good knowledge of Java is required. Knowledge of (semantic-)web technology and natural-language processing is preferred.
Form Corpus & Benchmark Tim Furche B  C (Supervisor: C Schallhart) Web pages are a thing of the past: interactive web application interfaces have reshaped the online world. With all their feature richness, they enrich our personal online experience and provide some great new challenges for research. In particular, forms have become much more complex in assisting the user during filling, e.g., with completion options, or through structuring the form-filling process by dynamically enabling or hiding form elements. Such forms are a very interesting research topic, but their complexity has so far prevented the establishment of a corpus of modern forms to benchmark different tools dealing with forms automatically. This MSc project will fill this gap by building a corpus of such forms: based on a number of production sites from one or two domains, we will build our corpus of web interfaces, connected to a (shared) database. Not only will future evaluations in the DIADEM project rely on this corpus, but we will also publish the corpus, promoting it as a common benchmark for the research community working on forms. Knowledge of Java, HTML, CSS, Javascript, and web application development is required.
Form Filling & Probing Tim Furche B  C (Joint with C Schallhart) Unearthing the knowledge hidden in queryable web sites requires a good understanding of the forms involved. As part of DIADEM, we are developing OPAL (Ontology based web Pattern Analysis with Logic), a tool to recognize forms belonging to a parameterizable application domain, such as the real estate or used car market. OPAL determines the meaning of individual form elements, e.g., it identifies the fields for the minimum or maximum price or for some location. This MSc project will build upon OPAL to deal not only with static forms but also with sequences of interrelated forms, as in the case of a rough initial form followed by a refinement form, or in the case of forms showing some options only after some other parts have been filled. Over the course of this MSc project, we will develop a tool which invokes OPAL to analyze a given form, explores all available submission mechanisms on this form, analyzes the resulting pages for forms continuing the initial query, and combines the outcomes of all found forms into a single interaction description. Knowledge of Java, HTML and CSS is required; prior experience in logic programming would be a strong plus.
Good Wrappers are Lazy Wrappers: Avoiding Page Rendering Tim Furche B  C The wealth of information on the web exceeds any other human-created source of information by orders of magnitude. Harnessing that data, however, requires increasingly automated methods that can collect relevant data from many websites and filter it, e.g., for further human consumption. Automating such tasks, however, increasingly requires rendering all web pages involved, as scripted interfaces only work if the page is properly rendered and wrappers increasingly use visual features for robust extraction. However, page rendering comes at a significant cost; e.g., in OXPath it dominates evaluation by a wide margin. Therefore, we investigate in this project how to automatically detect when a page accessed by an (OXPath) wrapper needs to be rendered. We aim to do so by a combination of static analysis of the expression and statistics gathered from previous runs. This project is particularly suited for students interested in browser technologies and their use in data extraction. Familiarity with Javascript and the DOM API would be helpful. Since OXPath is implemented in Java and the method developed will be evaluated by implementation as an extension to OXPath, extensive knowledge of Java is needed.
Mine4Nuggets: Searching for objects in images Tim Furche B  C

(Joint supervision with G Orsi) Most of the information on commercial websites (e.g., Amazon, Rightmove, and Autotrader) is represented by (semi-)structured pages where the interesting data is organised into defined structural or visual patterns. However, some critical information is often represented with an image or a chart. As an example, on many real-estate websites, data about the energy efficiency of the property is represented by an EPC chart (http://www.epcchart.com/), while the actual size of the rooms is represented through a floor plan (http://en.wikipedia.org/wiki/Floor_plan). Have you ever tried to search for an energy-efficient property or one with a large bedroom? The answer is: you can't. This project is concerned with the study of semantic image analysis techniques that can be used to locate and extract interesting objects from pictures and charts on the web. The goal is to enhance current search technology with object-search capabilities for images, a field that offers challenging research and potential commercial opportunities. The project will be conducted within the DIADEM project (http://diadem.cs.ox.ac.uk). Good knowledge of Java is required; previous knowledge of image analysis, computer vision and OCR technology is preferred.

One Path to Find them All: Learning OXPath Wrappers Tim Furche B  C

The wealth of information on the web exceeds any other human-created source of information by orders of magnitude. Harnessing that data, however, requires increasingly automated methods that can collect relevant data from many websites and filter it, e.g., for further human consumption. OXPath (http://diadem.cs.ox.ac.uk/OXPath/) is a wrapper language developed at Oxford in the DIADEM (diadem-project.info) project. It is particularly well suited for extraction from rich internet applications with sophisticated client-side interfaces. Manually creating such wrappers, however, remains a daunting task. In this project, we therefore aim to develop a method for learning OXPath wrappers from sets of annotated examples that indicate, e.g., what data to extract or which actions to perform. The challenge in such a learning algorithm lies in the need to abstract from multiple examples to a single, compact OXPath expression that should be close to optimal under a set of additional criteria. Though there is a significant body of literature on learning XPath expressions, OXPath is a richer language and introduces new challenges. We also specifically aim at extensive experimental validation of the developed techniques. This project is particularly suited for students interested in modern web technology and familiar with compiler design or query optimization. The learner will be implemented in Java, where we provide a set of customized APIs for action execution and DOM interaction.

Automating algebraic reasoning for CSP processes Thomas Gibson-Robinson, Gavin Lowe C It is often useful to rewrite a CSP process into a sequential form, using a combination of algebraic laws, particularly step laws, and the Unique Fixed Point rule. Many exercises on the Concurrency course ask students to perform such rewriting. The aim of this project is to automate this rewriting. It is hoped that the resulting processes will be easier to understand. Also, the tool should be useful pedagogically. The project will build on the Haskell CSP evaluator [1]. This project would require the Concurrency course and experience of programming in Haskell. [1] libcspm: https://github.com/tomgr/libcspm
Visualising CSP Compressions Thomas Gibson-Robinson, Gavin Lowe C

FDR3 has a number of inbuilt compression functions, such as normal, diamond and sbisim, that aim to reduce the number of states in the system. Typically, compressions can yield substantial reductions in the number of states, and thus the time taken to check a particular problem, but it can be somewhat difficult to reason about why a compression does or does not work. The aim of this project would be to produce a tool that allows the results of a compression function to be visualised, in order to allow a user to understand why a compression is or is not working. Essentially, the tool would display both the original CSP process (as a labelled transition system), and the resulting compressed transition system, and would provide a way of indicating how the states of the two systems are related. This tool would build on top of the compression functions of FDR3 [1], and would hopefully be integrated on successful completion. This project would require the Concurrency course. Some knowledge of computer graphics concepts would be useful, but not required.

Kinect interface development for cybersecurity visual analytics tool Michael Goldsmith B  C

(Joint with Sadie Creese) Kinect interface development for cybersecurity visual analytics tool: using the open development environment for Kinect to develop an alternative HCI for the Oxford CyberVis tool, http://www.cs.ox.ac.uk/projects/cybervis/ (focusing on navigation of the CyberVis environment). As CyberVis is primarily developed in Java, a good understanding of that language is essential in order to interface Kinect to CyberVis. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel human-computer interfaces for security possible, depending on interest and inspiration.

Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs Michael Goldsmith B  C (Joint with Sadie Creese) Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security are a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue.
Smartphone security Michael Goldsmith B  C

(Joint with Sadie Creese) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); uses skills in formal specification, understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc. Other projects within this domain possible, according to interest and inspiration.

### Prerequisites:

Concurrency, Concurrent Programming, and Computer Security are all possibly an advantage.

Technology-layer social networks Michael Goldsmith C (Joint with Sadie Creese) Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata - which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc.
Classifying and Benchmarking Web Automation Tools Georg Gottlob B  C

Background: The work will be done in the context of the large ERC project DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology whose goal is to automate web data extraction in specific application domains such as real estate, restaurants, and so on.

Principal goal of the MSc or Honour School project:

This project consists of a case study on web automation tasks, in order to validate the ease of use of existing web automation tools such as OXPath and ChickenFoot.
In particular, given some (complex) real tasks on the web, we want to study how straightforward it would be to solve them using such tools.
This proposal could lead to a survey of existing web automation tools, as well as to improvements in the usability of OXPath.

Skills Needed: This project requires good analytic and theoretical skills. Also, knowledge of XPath and Java is preferred.

Supervision: This project will be co-supervised by Dr. Tim Furche.

Domain-Based Structural Web Page Analysis and Data Extraction Georg Gottlob B  C

Background: There are several projects available. The work will be done in the context of the large ERC project DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology, whose goal is to automate web data extraction in specific application domains such as real estate, restaurants, and so on. Ultimately, we want to construct a system that is able to automatically navigate web pages within a given application domain and extract relevant data from those pages. For example, a system for the real-estate domain should accept as input a URL of an estate agent and output all properties (houses or flats) currently advertised for sale by this agent. The output should be a highly structured XML file obeying a certain pre-defined schema.
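For the real-estate example above, such an output file might look like the following (the element names here are purely illustrative; the actual DIADEM schema is pre-defined and may differ):

```xml
<properties agent="http://www.example-estate-agent.co.uk">
  <property id="p1">
    <type>house</type>
    <price currency="GBP">250000</price>
    <bedrooms>3</bedrooms>
    <location>Oxford</location>
  </property>
</properties>
```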

Principal goal of the MSc or Honour School project: Contribute to the DIADEM project by providing useful studies and building blocks. Examples of such studies or building blocks are:

• A characterization of typical patterns on web pages of a certain application domain.
• A characterisation of access patterns for search masks and navigation sequences on web pages belonging to specific domains.
• Software for the automatic construction of dictionaries of domain-specific terms extracted from Web pages.
• Software for identifying compound data objects on domain-specific web pages.
• Implementation of building blocks of a reasoner over web data objects.
• Selection and integration of existing annotators for Web page analysis.
• Definition of new annotators.
• Implementation of a scheduler for highly parallel web data extraction through cloud computing.
• Testing components of the DIADEM system.

Note: The DIADEM project runs from 2010 to 2015. Every year only a few MSc or Honour School projects will be available. Some will be co-supervised by DIADEM research staff.

Skills Needed: Each of these projects requires good analytic and theoretical skills, software engineering skills, knowledge of web programming and web data processing, good knowledge of Java and Eclipse.

Machine learning with Hadoop Georg Gottlob, Andy Twigg B  C

This project is about implementing a machine learning algorithm called random forests on top of Hadoop, using the Apache Mahout open-source project. Mahout (http://mahout.apache.org) is a popular collection of open-source machine learning algorithms that are designed to be scalable and run on top of Hadoop (http://hadoop.apache.org), the popular open-source implementation of Google's map-reduce framework.

Random forests are a particularly effective and simple machine learning algorithm (see http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm and http://oz.berkeley.edu/users/breiman/randomforest2001.pdf for descriptions of the algorithm). At present, Mahout only has a simple implementation of random forests that does not scale. This project is about implementing a scalable version of random forests using Mahout and Hadoop. If the project is successful, your work will be used by many others!
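To give a flavour of the algorithm, here is a purely illustrative single-machine sketch (not the Mahout/Hadoop implementation): each tree is trained on a bootstrap sample with a random feature and threshold, and the forest predicts by majority vote. For brevity the trees are depth-one "stumps"; the point of the project is to distribute exactly this kind of per-tree training across map-reduce workers.

```python
import random
from collections import Counter

def train_stump(data, rng):
    """Fit a depth-one tree: pick a random feature and threshold,
    and predict the majority label on each side of the split."""
    feature = rng.randrange(len(data[0][0]))
    threshold = rng.choice([x[feature] for x, _ in data])
    def majority(ys):
        return Counter(ys).most_common(1)[0][0] if ys else 0
    left = majority([y for x, y in data if x[feature] <= threshold])
    right = majority([y for x, y in data if x[feature] > threshold])
    return feature, threshold, left, right

def train_forest(data, n_trees=51, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]   # bootstrap sample
        forest.append(train_stump(sample, rng))
    return forest

def predict(forest, x):
    """Majority vote over the trees' individual predictions."""
    votes = Counter(left if x[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# toy one-feature data: the label is 1 iff the feature exceeds 0.5
data = [((i / 10.0,), int(i > 5)) for i in range(12)]
forest = train_forest(data)
```

In the distributed version, each mapper would train a subset of the trees on its partition of the data, and the reducer would simply collect the forest.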

See http://comments.gmane.org/gmane.comp.apache.mahout.devel/23326 for an initial discussion of this on the mailing list.

Prerequisites: Java, basic algorithms knowledge

Ideally: some understanding of distributed computing (map-reduce, Hadoop), some interest in machine learning (everything needed can be learned)

Web Form Probing Georg Gottlob B  C

Background: The work will be done in the context of the large ERC project DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology whose goal is to automate web data extraction in specific application domains such as real estate, restaurants, and so on.

Principal goal of the MSc or Honour School project:

A relevant task in a Deep Web data extraction system is so-called form probing. Roughly, given a web form where each field is annotated with some concept, probing the form involves tasks such as:
(i) validating its annotations by submitting the form and checking the response page, (ii) forming meaningful queries given a database of values to submit, and (iii) devising smart approaches to reduce the number of possible queries (e.g. in the case of fields that depend on each other, like Min and Max price).
This proposal aims at designing probing heuristics and developing such techniques in an extensible framework (plug-in).
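As a tiny illustration of point (iii) — with hypothetical field names, not the DIADEM API — a dependency between Min and Max price fields already cuts down the space of probe queries:

```python
from itertools import product

def probe_queries(fields, dependencies):
    """Generate candidate form submissions, skipping combinations
    that violate any dependency between fields."""
    names = list(fields)
    for values in product(*(fields[n] for n in names)):
        query = dict(zip(names, values))
        if all(ok(query) for ok in dependencies):
            yield query

# hypothetical price-range form: only probe with min <= max
fields = {"min_price": [0, 100000, 200000],
          "max_price": [100000, 200000, 300000]}
dependencies = [lambda q: q["min_price"] <= q["max_price"]]
queries = list(probe_queries(fields, dependencies))
```

Of 9 naive combinations, one (min 200000, max 100000) is never submitted; on real forms with many dependent fields the savings compound multiplicatively.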

Skills Needed: This project requires good theoretical, analytic and software engineering skills. Knowledge of Java and web-technologies is also required.

Supervision: This project will be co-supervised by Dr. Tim Furche.

Bioinformatics Projects Jotun Hein B  C

Bioinformatics Projects can be found here.

If you choose any of these projects you must find a joint supervisor from the Department of Computer Science. Pete Jeavons can advise on a suitable choice.

Compilation of a CSP-like language Geraint Jones B  C

This is a compiler project, also requiring familiarity with concurrency.

The parallel programming language occam is essentially an implementable sublanguage of CSP. The aim of this project is to produce a small portable implementation of a subset of occam; the proposed technique is to implement a virtual machine based on the inmos transputer, and a compiler which targets that language.

One of the aims of the project is to implement an extension to occam which permits recursion; more ambitious projects might implement a distributed implementation with several communicating copies of the virtual machine. Other possibilities are to produce separate virtual machines, optimised for displaying a simulation, or for efficiency of implementation, or translating the virtual machine code into native code for a real machine.
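The proposed architecture can be sketched as follows — a toy evaluation-stack interpreter in the spirit of, but much simpler than, the transputer's three-register stack (the bytecode and channel model here are illustrative, not the actual transputer instruction set):

```python
def run(program, channels=None):
    """Interpret a tiny bytecode of (op, arg) pairs on an evaluation
    stack. 'out' models occam's channel output (c ! value) by
    appending the popped value to a named channel."""
    channels = channels if channels is not None else {}
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "out":                  # channel output
            channels.setdefault(arg, []).append(stack.pop())
        elif op == "jump":
            pc = arg
        elif op == "stop":
            break
    return channels

# the occam fragment  c ! 2 + 3  compiled by hand:
chans = run([("push", 2), ("push", 3), ("add", None),
             ("out", "c"), ("stop", None)])
```

A distributed implementation would run several such interpreters, with channel output blocking until a matching input is offered by another copy of the virtual machine.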

Logic circuit workbench Geraint Jones B  C

This is an interactive programming project with some logic simulation behind it.

The idea is to produce a simulator for a traditional logic breadboard. The user should be able to construct a logic circuit by drawing components from a library of standard gates and latches and so on, and connecting them together in allowable ways. It should then be possible to simulate the behaviour of the circuit, to control its inputs in various ways, and to display its outputs and the values of internal signals in various ways.

The simulator should be able to enforce design rules (such as those about not connecting standard outputs together, or limiting fan-out) but should also cope with partially completed circuits; it might be able to implement circuits described in terms of replicated sub-circuits; it should also be able to export circuits in some sort of standard netlist format.
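A minimal sketch of the simulation core (assumed representation: a combinational netlist mapping each signal to a gate and its input signals), including one design-rule check of the kind described above:

```python
GATES = {
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "not": lambda a: 1 - a,
}

def simulate(netlist, inputs, max_fanout=4):
    """Evaluate a combinational netlist {signal: (gate, input_signals)}
    given values for the primary inputs."""
    # design-rule check: no signal may drive too many gate inputs
    for sig in list(netlist) + list(inputs):
        fanout = sum(ins.count(sig) for _, ins in netlist.values())
        if fanout > max_fanout:
            raise ValueError("fan-out limit exceeded on " + sig)
    values = dict(inputs)
    def value(sig):                 # memoised recursive evaluation
        if sig not in values:
            gate, ins = netlist[sig]
            values[sig] = GATES[gate](*(value(i) for i in ins))
        return values[sig]
    return {sig: value(sig) for sig in netlist}

# half adder: sum = a xor b, carry = a and b
netlist = {"sum": ("xor", ["a", "b"]), "carry": ("and", ["a", "b"])}
out = simulate(netlist, {"a": 1, "b": 1})
```

The real project would add sequential elements (latches), event-driven updates, and the interactive breadboard front end on top of a core like this.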

It might be that this would make a project for two undergraduates: one implementing the interface, the other implementing a simulator that runs the logic.

Modelling of arithmetic circuits Geraint Jones B  C

This is a project in the specification of hardware, which I expect to make use of functional programming.

There is a great deal of knowledge about ways (that are good by various measures) of implementing standard arithmetic operations in hardware. However, most presentations of these circuits are at a very low level, involving examples, diagrams, and many subscripts, for example these descriptions.

The aim of this project is to describe circuits like this in a higher-level way by using the higher order functions of functional programming to represent the structure of the circuit. It should certainly be possible to execute instances of the descriptions as simulations of circuits (by plugging in simulations of the component gates), but the same descriptions might well be used to generate circuit netlists for particular instances of the circuit, and even to produce the diagrams.
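The flavour of such a description can be sketched even outside a dedicated hardware description language: here `ripple` is a higher-order combinator that lifts any one-bit adder cell to an n-bit ripple-carry adder (an illustration of the style, not a circuit from the referenced descriptions):

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple(cell):
    """Lift a one-bit cell to an n-bit ripple-carry circuit.
    Bit lists are least-significant-bit first."""
    def circuit(xs, ys, carry=0):
        out = []
        for a, b in zip(xs, ys):
            s, carry = cell(a, b, carry)
            out.append(s)
        return out, carry
    return circuit

adder4 = ripple(full_adder)
bits, carry = adder4([1, 0, 1, 0], [1, 1, 0, 0])   # 5 + 3 = 8
```

The same `ripple` combinator, applied to a cell that emits gate descriptions instead of computing values, would produce a netlist or a diagram for the identical circuit structure — which is exactly the reuse the project is after.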

Mosaic player Geraint Jones B

This is an interactive graphics programming project.

This is an idea based on a child's toy, consisting of a collection of acrylic pieces. There are several colours, and two shapes: isosceles right-angled triangles, and rhombuses with 45 degree and 135 degree angles and sides to match the equal sides of the triangles. The player arranges (all) the pieces in one of the many ways that produce a square of the right size, with a pattern that has several (reflectional or rotational) symmetries.

The project is to design and build a small program that can be used to simulate the play: perhaps the user specifies symmetries, and then places a single piece which is replicated in several places as required by the symmetry.

There is a small subtlety in that the target square has a side which is an irrational number of the unit length (edge of the rhombus), so there is no integer grid on which the pieces lie. (It is very easy to make a mistake with the real toy which almost produces a square of the wrong size.)
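One way to handle this exactly: with the rhombus edge as unit, every relevant length has the form a + b√2 with rational a, b, so the program can keep coordinates as exact pairs rather than floats and never suffer the "almost a square" rounding problem (a sketch under that assumption):

```python
from fractions import Fraction as F

class Q2:
    """Exact numbers a + b*sqrt(2) with rational a, b, closed under
    + and * — enough to place every vertex of the mosaic exactly."""
    def __init__(self, a, b=0):
        self.a, self.b = F(a), F(b)
    def __add__(self, other):
        return Q2(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # (a + b sqrt2)(c + d sqrt2) = ac + 2bd + (ad + bc) sqrt2
        return Q2(self.a * other.a + 2 * self.b * other.b,
                  self.a * other.b + self.b * other.a)
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"

root2 = Q2(0, 1)
side = Q2(1, 1)        # e.g. a length of one rhombus edge plus one hypotenuse
area = side * side     # (1 + sqrt(2))^2 = 3 + 2*sqrt(2), exactly
```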

Toys for Animating Mathematics Geraint Jones B  C

The aim is to take some mathematics that would be within the grasp of a mathematically-inclined sixth-former and turn it into some attention-grabbing web pages. Ideally this reveals a connection with computing science. I imagine that this project necessarily involves some sort of animation, and I have visions of Open University television maths lectures.

The programming need not be the most important part of this project, though, because some of the work is in choosing a topic and designing problems and puzzles and the like around it. There's a lot of this sort of thing about, though, so it would be necessary to be a bit original.

Think of GeomLab (and then perhaps think of something a little less ambitious). It might involve logic and proof, it might be about sequences and series, it might be about graphs, it might be about the mathematics of cryptography... you might have something else in mind.

Comp Bio Projects David Kay C

Dr David Kay will be offering the following project titles:

• Fractional diffusion models for the simulation of electrical propagation in the heart
• Creating patient-specific models of the lung to investigate chronic obstructive pulmonary disease
• Modelling air flow in lungs
• Modelling gas exchange within the lungs, a porous media model
• Flow of particles within the bloodstream
• Modelling Airway flow in lungs
• Developing an integrated multi-scale model of pulmonary blood vessels coupled to fluid flow
• Pathophysiology of pulmonary function of the anaesthetised horse
• Computational modelling of coupled electrophysiology, tissue mechanics and fluid flow in the heart

The project description for each can be found here.

Solving simple steganography games Andrew Ker C

Steganography means hiding a secret payload within an apparently-innocent cover, usually an item of digital media. Steganalysis is the art of detecting that such hiding took place. We now understand that these aims form a "game", in the sense that the optimal strategy for the steganographer depends on the behaviour of the steganalyst, and vice versa. This project aims to describe and solve very simple steganography games, establishing some "toy" baseline results. This theoretical and/or numerical work is not set out in any literature (yet) and will require a lot of initiative from the student.
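For the very simplest case — a 2×2 zero-sum game, roughly the shape of a one-bit hide-versus-detect interaction — the equilibrium can even be written in closed form. This is a generic game-theory warm-up in the spirit of the project, not a result from the steganography literature:

```python
def solve_2x2(m):
    """Value and row player's mixed strategy for a 2x2 zero-sum game,
    where m[i][j] is the payoff to the row player."""
    (a, b), (c, d) = m
    # pure saddle point exists iff maximin equals minimax
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return maximin, None          # a pure equilibrium exists
    # otherwise both players mix; the standard closed form
    denom = a - b - c + d
    p = (d - c) / denom               # probability of playing row 0
    value = (a * d - b * c) / denom
    return value, p

# matching pennies: the hider picks location 0 or 1, the detector guesses
value, p = solve_2x2([[1, -1], [-1, 1]])
```

As expected for matching pennies, the value is 0 and the hider should randomise uniformly; the project's games would replace these payoffs with detection probabilities and embedding capacities.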

Prerequisites: Linear algebra; understanding of Nash equilibria an advantage

Transparent session-layer steganography Andrew Ker B  C

Steganography means hiding a secret payload within an apparently-innocent cover, usually an item of digital media. This project is to implement a transparent proxy which uses something like a webcam video stream to hide steganographic packets. A local process should receive communication on a local socket and merge it with the webcam data stream, a process which can be reversed at the receiver.

### Prerequisites:

It is necessary to understand something of video formats, for example experience with working on the H.264/5 codec, in order to place the payload in a webcam stream. Note that it is NOT possible to learn a video codec within the time available.

Visualization of steganalytic features Andrew Ker B  C

Steganography means hiding a secret payload within an apparently-innocent cover, usually an item of digital media (in this project: images). Steganalysis is the art of detecting that hiding took place. The most effective ways to detect steganography are machine learning algorithms applied to "features" extracted from the images, trained on massive sets of known cover and stego objects. The images are thus turned into points in high-dimensional space. We have little intuition as to the geometrical structure of the features (do images form a homogeneous cluster? do they scale naturally with image size?), or how they are altered under embedding (do they move in broadly the same direction? is there a linear separator of cover from stego?). This project involves the implementation of visualization tools for features, extracting them from images and then projecting them onto 2-dimensional space in interesting ways, while illustrating the effects of embedding. Visual style and clarity will be important.
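One simple starting point for such a projection is a random projection onto a plane (PCA or learned projections would be natural next steps); this sketch assumes features are plain lists of numbers:

```python
import random

def project_2d(features, seed=0):
    """Project high-dimensional feature vectors onto a plane spanned
    by two random Gaussian directions. Even random directions often
    reveal coarse cluster structure."""
    dim = len(features[0])
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(dim)]
    v = [rng.gauss(0, 1) for _ in range(dim)]
    dot = lambda w, x: sum(wi * xi for wi, xi in zip(w, x))
    return [(dot(u, x), dot(v, x)) for x in features]

# two toy "clusters" of 100-dimensional features
cover = [[0.0] * 100 for _ in range(5)]
stego = [[1.0] * 100 for _ in range(5)]
points = project_2d(cover + stego)
```

Plotting `points` with cover and stego in different colours would then directly illustrate whether embedding moves features "in broadly the same direction" in this projection.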

### Prerequisites:

Computer graphics and linear algebra

mp3 steganography Andrew Ker B  C

Steganography means hiding a secret payload within an apparently-innocent cover, usually an item of digital media. There is plenty of literature on hiding, and on the detection of hiding, in images, but rather little in audio. Yet mp3 audio is widely transmitted on file-sharing networks and thus provides an excellent cover. One slightly old-fashioned example of mp3 steganography can be found at http://www.petitcolas.net/fabien/steganography/mp3stego/. This project is to explore and develop mp3 steganography, adapting modern image hiding techniques to the audio format and implementing hiding and extraction programs.

### Prerequisites:

No course prerequisites, but this project is only suitable for students who are already familiar with the details of mp3 format and encoding, for example those who have worked on open-source audio codecs. It is NOT possible to learn the format within the time available.

Projects being offered by D Kroening Daniel Kroening B  C

Daniel Kroening is happy to supervise projects related to

• automated hardware and software verification
• program analysis
• SAT and decision procedures
• applications of SAT or SMT in other areas, e.g., systems biology or finance

For a list of sample projects, see here.

Algorithms for computing pseudometrics on probabilistic models Taolue Chen, Marta Kwiatkowska C

Pseudometrics on labelled Markov chains or labelled Markov decision processes have been introduced in the literature. They generalise the notion of probabilistic bisimilarity by assigning a similarity distance to pairs of states of the model under consideration. The smaller the distance, the more alike the states are, with states having distance zero if and only if they are probabilistically bisimilar. The pseudometrics have found applications in, e.g., systems biology, planning, and security. Theoretical algorithms based on linear programming have been proposed to compute such a distance; however, they perform poorly in practice. This project aims to develop more practical algorithms, based on value iteration, and to incorporate the notion of “on-the-fly” analysis. It is expected that the algorithms will be implemented as an extension of PRISM and, ideally, also applied to some real case studies.
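To make the value iteration concrete, here is a hedged sketch for labelled Markov chains: repeatedly set d(s,t) = 1 when the labels differ, and otherwise to the Kantorovich lifting of the current d applied to the successor distributions. The lifting is solved in closed form below, which works because every state in the toy chain has at most two successors; a real implementation would solve a small transportation LP (or, as in the project, something faster):

```python
def kantorovich(mu, nu, dist):
    """Kantorovich distance between two distributions whose supports
    have at most two states each (closed form; no LP needed)."""
    xs, ys = sorted(mu), sorted(nu)
    if len(xs) > 2 or len(ys) > 2:
        raise NotImplementedError("larger supports need a transportation LP")
    if len(xs) == 1:
        return sum(nu[y] * dist(xs[0], y) for y in ys)
    if len(ys) == 1:
        return sum(mu[x] * dist(x, ys[0]) for x in xs)
    # both supports have two states: the couplings form a line segment
    # parametrised by t, the mass shipped from xs[0] to ys[0]
    p, q = mu[xs[0]], nu[ys[0]]
    def cost(t):
        plan = {(xs[0], ys[0]): t, (xs[0], ys[1]): p - t,
                (xs[1], ys[0]): q - t, (xs[1], ys[1]): 1 - p - q + t}
        return sum(w * dist(u, v) for (u, v), w in plan.items())
    return min(cost(max(0.0, p + q - 1.0)), cost(min(p, q)))

def bisim_pseudometric(labels, P, iters=50):
    """Value iteration for the bisimilarity pseudometric on a
    labelled Markov chain with transition distributions P."""
    states = list(labels)
    d = {(s, t): 0.0 for s in states for t in states}
    for _ in range(iters):
        old = d
        d = {(s, t): 1.0 if labels[s] != labels[t]
             else kantorovich(P[s], P[t], lambda u, v: old[(u, v)])
             for s in states for t in states}
    return d

labels = {0: "a", 1: "a", 2: "b", 3: "c"}
P = {0: {2: 0.5, 3: 0.5}, 1: {2: 0.7, 3: 0.3}, 2: {2: 1.0}, 3: {3: 1.0}}
d = bisim_pseudometric(labels, P)
```

States 0 and 1 differ only in how they weight two distinguishable successors, and the iteration converges to d(0,1) = 0.2, the excess mass that must be moved between the successor distributions.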

Prerequisites: Some familiarity with basic probability theory and probabilistic models

Probabilistic Modelling and Verification Marta Kwiatkowska

Professor Marta Kwiatkowska is happy to supervise projects in the area of probabilistic modelling and verification, particularly those relating to the PRISM model checker. PRISM is an open source formal verification tool for analysis of probabilistic systems. PRISM has an extensive website which includes software for download, tutorial, manual, publications and many case studies.

Below are some concrete proposals for modelling and verification case studies in PRISM:

• Modelling and analysis of user-centric network protocols. User-centric networks are designed so that users cooperate to share network resources such as bandwidth between them, and are encouraged to behave unselfishly through incentives. Recently we have analysed a trust model for user-centric networks using PRISM-games, http://www.prismmodelchecker.org/games/, an extension of the PRISM model checker which supports stochastic multi-player games as models and objectives expressed in probabilistic alternating-time temporal logic. See http://www.prismmodelchecker.org/bibitem.php?key=KPS13 for the outcome of the analysis, in which a weakness was discovered and corrected in a protocol, and http://www.prismmodelchecker.org/bibitem.php?key=CFK+13 for the PRISM-games tool paper. This project aims to extend the analysis with more complex objectives such as limit averages or discounted sums.
• Probabilistic model checking for cellular microcolonies. The gro language enables the programming, modelling, specifying and visual simulation of the behaviour of cells in growing microcolonies of microorganisms (http://depts.washington.edu/soslab/gro/index.html). gro has been used for teaching synthetic biology. Microcolonies can be described in a simple guarded command language and develop through, e.g., cell division and cell growth. This project aims to extend the gro framework with probabilistic model checking by translating the language to the PRISM modelling language and extending the functionality to allow for model checking queries specified in probabilistic temporal logic.
• Modelling and analysis of DNA biosensors. DNA molecules can be used to perform complex logical computation. DNA computation differs from conventional digital computation and is sometimes referred to as ‘computing with soup’ (http://www.economist.com/node/21548488). Correct design of DNA devices is error-prone, and the task is supported by tools such as the DNA Strand Displacement (DSD) programming language and simulator (http://research.microsoft.com/en-us/projects/dna/default.aspx) developed at Microsoft. The DSD tool enables the probabilistic model checking of DNA circuits and has been used to identify a flaw in a DNA transducer gate (see http://qav.comlab.ox.ac.uk/bibitem.php?key=LPC+12). The aim of this project is to model and analyse DNA biosensors (see DNA biosensors that reason, Biosystems 109(2):91-104, 2012, http://www.sciencedirect.com/science/article/pii/S0303264712000470).
• Modelling and analysis of energy usage for RFID protocols. Designers of devices based on RFID (Radio Frequency Identification protocol) can benefit from formal verification techniques, and particularly quantitative verification with which a number of cost measures can be analysed.  The PRISM probabilistic model checker has been successfully used to analyse the low-cost RFID protocol proposed by Song and Mitchell (Ioannis Paparrizos, Stylianos Basagiannis and Sophia Petridou. Quantitative Analysis for Authentication of Low-cost RFID Tags. In Proc. 36th IEEE Conference on Local Computer Networks (LCN'11), IEEE. 2011. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6115307). This project aims to build upon the modelling approach developed in that paper to analyse energy usage of RFID-based devices.
• Development of a GUI for a stochastic hybrid model of a human heart (with Alex Mereacre). Cardiac pacemakers contain embedded software that monitors the electrical signal of the heart and stimulates it if the heartbeat is too slow. Quantitative analysis can be employed to establish, e.g., whether the pacemaker can correct the slow heartbeat. In order to check the correctness of pacemakers, appropriate models of the heart have to be developed, see e.g. http://qav.comlab.ox.ac.uk/bibitem.php?key=CDKM12b. This project aims to support the model-based design of pacemaker software by developing a graphical user interface (GUI) in Matlab for a stochastic hybrid model of the human heart (see http://qav.comlab.ox.ac.uk/bibitem.php?key=CDKM13 for more detail of the model). The purpose of the GUI is to visualize the conduction of electricity through the human heart. The software will enable the generation of various heart diseases by setting appropriate parameters. In addition, the GUI will depict the configurations of an artificial pacemaker used to correct the heart behaviour.
• Algorithms for computing pseudometrics on probabilistic models (with Taolue Chen). Pseudometrics on labelled Markov chains or labelled Markov decision processes have been introduced in the literature. They generalise the notion of probabilistic bisimilarity by assigning a similarity distance to pairs of states of the model under consideration. The smaller the distance, the more alike the states are, with states having distance zero if and only if they are probabilistically bisimilar. The pseudometrics have found applications in, e.g., systems biology, planning, and security. Theoretical algorithms based on linear programming have been proposed to compute such a distance; however, they perform poorly in practice. This project aims to develop more practical algorithms, based on value iteration, and to incorporate the notion of “on-the-fly” analysis. It is expected that the algorithms will be implemented as an extension of PRISM (www.prismmodelchecker.org) and, ideally, also applied to some real case studies.
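For readers unfamiliar with PRISM, its models are written in a guarded-command notation; as a flavour (a standard textbook-style toy, unrelated to the case studies above), a discrete-time Markov chain for a coin that is re-flipped until it shows heads:

```
dtmc

module coin
  s : [0..2] init 0;  // 0: flip, 1: heads (done), 2: tails (re-flip)
  [] s=0 -> 0.5 : (s'=1) + 0.5 : (s'=2);
  [] s=2 -> (s'=0);
  [] s=1 -> (s'=1);
endmodule

// a typical PCTL query: the probability of eventually seeing heads
// P=? [ F s=1 ]
```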

Secondly, there are various opportunities to enhance or improve the functionality of the PRISM model checker itself. Examples include:

• development of a framework for using external input languages in PRISM, e.g. SBML, process algebras
• implementation of a new property editor for PRISM, including a translator from English specifications to temporal logic
• development of a graphical modelling language for PRISM

Analysis of Security Protocols Gavin Lowe B  C

A security protocol consists of an exchange of messages between two or more agents, with goals such as establishing a cryptographic key, or authenticating the identities of the agents. These protocols are designed to operate in particularly hostile environments, where an adversary or intruder may be trying to attack the protocol, for example to learn the value of a key. Designing secure protocols has proven to be remarkably difficult; in some cases, attacks have been discovered several years after the protocol was first suggested. We have developed a systematic technique for analysing these protocols. Briefly the technique is as follows:

• The protocol is described formally in an abstract language.
• A compiler called Casper is used to compile the protocol description into a description in the process algebra CSP.
• The CSP description is fed to the model checker FDR, which tests whether the protocol correctly achieves its goals. In the case where the system does not meet a goal, FDR returns a sequence of messages that demonstrates how the goal fails, which is equivalent to the sequence of steps the intruder must take in order to attack the protocol.

A recent extension of these ideas has been to the analysis of layered protocols, where a special-purpose, application-specific protocol is layered on top of a general-purpose secure transport layer protocol, such as SSL/TLS. For such layered architectures, the CSP model created by Casper abstracts away from the details of the secure transport protocol, and just models the security services it provides to the application protocol [3]. The goal of this project would be to apply this technique to study one or more layered protocols. An alternative would be to apply these techniques to so-called human-mediated protocols [4], where a human is responsible for transferring or verifying some messages in the protocol; we believe that similar analysis techniques can be applied to such protocols.

### Prerequisites:

The Concurrency and Computer Security courses would be prerequisites for this project.

### References

• [1] Gavin Lowe, Casper: A Compiler for the Analysis of Security Protocols, Journal of Computer Security, 1998.
• [2] Peter Ryan, Steve Schneider, Michael Goldsmith, Gavin Lowe, Bill Roscoe, Modelling and Analysis of Security Protocols, Addison Wesley, 2001.
• [3] Christopher Dilloway, Gavin Lowe, Specifying Secure Transport Layers. In 21st IEEE Computer Security Foundations Symposium (CSF 21), 2008. http://web.comlab.ox.ac.uk//files/116/channels.pdf
• [4] Sadie Creese, Michael Goldsmith, Richard Harrison, Bill Roscoe, Paul Whittaker, and Irfan Zakiuddin, Exploiting Empirical Engagement in Authentication Protocol Design. In Proceedings of SPPC 2005. http://web.comlab.ox.ac.uk/people/Bill.Roscoe/publications/103.pdf
Compiling CSO to CSP Gavin Lowe B  C

Not available in 2012/13

The aim of this project would be to build a compiler (or compiler plug-in) to translate Scala code written using the CSO library into corresponding CSP models. It would also be useful to develop techniques for automating analysis of various desirable properties, such as freedom from race conditions, and to develop techniques for abstracting some aspects of the CSP models so as to make model checking feasible.

### Prerequisites:

The Concurrency, Concurrent Programming and Compilers courses would be prerequisites.

Secure Channels in CSO Gavin Lowe B

CSO [1] is a concurrency and communication library for the Scala programming language. The aim of this project would be to extend CSO with channels that allow secure communication between processes. The channels should implement some of the specifications in [2].

### Prerequisites:

The Concurrent Programming and Computer Security courses would be prerequisites.

### References:

Inconsistency-Tolerant Query Answering in the Semantic Web Thomas Lukasiewicz B  C

### Background:

The next revolution in Web search, one of the key technologies of the Web, has just started with the incorporation of ideas from the Semantic Web, aiming to transform current Web search into some form of semantic search and query answering. Next-generation technologies for semantic search and query answering on the Web can be realized via knowledge bases extracted from the Web relative to an underlying ontology. The realization of such knowledge bases requires a principled and sensible way of managing conflicting information. Though inconsistency has been addressed for many years now in artificial intelligence and database theory, the search for flexible and computationally efficient inconsistency-tolerant semantics is a very active research topic.

### Potential Projects:

● Characterization of inconsistency and repairing for different fragments of the Datalog+/- family of ontology languages.

● Study of desirable properties of repairs and development of adequate preference relations among repairs.

● Development of alternative inconsistency-tolerant semantics for query answering in different fragments of Datalog+/-: analysis of the trade-off between quality of answers and computation cost.

● Argumentation is one of the most well-known and capable systems for inconsistency management. This project would study argumentation aspects and frameworks for different fragments of the Datalog+/- family of ontology languages as part of the above points.

● Implementation of repairing and inconsistency-tolerant query answering techniques on fragments and extensions of Datalog+/-.

### Prerequisites:

Participation in all aspects of the project requires good analytical skills and a background in knowledge representation and reasoning (in particular, first-order logic and logic programming). For the theoretical aspects, a background in computational complexity and database theory would be ideal. The practical (implementation-oriented) parts require software engineering skills, knowledge of Web programming to develop Web interfaces, as well as good knowledge of Java and Eclipse.

### Supervision:

The projects based on implementation will be supervised by Maria Vanina Martinez, while the ones based on theoretical aspects will be co-supervised by Thomas Lukasiewicz and Maria Vanina Martinez.

Ranking and Top-k Query Answering under Uncertainty in the Semantic Web Thomas Lukasiewicz B  C

### Background:

The next revolution in Web search, one of the key technologies of the Web, has just started with the incorporation of ideas from the Semantic Web, aiming to transform current Web search into some form of semantic search and query answering. Next-generation technologies for semantic search and query answering on the Web can be realized via knowledge bases extracted from the Web relative to an underlying ontology. Such knowledge bases require probabilistic data models, along with scalable query answering algorithms, both of which are still major unsolved problems. They may be based on integrating (i) ontology languages, (ii) database technologies, and (iii) formalisms for managing probabilistic uncertainty.

### Potential Projects:

● Characterization of ranking for non-probabilistic ontologies via explicitly defined user preferences. Key questions involve how to represent the user preferences, how to automatically obtain them (e.g., from the Web), and how to use them for ranking answers to ontological queries over knowledge bases.

● Extension of the above ranking techniques to probabilistic ontologies, i.e., combining the relevance of an answer with the probability that it is in fact true.

● Characterization of top-k query answering based on user preferences, and extension to probabilistic ontologies.

● Implementation of ranking and top-k querying on fragments of probabilistic Datalog+/-.

● Adaptation of sampling techniques for efficient query answering in probabilistic ontologies.

● Experimental evaluation of implementations over both synthetic and real-world data.

### Prerequisites:

Participation in all aspects of the project requires good analytical skills and a background in knowledge representation and reasoning (in particular, first-order logic and logic programming). For the theoretical aspects, a background in computational complexity and database theory would be ideal. The practical (implementation-oriented) parts require software engineering skills, knowledge of Web programming to develop Web interfaces, as well as good knowledge of Java and Eclipse.

### Supervision:

The projects based on implementation will be supervised by Gerardo Simari, while the ones based on theoretical aspects will be co-supervised by Thomas Lukasiewicz and Gerardo Simari.

Design Driven Acquisition of Bio-experiments Eamonn Maguire B  C

The Sansone Group at the Oxford e-Research Centre offers several student projects throughout the year, focused on developing and applying new methods and tools for improving management, curation and sharing of data in life science.

Supervisors: Susanna-Assunta Sansone; Philippe Rocca-Serra and Eamonn Maguire.

Graph Based Bio-experiments Store Eamonn Maguire B  C

The Sansone Group at the Oxford e-Research Centre offers several student projects throughout the year, focused on developing and applying new methods and tools for improving management, curation and sharing of data in life science.

Supervisors: Susanna-Assunta Sansone; Philippe Rocca-Serra and Eamonn Maguire.

Ontology Based Validation of Bio-experimental Records Eamonn Maguire B  C

(Joint Supervisor with Ian Horrocks)

The Sansone Group at the Oxford e-Research Centre offers several student projects throughout the year, focused on developing and applying new methods and tools for improving management, curation and sharing of data in life science.

Supervisors: Susanna-Assunta Sansone; Philippe Rocca-Serra and Eamonn Maguire.

Algorithm Development/Applications in Structure Based Biocomputing Peter Minary B  C

The increasing amount of experimental data for key macromolecules provides broad opportunities for the computational modeling and functional interpretation of their structures. Several projects are available on developing algorithms to increase the resolution of experimental structures, and on modelling to aid their functional interpretation. In recent years we have already developed several key algorithms in this area (pubs. 17-19: http://www.cs.ox.ac.uk/people/peter.minary/csb/publications.html). Further development is needed to extend the application domain of these algorithms to more experimental data types and to larger and more challenging molecular complexes. A project may include developing basic algorithmic theory, software implementation, or applications in collaboration with experimental efforts.

Prerequisites: excellent programming and statistical skills are desirable but the project can be tailored to suit those from a variety of backgrounds.

Developing New Protocols & Applications in Computational Epigenetics Peter Minary B  C

Epigenetics is the study of heritable changes in gene expression caused by mechanisms other than changes in the underlying DNA sequence. Epigenetic marks include DNA modifications, and have been associated with cancer and a number of cardiovascular disease risk factors. We aim to develop novel computational procedures to predict how DNA structure and higher-level DNA packing in chromatin (http://www.nature.com/scitable/topicpage/dna-packaging-nucleosomes-and-chromatin-310) depend on sequence and on epigenetic effects such as DNA methylation. Given that nucleoside analogue epigenetic drugs, which simply replace certain nucleotides in the DNA sequence, can be viewed as synthetic epigenetic marks, applications may include the design of new epigenetic drugs.

Prerequisites: the project can be tailored to suit those from a variety of backgrounds however experience of working with large-scale data sets is desirable.

Software Development for Structure Based Biocomputing Peter Minary B  C

Several projects are available to implement new algorithms/protocols into the MOSAICS (http://www.cs.ox.ac.uk/people/peter.minary/csb/MOSAICS.html) package. In particular, we have projects for the design of efficient Monte Carlo schemes, which are applicable to systems with high dimensionality. In addition, projects may involve collaboration with the Parallel Programming Laboratory PPL: http://charm.cs.uiuc.edu/ to optimize performance in parallel environments.

Prerequisites: experience in scientific computing and/or software development is desirable but the project can be tailored to suit those from a variety of backgrounds.

An Electronic Commerce Protocol Hanno Nickau B  C

Commercial use of the Internet is becoming more and more common, with an increasing variety of goods becoming available for purchase over the Net. Clearly, we want such purchases to be carried out securely: a customer wants to be sure of what (s)he's buying and the price (s)he's paying; the merchant wants to be sure of receiving payment; both sides want to end up with evidence of the transaction, in case the other side denies it took place; the act of purchase should not leak secrets, such as credit card details, to an eavesdropper.

The aim of this project is to find out more about the protocols that are used for electronic commerce, and to implement a simple e-commerce protocol. In more detail:

• Understand the requirements of e-commerce protocols;
• Specify an e-commerce protocol, in terms of both its functional and security requirements;
• Understand cryptographic techniques;
• Understand how these cryptographic techniques can be combined together to create a secure protocol - and understand the weaknesses that allow some protocols to be attacked;
• Design a protocol to meet the requirements identified;
• Implement the protocol.
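
As a toy illustration of one cryptographic building block mentioned above (far from a complete e-commerce protocol), a shared-key message authentication code lets the merchant check that an order was not altered in transit. The key and message formats here are invented for illustration.

```python
# Toy sketch: integrity-protecting an order with an HMAC.
# Real protocols combine many such pieces (encryption, signatures,
# nonces for freshness) to meet the requirements listed above.
import hmac
import hashlib

KEY = b'shared-secret-key'          # hypothetical pre-shared key

def tag(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(KEY, message, hashlib.sha256).digest()

# customer side: send the order together with its tag
order = b'item=book;price=10'
mac = tag(order)

# merchant side: recompute the tag and compare in constant time
ok = hmac.compare_digest(mac, tag(order))
tampered = hmac.compare_digest(mac, tag(b'item=book;price=1'))
```

Note that a shared-key MAC gives integrity and authentication between the two key holders, but not the non-repudiation evidence described above; that would require digital signatures.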

A variant of this project would be to implement a protocol for voting on the web (which would have a different set of security properties).

Prerequisites for this project include good program design and implementation skills, including some experience of object-oriented programming, and a willingness to learn about protocols and cryptography. The courses on concurrency and distributed systems provide useful background for this project.

• [1] Jonathan Knudsen, Java Cryptography, O'Reilly, 1998.

Data mining Dan Olteanu B  C

Please see Dr Dan Olteanu's web page (http://www.cs.ox.ac.uk/people/dan.olteanu.html) for further details

Factorised Databases Dan Olteanu B  C

More details can be found at (http://www.cs.ox.ac.uk/people/dan.olteanu.html) and Dr Olteanu would be happy to discuss specific projects within the aforementioned topics with interested students.

Probabilistic databases (MayBMS, SPROUT) Dan Olteanu B  C

Please see Dr Dan Olteanu's web page (http://www.cs.ox.ac.uk/people/dan.olteanu.html) for further details

Scalable and declarative machine learning Dan Olteanu B  C

More details can be found at (http://www.cs.ox.ac.uk/people/dan.olteanu.html) and Dr Olteanu would be happy to discuss specific projects within the aforementioned topics with interested students.

Uncertain Database Management Systems Dan Olteanu B  C

Today, uncertainty is commonplace in data management scenarios dealing with data integration, sensor readings, information extraction from unstructured sources, or whenever information is manually entered and is therefore prone to inaccuracy or partiality. In these scenarios, uncertainty arises, respectively, from the existence of alternatives for mapping schemas of different sources or for matching possible non-identical record duplicates, from different interpretations of sensor data, from multiple extraction possibilities from unstructured data, or from several possible readings of manually filled forms.

To accommodate uncertainty, current data management technology should pursue a paradigm shift from deterministic to possible-worlds semantics and address the basic data management problems in the new context. Projects on this topic should offer support for this paradigm shift by investigating some of the following directions:

• compact representation systems for large sets of possible worlds,
• techniques for processing and constraints on succinct representations of possible worlds,
• uncertainty-aware query languages beyond relational algebra.

All aforementioned directions can lead to both theoretical and practical (implementation-oriented) projects. Anyone interested in doing a project in one of these topics is encouraged to get in touch with Dan Olteanu to explore specific ideas, such as

• Query Evaluation: Tractability and Efficient Algorithms
• Approximate and incremental view maintenance in probabilistic databases
• Synthesising query mappings for input and output probabilistic data
• View materialization for query optimization in probabilistic databases
• Modelling and processing streams of uncertain sensor data
• Algebraic optimizations for the MayBMS query language
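
As a toy illustration of the possible-worlds semantics underlying these directions (with invented data, and using the exponential world enumeration that compact representations are designed to avoid):

```python
# Toy possible-worlds evaluation (not MayBMS/SPROUT code): each
# uncertain tuple is present independently with some probability, and
# an answer's confidence is the total weight of worlds supporting it.
from itertools import product

# (tuple, probability of presence) -- hypothetical data
uncertain = [(('alice', 'oxford'), 0.9),
             (('bob', 'oxford'), 0.5)]

def answer_confidence(pred):
    conf = 0.0
    # enumerate all 2^n possible worlds explicitly
    for choices in product([True, False], repeat=len(uncertain)):
        weight = 1.0
        world = []
        for (tup, p), present in zip(uncertain, choices):
            weight *= p if present else 1 - p
            if present:
                world.append(tup)
        if any(pred(t) for t in world):
            conf += weight
    return conf

# probability that the answer "someone is in 'oxford'" holds
c = answer_confidence(lambda t: t[1] == 'oxford')
```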

Prerequisites: All projects within this framework require prior exposure to databases, though some projects may only require knowledge of either database theory or database systems. In the latter case, strong C/C++ skills are essential (proficiency in any other general-purpose programming language is in any case a good start). Students with very good marks in the Database Systems Implementation course are preferred.

• Latest publications on SPROUT and MayBMS.
• Lyublena Antova and Christoph Koch and Dan Olteanu. From Complete to Incomplete Information and Back. SIGMOD 2007.
• Lyublena Antova and Thomas Jansen and Christoph Koch and Dan Olteanu. Fast and Simple Relational Processing of Uncertain Data. ICDE 2008.
• Nilesh Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. VLDB 2004.
Branching Temporal Logics, Automata and Games Luke Ong B  C

Model checking has emerged as a powerful method for the formal verification of programs. Temporal logics such as CTL (computation tree logic) and CTL* are widely used to specify programs because they are expressive and easy to understand. Given an abstract model of a program, a model checker (which typically implements the acceptance problem for a class of automata) verifies whether the model meets a given specification. A conceptually attractive method for solving the model checking problem is by reducing it to the solution of (a suitable subclass of) parity games. These are a type of two-player infinite game played on a finite graph. The project concerns the connexions between the temporal logics CTL and/or CTL*, automata, and games. Some of the following directions may be explored:

• Representing CTL / CTL* as classes of alternating tree automata.
• Inter-translation between CTL / CTL* and classes of alternating tree automata.
• Using Büchi games and other subclasses of parity games to analyse the CTL / CTL* model checking problem.
• Efficient implementation of model checking algorithms.
• Application of the model checker to higher-order model checking.
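
As a small illustration of the classical labelling approach that a parity-game reduction generalises, the following sketch (illustrative only; the Kripke structure is invented) checks the CTL operators EX and E[· U ·] by fixpoint computation:

```python
# Minimal CTL labelling sketch on an explicit Kripke structure.
# A game-based checker would replace these fixpoint loops with the
# solution of a (Buchi / parity) game.

def check_EX(states, trans, phi):
    """States with some successor satisfying phi."""
    return {s for s in states if any(t in phi for t in trans[s])}

def check_EU(states, trans, phi1, phi2):
    """Least fixpoint of  phi2 | (phi1 & EX result)."""
    result = set(phi2)
    while True:
        new = result | {s for s in phi1
                        if any(t in result for t in trans[s])}
        if new == result:
            return result
        result = new

states = {0, 1, 2}
trans = {0: {1}, 1: {2}, 2: {2}}
# E[ a U b ] with a holding in {0, 1} and b holding in {2}:
sat = check_EU(states, trans, {0, 1}, {2})
```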

### References:

Orna Kupferman, Moshe Y. Vardi, Pierre Wolper: An automata-theoretic approach to branching-time model checking. J. ACM 47(2): 312-360 (2000).
http://dx.doi.org/10.1145/333979.333987

Rachel Bailey: A Comparative Study of Algorithmics for Solving Büchi Games. University of Oxford MSc Dissertation, 2010.
http://www.cs.ox.ac.uk/people/luke.ong/personal/publications/RachelBailey_MScdissertation.pdf

Luke Ong's Projects Luke Ong B  C

See http://users.comlab.ox.ac.uk/luke.ong/projects/ for more information.
Machine Learning Vasile Palade B  C

Dr. Vasile Palade (vasile.palade@cs.ox.ac.uk) will supervise projects in machine learning, ranging from more theoretical approaches to practical applications of machine learning. Preference will be given to projects on neural networks, support vector machines, evolutionary computation, and fuzzy techniques, applied to problems from the bioinformatics area, web usage mining, and financial modelling, but other options will be considered too (e.g. image processing).

For example, one possible project may be to investigate the performance of several variants of particle swarm optimization algorithms on classical benchmark optimization problems, such as the CEC 2005 benchmark suite, which can then be applied to some real-world optimization problems from the above areas. Another project may be to study the problem of class imbalance learning, and to develop some classifiers or ensembles of classifiers using some of the machine learning methods listed above, in the presence of highly imbalanced datasets. Popular imbalanced data sets, such as those from the UCI machine learning repository (among others), could be used.
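
To give a flavour of the first project idea, here is a minimal particle swarm optimiser on a toy 1-D benchmark (minimising x²); all parameter values are common defaults, not prescribed by the project:

```python
# Minimal 1-D particle swarm optimisation sketch (illustrative only).
import random

random.seed(0)

def pso(f, n_particles=20, steps=100, w=0.5, c1=1.5, c2=1.5):
    xs = [random.uniform(-10, 10) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    best = list(xs)                     # personal best positions
    gbest = min(xs, key=f)              # global best position
    for _ in range(steps):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            # velocity: inertia + pull to personal and global bests
            vs[i] = (w * vs[i] + c1 * r1 * (best[i] - xs[i])
                               + c2 * r2 * (gbest - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(best[i]):
                best[i] = xs[i]
                if f(xs[i]) < f(gbest):
                    gbest = xs[i]
    return gbest

x_min = pso(lambda x: x * x)   # should end up near 0
```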

These projects would be more appropriate for MSc and 4th-year students, but 3rd-year students could choose such a project if they have built an appropriate background through the courses they have previously attended. Ideally, students should have attended the Machine Learning course, or at least the Intelligent Systems or the Information Retrieval course. Please feel free to contact Dr. V. Palade to discuss possible projects in these areas.

Functional Curation of Biological Models Jonathan Cooper, Joe Pitt-Francis C

Our computational models of biological systems are becoming increasingly complicated. Often it is unclear which wet-lab experiments were used for training/calibration/parameter fitting, which have been used for validation of model behaviour (evaluating how predictive the models are), and which experimental results have never been compared with model predictions. It is also difficult to reproduce published results of model simulations, compare the behaviour of different models, determine a model's suitability or limitations for a particular study, or develop models incrementally in a robust manner.

Fundamental to improving this situation is the description of simulation protocols in a machine-readable format, allowing us to replicate wet-lab experiments in-silico, with any of the possible computational models of the biological system. We are developing a system for this, extending the community standard SED-ML - the Simulation Experiment Description Markup Language - for representing protocols. The resulting ability to compare model behaviours under a range of experimental scenarios is a concept we have termed "Functional Curation". It can provide a more comprehensive view of model behaviour, far beyond that described in traditional publications, greatly assisting users in selecting a suitable model for reuse which will, in turn, result in the development of more relevant and accurate models.

Several projects are available in this area, depending on the interests and experience of the student. In most cases a knowledge of Python and/or C++ would be useful. Some examples are:

• Improving usability, for example by developing an editor for the protocol language.
• Allowing protocols to be run on SBML models, not just CellML as at present, greatly widening applicability of the technique.
• Extending the use of metadata annotations for interfacing between models and protocols, allowing for reasoning over annotations from ontologies.
• Protocol analysis for performance optimisation, especially the use of parallelisation.
• Allowing protocols to be run on GPGPUs.
Graphics pipeline animator Joe Pitt-Francis B  C

Pre-requisites: Computer graphics, Object-oriented programming

The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then she moves the slider down the pipeline, watching an animation of the process.

Ellipsis Interpretation Stephen Pulman B  C

Sentences like 'John likes Chopin, but Mary doesn't', or 'John likes films, but not plays', contain ellipsis: 'missing' components that have to be filled in from earlier parts of the sentence: 'John likes Chopin, but Mary doesn't like Chopin', 'John likes films, but John doesn't like plays'. If you run the first pair of sentences through a parser like the Stanford parser:

(http://nlp.stanford.edu:8080/parser/index.jsp) you will see that it gets the constituent structure right, but doesn't attempt to capture the interpretation (see the dependency representations for semantically relevant information).

This project will attempt to develop and test a post-processing module for the Stanford parser which will interpret some common cases of ellipsis.

Prerequisites: some familiarity with linguistic theories of ellipsis.

Proof Theory For a Linguistically Motivated Logic Stephen Pulman B  C

A collaborator at Google has developed a logical language aimed at representing some aspects of the meaning of sentences in a linguistically motivated way, derivable from a dependency parse. You can read about it here: http://aclweb.org/anthology-new/W/W11/W11-0103.pdf

While the logic can be given a denotational semantics, it has yet to be provided with a proof theory in order that inferences can be drawn from sentences translated into it. This project aims to develop a proof theory for at least part of this logic, and if possible, test it via an implementation.

Prerequisites: only suitable for someone who has done BOTH the Computational Linguistics AND the Knowledge Representation and Reasoning courses.

Modelling and verifying systems in Timed CSP and FDR Bill Roscoe B  C

Timed CSP reinterprets the CSP language in a real-time setting and has a semantics in which the exact times of events are recorded as well as their order. Originally devised in the 1980s, it has only just been implemented as an alternative mode for FDR. The objective of this project is to take one or more examples of timed concurrent systems from the literature, implement them in Timed CSP, and where possible compare the performance of these models with similar examples running on other analysis tools such as Uppaal.

### References:

See Understanding Concurrent Systems (especially Chapter 15) and Model Checking Timed CSP, from AWR's web list of publications.

A simple object-oriented language Michael Spivey B  C

Use Keiko to implement a simple language that is purely object-oriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala.

Better JIT translation Michael Spivey B  C

The existing JIT for Keiko is very simple-minded, and does little more than translate each bytecode into the corresponding machine code. Either improve the translation by using one of the many JIT libraries now available, or adjust the Oberon compiler and the specification of the bytecode machine to free it of restrictive assumptions and produce a better pure-JIT implementation.

Better performance for GeomLab Michael Spivey B  C

At present, GeomLab programs show performance that competes favourably with Python, making it possible to address tasks like computing images of the Mandelbrot set using a purely functional program that calls a function once for each pixel. But there is still a gap between the performance of GeomLab programs and similar ones written in Java or C, and more ambitious image-processing tasks would be made possible by better performance, particularly in the area of arithmetic. Explore ways of improving performance, perhaps by allowing numbers to be passed around without wrapping them in heap-allocated objects, or by compiling the code for Haskell-style pattern matching in a better way.

Eliminating array bounds checks Michael Spivey B  C

The Oberon compiler inserts code into every array access and every pointer dereference to check for runtime errors, like a subscript that is out of bounds or a pointer that is null. In many cases, it is possible to eliminate the checks because it is possible to determine from the program that no error can occur. For example, an array access inside a FOR loop may be safe given the bounds of the loop, and several uses of the same pointer in successive statements may be able to share one check that the pointer is non-null. Modify the Oberon compiler (or a simpler one taken from the Compilers labs) so that it represents the checks explicitly in its IR, and introduce a pass that removes unnecessary checks, so speeding up the code without compromising safety.
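
The analysis described above can be sketched over a toy, invented IR (not the Oberon compiler's actual representation): a FOR loop with statically known bounds lets the pass discard bounds checks on accesses indexed by the loop variable.

```python
# Toy bounds-check elimination over an invented list-of-tuples IR.
# Ops: ('for', var, lo, hi) opens a loop, ('check', var) is a bounds
# check, anything else passes through unchanged.

def eliminate_checks(ir, array_len):
    loop_bounds = {}
    out = []
    for op in ir:
        if op[0] == 'for':
            _, var, lo, hi = op
            loop_bounds[var] = (lo, hi)
            out.append(op)
        elif op[0] == 'check':
            var = op[1]
            lo, hi = loop_bounds.get(var, (None, None))
            # keep the check unless the loop bounds prove it redundant
            if lo is None or lo < 0 or hi > array_len - 1:
                out.append(op)
        else:
            out.append(op)
    return out

ir = [('for', 'i', 0, 9), ('check', 'i'), ('load', 'a', 'i')]
optimised = eliminate_checks(ir, array_len=10)
# the check on 'i' is provably safe and disappears
```

A real pass would also have to invalidate `loop_bounds` at loop exits and assignments to the loop variable; this sketch only shows the core idea.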

GeomLab and Mindstorms Michael Spivey B  C

GeomLab has a turtle graphics feature, but the pictures are drawn only on the screen. It should be possible to make a turtle out of Lego Mindstorms, then control it with an instance of Geomlab running on a host computer, with communication over Bluetooth.

GeomLab on Android Michael Spivey B  C

Produce an implementation of GeomLab's GUI and graphics library that works on the Android platform. Either use an interpreter for GeomLab's intermediate code to execute GeomLab programs, or investigate dynamic translation of the intermediate code into code for Android's virtual machine Dalvik.

Heap-based activation records Michael Spivey B  C

At present, Keiko supports only conventional Pascal-like language implementations that store activation records on a stack. Experiment with an implementation where activation records are heap-allocated (and therefore recovered by a garbage collector), procedures are genuinely first-class citizens that can be returned as results in addition to being passed as arguments, and tail recursion is optimised seamlessly.

Keiko on Mindstorms Michael Spivey B  C

Alternative firmware for the Mindstorms robot controller provides an implementation of the JVM, allowing Java programs to run on the controller, subject to some restrictions. Using this firmware as a guide, produce an interpreter for a suitable bytecode, perhaps some variant of Keiko, allowing Oberon or another robot language of your own design to run on the controller. Aim to support the buttons and display at first, and perhaps add control of the motors and sensors later.

Type-checking for GeomLab Michael Spivey B  C

The GeomLab language is untyped, leading to errors when expressions are evaluated that would be better caught at an earlier stage. Most GeomLab programs, however, follow relatively simple typing rules. The aim in this project is to write a polymorphic type checker for GeomLab and integrate it into the GeomLab system, which is implemented in Java. A simple implementation of the type-checker would wait until an expression is about to be evaluated, and type-check the whole program at that point. As an extension of the project, you could investigate whether it is possible to type-check function definitions one at a time, even when some of the functions they call have not yet been defined.
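
At the heart of such a polymorphic checker is unification. GeomLab itself is implemented in Java, so the following Python fragment is only an algorithmic sketch (it also omits the occurs check that a full checker needs); the type representation is invented for illustration:

```python
# Unification sketch for a Hindley-Milner style type checker.
# Types: strings starting with "'" are type variables; tuples are
# constructed types, e.g. ('fun', arg, result).

def resolve(t, subst):
    """Follow variable bindings to the representative type."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith("'"):   # bind variable
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.startswith("'"):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
        return subst
    raise TypeError(f"cannot unify {t1} with {t2}")

# unify  'a -> int  with  bool -> 'b
s = unify(('fun', "'a", 'int'), ('fun', 'bool', "'b"), {})
```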

Exact Algorithms for Complex Root Isolation Irina Voiculescu C

Isolating the complex roots of a polynomial can be achieved using subdivision algorithms. Traditional Newton methods can be applied in conjunction with interval arithmetic. Previous work (jointly with Prof Chee Yap and MSc student Narayan Kamath) has compared the performance of three operators: Moore's, Krawczyk's and Hansen-Sengupta's. This work makes extensive use of the CORE library, which is a collection of C++ classes for exact computation with algebraic real numbers and arbitrary precision arithmetic. CORE defines multiple levels of operation over which a program can be compiled and executed. Each of these levels provides stronger guarantees on exactness, traded against efficiency. Further extensions of this work can include (but are not limited to):

• Extending the range of applicability of the algorithm at CORE's Level 1;
• Making an automatic transition from CORE's Level 1 to the more detailed Level 2 when extra precision becomes necessary;
• Designing efficiency optimisations to the current approach (such as confirming a single root, or analysing areas potentially not containing a root with a view to discarding them earlier in the process);
• Tackling the isolation problem using a continued fraction approach.

The code has been included and is available within the CORE repository. Future work can continue to be carried out in consultation with Prof Yap at NYU.
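
A much-simplified real-axis analogue of the subdivision idea (not using CORE, and using a sign-change test in place of a true interval operator) looks like this; note the sign test misses even-multiplicity roots, which is exactly the kind of gap the interval operators above close:

```python
# Subdivision sketch for real root isolation: bisect intervals,
# discarding those whose endpoint signs certify no sign change.

def horner(coeffs, x):
    """Evaluate a polynomial given high-to-low coefficients."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def isolate(coeffs, lo, hi, tol=1e-6):
    """Return intervals of width <= tol bracketing a sign change."""
    f_lo, f_hi = horner(coeffs, lo), horner(coeffs, hi)
    if f_lo * f_hi > 0:
        return []                     # exclusion test (sign-based only)
    if hi - lo <= tol:
        return [(lo, hi)]
    mid = (lo + hi) / 2
    return isolate(coeffs, lo, mid, tol) + isolate(coeffs, mid, hi, tol)

# roots of x^2 - 2 on [0, 2]: the single root near 1.414
boxes = isolate([1, 0, -2], 0.0, 2.0)
```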
Implementation of numerical methods Jonathan Whiteley B  C

Many techniques from scientific computing - such as numerical methods for solving differential equations, root finding for non-linear functions, or iterative linear algebra techniques - are based on rigorous mathematics from the fields of analysis or algebra. As such, theoretical results exist for the rate of convergence - or even, occasionally, divergence - of these techniques. The aim of this project is to choose a suitable scientific computing technique, understand the theoretical basis of this technique, and then implement it computationally using the theory to ensure that the implementation is both correct and efficient.
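
As one concrete example of pairing theory with implementation, Newton's method for root finding has a well-known quadratic convergence rate near a simple root, which an implementation can check empirically (the function and starting point here are chosen only for illustration):

```python
# Newton's method with an error trace against the known root of
# x^2 - 2, so the quadratic convergence predicted by theory can be
# observed: each error is roughly the square of the previous one,
# until floating-point rounding dominates.

def newton(f, df, x0, steps=10):
    x = x0
    errs = []
    for _ in range(steps):
        x = x - f(x) / df(x)
        errs.append(abs(x - 2.0 ** 0.5))
    return x, errs

x, errs = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.5)
```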

Prerequisites: first year linear algebra and continuous mathematics. Numerical Solution of Differential Equations would be desirable.

Parallel linear algebra Jonathan Whiteley B  C

The most common large-scale parallel architecture for scientific computing is a distributed memory computer. When using this architecture each processor has access to data stored in its own memory, while data stored by memory associated with other processors may only be accessed by communication across a network. For reasons of computational efficiency, many parallel scientific computing algorithms are implemented by partitioning the data stored so that communication across the network between processors is minimised. This project will investigate the most appropriate partitioning for linear algebra calculations.
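A toy illustration of the idea (plain Python rather than MPI; the partitioning helper is hypothetical): under a row-block partitioning of y = Ax, each processor computes its own rows independently, and only the vector x needs to be communicated:

```python
# Sketch of row-block partitioning for a parallel matrix-vector product.
# Each (simulated) processor owns a contiguous block of rows of A and
# computes its slice of y = A x independently; only x must be shared.

def partition_rows(n, p):
    """Split n rows into p near-equal contiguous blocks."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def parallel_matvec(A, x, p):
    n = len(A)
    y = [0.0] * n
    for block in partition_rows(n, p):   # each block = one processor's work
        for i in block:
            y[i] = sum(A[i][j] * x[j] for j in range(n))
    return y

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0]
```

In a real distributed-memory setting the interesting question, as above, is choosing the partition so that the entries of x each processor must fetch from others are minimised.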

Prerequisites: first year linear algebra. Continuous mathematics and Numerical Solution of Differential Equations would be desirable.

Sparse matrix storage Jonathan Whiteley B  C

Many matrices used in scientific computing applications are large (100s of thousands of rows) and sparse (typically around 10 non-zeros per row). Storing the whole matrix (including the zeros) is infeasible due to memory constraints. Instead, the matrix is stored in sparse format: only the non-zero entries and the location of these entries are stored. This clearly affects the implementation of operations on this matrix, such as arithmetic operations and calculation of the determinant. A further complication is that non-zeros may be added to the matrix at any time, and so the sparsity pattern changes dynamically. In this project, an efficient sparse storage format will be implemented, together with a suite of operations using this sparse storage format.
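For concreteness, here is a minimal sketch of one standard choice, compressed sparse row (CSR) storage; a full project implementation would also handle the dynamic fill-in described above, which this sketch does not:

```python
# Sketch of compressed sparse row (CSR) storage: three arrays hold the
# non-zero values, their column indices, and the offset of each row's
# first entry.  Matrix-vector product then touches non-zeros only.

class CSRMatrix:
    def __init__(self, dense):
        self.n = len(dense)
        self.values, self.cols, self.row_ptr = [], [], [0]
        for row in dense:
            for j, v in enumerate(row):
                if v != 0.0:
                    self.values.append(v)
                    self.cols.append(j)
            self.row_ptr.append(len(self.values))

    def matvec(self, x):
        y = [0.0] * self.n
        for i in range(self.n):
            for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
                y[i] += self.values[k] * x[self.cols[k]]
        return y

A = CSRMatrix([[4.0, 0.0, 1.0],
               [0.0, 3.0, 0.0],
               [1.0, 0.0, 2.0]])
```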

Prerequisite: first year linear algebra. Continuous mathematics would be desirable.

Algorithms for Bisimulation and Simulation on Markov Chains James Worrell B  C Bisimulation and simulation are relations that can be used for state compression in Markov chains. The aim of this project is to understand, implement and compare algorithms for computing these relations. Such algorithms could involve subprocedures for solving the max-flow problem or computing matrix eigenvalues.
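A minimal sketch of the partition-refinement idea for probabilistic bisimulation on a labelled Markov chain (naive; a full implementation would explore the max-flow and other subprocedures the project mentions):

```python
# Sketch: coarsest probabilistic bisimulation by partition refinement.
# Two states stay in the same block iff they carry the same label and
# agree on the probability of jumping into every current block.

def bisimulation(P, labels):
    """P[s][t] = transition probability s -> t; returns list of frozensets."""
    n = len(P)
    by_label = {}
    for s in range(n):
        by_label.setdefault(labels[s], []).append(s)
    partition = [frozenset(b) for b in by_label.values()]
    while True:
        def signature(s):
            return tuple(sum(P[s][t] for t in block) for block in partition)
        refined = []
        for block in partition:          # split each block by signature
            groups = {}
            for s in block:
                groups.setdefault(signature(s), []).append(s)
            refined.extend(frozenset(g) for g in groups.values())
        if len(refined) == len(partition):   # no block split: stable
            return refined
        partition = refined

# 3-state chain in which states 0 and 1 behave identically
P = [[0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
```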
Model Checking LTL on Markov Chains James Worrell B  C The aim of this project is to build a tool to compute the probability that a Markov chain satisfies an LTL formula. One of the main tasks in the project is to build a translator from LTL to unambiguous automata and to see to what extent state-of-the-art compression techniques, such as those used in the tool Wring, can be used in the setting of unambiguous automata.
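The quantitative core of such a computation is a reachability probability on the chain (for a full LTL query, on the product of the chain with the automaton); a hedged sketch using simple fixed-point iteration, with hypothetical names:

```python
# Sketch: probability of eventually reaching a target set in a Markov
# chain, computed by value iteration (a linear-equation solve would also do).

def reach_probability(P, targets, iters=1000):
    """Pr[eventually reach `targets`] from each state of the chain P."""
    n = len(P)
    x = [1.0 if i in targets else 0.0 for i in range(n)]
    for _ in range(iters):
        x = [1.0 if i in targets
             else sum(P[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x

# From state 1, move to absorbing state 0 or 2 with probability 1/2 each
P = [[1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
probs = reach_probability(P, {2})
```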
UCT/Monte-Carlo Game Playing Program James Worrell B  C Fairly recently, new ideas have emerged in Artificial Intelligence that have led to the design of radically different game-playing programs. Originally developed for the game of Go, these algorithms (such as UCT or Monte-Carlo) appear to work quite well on various other games (typically games with a very large number of possible moves at each stage) for which minimax/alpha-beta performs poorly. The aim of this advanced project would be to implement and evaluate such algorithms on a particular game.
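The selection rule at the heart of UCT is UCB1: pick the child maximising average reward plus an exploration bonus that shrinks as the child is visited more. A small bandit-style sketch (hypothetical names, not tied to any particular game):

```python
# Sketch of the UCB1 rule used by UCT during tree descent.
import math, random

def ucb1_select(children, c=math.sqrt(2)):
    """children: list of (total_reward, visit_count); returns chosen index."""
    total = sum(n for _, n in children)
    def score(stats):
        reward, n = stats
        if n == 0:
            return float('inf')       # always try unvisited children first
        return reward / n + c * math.sqrt(math.log(total) / n)
    return max(range(len(children)), key=lambda i: score(children[i]))

# Bandit loop: arm 1 pays off more often, so it ends up visited more.
random.seed(0)
payout = [0.3, 0.7]
stats = [[0.0, 0], [0.0, 0]]
for _ in range(2000):
    i = ucb1_select([tuple(s) for s in stats])
    stats[i][0] += 1.0 if random.random() < payout[i] else 0.0
    stats[i][1] += 1
```

In a full game-playing program this rule sits inside the selection phase, with random playouts supplying the rewards that are backed up the tree.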
Extending the Haskell Foreign Function Interface Nicolas Wu B  C The Foreign Function Interface allows programs in Haskell to access libraries that expose an interface in C in a convenient manner. However, writing bindings to C++ is not as simple and currently users fall back on exposing C++ functions via a C interface. This project would aim to create a foreign function interface that interacts with SWIG, an interface compiler that connects programs written in C and C++ with scripting languages such as Perl, Python, Ruby, and Tcl. An alternative take on this project would be to write an FFI that interfaces with another language, such as Java or Python.
Functional Algorithms for Real Time Strategy Games Nicolas Wu B  C Writing an agent that is able to adequately play a real time strategy game is an excellent vehicle to showcase a wide range of algorithms that arise in the field of artificial intelligence. This project investigates the use of Haskell to write functional algorithms as an alternative to their imperative counterparts. As motivation, the aim would be to create an agent that is able to interface with Starcraft, a world-renowned real time strategy game that has an exposed API for such experiments.
Improving Haskell Code Generation Nicolas Wu B  C

The Glasgow Haskell Compiler (GHC) is a widely used, industrial-strength compiler for Haskell that can emit code for the Low Level Virtual Machine (LLVM) compiler infrastructure. This project is an investigation into GHC's LLVM backend, with the aim of improving the code that is generated.

Currently GHC emits LLVM instructions as text, which the LLVM tools must then parse before compilation. The goal of this project is to replace this mechanism by interfacing directly with the LLVM API, thus reducing the time it takes to compile code and making the backend more robust to future changes in LLVM.

Extension of the interval program analysis for unknown library functions Hongseok Yang B  C

The interval program analysis is a well-known algorithm for estimating the behaviour of programs without actually running them. The algorithm takes an imperative program, and returns, at each program point, interval constraints for variables in the program, such as 1 <= x <= 3 && 2 <= y. This algorithm assumes that all the functions called by the input program are defined in the program, so that the source code of every called function can be found in the program. However, in practice, this assumption is not necessarily met. Programs often use library functions whose source code is not available. The goal of this project is to lift this assumption. During the project, a student will develop an interval-analysis algorithm that works in the presence of calls to unknown library functions, implement the algorithm, and evaluate the algorithm experimentally.
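A toy sketch of the interval domain and the issue the project addresses: with no summary available for an unknown library call, a sound analysis must weaken the callee's result to the unconstrained interval (the statement encoding and all names here are hypothetical):

```python
# Sketch: interval analysis of straight-line code.  An unknown library
# call forces the analysis to assume the result can be anything.

INF = float('inf')
TOP = (-INF, INF)                     # the unconstrained interval

def add(a, b):                        # abstract addition of intervals
    return (a[0] + b[0], a[1] + b[1])

def analyse(stmts, env):
    """Abstractly execute statements over an interval environment."""
    for stmt in stmts:
        kind, var = stmt[0], stmt[1]
        if kind == 'const':           # var := c
            env[var] = (stmt[2], stmt[2])
        elif kind == 'add':           # var := y + z
            env[var] = add(env[stmt[2]], env[stmt[3]])
        elif kind == 'call':          # var := unknown_library_fn(...)
            env[var] = TOP            # no summary: assume any value
    return env

env = analyse([('const', 'x', 1),
               ('add', 'y', 'x', 'x'),
               ('call', 'z')], {})
```

The project's task is precisely to do better than the `TOP` fallback in the last case, inferring useful constraints even across calls whose source code is unavailable.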

Prerequisites: the Compilers course.

Nonblocking data structures based on message-passing concurrency Hongseok Yang B  C

Developing efficient concurrent data structures is nontrivial, and requires several interesting new programming techniques. In the context of shared-variable concurrency, researchers have developed such data structures, often called nonblocking algorithms, which avoid using locks (expressed by the synchronized keyword in Java) and perform well even when many threads attempt to use the data structure at the same time. The goal of this project is to revisit these data structures designed for shared-variable concurrency, especially those found in Herlihy and Shavit's book "The Art of Multiprocessor Programming", and to build corresponding data structures based on asynchronous message passing or the Actor model, as supported by Erlang or Scala. The resulting data structures will be analysed theoretically (by proving their linearizability), empirically (by measuring their performance), or both.
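To illustrate the message-passing style (sketched here with Python threads and queues for brevity, rather than Erlang or Scala actors), here is a stack whose operations are all serialised through a single owning actor's mailbox, so the data itself needs no lock:

```python
# Sketch: a message-passing counterpart of a shared-memory stack.  One
# "actor" thread owns the list and processes requests from its mailbox.
import queue, threading

def stack_actor(mailbox):
    data = []
    while True:
        op, arg, reply = mailbox.get()
        if op == 'push':
            data.append(arg)
            reply.put(None)
        elif op == 'pop':
            reply.put(data.pop() if data else None)
        elif op == 'stop':
            reply.put(None)
            return

mailbox = queue.Queue()
threading.Thread(target=stack_actor, args=(mailbox,), daemon=True).start()

def call(op, arg=None):
    """Send a request to the actor and wait for its reply."""
    reply = queue.Queue()
    mailbox.put((op, arg, reply))
    return reply.get()

call('push', 1)
call('push', 2)
```

Because every operation takes effect at the moment the actor dequeues it, linearizability arguments become arguments about message orderings rather than about interleaved memory accesses.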

Prerequisites: none required, though the Concurrent Programming course would be desirable.

Optimizing functional expressions in the Scala programming language Andrew Bate, Hongseok Yang B  C

The Scala programming language provides a large number of capabilities that are normally found only in functional programming languages. However, as of the latest version of the compiler (currently 2.10.x), there is little to no optimization of functional expressions, such as maps or filters, over collections.

The objective of the project is to develop a framework of compile-time optimizations for the standard functional constructs, such as for comprehensions, maps, folds, filters, and so on, as has been implemented for the Haskell programming language. These optimizations should achieve performance comparable to that of the equivalent imperative code.
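The kind of rewrite intended is the classic fusion law map f . map g = map (f . g), which removes intermediate collections; a language-neutral sketch (in Python for brevity, over a hypothetical staged pipeline):

```python
# Sketch of map fusion: adjacent map stages collapse into one, so the
# collection is traversed once with no intermediate list allocated.

def run(pipeline, xs):
    """Interpret a pipeline of ('map', f) / ('filter', p) stages."""
    for op, f in pipeline:
        xs = [x for x in xs if f(x)] if op == 'filter' else [f(x) for x in xs]
    return xs

def fuse(pipeline):
    """Rewrite adjacent map stages: map f . map g  ==>  map (f . g)."""
    fused = []
    for op, f in pipeline:
        if op == 'map' and fused and fused[-1][0] == 'map':
            g = fused[-1][1]
            fused[-1] = ('map', lambda x, f=f, g=g: f(g(x)))
        else:
            fused.append((op, f))
    return fused

pipeline = [('map', lambda x: x + 1),
            ('map', lambda x: x * 2),
            ('filter', lambda x: x > 4)]
```

A compile-time version of this rewrite, applied to Scala's collection operations (and extended with filter and fold fusion), is what would close the gap with equivalent imperative loops.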

Prerequisites: Compilers; Useful: Computer Architecture