Honour School of Computer Science - Student projects
Suggested projects
| Project | Supervisors | Parts | Description | |
|---|---|---|---|---|
|
Sheaf theoretic semantics for vector space models of natural language | Samson Abramsky | B C | (Joint with Mehrnoosh Sadzradeh) Contextuality is a fundamental feature of quantum physical theories and one that distinguishes it from classical mechanics. In a recent paper by Abramsky and Brandenburger, the categorical notion of sheaves has been used to formalize contextuality. This has resulted in generalizing and extending contextuality to other theories which share some structural properties with quantum mechanics. A consequence of this type of modeling is a succinct logical axiomatization of properties such as non-local correlations and as a result of classical no go theorems such as Bell and Kochen-Soecker. Like quantum mechanics, natural language has contextual features; these have been the subject of much study in distributional models of meaning, originated in the work of Firth and later advanced by Schutze. These models are based on vector spaces over the semiring of positive reals with an inner product operation. The vectors represent meanings of words, based on the contexts in which they often appear, and the inner product measures degrees of word synonymy. Despite their success in modeling word meaning, vector spaces suffer from two major shortcomings: firstly they do not immediately scale up to sentences, and secondly, they cannot, at least not in an intuitive way, provide semantics for logical words such as `and', `or', `not'. Recent work in our group has developed a compositional distributional model of meaning in natural language, which lifts vector space meaning to phrases and sentences. This has already led to some very promising experimental results. However, this approach does not deal so well with the logical words. The goal of this project is to use sheaf theoretic models to provide both a contextual and logical semantics for natural language. We believe that sheaves provide a generalization of the logical Montague semantics of natural language which did very well in modeling logical connectives, but did not account for contextuality. The project will also aim to combine these ideas with those of the distributional approach, leading to an approach which combines the advantages of Montague-style and vector-space semantics.
Prerequisites ==========
The interested student should have taken the category theory and computational linguistics courses, or be familiar with the contents of these. |
|
Data Cleaning | Michael Benedikt | B C | Professor Michael Benedikt ( benedikt@comlab.ox.ac.uk) would be delighted to supervise projects in Data Cleaning. Please feel free to contact him. |
|
Optimization of Web Query Plans | Michael Benedikt | This project will look at how to find the best plan for a query, given a collection of data sources with access restrictions. We will look at logic-based methods for analyzing query plans, taking into account integrity constraints that may exist on the data. |
|
|
Querying Blogs | Michael Benedikt | ||
|
Querying Wikis | Michael Benedikt | ||
|
XML transformation and update languages | Michael Benedikt | B C | Professor Michael Benedikt ( benedikt@comlab.ox.ac.uk) would be delighted to supervise projects in XML transformation and update languages. Please feel free to contact him. |
|
Medical image analysis | Mike Brady | B C | Professor Mike Brady (department of engineering science, jmb@robots.ox.ac.uk) would be delighted to supervise projects in medical image analysis. See http://www.robots.ox.ac.uk/~mvl/external/mvl-html/ |
|
Building a population of human cardiac cell models using public participation to explore and compute the parameter space | Kevin Burrage | C | Computing a population of models (that is a model with differing sets of valid parameters) is intensive, inherently parallel, and so well suited to the BOINC distributed volunteer computing environment that allows many computational jobs to be run independently and in parallel to each other. This project will focus on the construction of the BOINC project environment, that will allow a population of cardiac action potential models to be built by comparing the model dynamics against a set of experimental biomarkers. This project will suit a student with a computational background and interest in using distributed computing via the concept of public participation on an important application problem in cardiac research. |
|
Modelling and reasoning about agent-based, networked complex systems such as financial markets, biological systems and the Internet | Ani Calinescu | C | Complex systems are characterized by variety and uncertainty. A heterogeneous agent-based modelling approach to such systems would need to specify the types of agents, their properties, the structure of the network, and the information flows among them. Generic and domain-specific research questions could be investigated within this context. This topic could be adapted to the student's background and interests. From a theoretical perspective, the project could aim to produce a formal specification of properties of such as adaptiveness, emergence and robustness. From a more applied perspective, the project could implement and extend existing approaches to modelling complex systems such as the Internet, biological systems and financial markets. Prerequisites:Intelligent Systems or Machine Learning
Please feel free to contact Dr Ani Calinescu to discuss any of these project topics in more detail. |
|
Modelling and reasoning about pervasive computing systems | Ani Calinescu | C | This topic involves the specification, implementation and analysis of algorithms and protocols, in order to investigate, prove and validate properties of pervasive computing systems. Examples include electronic voting protocols and human authentication protocols. Prerequisites:Computer Security and (Concurrent Programming and/or Concurrency) Please feel free to contact Dr Ani Calinescu to discuss any of these project topics in more detail. |
|
Building an Animation or Simulation System | Stephen Cameron | B | The simulation of objects and beings within a computer has become a hot topic over the last few years, with applications to virtual reality, the design of safe stadia, synthetic actors and environments for films, and computer games, amongst others. This project would look at the design and construction of a relatively simple animation or simulation system, that would probably sacrifice ease of animation in order to focus on the modelling of interactions between the actors and their environments. Such a project would probably make use of existing collision detection code within the computing laboratory, coupled with algorithms that simulate the physical interaction of bodies. (An alternative would be to use one of the physical modelling APIs that are now becoming available.) |
|
Developing the Robot Sheepdog | Stephen Cameron | B | In 1998 a consortium consisting of the Computing Laboratory, Silsoe Research Institute, and the Universities of Leeds and Bristol, successfully demonstrated a robotic device that was capable of herding live ducks, and thus demonstrated the efficacy of a mathematical model of flocking behaviour. In order to carry on with this work (in a way that avoids having to muck out real animals!) we are repeating the work but this time with robot ducks. Projects to help continue this work are available, and suggested topics include (but are not limited to): building better sensing or (radio) communication hardware for existing ducks; designing (and possibly testing) novel algorithms to look at cooperation between sheepdogs; using machine learning to allow the shepdogs to learn; and building better simulations of the robots. |
|
Helicopter Robots | Stephen Cameron | B C | Small quad-rotor helicopters are now available that can fly indoors; for example, the AR-Drone (ardrone.parrot.com) carries a video camera and can be driven from an Android or iOS device, for which a suitable software library is now available. This project will develop software for such a helicopter. Possible specific projects might: attempt to fly the helicopter automatically; get the helicopter to look for open doorways; analyse the camera images; develop a game; etc. |
|
Meet & Greet Robot | Stephen Cameron | B C | One interesting use of a small mobile robot is to go up to people in a room, and offer to provide them with information. For example, at an Admissions Open Day such a robot could supply them with information about where to find the lecture theatre, or how to find a particular College. We have a suitable small robot (www.turtlebot.com), which contains a Kinect sensor to allow it to sense its surroundings, and a netbook for its `brain'. This project would either work with the real robot, or develop software for a simulated robot, and could either focus on the low-level navigation software; or more on the high-level functions; or even on speech recognition. The project has the potential to be split among a small group of students. |
|
Object-Oriented Splines Library | Stephen Cameron | C | Many objects are described to computer design and animation systems as a number of curved surface pieces, called splines patches. There are many different families of such splines (with different mathematical -- and thus aesthetic -- properties), including Bezier, B-spline, and NURBS. One way of describing these patches is to use (normally cubic) polynomial equations, and many implementations make this view foremost to the user. However an alternative viewpoint is to treat the patches as objects that are combined using a small number of operations to make arbitrarily complex objects. The aim of this project would be to design and construct an object-oriented software library for constructing and using splines in this form. The system would include a demonstration testbed (probably written in C++ or Java) that would allow splines curved to be traced out, combined to form one or more surface patches, and displayed in a graphics window. Experience of graphics or splines would be an advantage, and with object-oriented design techniques a necessity. |
|
Robot Path Planning | Stephen Cameron | B | A robot cannot find its way by looking at a map designed for humans; rather, it must make a complex series of geometric calculations. Many algorithms have been developed for robot path planning, and the purpose of this project is to investigate (and improve upon) some of these. The `robot' might be an arbitrary convex polygon that can slide in a world composed of polygonal obstacles; an industrial robot arm; or a human trying to find their way across a strange city. Which algorithm(s) you investigate is largely up to you: they range from pragmatic, `suck-it-and-see' methods to mathematically-complicated methods that are guaranteed to succeed (after a lot of CPU time!). There is also considerable scope to think of novel robots. For example, previous groups have considered:
|
|
Robot Sheepdog | Stephen Cameron | C | In 1998 a consortium consisting of the Computing Laboratory, Silsoe Research Institute, and the Universities of Leeds and Bristol, successfully demonstrated a robotic device that was capable of herding ducks, and thus demonstrated the efficacy of a mathematical model of flocking behaviour. In order to carry on this work (in a way that avoids having to muck out real animals!) we are repeating the work but this time with robot ducks. Currently a group of undergraduate project students are building the first prototype devices, which should be commissioned by Easter. There is then scope for one or more MSc projects, using either these devices, or testing new devices, or just experimenting with new control regimes under simulation to improve the capabilities of the system. This project might suit students with an engineering bent, or an inclination towards AI or artificial life. |
|
Robot Soccer Simulation | Stephen Cameron | B | For several years now FIRA (Federation of International Robot-Soccer Association) has promoted the development of robot soccer, with the stated goal of producing a team of robots that can play (and beat!) humans by the year 2050. Their main events are tournaments in which research teams play each other using small wheeled robots, but more recently they have introduced a simulation league (`Simurosot'), in which a simulation server program is provided as the `playing surface'. The programmers then construct their teams as client programs, and two such teams can then play each other with the server acting a referee and providing an overhead view of the play. The key to producing a (simulated) robot team is deciding how each player will move in response to (a) its position relative to the ball and goals, and (b) the positions of the other players --- this information is provided to both teams by the server on each cycle. Robot soccer is in its infancy, and most play is laughable; in order to produce interesting play it is necessary to encode {\em strategy} and {\em tactics}, and so it is expected that most of the effort in this project would focus on providing such a capability. Apart from that, the project is open-ended; a really interesting (but difficult!) extension of this project would have your team capable of learning about your opponent's tactics, and reacting to them! Requirements No physical robots need be touched to do this project. The FIRA server as supplied runs on PCs under Windows, and similar servers are available for Linux. Team strategy can be encoded in a standard language (e.g., C), or using a language that is designed for describing agents. This project will be more fun if multiple teams are constructed, so that they can play each other! Links I've constructed a web-page for robot soccer links at: http://www.comlab.ox.ac.uk/~cameron/soccer/, which includes links to the FIRA site and the basic rules of the game. |
|
Three Dimensional Sculptured Surface Graphics | Stephen Cameron | B | The aim of this project is to produce a system on which it is possible to design and render (display) a complicated surface, of the sort used in many graphics systems. These surfaces are defined by a number of points in space, with the surface itself fitting smoothly between these points. Thus the required system must make it easy for the user to define these points. This can be done by starting with a regular array of points in a plane, which the user can tweak using the computer's mouse. To be useful as a design tool the system should be totally interactive: the user should be able to use the windowing system to view the grid of control points from any viewing angle, using icons to control the interaction. When the user is satisfied with the set of control points he should be able to see a display of the smooth surface itself. Several variations of this project are possible, depending on the way that the graphical user interface is constructed, the types of surfaces considered, and the degree of realism in the display of the surface. The system could be written on some machine that you have easy access to (e.g., a PC), or using a Sun (in which case it would probably have to be written in C or Java to make full use of the Sun graphics) |
|
Quantum Computing and Quantum Information, Logic, Category Theory, Fundamental Physics | Bob Coecke | B C | Bob Coecke is willing to supervise projects in these areas. Please feel free to contact him. - Quantum computing
and quantum information |
|
Software development: quantomatic, automated graphical reasoning tools | Bob Coecke | B C | Recent graph-based formalisms for computation provide an abstract and symbolic way to represent and simulate quantum information processing. Manual manipulation of such graphs is slow and error prone. This project employs a formalism, based on monoidal categories, that supports mechanised reasoning with open-graphs. This gives a compositional account of graph rewriting that preserves the underlying categorical semantics. |
|
Kinect interface development for cybersecurity visual analytics tool | Sadie Creese | B C | (Joint with Michael Goldsmith) Kinect interface development for cybersecurity visual analytics tool: using the open development environment for Kinect, to develop an alternative HCI for the Oxford CyberVis tool http://www.cs.ox.ac.uk/projects/cybervis/ (focusing on navigation of the CyberVis environment). As CyberVIs is primarily developed in JAVA a good understanding that language is essential in order to interface Kinect to CyberVis. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel human-computer interfaces for security possible, depending on interest and inspiration.
|
|
Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs | Sadie Creese | B C | (Joint with Michael Goldsmith) Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue. |
|
Smartphone security | Sadie Creese | B C | (Joint with Michael Goldsmith) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); uses skills in formal specification, understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc, Concurrency, Concurrent Programming, Computer Security all possibly an advantage. Other projects within this domain possible, according to interest and inspiration.
Prerequisites:Concurrency, Concurrent Programming, Computer Security all possibly an advantage. |
|
Technology-layer social networks | Sadie Creese | C | (joint with Michael Goldsmith) Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata - which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc. |
|
Exploiting background knowledge in ontology matching | Bernardo Cuenca Grau | B C | LogMap is a highly optimised ontology matching system developed by members of the Knowledge Representation and Reasoning Group with EPSRC funding (see http://www.cs.ox.ac.uk/projects/LogMap/index.html for details). The NCBO Bioportal (http://bioportal.bioontology.org) is a large repository of bio-medical ontologies and cross-references between them. The goal of this project is to exploit the information provided by BioPortal as background knowledge to improve ontology matching results in LogMap. Main tasks associated to the project: - Understanding how other matching tools (such as AgrMaker and GOMMA) exploit BioPortal to improve their results. - Understanding communication with Bioportal web services to query ontologies, mappings and other resources. - Understanding the LogMap's algorithm and devising a suitable way to communicate with BioPortal without harming scalability. - Implementing BioPortal access in LogMap. - Comparison with other matching tools. |
|
Interactive ontology matching in LogMap | Bernardo Cuenca Grau | B C | LogMap is a highly optimised ontology matching system developed by members of the Knowledge Representation and Reasoning Group with EPSRC funding (see http://www.cs.ox.ac.uk/projects/LogMap/index.html for details). In applications requiring very accurate mappings, user intervention during the matching process becomes essential. LogMap implements basic features allowing domain experts to interactively contribute to the matching process. These features are, however,still rather rudimentary. The main goal of this project is to extend and improve LogMap's support for interactive ontology matching. From an algorithmic point of view, the main challenge is to limit to a minimum the number of questions that need to be asked to a human expert, while maximising the benefit of experts' feedback. From an application development point of view, the main goal is to develop a graphical front-end for LogMap that facilitates user interaction. |
|
Clean up your Extraction: Deduplication for Data Extraction | Tim Furche | B C | Deduplication is the problem of identifying and removing duplicates in data. Though well studied in data integration and cleaning, nowadays data is increasingly extracted automatically from web sources. This yields noisy data that is not easily treated with existing deduplication algorithms. However, the extraction process can often supply strong clues on whether two data items are the same, e.g., the URL of a canonical page describing that item. In this project, we will investigate deduplication in the context of data extraction. We will consider rst deduplication within a site, both in presence and absence of unique item URLs. Second, we consider deduplication in result extracted from di erent sites with the aim to align record structures from both pages where possible to nd duplicates. For this project some knowledge of databases and data cleaning is useful. The project should be driven by extensive evaluation. |
|
Docu-Research. Object Annotation and Visualization for PDF Documents | Tim Furche | B C | (Joint supervisor with G Orsi) PDF is the de-facto standard for unstructured documents on the web. Being able to e_ectively analyse, annotate and manipulate PDF documents is a challenging area of research and a pro_table commercial opportunity. In particular, current PDF processing tools (e.g., Adobe Acrobat) allow users to search the full content of the PDF but not to locate interesting objects in it (e.g., what is the contribution of this paper? Is there any budget in this project proposal?). Two particularly challenging aspects of this problem are the annotation and the visualization of such objects. This project is concerned with the semantic annotation of structured object in scienti_c PDF documents (e.g., conference-papers, journals, and project proposals) as well as their e_ective visualization. Good Knowledge of Java is required. Knowledge about (semantic-)web technology and natural- language processing is preferred. |
|
Form Corpus & Benchmark | Tim Furche | B C | (Supervisor C Schallhart) Web pages are the past since interactive web application interfaces have reshaped the online world. With all their feature richness, they enrich our personal online experience and provide some great new challenges for research. In particular, forms became much complex in assisting the user during the _lling, e.g., with completion options, or through structuring the form _lling process by dynam-ically enabling or hiding form elements. Such forms are a very interesting research topic but their complexity prevented so far the establishment of a corpus of modern forms to benchmark di_erent tools dealing with forms automatically. This MSC project will _ll this gap in building a corpus of such forms: Based on a number of production sites from one or two domains, we will build our corpus of web interfaces, connected to a (shared) database. Not only will the future evaluations in the DIADEM project rely on this corpus, but we will also publish the corpus { promoting it as a common benchmark for the research community working on forms. Knowledge in Java, HTML, CSS, Javascript, and web application development are required. |
|
Form Filling & Probing | Tim Furche | B C | (Joint with C Schallhart) Unearthing the knowledge hidden in queryable web sites requires a good understanding of the involved forms. As part of DIADEM, we are developing OPAL (Ontology based web Pattern Analysis with Logic), a tool to recognize forms belonging to a parameterizable application domain, such as the real estate or used car market. OPAL determines the meaning of individual form elements, e.g., it identi_es the _eld for the minimum or maximum price or for some location. This MSC project will build upon OPAL to not only deal with static forms but also with sequences of interrelated forms, as in case of a rough initial form, followed by a re_nement form, or in case of forms showing some options only after _lling some other parts. Over the course of this MSC project, we will develop a tool which invokes OPAL to analyze a given form, to explore all available submission mechanisms on this form, analyze the resulting pages for forms continuing the initial query, and to combine the outcome all found forms into a single interaction description. Knowledge in Java, HTML, CSS are required, prior experience in logic programming would be a strong plus. |
|
Fully-visual Data Extraction from Result Pages | Tim Furche | B C | (Joint superision with G Grasso and C Schallhart) In DIADEM (diadem-project.info), we perform automatic data extraction from result pages exploiting domain annotations to bootstrap the detection of repeated structures (data records) on the page. In our current prototype, the similarity analysis relies on the page structure (DOM), without incorporating visual clues, such as style, layout. As a complementary method, this MSC thesis aims at investigating new approaches to automatic data record extraction from template pages in a given domain, employing only visual properties and domain annotations. Starting from annotations on the page, identifying e.g., prices, product names, or locations, we want to recognize data records by visual similarity, discovered in geometric relations, such as alignment or box dimension, and in style information, such as font, color, or other CSS properties. Most existing approaches to visual data extraction di_er from our goal as they either combine visual information with the DOM structure, or because their extraction process is not guided by domain annotations at all. The outcome of the thesis is a method for visual data record recognition, implemented in a prototype, ready to work with domain annotations as parameter. This project requires an intensive preliminary study of related work in wed data extraction. The student will be working with existing tools developed as part of the DIADEM framework, such as APIs for browser interaction and text annotations. The development language is Java, knowledge of HTML and CSS is highly recommended, with Javascript and XPath as a plus. |
|
Good Wrappers are Lazy Wrappers: Avoiding Page Rendering | Tim Furche | B C | The wealth of information on the web is exceeding any other human-created source of information by orders of magnitude. Harnessing that data, however, requires increasingly automated methods that can collect relevant data from many websites and lter it,e .g., for further human consumption. Automatizing such tasks, however, increasingly requires rendering all web pages involved, as scripted interfaces only work if the page is properly rendered and wrappers increasingly use visual features for robust extraction. However, page rendering comes at a signi cant cost, e.g., in OXPath it dominates OXPath evaluation by a wide margin. Therefore, we investigate in this project how to automatically detect when a page accessed by an (OXPath) wrapper needs to be rendered. We aim to do so by a combination of static analysis of the expression and statistics gathered from previous runs. This project is particularly suited for students interested in browser technologies and their use in data extraction. Familiarity with Javascript and the DOM API would be helpful. Since 1 OXPath is implemented in Java and the method developed will be evaluated by implementation as an extension to OXPath, extensive knowledge of Java is needed. |
|
Mine4Nuggets. Searching for objects into images | Tim Furche | B C | (Joint supervisior with G Orsi) Most of the information on commercial websites (e.g., Amazon, Rightmove, and Autotrader) is rep-resented by (semi-)structured pages where the interesting data is structured into de_ned structured or visual patterns. However, some critical information is often represented with an image or a chart. As an example, on many real-estate websites, data about the energy e_ciency of the property is represented by an EPC Chart (http://www.epcchart.com/), while the actual size of the rooms is represented through a oor-plan http://en.wikipedia.org/wiki/Floor_plan). Have you ever tried to search for an energy-e_cient property or one with a large bedroom? The answer is: you can't. This project is concerned with the study of semantic image analysis techniques that can be used to locate and extract interesting objects from pictures and charts on the web. The goal is to enhance current search technology with object-search capabilities for images, a _eld that o_ers challenging research and potential commercial opportunities. The project will be conducted within the DIADEM project (http://diadem.cs.ox.ac.uk). Good knowledge of Java is required, previous knowledge about image analysis, computer vision and OCR technology is preferred. |
|
OXPath meets Javascript | Tim Furche | B C | (Joint with G Grasso) OXPath (http://diadem.cs.ox.ac.uk/OXPath/) is a wrapper language developed at Oxford in the DIADEM (diadem-project.info) project, extending standard XPath. It is particularly well suited for extraction from rich internet applications with sophisticated client-side interfaces, as well as to facilitate web automation and web crawling. OXPath has already been applied in several projects by researchers and practitioners, and it is available as a Java API (open source) interfacing with a real web browser. In this project we aim at providing a native Javascript implementation of OXPath. Nowadays Javascript has gained increasing interest and popularity thanks to the availability of prominent engines (e.g., Google V8 Javascript Engine http://code.google.com/p/ v8/) and open source frameworks (e.g, Node.js http://nodejs.org/), that allow highly scalable server-side Javascript execution. The outcome of this thesis is the implementation in Javascript of various fragments of OXPath characterized by increasing complexity. This will involve, for instance, the realization of a parser, and caching/memoization structures supporting the evaluation of OXPath. This MSC projects provides an opportunity to familiarize yourself with web frameworks and languages for data extraction such as XPath, OXPath, Javascript, and Node.js. Some knowledge of these languages would be desirable. Knowledge of Unix is highly recommended. |
|
One Path to Find them All: Learning OXPath Wrappers | Tim Furche | B C | The wealth of information on the web is exceeding any other human-created source of information by orders of magnitude. Harnessing that data, however, requires increasingly automated methods that can collect relevant data from many websites and lter it,e .g., for further human consumption. OXPath (http://diadem.cs.ox.ac.uk/OXPath/) is a wrapper language developed at Oxford in the DIADEM (diadem-project.info) project. It is particularly well suited for extraction from rich internet applications with sophisticated client-side interfaces. Manually creating such wrappers, however, still remains a daunting task. In this project, we therefore aim to develop a method for learning OXPath wrappers from sets of annotated examples, that indicate, e.g., what data to extract or which actions to perform. The challenge in such a learning algorithm lies in the need to abstract from multiple examples to a single, compact OXPath expression that should be close to optimal under a set of additional criteria. Though there is a signi cant body of literature on learning XPath expressions, OXPath is a richer language and introduces new challenges. We also speci cally aim at extensive experimental validation of the developed techniques. This project is particularly suited for students interested in modern web technology and familiar with compiler design or query optimization. The learner will be implemented in Java, where we provide a set of customized APIs for action execution and DOM interaction. |
|
Online Shop Assistant | Tim Furche | B C | (Supervisor G Grasso) We want to develop an innovative on-line service to help users in their shopping: After specifying a shopping list of products, the system should identify the best shops to buy them from, ranked by di_erent criteria. For instance, the system might search for the least expensive supermarket or nearest grocery shop selling a set of products. The student will build upon the data extraction facilities available in DIADEM (diadem-project.info), in particular OXPath (http://diadem.cs.ox.ac.uk/OXPath/), a wrapper language facilitating web data extraction and automatizing web interactions. For this reason, this project requires a preliminary study of languages like XPath and OXPath and their semantics. The project outcome is a web portal with a modern layout, supporting at least user registration, shopping list speci_cation, and two optimization scenarios, such as best price and closest location (more features if time permits). Knowledge of Java is required, previous experiences in web development and databases are recommended. |
|
Kinect interface development for cybersecurity visual analytics tool | Michael Goldsmith | B C | (Joint with Sadie Creese) Kinect interface development for cybersecurity visual analytics tool: using the open development environment for Kinect, to develop an alternative HCI for the Oxford CyberVis tool http://www.cs.ox.ac.uk/projects/cybervis/ (focusing on navigation of the CyberVis environment). As CyberVIs is primarily developed in JAVA a good understanding that language is essential in order to interface Kinect to CyberVis. Suitable for 3rd or 4th year undergraduates, or MSc. Other projects on novel human-computer interfaces for security possible, depending on interest and inspiration. |
|
Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs | Michael Goldsmith | B C | (Joint with Sadie Creese) Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue. |
|
Smartphone security | Michael Goldsmith | B C | (Joint with Sadie Creese) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); uses skills in formal specification, understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc, Concurrency, Concurrent Programming, Computer Security all possibly an advantage. Other projects within this domain possible, according to interest and inspiration. Prerequisites:Concurrency, Concurrent Programming, Computer Security all possibly an advantage. |
|
Technology-layer social networks | Michael Goldsmith | C | (Joint with Sadie Creese) Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata - which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc. |
|
Classifying and Benchmarking Web Automation Tools | Georg Gottlob | B C | Background: The work will be done in the context of the large ERC project DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology whose goal is to automate web data extraction in specific application domains such as real estate, restaurants, and so on. Principal goal of the MSc or Honour School project: This project consists in a case-study on web automation tasks in order to validate the ease-of-use of existing tools of
web automation such as OXPath and ChickenFoot. Skills Needed: This project requires good analytic and theoretical skills. Also, knowledge of XPath and Java is preferred. Supervision: This project will be co-supervised by Dr. Tim Furche |
|
Domain-Based Structural Web Page Analysis and Data Extraction | Georg Gottlob | B C |
Background: There are several projects available. The work will be done in the context of the large ERC project DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology whose goal is to automate web data extraction in specific application domains such as real estate, restaurants, and so on. Ultimately, we want to construct a system that is able to automatically navigate Web pages within a given application domain and extract relevant data from that pages. For example a system for the real-estate domain should accept as input an URL of an estate agent and output all properties (houses or flats) that are currently advertised by this agent to be for sale. The output should be a highly structured XML file obeying a certain pre-defined schema. Principal goal of the MSc or Honour School project: Contribute to the DIADEM project by providing useful studies and building blocks. Examples of such studies or building blocks are:
Note: The DIADEM project runs from 2010 to 1015. Every year only a few MSc or Honour School projects will be available. Some will be co-supervised by DIADEM research staff. Skills Needed: Each of these projects requires good analytic and theoretical skills, software engineering skills, knowledge of web programming and web data processing, good knowledge of Java and Eclipse. |
|
Web Form Probing | Georg Gottlob | B C | Background: The work will be done in the context of the large ERC project DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology whose goal is to automate web data extraction in specific application domains such as real estate, restaurants, and so on. Principal goal of the MSc or Honour School project: A relevant task in Deep Web data extraction system is the so-called Form-Probing. Roughly, given a web form where each
field is annotated with some concept, probing a form involves tasks such as: Skills Needed: This project requires good theoretical, analytic and software engineering skills. Knowledge of Java and web-technologies is also required. Supervision: This project will be co-supervised by Dr. Tim Furche. |
|
Bioinformatics Projects | Jotun Hein | B C | Bioinformatics Projects can be found here. If you choose any of these projects you must find a joint supervisor from the Computing Laboratory. Pete Jeavons can advise on a suitable choice. |
|
Compilation of a CSP-like language | Geraint Jones | B C | This is a compiler project, also requiring familiarity with concurrency. The parallel programming language occam is essentially an implementable sublanguage of CSP. The aim of this project is to produce a small portable implementation of a subset of occam; the proposed technique is to implement a virtual machine based on the inmos transputer, and a compiler which targets that language. One of the aims of the project is to implement an extension to occam which permits recursion; more ambitious projects might implement a distributed implementation with several communicating copies of the virtual machine. Other possibilities are to produce separate virtual machines, optimised for displaying a simulation, or for efficiency of implementation, or translating the virtual machine code into native code for a real machine. |
|
Logic circuit workbench | Geraint Jones | B C | This is an interactive programming project with some logic simulation behind it. The idea is to produce a simulator for a traditional logic breadboard. The user should be able to construct a logic circuit by drawing components from a library of standard gates and latches and so on, and connecting them together in allowable ways. It should then be possible to simulate the behaviour of the circuit, to control its inputs in various ways, and to display its outputs and the values of internal signals in various ways. The simulator should be able to enforce design rules (such as those about not connecting standard outputs together, or limiting fan-out) but should also cope with partially completed circuits; it might be able to implement circuits described in terms of replicated sub-circuits; it should also be able to some sort of standard netlist. It might be that this would make a project for two undergraduates: one implementing the interface, the other implementing a simulator that runs the logic. |
|
Mosaic player | Geraint Jones | B | This is an interactive graphics programming project. This is an idea based on a child's toy, consisting of a collection of acrylic pieces. There are several colours, and two shapes: isosceles right-angled triangles, and rhombuses with 45 degree and 135 degree angles and sides to match the equal sides of the triangles. The player arranges (all) the pieces in one of the many ways that produce a square of the right size, with a pattern that has several (reflectional or rotational) symmetries. The project is to design and build a small program that can be used to simulate the play: perhaps the user specifies symmetries, and then places a single piece which is replicated in several places as required by the symmetry. There is a small subtlety in that the target square has a side which is an irrational number of the unit length (edge of the rhombus), so there is no integer grid on whcih the pieces lie. (It is very easy to make a mistake with the real toy which almost produces a square of the wrong size.) |
|
Toys for Animating Mathematics | Geraint Jones | B C | The aim is to take some mathematics that would be within the grasp of a mathematically-inclined sixth-former and turn it into some attention-grabbing web pages. Ideally this reveals a connection with computing science. I imagine that this project necessarily involves some sort of animation, and I have visions of Open University television maths lectures. The programming need not be the most important part of this project, though, because some of the work is in choosing a topic and designing problems and puzzles and the like around it. There's a lot of this sort of thing about, though, so it would be necessary to be a bit original. Think of GeomLab (and then perhaps think of something a little less ambitious). It might involve logic and proof, it might be about sequences and series, it might be about graphs, it might be about the mathematics of cryptography... you might have something else in mind. |
|
A steganography app for facebook | Andrew Ker | B C | Steganography means hiding a hidden payload within an apparently-innocent cover, usually an item of digital media. Facebook is the ideal platform for the transmission of payload-carrying images, since images are so commonly uploaded and shared. And facebook does not permit encrypted communications between users, who might seek an alternative way to preserve their privacy. This project is to develop a steganography app for facebook, which allows messages to be sent and received inside pictures. Prerequisites:The only prerequisite is to understand how apps are developed on facebook. Be warned: the supervisor knows nothing of the facebook platform, and the student is largely on their own when it comes to the programming. The supervisor will simply give advice about what steganography functionality should be implemented. |
|
Intelligent combination of steganalysis features | Andrew Ker | C | Steganography means hiding a hidden payload within an apparently-innocent cover, usually an item of digital media. Steganalysis is the art of detecting that such hiding took place. The most effective ways to detect steganography are machine learning algorithms applied to "features" extracted from the media involved, trained on massive sets of known cover and stego objects. But, unlike in traditional machine learning, we can analyse (or simulate) the effect of embedding on features to a high level of detail. It turns out that some features are "equivalent" in that the same thing happens to them under embedding, in which case it is more efficient to combine them. This project is to formulate a proper definition of which features should be combined, and implement software to do so. The combined features can be benchmarked against the standard features using machine learning tools. Prerequisites:Discrete maths, linear algebra, machine learning |
|
mp3 steganography | Andrew Ker | B C | Steganography means hiding a hidden payload within an apparently-innocent cover, usually an item of digital media. There is lots of literature on hiding, and detection of hiding, in images, but rather little in audio. But mp3 audio is widely transmitted on file-sharing networks and thus provides an excellent cover. One slightly old-fashioned example of mp3 steganography can be found here http://www.petitcolas.net/fabien/steganography/mp3stego/. This project is to explore and develop mp3 steganography, adapting modern image hiding techniques to the audio format and implementing hiding and extraction programs. Prerequisites:No course prerequisites, but this project is only suitable for students who are already familiar with the details of mp3 format and encoding, for example those who have worked on open-source audio codecs. It is NOT possible to learn the format within the time available. |
|
Projects being offered by D Kroening | Daniel Kroening | B C | Daniel Kroening is happy to supervise projects related to
For a list of sample projects, see here.
|
|
Projects being offered by Marta Kwiatkowska | Marta Kwiatkowska | B C | I am happy to supervise the following projects: Model checking for DNA Software verification for sensor networks Model checking for automated negotiation protocols
Please contact Marta.Kwiatkowska@comlab.ox.ac.uk if you are interested |
|
Projects offered by Marta Kwiatkowska | Marta Kwiatkowska | B C | I have a list of projects which were offered to MSc students: http://www.cs.ox.ac.uk/people/marta.kwiatkowska/teaching.html I am happy to offer them to u/grad students, generally in the area of: Probabilistic model checking Model checking for DNA computation Model checking for wireless sensor networks |
|
Analysis of Security Protocols | Gavin Lowe | B C | A security protocol consists of an exchange of messages between two or more agents, with goals such as establishing a cryptographic key, or authenticating the identities of the agents. These protocols are designed to operate in particularly hostile environments, where an adversary or intruder may be trying to attack the protocol, for example to learn the value of a key. Designing secure protocols has proven to be remarkably difficult; in some cases, attacks have been discovered several years after the protocol was first suggested. We have developed a systematic technique for analysing these protocols. Briefly the technique is as follows:
A recent extension of these ideas has been to the analysis of layered protocols, where a special-purpose, application-specific
protocol is layered Prerequisites:The Concurrency and Computer Security courses would be prerequisites for this project. References
|
|
Compiling CSO to CSP | Gavin Lowe | B C | The aim of this project would be to build a compiler (or compiler plug-in) to translate Scala code written using the CSO library into corresponding CSP models. It would also be useful to develop techniques for automating analysis of various desirable properties, such as freedom from race conditions, and to develop techniques for abstracting some aspects of the CSP models so as to make model checking feasible. Prerequisites:The Concurrency, Concurrent Programming and Compilers courses would be prerequisites. |
|
Implementing a CSP Process Animator | Gavin Lowe | B C | (Joint supervision with Tom Gibson-Robinson) Probe [1] is a tool that allows CSP [2] processes to be visualised and manually explored. The goal of this project would be to build a new version of Probe that is significantly more user-friendly whilst supporting additional debug functionality and alternative display formats. This would be built on top of the libcspm Haskell library [3] that provides a parser, typechecker and evaluator for machine CSP. Prerequisites for this would be strong functional programming skills (e.g. via the Functional Programming course) and a good understand of CSP (e.g. via the Concurrency course). Prerequisites:Prerequisites for this would be strong functional programming skills References:[1] Probe, Formal Systems, http://www.fsel.com/. |
|
Secure Channels in CSO | Gavin Lowe | B | CSO [1] is a concurrency and communication library for the Scala programming language. The aim of this project would be to extend CSO with channels that allow secure communication between processes. The channels should implement some of the specifications in [2]. The Concurrent Programming and Computer Security courses would be prerequisites. Prerequisites:The Concurrent Programming and Computer Security courses would be prerequisites. References:
|
|
Inconsistency-Tolerant Query Answering in the Semantic Web | Thomas Lukasiewicz | B C | Background:The next revolution in Web search as one of the key technologies of the Web has just started with the incorporation of ideas from the Semantic Web, aiming at transforming current Web search into some form of semantic search and query answering. Next-generation technologies for semantic search and query answering on the Web can be realized via knowledge bases extracted from the Web relative to an underlying ontology. The realization of such knowledge bases requires a principled and sensible way of managing conflicting information. Though inconsistency has been addressed for many years now in artificial intelligence and database theory, the search for flexible and computationally efficient inconsistency-tolerant semantics is a very active research topic. Potential Projects:● Characterization of inconsistency and repairing for different fragments of the Datalog+/-family of ontology languages. ● Study of desirable properties of repairs and development of adequate preference relations among repairs. ● Development of alternative inconsistency-tolerant semantics for query answering in different fragments of Datalog+/-: analysis of the trade-off between quality of answers and computation cost. ● Argumentation is one of the most well-known and capable systems for inconsistency management. The study of argumentation aspects and frameworks for different fragments of the Datalog+/- family of ontology languages as part of the above points. ● Implementation of repairing and inconsistency-tolerant query answering techniques on fragments and extensions of Datalog+/-. Prerequisites:Participation in all aspects of the project require good analytical skills and background in knowledge representation and reasoning (in particular, first-order logic and logic programming). For the theoretical aspects, background in computational complexity and database theory would be ideal. The practical (implementation-oriented) parts require software engineering skills, knowledge of web programming to develop web interfaces, as well as good knowledge of Java and Eclipse. Supervision:The projects based on implementation will be supervised by Maria Vanina Martinez, while the ones based on theoretical aspects will be co-supervised by Thomas Lukasiewicz and Maria Vanina Martinez. |
|
Ranking and Top-k Query Answering under Uncertainty in the Semantic Web | Thomas Lukasiewicz | B C | Background:The next revolution in Web search as one of the key technologies of the Web has just started with the incorporation of ideas from the Semantic Web, aiming at transforming current Web search into some form of semantic search and query answering. Next-generation technologies for semantic search and query answering on the Web can be realized via knowledge bases extracted from the Web relative to an underlying ontology. Such knowledge bases require probabilistic data models, along with scalable query answering algorithms, which are both still major unsolved problems. They may be based on integrating (i) ontology languages, (ii) database technologies, and (iii) formalisms for managing probabilistic uncertainty. Potential Projects:● Characterization of ranking for non-probabilistic ontologies via explicitly defined user preferences. Key questions involve how to represent the user preferences, how to automatically obtain them (e.g., from the Web), and how to use them for ranking answers to ontological queries over knowledge bases. ● Extension of the above ranking techniques to probabilistic ontologies, i.e., combining the relevance of an answer with the probability that it is in fact true. ● Characterization of top-k query answering based on user preferences, and extension to probabilistic ontologies. ● Implementation of ranking and top-k querying on fragments of probabilistic Datalog+/-. ● Adaptation of sampling techniques for efficient query answering in probabilistic ontologies. ● Experimental evaluation of implementations over both synthetic and real-world data. Prerequisites:Participation in all aspects of the project require good analytical skills, and background in knowledge representation and reasoning (in particular, first-order logic and logic programming). For the theoretical aspects, background in computational complexity and database theory would be ideal. The practical (implementation-oriented) parts require software engineering skills, knowledge of Web programming to develop Web interfaces, as well as good knowledge of Java and Eclipse. Supervision:The projects based on implementation will be supervised by Gerardo Simari, while the ones based on theoretical aspects will be co-supervised by Thomas Lukasiewicz and Gerardo Simari. |
|
Semantic Web Search Based on Ontological Conjunctive Queries | Thomas Lukasiewicz | B C | Many experts predict that the next huge step forward in Web information technology will be achieved by adding semantics to Web data, and will possibly consist of (some form of) the Semantic Web. The project is based on a novel approach to Semantic Web search, called Serene, which allows for a semantic processing of Web search queries, and for evaluating complex Web search queries that involve reasoning over the Web. The goal of the project is to develop a Semantic Web search system on top of Serene for a collection of Web pages relative to an underlying ontology. |
|
Design Driven Acquisition of Bio-experiments | Eamonn Maguire | B C | The Sansone Group at the Oxford e-Research Centre offers several student projects throughout the year, focused on developing and applying new methods and tools for improving management, curation and sharing of data in life science. Supervisors: Susanna-Assunta Sansone; Philippe Rocca-Serra and Eamonn Maguire.
Information about this project can be found at: http://www.oerc.ox.ac.uk/research/sansone/projects/student-projects#ontology-acquisition |
|
Graph Based Bio-experiments Store | Eamonn Maguire | B C | The Sansone Group at the Oxford e-Research Centre offers several student projects throughout the year, focused on developing and applying new methods and tools for improving management, curation and sharing of data in life science. Supervisors: Susanna-Assunta Sansone; Philippe Rocca-Serra and Eamonn Maguire.
Information about this project can be found at: http://www.oerc.ox.ac.uk/research/sansone/projects/student-projects#graph |
|
Ontology Based Validation of Bio-experimental Records | Eamonn Maguire | B C | (Joint Supervisor with Ian Horrocks)
The Sansone Group at the Oxford e-Research Centre offers several student projects throughout the year, focused on developing and applying new methods and tools for improving management, curation and sharing of data in life science. Supervisors: Susanna-Assunta Sansone; Philippe Rocca-Serra and Eamonn Maguire.
Information about this project can be found at: http://www.oerc.ox.ac.uk/research/sansone/projects/student-projects#validation |
|
An Electronic Commerce Protocol | Hanno Nickau | B C | Commercial use of the Internet is becoming more and more common, with an increasing variety of goods becoming available for purchase over the Net. Clearly, we want such purchases to be carried out securely: a customer wants to be sure of what (s)he's buying and the price (s)he's paying; the merchant wants to be sure of receiving payment; both sides want to end up with evidence of the transaction, in case the other side denies it took place; the act of purchase should not leak secrets, such as credit card details, to an eavesdropper. The aim of this project is to find out more about the protocols that are used for electronic commerce, and to implement a simple e-commerce protocol. In more detail: Understand the requirements of e-commerce protocols; Specify an e-commerce protocol, both in terms of its functional and security requirements;
A variant of this project would be to implement a protocol for voting on the web (which would have a different set of security properties). Prerequisites for this project include good program design and implementation skills, including some experience of object-oriented programming, and a willingness to learn about protocols and cryptography. The courses on concurrency and distributed systems provide useful background for this project. 1 Jonathan Knudsen, Java Cryptography, O'Reilly, 1998. |
|
Data mining | Dan Olteanu | B C | Please see Dr Dan Olteanu's web page (http://www.cs.ox.ac.uk/people/dan.olteanu.html) for further details |
|
Factorised Databases | Dan Olteanu | B C | More details can be found at (http://www.cs.ox.ac.uk/people/dan.olteanu.html) and Dr Olteanu would be happy to discuss specific projects within the aforementioned topics with interested students.
|
|
Probabilistic databases (MayBMS, SPROUT) | Dan Olteanu | B C | Please see Dr Dan Olteanu's web page (http://www.cs.ox.ac.uk/people/dan.olteanu.html) for further details |
|
Scalable and declarative machine learning | Dan Olteanu | B C | More details can be found at (http://www.cs.ox.ac.uk/people/dan.olteanu.html) and Dr Olteanu would be happy to discuss specific projects within the aforementioned topics with interested students. |
|
Uncertain Database Management Systems | Dan Olteanu | B C | Today, uncertainty is commonplace in data management scenarios dealing with data integration, sensor readings, information extraction from unstructured sources, or whenever information is manually entered and is therefore prone to inaccuracy or partiality. In these scenarios, uncertainty arises from the existence of alternatives for mapping schemas of different sources or for possible non-identical record duplicates, different interpretations of sensor data, multiple extraction possibilities from unstructured data, or several possible readings of manually filled forms respectively. To accommodate uncertainty, the current data management technology should pursue a paradigm shift from deterministic to possible worlds semantics and address the basic data management problems in the new context. Projects on this topic should offer support for this paradigm shift by investigating some of the following directions
All aforementioned directions can lead to both theoretical and practical (implementation-oriented) projects. Anyone interested in doing a project in one of these topics is encouraged to get in touch with Dan Olteanu to explore specific ideas, such as
Prerequisites: All projects within this framework require prior exposure to databases, though some projects may only require knowledge of either database theory or database systems. In the latter case, strong C/C++ skills are essential (Proficiency in any other general-purpose programming language is in any case an important start). Students with very good marks in the Database Systems Implementation course are preferred. Suggested reading
|
|
Branching Temporal Logics, Automata and Games | Luke Ong | B C | Model checking has emerged as a powerful method for the formal verification of programs. Temporal logics such as CTL (computational tree logic) and CTL* are widely used to specify programs because they are expressive and easy to understand. Given an abstract model of a program, a model checker (which typically implements the acceptance problem for a class of automata) verifies whether the model meets a given specification. A conceptually attractive method for solving the model checking problem is by reducing it to the solution of (a suitable subclass of) parity games. These are a type of two player infinite game played on a finite graph. The project concerns the connexions between the temporal logics CTL and / or CTL*, automata, and games. Some of the following directions may be explored. 1. Representing CTL / CTL* as classes of alternating tree automata. 2. Inter-translation between CTL / CTL* and classes of alternating tree automata 3. Using B¨uchi games and other subclasses of parity games to analyse the CTL / CTL* model checking problem 4. Efficient implementation of model checking algorithms 5. Application of the model checker to higher-order model checking. References:Orna Kupferman, Moshe Y. Vardi, Pierre Wolper: An automata-theoretic approach to branchingtime model checking. J. ACM 47(2):
312-360 (2000). Rachel Bailey: A Comparative Study of Alorithmics for Solving B¨uchi Games. University of Oxford MSc Dissertation,
2010. |
|
Luke Ong's Projects | Luke Ong | B C | See http://users.comlab.ox.ac.uk/luke.ong/projects/ for more information |
|
Machine Learning | Vasile Palade | B C | Dr. Vasile Palade (vasile.palade@cs.ox.ac.uk) is willing to supervise MSc projects in machine learning, ranging from more theoretical approaches to practical applications of machine learning. A preference will be given for projects on neural networks, support vector machines, evolutionary computation and fuzzy techniques applications to problems from the Bioinfomatics area, web usage mining, financial modeling, but other options will be considered too (e.g., character and image recognition). Please feel free to contact him and discuss about possible projects in these areas. |
|
Graphics pipeline animator | Joe Pitt-Francis | B C | Pre-requisites: Computer graphics, Object-oriented programming The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then she moves the slider down the pipeline, watching an animation of the process |
|
Ellipsis Interpretation - Not available 2011-12 | Stephen Pulman | B C | Sentences like: `John likes Chopin, but Mary doesn't', or `John likes films, but not plays' contain ellipsis - `missing' components that have to be filled in from earlier parts of the sentence: John likes Chopin, but Mary doesn't like Chopin', `John likes films, but John doesn't like plays'. If you run the first pair of sentences through a parser like the Stanford parser: (http://nlp.stanford.edu:8080/parser/index.jsp) you will see that it gets the constituent structure right, but doesn't attempt to capture the interpretation (see the dependency representations for semantically relevant information). This project will attempt to develop and test a post-processing module for the Stanford parser which will interpret some common cases of ellipsis. Prerequisites: some familiarity with linguistic theories of ellipsis. |
|
Proof Theory For a Linguistically Motivated Logic - Not available 2011/12 | Stephen Pulman | B C | A collaborator at Google has developed a logical language aimed at representing some aspects of the meaning of sentences in a linguistically motivated way, derivable from a dependency parse. You can read about it here: http://aclweb.org/anthology-new/W/W11/W11-0103.pdf While the logic can be given a denotational semantics, it has yet to be provided with a proof theory in order that inferences can be drawn from sentences translated into it. This project aims to develop a proof theory for at least part of this logic, and if possible, test it via an implementation. Prerequisites: only suitable for someone who has done BOTH the Computational Linguistics AND the Knowledge Representation and Reasoning courses. |
|
Modelling and verifying systems in Timed CSP and FDR | Bill Roscoe | B C | Timed CSP reinterprets the CSP language in a real-time setting and has a semantics in which the exact times of events are recorded as well as their order. Originally devised in the 1980s, it has only just been implemented as an alternative mode for FDR. The objective of this project is to take one or more examples of timed concurrent system from the literature, implement them in Timed CSP, and where possible compare the performance of these models with similar examples running on other analysis tools such as Uppaal. References:(Reference Understanding Concurrent Systems, especially Chapter 15, and Model Checking Timed CSP, from AWR's web list of publications) |
|
A parser for pre-group grammars | Mehrnoosh Sadrzadeh | C | Jointly supervised by Stephen Pulman and Mehrnoosh Sadzradeh. A background of Computational Linguistics is required
Pre-group grammars are a recent grammatical formalism somewhat similar to categorial grammar. The formalism is of interest to computational linguistics because it can be assimilated to category theory, computer scientists' favourite theoretical formalism. This project is to develop a parser for pre-group grammars, and, time permitting, investigate probabilistic disambiguation for the formalism. A paper describing the formalism is at: ftp://ftp.math.mcgill.ca/pub/lambek/natlangproc.pdf |
|
Implementing Algorithms for Robot Navigation | Mehrnoosh Sadrzadeh | B C | Automatic robot navigation is one of the major problems addressed in AI, where robots have to roam and communicate to acquire information and learn about their whereabouts, e.g. in the grid model of a building or a city, based on a map thereof. This project is to write programs to implement decision procedures of an algebraic axiomatics or rule-based logic of the models that automate scenarios of robot navigation. Knowledge of the theory behind the algebra and sequent calculus is not required. Previous implementations of these settings can be downloaded from 1- Aximo: Automated Axiomatic Reasoning for Information Update http://web.comlab.ox.ac.uk/publications/publication2909-abstract.html 2- Implementation of a cut-free sequent calculus for logics with adjoint modalities http://web.comlab.ox.ac.uk/publications/publication3037-abstract.html |
|
A simple object-oriented language | Michael Spivey | B C | Use Keiko to implement a simple language that is purely object-oriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
Better JIT translation | Michael Spivey | B C | The existing JIT for Keiko is very simple-minded, and does little more than translate each bytecode into the corresponding machine code. Either improve the translation by using one of the many JIT libraries now available, or adjust the Oberon compiler and the specification of the bytecode machine to free it of restrictive assumptions and produce a better pure-JIT implementation. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
Better performance for GeomLab | Michael Spivey | B C | At present, GeomLab programs show a performance that competes favourably with Python, making it possible to address tasks like computing images of the Mandelbrot set using a purely functional program that calls a function once for each pixel. But there is still a gap between the performance of GeomLab programs and similar ones written in Java or C, and more ambitious image-processing tasks would be made possible by better performance, particularly in the area of arithmetic. Explore ways of improving performance, perhaps including the possibility of improving the performance of GeomLab by allowing numbers to be passed around without wrapping them in heap-allocated objects, or the possibility of compiling the code for Haskell-style pattern matching in a better way. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
Eliminating array bounds checks | Michael Spivey | B C | The Oberon compiler inserts code into every array access and every pointer dereference to check for runtime errors, like a subscript that is out of bounds or a pointer that is null. In many cases, it is possible to eliminate the checks because it is possible to determine from the program that no error can occur. For example, an array access inside a FOR loop may be safe given the bounds of the loop, and several uses of the same pointer in successive statements may be able to share one check that the pointer is non-null. Modify the Oberon compiler (or a simpler one taken from the Compilers labs) so that it represents the checks explicitly in its IR, and introduce a pass that removes unnecessary checks, so speeding up the code without compromising safety. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
GeomLab and Mindstorms | Michael Spivey | B C | GeomLab has a turtle graphics feature, but the pictures are drawn only on the screen. It should be possible to make a turtle out of Lego Mindstorms, then control it with an instance of Geomlab running on a host computer, with communication over Bluetooth. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
GeomLab on Android | Michael Spivey | B C | Produce an implementation of GeomLab's GUI and graphics library that works on the Android platform. Either use an interpreter for GeomLab's intermediate code to execute GeomLab programs, or investigate dynamic translation of the intermediate code into code for Android's virtual machine Dalvik. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
Heap-based activation records | Michael Spivey | B C | At present, Keiko supports only conventional Pascal-like language implementations that store activation records on a stack. Experiment with an implementation where activation records are heap-allocated (and therefore recovered by a garbage collector), procedures are genuinely first-class citizens that can be returned as results in addition to being passed as arguments, and tail recursion is optimised seamlessly. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
Keiko on Mindstorms | Michael Spivey | B C | Alternative firmware for the Mindstorms robot controller provides an implementation of the JVM, allowing Java programs to run on the controller, subject to some restrictions. Using this firmware as a guide, produce an interpreter for a suitable bytecode, perhaps some variant of Keiko, allowing Oberon or another robot language of your own design to run on the controller. Aim to support the buttons and display at first, and perhaps add control of the motors and sensors later. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
Type-checking for GeomLab | Michael Spivey | B C | The GeomLab language is untyped, leading to errors when expressions are evaluated that would be better caught at an earlier stage. Most GeomLab programs, however, follow relatively simple typing rules. The aim in this project is to write a polymorphic type checker for GeomLab and integrate it into the GeomLab system, which is implemented in Java. A simple implementation of the type-checker would wait until an expression is about to be evaluated, and type-check the whole program at that point. As an extension of the project, you could investigate whether it is possible to type-check function definitions one at a time, even when some of the functions they call have not yet been defined. Please see http://spivey.oriel.ox.ac.uk/corner/Undergraduate_and_M.Sc._projects for further details. |
|
Exact Algorithms for Complex Root Isolation | Irina Voiculescu | C | Isolating the complex roots of a polynomial can be achieved using subdivision algorithms. Traditional Newton methods can be applied in conjunction with interval arithmetic. Previous work (jointly with Prof Chee Yap and MSc student Narayan Kamath) has compared the performance of three operators: Moore's, Krawczyk's and Hansen-Sengupta's. This work makes extensive use of the CORE library, which is is a collection of C++ classes for exact computation with algebraic real numbers and arbitrary precision arithmetic. CORE defines multiple levels of operation over which a program can be compiled and executed. Each of these levels provide stronger guarantees on exactness, traded against efficiency. Further extensions of this work can include (and are not limited to): (1) Extending the range of applicability of the algorithm at CORE's Level 1; (2) Making an automatic transition from CORE's Level 1 to the more detailed Level 2 when extra precision becomes necessary; (3) Designing efficiency optimisations to the current approach (such as confirming a single root or analysing areas potentially not containing a root with a view to discarding them earlier in the process); (4) Tackling the isolation problem using a continued fraction approach. The code has been included and is available within the CORE repository. Future work can continue to be carried out in consultation with Prof Yap at NYU. |
|
Implementation of numerical methods | Jonathan Whiteley | B C | Many techniques from scientific computing - such as numerical methods for solving differential equations, root finding for non-linear functions, or iterative linear algebra techniques - are based on rigorous mathematics from the fields of analysis or algebra. As such, theoretical results exist for the rate of convergence - or even, occasionally, divergence - of these techniques. The aim of this project is to choose a suitable scientific computing technique, understand the theoretical basis of this technique, and then implement it computationally using the theory to ensure that the implementation is both correct and efficient. |
|
Parallel linear algebra | Jonathan Whiteley | B C | The most common large-scale parallel architecture for scientific computing is a distributed memory computer. When using this architecture each processor has access to data stored in its own memory, while data stored by memory associated with other processors may only be accessed by communication across a network. For reasons of computational efficiency, many parallel scientific computing algorithms are implemented by partitioning the data stored so that communication across the network between processors is minimised. This project will investigate the most appropriate partitioning for linear algebra calculations. |
|
Sparse matrix storage | Jonathan Whiteley | B C | Many matrices used in scientific computing applications are large (100s of thousands of rows) and sparse (typically around 10 non-zeros per row). Storing the whole matrix (including the zeros) is infeasible due to memory constraints. Instead, the matrix is stored in sparse format: only the non-zero entries and the location of these entries are stored. This clearly affects the implementation of operations on this matrix, such as arithmetic operations and calculation of the determinant. A further complication is that non-zeros may be added to the matrix at any time, and so the sparsity pattern changes dynamically. In this project, an efficient sparse storage format will be implemented, together with a suite of operations using this sparse storage format. |
|
Algorithms for Bisimulation and Simulation on Markov Chains | James Worrell | B C | Bisimulation and simulation are relations that can be used for state compression in Markov chains. The aim of this project is to understand, implement and compare algorithms for computing these relations. Such algorithms could involve subprocedures for solving the max-flow problem or computing matrix eigenvalues. |
|
Model Checking LTL on Markov Chains | James Worrell | B C | The aim of this project is to build a tool to compute the probability that an LTL formula satisfies a Markov chain. One of the main tasks in the project is to build a translator from LTL to unambiguous automata and to see to what extent state-of-the-art compression techniques, such as those used in the tool Wring, can be used in the setting of unambiguous automata. |
|
UCT/Monte-Carlo Game Playing Program | James Worrell | B C | Fairly recently, new ideas have emerged in Artificial Intelligence that have led to the design of radically different game-playing programs. Originally developed for the game of Go, these algorithms (such as UCT or Monte-Carlo) appear to work quite well on various other games (typically games with a very large number of possible moves at each stage) for which minimax/alpha-beta performs poorly. The aim of this advanced project would be to implement and evaluate such algorithms on a particular game. |
|
Extending the Haskell Foreign Function Interface | Nicolas Wu | B C | The Foreign Function Interface allows programs in Haskell to access libraries that expose an interface in C in a convenient manner. However, writing bindings to C++ is not as simple and currently users fall back on exposing C++ functions via a C interface. This project would aim to create a foreign function interface that interacts with SWIG, an interface compiler that connects programs written in C and C++ with scripting languages such as Perl, Python, Ruby, and Tcl. An alternative take on this project would be to write an FFI that interfaces with another language, such as Java or Python. |
|
Functional Algorithms for Real Time Strategy Games | Nicolas Wu | B C | Writing an agent that is able to adequately play a real time strategy game is an excellent vehicle to showcase a wide range of algorithms that arise in the field of artificial intelligence. This project investigates the use of Haskell to write functional algorithms as an alternative to their imperative counterparts. As motivation, the aim would be to create an agent that is able to interface with Starcraft, a world-renowned real time strategy game that has an exposed API for such experiments. |
|
Improving Haskell Code Generation | Nicolas Wu | B C | The Glasgow Haskell Compiler (GHC) is a widely used, industrial strength compiler for Haskell that is able to produce bytecode that can be compiled by the Low Level Machine (LLVM) compiler infrastructure. This project is an investigation into Haskell's LLVM backend, with the aim of improving the code that is generated. One area where efficiency is lacking is to do with an optimisation called tables-next-to-code (TNTC), where Haskell places top level data just before function calls in the generated bytecode. At present, this optimisation is performed by post-processing LLVM bytecode. This project would involve understanding TNTC and LLVM, and modifying LLVM so that it is able to associate a global variable with a function. |
|
Extension of the interval program analysis for unknown library functions | Hongseok Yang | B C | The interval program analysis is a well-known algorithm for estimating the behaviour of programs without actually running them. The algorithm takes an imperative program, and returns, at each program point, interval constraints for variables in the program, such as 1 <= x <= 3 && 2 <= y. This algorithm assumes that all the functions called by the input program are defined in the program, so that the source code of every called function can be found in the program. However, in practice, this assumption is not necessarily met. Programs often use library functions whose source code is not available. The goal of this project is to lift this assumption. During the project, a student will develop an interval-analysis algorithm that works in the presence of calls to unknown library functions, implement the algorithm, and evaluate the algorithm experimentally. Prerequisites:A prerequisite of this project is the Compiler course. |
|
Nonblocking data structures based on message-passing concurrency | Hongseok Yang | B C | Developing efficient concurrent data structures is nontrivial, and requires several interesting new programming techniques. In the context of shared-variable concurrency, researchers developed such data structures, often called nonblocking algorithms, which avoid using locks (expressed by the synchronized keyword in Java) and perform well even when many threads attempt to use the data structures at the same time. The goal of this project is to revisit these data structures designed for shared-variable concurrency, especially those found in Herlihy and Shavit's book "The art of multiprocessor programming", and to build corresponding data structures based on asynchronous message-passing passing or Actor model, which is supported by Erlang or Scala. The built data structures will be analysed theoretically (by proving their linearizability) or empirically (by measuring their performance) or both. Prerequisites:Although it is not required, it is desirable that a student carrying out this project took the Concurrent Programming course before. |