University of Oxford, Department of Computer Science

OXPath meets JavaScript

Supervisor

Suitable for

Abstract

(Joint with G Grasso)

OXPath (http://diadem.cs.ox.ac.uk/OXPath/) is a wrapper language developed at Oxford in the DIADEM project (diadem-project.info). It extends standard XPath with, among other things, actions for interacting with web pages and markers for extracting data. It is particularly well suited to data extraction from rich internet applications with sophisticated client-side interfaces, and to web automation and web crawling. OXPath has already been applied in several projects by researchers and practitioners, and it is available as an open-source Java API that interfaces with a real web browser.

In this project we aim to provide a native JavaScript implementation of OXPath. JavaScript has gained considerable interest and popularity thanks to high-performance engines (e.g., the Google V8 JavaScript Engine, http://code.google.com/p/v8/) and open-source platforms (e.g., Node.js, http://nodejs.org/) that allow highly scalable server-side JavaScript execution.

The outcome of this thesis is a JavaScript implementation of several fragments of OXPath of increasing complexity. This will involve, for instance, implementing a parser as well as the caching/memoization structures that support OXPath evaluation.

This MSc project provides an opportunity to become familiar with web frameworks and data-extraction languages such as XPath and OXPath, as well as JavaScript and Node.js. Some knowledge of these is desirable; knowledge of Unix is highly recommended.
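To make the "parser plus memoized evaluation" idea concrete, here is a minimal sketch in TypeScript (which compiles to plain JavaScript). It assumes a deliberately tiny fragment of XPath: absolute paths built only from `/` (child) and `//` (descendant) steps with element-name or `*` tests, evaluated over a hypothetical `TreeNode` type standing in for the browser DOM. It has none of OXPath's extensions (actions, extraction markers) and no predicates, and it is not how the existing Java implementation works; the names `TreeNode`, `parsePath` and `MemoEvaluator` are illustrative only. The cache is keyed by (step index, context node), so a context node reached at the same step along different paths is evaluated once.

```typescript
// Hypothetical toy tree standing in for a browser DOM.
interface TreeNode {
  name: string;
  children: TreeNode[];
}

type Axis = "child" | "descendant";

// One location step, e.g. "//div" -> { axis: "descendant", test: "div" }.
interface Step {
  axis: Axis;
  test: string; // element name, or "*" for any element
}

// Parse an absolute path such as "/html//div/a" into a list of steps.
function parsePath(path: string): Step[] {
  const steps: Step[] = [];
  let rest = path;
  while (rest.length > 0) {
    // A step is "/" or "//" followed by an element name or "*".
    const m = /^(\/\/?)([A-Za-z_][\w.-]*|\*)/.exec(rest);
    if (m === null) {
      throw new SyntaxError(`Unexpected input: "${rest}"`);
    }
    steps.push({ axis: m[1] === "//" ? "descendant" : "child", test: m[2] });
    rest = rest.slice(m[0].length);
  }
  if (steps.length === 0) {
    throw new SyntaxError("Empty path");
  }
  return steps;
}

// All descendants of `node` in document (preorder) order, excluding `node`.
function descendants(node: TreeNode, out: TreeNode[] = []): TreeNode[] {
  for (const c of node.children) {
    out.push(c);
    descendants(c, out);
  }
  return out;
}

class MemoEvaluator {
  private readonly steps: Step[];
  // memo[i] maps a context node to the result of evaluating steps[i..] from it.
  private readonly memo: Map<TreeNode, TreeNode[]>[];
  cacheHits = 0; // counts reuses of cached results, for illustration

  constructor(steps: Step[]) {
    this.steps = steps;
    this.memo = steps.map(() => new Map<TreeNode, TreeNode[]>());
  }

  // Evaluate the path from the document root; each result appears once,
  // in document order, as XPath node-sets require.
  evaluate(root: TreeNode): TreeNode[] {
    const found = new Set(this.evalFrom(0, root));
    return [root, ...descendants(root)].filter((n) => found.has(n));
  }

  private evalFrom(i: number, node: TreeNode): TreeNode[] {
    if (i === this.steps.length) return [node];
    const cached = this.memo[i].get(node);
    if (cached !== undefined) {
      this.cacheHits++;
      return cached;
    }
    const step = this.steps[i];
    const candidates = step.axis === "child" ? node.children : descendants(node);
    const out: TreeNode[] = [];
    for (const c of candidates) {
      if (step.test === "*" || c.name === step.test) {
        out.push(...this.evalFrom(i + 1, c));
      }
    }
    this.memo[i].set(node, out);
    return out;
  }
}

// Usage: a toy document with nested <div> elements.
const el = (name: string, ...children: TreeNode[]): TreeNode => ({ name, children });
const link1 = el("a");
const link2 = el("a");
const doc = el("#document", el("html", el("body", el("div", el("div", link1), link2))));

const result = new MemoEvaluator(parsePath("//div//a")).evaluate(doc);
console.log(result.length); // 2
```

Here `link1` is reachable from both `div` elements, but it is returned only once. With deeper nesting, for instance `//div//div//a` over three nested `div` elements, the same inner `div` is reached as a context node for the third step from two different paths, and the second visit is answered from the cache. A real OXPath engine would additionally have to decide when cached results become stale, since actions can change the page.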