ALIGNED: successfully bringing new big data technologies from research into industry


After three years of collaboration between industry and academia and €4 million of funding from the European Commission's Horizon 2020 program, the ALIGNED project has rolled out five trial platforms that show how the latest big-data research findings can be applied to the problems of industry.

The project tackled problems in a wide range of areas and showed how the latest semantic technologies can help create smarter legal information systems, provide better management of health data, and help construct high-quality archaeological and historical datasets. In all of these areas people are struggling to harness the available data, and in all of them ALIGNED’s semantic and model-driven technologies were able to help.

Professor Jim Davies, director of the Software Engineering Programme at the University of Oxford, led Oxford's involvement in ALIGNED. ‘It might appear at first that legal publishing, health informatics, and Bronze Age archaeology are all wildly different, but in fact all semantically rich domains are wrestling with the same challenges,’ he says. ‘Web and big data technologies have given everyone access to vast quantities of data; but size and complexity mean that new methods and technologies are needed for managing that data and for integrating it with software systems. We have concentrated on developing some of the latest academic research in the area into software tools that can be used in industry. It’s gratifying to see these tools already being used in production systems to solve real problems.’

‘Oxford's contribution has concentrated on two such tools,’ explains James Welch, researcher on the ALIGNED project in Oxford's Department of Computer Science. ‘The Metadata Catalogue is an online platform for documenting element-by-element descriptions of large datasets or software models. Semantic Booster is an adaptation of our existing Booster tool for the automatic generation of information systems from precise specifications. In the ALIGNED project, the original Booster language and compiler have been adapted to fit with a combined software and data engineering methodology. New features include integration with the Metadata Catalogue, scalability measures to make the tool suitable for use with big data, and language annotations to support the exposure of Booster data in RDF format, suitable for linking to open data sets and for integration with semantic web tools such as RDFUnit.’

One of the ALIGNED data sets was provided by the Seshat Global History Databank, an ambitious attempt to gather data sets describing every historical society that has ever existed. Dr Pieter Francois, founding director of Seshat and research coordinator of the Cultural Evolution Laboratory in Oxford's School of Anthropology and Museum Ethnography, reports: ‘The ALIGNED tools have offered the Seshat research assistants a high-quality work platform that allows the provenance of the data to be tracked in impressive detail.’

The science and engineering behind ALIGNED’s research are described in simple terms in videos on the project's YouTube channel, which can be found at