Evolving Health Informatics: semantic frameworks and metadata-driven architectures

1st April 2008 to 30th September 2009

Advances in technology allow us to communicate large amounts of information, almost instantaneously, between any two points on the globe. Advances in analysis and imaging techniques, and progress in genomic, proteomic, and metabonomic science, allow us to obtain detailed information about the health of an individual. Advances in the computerisation of social and business infrastructure allow us to obtain similarly detailed information about other aspects of our lives. The automatic integration of this data, based upon a computable representation of its meaning or semantics, will revolutionise both medical and clinical research, and the impact on healthcare delivery will be dramatic.

The impact will be felt not merely in terms of personalised medicine, informed by the new biology, but also in the very nature of national and international healthcare systems. Agencies will be able to work faster and more effectively in adapting to changes in circumstances, from advances in the body of medical knowledge to public health emergencies. Their agility will be limited only by the human capacity for ideas and understanding. As illustrations of what it is realistic to expect within the next decade, we propose to investigate three simple scenarios, each addressing a different aspect of information-driven health: on-demand collaboration on clinical trials; control of infectious diseases; and improving the efficiency of a portfolio of research.

There has been considerable emphasis upon the computational challenges encountered in the analysis of data, and much progress has been made in this respect: in particular, within the projects funded by the EPSRC and MRC as part of the UK e-Science Programme. However, there is growing recognition that the large-scale sharing and integration of data from dynamic, heterogeneous sources requires computable representations of the semantics of data, and it is here that a significant part of the challenge lies. Natural language or informal understanding is sufficient for such a semantics only when the concepts are straightforward, the community is small or homogeneous, and the period of time over which understanding must be maintained is short. For problems of any complexity, communities of any size, or initiatives that are intended to last for many years, a more formal approach is required. The semantics has to be amenable to automatic processing, and this processing has to be automatically linked to the processing of the data itself.

This requires an advance in the state of the art of software engineering. It is not enough merely to mandate the use of languages and technologies such as XML and OWL, through funding and procurement policies: these are building blocks of a solution, but not a solution in themselves. Rather, we need methods and tools for the creation, maintenance, and deployment of abstract models of information, studies, and processes, sufficient for the automatic generation and configuration of the software systems required to support information-driven health. These methods and tools underpin most of the information technology needs set out in the "data mining and data fusion" roadmap presented in reports from the recent UK Foresight project on infectious diseases. Their development will require effective collaboration with users and domain experts, as well as advances in semantics- and model-driven software engineering research, beyond the level of industry-based technological development. The grand challenge for information-driven health is to make semantics-driven management of data standard practice across the whole spectrum of healthcare and medical research.