
Automated parallelisation of scientific applications for graphics cards

Luke Cartey ( Department of Computer Science, Oxford University )

Over the last five years, the graphics processor has become a tempting target for scientific computing, thanks to peak performance that CPUs cannot match. It achieves this with a massively parallel architecture of hundreds or thousands of cores, which typically yields a runtime speed-up of 10x to 25x.


Unfortunately, this speed-up is not a free lunch: it often requires not only a complex port to a new architecture, but also a fundamental algorithmic rethink. This is particularly problematic in scientific computing, where domain experts often do not want to learn yet another architecture.


To this end, we have developed a method for taking a high-level recursive function definition and synthesising a massively parallel implementation for graphics processors. We will discuss how this technique can be applied to optimisation problems in bioinformatics, by generating efficient programs from simple domain-specific languages.

This can give a significant speed-up (16x) over comparable CPU tools.
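
As a rough illustration of the general idea, and not of the method presented in the talk, the sketch below takes a recursive definition of edit distance (a typical sequence-comparison problem in bioinformatics) and evaluates the same recurrence as a table filled one anti-diagonal at a time. Every cell on an anti-diagonal depends only on the two preceding anti-diagonals, so the inner loop is the part that could be mapped to one GPU thread per cell. The choice of edit distance, the function names, and the use of plain Python loops instead of a real GPU kernel are all assumptions made for this example.

```python
from functools import lru_cache


def edit_distance_recursive(a: str, b: str) -> int:
    """High-level recursive definition (memoised so it terminates quickly)."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) is the edit distance between a[:i] and b[:j].
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)

    return d(len(a), len(b))


def edit_distance_wavefront(a: str, b: str) -> int:
    """Same recurrence, evaluated anti-diagonal by anti-diagonal.

    Hypothetical stand-in for a generated GPU program: the inner loop
    iterations are independent and could each run on a separate thread.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for k in range(n + m + 1):  # anti-diagonal where i + j == k
        first_row = max(0, k - m)
        last_row = min(n, k)
        for i in range(first_row, last_row + 1):  # independent cells
            j = k - i
            if i == 0:
                table[i][j] = j
            elif j == 0:
                table[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                table[i][j] = min(
                    table[i - 1][j] + 1,        # from diagonal k - 1
                    table[i][j - 1] + 1,        # from diagonal k - 1
                    table[i - 1][j - 1] + cost  # from diagonal k - 2
                )
    return table[n][m]
```

Both functions compute the same value for any pair of strings; only the order of evaluation differs, which is what exposes the parallelism.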



