Research Interests

My research interests are in Computational Linguistics and Natural Language Processing. Much of my work uses models of language derived from corpus data to develop language processing applications.

My main area of research is linguistically motivated Statistical Parsing, with a particular focus on the grammar formalism Combinatory Categorial Grammar. I also carry out research in areas such as data-driven Machine Translation, Question Answering, Information Extraction, and Lexical- and World-Knowledge Acquisition.

Some Recent Papers

The first paper gives a detailed description of the natural language parser that I developed with James Curran. The parser and its associated tools are freely available for research use: click on the software link above. The second paper reflects a nascent interest in developing a compositional semantics for vector space models of meaning.

Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models
Stephen Clark and James R. Curran
To appear in Computational Linguistics
[PDF] (preprint)

Combining Symbolic and Distributional Models of Meaning
Stephen Clark and Stephen Pulman
Proceedings of the AAAI Spring Symposium on Quantum Interaction, pp. 52-55, Stanford, CA, 2007
[PDF]

Presentations

Linguistically Motivated Large-Scale Language Processing
Invited talk at CLUK-07
[PDF]

Grants

Accurate and Efficient Parsing of Biomedical Text. Funded by EPSRC. Starts October 2007.