A generic evaluation of a categorical compositional-distributional model of meaning

Supervisor

Bob Coecke

Suitable for

MSc in Computer Science

Abstract

The categorical compositional distributional model of meaning is an emerging area of computational linguistics and natural language processing, pioneered in the Quantum and Computational Linguistics groups of the department. The theoretical underpinnings of the model are based on compact closed categories, inspired by the categorical quantum models of Abramsky and Coecke. To apply the model to mainstream tasks, one has to instantiate it on concrete linguistic models, in particular distributional (vector space) models of meaning. So far, these instantiations have been carried out on datasets of short sentences containing only a few kinds of language units, such as verbs, adjectives, and relative pronouns, each studied separately. In this project, we aim to connect these separate experiments in a single unified task: term classification, whose goal is to match a number of dictionary terms to their definitions (or vice versa). Specifically, the project involves the following:

  1. Build a concrete compositional-distributional model based on the categorical framework of Coecke, Sadrzadeh, and Clark (2010), bringing together the recent advances on this topic.
  2. Use the model to compose vectors for dictionary definitions, and then measure the cosine distance between each definition vector and the term vectors to guide classification.
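
The second step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional toy vectors, the hand-picked matrix standing in for the adjective "young", and the helper names `apply_matrix`, `cosine_similarity` and `classify` are hypothetical and are not taken from the referenced papers. It treats an adjective as a linear map applied to a noun vector, which is only one simple instance of the categorical framework, and it ranks terms by cosine similarity (equivalently, smallest cosine distance).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def apply_matrix(matrix, vector):
    """Matrix-vector product: an adjective (matrix) modifying a noun (vector)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def classify(definition_vector, term_vectors):
    """Return the term whose vector is closest (by cosine) to the definition vector."""
    return max(
        term_vectors,
        key=lambda term: cosine_similarity(definition_vector, term_vectors[term]),
    )


# Toy data (invented for this sketch, not derived from any corpus).
term_vectors = {
    "kitten": [0.9, 0.1, 0.8],
    "puppy": [0.1, 0.9, 0.8],
}
noun_vectors = {
    "cat": [1.0, 0.0, 0.2],
    "dog": [0.0, 1.0, 0.2],
}
# Hypothetical matrix for "young": it amplifies the third dimension.
young = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 4.0],
]

# Compose the definition "young cat" and classify it against the term vectors.
definition = apply_matrix(young, noun_vectors["cat"])
predicted = classify(definition, term_vectors)
print(predicted)
```

In a real experiment the word vectors and tensors would be built from corpus statistics, and definitions would contain verbs and relative pronouns as well, each requiring its own composition rule.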

We seek candidates with a background in computational linguistics and linear algebra, as well as programming skills. Knowledge of category theory is encouraged but not required. The project will be co-supervised by Dimitri Kartsaklis (who carried out the previous term-classification experiments), Mehrnoosh Sadrzadeh, and Bob Coecke. The outcome may be considered for publication in a peer-reviewed conference or journal.

References:

  • Coecke, B., Sadrzadeh, M., and Clark, S. (2010). Mathematical Foundations for a Compositional Distributional Model of Meaning. Lambek Festschrift. Linguistic Analysis, 36:345–384.
  • Grefenstette, E. and Sadrzadeh, M. (2011). Experimental Support for a Categorical Compositional Distributional Model of Meaning. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Kartsaklis, D., Sadrzadeh, M., and Pulman, S. (2012). A Unified Sentence Space for Categorical Distributional-Compositional Semantics: Theory and Experiments. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012).
  • Sadrzadeh, M., Clark, S., and Coecke, B. (2013). The Frobenius anatomy of word meanings I: Subject and object relative pronouns. Journal of Logic and Computation, Advance Access.
  • Kartsaklis, D. and Sadrzadeh, M. (2013). Prior Disambiguation of Word Tensors for Constructing Sentence Vectors. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP).