
Synchronous Combinatory Categorial Grammar

Adam Lopez (Johns Hopkins)

Statistical machine translation has been very successful, giving rise to a thriving industry highlighted by products like Google Translate. Yet translation systems still often fail to capture many linguistic phenomena, because they model translation as simple substitution and permutation of word tokens, sometimes informed by syntax. Formally, these models are probabilistic relations on regular or context-free sets, a poor fit for many of the world's languages. If we are to build translation systems that adequately capture linguistic phenomena, we must model those phenomena. Computational linguists have developed expressive mathematical models of language that exhibit high empirical coverage of annotated language data, correctly predict a variety of important linguistic phenomena in many languages, and can be processed with efficient algorithms. I will describe a new formal model of translation based on one of these formalisms, combinatory categorial grammar (CCG): a synchronous CCG that generates a relation on sentence pairs with provably equivalent semantics. I will then give a solution to the crucial problem of recognition, the basis of any probabilistic translation algorithm, derived from a view of parsing as language intersection.

Speaker bio

Adam Lopez works on problems at the intersection of computational linguistics, algorithms, formal language theory, and machine learning, with applications in natural language processing, particularly machine translation. He is an assistant research professor at Johns Hopkins University. He has spent time as a visiting scientist at SDL Research (formerly LanguageWeaver), the first company to commercialize statistical machine translation. He was previously a research fellow at the University of Edinburgh, and earned his Ph.D. at the University of Maryland.
