
Hierarchical Bayesian Models of Sequential Data

Yee Whye Teh (Gatsby Computational Neuroscience Unit, University College London)

In this talk I will present a new approach to modelling sequence data called the sequence memoizer. As opposed to most other sequence models, ours does not make any Markovian assumptions. Instead, we use a hierarchical Bayesian approach which enforces sharing of statistical information across the different parts of the model. To better model the power-law statistics often observed in sequence data, we use a Bayesian nonparametric prior called the Pitman-Yor process as the building block of the hierarchical model. We show that computations in the resulting model can be performed efficiently by pruning the hierarchy, resulting in a suffix tree data structure. We show state-of-the-art results on language modelling and text compression.

This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and Lancelot James.
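
As a rough illustration of why the Pitman-Yor process suits power-law data, the sketch below simulates its Chinese restaurant representation. It is a minimal sketch, not the sequence memoizer itself and not the speakers' code; the function name `pitman_yor_crp` and its parameters are hypothetical names chosen for this example. With a positive discount the number of occupied tables (distinct values) grows roughly as a power of the number of customers, whereas a discount of zero reduces to the Dirichlet process, where it grows only logarithmically.

```python
import random


def pitman_yor_crp(num_customers, discount, concentration, seed=0):
    """Simulate table sizes under a Pitman-Yor Chinese restaurant process.

    Hypothetical helper for illustration. With n customers seated at K tables,
    the next customer joins table k with probability
    (size_k - discount) / (n + concentration) and opens a new table with
    probability (concentration + discount * K) / (n + concentration).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    table_sizes = []
    for n in range(num_customers):
        new_table_weight = concentration + discount * len(table_sizes)
        u = rng.random() * (n + concentration)
        # The first customer always opens a table.
        if not table_sizes or u < new_table_weight:
            table_sizes.append(1)
            continue
        u -= new_table_weight
        for k, size in enumerate(table_sizes):
            u -= size - discount
            if u < 0:
                table_sizes[k] += 1
                break
        else:
            # Guard against floating-point rounding at the upper edge.
            table_sizes[-1] += 1
    return table_sizes


if __name__ == "__main__":
    for d in (0.0, 0.8):
        sizes = pitman_yor_crp(2000, discount=d, concentration=1.0)
        print(f"discount={d}: {len(sizes)} tables, largest has {max(sizes)} customers")
```

Running the script prints the number of tables for a Dirichlet process (discount 0) and for a Pitman-Yor process with discount 0.8; the second count should be far larger, reflecting the heavier tail.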

Speaker bio

Yee Whye Teh is a Lecturer at the Gatsby Computational Neuroscience Unit, UCL. He is interested in machine learning and Bayesian statistics. His current focus is on developing Bayesian nonparametric methodologies for unsupervised learning, computational linguistics, and genetics. Prior to his appointment he was a Lee Kuan Yew Postdoctoral Fellow at the National University of Singapore and a postdoctoral fellow at the University of California, Berkeley. He obtained his Ph.D. in Computer Science at the University of Toronto in 2003.
