Stream-based Algorithms for Online Machine Translation
The amount of raw text available on the web is massive, and its rate of growth increases every day. These unbounded text streams can be a valuable resource for Statistical Machine Translation (SMT): incorporating more training data reduces sparsity and increases model coverage of the target language domain. However, traditional methods for building SMT systems do not work in this setting.
In this work we investigate a new approach to SMT training using the streaming model of computation. We develop and test incrementally retrainable models which, given an incoming source of new data, can adapt to and efficiently incorporate the stream data whilst online. By continually adding new data to the system we can take advantage of recency effects in the stream. A naive approach that stores the entire stream would require an unbounded amount of space, which is clearly infeasible. Hence we consider online adaptation operating within bounded space.
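One simple way to keep memory bounded while favouring recent data is a sliding window over the stream. The sketch below is a hypothetical illustration, not the method developed in this work: the class `WindowedBigramCounts` and its `window` parameter are assumptions, and bigram statistics stand in for a full translation or language model. Only the most recent `window` sentences contribute counts, so memory is bounded by the window size (for sentences of bounded length), and older data is evicted as new data arrives.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple

Bigram = Tuple[str, str]


class WindowedBigramCounts:
    """Hypothetical sketch: bigram counts over the last `window` sentences.

    Memory stays bounded because the oldest sentence is evicted once the
    window is full, so the statistics reflect recent data in the stream.
    """

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.history: Deque[List[Bigram]] = deque()  # bigrams per sentence
        self.counts: Counter = Counter()

    def add_sentence(self, sentence: str) -> None:
        """Incrementally update counts with one new sentence from the stream."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        bigrams = list(zip(tokens, tokens[1:]))
        self.history.append(bigrams)
        self.counts.update(bigrams)
        # Evict the oldest sentence once the window is exceeded.
        if len(self.history) > self.window:
            old = self.history.popleft()
            self.counts.subtract(old)
            # Remove zero-count entries so the table does not grow unboundedly.
            for bg in set(old):
                if self.counts[bg] <= 0:
                    del self.counts[bg]

    def prob(self, prev: str, word: str) -> float:
        """Maximum-likelihood estimate of P(word | prev) over the window."""
        total = sum(c for (p, _), c in self.counts.items() if p == prev)
        return self.counts[(prev, word)] / total if total else 0.0


if __name__ == "__main__":
    model = WindowedBigramCounts(window=2)
    for s in ["the cat sat", "the dog ran", "a dog sat"]:
        model.add_sentence(s)
    # The first sentence has been evicted, so "the cat" no longer counts.
    print(model.prob("the", "cat"), model.prob("dog", "ran"))
```

A sliding window is only one bounded-space strategy; it discards old data outright, whereas approaches based on randomised or approximate data structures can retain compressed statistics over a longer history.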